CODE: C CPP Functions Reality

Once symbols are gone, a function is no longer a thing with a name. It is simply a region of code that obeys certain architectural rules.

Functions Reality

Imagine you open your laptop, fire up a C++ project, and write this innocent line:

cpp
int add(int a, int b) {
    return a + b;
}

You smile. You just created a function… right?

Well… not according to the CPU.

The processor staring at your compiled binary has no concept of a function, no idea what add means, no clue what parameters are, and definitely no interest in your elegant C++ syntax.

It only understands 3 brutal truths:

  1. Jump to an address
  2. Save enough state to come back
  3. Return when done

Everything else is a human invention.

Let’s walk through how chaos becomes structure.


The CPU’s World: Only Jumps Exist

You compile your code and inspect the binary. Among thousands of instructions, you see:

asm
call 0x4012a0

The CPU interprets call like this:

  1. Push the return address onto the stack
  2. Move the instruction pointer to 0x4012a0

That’s it.

There is:

  • No function name
  • No argument list
  • No signature
  • No type
  • No object representing the function

It’s just:

SAVE → JUMP → EXECUTE → RETURN

A function at this level is nothing more than organized control flow.


The Stack Frame: A Reverse Engineer’s Compass

Most functions follow a recognizable pattern called a stack frame. Not because the CPU needs it, but because the ABI encourages it.

A common x86-64 System V ABI prologue looks like:

asm
push rbp
mov rbp, rsp
sub rsp, 0x20

Let’s decode it like a detective:

| Instruction   | Meaning we infer            |
|---------------|-----------------------------|
| push rbp      | Save old frame base         |
| mov rbp, rsp  | Establish new frame pointer |
| sub rsp, 0x20 | Reserve 32 bytes for locals |

The CPU sees only memory reservation.

The reverse engineer sees:

  • Local variables exist at fixed rbp offsets
  • The function expects 16-byte stack alignment
  • rbp marks a function boundary

Even in stripped binaries, this pattern reveals structure.

Example: after compiling our add function without optimization, it may look like:

asm
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-0x4], edi
mov DWORD PTR [rbp-0x8], esi
mov edx, DWORD PTR [rbp-0x4]    ; unoptimized code reloads the spilled values
mov eax, DWORD PTR [rbp-0x8]
add eax, edx
pop rbp
ret

Notice something magical:

There are no variable names, but we still understand that:

  • edi and esi are arguments
  • They were stored as locals on the stack
  • Result returned in eax

Local variables exist without identity — just offsets and lifetimes.


Arguments Without Parameters

In C++ we declare:

cpp
int sum(int a, int b);

But the CPU only sees this:

asm
mov edi, 5
mov esi, 7
call 0x401180

Because the System V ABI defines:

  • 1st argument → rdi
  • 2nd argument → rsi
  • Return → rax

We can infer the missing signature instantly.

Another example with more parameters:

cpp
void log_values(int a, int b, int c, int d, int e, int f, int g) { }

The compiled call may look like this (simplified; a real compiler also pads the stack to keep it 16-byte aligned at the call):

asm
mov edi, 1      ; rdi
mov esi, 2      ; rsi
mov edx, 3      ; rdx
mov ecx, 4      ; rcx
mov r8d, 5      ; r8
mov r9d, 6      ; r9
push 7          ; 7th → stack
call 0x401200
add rsp, 8      ; stack cleanup

Even without names, the contract gives them meaning.


Return Values Without Types

Look at this code fragment in assembly:

asm
call 0x401180
add eax, 1

We know:

  • The function returned a value
  • It fits in 32 bits
  • It is being used as an integer
  • It came through rax/eax

The type is defined by usage pattern, not declaration.

Floating-point example. Suppose the callee at 0x401300 takes two float arguments and returns a float. The call and the use of its result look like this:

asm
movss xmm0, [a]
movss xmm1, [b]
call 0x401300
addss xmm0, xmm2  ; returned float used in FP math
ret

We know the function returns float because it returns via xmm0 and is consumed by FP instructions.


When Stack Frames Disappear (Optimization Strikes Back)

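You enable -O2 (which normally omits the frame pointer) and compile again. Suddenly your detective compass is gone. A function that still needs stack slots now addresses them relative to rsp: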

asm
sub rsp, 0x18
mov DWORD PTR [rsp+0xc], edi
mov DWORD PTR [rsp+0x8], esi
add edi, esi
mov eax, edi
add rsp, 0x18                   ; release the slots so ret pops the return address
ret

There is:

  • No rbp
  • No stable frame
  • No variable offsets tied to a base pointer

But the function still exists.

It just got… naked.

Reverse engineers now track:

  • Stack deltas
  • Slot lifetimes
  • Register behavior
  • Return instructions

The complexity didn’t increase — the training wheels were removed.


Leaf & Tail-Call Functions: The Minimalist Monks

Some functions never touch the stack beyond the return address that call already pushed:

asm
add edi, esi
mov eax, edi
ret

This is a leaf function (calls nothing else).

Others may end with a tail call:

asm
jmp 0x401500   ; tail call instead of call+ret

Still functions, still valid — just different shape.


Caller-Saved vs Callee-Saved: The Invisible Discipline

The ABI defines register survival rules (no hardware enforces them; compilers simply obey):

| Caller-saved                    | Callee-saved      |
|---------------------------------|-------------------|
| rax, rcx, rdx, rdi, rsi, r8–r11 | rbx, rbp, r12–r15 |

So when you see:

asm
push rbx
...
pop rbx

You instantly know:

  • This is a function boundary
  • It modified rbx
  • It restored it to obey the contract

Even stripped binaries reveal function edges by this behavior.


How Reverse Engineers Discover Functions

They don’t search for names.

They search for contracts and patterns:

  • call instructions
  • Prologue sequences
  • ret instructions
  • Register save/restore symmetry
  • Stack alignment fixes
  • Instruction pointer destinations

Functions are inferred, not declared.


Why This Matters to C++ System Designers

Because bad design becomes bad binary behavior.

Expensive API

cpp
struct Big { char data[512]; };

Big process(Big input);

ABI sees:

  • 512 bytes copied by value
  • A hidden return-slot (sret) pointer, because a 512-byte struct cannot come back in registers
  • Stack used unnecessarily

ABI-aware fix

cpp
void process(const Big& input, Big& output);

Now:

  • No large copy
  • No hidden return pointer
  • Registers carry meaning
  • Stack stays clean

This is why high-performance C++ requires ABI awareness.


The Core Definition

A function exists in machine code if and only if:

  1. Arguments arrive in the correct places
  2. Required state survives the call
  3. Control returns to the right address

cpp
int foo();

is decoration.

This:

jump → save → restore → ret

is reality.


When you looked at:

asm
mov edi, 5
mov esi, 7
call 0x401180

You asked yourself:

_“What are edi, esi, edx… and why do they sometimes hold hex like 0x5 or 0xA?”_

Let’s answer that clearly.


Register Names: Not Variables — Just Lanes for Data

In x86-64 (the architecture your examples come from), registers are general-purpose storage slots inside the CPU.

Each register has multiple “views” depending on size:

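| 64-bit | 32-bit view | 16-bit view | 8-bit view |
|--------|-------------|-------------|------------|
| rdi    | edi         | di          | dil        |
| rsi    | esi         | si          | sil        |
| rdx    | edx         | dx          | dl         |
| rcx    | ecx         | cx          | cl         |
| r8     | r8d         | r8w         | r8b        |
| r9     | r9d         | r9w         | r9b        |
| rax    | eax         | ax          | al         |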

So:

  • edi is not a variable named "edi"
  • It is the 32-bit slice of register rdi
  • Compilers use it to pass int values by convention

Why Do We See Hex Values in Registers?

Because disassemblers and debuggers print immediate (literal) values in hexadecimal by convention. The value in the register is the same number you wrote in decimal.

Example in C++:

cpp
int a = 10;
int b = 15;
int c = a + b;

Compiler might generate:

asm
mov edi, 0xA     ; 10 decimal → 0xA in hex
mov esi, 0xF     ; 15 decimal → 0xF in hex
add edi, esi     ; 10 + 15 = 25
mov eax, edi     ; eax = 25, which a debugger displays as 0x19

So hex is just another number format:

| Decimal | Hex  |
|---------|------|
| 5       | 0x5  |
| 10      | 0xA  |
| 15      | 0xF  |
| 25      | 0x19 |
| 255     | 0xFF |

The CPU doesn’t care — it’s all binary.


Role of Common Registers in Function Calls (System V ABI)

By contract:

| Argument order | Register lane     |
|----------------|-------------------|
| 1st            | rdi (32-bit: edi) |
| 2nd            | rsi (esi)         |
| 3rd            | rdx (edx)         |
| 4th            | rcx (ecx)         |
| 5th            | r8 (r8d)          |
| 6th            | r9 (r9d)          |

So when you see this call:

asm
mov edi, 0x2C       ; 44 decimal
mov esi, 0x64       ; 100 decimal
mov edx, 0xFF       ; 255 decimal
call 0x401180

You infer:

cpp
someFunction(44, 100, 255);

Even though the signature was stripped.


Function Returns Also Use a Lane

Return values come back via rax:

asm
call 0x401180
mov ebx, eax       ; copying returned int into ebx

Equivalent to C++:

cpp
int result = foo();
int x = result;


A Mini Story Example Putting It All Together

C++ Code

cpp
int compute(int x, int y, int z) {
    return (x + y) * z;
}

int main() {
    int a = 4;
    int b = 5;
    int c = 3;
    int r = compute(a, b, c);
    r += 1;
}

What the CPU Actually Sees (one possible compiled output, simplified)

asm
main:
    mov edi, 0x4      ; a = 4
    mov esi, 0x5      ; b = 5
    mov edx, 0x3      ; c = 3
    call 0x401200     ; compute(a, b, c)

    add eax, 0x1      ; r += 1
    ret

401200 <compute>:
    add edi, esi      ; x + y  → edi now = 9 (0x9)
    imul edi, edx     ; (x + y) * z → 9 * 3 = 27 (0x1B)
    mov eax, edi      ; return value → rax lane
    ret

Reverse Engineer Interpretation

  • Function starts at 0x401200
  • 3 arguments passed in rdi, rsi, rdx
  • Math performed using integer ops (add, imul)
  • Return used as 32-bit integer (eax)
  • Control returned via ret
  • Caller modified return value after call

All meaning reconstructed without symbols.


Key Insight for You Going Forward

When you see register names + hex values:

  • They are value lanes, not variables
  • Hex is just the literal numeric representation
  • The ABI gives them semantic meaning
  • You can map them back to C/C++ parameters
  • Function boundaries are recognized by behavior symmetry, not names