Functions Reality
Imagine you open your laptop, fire up a C++ project, and write this innocent line:
int add(int a, int b) {
return a + b;
}
You smile. You just created a function… right?
Well… not according to the CPU.
The processor staring at your compiled binary has no concept of a function, no idea what add means, no clue what parameters are, and definitely no interest in your elegant C++ syntax.
It only understands 3 brutal truths:
- Jump to an address
- Save enough state to come back
- Return when done
Everything else is a human invention.
Let’s walk through how chaos becomes structure.
The CPU’s World: Only Jumps Exist
You compile your code and inspect the binary. Among thousands of instructions, you see:
call 0x4012a0
The CPU interprets call like this:
- Push the return address onto the stack
- Move the instruction pointer to 0x4012a0
That’s it.
There is:
- No function name
- No argument list
- No signature
- No type
- No object representing the function
It’s just:
SAVE → JUMP → EXECUTE → RETURN
A function at this level is nothing more than organized control flow.
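You can see the same idea from the C++ side through a function pointer. It holds nothing but a code address, and calling through it is a jump to that address with a return address saved. The sketch below is illustrative. `call_through` and `address_of_add` are hypothetical helpers, and the integer value you get from converting a function pointer is implementation-defined.

```cpp
#include <cstdint>

// The add function from the opening example.
int add(int a, int b) {
    return a + b;
}

// Hypothetical helper: calls whatever code address it is handed.
// The pointer carries no name, only an address plus an assumed signature;
// compilers typically turn this into an indirect call instruction.
int call_through(int (*target)(int, int), int a, int b) {
    return target(a, b);
}

// Hypothetical helper: exposes the raw code address behind the name "add".
// The exact value is implementation-defined; it is just a number.
std::uintptr_t address_of_add() {
    return reinterpret_cast<std::uintptr_t>(&add);
}
```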
The Stack Frame: A Reverse Engineer’s Compass
Most functions follow a recognizable pattern called a stack frame. The CPU does not need it. Compilers emit it by convention, because the ABI permits a frame pointer but does not require one.
A common x86-64 System V ABI prologue looks like:
push rbp
mov rbp, rsp
sub rsp, 0x20
Let’s decode it like a detective:
| Instruction | Meaning we infer |
|---|---|
| push rbp | Save the caller's frame base |
| mov rbp, rsp | Establish the new frame pointer |
| sub rsp, 0x20 | Reserve 32 bytes for locals |
The CPU sees only memory reservation.
The reverse engineer sees:
- Local variables live at fixed rbp offsets
- The function keeps rsp 16-byte aligned (8-byte return address + 8-byte saved rbp + 0x20 bytes of locals)
- The push rbp / mov rbp, rsp pair marks a function boundary
Even in stripped binaries, this pattern reveals structure.
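You can check the alignment claim with simple arithmetic. This minimal sketch models rsp as a plain integer and replays `call`, `push rbp` and `sub rsp, 0x20`. It assumes that rsp is a multiple of 16 just before the `call`, which is what the System V ABI requires. `StackModel` is a hypothetical name, and the model does not track any memory.

```cpp
#include <cstdint>

// Hypothetical model: tracks only the value of rsp, not stack contents.
struct StackModel {
    std::uint64_t rsp;

    void call() { rsp -= 8; }                // call pushes an 8-byte return address
    void push() { rsp -= 8; }                // push rbp
    void sub(std::uint64_t n) { rsp -= n; }  // sub rsp, n
    bool aligned16() const { return rsp % 16 == 0; }
};

// Replays: call; push rbp; mov rbp, rsp (no change to rsp); sub rsp, 0x20.
// Returns true if rsp is 16-byte aligned once the prologue finishes.
bool prologue_keeps_alignment(std::uint64_t rsp_before_call) {
    StackModel s{rsp_before_call};
    s.call();     // at function entry rsp % 16 == 8
    s.push();     // back to a multiple of 16
    s.sub(0x20);  // 0x20 is a multiple of 16, so alignment is kept
    return s.aligned16();
}
```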
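Example: after compiling our add function without optimization, it may look like:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-0x4], edi ; spill a to the stack
mov DWORD PTR [rbp-0x8], esi ; spill b to the stack
mov edx, DWORD PTR [rbp-0x4] ; reload a
mov eax, DWORD PTR [rbp-0x8] ; reload b
add eax, edx
pop rbp
ret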
Notice something magical:
There are no variable names, but we still understand that:
- edi and esi are arguments
- They were stored as locals on the stack
- The result is returned in eax
Local variables exist without identity — just offsets and lifetimes.
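Here is a concrete version of "offsets without identity". The sketch emulates the unoptimized listing. A byte array stands in for the stack, and a hypothetical index `rbp` stands in for the frame base. The parameters end up as bytes at `rbp - 4` and `rbp - 8`, and nothing records the names `a` or `b`. `add_with_frame` is a hypothetical name.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical emulation of the unoptimized add() listing.
std::int32_t add_with_frame(std::int32_t edi, std::int32_t esi) {
    std::array<unsigned char, 64> stack{};  // fake stack memory
    const std::size_t rbp = 32;             // fake frame base (index into stack)

    // mov DWORD PTR [rbp-0x4], edi ; mov DWORD PTR [rbp-0x8], esi
    std::memcpy(&stack[rbp - 4], &edi, sizeof edi);
    std::memcpy(&stack[rbp - 8], &esi, sizeof esi);

    // Read the slots back; unoptimized code usually reloads from the stack like this.
    std::int32_t edx = 0;
    std::int32_t eax = 0;
    std::memcpy(&edx, &stack[rbp - 4], sizeof edx);
    std::memcpy(&eax, &stack[rbp - 8], sizeof eax);

    eax += edx;  // add eax, edx
    return eax;  // the result leaves in eax
}
```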
Arguments Without Parameters
In C++ we declare:
int sum(int a, int b);
But the CPU only sees this:
mov edi, 5
mov esi, 7
call 0x401180
Because the System V ABI defines:
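- 1st argument → rdi
- 2nd argument → rsi
- Return → rax
We can infer the shape of the missing signature at a glance: two 32-bit integer arguments. The return type shows up only in how the caller uses eax afterward.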
Another example with more parameters:
void log_values(int a, int b, int c, int d, int e, int f, int g) { }
The compiled call may look like this. Real compilers often also pad the stack so that rsp stays 16-byte aligned at the call.
mov edi, 1 ; rdi
mov esi, 2 ; rsi
mov edx, 3 ; rdx
mov ecx, 4 ; rcx
mov r8d, 5 ; r8
mov r9d, 6 ; r9
push 7 ; 7th → stack
call 0x401200
add rsp, 8 ; stack cleanup
Even without names, the contract gives them meaning.
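The register contract is mechanical enough to write down as a lookup. This is a simplified sketch that covers INTEGER-class arguments only, meaning ints and pointers. Floating-point arguments use xmm0 to xmm7 instead, and structs have their own rules. The hypothetical `int_arg_location` helper reports where the callee finds its Nth argument on entry.

```cpp
#include <string>

// Simplified System V x86-64 rule for INTEGER-class arguments only.
// index is 1-based, as in "1st argument".
std::string int_arg_location(int index) {
    static const char* const regs[] = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};
    if (index < 1) {
        return "invalid";
    }
    if (index <= 6) {
        return regs[index - 1];
    }
    // At the callee's first instruction, [rsp] holds the return address,
    // so the 7th argument sits at [rsp+8], the 8th at [rsp+16], and so on
    // (each stack argument occupies a full 8-byte slot).
    return "[rsp+" + std::to_string(8 * (index - 6)) + "]";
}
```

On entry to the callee the return address occupies `[rsp]`. That is why the `push 7` from the caller appears at `[rsp+8]` and not at `[rsp]`.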
Return Values Without Types
Look at this code fragment in assembly:
call 0x401180
add eax, 1
We know:
- The function returned a value
- It fits in 32 bits
- It is being used as an integer
- It came through rax/eax
The type is inferred from the usage pattern, not from a declaration.
Floating-point example:
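A caller that passes two floats and uses the result:
movss xmm0, DWORD PTR [a] ; 1st float argument → xmm0
movss xmm1, DWORD PTR [b] ; 2nd float argument → xmm1
call 0x401300
addss xmm0, DWORD PTR [c] ; returned float (xmm0) used in FP math
ret
We infer that the function returns a float. The caller reads the result from xmm0 and consumes it with a single-precision FP instruction (addss).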
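You can also turn the inference around: given a type, where would the System V ABI put the return value? The sketch below is deliberately simplified. It models only three cases:
- Integers and pointers of 8 bytes or less, which go in rax
- float and double, which go in xmm0
- Trivially copyable types larger than 16 bytes, which are returned through a hidden pointer

Everything else, such as long double or small structs, is reported as `NotModeled`. `return_lane` and `ReturnLane` are hypothetical names.

```cpp
#include <type_traits>

enum class ReturnLane { RAX, XMM0, HiddenPointer, NotModeled };

// Simplified sketch of where System V x86-64 places a return value.
template <class T>
constexpr ReturnLane return_lane() {
    if constexpr (std::is_same_v<T, float> || std::is_same_v<T, double>) {
        return ReturnLane::XMM0;
    } else if constexpr ((std::is_integral_v<T> || std::is_pointer_v<T>) && sizeof(T) <= 8) {
        return ReturnLane::RAX;
    } else if constexpr (std::is_trivially_copyable_v<T> && sizeof(T) > 16) {
        // The caller passes a buffer address in rdi; the callee writes the
        // result there (and hands the same address back in rax).
        return ReturnLane::HiddenPointer;
    } else {
        return ReturnLane::NotModeled;  // long double, small structs, etc.
    }
}
```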
When Stack Frames Disappear (Optimization Strikes Back)
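You enable optimization (for example -O2) and compile again. Suddenly your detective compass is gone. A function that still needs stack space may now address it through rsp alone, as in this illustrative listing:
sub rsp, 0x18
mov DWORD PTR [rsp+0xc], edi
mov DWORD PTR [rsp+0x8], esi
add edi, esi
mov eax, edi
add rsp, 0x18 ; release the space so ret finds the return address
ret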
There is:
- No rbp
- No stable frame
But the function still exists.
It just got… naked.
Reverse engineers now track:
- Stack deltas
- Slot lifetimes
- Register behavior
- Return instructions
The complexity didn’t increase — the training wheels were removed.
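Tracking stack deltas is simple bookkeeping. Every push, pop, `sub rsp` or `add rsp` changes rsp. By the time `ret` executes, the net change must be zero, or `ret` pops the wrong return address. The hypothetical `net_stack_delta` helper recognizes only those four textual forms. That assumption holds for the listings in this article, not for real disassembly output.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: sums the rsp changes made by a straight-line listing.
// Lines it does not recognize are assumed not to touch rsp.
// Immediate operands are parsed as hex (e.g. "0x18").
long long net_stack_delta(const std::vector<std::string>& listing) {
    long long delta = 0;
    for (const std::string& line : listing) {
        if (line.rfind("push ", 0) == 0) {
            delta -= 8;
        } else if (line.rfind("pop ", 0) == 0) {
            delta += 8;
        } else if (line.rfind("sub rsp, ", 0) == 0) {
            delta -= std::stoll(line.substr(9), nullptr, 16);
        } else if (line.rfind("add rsp, ", 0) == 0) {
            delta += std::stoll(line.substr(9), nullptr, 16);
        }
    }
    return delta;
}
```

If the `add rsp, 0x18` is dropped from the optimized listing, the helper reports -24 at `ret`. In that case `ret` would jump to whatever sits 24 bytes below the real return address.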
Leaf & Tail-Call Functions: The Minimalist Monks
Some functions don’t use the stack at all:
add edi, esi
mov eax, edi
ret
This is a leaf function (calls nothing else).
Others may end with a tail call:
jmp 0x401500 ; tail call instead of call+ret
Still functions, still valid: just a different shape.
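In C++ terms, the two shapes look like this. The function names are hypothetical. The codegen described in the comments is typical of GCC or Clang at -O2 but not guaranteed. In particular, if `finish` gets inlined, no `jmp` remains at all.

```cpp
// Leaf: calls nothing, so it needs no frame. GCC and Clang at -O2
// typically reduce it to something like: lea eax, [rdi+rsi] / ret
int leaf_add(int a, int b) {
    return a + b;
}

// Hypothetical stand-in for a function defined elsewhere.
int finish(int x) {
    return x * 2;
}

// The call is the last action and its result is returned unchanged, so if
// finish() is not inlined, an optimizer may emit "jmp finish" instead of
// "call finish" followed by "ret".
int prepare_then_finish(int x) {
    return finish(x + 1);
}
```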
Caller-Saved vs Callee-Saved: The Invisible Discipline
The ABI enforces register survival rules:
| Caller-saved | Callee-saved |
|---|---|
| rax, rcx, rdx, rdi, rsi, r8–r11 | rbx, rbp, r12–r15 |
So when you see:
push rbx
...
pop rbx
You can reasonably infer:
- This is a function boundary
- The function modifies rbx
- It restores rbx before returning, to obey the contract
Even stripped binaries reveal function edges by this behavior.
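The callee-saved column is what turns a push/pop pair into evidence. This small sketch encodes that column. `must_preserve` and `kCalleeSaved` are hypothetical names. Only the general-purpose registers from the table are covered; rsp is preserved implicitly by balanced stack use.

```cpp
#include <algorithm>
#include <array>
#include <string_view>

// System V x86-64 general-purpose registers a callee must restore before returning.
constexpr std::array<std::string_view, 6> kCalleeSaved = {
    "rbx", "rbp", "r12", "r13", "r14", "r15"};

// True if a function that writes `reg` must save and restore it,
// e.g. with a push/pop pair like the rbx example.
bool must_preserve(std::string_view reg) {
    return std::find(kCalleeSaved.begin(), kCalleeSaved.end(), reg) != kCalleeSaved.end();
}
```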
How Reverse Engineers Discover Functions
They don’t search for names.
They search for contracts and patterns:
- call instructions
- Prologue sequences
- ret instructions
- Register save/restore symmetry
- Stack alignment fixes
- Instruction pointer destinations
Functions are inferred, not declared.
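The first signal on that list does most of the work: every direct `call` target is a function entry. Here is a minimal sketch of that single heuristic over a hypothetical pre-decoded `Insn` record. Real tools typically combine it with the other signals listed above.

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical, already-decoded instruction: address, mnemonic, and
// direct branch target (0 when the instruction has none).
struct Insn {
    std::uint64_t addr;
    std::string mnemonic;
    std::uint64_t target;
};

// Collects every direct call target as a presumed function start.
std::set<std::uint64_t> find_function_starts(const std::vector<Insn>& code) {
    std::set<std::uint64_t> starts;
    for (const Insn& insn : code) {
        if (insn.mnemonic == "call" && insn.target != 0) {
            starts.insert(insn.target);
        }
    }
    return starts;
}
```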
Why This Matters to C++ System Designers
Because bad design becomes bad binary behavior.
Expensive API
struct Big { char data[512]; };
Big process(Big input);
ABI sees:
- 512 bytes copied by value into the stack argument area
- A hidden sret pointer (passed in rdi) to the caller's buffer for the returned Big
- Stack used unnecessarily
ABI-aware fix
void process(const Big& input, Big& output);
Now:
- No large copy
- No hidden return pointer
- Both arguments arrive as plain pointers in rdi and rsi
- Stack stays clean
This is why high-performance C++ requires ABI awareness.
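Here are both signatures side by side as a compilable sketch. The body, which adds 1 to every byte, is a hypothetical stand-in for real work. The name `process_by_value` is also hypothetical. The comments describe what the System V ABI does at the call boundary, not what an optimizer may do after inlining.

```cpp
#include <cstddef>

struct Big { char data[512]; };

// By value: the 512-byte argument is copied into stack memory, and the
// result is written through a hidden pointer supplied by the caller.
Big process_by_value(Big input) {
    Big out = input;
    for (std::size_t i = 0; i < sizeof out.data; ++i) {
        out.data[i] = static_cast<char>(out.data[i] + 1);
    }
    return out;
}

// By reference: two plain pointers arrive in rdi and rsi; no 512-byte copy
// crosses the call boundary.
void process(const Big& input, Big& output) {
    for (std::size_t i = 0; i < sizeof input.data; ++i) {
        output.data[i] = static_cast<char>(input.data[i] + 1);
    }
}
```

The reference version has one trade-off: `input` and `output` may refer to the same object, so the body must tolerate aliasing. This element-by-element loop does.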
The Core Definition
A function exists in machine code if and only if:
- Arguments arrive in the correct places
- Required state survives the call
- Control returns to the right address
int foo();
is decoration.
This:
jump → save → restore → ret
is reality.
When you looked at:
mov edi, 5
mov esi, 7
call 0x401180
You asked yourself:
_“What are edi, esi, edx… and why do they sometimes hold hex like 0x5 or 0xA?”_
Let’s answer that clearly.
Register Names: Not Variables — Just Lanes for Data
In x86-64 (the architecture your examples come from), registers are general-purpose storage slots inside the CPU.
Each register has multiple “views” depending on size:
| 64-bit | 32-bit view | 16-bit view | 8-bit view |
|---|---|---|---|
| rdi | edi | di | dil |
| rsi | esi | si | sil |
| rdx | edx | dx | dl |
| rcx | ecx | cx | cl |
| r8 | r8d | r8w | r8b |
| r9 | r9d | r9w | r9b |
| rax | eax | ax | al |
So:
- edi is not a variable named "edi"
- It is the 32-bit slice of register rdi
- Compilers use it to pass int values by convention
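Each narrower view in the register table is just the low bits of the 64-bit register. If you model rdi as a 64-bit integer, every narrower view becomes a truncating conversion. The `*_of` helpers are hypothetical names.

```cpp
#include <cstdint>

// Each view keeps only the low bits of the 64-bit register value.
constexpr std::uint32_t edi_of(std::uint64_t rdi) { return static_cast<std::uint32_t>(rdi); }
constexpr std::uint16_t di_of(std::uint64_t rdi) { return static_cast<std::uint16_t>(rdi); }
constexpr std::uint8_t dil_of(std::uint64_t rdi) { return static_cast<std::uint8_t>(rdi); }
```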
Why Do We See Hex Values in Registers?
Because assembly shows you the actual literal values being placed in them.
Example in C++:
int a = 10;
int b = 15;
int c = a + b;
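The compiler might generate the following register-only version. This is illustrative: an optimizing compiler would usually fold a + b at compile time and emit mov eax, 0x19 directly.
mov edi, 0xA ; 10 decimal → 0xA in hex
mov esi, 0xF ; 15 decimal → 0xF in hex
add edi, esi ; 10 + 15 = 25 = 0x19
mov eax, edi ; eax now holds 0x19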
So hex is just another number format:
| Decimal | Hex |
|---|---|
| 5 | 0x5 |
| 10 | 0xA |
| 15 | 0xF |
| 25 | 0x19 |
| 255 | 0xFF |
The CPU doesn’t care — it’s all binary.
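Printing a value in the same style as the listings takes one stream manipulator. Here is a small sketch. `to_hex` is a hypothetical helper that writes the `0x` prefix by hand, so the hex digits can be uppercase while the prefix stays lowercase.

```cpp
#include <sstream>
#include <string>

// Formats a value like the listings do: lowercase "0x" plus uppercase hex digits.
std::string to_hex(unsigned value) {
    std::ostringstream out;
    out << "0x" << std::hex << std::uppercase << value;
    return out.str();
}
```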
Role of Common Registers in Function Calls (System V ABI)
By contract:
| Argument order | Register lane |
|---|---|
| 1st | rdi (32-bit: edi) |
| 2nd | rsi (esi) |
| 3rd | rdx (edx) |
| 4th | rcx (ecx) |
| 5th | r8 (r8d) |
| 6th | r9 (r9d) |
So when you see this call:
mov edi, 0x2C ; 44 decimal
mov esi, 0x64 ; 100 decimal
mov edx, 0xFF ; 255 decimal
call 0x401180
You infer:
someFunction(44, 100, 255);
Even though the signature was stripped.
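The inference is mechanical: walk the argument lanes in ABI order and stop at the first one that was not written. Here is a hypothetical sketch. It assumes every argument is a 32-bit integer set with a plain `mov` right before the call.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical helper: given the argument registers written just before a
// call (32-bit views, as in the listing), rebuild a C-style call expression.
std::string reconstruct_call(const std::string& name,
                             const std::map<std::string, std::int32_t>& writes) {
    static const char* const lanes[] = {"edi", "esi", "edx", "ecx", "r8d", "r9d"};
    std::string result = name + "(";
    bool first = true;
    for (const char* lane : lanes) {
        auto it = writes.find(lane);
        if (it == writes.end()) {
            break;  // the first unused lane ends the argument list
        }
        if (!first) {
            result += ", ";
        }
        result += std::to_string(it->second);
        first = false;
    }
    return result + ")";
}
```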
Function Returns Also Use a Lane
Return values come back via rax:
call 0x401180
mov ebx, eax ; copying returned int into ebx
Equivalent to C++:
int result = foo();
int x = result;
A Mini Story Example Putting It All Together
C++ Code
int compute(int x, int y, int z) {
return (x + y) * z;
}
int main() {
int a = 4;
int b = 5;
int c = 3;
int r = compute(a, b, c);
r += 1;
return r;
}
What the CPU Actually Sees (one possible compiled output)
main:
sub rsp, 0x8 ; keep rsp 16-byte aligned at the call
mov edi, 0x4 ; a = 4
mov esi, 0x5 ; b = 5
mov edx, 0x3 ; c = 3
call 0x401200 ; compute(a, b, c)
add rsp, 0x8 ; undo the alignment padding
add eax, 0x1 ; r += 1
ret
401200 <compute>:
add edi, esi ; x + y → edi now = 9 (0x9)
imul edi, edx ; (x + y) * z → 9 * 3 = 27 (0x1B)
mov eax, edi ; return value → rax lane
ret
Reverse Engineer Interpretation
- Function starts at 0x401200
- 3 arguments passed in rdi, rsi, rdx
- Math performed using integer ops (add, imul)
- Return used as 32-bit integer (eax)
- Control returned via ret
- Caller modified return value after call
All meaning reconstructed without symbols.
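You can replay the story as a tiny register-level emulation. This sketch hard-codes the two listings above; it is not a general x86 interpreter. `Regs`, `run_compute` and `run_main` are hypothetical names, and only the 32-bit lanes the listing touches are modeled.

```cpp
#include <cstdint>

// Hypothetical register file holding only the 32-bit lanes used in the listing.
struct Regs {
    std::int32_t eax = 0;
    std::int32_t edi = 0;
    std::int32_t esi = 0;
    std::int32_t edx = 0;
};

// Replays compute() instruction by instruction.
void run_compute(Regs& r) {
    r.edi += r.esi;  // add  edi, esi  -> x + y
    r.edi *= r.edx;  // imul edi, edx  -> (x + y) * z
    r.eax = r.edi;   // mov  eax, edi  -> return value in the rax lane
}                    // ret

// Replays main's register traffic around the call.
std::int32_t run_main() {
    Regs r;
    r.edi = 0x4;     // mov edi, 0x4
    r.esi = 0x5;     // mov esi, 0x5
    r.edx = 0x3;     // mov edx, 0x3
    run_compute(r);  // call 0x401200
    r.eax += 0x1;    // add eax, 0x1
    return r.eax;    // ret: eax carries the result back
}
```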
Key Insight for You Going Forward
When you see register names + hex values:
- They are value lanes, not variables
- Hex is just the literal numeric representation
- The ABI gives them semantic meaning
- You can map them back to C/C++ parameters
- Function boundaries are recognized by behavior symmetry, not names