What remains is memory, addresses, and instructions.
Understanding this transition—from language-level meaning to machine-level execution—is the foundation for systems programming, performance engineering, debugging, and reverse engineering.
The Illusion of Types in C++
In C++, everything starts with types. Variables are declared with explicit intent:
int x = 10;
char c = 'A';
struct S {
char a;
int b;
};
From a programmer’s perspective:
- int represents a number
- char represents a character
- struct groups related data
This mental model is correct for reasoning, but incorrect for execution.
At runtime, the CPU does not know what an int or char is.
It only knows how many bytes to move and where to move them.
Types exist only to help the compiler generate correct instructions.
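A minimal sketch of this point, assuming a platform where float is a 32-bit IEEE 754 value (true on mainstream hardware, but not guaranteed by the language). The names bits_of and check_bits are hypothetical. The same four bytes are read once as a float and once as an integer; the bytes never change, only the interpretation the compiler was asked to apply.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumption: float occupies exactly 4 bytes on this platform.
static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Copy the raw bytes of a float into an integer of the same size.
// memcpy is the portable, well-defined way to reinterpret bytes.
inline std::uint32_t bits_of(float f) {
    std::uint32_t u = 0;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Hypothetical self-check: 1.0f is encoded as 0x3F800000 in IEEE 754.
inline void check_bits() {
    assert(bits_of(1.0f) == 0x3F800000u);
    assert(bits_of(0.0f) == 0u);
}
```

Calling check_bits() from main exercises the helper. Nothing in memory records whether those four bytes "are" a float or an integer.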
Value vs Reference: Semantics vs Reality
Consider two functions:
void byValue(int x);
void byRef(int& x);
In C++ teaching:
- Passing by value → creates a copy
- Passing by reference → creates an alias
At the machine level:
- Both are just values
- One value happens to be an address
A reference is not a special runtime construct. When it needs storage at all, compilers implement it as a pointer with stricter rules: it must be bound when it is created and cannot be reseated. After compilation the distinction disappears entirely; what remains is register usage and memory access.
This is why reverse engineers cannot distinguish references from pointers without context.
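The equivalence can be sketched with two hypothetical functions, bump_by_ref and bump_by_ptr, that differ only in how the parameter is spelled. The helper names after_ref and after_ptr are also assumptions made for this example, and the code requires C++14 or later.

```cpp
// Requires C++14 or later (constexpr functions that modify their arguments).

// Reference parameter: the callee works on the caller's object.
constexpr void bump_by_ref(int& x) { x += 1; }

// Pointer parameter: the callee receives the object's address explicitly.
constexpr void bump_by_ptr(int* x) { *x += 1; }

// Hypothetical helpers that apply each version to a local copy of `start`.
constexpr int after_ref(int start) { int v = start; bump_by_ref(v);  return v; }
constexpr int after_ptr(int start) { int v = start; bump_by_ptr(&v); return v; }

// Both versions have the same observable effect on the caller's variable.
static_assert(after_ref(10) == 11, "reference version modifies the caller's int");
static_assert(after_ptr(10) == 11, "pointer version modifies the caller's int");
```

On x86-64, compilers typically pass the address in the same register for both functions, so comparing the assembly produced with -S should show matching code. Treat that as something to verify on your toolchain rather than a guarantee.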
`sizeof`: Where Types Touch Reality
The sizeof operator is one of the few places where C++ exposes its connection to the machine:
sizeof(char); // always 1
sizeof(int); // usually 4
sizeof(void*); // 8 on x86-64
Here, types collapse into byte counts.
That number dictates:
- Stack space allocation
- Memory access width
- Instruction selection
The compiler does not care that something is an int.
It only cares that it occupies 4 bytes and requires 4-byte alignment.
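A short sketch separating what the language guarantees from what the platform decides. The constant names (kIntSize and so on) are hypothetical; the values noted in the comments are the common x86-64 ones, not requirements of the standard.

```cpp
#include <cstddef>

// Guaranteed by the language: sizeof counts in units of char.
static_assert(sizeof(char) == 1, "char is always one byte");

// Guaranteed: an object's size is a multiple of its alignment,
// so every element of an array of that type stays aligned.
static_assert(sizeof(int) % alignof(int) == 0,
              "int size is a multiple of its alignment");
static_assert(sizeof(void*) % alignof(void*) == 0,
              "pointer size is a multiple of its alignment");

// Platform-dependent values, captured as compile-time constants.
// Commonly 4, 4, and 8 on x86-64, but the standard does not fix them.
constexpr std::size_t kIntSize  = sizeof(int);
constexpr std::size_t kIntAlign = alignof(int);
constexpr std::size_t kPtrSize  = sizeof(void*);
```

Because these are compile-time constants, the numbers are baked into the generated instructions and never computed at runtime.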
Alignment and Padding: The Hidden Layout
Now consider this structure:
struct S {
char a;
int b;
};
At first glance, it seems trivial:
- a → 1 byte
- b → 4 bytes
- Total → 5 bytes
But on most modern architectures:
sizeof(S) == 8
Why?
Because CPUs prefer aligned memory access.
An int wants to start at an address divisible by 4.
The compiler inserts padding to satisfy this requirement.
The actual memory layout looks like this:
Offset 0: char a (1 byte)
Offset 1–3: padding (3 bytes)
Offset 4–7: int b (4 bytes)
This padding:
- Is never written in the source code (only sizeof and offsetof reveal it)
- Is never named
- Exists purely for performance and correctness
Padding is one of the first things reverse engineers must reconstruct manually.
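The layout described above can be checked from within C++ using offsetof and sizeof. This is a sketch: Loose and Tight are hypothetical structs added to show how member order changes padding, and the concrete numbers in the comments assume a typical platform with a 4-byte, 4-byte-aligned int.

```cpp
#include <cstddef>

struct S {
    char a;
    int  b;
};

// Hypothetical variants: same members plus one char, in two orders.
struct Loose { char a; int b; char c; };  // padding after a and after c
struct Tight { int b; char a; char c; };  // both chars share one padded tail

// Portable facts: b starts on an int boundary, and S is at least
// as large as its members combined.
static_assert(offsetof(S, b) % alignof(int) == 0, "b is aligned for int");
static_assert(sizeof(S) >= sizeof(char) + sizeof(int), "padding only adds bytes");
static_assert(sizeof(Tight) <= sizeof(Loose), "reordering never costs space here");

// Bytes the compiler inserted between a and b (3 on typical platforms).
constexpr std::size_t kPaddingBeforeB = offsetof(S, b) - sizeof(char);

// Bytes after b that round S up to its alignment (0 on typical platforms).
constexpr std::size_t kTailPadding = sizeof(S) - (offsetof(S, b) + sizeof(int));
```

The two constants are the same numbers a reverse engineer has to recover by hand when only the offsets in the machine code are available.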
Stack vs Heap: Where Layout Becomes Critical
When a structure is allocated on the stack:
void f() {
S s;
}
The compiler assigns it a fixed position relative to the stack frame. In an unoptimized x86-64 build this might look like:
[rbp - 8] → s.a
[rbp - 4] → s.b
The offsets encode all type information implicitly.
On the heap:
S* p = new S;
The memory allocator returns raw bytes.
There is no runtime metadata saying “this is an S”.
The only thing that makes it an S is how the program accesses it.
This distinction is fundamental in reverse engineering:
heap objects must be inferred entirely from access patterns.
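To illustrate, here is a sketch that reads a field the way compiled code does: a base address plus a constant offset, with no knowledge of S. The helper read_int_at and the function demo_offsets are hypothetical names introduced for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

struct S {
    char a;
    int  b;
};

// Read an int located `offset` bytes past `base`. The function knows
// nothing about S; it only sees an address and a byte count.
inline int read_int_at(const void* base, std::size_t offset) {
    int value = 0;
    std::memcpy(&value,
                static_cast<const unsigned char*>(base) + offset,
                sizeof value);
    return value;
}

// Hypothetical demo: the same offset works for stack and heap objects.
inline void demo_offsets() {
    S on_stack{'A', 42};
    assert(read_int_at(&on_stack, offsetof(S, b)) == 42);

    std::unique_ptr<S> on_heap(new S{'B', 7});
    assert(read_int_at(on_heap.get(), offsetof(S, b)) == 7);
}
```

The only difference between the two cases is where the base address comes from: a frame-relative slot for the stack object, an allocator return value for the heap object.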
When Symbols and Types Disappear
Understanding Binaries Through Architecture
At the source-code level, software feels descriptive. Variables have names, functions have signatures, and types explain intent. This richness gives the illusion that the compiled program somehow “contains” this information.
It does not.
Once compilation finishes—especially when symbols are stripped—the executable becomes a pure architectural artifact. Understanding this transition is essential for low-level programming, debugging, performance work, and reverse engineering.
What Are Symbols, Really?
Symbols Exist for Humans, Not for CPUs
Let’s start with the simplest possible function:
int add(int a, int b) {
return a + b;
}
At the source level, this function is rich with meaning:
- The function is called add
- It takes two parameters named a and b
- Each parameter has type int
- The return type is int
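During compilation, all of this information is placed into symbol tables. These tables allow the compiler and linker to reason about your program. The machine code generated for the function is much plainer (shown here for x86-64 with the System V calling convention):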
add:
mov eax, edi
add eax, esi
ret
Once code generation is complete, the CPU does not care about _any_ of the source-level detail. Every name has been reduced to a number:
- Variable name → memory address
- Function name → entry point address
- Struct field name → offset within an object
During compilation, the compiler maintains symbol tables to:
- Resolve references
- Check types
- Generate correct code
These symbols are compiler scaffolding, not runtime necessities.
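A small sketch of the idea that a call needs only an entry-point address. The add function is the one from the text, marked constexpr here solely so the check can run at compile time; BinaryOp and call_at are hypothetical names.

```cpp
// The function from the text. constexpr is an addition for this sketch
// so that the check below can be evaluated at compile time.
constexpr int add(int a, int b) { return a + b; }

// A function pointer holds nothing but the entry-point address.
using BinaryOp = int (*)(int, int);

// Hypothetical helper: call whatever code lives at `entry`.
// The callee's name never appears here.
constexpr int call_at(BinaryOp entry, int a, int b) { return entry(a, b); }

static_assert(call_at(&add, 2, 3) == 5, "the call needs an address, not a name");
```

Inside call_at the name add is gone; the pointer value carries everything required to make the call, which is the same situation the CPU is in.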
Symbol Tables: Temporary by Design
During compilation and linking:
- The compiler knows variable names
- The linker resolves symbol addresses
- Debug information may be generated
After linking:
- The program no longer needs names
- The CPU cannot use names
- Addresses are sufficient
That is why symbol tables are:
- Optional
- Removable
- Frequently stripped in production builds
Stripped Binaries: What Gets Removed
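After stripping, the add function from earlier is identified only by its address (the address shown is illustrative):
00401120: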
mov eax, edi
add eax, esi
ret
When a binary is stripped:
- Function names are removed
- Variable names are removed
- Type information is removed
- Debug metadata is removed
What remains:
- Machine instructions
- Absolute and relative addresses
- Constants
- Raw data sections
This is not obfuscation.
This is how machines work.
Symbols are optional metadata, not executable information.
What a Binary Actually Contains
A compiled binary typically consists of the following sections:
- .text → executable instructions
- .data → initialized global data
- .bss → zero-initialized data
- .rodata → constants
- Relocation tables (sometimes)
- Optional symbol/debug sections
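Apart from the optional symbol and debug sections, none of these encode:
- struct definitions
- Variable names
- C++ type hierarchies (beyond the RTTI records emitted for polymorphic classes)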
- Templates
- References
Those exist only before code generation.
Why Types Cannot Exist at Runtime
A CPU executes instructions like:
mov eax, [rbp-4]
add eax, 1
mov [rbp-4], eax
The CPU understands:
- Registers
- Memory addresses
- Instruction widths
It does not understand:
- int
- char
- struct
- reference
If the hardware had to track source-level types:
- Instructions would need type decoding
- CPUs would become language-dependent
- Every operation would pay for checks the compiler has already done
Instead, types are compiled away into:
- Access size (byte, word, dword, qword)
- Alignment guarantees
- Instruction selection
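The mapping from type to access size can be sketched with a hypothetical load_as helper. The type chosen at compile time decides how many bytes are read; the buffer itself carries no type. All bytes are set to 0xFF so the results do not depend on byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Load sizeof(T) bytes starting at p. The type chosen at compile time
// decides how wide the resulting memory access is.
template <typename T>
T load_as(const unsigned char* p) {
    T value;
    std::memcpy(&value, p, sizeof(T));
    return value;
}

// Hypothetical demo: one buffer, four access widths.
inline void demo_widths() {
    const unsigned char bytes[8] = {0xFF, 0xFF, 0xFF, 0xFF,
                                    0xFF, 0xFF, 0xFF, 0xFF};
    assert(load_as<std::uint8_t>(bytes)  == 0xFFu);                  // byte
    assert(load_as<std::uint16_t>(bytes) == 0xFFFFu);                // word
    assert(load_as<std::uint32_t>(bytes) == 0xFFFFFFFFu);            // dword
    assert(load_as<std::uint64_t>(bytes) == 0xFFFFFFFFFFFFFFFFull);  // qword
}
```

With optimization enabled, each instantiation usually becomes a single load of the matching width, though the exact instructions depend on the compiler and target.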
Final Mental Model
| C++ Concept | Binary Reality |
|---|---|
| Variable name | Memory offset |
| Type | Access width |
| Struct | Offset pattern |
| Reference | Address |
| Function | Entry point |
| Signature | ABI convention |