CODE: C CPP Variables Types Reality | Amr Tarek

One of the biggest mental traps when learning C++ is believing that types are real at runtime. C++ presents types as concrete entities (int, char, struct, references), but once the compiler finishes its job, types vanish almost completely. Only opt-in features such as RTTI leave a trace.

What remains is memory, addresses, and instructions.

Understanding this transition—from language-level meaning to machine-level execution—is the foundation for systems programming, performance engineering, debugging, and reverse engineering.

The Illusion of Types in C++

In C++, everything starts with types. Variables are declared with explicit intent:

int x = 10;
char c = 'A';

struct S {
    char a;
    int b;
};

From a programmer’s perspective:

int represents a number
char represents a character
struct groups related data

This mental model is correct for reasoning, but incorrect for execution.

At runtime, the CPU does not know what an int or char is.

It only knows how many bytes to move and where to move them.

Types exist only to help the compiler generate correct instructions.
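One way to see that a type is only an interpretation of bytes is to copy the same four bytes from a float into an integer. The sketch below is illustrative only: bitsOf and demoBits are hypothetical helper names, and it assumes float is a 32-bit IEEE 754 value, which holds on mainstream platforms but is not guaranteed by the standard.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumption: float is a 32-bit IEEE 754 value, as on mainstream platforms.
static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes 32-bit float");

// Hypothetical helper: copy the four bytes of a float into an integer.
// The bytes do not change; only the type the compiler attaches to them does.
std::uint32_t bitsOf(float value) {
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// Usage sketch: under IEEE 754, 1.0f is stored as the bit pattern 0x3F800000.
void demoBits() {
    assert(bitsOf(1.0f) == 0x3F800000u);
}
```

std::memcpy is used instead of a pointer cast because it is the well-defined way to reinterpret bytes in C++.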

Value vs Reference: Semantics vs Reality

Consider two functions:

void byValue(int x);
void byRef(int& x);

In C++ teaching:

Passing by value → creates a copy
Passing by reference → creates an alias

At the machine level:

Both are just values
One value happens to be an address

A reference is not a special runtime construct. Compilers typically implement it as a pointer with stricter rules (it cannot be null or reseated), and often optimize it away altogether. Once compiled, the distinction disappears: what remains is register usage and memory access.

This is why reverse engineers cannot distinguish references from pointers without context.
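The equivalence can be sketched with two functions that differ only in syntax. The names incrementByRef, incrementByPtr, and demoIncrement are hypothetical; on mainstream ABIs both functions receive the address of the caller's int, though the standard does not mandate how a reference is implemented.

```cpp
#include <cassert>

// Both functions add one to the caller's int. On mainstream ABIs each
// receives a single machine word holding the address of that int.
void incrementByRef(int& x) { x += 1; }
void incrementByPtr(int* x) { *x += 1; }

// Hypothetical demo: the call sites differ in syntax only.
int demoIncrement() {
    int value = 40;
    incrementByRef(value);   // the compiler passes the address implicitly
    incrementByPtr(&value);  // the programmer passes the address explicitly
    assert(value == 42);
    return value;
}
```

Compiling both functions with optimizations and comparing the disassembly usually shows identical instruction sequences.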

`sizeof`: Where Types Touch Reality

The sizeof operator is one of the few places where C++ exposes its connection to the machine:

sizeof(char); // always 1
sizeof(int);  // usually 4
sizeof(void*); // 8 on x86-64

Here, types collapse into byte counts.

That number dictates:

Stack space allocation
Memory access width
Instruction selection

The compiler does not care that something is an int.

It only cares that it occupies 4 bytes and requires 4-byte alignment.
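A minimal sketch of how size and alignment can be checked at compile time. The helper bytesForInts is a hypothetical name; the concrete values in the final comment are typical for x86-64 and are an assumption, not a guarantee.

```cpp
#include <cstddef>

// Facts the language guarantees on every conforming implementation.
static_assert(sizeof(char) == 1, "char is the unit of sizeof");
static_assert(sizeof(int) % alignof(int) == 0, "size is a multiple of alignment");

// Hypothetical helper: bytes an array of `count` ints occupies.
constexpr std::size_t bytesForInts(std::size_t count) {
    return count * sizeof(int);
}

// Typical x86-64 values (an assumption, not a guarantee):
// sizeof(int) == 4, alignof(int) == 4, sizeof(void*) == 8.
```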

Alignment and Padding: The Hidden Layout

Now consider this structure:

struct S {
    char a;
    int b;
};

At first glance, it seems trivial:

a → 1 byte
b → 4 bytes
Total → 5 bytes

But on most modern architectures:

sizeof(S) == 8

Why?

Because CPUs prefer aligned memory access.

An int wants to start at an address divisible by 4.

The compiler inserts padding to satisfy this requirement.

The actual memory layout looks like this:

Offset 0: char a     (1 byte)
Offset 1–3: padding  (3 bytes)
Offset 4–7: int b    (4 bytes)

This padding:

Is never declared in the C++ source (it shows up only indirectly, through sizeof and offsetof)
Is never named
Exists purely for performance and correctness

Padding is one of the first things reverse engineers must reconstruct manually.
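The layout above can be measured from inside the language with offsetof. This is a sketch: paddingBeforeB and tailPadding are hypothetical helper names, and the exact numbers depend on the platform (on x86-64, paddingBeforeB() is typically 3).

```cpp
#include <cstddef>

struct S {
    char a;
    int b;
};

// Hypothetical helper: bytes of padding the compiler placed between a and b.
constexpr std::size_t paddingBeforeB() {
    return offsetof(S, b) - (offsetof(S, a) + sizeof(char));
}

// Hypothetical helper: bytes of padding after b, if any.
constexpr std::size_t tailPadding() {
    return sizeof(S) - (offsetof(S, b) + sizeof(int));
}

// Portable guarantees: b is suitably aligned, and every byte of S is
// either a member or padding.
static_assert(offsetof(S, b) % alignof(int) == 0, "b is aligned for int");
static_assert(sizeof(S) == sizeof(char) + paddingBeforeB() + sizeof(int) + tailPadding(),
              "members plus padding account for the whole struct");
```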

Stack vs Heap: Where Layout Becomes Critical

When a structure is allocated on the stack:

void f() {
    S s;
}

The compiler generates a fixed layout relative to the stack frame:

[rbp - 8]  → s.a
[rbp - 4]  → s.b

The offsets encode all type information implicitly.
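The fixed relationship between the fields can be observed without reading assembly. The sketch below uses a hypothetical helper, fieldDistanceOnStack, to measure the byte distance between the two members of a local S. The frame address varies from call to call, but the distance always equals offsetof(S, b).

```cpp
#include <cassert>
#include <cstddef>

struct S {
    char a;
    int b;
};

// Hypothetical helper: distance in bytes between the two fields of a local S.
std::ptrdiff_t fieldDistanceOnStack() {
    S s{'A', 1};
    const char* addrA = reinterpret_cast<const char*>(&s.a);
    const char* addrB = reinterpret_cast<const char*>(&s.b);
    assert(addrB > addrA);  // members are laid out in declaration order
    return addrB - addrA;
}
```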

On the heap:

S* p = new S;

The memory allocator returns raw bytes.

There is no runtime metadata saying “this is an S”.

The only thing that makes it an S is how the program accesses it.

This distinction is fundamental in reverse engineering:

heap objects must be inferred entirely from access patterns.
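To illustrate, the sketch below reads a field of a heap object using nothing but a base address and an offset, which is roughly what compiled code does. The names readIntAt and demoHeapObject are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct S {
    char a;
    int b;
};

// Hypothetical helper: read an int from raw bytes at a given offset,
// the way compiled code does once the type has been erased.
int readIntAt(const unsigned char* base, std::size_t offset) {
    int value = 0;
    std::memcpy(&value, base + offset, sizeof value);
    return value;
}

int demoHeapObject() {
    S* p = new S{'A', 42};
    // View the same allocation as plain bytes; no tag says "this is an S".
    const unsigned char* raw = reinterpret_cast<const unsigned char*>(p);
    int viaOffset = readIntAt(raw, offsetof(S, b));
    assert(viaOffset == p->b);
    delete p;
    return viaOffset;
}
```

A reverse engineer sees only the equivalent of readIntAt: a load of sizeof(int) bytes at a constant offset from a pointer.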

When Symbols and Types Disappear

Understanding Binaries Through Architecture

At the source-code level, software feels descriptive. Variables have names, functions have signatures, and types explain intent. This richness gives the illusion that the compiled program somehow “contains” this information.

It does not.

Once compilation finishes—especially when symbols are stripped—the executable becomes a pure architectural artifact. Understanding this transition is essential for low-level programming, debugging, performance work, and reverse engineering.

What Are Symbols, Really?

Symbols Exist for Humans, Not for CPUs

Let’s start with the simplest possible function:

int add(int a, int b) {
    return a + b;
}

At the source level, this function is rich with meaning:

The function is called add
It takes two parameters named a and b
Each parameter has type int
The return type is int

During compilation, all of this information is placed into symbol tables. These tables allow the compiler and linker to reason about your program. The generated code (here for x86-64 with the System V calling convention) is just:

add:
    mov eax, edi
    add eax, esi
    ret

But once code generation is complete, the CPU does not care about _any_ of this. Every name has been reduced to a number:

Variable name → memory address
Function name → entry point address
Struct field name → offset within an object

During compilation, the compiler maintains symbol tables to:

Resolve references
Check types
Generate correct code

These symbols are compiler scaffolding, not runtime necessities.
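The mapping from function name to entry point address is visible in C++ itself through function pointers. In this sketch, BinaryOp, callThroughAddress, and demoEntryPoint are hypothetical names; add is the function from the example above.

```cpp
#include <cassert>

int add(int a, int b) { return a + b; }

// A function pointer holds the entry point; the name `add` is not needed
// once the address is known.
using BinaryOp = int (*)(int, int);

int callThroughAddress(BinaryOp entry, int lhs, int rhs) {
    return entry(lhs, rhs);  // indirect call through the stored address
}

int demoEntryPoint() {
    BinaryOp entry = &add;
    int result = callThroughAddress(entry, 2, 3);
    assert(result == 5);
    return result;
}
```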

Symbol Tables: Temporary by Design

During compilation and linking:

The compiler knows variable names
The linker resolves symbol addresses
Debug information may be generated

After linking:

The program no longer needs names
The CPU cannot use names
Addresses are sufficient

That is why symbol tables are:

Optional
Removable
Frequently stripped in production builds

Stripped Binaries: What Gets Removed

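In a stripped binary, the add function from earlier might appear as nothing more than an address followed by instructions:

00401120: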
    mov eax, edi
    add eax, esi
    ret

When a binary is stripped:

Function names are removed
Variable names are removed
Type information is removed
Debug metadata is removed

What remains:

Machine instructions
Absolute and relative addresses
Constants
Raw data sections

This is not obfuscation.

This is how machines work.

Symbols are optional metadata, not executable information.

What a Binary Actually Contains

A compiled binary (an ELF file, for example) typically consists of sections such as:

.text → executable instructions
.data → initialized global data
.bss → zero-initialized data
.rodata → constants
Relocation tables (sometimes)
Optional symbol/debug sections

Apart from the optional debug sections, none of these encode:

struct definitions
Variable names
C++ type hierarchies (beyond what RTTI and vtables require)
Templates
References

Those exist only before code generation.
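As a rough guide, the sketch below annotates a few definitions with the section where GCC or Clang usually place them on an ELF target. The placements are an assumption about typical toolchain behavior (the C++ standard says nothing about section names), and all identifiers are hypothetical.

```cpp
#include <cassert>

// Typical placement by GCC or Clang on an ELF target (an assumption).
int initializedCounter = 7;        // usually .data
int zeroedCounter;                 // usually .bss (zero-initialized)
const char greeting[] = "hello";   // usually .rodata

// The machine code of these functions usually lands in .text.
int sumOfGlobals() {
    return initializedCounter + zeroedCounter;
}

char firstLetter() {
    return greeting[0];
}

// Static storage is zero-initialized before main runs.
void checkZeroed() {
    assert(zeroedCounter == 0);
}
```

On Linux, a tool such as objdump or readelf can be used to check where each symbol actually ended up.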

Why Types Cannot Exist at Runtime

A CPU executes instructions like:

mov eax, [rbp-4]
add eax, 1
mov [rbp-4], eax

The CPU understands:

Registers
Memory addresses
Instruction widths

It does not understand:

int
char
struct
reference

If types were preserved:

Instructions would need type decoding
CPUs would become language-dependent
Every memory access would carry extra decoding cost

Instead, types are compiled away into:

Access size (byte, word, dword, qword)
Alignment guarantees
Instruction selection
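The reduction of a type to an access size can be sketched with a small template. Here loadAs and demoWidths are hypothetical names; in the generated code, the template argument typically survives only as the width of the load.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: load a T from raw bytes. The template argument plays
// the role of the type; after compilation it typically survives only as the
// width of the load (1, 2, 4, or 8 bytes).
template <typename T>
T loadAs(const unsigned char* bytes) {
    T value;
    std::memcpy(&value, bytes, sizeof(T));
    return value;
}

int demoWidths() {
    // All-ones bytes make the results independent of endianness.
    const unsigned char bytes[8] = {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF};
    assert(loadAs<std::uint8_t>(bytes) == 0xFFu);
    assert(loadAs<std::uint32_t>(bytes) == 0xFFFFFFFFu);
    assert(loadAs<std::uint64_t>(bytes) == 0xFFFFFFFFFFFFFFFFull);
    return 0;
}
```

The same eight bytes yield three different values purely because of how many of them each load consumes.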

Final Mental Model

C++ Concept	Binary Reality
Variable name	Memory offset
Type	Access width
Struct	Offset pattern
Reference	Address
Function	Entry point
Signature	ABI convention