CODE: C CPP Variables Types Reality

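One of the biggest mental traps when learning C++ is believing that types are real at runtime. C++ presents types as concrete entities (int, char, struct, references), but once the compiler finishes its job, almost all of that information is gone. Apart from opt-in features such as RTTI for polymorphic classes, nothing in the running program records what type a value has.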

What remains is memory, addresses, and instructions.

Understanding this transition—from language-level meaning to machine-level execution—is the foundation for systems programming, performance engineering, debugging, and reverse engineering.

The Illusion of Types in C++

In C++, everything starts with types. Variables are declared with explicit intent:

cpp
int x = 10;
char c = 'A';

struct S {
    char a;
    int b;
};

From a programmer’s perspective:

  • int represents a number
  • char represents a character
  • struct groups related data

This mental model is correct for reasoning, but incorrect for execution.

At runtime, the CPU does not know what an int or char is.

It only knows how many bytes to move and where to move them.

Types exist only to help the compiler generate correct instructions.
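To make the point concrete, here is a minimal sketch that sets the type aside and looks at an object as plain bytes. The helper names bytes_of, from_bytes, and check_round_trip are hypothetical, chosen only for this illustration, and the byte order you observe depends on the platform.

```cpp
#include <array>
#include <cassert>
#include <cstring>
#include <type_traits>

// Copy the raw bytes of an object into an array.
// Once the type is set aside, these bytes are all that is left of the value.
template <typename T>
std::array<unsigned char, sizeof(T)> bytes_of(const T& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw copy requires a trivially copyable type");
    std::array<unsigned char, sizeof(T)> raw{};
    std::memcpy(raw.data(), &value, sizeof(T));
    return raw;
}

// Rebuild an object from raw bytes; the type only says how many bytes to read.
template <typename T>
T from_bytes(const std::array<unsigned char, sizeof(T)>& raw) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw copy requires a trivially copyable type");
    T value;
    std::memcpy(&value, raw.data(), sizeof(T));
    return value;
}

// Hypothetical self-check: an int and its bytes carry the same information.
void check_round_trip() {
    const int x = 10;
    const auto raw = bytes_of(x);       // sizeof(int) anonymous bytes
    assert(from_bytes<int>(raw) == x);  // the same bytes, read back as an int
}
```

Nothing in the array says "int". The interpretation comes entirely from the code that reads it back.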


Value vs Reference: Semantics vs Reality

Consider two functions:

cpp
void byValue(int x);
void byRef(int& x);

In C++ teaching:

  • Passing by value → creates a copy
  • Passing by reference → creates an alias

At the machine level:

  • Both are just values
  • One value happens to be an address

A reference is not a special runtime construct. When it needs storage at all, mainstream compilers implement it as a pointer that is subject to stricter language rules: it cannot be null and cannot be reseated. Once compiled, the distinction disappears entirely; what remains is register usage and memory access.

This is why reverse engineers cannot distinguish references from pointers without context.
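A small sketch of the same operation written three ways shows how close the reference and pointer forms are. The function names are hypothetical. The claim that incRef and incPtr compile to the same instructions holds for mainstream compilers when the calls are not inlined, but the C++ standard does not require it, so treat that as an assumption to verify in a disassembler.

```cpp
#include <cassert>

// Reference version: the caller writes incRef(n).
void incRef(int& x) { x += 1; }

// Pointer version: the caller writes incPtr(&n).
void incPtr(int* x) { *x += 1; }

// Value version: receives a copy, so the caller's variable is untouched.
void incVal(int x) { x += 1; (void)x; }

// Hypothetical demonstration of the three calling styles.
int demo_passing() {
    int n = 0;
    incVal(n);   // n is still 0
    incRef(n);   // n becomes 1
    incPtr(&n);  // n becomes 2
    assert(n == 2);
    return n;
}
```

In both incRef and incPtr the callee receives an address and writes through it. Only the source-level syntax at the call site differs.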


`sizeof`: Where Types Touch Reality

The sizeof operator is one of the few places where C++ exposes its connection to the machine:

cpp
sizeof(char); // always 1
sizeof(int);  // usually 4
sizeof(void*); // 8 on x86-64

Here, types collapse into byte counts.

That number dictates:

  • Stack space allocation
  • Memory access width
  • Instruction selection

When it lays out memory, the compiler does not care that something is an int.

It only cares that it occupies 4 bytes and requires 4-byte alignment.
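The two numbers a type reduces to, size and alignment, can be queried directly. The sketch below uses a hypothetical Layout struct and layout_of helper. Only the static_assert lines state guarantees from the standard; every other value is platform-specific.

```cpp
#include <cassert>
#include <cstddef>

// Size and alignment: the two numbers a type reduces to for layout purposes.
struct Layout {
    std::size_t size;
    std::size_t align;
};

template <typename T>
constexpr Layout layout_of() {
    return Layout{sizeof(T), alignof(T)};
}

// Guaranteed by the standard on every platform.
static_assert(sizeof(char) == 1, "char is always one byte");
// Guaranteed ordering; the exact values are platform-specific.
static_assert(sizeof(int) >= sizeof(short), "int is at least as wide as short");

// Hypothetical self-check: an object's size is a multiple of its alignment,
// which is what lets arrays of it be packed with no gaps between elements.
void check_layout_invariants() {
    assert(layout_of<int>().size % layout_of<int>().align == 0);
    assert(layout_of<void*>().size % layout_of<void*>().align == 0);
}
```

Calling layout_of<int>() on a typical x86-64 system yields a size of 4 and an alignment of 4, the two values the paragraph above refers to.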


Alignment and Padding: The Hidden Layout

Now consider this structure:

cpp
struct S {
    char a;
    int b;
};

At first glance, it seems trivial:

  • a → 1 byte
  • b → 4 bytes
  • Total → 5 bytes

But on most modern architectures:

sizeof(S) == 8

Why?

Because CPUs prefer aligned memory access.

An int wants to start at an address divisible by 4.

The compiler inserts padding to satisfy this requirement.

The actual memory layout looks like this:

map
Offset 0: char a     (1 byte)
Offset 1–3: padding  (3 bytes)
Offset 4–7: int b    (4 bytes)

This padding:

  • Never appears in the source code
  • Is observable only indirectly, through sizeof and offsetof
  • Exists purely for performance and correctness

Padding is one of the first things reverse engineers must reconstruct manually.
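Although padding has no name, its size can be computed from offsetof. The following sketch reuses the struct S from above. The helpers padding_before_b and check_struct_layout are hypothetical names for this illustration.

```cpp
#include <cassert>
#include <cstddef>  // offsetof, std::size_t

struct S {
    char a;
    int b;
};

// Bytes of padding between the end of `a` and the start of `b`.
constexpr std::size_t padding_before_b() {
    return offsetof(S, b) - (offsetof(S, a) + sizeof(char));
}

// Hypothetical self-check of the properties that hold on every platform.
void check_struct_layout() {
    assert(offsetof(S, a) == 0);                      // first member starts the object
    assert(offsetof(S, b) % alignof(int) == 0);       // b is aligned for int
    assert(sizeof(S) >= offsetof(S, b) + sizeof(int)); // the struct covers both fields
}
```

On a typical x86-64 ABI, padding_before_b() evaluates to 3 and sizeof(S) to 8, matching the layout map above. Other platforms may give different numbers.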


Stack vs Heap: Where Layout Becomes Critical

When a structure is allocated on the stack:

cpp
void f() {
    S s;
}

In an unoptimized x86-64 build, the compiler typically assigns a fixed layout relative to the stack frame, for example:

map
[rbp - 8]  → s.a
[rbp - 4]  → s.b

The offsets encode all type information implicitly.

On the heap:

cpp
S* p = new S;

The memory allocator returns raw bytes.

For a plain struct like S, there is no runtime metadata saying “this is an S”.

The only thing that makes it an S is how the program accesses it.

This distinction is fundamental in reverse engineering:

heap objects must be inferred entirely from access patterns.
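The access pattern a reverse engineer sees can be imitated in source: a base address, a constant offset, and an access width. This is a sketch only. read_int_at and demo_heap_access are hypothetical helpers, and std::memcpy is used so the raw read stays well-defined C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct S {
    char a;
    int b;
};

// Read an int located `offset` bytes past `base`.
// This mirrors compiled code: base address + constant offset + access width.
int read_int_at(const void* base, std::size_t offset) {
    int value;
    std::memcpy(&value, static_cast<const unsigned char*>(base) + offset, sizeof(int));
    return value;
}

// Hypothetical demonstration: the heap block is "an S" only because of how it is read.
int demo_heap_access() {
    S* p = new S{'A', 42};
    // No field name involved: only an address and an offset.
    const int b = read_int_at(p, offsetof(S, b));
    assert(b == p->b);
    delete p;
    return b;
}
```

The expression p->b and the call read_int_at(p, offsetof(S, b)) fetch the same four bytes. The field name exists only in the first form.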


When Symbols and Types Disappear

Understanding Binaries Through Architecture

At the source-code level, software feels descriptive. Variables have names, functions have signatures, and types explain intent. This richness gives the illusion that the compiled program somehow “contains” this information.

It does not.

Once compilation finishes—especially when symbols are stripped—the executable becomes a pure architectural artifact. Understanding this transition is essential for low-level programming, debugging, performance work, and reverse engineering.


What Are Symbols, Really?

Symbols Exist for Humans, Not for CPUs

Let’s start with the simplest possible function:

cpp
int add(int a, int b) {
    return a + b;
}

At the source level, this function is rich with meaning:

  • The function is called add
  • It takes two parameters named a and b
  • Each parameter has type int
  • The return type is int

During compilation, all of this information is recorded in symbol tables, which let the compiler and linker reason about your program. On x86-64 with the System V calling convention, the function might compile to something like:

asm
add:
    mov eax, edi
    add eax, esi
    ret

Once code generation is complete, the CPU does not care about _any_ of this. Every name has been reduced to a number:

  • Variable name → memory address
  • Function name → entry point address
  • Struct field name → offset within an object

During compilation, the compiler maintains symbol tables to:

  • Resolve references
  • Check types
  • Generate correct code

These symbols are compiler scaffolding, not runtime necessities.
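The idea that a function name is only a label for an entry point can be sketched with a function pointer. BinaryOp, call_entry_point, and demo_entry_point are hypothetical names introduced for this example.

```cpp
#include <cassert>

int add(int a, int b) {
    return a + b;
}

// A function pointer holds the entry point; the name `add` is not part of it.
using BinaryOp = int (*)(int, int);

// Calls whatever code lives at `entry`, knowing only the signature,
// which is the source-level stand-in for the calling convention.
int call_entry_point(BinaryOp entry, int lhs, int rhs) {
    return entry(lhs, rhs);
}

// Hypothetical demonstration: calling through the address gives the same result.
int demo_entry_point() {
    const BinaryOp entry = &add;
    assert(call_entry_point(entry, 2, 3) == add(2, 3));
    return call_entry_point(entry, 2, 3);
}
```

Inside call_entry_point there is no way to recover the name add from entry. The caller only needs an address and an agreement about how arguments are passed.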


Symbol Tables: Temporary by Design

During compilation and linking:

  • The compiler knows variable names
  • The linker resolves symbol addresses
  • Debug information may be generated

After linking:

  • The program no longer needs names
  • The CPU cannot use names
  • Addresses are sufficient

That is why symbol tables are:

  • Optional
  • Removable
  • Frequently stripped in production builds

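Stripped Binaries: What Gets Removed

After stripping, a disassembler shows the same add function with only an address where its name used to be: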

asm
00401120:
    mov eax, edi
    add eax, esi
    ret

When a binary is stripped:

  • Function names are removed (except dynamic symbols the loader still needs for imports and exports)
  • Variable names are removed
  • Type information is removed
  • Debug metadata is removed

What remains:

  • Machine instructions
  • Absolute and relative addresses
  • Constants
  • Raw data sections

This is not obfuscation.

This is how machines work.

Symbols are optional metadata, not executable information.

What a Binary Actually Contains

A compiled binary typically consists of the following sections:

  • .text → executable instructions
  • .data → initialized global data
  • .bss → zero-initialized data
  • .rodata → constants
  • Relocation tables (sometimes)
  • Optional symbol/debug sections

Outside the optional debug sections, none of these encode:

  • struct definitions
  • Variable names
  • C++ type hierarchies (beyond the RTTI records emitted for polymorphic classes)
  • Templates
  • References

Those exist only before code generation.
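As a rough guide to how source-level objects map onto the sections listed above, consider the sketch below. The variable names are hypothetical, and the placements in the comments describe what ELF toolchains such as GCC and Clang on Linux typically do. They are not guaranteed by the language, and an optimizer may remove or merge objects.

```cpp
#include <cassert>

int initialized_counter = 5;      // typically placed in .data
int zeroed_counter;               // typically placed in .bss (no bytes stored in the file)
const char greeting[] = "hello";  // typically placed in .rodata

// The compiled body of this function typically goes to .text.
int sum_globals() {
    return initialized_counter + zeroed_counter + greeting[0];
}

// Hypothetical self-check: objects with static storage start out zeroed
// unless an initializer says otherwise.
void check_initial_state() {
    assert(zeroed_counter == 0);
    assert(initialized_counter == 5);
}
```

On a Linux system, tools such as nm or objdump can confirm where each object ended up in an unstripped build.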


Why Types Cannot Exist at Runtime

A CPU executes instructions like:

asm
mov eax, [rbp-4]
add eax, 1
mov [rbp-4], eax

The CPU understands:

  • Registers
  • Memory addresses
  • Instruction widths

It does not understand:

  • int
  • char
  • struct
  • reference

If types were preserved:

  • Instructions would need type decoding
  • CPUs would become language-dependent
  • Performance would suffer from the extra decoding work

Instead, types are compiled away into:

  • Access size (byte, word, dword, qword)
  • Alignment guarantees
  • Instruction selection
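The reduction of a type to an access width can be imitated with a single template. In this sketch, load and check_widths are hypothetical names. The buffer is filled with 0xFF so that the results do not depend on byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Load sizeof(T) bytes from `address`. The template argument plays the role
// the type plays for the compiler: it selects the access width, nothing more.
template <typename T>
T load(const void* address) {
    T value;
    std::memcpy(&value, address, sizeof(T));
    return value;
}

// Hypothetical self-check: one address, four different access widths.
void check_widths() {
    const unsigned char memory[8] = {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF};
    assert(load<std::uint8_t>(memory)  == 0xFFu);                  // byte
    assert(load<std::uint16_t>(memory) == 0xFFFFu);                // word
    assert(load<std::uint32_t>(memory) == 0xFFFFFFFFu);            // dword
    assert(load<std::uint64_t>(memory) == 0xFFFFFFFFFFFFFFFFull);  // qword
}
```

The memory is the same in all four cases. Only the number of bytes moved changes, which is the part of the type that survives into the instruction stream.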

Final Mental Model

| C++ Concept   | Binary Reality |
|---------------|----------------|
| Variable name | Memory offset  |
| Type          | Access width   |
| Struct        | Offset pattern |
| Reference     | Address        |
| Function      | Entry point    |
| Signature     | ABI convention |