CODE: C/C++ Binaries Speak Without Symbols or Types

At the source-code level, C and C++ feel expressive. We write variables with names, we define types, we organize data into structs and classes. The language gives us vocabulary, grammar, and abstraction. But none of that survives intact once the compiler finishes its job.

When a program becomes a binary, symbols disappear, types evaporate, and names die. What remains is not C++; it is architecture.

From this moment onward, the real language of the program is defined by:

  • The Instruction Set Architecture (ISA)
  • The Application Binary Interface (ABI)
  • Calling conventions
  • The memory model

To understand binaries, crashes, exploits, or stripped executables, you must stop thinking like a C++ programmer and start thinking like the machine. Understanding binaries is not about language syntax; it is about architectural contracts.

ISA: The Vocabulary of the Machine

The Instruction Set Architecture defines _what the CPU can say_. It is the machine’s vocabulary.

The ISA specifies:

  • Which instructions exist (mov, add, call, cmp, …)
  • Operand sizes (8, 16, 32, 64 bits)
  • Addressing modes (register, immediate, base+offset)
  • Register names and roles

Consider this instruction:

```asm
mov DWORD PTR [rdi+4], eax
```

There are no types here. No int. No struct name. No variable identifier. Yet this single line communicates a surprising amount of information.

Architecturally, we know:

  • A memory write is occurring
  • The write width is 4 bytes
  • The destination address is a base register (rdi) plus a constant offset (4)
  • The source value comes from a 32-bit register (eax)

From behavior alone, a reverse engineer infers:

  • This memory location likely represents an int
  • It is likely a field inside a structure
  • The field is placed at offset +4
  • The structure is accessed through a pointer stored in rdi

None of this comes from C++. It comes from ISA semantics.

The CPU does not know what an int is. It only knows how many bytes you touched.
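To make "it only knows how many bytes you touched" concrete, here is a small C++ sketch. It models the memory behind rdi as a plain byte array, performs the same 4-byte store at offset 4, and then reads those bytes back under two different types. The names `RawMemory`, `store_dword`, `load_as_int` and `load_as_float` are hypothetical, invented for this illustration, and the float view assumes a 4-byte IEEE 754 `float`, as on x86-64.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the memory rdi points at: just bytes, no types.
struct RawMemory {
    unsigned char bytes[8] = {};
};

// Mirrors `mov DWORD PTR [rdi+4], eax`: copy exactly 4 bytes to base + offset.
inline void store_dword(RawMemory& mem, std::size_t offset, std::uint32_t eax) {
    assert(offset + sizeof eax <= sizeof mem.bytes);
    std::memcpy(mem.bytes + offset, &eax, sizeof eax);
}

// Read the same 4 bytes back as a signed integer.
inline std::int32_t load_as_int(const RawMemory& mem, std::size_t offset) {
    std::int32_t v;
    std::memcpy(&v, mem.bytes + offset, sizeof v);
    return v;
}

// Read the same 4 bytes back as a float (assumes a 4-byte IEEE 754 float).
inline float load_as_float(const RawMemory& mem, std::size_t offset) {
    static_assert(sizeof(float) == 4, "sketch assumes a 4-byte float");
    float f;
    std::memcpy(&f, mem.bytes + offset, sizeof f);
    return f;
}
```

Storing `0x3F800000` at offset 4 and reading it back gives the integer 1065353216 through one view and `1.0f` through the other. The store itself was identical in both cases; only the later interpretation differs.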


ABI: The Grammar That Gives Meaning

If the ISA is the vocabulary, the ABI is the grammar. It defines _how separately compiled pieces of code agree on data and control flow across function and module boundaries_.

The Application Binary Interface specifies:

  • Stack layout
  • Alignment rules
  • How function arguments are passed
  • Where return values live
  • How structures are laid out in memory

On the x86-64 System V ABI, for example:

  • int → 4 bytes, aligned to 4 bytes
  • char → 1 byte, alignment 1
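  • Struct alignment = maximum alignment of its fields, and struct size is rounded up to a multiple of that alignment
  • The first six integer or pointer arguments go in rdi, rsi, rdx, rcx, r8, r9, in that order (floating-point arguments use xmm0 through xmm7)
  • Integer and pointer return values go in rax (floating-point return values go in xmm0)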
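The argument-register rule is mechanical enough to write down. The sketch below covers only scalar arguments to a non-variadic x86-64 System V function; structs, `long double` and `__int128` follow extra classification rules that it deliberately ignores. `ArgKind` and `sysv_arg_locations` are hypothetical names for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Scalar argument classes in this sketch. Integer covers ints and pointers;
// Float covers float and double.
enum class ArgKind { Integer, Float };

// Where each scalar argument lives on entry to a non-variadic function.
inline std::vector<std::string> sysv_arg_locations(const std::vector<ArgKind>& args) {
    static const char* const int_regs[] = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};
    std::size_t next_int = 0;    // next free integer register
    std::size_t next_sse = 0;    // next free xmm register
    std::size_t stack_slot = 0;  // next 8-byte stack slot
    std::vector<std::string> out;
    for (ArgKind k : args) {
        if (k == ArgKind::Integer && next_int < 6) {
            out.push_back(int_regs[next_int++]);
        } else if (k == ArgKind::Float && next_sse < 8) {
            out.push_back("xmm" + std::to_string(next_sse++));
        } else {
            // Overflow arguments go on the stack in argument order, 8 bytes
            // each; at function entry [rsp] holds the return address.
            out.push_back("[rsp+" + std::to_string(8 + 8 * stack_slot++) + "]");
        }
    }
    return out;
}
```

For a call such as `f(int a, double b, int c)` this yields rdi, xmm0 and rsi: integer and floating-point arguments draw from separate register sequences, which is why the third argument lands in rsi rather than rdx.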

When a reverse engineer reconstructs a struct, they are not “guessing C++.”

They are reconstructing ABI expectations.
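"Reconstructing ABI expectations" can be taken literally. The sketch below applies the System V-style layout rules (each field starts at the next offset that satisfies its alignment, the struct's alignment is the largest field alignment, and the total size is rounded up to a multiple of that alignment) to a list of field sizes and alignments. `FieldDesc`, `Layout` and `compute_layout` are hypothetical names; real ABIs add cases such as bit-fields and over-aligned types that this ignores.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A field described only by what the layout rules look at.
struct FieldDesc {
    std::size_t size;
    std::size_t align;
};

struct Layout {
    std::vector<std::size_t> offsets;  // offset of each field, in order
    std::size_t size = 0;              // total size including tail padding
    std::size_t align = 1;             // alignment of the whole struct
};

// Round n up to the next multiple of a (a > 0).
constexpr std::size_t round_up(std::size_t n, std::size_t a) {
    return (n + a - 1) / a * a;
}

inline Layout compute_layout(const std::vector<FieldDesc>& fields) {
    Layout out;
    std::size_t offset = 0;
    for (const FieldDesc& f : fields) {
        assert(f.align > 0);
        offset = round_up(offset, f.align);  // insert padding before the field
        out.offsets.push_back(offset);
        offset += f.size;
        if (f.align > out.align) out.align = f.align;
    }
    out.size = round_up(offset, out.align);  // tail padding
    return out;
}
```

For `{char, int}` this predicts offsets 0 and 4 with size 8, and for `{char, double, char}` offsets 0, 8 and 16 with size 24. On an x86-64 compiler those predictions can be compared against `offsetof` and `sizeof` for the corresponding real structs.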


Architecture Replaces Symbol Names

Consider this stripped assembly sequence:

```asm
mov eax, DWORD PTR [rdi+4]
mov BYTE PTR [rdi], al
```

There are no symbols. No type definitions. No debug info.

Yet the architecture tells us a great deal.

We observe:

  • [rdi] is accessed as 1 byte
  • [rdi+4] is accessed as 4 bytes
  • Both use the same base register (rdi)
  • The offsets are consistent
  • Offsets 1 through 3 are never touched

From this, we infer a likely memory layout:

```cpp
struct S {
    char a;      // offset 0
    char pad[3]; // offsets 1-3: padding the compiler inserts implicitly
    int b;       // offset 4
};
```

The padding was never written in the original source code, but the ABI forced it into existence.
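The compiler's side of this can be checked directly. In the sketch below, `S` is declared without any explicit pad member, and `static_assert` confirms that the compiler still places `b` at offset 4 (these checks assume a 4-byte `int` aligned to 4, as on x86-64 System V). The function `truncate_b_into_a` is a hypothetical source that an optimizing compiler may turn into the two stripped instructions shown earlier; the exact output depends on compiler and flags.

```cpp
#include <cassert>
#include <cstddef>

// The inferred layout, with no pad member: the compiler adds 3 bytes itself.
struct S {
    char a;  // offset 0, accessed as 1 byte
    int b;   // offset 4, accessed as 4 bytes
};

static_assert(offsetof(S, a) == 0, "a sits at offset 0");
static_assert(offsetof(S, b) == 4, "implicit padding pushes b to offset 4");
static_assert(sizeof(S) == 8, "1 byte + 3 padding + 4 bytes");
static_assert(alignof(S) == alignof(int), "struct takes its strictest field's alignment");

// With s in rdi, this can compile to: load 4 bytes from [rdi+4] into eax,
// then store the low byte (al) to [rdi].
inline void truncate_b_into_a(S* s) {
    s->a = static_cast<char>(s->b);
}
```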

This is critical:

**Padding is not a compiler accident. It is an architectural requirement.**

The architecture becomes the documentation.
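The reasoning in this section can be sketched as a mechanical procedure: sort the observed (offset, width) accesses against one base register, emit one field per distinct offset, and fill untouched gaps with padding. `Access`, `guess_type` and `infer_fields` are hypothetical names, and the width-to-type mapping is a heuristic. A 4-byte access could just as well be `unsigned`, `float`, or an enum.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One observed memory access relative to a base register such as rdi.
struct Access {
    std::size_t offset;
    std::size_t width;  // bytes touched
};

// Plausible scalar type for a width; empty string if there is no obvious one.
inline std::string guess_type(std::size_t width) {
    switch (width) {
        case 1: return "char";
        case 2: return "short";
        case 4: return "int";   // could equally be unsigned, float, or an enum
        case 8: return "long";  // 8 bytes on LP64; could also be a pointer
        default: return "";
    }
}

inline std::vector<std::string> infer_fields(std::vector<Access> accesses) {
    std::sort(accesses.begin(), accesses.end(),
              [](const Access& x, const Access& y) { return x.offset < y.offset; });
    std::vector<std::string> fields;
    std::size_t cursor = 0;  // first byte not yet covered by a field
    for (const Access& a : accesses) {
        if (a.offset < cursor) continue;  // already covered by an earlier field
        if (a.offset > cursor) {          // untouched bytes: treat as padding
            fields.push_back("char pad_" + std::to_string(cursor) + "[" +
                             std::to_string(a.offset - cursor) + "]");
        }
        const std::string name = "field_" + std::to_string(a.offset);
        const std::string type = guess_type(a.width);
        if (!type.empty()) {
            fields.push_back(type + " " + name);
        } else {  // unusual width: describe it as a raw byte array
            fields.push_back("char " + name + "[" + std::to_string(a.width) + "]");
        }
        cursor = a.offset + a.width;
    }
    return fields;
}
```

Fed the two accesses from the stripped sequence, `{4, 4}` and `{0, 1}`, it produces `char field_0`, `char pad_1[3]`, `int field_4`: the same layout as `struct S`, recovered from access widths and offsets alone.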


Why Debug Builds Feel “Magical”

When debug symbols are present, everything feels different.

You see:

  • Function names
  • Variable names
  • Struct definitions
  • Type information
  • Source line mappings

This can trick developers into thinking:

“The binary knows my types.”

It does not.

What’s really happening is that the debugger overlays metadata (for example DWARF sections in an ELF file, or a separate PDB file on Windows) onto raw instructions. The machine code itself still contains nothing but addresses, offsets, and opcodes.

Strip the symbols, and the illusion vanishes instantly.

The machine never knew your variable was called userCount.

It only knew you wrote 4 bytes at offset +12.
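The sketch below shows the same point in code. The hypothetical struct `Session` gives the name `userCount` to the 4 bytes at offset 12. A named write and an anonymous "4 bytes at +12" write (the hypothetical helper `write_dword_at`) leave the object in exactly the same state. The struct contains only 4-byte integers, so it has no padding and a byte-wise comparison is meaningful.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical struct: the name userCount exists only in source and debug info.
struct Session {
    std::int32_t id;         // offset 0
    std::int32_t flags;      // offset 4
    std::int32_t retries;    // offset 8
    std::int32_t userCount;  // offset 12
};
static_assert(offsetof(Session, userCount) == 12, "userCount lives at +12");

// What the programmer writes.
inline void set_user_count(Session& s, std::int32_t n) { s.userCount = n; }

// What the machine performs: base address + offset, 4-byte width, no name.
inline void write_dword_at(void* base, std::size_t offset, std::int32_t value) {
    std::memcpy(static_cast<unsigned char*>(base) + offset, &value, sizeof value);
}
```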


Architecture Thinking vs Language Thinking

At this level, your mental model must change completely.

| Language Thinking | Architecture Thinking |
|---|---|
| Variables have names | Memory has addresses |
| Types define behavior | Access width defines behavior |
| Structs group fields | Offsets imply structure |
| References are special | Everything is a value |
| Types enforce safety | ABI enforces correctness |

In binaries:

  • A reference is just a pointer value
  • A class is just memory plus functions (and a hidden vtable pointer if it has virtual functions)
  • Encapsulation does not exist
  • Access patterns define meaning
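These points can be checked in a few lines of C++. In the sketch below, the hypothetical class `Counter` has a private field, yet a plain function that knows only "4 bytes at offset 0" (`counter_add_raw`, also hypothetical) can update it. A reference parameter also behaves exactly like a pointer parameter. Touching the object's bytes this way relies on `Counter` being trivially copyable and having no virtual functions. It illustrates the binary view, not a recommended C++ practice.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical class for illustration.
class Counter {
public:
    explicit Counter(std::int32_t start) : value_(start) {}
    // Compiled as a function whose hidden first argument is `this`
    // (passed in rdi under System V).
    void add(std::int32_t d) { value_ += d; }
    std::int32_t get() const { return value_; }
private:
    std::int32_t value_;  // "private" is enforced by the compiler only
};

static_assert(std::is_standard_layout<Counter>::value, "value_ sits at offset 0");
static_assert(sizeof(Counter) == sizeof(std::int32_t), "no vtable pointer, no hidden data");

// The machine's view of Counter::add: a pointer, an offset, a width.
inline void counter_add_raw(void* self, std::int32_t d) {
    std::int32_t v;
    std::memcpy(&v, self, sizeof v);  // load 4 bytes at offset 0
    v += d;
    std::memcpy(self, &v, sizeof v);  // store 4 bytes at offset 0
}

// Typically identical machine code: both receive an address in rdi.
inline void inc_by_ref(std::int32_t& x) { ++x; }
inline void inc_by_ptr(std::int32_t* x) { ++*x; }
```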

If you reason about binaries using language thinking, you will be confused.

If you reason using architectural thinking, patterns emerge immediately.