If you're an embedded engineer, especially one targeting Cortex-M, ESP32, or even small AVR systems, bare metal is not just a programming style; it's a system philosophy:
You are the operating system.
That means:
- You control memory.
- You control timing.
- You control interrupts.
- You control power.
- You control failure modes.
And if something crashes… it’s because you designed it that way.
Let's build up this idea step by step.
What is Bare Metal Development?
Bare metal programming means running software directly on hardware without an operating system.
There is:
- No scheduler
- No process isolation
- No kernel services
- No file system (unless you build one)
- No memory protection (unless the hardware provides an MPU and you configure it)
The CPU resets and starts executing your code from the reset vector.
+--------------------+
| Power On Reset |
+--------------------+
|
v
+--------------------+
| Vector Table |
| Reset Handler |
+--------------------+
|
v
+--------------------+
| Startup Code |
+--------------------+
|
v
+--------------------+
| main() |
+--------------------+
|
v
+--------------------+
| Infinite Loop |
+--------------------+
Bare metal gives you:
- Deterministic timing
- Minimal latency
- Small memory footprint
- Full hardware control
But it also requires:
- Manual interrupt handling
- Manual memory layout design
- Manual peripheral configuration
- Manual stack management
- Manual fault handling
This is why bare metal is used in:
- Motor control
- Automotive ECUs
- Medical devices
- Aerospace controllers
- Ultra-low power IoT nodes
- Bootloaders
- Safety-critical systems
Choosing the Hardware Platform
The first architectural decision is choosing the MCU family.
Common platforms include:
ARM Cortex-M
- Used in STM32, NXP, Nordic, etc.
- 32-bit
- NVIC interrupt controller
- Optional MPU
- Hardware debug via SWD/JTAG
AVR (Arduino-class)
- 8-bit
- Very simple architecture
- Great for learning fundamentals
ESP32
- Dual-core Xtensa (newer variants such as the ESP32-C series use RISC-V)
- Wi-Fi + Bluetooth integrated
- Can run bare metal or FreeRTOS
From a system perspective, selection affects:
- Clock architecture
- Interrupt latency
- Memory map
- Peripheral richness
- Debug interface
- Security features
Setting Up the Toolchain
Bare metal requires a cross-compilation environment because the machine you compile on (typically x86-64) is not the machine you run on (ARM, AVR, Xtensa).
Core components:
Compiler
- GCC (arm-none-eabi-gcc)
- Clang
- Vendor-specific toolchains
Assembler
- Converts startup assembly into object code
Linker
- Combines object files into the final ELF image
- Uses a linker script to map memory
Debugger
- GDB
- OpenOCD
- ST-Link tools
- J-Link tools
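The build flow looks like this (the final ELF-to-binary step is typically done with objcopy, e.g. arm-none-eabi-objcopy -O binary):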
.c/.s files
|
v
+-----------+
| Compiler |
+-----------+
|
v
+-----------+
| Objects |
+-----------+
|
v
+-----------+
| Linker |
+-----------+
|
v
+-----------+
| ELF |
+-----------+
|
v
+-----------+
| Binary |
+-----------+
Startup Code — The Invisible Foundation
Before main() runs, the system must be initialized.
Startup code typically:
- Sets the stack pointer
- Copies .data from Flash to RAM
- Clears .bss
- Initializes the system clock
- Sets up the vector table
- Jumps to main()
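The .data copy and .bss clear can be sketched in C as below. This is a minimal illustration, not any vendor's startup file: `init_sections()` is a hypothetical helper, and the linker symbol names in the comment (`_sidata`, `_sdata`, `_edata`, `_sbss`, `_ebss`) are assumptions modeled on common GNU linker scripts, so check what yours actually defines. Note that this code must not rely on initialized globals itself, since those are exactly what it sets up.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy the .data load image (stored in Flash) to its run address in RAM,
 * then zero .bss. Written as a plain function so it can be exercised on a
 * host; on a real target the reset handler passes linker-script symbols.
 */
void init_sections(const uint32_t *data_load,
                   uint32_t *data_start, uint32_t *data_end,
                   uint32_t *bss_start, uint32_t *bss_end)
{
    /* .data: word-by-word copy from its Flash load address to RAM */
    while (data_start < data_end) {
        *data_start++ = *data_load++;
    }
    /* .bss: C requires zero-initialized statics, so clear the region */
    while (bss_start < bss_end) {
        *bss_start++ = 0u;
    }
}

/*
 * On target (not compiled here), a reset handler might call it like this.
 * The symbol names are assumptions and must match your linker script:
 *
 *   extern uint32_t _sidata, _sdata, _edata, _sbss, _ebss;
 *   int main(void);
 *
 *   void Reset_Handler(void)
 *   {
 *       init_sections(&_sidata, &_sdata, &_edata, &_sbss, &_ebss);
 *       SystemInit();   // clock setup, if your CMSIS vendor files provide it
 *       main();
 *       for (;;) {}     // main() should never return
 *   }
 */
```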
For ARM Cortex-M:
Vector Table:
0x00000000 -> Initial Stack Pointer
0x00000004 -> Reset Handler
0x00000008 -> NMI Handler
0x0000000C -> HardFault Handler
...
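This file is often written in assembly because it runs before the C runtime exists. On Cortex-M, however, the hardware itself loads the initial stack pointer from the vector table, so the reset handler can also be written in C.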
Without correct startup code:
- Interrupts won’t work
- Global variables will be corrupted (initialized globals hold garbage, .bss variables don't start at zero)
- The system may HardFault immediately
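Only the position of each entry is fixed by the architecture: entry n is a 32-bit word at byte offset 4 * n from the table base. Below is a sketch of the same table written in C instead of assembly. The target-only part is guarded by the ACLE macro `__ARM_ARCH_PROFILE` so the file still compiles on a PC, and the `_estack` symbol, the `.isr_vector` section name, and the handler names are assumptions that must match your linker script and startup code.

```c
#include <stdint.h>

/* Architecturally fixed positions of the first Cortex-M vector table entries. */
enum {
    VEC_INITIAL_SP = 0,   /* word 0: initial Main Stack Pointer value */
    VEC_RESET      = 1,   /* word 1: Reset_Handler address            */
    VEC_NMI        = 2,
    VEC_HARDFAULT  = 3,
};

/* Every entry is one 32-bit word, so entry n sits at byte offset 4 * n. */
static inline uint32_t vector_offset(uint32_t n)
{
    return n * 4u;
}

#if defined(__ARM_ARCH_PROFILE) && (__ARM_ARCH_PROFILE == 'M')
/* Target-only sketch: names below are assumptions tied to your linker script. */
extern uint32_t _estack;
void Reset_Handler(void);
void NMI_Handler(void);
void HardFault_Handler(void);

typedef void (*vector_t)(void);

__attribute__((section(".isr_vector"), used))
const vector_t vector_table[] = {
    (vector_t)&_estack,   /* loaded into SP by hardware on reset */
    Reset_Handler,        /* loaded into PC by hardware on reset */
    NMI_Handler,
    HardFault_Handler,
    /* ... remaining system exceptions and device IRQ handlers ... */
};
#endif
```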
Memory Layout and the Linker Script
The linker script defines how Flash and RAM are used.
A typical Cortex-M layout (the Flash origin is STM32-style; 0x20000000 is the standard Cortex-M SRAM region):
MEMORY
{
  FLASH (rx)  : ORIGIN = 0x08000000, LENGTH = 512K
  RAM   (rwx) : ORIGIN = 0x20000000, LENGTH = 128K
}
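Sections:
- .text → code (Flash)
- .rodata → constants (Flash)
- .data → initialized variables (run from RAM; initial values stored in Flash and copied at startup)
- .bss → zero-initialized variables (RAM)
- .stack → stack reserve (RAM)
- .heap → heap reserve (RAM)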
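The arithmetic behind these regions is worth checking by hand, because an off-by-one here turns into a fault much later. The sketch below (the struct and helper names are hypothetical) encodes the FLASH and RAM regions listed above and derives the end of RAM, which is a common choice for the initial stack pointer because Cortex-M uses a full-descending stack.

```c
#include <stdbool.h>
#include <stdint.h>

/* Regions taken from the FLASH/RAM layout (STM32-style addresses). */
typedef struct {
    uint32_t origin;
    uint32_t length;   /* bytes */
} mem_region_t;

static const mem_region_t FLASH_REGION = { 0x08000000u, 512u * 1024u };
static const mem_region_t RAM_REGION   = { 0x20000000u, 128u * 1024u };

/* One past the last valid byte, e.g. 0x20000000 + 0x20000 = 0x20020000. */
static inline uint32_t region_end(mem_region_t r)
{
    return r.origin + r.length;
}

static inline bool region_contains(mem_region_t r, uint32_t addr)
{
    return addr >= r.origin && addr < region_end(r);
}

/* A push first decrements SP and then stores, so starting SP at the end of
   RAM makes the first pushed word land at end - 4, the last valid word. */
static inline uint32_t initial_stack_pointer(void)
{
    return region_end(RAM_REGION);
}
```

The same helpers are handy for sanity checks, for example asserting in a bootloader that a candidate application's reset vector points into Flash.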
You control:
- Where interrupt vectors go
- Stack size
- Heap size
- Bootloader offset
- Memory protection regions
In secure systems, this is critical.
Direct Hardware Register Programming
Now we reach the core of bare metal: register-level access.
Every peripheral is memory-mapped.
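Example using an STM32F4 (ARM Cortex-M4) and its CMSIS device header:
RCC->AHB1ENR |= (1U << 0);     // Enable GPIOA clock (bit 0 = GPIOAEN)
GPIOA->MODER &= ~(3U << 10);   // Clear PA5 mode bits [11:10]
GPIOA->MODER |= (1U << 10);    // PA5 as general-purpose output (01)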
What actually happens:
Memory Address 0x40023830 -> RCC_AHB1ENR
Memory Address 0x40020000 -> GPIOA base
CPU writes directly to those addresses.
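A CMSIS device header makes `GPIOA->MODER` work by overlaying a struct of `volatile` registers on the peripheral's base address. The sketch below reproduces that idea with a hypothetical, simplified `gpio_regs_t` whose field order follows the STM32F4 GPIO register map; `GPIOA_SKETCH` is an illustrative name, not the vendor macro. Only the address arithmetic is exercised, since dereferencing these addresses on a PC would crash.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified STM32F4-style GPIO register block (offsets in comments). */
typedef struct {
    volatile uint32_t MODER;    /* 0x00 mode: input/output/AF/analog */
    volatile uint32_t OTYPER;   /* 0x04 output type                  */
    volatile uint32_t OSPEEDR;  /* 0x08 output speed                 */
    volatile uint32_t PUPDR;    /* 0x0C pull-up / pull-down          */
    volatile uint32_t IDR;      /* 0x10 input data                   */
    volatile uint32_t ODR;      /* 0x14 output data                  */
    volatile uint32_t BSRR;     /* 0x18 bit set/reset                */
    volatile uint32_t LCKR;     /* 0x1C configuration lock           */
    volatile uint32_t AFR[2];   /* 0x20, 0x24 alternate function     */
} gpio_regs_t;

#define RCC_BASE_ADDR      0x40023800u
#define RCC_AHB1ENR_OFFSET 0x30u
#define GPIOA_BASE_ADDR    0x40020000u

/* Roughly what a vendor header does: a struct pointer at the base address.
   Only defined here, never dereferenced on the host. */
#define GPIOA_SKETCH ((gpio_regs_t *)GPIOA_BASE_ADDR)

/* Address of a register = peripheral base + register offset. */
static inline uint32_t reg_addr(uint32_t base, uint32_t offset)
{
    return base + offset;
}
```

So `GPIOA_SKETCH->ODR` on target would be a `volatile` 32-bit access at 0x40020000 + 0x14 = 0x40020014, and `volatile` is what stops the compiler from caching or dropping those accesses.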
Architecture view:
CPU
|
| AHB/APB buses
|
+------------------+
| RCC |
| GPIO |
| TIMERS |
| UART |
+------------------+
You must:
- Enable peripheral clock
- Configure mode
- Configure speed
- Configure pull-up/down
- Configure alternate function if needed
Miss one step and the peripheral silently misbehaves: for example, writes to a GPIO port whose clock is not enabled have no effect.
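Several of these steps write multi-bit fields, and a plain `|=` can only set bits, never clear them: a pin left in alternate-function mode (`10`) becomes analog mode (`11`) after `MODER |= (1 << 10)`. A read-modify-write helper avoids that trap. The helper and enum names below are illustrative, not a vendor API; the 2-bit encodings match STM32 MODER and PUPDR.

```c
#include <stdint.h>

/* Replace a 'width'-bit field at bit position 'pos' in a register value,
   leaving all other bits untouched. Assumes width < 32. */
static inline uint32_t field_set(uint32_t reg, unsigned pos,
                                 unsigned width, uint32_t value)
{
    uint32_t mask = ((1u << width) - 1u) << pos;
    return (reg & ~mask) | ((value << pos) & mask);
}

/* 2-bit encodings used by STM32 GPIO MODER / PUPDR registers. */
enum { GPIO_MODE_INPUT = 0, GPIO_MODE_OUTPUT = 1, GPIO_MODE_AF = 2, GPIO_MODE_ANALOG = 3 };
enum { GPIO_PULL_NONE = 0, GPIO_PULL_UP = 1, GPIO_PULL_DOWN = 2 };

/* Each pin owns two bits: pin n lives at bits [2n+1:2n]. */
static inline uint32_t moder_with_pin_mode(uint32_t moder, unsigned pin, uint32_t mode)
{
    return field_set(moder, pin * 2u, 2u, mode);
}
```

On target this would be used as `GPIOA->MODER = moder_with_pin_mode(GPIOA->MODER, 5u, GPIO_MODE_OUTPUT);`. The read and the write are separate bus accesses, so if an ISR also modifies the same register, wrap the update in a critical section.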
Writing Application Logic
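A bare metal application typically looks like this (often called a "super loop"):
void init(void);    // application-specific setup
void task1(void);   // each task must return quickly
void task2(void);
void task3(void);

int main(void)
{
    init();
    while (1)
    {
        task1();
        task2();
        task3();
    }
}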
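The hard-coded calls can be generalized into a table of tasks, each with its own period, which is a common way to grow a super loop without an RTOS. This is a minimal sketch: `task_t`, `scheduler_poll()` and the placeholder tasks are hypothetical, and `now_ms` is assumed to come from a millisecond tick counter.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    void (*run)(void);
    uint32_t period_ms;
    uint32_t last_run_ms;
} task_t;

/* Run every task whose period has elapsed. The unsigned subtraction keeps
   the comparison correct when the 32-bit tick counter wraps around. */
void scheduler_poll(task_t *tasks, size_t count, uint32_t now_ms)
{
    for (size_t i = 0; i < count; i++) {
        if ((uint32_t)(now_ms - tasks[i].last_run_ms) >= tasks[i].period_ms) {
            tasks[i].last_run_ms = now_ms;
            tasks[i].run();
        }
    }
}

/* Placeholder tasks that only count invocations. */
static unsigned task1_runs, task2_runs;
static void task1(void) { task1_runs++; }
static void task2(void) { task2_runs++; }
```

The main loop then becomes `while (1) { scheduler_poll(tasks, n, now()); }`. The scheduling is cooperative: one slow task delays every other task in the table.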
Or interrupt-driven:
Main Loop:
Sleep()
Interrupt:
Handle event
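In C, the interrupt-driven pattern usually becomes "sleep with WFI, then process flags set by ISRs". The sketch below assumes an EXTI line 0 interrupt (`EXTI0_IRQHandler` is the STM32 vector name, but the flag logic around it is hypothetical) and emits `cpsid`/`wfi`/`cpsie` only on M-profile targets; on a PC those macros compile to nothing so the logic can be tested.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__ARM_ARCH_PROFILE) && (__ARM_ARCH_PROFILE == 'M')
#define IRQ_DISABLE()  __asm volatile ("cpsid i" ::: "memory")
#define IRQ_ENABLE()   __asm volatile ("cpsie i" ::: "memory")
#define WAIT_FOR_IRQ() __asm volatile ("wfi" ::: "memory")
#else   /* host build: no interrupts, so these do nothing */
#define IRQ_DISABLE()  ((void)0)
#define IRQ_ENABLE()   ((void)0)
#define WAIT_FOR_IRQ() ((void)0)
#endif

static volatile bool button_event;   /* set by the ISR, consumed by main */
static uint32_t events_handled;

/* Hypothetical ISR: a real one must also clear the peripheral's pending flag. */
void EXTI0_IRQHandler(void)
{
    button_event = true;
}

/* One pass of "Sleep(), then handle event". */
void main_loop_iteration(void)
{
    IRQ_DISABLE();
    if (!button_event) {
        WAIT_FOR_IRQ();      /* wakes on a pending IRQ even while masked */
    }
    IRQ_ENABLE();            /* the pending ISR runs here */

    if (button_event) {
        button_event = false;
        events_handled++;    /* "Handle event" */
    }
}
```

Masking interrupts before the check closes the window where an event arrives after the test but before WFI, which would otherwise leave the CPU asleep until some later interrupt. If two events arrive before the main loop runs, the single flag merges them; use a counter or queue if every event matters.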
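Example LED blinking:
#include <stdint.h>
#include "stm32f4xx.h"                // Device-specific CMSIS header

void initLED(void) {
    RCC->AHB1ENR |= (1U << 0);        // Enable GPIOA clock
    (void)RCC->AHB1ENR;               // Read back: short delay before touching GPIOA
    GPIOA->MODER &= ~(3U << 10);      // Clear PA5 mode bits
    GPIOA->MODER |= (1U << 10);       // Set PA5 as output
}

// Crude busy-wait: duration depends on core clock and optimization level
void delay(volatile uint32_t count) {
    while (count--) {}
}

int main(void) {
    initLED();                        // Initialize LED (on PA5)
    while (1) {
        GPIOA->ODR ^= (1U << 5);      // Toggle LED
        delay(1000000);
    }
}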
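The main loop toggles bit 5 of the GPIOA output data register (ODR), then busy-waits.
This is a polling-based design: while delay() spins, the CPU can do nothing else.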
More advanced systems use:
- Timer interrupts
- External interrupts
- DMA transfers
- Event-driven design
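One common next step is replacing the busy-wait `delay()` with a timer interrupt. The sketch below assumes a 1 ms SysTick interrupt (on a CMSIS-based target, `SysTick_Config(SystemCoreClock / 1000u)` is the usual way to set that up); `g_ticks_ms`, `ticks_since()` and `blink_due()` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Millisecond counter advanced by the SysTick interrupt. Only this ISR
   writes it; aligned 32-bit reads from main are atomic on Cortex-M. */
volatile uint32_t g_ticks_ms;

void SysTick_Handler(void)
{
    g_ticks_ms++;
}

/* Wrap-safe: modular unsigned subtraction gives the true elapsed time as
   long as the interval is shorter than 2^32 ms (about 49.7 days). */
static inline uint32_t ticks_since(uint32_t start)
{
    return (uint32_t)(g_ticks_ms - start);
}

/* Non-blocking replacement for delay(): returns true once per period. */
bool blink_due(uint32_t *last_toggle, uint32_t period_ms)
{
    if (ticks_since(*last_toggle) >= period_ms) {
        *last_toggle += period_ms;   /* fixed cadence, no drift */
        return true;
    }
    return false;
}
```

In the main loop, `if (blink_due(&last, 500u)) GPIOA->ODR ^= (1U << 5);` toggles the LED every 500 ms while leaving the CPU free for other work, or for sleeping, between toggles.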
Interrupts — The Real Power of Bare Metal
Interrupts let the CPU respond to events with bounded latency instead of polling for them.
Flow:
Event occurs
|
v
Interrupt line triggered
|
v
NVIC prioritizes
|
v
ISR executes
|
v
Return to interrupted code (main or a lower-priority ISR)
Important concepts:
- Latency
- Interrupt nesting
- Priority grouping
- Critical sections
- Race conditions
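A concrete race: on a 32-bit Cortex-M, a 64-bit variable is accessed as two 32-bit halves, so main can read the low half, get interrupted by an ISR that carries into the high half, and then read a mismatched high half. The usual fix is a short critical section. This is a sketch under stated assumptions: the timer ISR name and its 1 ms period are hypothetical, and the PRIMASK save/restore is GCC inline assembly compiled only for M-profile targets.

```c
#include <stdint.h>

#if defined(__ARM_ARCH_PROFILE) && (__ARM_ARCH_PROFILE == 'M')
/* Save PRIMASK, then mask interrupts. Restoring the saved value (instead of
   unconditionally enabling) keeps nested critical sections correct. */
static inline uint32_t critical_enter(void)
{
    uint32_t primask;
    __asm volatile ("mrs %0, primask" : "=r"(primask) :: "memory");
    __asm volatile ("cpsid i" ::: "memory");
    return primask;
}
static inline void critical_exit(uint32_t primask)
{
    __asm volatile ("msr primask, %0" :: "r"(primask) : "memory");
}
#else   /* host build: nothing can interrupt us */
static inline uint32_t critical_enter(void) { return 0u; }
static inline void critical_exit(uint32_t primask) { (void)primask; }
#endif

/* 64-bit counter shared with an ISR: two 32-bit accesses on Cortex-M. */
static volatile uint64_t g_uptime_us;

/* Hypothetical 1 ms timer ISR. */
void TIM2_IRQHandler(void)
{
    g_uptime_us += 1000u;
}

uint64_t uptime_us(void)
{
    uint32_t state = critical_enter();
    uint64_t copy = g_uptime_us;   /* both halves read with IRQs masked */
    critical_exit(state);
    return copy;
}
```

Keep such sections to a few instructions: every cycle spent with interrupts masked adds directly to worst-case interrupt latency.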
Security engineers care about:
- Fault injection
- Interrupt flooding
- Stack overflow inside ISR
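One way to see how close a stack comes to overflowing (including the main stack, which Cortex-M ISRs use) is stack painting: fill the stack with a known pattern, run the system under worst-case load, then count how many words were never overwritten. The helpers and the `0xDEADBEEF` pattern below are illustrative assumptions. Painting only measures headroom after the fact; actually stopping an overflow needs an MPU guard region or, on ARMv8-M cores, the stack limit registers.

```c
#include <stddef.h>
#include <stdint.h>

/* Arbitrary fill pattern; any value unlikely to appear as real data works. */
#define STACK_PAINT 0xDEADBEEFu

/* Fill a stack region before it is used (for the main stack this is usually
   done early in the reset handler, below the current SP). */
void stack_paint(uint32_t *base, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        base[i] = STACK_PAINT;
    }
}

/* The stack grows down from base + words, so untouched words remain at the
   low end. Their count is the worst-case headroom observed so far. */
size_t stack_unused_words(const uint32_t *base, size_t words)
{
    size_t n = 0;
    while (n < words && base[n] == STACK_PAINT) {
        n++;
    }
    return n;
}
```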