ES: What Is Bare Metal Development?

**Bare metal development** involves writing software that runs directly on a microcontroller or processor without an operating system (OS). This approach offers full control over hardware resources but requires manual management of tasks typically handled by an OS, such as memory, I/O, and peripherals.

As an embedded engineer, especially if you're targeting Cortex-M, ESP32, or even small AVR systems, bare metal is not just a programming style — it’s a system philosophy:

You are the operating system.

That means:

  • You control memory.
  • You control timing.
  • You control interrupts.
  • You control power.
  • You control failure modes.

And if something crashes… it’s because you designed it that way.

Let’s build this idea step by step in a connected and continuous way.


What is Bare Metal Development?

Bare metal programming means running software directly on hardware without an operating system.

There is:

  • No scheduler
  • No process isolation
  • No kernel services
  • No file system (unless you build one)
  • No memory protection (unless hardware provides MPU and you configure it)

The CPU resets and starts executing your code from the reset vector.

txt
+--------------------+
|   Power On Reset   |
+--------------------+
           |
           v
+--------------------+
|  Vector Table      |
|  Reset Handler     |
+--------------------+
           |
           v
+--------------------+
|   Startup Code     |
+--------------------+
           |
           v
+--------------------+
|      main()        |
+--------------------+
           |
           v
+--------------------+
|  Infinite Loop     |
+--------------------+

Bare metal gives you:

  • Deterministic timing
  • Minimal latency
  • Small memory footprint
  • Full hardware control

But it also requires:

  • Manual interrupt handling
  • Manual memory layout design
  • Manual peripheral configuration
  • Manual stack management
  • Manual fault handling

This is why bare metal is used in:

  • Motor control
  • Automotive ECUs
  • Medical devices
  • Aerospace controllers
  • Ultra-low power IoT nodes
  • Bootloaders
  • Safety-critical systems

Choosing the Hardware Platform

The first architectural decision is choosing the MCU family.

Common platforms include:

ARM Cortex-M

  • Used in STM32, NXP, Nordic, etc.
  • 32-bit
  • NVIC interrupt controller
  • Optional MPU
  • Hardware debug via SWD/JTAG

AVR (Arduino-class)

  • 8-bit
  • Very simple architecture
  • Great for learning fundamentals

ESP32

  • Dual-core Xtensa on the original ESP32 (newer variants such as the ESP32-C3 use RISC-V cores)
  • WiFi + Bluetooth integrated
  • Can run bare metal or FreeRTOS

From a system perspective, selection affects:

  • Clock architecture
  • Interrupt latency
  • Memory map
  • Peripheral richness
  • Debug interface
  • Security features

Setting Up the Toolchain

Bare metal requires a cross-compilation environment: you compile on a host machine (typically an x86-64 or ARM PC), but the code runs on a different target architecture such as ARM Cortex-M or AVR.

Core components:

Compiler

  • GCC (arm-none-eabi-gcc)
  • Clang
  • Vendor-specific toolchains

Assembler

  • Converts startup assembly into object code

Linker

  • Combines object files into final ELF
  • Uses a linker script to map memory

Debugger

  • GDB
  • OpenOCD (debug server that connects GDB to the debug probe)
  • ST-Link tools
  • J-Link tools

The build flow looks like:

txt
.c/.s files
     |
     v
+-----------+
|  Compiler |
+-----------+
     |
     v
+-----------+
|  Objects  |
+-----------+
     |
     v
+-----------+
|  Linker   |
+-----------+
     |
     v
+-----------+
|   ELF     |
+-----------+
     |
     v
+-----------+
|   Binary  |
+-----------+


Startup Code — The Invisible Foundation

Before main() runs, the system must be initialized.

Startup code typically:

  • Sets the stack pointer
  • Copies .data from Flash to RAM
  • Clears .bss
  • Initializes system clock
  • Sets up vector table
  • Jumps to main()
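
The two memory-initialization steps in the list above can be sketched in C. This is a host-runnable simulation, not real startup code: on a target, the region boundaries would come from linker-script symbols, whereas here they are passed in as parameters. The arrays `fake_flash_data`, `fake_ram_data`, and `fake_ram_bss` and the function `fake_reset_handler` are hypothetical stand-ins invented for illustration.

```c
#include <stdint.h>

/* Copy the initial values of .data from their load address (Flash on a
 * real target) to their run address (RAM). */
void copy_data(const uint32_t *load, uint32_t *start, const uint32_t *end)
{
    while (start < end) {
        *start++ = *load++;
    }
}

/* Zero the .bss region so that globals without an initializer read as 0. */
void zero_bss(uint32_t *start, const uint32_t *end)
{
    while (start < end) {
        *start++ = 0u;
    }
}

/* Hypothetical stand-ins for Flash and RAM so the sketch runs on a host. */
const uint32_t fake_flash_data[4] = { 1u, 2u, 3u, 4u };
uint32_t fake_ram_data[4];
uint32_t fake_ram_bss[4] = { 0xDEADBEEFu, 0xDEADBEEFu, 0xDEADBEEFu, 0xDEADBEEFu };

/* Roughly what a reset handler does before it calls main(). */
void fake_reset_handler(void)
{
    copy_data(fake_flash_data, fake_ram_data, fake_ram_data + 4);
    zero_bss(fake_ram_bss, fake_ram_bss + 4);
    /* On a target: configure the clock here, then call main(). */
}
```

After `fake_reset_handler()` runs, `fake_ram_data` holds the values from the simulated Flash image and `fake_ram_bss` is all zeros, which is the state C code expects its globals to be in when `main()` starts.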

For ARM Cortex-M:

txt
Vector Table:
0x00000000 -> Initial Stack Pointer
0x00000004 -> Reset Handler
0x00000008 -> NMI Handler
0x0000000C -> HardFault Handler
...

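This file is often written in assembly because it runs before the C runtime environment exists. On Cortex-M, where the hardware loads the initial stack pointer from the vector table, it can also be written in C.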
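
As a rough illustration of the vector table layout shown above, the table can be modelled in C as an array whose first entry is a stack address and whose remaining entries are handler addresses. The handler bodies are empty placeholders, and the stack top value `0x20020000` is an assumption (the end of a 128K RAM region starting at `0x20000000`). On a real target the table also has to be placed at the start of Flash, usually through a compiler section attribute and a matching linker script entry, which this host-runnable sketch leaves out.

```c
#include <stdint.h>

typedef void (*isr_t)(void);

/* One table slot: either the initial stack pointer or a handler address. */
typedef union {
    isr_t     handler;
    uintptr_t stack_top;
} vector_entry_t;

/* Placeholder handlers; real fault handlers would typically not return. */
void Reset_Handler(void)     { }
void NMI_Handler(void)       { }
void HardFault_Handler(void) { }

/* Entry order mirrors the Cortex-M layout: SP, Reset, NMI, HardFault, ... */
const vector_entry_t vector_table[] = {
    { .stack_top = 0x20020000u },      /* assumed top of RAM */
    { .handler   = Reset_Handler },
    { .handler   = NMI_Handler },
    { .handler   = HardFault_Handler },
};
```

On reset, a Cortex-M core loads the first entry into the stack pointer and jumps to the second, which is why the order of the entries matters more than their names.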

Without correct startup code:

  • Interrupts won’t work
  • Global variables will hold wrong initial values
  • The system may HardFault immediately

Memory Layout and the Linker Script

The linker script defines how Flash and RAM are used.

Typical Cortex-M layout:

txt
FLASH (rx) : ORIGIN = 0x08000000, LENGTH = 512K
RAM   (rwx): ORIGIN = 0x20000000, LENGTH = 128K

Sections:

  • .text → code (Flash)
  • .rodata → constants (Flash)
  • .data → initialized variables (RAM)
  • .bss → zero-initialized variables (RAM)
  • .stack
  • .heap
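
To make the section list above concrete, here is a small sketch of how ordinary C definitions typically map to those sections with a GCC-style toolchain. The variable names are invented for illustration, and the exact placement depends on the compiler and linker script, so treat the comments as the usual case rather than a guarantee.

```c
#include <stddef.h>
#include <stdint.h>

/* .rodata: read-only, normally stays in Flash. */
const uint32_t baud_table[3] = { 9600u, 57600u, 115200u };

/* .data: lives in RAM, but its initial value is stored in Flash and
 * copied over by the startup code. */
uint32_t boot_count = 1u;

/* .bss: lives in RAM, has no image in Flash, zeroed by the startup code. */
uint32_t rx_errors;
uint8_t  rx_buffer[256];

/* RAM consumed by the objects above (.data + .bss). */
size_t ram_bytes_used(void)
{
    return sizeof boot_count + sizeof rx_errors + sizeof rx_buffer;
}

/* Flash consumed by the objects above (.rodata + initial image of .data). */
size_t flash_bytes_used(void)
{
    return sizeof baud_table + sizeof boot_count;
}
```

The asymmetry is the useful part: `rx_buffer` costs 256 bytes of RAM and no Flash, while `boot_count` costs 4 bytes of both.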

You control:

  • Where interrupt vectors go
  • Stack size
  • Heap size
  • Bootloader offset
  • Memory protection regions

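In secure systems, this layout is critical: it determines, for example, what a stack overflow overwrites and which regions an MPU can protect.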


Direct Hardware Register Programming

Now we reach the core of bare metal: register-level access.

Every peripheral is memory-mapped.

Example using STM32 (ARM Cortex-M):

txt
RCC->AHB1ENR |= (1U << 0);     // Enable GPIOA clock
GPIOA->MODER &= ~(3U << 10);   // Clear the PA5 mode field (bits 11:10)
GPIOA->MODER |=  (1U << 10);   // PA5 as output (mode 01)

What actually happens:

txt
Memory Address 0x40023830 -> RCC_AHB1ENR
Memory Address 0x40020000 -> GPIOA base

CPU writes directly to those addresses.
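
The mapping from addresses to named registers is usually expressed as a struct overlaid on the peripheral's base address. The sketch below follows the STM32F4 GPIO register order, but the type and macro names (`gpio_regs_t`, `RCC_BASE_ADDR`, and so on) are made up here so they do not clash with the vendor header. So that it can run on a host, it operates on an ordinary variable, `fake_gpioa`, instead of the real address.

```c
#include <stddef.h>
#include <stdint.h>

/* STM32F4 GPIO port registers, in offset order from the port base. */
typedef struct {
    volatile uint32_t MODER;    /* 0x00 pin mode */
    volatile uint32_t OTYPER;   /* 0x04 output type */
    volatile uint32_t OSPEEDR;  /* 0x08 output speed */
    volatile uint32_t PUPDR;    /* 0x0C pull-up / pull-down */
    volatile uint32_t IDR;      /* 0x10 input data */
    volatile uint32_t ODR;      /* 0x14 output data */
    volatile uint32_t BSRR;     /* 0x18 bit set / reset */
    volatile uint32_t LCKR;     /* 0x1C configuration lock */
    volatile uint32_t AFR[2];   /* 0x20 alternate function low / high */
} gpio_regs_t;

#define RCC_BASE_ADDR       0x40023800u
#define RCC_AHB1ENR_OFFSET  0x30u
#define GPIOA_BASE_ADDR     0x40020000u

/* On a target this would be: ((gpio_regs_t *)GPIOA_BASE_ADDR) */
gpio_regs_t fake_gpioa;   /* host stand-in for the real port */

/* A register address is just base + offset. */
uint32_t reg_address(uint32_t base, uint32_t offset)
{
    return base + offset;
}

/* Drive PA5 high with a read-modify-write of the output data register. */
void led_on(gpio_regs_t *gpio)
{
    gpio->ODR |= (1u << 5);
}
```

`reg_address(RCC_BASE_ADDR, RCC_AHB1ENR_OFFSET)` gives `0x40023830`, the RCC_AHB1ENR address quoted above. The `volatile` qualifier matters: it tells the compiler that every access has a side effect and must not be optimized away or merged.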

Architecture view:

txt
CPU
 |
 |  AHB / APB buses
 |
 +------------------+
 | RCC              |
 | GPIO             |
 | TIMERS           |
 | UART             |
 +------------------+

You must:

  • Enable peripheral clock
  • Configure mode
  • Configure speed
  • Configure pull-up/down
  • Configure alternate function if needed
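
A minimal sketch of that sequence for a plain push-pull output follows, assuming the STM32F4 field encodings (mode 01 = output, pull 00 = none, speed 00 = low). The helper names and the reduced register struct are hypothetical. The clock-enable register is passed in as a pointer so the code can run against ordinary variables on a host, and the alternate-function step is omitted because a general-purpose output does not need it.

```c
#include <stdint.h>

/* Only the configuration registers this sketch touches. */
typedef struct {
    volatile uint32_t MODER;
    volatile uint32_t OTYPER;
    volatile uint32_t OSPEEDR;
    volatile uint32_t PUPDR;
} gpio_cfg_regs_t;

enum { MODE_INPUT = 0, MODE_OUTPUT = 1, MODE_ALTERNATE = 2, MODE_ANALOG = 3 };
enum { PULL_NONE = 0, PULL_UP = 1, PULL_DOWN = 2 };

/* Replace the 2-bit field belonging to `pin` without touching other pins. */
void set_field2(volatile uint32_t *reg, unsigned pin, uint32_t value)
{
    uint32_t tmp = *reg;
    tmp &= ~(3u << (pin * 2u));          /* clear the old field */
    tmp |= (value & 3u) << (pin * 2u);   /* write the new one */
    *reg = tmp;
}

void gpio_config_output(gpio_cfg_regs_t *gpio, volatile uint32_t *clk_en,
                        uint32_t clk_bit, unsigned pin)
{
    *clk_en |= clk_bit;                          /* 1. enable the clock first */
    set_field2(&gpio->MODER, pin, MODE_OUTPUT);  /* 2. mode */
    gpio->OTYPER &= ~(1u << pin);                /*    push-pull output type */
    set_field2(&gpio->OSPEEDR, pin, 0u);         /* 3. low speed */
    set_field2(&gpio->PUPDR, pin, PULL_NONE);    /* 4. no pull resistor */
}
```

Clearing a field before setting it is the important habit here: OR-ing a new value into a field that is not already zero silently produces the wrong mode.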

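Miss one step and the peripheral usually fails silently. On STM32 parts, for example, writes to a peripheral whose clock is not enabled generally have no effect.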


Writing Application Logic

A bare metal application typically looks like this:

cpp
int main(void)
{
    init();
    while(1)
    {
        task1();
        task2();
        task3();
    }
}

Or interrupt-driven:

txt
Main Loop:
  Sleep()

Interrupt:
  Handle event
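
The interrupt-driven outline above is often implemented with a flag that the ISR sets and the main loop consumes. The sketch below shows that pattern in a form that runs on a host: `button_isr` is a hypothetical handler that a test calls directly, and `main_loop_step` stands for one pass through the main loop. On a Cortex-M target, the idle branch is where a wait-for-interrupt instruction (`__WFI()` in CMSIS) would go.

```c
#include <stdbool.h>
#include <stdint.h>

static volatile bool button_event;   /* set by the ISR, cleared by main */
static uint32_t handled_count;

/* Hypothetical ISR: do the minimum and leave the real work to main. */
void button_isr(void)
{
    button_event = true;
}

/* One pass of the main loop. Returns true if an event was handled. */
bool main_loop_step(void)
{
    if (!button_event) {
        /* Nothing pending: a target would sleep here until an interrupt. */
        return false;
    }
    button_event = false;   /* acknowledge before doing the work */
    handled_count++;        /* stand-in for the actual event handling */
    return true;
}

uint32_t events_handled(void)
{
    return handled_count;
}
```

A single flag coalesces events: if the interrupt fires twice before the main loop runs, only one event is handled. When every event matters, a counter or a queue is the more appropriate structure.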

Example LED blinking:

cpp
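#include "stm32f4xx.h"  // Device-specific header (CMSIS register definitions)

void initLED(void) {
    RCC->AHB1ENR |= (1U << 0);     // Enable GPIOA clock
    GPIOA->MODER &= ~(3U << 10);   // Clear the PA5 mode field (bits 11:10)
    GPIOA->MODER |=  (1U << 10);   // 01 = general-purpose output
}

// Crude busy-wait; the real duration depends on clock speed and optimization level
void delay(volatile uint32_t count) {
    while (count--) {}
}

int main(void) {
    initLED();  // Initialize LED (on PA5)

    while (1) {
        GPIOA->ODR ^= (1U << 5); // Toggle LED
        delay(1000000);
    }
}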

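main() toggles PA5 through the output data register (ODR) and then busy-waits in delay().

This is a polling-based (busy-wait) design: the CPU does no other useful work while delay() counts down.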

More advanced systems use:

  • Timer interrupts
  • External interrupts
  • DMA transfers
  • Event-driven design
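
As one example of the first item, the busy-wait delay can be replaced with a millisecond tick counter that a timer interrupt increments. The sketch assumes a 1 ms periodic interrupt (SysTick is a common choice on Cortex-M). The function names are hypothetical, and the LED is modelled as a variable so the logic can be exercised on a host.

```c
#include <stdint.h>

static volatile uint32_t tick_ms;   /* incremented by the timer ISR */
static uint32_t last_toggle;
static uint32_t led_state;

/* Hypothetical 1 ms timer ISR. */
void tick_isr(void)
{
    tick_ms++;
}

/* Called from the main loop; toggles the LED every period_ms, never blocks. */
void blink_poll(uint32_t period_ms)
{
    uint32_t now = tick_ms;
    /* Unsigned subtraction stays correct when tick_ms wraps around. */
    if ((uint32_t)(now - last_toggle) >= period_ms) {
        last_toggle = now;
        led_state ^= 1u;   /* on a target: GPIOA->ODR ^= (1U << 5); */
    }
}

uint32_t led_is_on(void)
{
    return led_state;
}
```

Because `blink_poll()` returns immediately, the main loop can call other tasks between checks, and the blink period no longer depends on compiler optimization or clock speed the way the counted delay does.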

Interrupts — The Real Power of Bare Metal

Interrupts allow real-time response.

Flow:

txt
Event occurs  
     |  
     v  
Interrupt line triggered  
     |  
     v  
NVIC prioritizes  
     |  
     v  
ISR executes  
     |  
     v  
Return to main

Important concepts:

  • Latency
  • Interrupt nesting
  • Priority grouping
  • Critical sections
  • Race conditions
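
Race conditions between an ISR and the main loop are often avoided by giving each side exclusive ownership of part of the shared data. A single-producer, single-consumer ring buffer is one common way to do that, sketched here under the assumption of a single-core MCU with one ISR writing and the main loop reading. Multi-core parts or multiple writers would need atomics, memory barriers, or a critical section instead. All names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define RB_SIZE 8u   /* power of two, so wrapping is a simple mask */

typedef struct {
    volatile uint32_t head;           /* written only by the producer (ISR) */
    volatile uint32_t tail;           /* written only by the consumer (main) */
    volatile uint8_t  data[RB_SIZE];
} ring_t;

/* Producer side, e.g. called from a UART receive ISR. */
bool ring_put(ring_t *rb, uint8_t byte)
{
    uint32_t head = rb->head;
    if ((uint32_t)(head - rb->tail) == RB_SIZE) {
        return false;                         /* full: drop or count overrun */
    }
    rb->data[head & (RB_SIZE - 1u)] = byte;   /* store first ... */
    rb->head = head + 1u;                     /* ... then publish */
    return true;
}

/* Consumer side, called from the main loop. */
bool ring_get(ring_t *rb, uint8_t *out)
{
    uint32_t tail = rb->tail;
    if (tail == rb->head) {
        return false;                         /* empty */
    }
    *out = rb->data[tail & (RB_SIZE - 1u)];   /* read first ... */
    rb->tail = tail + 1u;                     /* ... then release the slot */
    return true;
}
```

Neither function disables interrupts, so the buffer adds nothing to interrupt latency. The price is the fixed single-writer, single-reader rule.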

Security engineers care about:

  • Fault injection
  • Interrupt flooding
  • Stack overflow inside ISR
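
For the last item, one widely used detection technique (not covered in the text above, so treat it as a suggestion) is stack painting: fill the stack region with a known pattern at startup and later check how much of the pattern survives. The sketch assumes a stack that grows downward, as on Cortex-M, and uses a plain array in place of the real stack region so it can run on a host. The function names and fill value are arbitrary choices.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5A5A5A5u

/* Paint the whole region once at startup, before the stack is in use. */
void stack_paint(uint32_t *bottom, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        bottom[i] = STACK_FILL;
    }
}

/* Count words that were never written, scanning up from the lowest
 * address. With a downward-growing stack, these are the unused words. */
size_t stack_free_words(const uint32_t *bottom, size_t words)
{
    size_t n = 0;
    while (n < words && bottom[n] == STACK_FILL) {
        n++;
    }
    return n;
}
```

A result of 0 means the stack has reached, and possibly passed, the bottom of its region. The check only detects an overflow after the fact; preventing the damage requires hardware support such as an MPU guard region.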