In embedded design, we typically classify software architecture into three main models:
- Bare-Metal (Super Loop)
- RTOS-Based Systems
- Embedded Linux / Generic OS Systems
When we design an embedded system, we are not just writing code — we are defining how the system thinks, reacts, schedules, isolates, and protects itself.
The architectural choice usually evolves like this:
Simple control system
↓
Growing complexity
↓
Concurrent activities
↓
Need for scheduling
↓
Need for isolation + networking + UI
↓
Full OS (Linux)
Let us walk through them step by step and see how each design naturally leads to the next.
Bare-Metal Systems (Super Loop Architecture)
Bare-metal is where every embedded engineer begins.
Bare-metal means:
The firmware runs directly on hardware without any operating system.
There is:
- No operating system
- No scheduler
- No memory protection
- No abstraction layer
- Direct register-level control
The firmware runs directly after reset.
The CPU executes a single main() loop forever.
+-------------------+
Power → | Reset Vector |
+---------+---------+
|
v
+-------------------+
| Startup Code |
| (Init Stack/BSS) |
+---------+---------+
|
v
+-------------------+
| main() |
+---------+---------+
|
v
+----------------------+
| while(1) |
|----------------------|
| read_sensors() |
| process_data() |
| update_outputs() |
+----------------------+
int main(void)
{
    init_peripherals();
    while (1)
    {
        Task_A();
        Task_B();
        Task_C();
    }
}
Time →
| A | B | C | A | B | C | A | ...
Everything runs sequentially.
Bare-metal is a good fit when:
- Hardware is simple
- Timing is strict
- Memory is tiny
- The power budget is ultra-low
- Cost must be as low as possible
- No file system is needed
- No OTA update is needed
If your system only does:
- Read sensor
- Apply control law
- Drive output
You do NOT need a scheduler.
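The read / control / drive sequence above can be sketched as one function called on every loop pass. This is a minimal sketch, assuming a simple proportional control law; the sensor and actuator are simulated with plain variables, and `read_sensor`, `control_law`, `drive_output`, `control_step`, and the gain `KP` are hypothetical names chosen for illustration.

```c
#include <stdint.h>

#define KP 2   /* proportional gain, an arbitrary illustrative value */

static int32_t sim_sensor = 0;   /* simulated sensor reading  */
static int32_t sim_output = 0;   /* simulated actuator value  */

static int32_t read_sensor(void)           { return sim_sensor; }
static void    drive_output(int32_t value) { sim_output = value; }

/* Proportional control law: the output grows with the error. */
static int32_t control_law(int32_t setpoint, int32_t measured)
{
    return KP * (setpoint - measured);
}

/* One pass of the loop body: read, compute, drive. */
static void control_step(int32_t setpoint)
{
    int32_t measured = read_sensor();
    drive_output(control_law(setpoint, measured));
}
```

Calling `control_step()` from `while(1)` is all the "scheduling" such a system needs, because each step is short and always returns.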
When Bare-Metal Becomes Dangerous
Problems appear when:
Blocking Code
read_uart(); // waits here...
While waiting:
- Other tasks starve
- Timing jitter increases
The super loop looks like rotation between tasks.
But here is the key:
There is NO time control.
Each task runs until it finishes.
If Task B takes longer:
| A | BBBBBBBBB | C | A | ...
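A common way to avoid the blocking call is to poll a "data ready" status and return immediately when nothing has arrived. The sketch below simulates the UART with plain variables so it can run on a PC; `uart_rx_ready`, `uart_read_byte`, `Task_Uart`, and the `sim_uart_*` variables are hypothetical names, not a vendor API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simulated UART receive state (stand-in for real hardware registers). */
static bool    sim_uart_has_data = false;
static uint8_t sim_uart_data     = 0;

/* Hypothetical helper: true when a received byte is waiting. */
static bool uart_rx_ready(void) { return sim_uart_has_data; }

/* Hypothetical helper: read the waiting byte and clear the ready state. */
static uint8_t uart_read_byte(void)
{
    sim_uart_has_data = false;
    return sim_uart_data;
}

static unsigned bytes_handled = 0;
static unsigned loop_passes   = 0;

/* Non-blocking task: returns at once if no byte has arrived. */
static void Task_Uart(void)
{
    if (!uart_rx_ready()) {
        return;               /* nothing to do, let the other tasks run */
    }
    (void)uart_read_byte();
    bytes_handled++;
}

/* One pass of the super loop; the other tasks would be called here too. */
static void super_loop_pass(void)
{
    Task_Uart();
    loop_passes++;
}
```

With this shape the loop keeps rotating whether or not data arrives, so the other tasks no longer starve while the UART is idle.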
State Explosion
It starts when you write:
if (flag1)
    if (flag2)
        if (flag3)
            if (timeout_expired)
The system becomes:
+-------------------+
| Global Flags |
+-------------------+
|
---------------------------------
| | | |
Sensor UART Timer Display
Everything depends on shared globals.
No isolation. No priority.
One way to relieve this is to use interrupts.
Interrupt Extension
To improve responsiveness, interrupts are used:
CPU
|
-------------------
| |
Main Loop Interrupt
|
v
ISR()
Example:
External Interrupt
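volatile int button_pressed = 0;   /* shared with the main loop, so volatile */

void EXTI_IRQHandler(void)
{
    /* Only record the event; on real hardware the handler usually also clears the pending flag. */
    button_pressed = 1;
}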
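The flag set by the ISR is normally consumed in the main loop. The following is a minimal sketch of that hand-off, assuming a single flag shared between interrupt and loop context; the ISR is called directly to simulate the hardware event, and `poll_button` and `handle_button` are hypothetical names.

```c
#include <stdbool.h>

/* volatile: the compiler must re-read the flag, since an ISR can change it. */
static volatile bool button_pressed = false;
static unsigned press_count = 0;

/* ISR: do the minimum, just record that the event happened. */
void EXTI_IRQHandler(void)
{
    button_pressed = true;
}

/* Hypothetical handler for the deferred work. */
static void handle_button(void)
{
    press_count++;
}

/* Called from the main loop on every pass. */
static void poll_button(void)
{
    if (button_pressed) {
        button_pressed = false;   /* clear first so a later press is not lost */
        handle_button();
    }
}
```

The ISR stays a single store, and all real work runs in loop context where it can be interrupted again.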
Characteristics
- Deterministic (if well designed)
- Extremely small footprint (few KB)
- No context switching overhead
- Full hardware control
Limitations
- Hard to scale
- Blocking functions break timing
- No isolation
- No memory protection
- Complex state management
Real Examples
- Microchip Technology PIC firmware
- STMicroelectronics STM32 simple control loops
- Small motor drivers
- Sensor nodes
- Low-cost IoT devices
Interrupts solve one specific problem:
React immediately to an external event.
Time →
| Read | Control | ----INT---- | Display |
Latency becomes microseconds instead of milliseconds.
Great.
But now a new problem appears.
Too Many Interrupts
Imagine:
- UART interrupt
- Timer interrupt
- ADC interrupt
- GPIO interrupt
- Communication interrupt
The system becomes:
+------------------+
| CPU |
+--------+---------+
|
-----------------------------------------
| | | | |
Timer UART ADC GPIO Main Loop
If ISRs become heavy:
- Nested interrupts
- Stack overflow risk
- Hard to debug timing
- Priority inversion
- Unpredictable jitter
So best practice says:
Keep ISRs short. Do the work elsewhere.
So what do we do?
We move work out of the ISRs into scheduled tasks.
But in bare-metal, there is no scheduler.
Now we need a mechanism to fairly execute multiple tasks.
This leads naturally to Round Robin scheduling.
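One common way to keep an ISR short is to have it push incoming data into a ring buffer that the main loop drains later. The sketch below assumes a single producer (the ISR) and a single consumer (the loop); `rb_push_from_isr`, `rb_pop`, and `RB_SIZE` are hypothetical names, and the atomicity of the index updates on a given MCU is an assumption to verify for real hardware.

```c
#include <stdbool.h>
#include <stdint.h>

#define RB_SIZE 8u   /* power of two so the index can wrap with a mask */

static volatile uint8_t  rb_data[RB_SIZE];
static volatile uint32_t rb_head = 0;   /* written only by the ISR  */
static volatile uint32_t rb_tail = 0;   /* written only by the loop */

/* ISR side: store one byte, report failure if the buffer is full. */
static bool rb_push_from_isr(uint8_t byte)
{
    if (rb_head - rb_tail == RB_SIZE) {
        return false;                        /* full, byte is dropped */
    }
    rb_data[rb_head & (RB_SIZE - 1u)] = byte;
    rb_head++;
    return true;
}

/* Main-loop side: fetch one byte if available. */
static bool rb_pop(uint8_t *out)
{
    if (rb_head == rb_tail) {
        return false;                        /* empty */
    }
    *out = rb_data[rb_tail & (RB_SIZE - 1u)];
    rb_tail++;
    return true;
}
```

The ISR cost is now a bounds check and one store, while parsing and processing happen in whatever task later calls `rb_pop()`.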
Round Robin Scheduling
After adding interrupts, the main loop looks like:
while (1)
{
    if (taskA_flag) run_taskA();
    if (taskB_flag) run_taskB();
    if (taskC_flag) run_taskC();
}
But this still has ordering bias.
If Task A is heavy, B and C starve.
So engineers ask:
Can we give each task equal CPU time?
That is Round Robin.
All tasks share the CPU equally.
Time Slice = 5 ms
Execution:
| Task A | Task B | Task C | Task A | Task B | Task C |
Each task runs for a fixed quantum.
Round Robin is usually driven by a timer interrupt.
Flow:
Timer Interrupt (every 5ms)
|
v
Scheduler selects
next task
|
v
Context Switch
Timeline:
Time →
| A | B | C | A | B | C |
^ ^ ^
| | |
Timer Timer Timer
Detailed view:
+---------------------+
| Timer ISR |
|---------------------|
| Save current stack |
| Select next task |
| Load next stack |
| Return from ISR |
+---------------------+
This is the foundation of an RTOS.
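The register save and restore in a real context switch is CPU-specific and usually written in assembly, so it is not shown here. The selection step, however, can be sketched portably: on every simulated timer tick the scheduler advances to the next task in a circular list. `scheduler_tick`, `run_slices`, and `NUM_TASKS` are hypothetical names used only for this illustration.

```c
#include <stddef.h>

#define NUM_TASKS 3

static const char task_names[NUM_TASKS] = { 'A', 'B', 'C' };
static size_t current_task = 0;

/* Called from the (simulated) timer ISR: rotate to the next task. */
static size_t scheduler_tick(void)
{
    current_task = (current_task + 1) % NUM_TASKS;
    return current_task;
}

/* Record which task owns each time slice for `slices` ticks.
 * `trace` must have room for slices + 1 characters. */
static void run_slices(char *trace, size_t slices)
{
    for (size_t i = 0; i < slices; i++) {
        trace[i] = task_names[current_task];   /* task runs its quantum */
        scheduler_tick();                      /* timer fires, switch   */
    }
    trace[slices] = '\0';
}
```

Running six slices from a fresh start yields the trace `ABCABC`, the same rotation as the timeline above, regardless of how much work each task would like to do.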
Limitations of Pure Round Robin
- What if Task A controls motor safety?
- What if Task C only updates an LED?
- Both get equal CPU time.
That is wrong.
The safety task must run first.
This leads to:
Priority-Based Scheduling
This moves us from Round Robin to a preemptive, priority-based RTOS.
Instead of equal share:
High Priority Task → runs immediately
Low Priority Task → waits
Execution:
| Low | Low | HIGH | HIGH | Low |
Now the system becomes:
- Deterministic
- Structured
- Real-time capable
This is full RTOS territory.
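The Low / HIGH execution pattern can be reproduced with a small simulation: at every tick the scheduler picks the ready task with the highest priority. This is a sketch under stated assumptions, not an RTOS implementation; `task_t`, `pick_highest_ready`, and `simulate` are hypothetical, and "larger number means more urgent" is an assumed convention.

```c
#include <stddef.h>

typedef struct {
    char     tag;         /* letter used in the trace           */
    unsigned priority;    /* larger number = more urgent        */
    unsigned remaining;   /* ticks of work left, 0 = not ready  */
} task_t;

/* Return the ready task with the highest priority, or NULL if none. */
static task_t *pick_highest_ready(task_t *tasks, size_t n)
{
    task_t *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (tasks[i].remaining == 0) continue;
        if (best == NULL || tasks[i].priority > best->priority) {
            best = &tasks[i];
        }
    }
    return best;
}

/* Simulate `ticks` slices; the high-priority task wakes at `wake_tick`.
 * `trace` must have room for ticks + 1 characters. */
static void simulate(char *trace, size_t ticks, size_t wake_tick)
{
    task_t tasks[2] = {
        { 'L', 1, 100 },   /* low priority, always has work   */
        { 'H', 5, 0   },   /* high priority, idle until woken */
    };
    for (size_t t = 0; t < ticks; t++) {
        if (t == wake_tick) tasks[1].remaining = 2;   /* event arrives */
        task_t *run = pick_highest_ready(tasks, 2);
        trace[t] = run ? run->tag : '-';
        if (run) run->remaining--;
    }
    trace[ticks] = '\0';
}
```

With five ticks and a wake-up at tick 2, the trace is `LLHHL`: the low-priority task is pushed aside the moment the high-priority task becomes ready and resumes once it finishes.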
Real-Time Operating System (RTOS)
An RTOS provides:
- Task scheduler
- Context switching
- Inter-task communication
- Timers
- Synchronization primitives
- Deterministic scheduling
Popular examples:
- FreeRTOS
- Zephyr
- ThreadX
+-------------------+
| Application |
|-------------------|
| Task A (High) |
| Task B (Medium) |
| Task C (Low) |
+---------+---------+
|
v
+---------------------+
| RTOS Kernel |
|---------------------|
| Scheduler |
| Context Switch |
| Queues / Semaphores |
| IPC |
| Timers |
+---------+-----------+
|
v
+-------------------+
| Hardware |
+-------------------+
The scheduler decides:
Who runs now?
Different strategies exist.
Example in FreeRTOS:
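/* Create Task1 with a stack depth of 1024 (counted in words on most ports) and priority 2. */
xTaskCreate(Task1, "Task1", 1024, NULL, 2, NULL);
/* Hand control to the scheduler; in normal operation this call does not return. */
vTaskStartScheduler();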
Scheduler
- Cooperative Scheduling
- Preemptive Scheduling
- Priority-based
- Round-robin
Cooperative Scheduling
Tasks yield manually:
while (1)
{
    work();
    yield();
}
Used when:
- Simple systems
- Predictable task behavior
- Lower overhead desired
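Without a kernel, cooperative scheduling can be approximated by writing each task as a function that does one small unit of work and then returns, with the return acting as the yield. The following is a minimal sketch of that idea; `task_a`, `task_b`, `task_table`, and `run_rounds` are hypothetical names.

```c
#include <stddef.h>

typedef void (*task_fn)(void);

static unsigned a_steps = 0;
static unsigned b_steps = 0;

/* Each task does one small unit of work, then returns (its "yield"). */
static void task_a(void) { a_steps++; }
static void task_b(void) { b_steps++; }

static task_fn task_table[] = { task_a, task_b };

/* Cooperative scheduler: call every task once per round. */
static void run_rounds(unsigned rounds)
{
    const size_t n = sizeof task_table / sizeof task_table[0];
    for (unsigned r = 0; r < rounds; r++) {
        for (size_t i = 0; i < n; i++) {
            task_table[i]();   /* control returns only when the task yields */
        }
    }
}
```

The weakness is visible in the inner call: if any task fails to return promptly, nothing else runs, which is why this style only suits tasks with predictable behavior.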
Preemptive Scheduling
A higher-priority task preempts a lower-priority one as soon as it becomes ready.
Task C running (low)
↓
Task A becomes ready (high)
↓
Immediate preemption
Timeline:
| C | C | A | A | C | ...
Used when:
- Hard real-time tasks exist
- Deadlines must be guaranteed
When an RTOS Is Necessary
Use an RTOS when:
- Three or more concurrent activities exist
- Need task prioritization
- Communication stack present
- BLE / TCP/IP stack used
- Deterministic latency required
- Firmware update logic exists
Typical footprint:
- RAM: 64 KB – 512 KB
- Flash: 256 KB – 2 MB
- Suits medium-complexity systems
Real Examples
- ESP32 IoT firmware
- Automotive ECUs
- Medical devices
- Industrial controllers
When an RTOS Starts to Fail
When true process isolation is needed. In a typical RTOS, all tasks share the same memory:
+-----------------------------+
| Task A | Task B | Task C |
| Shared Heap / Globals |
+-----------------------------+
A buffer overflow in Task C can corrupt Task A.
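The effect can be demonstrated safely on a PC by placing both tasks' buffers in one flat array, as they would sit in the RAM of an MCU without an MMU. This is an illustrative sketch only: `shared_ram`, `task_c_receive`, `task_a_init`, and `task_a_is_intact` are hypothetical names, and the single array is used so the overwrite stays within one object and remains well-defined C.

```c
#include <stdint.h>
#include <string.h>

#define REGION_SIZE 16u

/* One flat RAM block shared by both tasks, with no protection between them. */
static uint8_t shared_ram[2u * REGION_SIZE];

static uint8_t *task_c_buf(void) { return &shared_ram[0]; }
static uint8_t *task_a_buf(void) { return &shared_ram[REGION_SIZE]; }

/* Task A keeps a critical value at the start of its region. */
static void task_a_init(void) { task_a_buf()[0] = 0x5A; }

/* Buggy Task C: copies `len` bytes without checking against REGION_SIZE. */
static void task_c_receive(const uint8_t *src, size_t len)
{
    memcpy(task_c_buf(), src, len);   /* BUG: no bounds check */
}

/* Returns 1 if Task A's value is still intact, 0 if it was overwritten. */
static int task_a_is_intact(void) { return task_a_buf()[0] == 0x5A; }
```

A 10-byte message leaves Task A untouched, while a 20-byte message spills four bytes into Task A's region and silently destroys its value. Nothing in the system stops the write or reports it.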
When complex software stacks are needed:
- TLS
- HTTPS
- File system
- SQLite
- GUI framework
- Video streaming
With all of these on board, an RTOS-based system becomes difficult to maintain.
Then we move to a full OS.
Generic OS
When system complexity becomes high, we move to a full OS like:
- Embedded Linux
- Android
Embedded Linux
A customized Linux distribution running on embedded hardware.
Linux exists in embedded systems because:
- Systems became mini computers
- Connectivity is default
- Security regulations increased
- OTA is mandatory
- UI expectations increased
Includes:
- Kernel
- Drivers
- User space
- File system
- Network stack
- Process management
- Memory protection (MMU)
+----------------------------------+
| User Space |
|----------------------------------|
| App1 | App2 | App3 | App4 |
+------------------+---------------+
|
v
+----------------------------------+
| Linux Kernel |
|----------------------------------|
| Scheduler (CFS) |
| Virtual Memory Manager |
| TCP/IP Stack |
| Drivers |
+------------------+---------------+
|
v
Hardware
It provides:
- Process isolation
- Virtual memory
- User/Kernel privilege separation
- Advanced scheduler
- Secure boot chains
- Package management
- Containers
Scheduler in Linux
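Linux has long used CFS (Completely Fair Scheduler) for normal tasks; kernels from 6.6 onward replace it with EEVDF, which pursues the same fairness goal.
CFS is conceptually similar to an advanced round-robin, but weighted:
- Each process gets a fair share of the CPU
- The share is tracked through virtual runtime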
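The virtual runtime idea can be illustrated with a toy model: always run the task with the smallest virtual runtime, and charge heavier tasks less virtual time per slice. This is a simplified sketch, not kernel code; the kernel's actual data structures and tuning are omitted, and `fair_task_t`, `pick_min_vruntime`, `run_fair`, and `BASE_WEIGHT` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define BASE_WEIGHT 1024u   /* weight assumed for a default-priority task */

typedef struct {
    uint32_t weight;     /* larger weight = larger CPU share        */
    uint64_t vruntime;   /* weighted runtime consumed so far        */
    uint32_t slices;     /* how many slices this task actually ran  */
} fair_task_t;

/* Pick the task that has consumed the least virtual runtime. */
static fair_task_t *pick_min_vruntime(fair_task_t *t, size_t n)
{
    fair_task_t *best = &t[0];
    for (size_t i = 1; i < n; i++) {
        if (t[i].vruntime < best->vruntime) best = &t[i];
    }
    return best;
}

/* Run `total` slices of `slice_ns` each, charging vruntime by weight. */
static void run_fair(fair_task_t *t, size_t n, unsigned total, uint64_t slice_ns)
{
    for (unsigned i = 0; i < total; i++) {
        fair_task_t *run = pick_min_vruntime(t, n);
        run->slices++;
        /* Heavier tasks accumulate vruntime more slowly. */
        run->vruntime += slice_ns * BASE_WEIGHT / run->weight;
    }
}
```

With two tasks weighted 2048 and 1024, the heavier task ends up with about twice as many slices. No task is ever starved, but no task is guaranteed to run at a specific instant either, which is the contrast with an RTOS.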
Unlike an RTOS scheduler, the default Linux scheduler is:
- Not deterministic
- Optimized for fairness, not strict deadlines
Hardware Requirement
- CPU with MMU
- Large RAM (MBs to GBs)
- Flash storage
- Often Cortex-A class processors
Example boards:
- Raspberry Pi
- BeagleBone Black
Summary
| Feature | Bare-Metal | RTOS | Embedded Linux |
|---|---|---|---|
| Memory | KB | 10s-100s KB | MB-GB |
| Scheduler | No | Yes | Yes |
| MMU | No | Usually No | Yes |
| Determinism | High | Very High | Medium |
| Security Isolation | None | Limited | Strong |
| Boot Time | Very Fast | Fast | Slower |
| Complexity | Low | Medium | High |
Design Decision Flow
Is system simple?
|
+-- YES → Bare-Metal
|
+-- NO →
|
Is strict real-time needed?
|
+-- YES → RTOS
|
+-- NO →
|
Need networking / GUI / complex stack?
|
+-- YES → Linux
|
+-- NO → RTOS
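The decision flow above can be written down as a small function, which makes the order of the questions explicit. This is only a sketch of the flowchart; `arch_t`, `choose_architecture`, and the three boolean inputs are hypothetical names, and a real selection would weigh cost, power, and team experience as well.

```c
#include <stdbool.h>

typedef enum { ARCH_BARE_METAL, ARCH_RTOS, ARCH_LINUX } arch_t;

/* Mirrors the flowchart: each question is asked only if the previous one
 * did not already decide the outcome. */
static arch_t choose_architecture(bool is_simple,
                                  bool strict_real_time,
                                  bool needs_complex_stack)
{
    if (is_simple)           return ARCH_BARE_METAL;
    if (strict_real_time)    return ARCH_RTOS;
    if (needs_complex_stack) return ARCH_LINUX;
    return ARCH_RTOS;
}
```

For example, `choose_architecture(false, false, true)` returns `ARCH_LINUX`, while a non-simple system with strict deadlines gets `ARCH_RTOS` even if it also needs networking.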
- Super Loop: rotation by function return
- Interrupt Round Robin: rotation by timer force
- Priority RTOS: rotation by urgency + timer
- Linux: rotation by fairness + virtual runtime