ES: Embedded System Design | Amr Tarek

In embedded design, we typically classify software architecture into three main models:

Bare-Metal (Super Loop)
RTOS-Based Systems
Embedded Linux / Generic OS Systems

When we design an embedded system, we are not just writing code — we are defining how the system thinks, reacts, schedules, isolates, and protects itself.

The architectural choice usually evolves like this:

Simple control system
        ↓
Growing complexity
        ↓
Concurrent activities
        ↓
Need for scheduling
        ↓
Need for isolation + networking + UI
        ↓
Full OS (Linux)

Let us walk step by step — and see how each design naturally leads to the next.

Bare-Metal Systems (Super Loop Architecture)

Bare-metal is where every embedded engineer begins.

Bare-metal means:

The firmware runs directly on hardware without any operating system.

There is:

No operating system
No scheduler
No memory protection
No OS-provided abstraction layer
Direct register-level control

The firmware runs directly after reset.

The CPU executes a single main() loop forever.

          +-------------------+
Power →   |   Reset Vector    |
          +---------+---------+
                    |
                    v
          +-------------------+
          |   Startup Code    |
          | (Init Stack/BSS)  |
          +---------+---------+
                    |
                    v
          +-------------------+
          |      main()       |
          +---------+---------+
                    |
                    v
         +----------------------+
         |      while(1)        |
         |----------------------|
         | read_sensors()       |
         | process_data()       |
         | update_outputs()     |
         +----------------------+

int main(void)
{
    init_peripherals();

    while(1)
    {
        Task_A();
        Task_B();
        Task_C();
    }
}

Time →
| A | B | C | A | B | C | A | ...

Everything runs sequentially.

Bare-metal is the right fit when:

Hardware is simple
Timing is strict
Memory is tiny
Power budget is minimal (ultra-low power)
Cost must be as low as possible
No file system is needed
No OTA update is needed
If your system does:

Read sensor
Apply control law
Drive output

You do NOT need a scheduler.
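
One pass of such a loop can be sketched as a single function. This is a minimal sketch, assuming a proportional control law with integer math; control_step, KP_NUM, KP_DEN, OUT_MIN, and OUT_MAX are hypothetical names and values chosen for illustration.

```c
#include <stdint.h>

/* Hypothetical proportional gain, written as a fraction to avoid floats. */
#define KP_NUM 3
#define KP_DEN 2

/* Hypothetical actuator limits (for example, a PWM duty range). */
#define OUT_MIN 0
#define OUT_MAX 1000

/* One pass of "read sensor -> apply control law -> drive output".
 * The caller passes in the measurement and receives the clamped output. */
int32_t control_step(int32_t setpoint, int32_t measurement)
{
    int32_t error = setpoint - measurement;
    int32_t out = (error * KP_NUM) / KP_DEN;

    if (out < OUT_MIN) out = OUT_MIN;
    if (out > OUT_MAX) out = OUT_MAX;
    return out;
}
```

Calling control_step() once per loop iteration is all the "scheduling" this kind of system needs.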

When Bare-Metal Becomes Dangerous

Problems appear when:

Blocking Code

read_uart();  // waits here...

While waiting:

Other tasks starve
Timing jitter increases
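
A common way to avoid the wait is to poll a status flag and return immediately when nothing has arrived. The sketch below assumes a simplified UART model; uart_regs_t and uart_try_read are hypothetical names, and real peripherals differ in how the ready flag is cleared.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical UART register model. A real MCU exposes similar status
 * and data registers at fixed addresses. */
typedef struct {
    volatile uint8_t rx_ready; /* 1 when a byte is waiting */
    volatile uint8_t rx_data;  /* the received byte */
} uart_regs_t;

/* Non-blocking read: returns true and stores the byte if one is waiting,
 * otherwise returns false at once so the loop can keep running. */
bool uart_try_read(uart_regs_t *uart, uint8_t *out)
{
    if (!uart->rx_ready) {
        return false;
    }
    *out = uart->rx_data;
    uart->rx_ready = 0; /* in this model, software clears the flag */
    return true;
}
```

The main loop calls uart_try_read() on every pass and simply moves on to the next task when it returns false.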

On the timeline, the super loop looks like rotation between tasks.

But here is the key:

There is NO time control.

Each task runs until it finishes.

If Task B takes longer:

| A | BBBBBBBBB | C | A | ...

State Explosion

When you start writing:

if(flag1)
if(flag2)
if(flag3)
if(timeout_expired)

The system becomes:

                +-------------------+
                |    Global Flags   |
                +-------------------+
                        |
        ---------------------------------
        |        |         |           |
     Sensor   UART      Timer      Display

Everything depends on shared globals.

No isolation. No priority.
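
One common way to contain this growth is to replace scattered flags with a single explicit state variable and a transition function. The following is a minimal sketch, assuming a hypothetical measure-and-report cycle; the states, events, and fsm_step are illustrative names only.

```c
/* Hypothetical states and events for a small measure-and-report device. */
typedef enum { ST_IDLE, ST_MEASURING, ST_REPORTING, ST_FAULT } state_t;
typedef enum { EV_START, EV_SAMPLE_DONE, EV_SENT, EV_TIMEOUT } event_t;

/* Pure transition function: the next state depends only on the current
 * state and the event, so every legal combination is visible in one place. */
state_t fsm_step(state_t s, event_t ev)
{
    switch (s) {
    case ST_IDLE:
        return (ev == EV_START) ? ST_MEASURING : ST_IDLE;
    case ST_MEASURING:
        if (ev == EV_SAMPLE_DONE) return ST_REPORTING;
        if (ev == EV_TIMEOUT)     return ST_FAULT;
        return ST_MEASURING;
    case ST_REPORTING:
        if (ev == EV_SENT)    return ST_IDLE;
        if (ev == EV_TIMEOUT) return ST_FAULT;
        return ST_REPORTING;
    case ST_FAULT:
    default:
        return ST_FAULT; /* stays latched until reset */
    }
}
```

This does not add isolation or priority, but it keeps the number of flag combinations from growing unchecked.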

The first step toward a solution is to use interrupts.

Interrupt Extension

To improve responsiveness, interrupts are used:

                CPU
                 |
        -------------------
        |                 |
    Main Loop        Interrupt
                        |
                        v
                    ISR()

Example:

External Interrupt

void EXTI_IRQHandler(void)
{
    /* button_pressed must be declared volatile: the main loop reads it. */
    button_pressed = 1;
}
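
The flag set by the ISR is then consumed in the main loop. The sketch below shows both halves of that pattern on a host machine, where the handler is called like a normal function. button_event_take is a hypothetical helper name, and a real handler would also clear the peripheral's pending bit.

```c
#include <stdbool.h>

/* Shared between the ISR and the main loop, so it must be volatile. */
static volatile bool button_pressed = false;

/* ISR: record the event and return quickly. On real hardware the
 * handler would also clear the interrupt's pending bit. */
void EXTI_IRQHandler(void)
{
    button_pressed = true;
}

/* Called from the main loop: returns true once per recorded press. */
bool button_event_take(void)
{
    if (!button_pressed) {
        return false;
    }
    button_pressed = false;
    return true;
}
```

A single flag merges presses that arrive before the loop gets around to checking it. If every event matters, a counter or a queue is the usual replacement.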

Bare-Metal Characteristics

Deterministic (if well designed)
Extremely small footprint (few KB)
No context switching overhead
Full hardware control

Bare-Metal Limitations

Hard to scale
Blocking functions break timing
No isolation
No memory protection
Complex state management

Real Examples of Bare-Metal

Microchip Technology PIC firmware
STMicroelectronics STM32 simple control loops
Small motor drivers
Sensor nodes
Low-cost IoT devices

Interrupts solve one specific problem:

React immediately to an external event.

Time →
| Read | Control | ----INT---- | Display |

Latency typically drops from milliseconds to microseconds.

Great.

But now a new problem appears.

Too Many Interrupts

Imagine:

UART interrupt
Timer interrupt
ADC interrupt
GPIO interrupt
Communication interrupt

System becomes:

                +------------------+
                |      CPU         |
                +--------+---------+
                         |
      -----------------------------------------
      |         |         |        |          |
    Timer     UART       ADC      GPIO     Main Loop

If ISRs become heavy:

Nested interrupts
Stack overflow risk
Hard to debug timing
Priority inversion
Unpredictable jitter

So best practice says:

Keep ISRs short. Do the work elsewhere.

So what do we do?

We move the work out of the ISR and into scheduled tasks.
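
A typical mechanism for handing data from an ISR to the main loop is a small ring buffer with one producer and one consumer. The sketch below assumes a single-core MCU where the ISR is the only writer of head and the main loop is the only writer of tail; ring_t, rb_push, rb_pop, and RB_SIZE are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define RB_SIZE 8u /* hypothetical capacity; must be a power of two */

typedef struct {
    volatile uint8_t  data[RB_SIZE];
    volatile uint32_t head; /* written only by the ISR (producer) */
    volatile uint32_t tail; /* written only by the main loop (consumer) */
} ring_t;

/* Called from the ISR: store one byte, or drop it if the buffer is full. */
bool rb_push(ring_t *rb, uint8_t byte)
{
    if (rb->head - rb->tail == RB_SIZE) {
        return false; /* full */
    }
    rb->data[rb->head % RB_SIZE] = byte;
    rb->head++;
    return true;
}

/* Called from the main loop: fetch one byte if available. */
bool rb_pop(ring_t *rb, uint8_t *out)
{
    if (rb->head == rb->tail) {
        return false; /* empty */
    }
    *out = rb->data[rb->tail % RB_SIZE];
    rb->tail++;
    return true;
}
```

The ISR only copies a byte and bumps an index, so it stays short. Parsing and protocol handling happen later in the main loop. On multi-core parts or CPUs that reorder memory accesses, this sketch would additionally need memory barriers.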

But in bare-metal, there is no scheduler.

Now we need a mechanism to fairly execute multiple tasks.

This leads naturally to Round Robin scheduling.

Round Robin Scheduling

After adding interrupts, the main loop looks like:

while(1)
{
    if(taskA_flag) run_taskA();
    if(taskB_flag) run_taskB();
    if(taskC_flag) run_taskC();
}

But this still has ordering bias.

If Task A is heavy, B and C starve.

So engineers ask:

Can we give each task equal CPU time?

That is Round Robin.

All tasks share the CPU equally.

Time Slice = 5 ms

Execution:

| Task A | Task B | Task C | Task A | Task B | Task C |

Each task runs for a fixed quantum.

Round Robin is usually driven by a timer interrupt.

Flow:

Timer Interrupt (every 5 ms)
        |
        v
Scheduler selects
next task
        |
        v
Context Switch

Timeline:

Time →
| A | B | C | A | B | C |
    ^   ^   ^   ^   ^
    |   |   |   |   |
    timer interrupts (one per time slice)

Detailed view:

+-------------------------+
| Timer ISR               |
|-------------------------|
| Save current context    |
| Select next task        |
| Restore next context    |
| Return from ISR         |
+-------------------------+

This is the foundation of an RTOS.
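
The selection step of this flow can be modelled on a host machine without any real context switching. The following is a minimal simulation, assuming three tasks identified by index; rr_sched_t, rr_on_tick, and NUM_TASKS are hypothetical names. Saving and restoring CPU registers is port-specific assembly and is not shown.

```c
#include <stdint.h>

#define NUM_TASKS 3u /* hypothetical task count: A, B, C */

typedef struct {
    uint32_t current;              /* index of the running task */
    uint32_t run_ticks[NUM_TASKS]; /* time slices each task has received */
} rr_sched_t;

/* Called from the (simulated) timer ISR once per time slice.
 * Accounts the slice that just ended, then selects the next task in
 * circular order and returns its index. */
uint32_t rr_on_tick(rr_sched_t *s)
{
    s->run_ticks[s->current]++;
    s->current = (s->current + 1u) % NUM_TASKS;
    return s->current;
}
```

Every task receives the same number of slices over time, regardless of how important it is. That is exactly the limitation discussed next.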

Limitations of Pure Round Robin

What if Task A controls motor safety?
What if Task C only updates an LED?
Both get equal CPU time.

That is wrong.

The safety task must run first.

This leads to:

Priority-Based Scheduling

This moves us from Round Robin to a preemptive, priority-based RTOS.

Instead of equal share:

High Priority Task → runs immediately
Low Priority Task  → waits

Execution:

| Low | Low | HIGH | HIGH | Low |

Now the system becomes:

Deterministic
Structured
Real-time capable

This is full RTOS territory.
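
The core decision of a priority scheduler can be sketched as "pick the most urgent task that is ready". This is a simplified illustration, assuming that a larger number means higher priority; task_t and pick_next are hypothetical names, and real kernels use faster data structures than a linear scan.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint8_t     priority; /* larger number = more urgent (an assumption) */
    bool        ready;
} task_t;

/* Returns the index of the highest-priority ready task, or -1 if no
 * task is ready. Ties go to the lowest index. */
int pick_next(const task_t *tasks, size_t count)
{
    int best = -1;
    for (size_t i = 0; i < count; i++) {
        if (!tasks[i].ready) {
            continue;
        }
        if (best < 0 || tasks[i].priority > tasks[(size_t)best].priority) {
            best = (int)i;
        }
    }
    return best;
}
```

Running pick_next() whenever a task becomes ready, not only on the timer tick, is what lets a high-priority task take over immediately.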

Real-Time Operating System (RTOS)

An RTOS provides:

Task scheduler
Context switching
Inter-task communication
Timers
Synchronization primitives
Deterministic scheduling

Popular examples:

FreeRTOS
Zephyr
ThreadX

+-------------------+
|     Application   |
|-------------------|
|  Task A (High)    |
|  Task B (Medium)  |
|  Task C (Low)     |
+---------+---------+
          |
          v
+---------------------+
|      RTOS Kernel    |
|---------------------|
| Scheduler           |
| Context Switch      |
| Queues / Semaphores |
| IPC                 |
| Timers              |
+---------+-----------+
          |
          v
+-------------------+
|     Hardware      |
+-------------------+

The scheduler decides:

Who runs now?

Different strategies exist.

Example in FreeRTOS:

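/* Stack depth (1024) is counted in words, not bytes, on most FreeRTOS ports. Priority is 2. */
xTaskCreate(Task1, "Task1", 1024, NULL, 2, NULL);
vTaskStartScheduler();  /* normally does not return once the scheduler is running */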

Scheduler

Cooperative Scheduling
Preemptive Scheduling
Priority-based
Round-robin

Cooperative Scheduling

Tasks yield manually:

while(1)
{
    work();
    yield();
}

Used when:

Simple systems
Predictable task behavior
Lower overhead desired
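
A run-to-completion variant of cooperative scheduling can be sketched with a table of function pointers, where a task yields simply by returning. This is a minimal sketch; sensor_task, display_task, task_table, and coop_run_once are hypothetical names.

```c
#include <stddef.h>

typedef void (*task_fn)(void);

/* Hypothetical tasks: each does a small amount of work and returns,
 * which is how it "yields" in a run-to-completion cooperative design. */
static int sensor_runs  = 0;
static int display_runs = 0;

static void sensor_task(void)  { sensor_runs++;  }
static void display_task(void) { display_runs++; }

static const task_fn task_table[] = { sensor_task, display_task };

/* One scheduler pass: call every task once, in table order. */
void coop_run_once(void)
{
    for (size_t i = 0; i < sizeof task_table / sizeof task_table[0]; i++) {
        task_table[i]();
    }
}
```

The scheme only works as long as every task returns promptly. One task that loops or blocks stalls all the others.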

Preemptive Scheduling

A higher-priority task preempts a lower-priority one.

Task C running (low)
        ↓
Task A becomes ready (high)
        ↓
Immediate preemption

Timeline:

| C | C | A | A | C | ...

Used when:

Hard real-time tasks exist
Deadlines must be guaranteed

When an RTOS Is Necessary

When:

Three or more concurrent activities
Need task prioritization
Communication stack present
BLE / TCP/IP stack used
Deterministic latency required
Firmware update logic exists

Typical memory:

RAM: 64 KB – 512 KB
Flash: 256 KB – 2 MB
Medium complexity systems

Real Examples

ESP32 IoT firmware
Automotive ECUs
Medical devices
Industrial controllers

When an RTOS Starts to Fail

When true process isolation is needed. In a typical RTOS, all tasks share the same memory:

+-----------------------------+
| Task A | Task B | Task C    |
| Shared Heap / Globals       |
+-----------------------------+

A buffer overflow in Task C can corrupt Task A.
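
Without hardware isolation, each task has to defend its own buffers. The sketch below shows one defensive habit, a bounds-checked copy; checked_copy is a hypothetical helper, and it reduces the risk inside one task rather than providing isolation between tasks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies len bytes into dst only if they fit. Returns false, leaving
 * dst untouched, when the write would run past the end of the buffer. */
bool checked_copy(uint8_t *dst, size_t dst_size,
                  const uint8_t *src, size_t len)
{
    if (dst == NULL || src == NULL || len > dst_size) {
        return false;
    }
    memcpy(dst, src, len);
    return true;
}
```

A check like this only helps where it is actually used. One unchecked write anywhere in the firmware can still corrupt another task's data, which is why process isolation needs hardware support.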

When a complex networking or application stack is needed:

TLS
HTTPS
File system
SQLite
GUI framework
Video streaming

An RTOS-based design becomes difficult to maintain.

Then we move to a full OS.

Generic OS

When system complexity becomes high, we move to a full OS like:

Embedded Linux
Android

Embedded Linux

A customized Linux distribution running on embedded hardware.

Linux exists in embedded systems because:

Systems became mini computers
Connectivity is default
Security regulations increased
OTA is mandatory
UI expectations increased

Includes:

Kernel
Drivers
User space
File system
Network stack
Process management
Memory protection (MMU)

+----------------------------------+
|           User Space             |
|----------------------------------|
| App1 | App2 | App3 | App4        |
+------------------+---------------+
                   |
                   v
+----------------------------------+
|          Linux Kernel            |
|----------------------------------|
| Scheduler (CFS)                  |
| Virtual Memory Manager           |
| TCP/IP Stack                     |
| Drivers                          |
+------------------+---------------+
                   |
                   v
                Hardware

It provides:

Process isolation
Virtual memory
User/Kernel privilege separation
Advanced scheduler
Secure boot chains
Package management
Containers

Scheduler in Linux

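For normal (non-real-time) tasks, Linux has long used CFS (Completely Fair Scheduler). Kernels from 6.6 onward replace it with EEVDF, which keeps the same fair-share goal.

It is conceptually similar to round-robin, but weighted: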

Each process gets fair share of CPU
Based on virtual runtime
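
The virtual-runtime idea can be illustrated with a toy model: always run the task that has accumulated the least weighted runtime, and charge heavier tasks less for the same real time. This is a conceptual sketch only, not the kernel's implementation; fair_task_t, fair_pick, fair_run, and BASE_WEIGHT are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define BASE_WEIGHT 1024u /* weight of a default task in this toy model */

typedef struct {
    uint32_t weight;   /* larger weight = larger CPU share */
    uint64_t vruntime; /* weighted runtime consumed so far */
    uint32_t slices;   /* how many slices this task has received */
} fair_task_t;

/* Pick the task with the smallest virtual runtime (ties: lowest index). */
size_t fair_pick(const fair_task_t *t, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (t[i].vruntime < t[best].vruntime) {
            best = i;
        }
    }
    return best;
}

/* Run the chosen task for slice_us and charge it. Heavier tasks are
 * charged less virtual time for the same amount of real time. */
void fair_run(fair_task_t *t, size_t n, uint32_t slice_us)
{
    size_t i = fair_pick(t, n);
    t[i].vruntime += (uint64_t)slice_us * BASE_WEIGHT / t[i].weight;
    t[i].slices++;
}
```

With two tasks of weight 2048 and 1024, the heavier task ends up with twice as many slices, while both virtual runtimes stay close to each other.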

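Unlike an RTOS scheduler:

Not deterministic by default (real-time policies such as SCHED_FIFO and the PREEMPT_RT option narrow the gap)
Optimized for fairness, not strict deadlines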

Hardware Requirement

CPU with MMU
Large RAM (MBs to GBs)
Flash storage
Often Cortex-A class processors

Example boards:

Raspberry Pi
BeagleBone Black

Summary

Feature             Bare-Metal   RTOS          Embedded Linux
Memory              KB           10s-100s KB   MB-GB
Scheduler           No           Yes           Yes
MMU                 No           Usually No    Yes
Determinism         High         Very High     Medium
Security Isolation  None         Limited       Strong
Boot Time           Very Fast    Fast          Slower
Complexity          Low          Medium        High

Design Decision Flow

Is system simple?
        |
        +-- YES → Bare-Metal
        |
        +-- NO →
              |
              Is strict real-time needed?
                    |
                    +-- YES → RTOS
                    |
                    +-- NO →
                          |
                          Need networking / GUI / complex stack?
                                  |
                                  +-- YES → Linux
                                  |
                                  +-- NO → RTOS
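
The same flow can be written as a small function, which makes the order of the questions explicit. This is only a restatement of the diagram above; arch_t and choose_architecture are hypothetical names.

```c
#include <stdbool.h>

typedef enum { ARCH_BARE_METAL, ARCH_RTOS, ARCH_LINUX } arch_t;

/* Mirrors the decision flow: simplicity is checked first, then strict
 * real-time needs, then the need for networking, GUI, or complex stacks. */
arch_t choose_architecture(bool is_simple,
                           bool strict_realtime,
                           bool needs_complex_stack)
{
    if (is_simple) {
        return ARCH_BARE_METAL;
    }
    if (strict_realtime) {
        return ARCH_RTOS;
    }
    return needs_complex_stack ? ARCH_LINUX : ARCH_RTOS;
}
```

Note that in this flow a strict real-time requirement wins over a complex stack: a system that needs both still lands on an RTOS.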

Super Loop:

Rotation by function return

Interrupt Round Robin:

Rotation by timer force

Priority RTOS:

Rotation by urgency + timer

Linux:

Rotation by fairness + virtual runtime