ES: Embedded System Design

Designing an embedded system is not just about choosing a microcontroller. It is about choosing the software architecture model that fits the system constraints, safety level, timing requirements, memory size, power consumption, and security exposure.

In embedded design, we typically classify software architecture into three main models:

  • Bare-Metal (Super Loop)
  • RTOS-Based Systems
  • Embedded Linux / Generic OS Systems

When we design an embedded system, we are not just writing code — we are defining how the system thinks, reacts, schedules, isolates, and protects itself.

The architectural choice usually evolves like this:

```md
Simple control system
        ↓
Growing complexity
        ↓
Concurrent activities
        ↓
Need for scheduling
        ↓
Need for isolation + networking + UI
        ↓
Full OS (Linux)
```

Let us walk step by step — and see how each design naturally leads to the next.


Bare-Metal Systems (Super Loop Architecture)

Bare-metal is where every embedded engineer begins.

Bare-metal means:

The firmware runs directly on hardware without any operating system.

There is:

  • No operating system
  • No scheduler
  • No memory protection
  • No abstraction layer
  • Direct register-level control

The firmware runs directly after reset.

The CPU executes a single main() loop forever.

```md
          +-------------------+
Power →   |   Reset Vector    |
          +---------+---------+
                    |
                    v
          +-------------------+
          |   Startup Code    |
          | (Init Stack/BSS)  |
          +---------+---------+
                    |
                    v
          +-------------------+
          |      main()       |
          +---------+---------+
                    |
                    v
          +-------------------+
          |     while(1)      |
          |-------------------|
          | read_sensors()    |
          | process_data()    |
          | update_outputs()  |
          +-------------------+
```

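```cpp
// Application functions, defined elsewhere in the firmware
void init_peripherals(void);
void Task_A(void);
void Task_B(void);
void Task_C(void);

int main(void)
{
    init_peripherals();   // clocks, GPIO, timers, UART, ...

    while (1)             // super loop: main() never returns
    {
        Task_A();
        Task_B();
        Task_C();
    }
}
```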

```md
Time →
| A | B | C | A | B | C | A | ...
```

Everything runs sequentially.

Bare-metal fits when:

  • Hardware is simple
  • Timing is strict
  • Memory is tiny
  • Power budget is ultra-low
  • Cost must be as low as possible
  • No file system is needed
  • No OTA (over-the-air) update is needed

If your system does:

  • Read sensor
  • Apply control law
  • Drive output

You do NOT need a scheduler.
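Here is a minimal sketch of the "apply control law" step. It is a plain C function that a super loop could call once per pass. It assumes a hypothetical proportional controller (`control_step`, gain `kp`) and a hypothetical actuator range of 0 to `PWM_MAX`. On real hardware, the measurement would come from `read_sensors()` and the result would be written to a PWM register.

```c
#include <stdint.h>

#define PWM_MAX 1000   /* hypothetical actuator range: 0..PWM_MAX */

/* One "apply control law" step of a super loop (hypothetical P controller).
 * measurement would come from read_sensors(); the result drives the output. */
int32_t control_step(int32_t setpoint, int32_t measurement, int32_t kp)
{
    int32_t error  = setpoint - measurement;   /* e = r - y  */
    int32_t output = kp * error;               /* u = Kp * e */

    if (output < 0)                            /* clamp to the   */
        output = 0;                            /* actuator range */
    if (output > PWM_MAX)
        output = PWM_MAX;
    return output;
}
```

Inside `while(1)`, a call such as `pwm = control_step(setpoint, adc_value, kp);` is the whole control system. Here `adc_value` is a hypothetical sensor reading. Nothing needs to be scheduled.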

When Bare-Metal Becomes Dangerous

Problems appear when:

Blocking Code

```cpp
read_uart();  // blocks here until a byte arrives
```

While waiting:

  • Other tasks starve
  • Timing jitter increases

The super loop looks like rotation, but here is the key:

There is NO time control.

Each task runs until it finishes.

If Task B takes longer:

```md
| A | BBBBBBBBB | C | A | ...
```
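A small host-side simulation shows why this matters. It makes three assumptions:

- Each task costs 1 ms (hypothetical).
- Task B takes 9 ms on one pass.
- A simulated millisecond clock replaces the hardware timer.

The function measures the time between consecutive starts of Task A.

```c
#include <stddef.h>

/* Simulated millisecond clock; on hardware this would be a timer counter. */
static unsigned now_ms;

/* Hypothetical task costs: 1 ms each, except a slow pass of Task B (9 ms). */
static void task_a(void)        { now_ms += 1; }
static void task_b(int is_slow) { now_ms += is_slow ? 9u : 1u; }
static void task_c(void)        { now_ms += 1; }

/* Runs `passes` iterations of the super loop, with Task B slow on pass
 * `slow_pass`, and stores the time between consecutive starts of Task A
 * in periods[] (passes - 1 entries). Returns the number of entries. */
size_t measure_task_a_period(int passes, int slow_pass, unsigned periods[])
{
    size_t count = 0;
    unsigned last_start = 0;

    now_ms = 0;
    for (int pass = 0; pass < passes; pass++) {
        unsigned start = now_ms;
        if (pass > 0)
            periods[count++] = start - last_start;
        last_start = start;

        task_a();
        task_b(pass == slow_pass);   /* nothing can cut a slow Task B short */
        task_c();
    }
    return count;
}
```

With `measure_task_a_period(4, 2, periods)`, Task A's period is 3 ms, 3 ms, then 11 ms. One slow pass of Task B stretches every other task, and the loop has no way to stop it.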

State Explosion

When you start writing:

```cpp
if (flag1)
    if (flag2)
        if (flag3)
            if (timeout_expired)
                handle_event();   // hypothetical handler: which combination was meant?
```

The system becomes:

```md
                +-------------------+
                |    Global Flags   |
                +-------------------+
                        |
        ---------------------------------
        |        |         |           |
     Sensor   UART      Timer      Display
```

Everything depends on shared globals.

No isolation. No priority.

The usual next step is to add interrupts.

Interrupt Extension

To improve responsiveness, interrupts are used:

```md
                CPU
                 |
        -------------------
        |                 |
    Main Loop        Interrupt
                        |
                        v
                    ISR()
```

Example:

External Interrupt

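```cpp
volatile uint8_t button_pressed = 0;  // volatile: written by the ISR, read by main()

void EXTI_IRQHandler(void)
{
    // On STM32, clear the EXTI pending bit here, otherwise the interrupt fires again
    button_pressed = 1;   // only record the event; the main loop does the work
}
```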
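The main-loop side of this pattern could look like the sketch below. The ISR stays trivial. The sketch assumes a hypothetical `poll_button()` called once per super-loop pass. On a host, calling `EXTI_IRQHandler()` directly stands in for the hardware raising the interrupt.

```c
#include <stdbool.h>
#include <stdint.h>

/* Shared between ISR and main loop; volatile forces a fresh read each time. */
static volatile uint8_t button_pressed = 0;

/* On hardware the interrupt controller calls this; here it is called directly. */
void EXTI_IRQHandler(void)
{
    button_pressed = 1;              /* record the event only */
}

/* One pass of the main loop: handle a pending press, if any.
 * Returns true if a press was handled. Presses that arrive before the flag
 * is cleared are merged into one, which is usually fine for a button. */
bool poll_button(void)
{
    if (button_pressed) {
        button_pressed = 0;          /* acknowledge the event */
        /* the actual work (debounce, toggle output, ...) goes here */
        return true;
    }
    return false;
}
```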

Characteristics

  • Deterministic (if well designed)
  • Extremely small footprint (few KB)
  • No context switching overhead
  • Full hardware control

Limitations

  • Hard to scale
  • Blocking functions break timing
  • No isolation
  • No memory protection
  • Complex state management

Real Examples

  • Microchip Technology PIC firmware
  • STMicroelectronics STM32 simple control loops
  • Small motor drivers
  • Sensor nodes
  • Low-cost IoT devices

Interrupts solve one specific problem:

React immediately to an external event.

```md
Time →
| Read | Control | ----INT---- | Display |
```

Latency becomes microseconds instead of milliseconds.

Great.

But now a new problem appears.

Too Many Interrupts

Imagine:

  • UART interrupt
  • Timer interrupt
  • ADC interrupt
  • GPIO interrupt
  • Communication interrupt

System becomes:

```md
                +------------------+
                |      CPU         |
                +--------+---------+
                         |
      -----------------------------------------
      |         |         |        |          |
    Timer     UART       ADC      GPIO     Main Loop
```

If ISRs become heavy:

  • Nested interrupts
  • Stack overflow risk
  • Hard to debug timing
  • Priority inversion
  • Unpredictable jitter

So best practice says:

Keep ISR short. Do the work elsewhere.

So what do we do?

We move work out of ISR into scheduled tasks.
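A common way to move the work out is a single-producer, single-consumer ring buffer. The ISR only stores the received byte, and the main loop (or later a task) processes it. This sketch makes three assumptions:

- The names `uart_rx_isr_push` and `uart_rx_pop` are hypothetical.
- The buffer holds 16 bytes.
- The MCU is single-core, so aligned 32-bit loads and stores are atomic. Multicore targets would need proper atomics.

```c
#include <stdbool.h>
#include <stdint.h>

/* Size must be a power of two so indices can wrap with a mask.
 * One slot stays empty to tell "full" from "empty" (capacity 15). */
#define RX_BUF_SIZE 16u

static volatile uint8_t  rx_buf[RX_BUF_SIZE];
static volatile uint32_t rx_head;   /* written only by the ISR      */
static volatile uint32_t rx_tail;   /* written only by the consumer */

/* Called from the (hypothetical) UART RX interrupt: store and return fast. */
bool uart_rx_isr_push(uint8_t byte)
{
    uint32_t next = (rx_head + 1u) & (RX_BUF_SIZE - 1u);
    if (next == rx_tail)
        return false;               /* buffer full: drop the byte */
    rx_buf[rx_head] = byte;
    rx_head = next;
    return true;
}

/* Called from the main loop or a task: the heavy processing happens here. */
bool uart_rx_pop(uint8_t *out)
{
    if (rx_tail == rx_head)
        return false;               /* nothing pending */
    *out = rx_buf[rx_tail];
    rx_tail = (rx_tail + 1u) & (RX_BUF_SIZE - 1u);
    return true;
}
```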

But in bare-metal, there is no scheduler.

Now we need a mechanism to fairly execute multiple tasks.

This leads naturally to Round Robin scheduling.


Round Robin Scheduling

With interrupts setting flags, the main loop looks like:

```cpp
while (1)
{
    if (taskA_flag) run_taskA();
    if (taskB_flag) run_taskB();
    if (taskC_flag) run_taskC();
}
```

But this still has ordering bias.

If Task A is heavy, B and C starve.

So engineers ask:

Can we give each task equal CPU time?

That is Round Robin.

All tasks share CPU equally.

```md
Time Slice = 5 ms
```

Execution:

```md
| Task A | Task B | Task C | Task A | Task B | Task C |
```

Each task runs for a fixed quantum (the time slice).
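The resulting order can be simulated on a host. This sketch assumes hypothetical CPU demands per task (in ms) and reuses the 5 ms quantum. It models only the order of time slices, not the timer interrupt or the context switch.

```c
#include <stddef.h>

/* Round-robin order simulation. burst[i] is the remaining CPU time task i
 * needs (modified in place). Each turn, a task runs for at most `quantum`.
 * Writes one letter per slice ('A' + task index) into order, which must
 * have room for every slice plus a terminating '\0'. Returns slice count. */
size_t round_robin(unsigned burst[], size_t n, unsigned quantum, char *order)
{
    size_t slices = 0;
    int work_left = 1;

    while (work_left) {
        work_left = 0;
        for (size_t i = 0; i < n; i++) {
            if (burst[i] == 0)
                continue;                          /* task already finished */
            unsigned run = burst[i] < quantum ? burst[i] : quantum;
            burst[i] -= run;                       /* task runs one slice   */
            order[slices++] = (char)('A' + i);
            if (burst[i] > 0)
                work_left = 1;                     /* needs another turn    */
        }
    }
    order[slices] = '\0';
    return slices;
}
```

Consider demands of 10, 5 and 12 ms with a 5 ms quantum. The slices run in the order `ABCACC`, so no task holds the CPU for more than one quantum at a time.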

Round Robin is usually driven by a timer interrupt.

Flow:

```md
Timer Interrupt (every 5 ms)
        |
        v
 Scheduler selects
 next task
        |
        v
 Context Switch
```

Timeline:

```md
Time →
| A | B | C | A | B | C |
     ^     ^     ^
     |     |     |
   Timer  Timer  Timer
```

Detailed view:

```md
+---------------------+
| Timer ISR           |
|---------------------|
| Save current stack  |
| Select next task    |
| Load next stack     |
| Return from ISR     |
+---------------------+
```
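The "select next task" part of that ISR is small. This sketch makes two assumptions:

- A hypothetical task control block holds only a saved stack pointer.
- Port-specific assembly has already pushed the outgoing task's registers and will pop the incoming task's.

On Cortex-M, ports such as FreeRTOS typically perform the switch in the PendSV handler rather than directly in the timer ISR.

```c
#include <stdint.h>

#define NUM_TASKS 3

/* Hypothetical task control block: only the saved stack pointer. */
typedef struct {
    uint32_t *saved_sp;
} tcb_t;

static tcb_t tasks[NUM_TASKS];
static int current;

/* Record each task's initial stack pointer; task 0 runs first. */
void scheduler_init(uint32_t *initial_sp[NUM_TASKS])
{
    for (int i = 0; i < NUM_TASKS; i++)
        tasks[i].saved_sp = initial_sp[i];
    current = 0;
}

/* Called on each timer tick with the running task's stack pointer
 * (after its registers were pushed). Returns the stack pointer to load. */
uint32_t *scheduler_tick(uint32_t *running_sp)
{
    tasks[current].saved_sp = running_sp;      /* save current stack */
    current = (current + 1) % NUM_TASKS;       /* select next task   */
    return tasks[current].saved_sp;            /* load next stack    */
}
```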

This is the foundation of an RTOS.

Limitations of Pure Round Robin

  • What if Task A controls motor safety?
  • What if Task C only updates an LED?

Both get equal CPU time. That is wrong: the safety task must run first.

This leads to priority-based scheduling, which moves us from round robin to a preemptive priority RTOS.

Priority-Based Scheduling

Instead of an equal share:

```md
High Priority Task → runs immediately
Low Priority Task  → waits
```

Execution:

```md
| Low | Low | HIGH | HIGH | Low |
```
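One way an RTOS can quickly find the highest-priority ready task is a ready bitmap. This sketch makes three assumptions:

- Each priority level has one task, with at most 32 levels.
- A larger number means higher priority. This is true in FreeRTOS; some other kernels, such as Zephyr and ThreadX, use the opposite convention.
- The function names are hypothetical.

```c
#include <stdint.h>

/* Bit p of ready_mask set means the task at priority p is ready.
 * prio must be < 32. */
static uint32_t ready_mask;

void make_ready(unsigned prio)   { ready_mask |=  (UINT32_C(1) << prio); }
void make_blocked(unsigned prio) { ready_mask &= ~(UINT32_C(1) << prio); }

/* Returns the highest ready priority, or -1 if nothing is ready (idle).
 * Some ports use a count-leading-zeros instruction instead of this loop. */
int highest_ready(void)
{
    for (int p = 31; p >= 0; p--)
        if (ready_mask & (UINT32_C(1) << p))
            return p;
    return -1;
}
```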

Now the system becomes:

  • Deterministic
  • Structured
  • Real-time capable

This is full RTOS territory.


Real-Time Operating System (RTOS)

An RTOS provides:

  • Task scheduler
  • Context switching
  • Inter-task communication
  • Timers
  • Synchronization primitives
  • Deterministic scheduling

Popular examples:

  • FreeRTOS
  • Zephyr
  • ThreadX

```md
+-------------------+
|     Application   |
|-------------------|
|  Task A (High)    |
|  Task B (Medium)  |
|  Task C (Low)     |
+---------+---------+
          |
          v
+---------------------+
|      RTOS Kernel    |
|---------------------|
| Scheduler           |
| Context Switch      |
| Queues / Semaphores |
| IPC                 |
| Timers              |
+---------+-----------+
          |
          v
+-------------------+
|     Hardware      |
+-------------------+
```

The scheduler decides:

Who runs now?

Different strategies exist.

Example in FreeRTOS:

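```cpp
// Task1 has the signature void Task1(void *pvParameters) and never returns
xTaskCreate(Task1, "Task1", 1024, NULL, 2, NULL); // 1024 words of stack, priority 2
vTaskStartScheduler();                            // normally never returns
```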

Scheduling strategies:

  • Cooperative Scheduling
  • Preemptive Scheduling
  • Priority-based
  • Round-robin

Cooperative Scheduling

Tasks yield manually:

```cpp
while (1)
{
    work();
    yield();
}
```

Used when:

  • Simple systems
  • Predictable task behavior
  • Lower overhead desired
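A minimal way to get cooperative scheduling without an RTOS is a run-to-completion task table. Each task does one short step and returns, and that return is its "yield". The tasks here are hypothetical and only increment counters. This is not how an RTOS implements `yield()`, which saves the task's context. It does show the same contract: nothing else runs until the current task gives up the CPU.

```c
#include <stddef.h>

typedef void (*task_fn)(void);

static unsigned counter_a, counter_b;

/* Hypothetical task steps: each does a little work and returns (yields). */
static void task_a_step(void) { counter_a++; }   /* e.g. poll a sensor     */
static void task_b_step(void) { counter_b++; }   /* e.g. refresh a display */

static task_fn task_table[] = { task_a_step, task_b_step };

/* Runs `rounds` passes over the task table. A step that runs long delays
 * every other task, because nothing can preempt it. */
void cooperative_run(unsigned rounds)
{
    for (unsigned r = 0; r < rounds; r++)
        for (size_t i = 0; i < sizeof task_table / sizeof task_table[0]; i++)
            task_table[i]();
}
```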

Preemptive Scheduling

A higher-priority task preempts (interrupts) a lower-priority one as soon as it becomes ready.

```md
Task C running (low)
        ↓
Task A becomes ready (high)
        ↓
Immediate preemption
```

Timeline:

```md
| C | C | A | A | C | ...
```
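The timeline above can be reproduced with a tick-by-tick simulation. It assumes two hypothetical tasks:

- Task C has low priority and is always ready.
- Task A has high priority and becomes ready at a given tick.

The model makes a scheduling decision on every tick. A real kernel also reschedules immediately when an event readies a task.

```c
#include <stddef.h>

/* Writes one character per tick into timeline ('A' or 'C'); the caller
 * provides ticks + 1 bytes. Task A becomes ready at tick a_ready_at and
 * needs a_work ticks of CPU. */
void simulate_preemption(size_t ticks, size_t a_ready_at, size_t a_work,
                         char *timeline)
{
    size_t a_remaining = 0;

    for (size_t t = 0; t < ticks; t++) {
        if (t == a_ready_at)
            a_remaining = a_work;        /* event makes Task A ready */

        /* Scheduler: always run the highest-priority ready task */
        if (a_remaining > 0) {
            timeline[t] = 'A';           /* A preempts C */
            a_remaining--;
        } else {
            timeline[t] = 'C';           /* C resumes when A is done */
        }
    }
    timeline[ticks] = '\0';
}
```

`simulate_preemption(5, 2, 2, buf)` produces `CCAAC`, which matches the timeline.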

Used when:

  • Hard real-time tasks exist
  • Deadlines must be guaranteed

When an RTOS Is Necessary

  • Three or more concurrent activities
  • Task prioritization is needed
  • A communication stack is present
  • A BLE / TCP/IP stack is used
  • Deterministic latency is required
  • Firmware update logic exists

Typical memory:

  • RAM: 64 KB – 512 KB
  • Flash: 256 KB – 2 MB
  • Medium complexity systems

Real Examples

  • ESP32 IoT firmware
  • Automotive ECUs
  • Medical devices
  • Industrial controllers

When an RTOS Starts to Fail

When true process isolation is needed. In a typical RTOS, all tasks share the same memory (an MPU, where available, adds only limited protection):

```md
+-----------------------------+
| Task A | Task B | Task C    |
| Shared Heap / Globals       |
+-----------------------------+
```

A buffer overflow in Task C can corrupt Task A.
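The effect can be modelled safely on a host by treating one array as the shared RAM. The layout (8 bytes for Task C's buffer followed by Task A's state) and the function name are hypothetical. Without an MMU or a configured MPU, nothing stops a write at the buffer boundary.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of one shared RAM region with no memory protection:
 * bytes 0..7 are Task C's buffer, bytes 8..11 hold Task A's state. */
static uint8_t shared_ram[12];
#define TASK_C_BUF      (&shared_ram[0])
#define TASK_C_BUF_LEN  8
#define TASK_A_STATE    (&shared_ram[8])

/* Task C copies a message without checking TASK_C_BUF_LEN (the bug).
 * The copy stays inside shared_ram, but spills into Task A's state. */
void task_c_store(const uint8_t *msg, size_t len)
{
    for (size_t i = 0; i < len && i < sizeof shared_ram; i++)
        TASK_C_BUF[i] = msg[i];      /* should stop at TASK_C_BUF_LEN */
}
```

Give `task_c_store()` a 10-byte message and the first two bytes of Task A's state end up holding Task C's data.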

When a complex network or application stack is needed:

  • TLS
  • HTTPS
  • File system
  • SQLite
  • GUI framework
  • Video streaming

RTOS becomes difficult to maintain.

Then we move to a full OS.


Generic OS

When system complexity becomes high, we move to a full OS like:

  • Embedded Linux
  • Android

Embedded Linux

A customized Linux distribution running on embedded hardware.

Linux exists in embedded systems because:

  • Systems became mini computers
  • Connectivity is default
  • Security regulations increased
  • OTA is mandatory
  • UI expectations increased

Includes:

  • Kernel
  • Drivers
  • User space
  • File system
  • Network stack
  • Process management
  • Memory protection (MMU)

```md
+----------------------------------+
|           User Space             |
|----------------------------------|
| App1 | App2 | App3 | App4        |
+------------------+---------------+
                   |
                   v
+----------------------------------+
|          Linux Kernel            |
|----------------------------------|
| Scheduler (CFS)                  |
| Virtual Memory Manager           |
| TCP/IP Stack                     |
| Drivers                          |
+------------------+---------------+
                   |
                   v
                Hardware
```

It provides:

  • Process isolation
  • Virtual memory
  • User/Kernel privilege separation
  • Advanced scheduler
  • Secure boot chains
  • Package management
  • Containers

Scheduler in Linux

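Linux has long scheduled normal tasks with CFS (Completely Fair Scheduler). Since kernel 6.6, CFS has been succeeded by EEVDF, which keeps the same virtual-runtime idea.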

Conceptually similar to advanced round-robin but weighted:

```md
Each process gets fair share of CPU
Based on virtual runtime
```

Unlike an RTOS:

  • Not deterministic
  • Optimized for fairness, not strict deadlines
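The virtual-runtime idea fits in a few lines. This is a simplified model, not kernel code. It works as follows:

- Each hypothetical task has a weight; 1024 is the kernel's weight for nice 0.
- Each slice is charged to the running task as `slice * 1024 / weight`.
- The scheduler always picks the task with the smallest virtual runtime.

```c
#include <stddef.h>

#define NICE_0_WEIGHT 1024u

typedef struct {
    unsigned weight;          /* higher weight = larger CPU share */
    unsigned long vruntime;   /* weighted virtual runtime         */
    unsigned ticks_run;       /* slices this task actually got    */
} task_t;

/* Simulates `slices` scheduling decisions of `slice_ns` nanoseconds each. */
void simulate_cfs(task_t *tasks, size_t n, unsigned slices, unsigned slice_ns)
{
    for (unsigned s = 0; s < slices; s++) {
        size_t next = 0;
        for (size_t i = 1; i < n; i++)       /* pick the smallest vruntime */
            if (tasks[i].vruntime < tasks[next].vruntime)
                next = i;

        tasks[next].ticks_run++;
        tasks[next].vruntime += (unsigned long)slice_ns * NICE_0_WEIGHT
                                / tasks[next].weight;
    }
}
```

Over 30 slices, a task with weight 2048 runs 20 times and a task with weight 1024 runs 10 times. Each gets a share proportional to its weight, with no deadline guarantee.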

Hardware Requirement

  • CPU with MMU
  • Large RAM (MBs to GBs)
  • Flash storage
  • Often Cortex-A class processors

Example boards:

  • Raspberry Pi
  • BeagleBone Black

Summary

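| Feature | Bare-Metal | RTOS | Embedded Linux |
|---|---|---|---|
| Memory | KB | 10s-100s KB | MB-GB |
| Scheduler | No | Yes | Yes |
| MMU | No | Usually no | Yes |
| Determinism | High | Very high | Medium |
| Security isolation | None | Limited | Strong |
| Boot time | Very fast | Fast | Slower |
| Complexity | Low | Medium | High |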

Design Decision Flow

```md
Is system simple?
        |
        +-- YES → Bare-Metal
        |
        +-- NO →
              |
              Is strict real-time needed?
                    |
                    +-- YES → RTOS
                    |
                    +-- NO →
                          |
                          Need networking / GUI / complex stack?
                                  |
                                  +-- YES → Linux
                                  |
                                  +-- NO → RTOS
```
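The same decision flow can be written as a function, which makes the order of the questions explicit: strict real-time is checked before networking or GUI needs. The names are hypothetical. The inputs are the yes/no answers from the chart.

```c
#include <stdbool.h>

typedef enum { ARCH_BARE_METAL, ARCH_RTOS, ARCH_LINUX } arch_t;

/* Direct encoding of the decision flow chart. */
arch_t choose_architecture(bool is_simple, bool strict_real_time,
                           bool needs_complex_stack)
{
    if (is_simple)
        return ARCH_BARE_METAL;
    if (strict_real_time)
        return ARCH_RTOS;
    if (needs_complex_stack)        /* networking / GUI / complex stack */
        return ARCH_LINUX;
    return ARCH_RTOS;
}
```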

Each model rotates between tasks in a different way:

  • Super Loop: rotation by function return
  • Interrupt Round Robin: rotation by timer force
  • Priority RTOS: rotation by urgency + timer
  • Linux: rotation by fairness + virtual runtime