State Machines That Survive Noise

A lot of embedded bugs are not algorithm failures. They are state-management failures under imperfect signals. Inputs bounce, clocks drift, interrupts cluster, and peripherals report transitional nonsense. Firmware that assumes clean edges and ideal timing eventually fails in the field where noise is normal.

Robust systems treat noise as a design input, not a test surprise.

Why finite state machines still win

State machines are sometimes dismissed as “old-school” in modern embedded stacks. That is a mistake. They remain one of the best tools for making behavior explicit under uncertainty:

  • legal transitions are visible
  • invalid transitions can be handled deliberately
  • timeout behavior is encoded, not implied
  • recovery paths are first-class

Most importantly, state machines force you to name ambiguous phases that ad-hoc boolean logic usually hides.

A practical pattern: event queue + transition table

A resilient architecture separates interrupt capture from policy:

  1. ISR captures minimal event.
  2. Main loop dequeues event.
  3. Transition function updates state.
  4. Output actions run from resulting state.
typedef enum { ST_IDLE, ST_ARMED, ST_ACTIVE, ST_FAULT } state_t;
typedef enum { EV_EDGE, EV_TIMEOUT, EV_CRC_FAIL, EV_RESET } event_t;

/* Pure transition function: maps (state, event) to the next state.
   Events a state does not handle leave it unchanged. */
state_t step(state_t s, event_t e) {
  switch (s) {
    case ST_IDLE:   return (e == EV_EDGE)    ? ST_ARMED  : ST_IDLE;
    case ST_ARMED:  return (e == EV_TIMEOUT) ? ST_ACTIVE : (e == EV_CRC_FAIL ? ST_FAULT : ST_ARMED);
    case ST_ACTIVE: return (e == EV_RESET)   ? ST_IDLE   : ST_ACTIVE;
    case ST_FAULT:  return (e == EV_RESET)   ? ST_IDLE   : ST_FAULT;
  }
  return ST_FAULT;  /* corrupted or out-of-range state value */
}

This is intentionally simple. Complexity belongs in explicit transitions, not in hidden timing side effects.
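
The heading promises a transition table, while the snippet above uses a switch. As a minimal sketch of the same transitions in table form, assuming the enums gain a count sentinel (ST_COUNT and EV_COUNT are hypothetical additions, as is the name step_table), it might look like this:

```c
/* Same states and events as above. ST_COUNT and EV_COUNT are added here
   (an assumption) so the table can be sized at compile time. */
typedef enum { ST_IDLE, ST_ARMED, ST_ACTIVE, ST_FAULT, ST_COUNT } state_t;
typedef enum { EV_EDGE, EV_TIMEOUT, EV_CRC_FAIL, EV_RESET, EV_COUNT } event_t;

/* Rows are the current state, columns are the event. Every cell is
   written out, so "stay put" is a visible decision, not a default. */
static const state_t transition_table[ST_COUNT][EV_COUNT] = {
  /*               EV_EDGE    EV_TIMEOUT  EV_CRC_FAIL  EV_RESET  */
  [ST_IDLE]   = { ST_ARMED,  ST_IDLE,    ST_IDLE,     ST_IDLE   },
  [ST_ARMED]  = { ST_ARMED,  ST_ACTIVE,  ST_FAULT,    ST_ARMED  },
  [ST_ACTIVE] = { ST_ACTIVE, ST_ACTIVE,  ST_ACTIVE,   ST_IDLE   },
  [ST_FAULT]  = { ST_FAULT,  ST_FAULT,   ST_FAULT,    ST_IDLE   },
};

/* Table lookup with a range check; out-of-range input lands in ST_FAULT,
   matching the fall-through return of the switch version. */
state_t step_table(state_t s, event_t e) {
  if ((unsigned)s >= (unsigned)ST_COUNT || (unsigned)e >= (unsigned)EV_COUNT) {
    return ST_FAULT;
  }
  return transition_table[s][e];
}
```

The table form makes review easier because a behavior change shows up as a one-cell diff. The switch form is easier to extend with per-transition actions. Either is reasonable; the point is that every (state, event) pair has a written answer.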

Debounce is a state problem, not just delay

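Naive debounce logic (delay, then read) often passes bench tests and then fails under variable load, because a single delayed sample says nothing about whether the input stayed stable in between. A better approach: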

  • maintain input state
  • require stable duration threshold
  • transition only when threshold satisfied

This aligns with “Debouncing with Time and State” and extends it into full system behavior.
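
The three bullets above can be sketched as a small per-input state record. This is an illustration under assumptions, not a reference implementation: debounce_t, debounce_update, and the millisecond tick passed in as now_ms are all hypothetical names, and the caller is assumed to poll at a rate well above the threshold.

```c
#include <stdbool.h>
#include <stdint.h>

/* Per-input debounce state (hypothetical layout). */
typedef struct {
  bool     stable;              /* last accepted (debounced) level */
  bool     candidate;           /* most recent raw level seen */
  uint32_t candidate_since_ms;  /* when the candidate level first appeared */
} debounce_t;

/* Feed one raw sample. Returns true only when the debounced level changes,
   which happens after the raw level has held for threshold_ms. */
bool debounce_update(debounce_t *d, bool raw, uint32_t now_ms,
                     uint32_t threshold_ms) {
  if (raw != d->candidate) {
    /* Level moved: restart the stability timer. */
    d->candidate = raw;
    d->candidate_since_ms = now_ms;
    return false;
  }
  /* Unsigned subtraction keeps the comparison valid across tick wraparound. */
  if (raw != d->stable &&
      (uint32_t)(now_ms - d->candidate_since_ms) >= threshold_ms) {
    d->stable = raw;
    return true;
  }
  return false;
}
```

A bounce in the middle of the window resets candidate_since_ms, so the accepted level only changes after an uninterrupted stable period, regardless of how busy the main loop was between samples.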

Timeouts are architectural, not patchwork

Every state that waits on external behavior should define timeout semantics:

  • what timeout means
  • whether retry is allowed
  • max retry budget
  • fallback state

Undefined timeout behavior is one of the most expensive firmware ambiguities in production debugging.
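
One way to make those four answers explicit is a per-state policy table. The sketch below uses hypothetical waiting phases (PH_WAIT_ACK, PH_WAIT_READY) and made-up durations and retry budgets purely for illustration; none of these names or numbers come from a real system.

```c
#include <stdint.h>

/* Hypothetical phases for illustration only. */
typedef enum { PH_IDLE, PH_WAIT_ACK, PH_WAIT_READY, PH_FAULT, PH_COUNT } phase_t;

typedef struct {
  uint32_t timeout_ms;   /* 0 means this phase does not wait on anything */
  uint8_t  max_retries;  /* retry budget before giving up */
  phase_t  fallback;     /* where to go once the budget is spent */
} timeout_policy_t;

/* Illustrative values; real ones depend on the peripheral. */
static const timeout_policy_t timeout_policy[PH_COUNT] = {
  [PH_IDLE]       = { 0,    0, PH_IDLE  },
  [PH_WAIT_ACK]   = { 250,  3, PH_FAULT },
  [PH_WAIT_READY] = { 1000, 0, PH_FAULT },
  [PH_FAULT]      = { 0,    0, PH_FAULT },
};

/* Called when phase p times out. Returns the next phase and updates the
   caller's retry counter: retry in place while budget remains, otherwise
   reset the counter and move to the fallback. */
phase_t on_timeout(phase_t p, uint8_t *retries_used) {
  if ((unsigned)p >= (unsigned)PH_COUNT) {
    return PH_FAULT;
  }
  const timeout_policy_t *pol = &timeout_policy[p];
  if (pol->timeout_ms == 0) {
    return p;  /* nothing to time out in this phase */
  }
  if (*retries_used < pol->max_retries) {
    (*retries_used)++;
    return p;
  }
  *retries_used = 0;
  return pol->fallback;
}
```

With the policy in a table, "what happens if the acknowledgment never arrives" has one place to look, and a change to a retry budget is a one-line diff.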

Top-aligned diagnostics in firmware logs

When logging transitions, keep entries normalized:

ts | old_state | event | new_state | action | error_code

This format turns logs into analyzable traces instead of prose fragments. You can then diff expected transition sequences against observed ones in automated tests.
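
A minimal sketch of a formatter for that line layout, assuming state and event names are available as strings and the error code fits in a byte (format_transition and its parameters are hypothetical):

```c
#include <stdint.h>
#include <stdio.h>

/* Writes one normalized trace line into buf:
     ts | old_state | event | new_state | action | error_code
   Returns what snprintf returns: the length the full line needs,
   which is >= n when the output was truncated. */
int format_transition(char *buf, size_t n, uint32_t ts,
                      const char *old_state, const char *event,
                      const char *new_state, const char *action,
                      unsigned error_code) {
  return snprintf(buf, n, "%lu | %s | %s | %s | %s | 0x%02X",
                  (unsigned long)ts, old_state, event, new_state,
                  action, error_code);
}
```

Calling it with ts 1042, "ST_ARMED", "EV_TIMEOUT", "ST_ACTIVE", "start_motor", and error code 0 produces:

1042 | ST_ARMED | EV_TIMEOUT | ST_ACTIVE | start_motor | 0x00

Because every field is always present and in the same position, a test harness can split on " | " and compare columns directly.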

Guarding against interrupt storms

Interrupt storms can starve policy logic if ISR work is too heavy. Keep ISR minimal:

  • capture timestamp
  • capture source id
  • queue event
  • exit

Any parsing, retry decisions, or multi-step logic belongs in cooperative main-loop context where execution order is controlled.
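
As a sketch of the capture side, here is a small single-producer, single-consumer ring buffer. It assumes a single core where the ISR is the only writer of the head index and the main loop is the only writer of the tail index; multi-core parts or weakly ordered memory would need barriers or atomics on top. All names (evq_push, evq_pop, EVQ_SIZE) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define EVQ_SIZE 16u  /* power of two so the index mask works */

typedef struct {
  uint32_t ts;      /* timestamp captured in the ISR */
  uint8_t  source;  /* which line or peripheral fired */
} evq_entry_t;

static volatile evq_entry_t evq_buf[EVQ_SIZE];
static volatile uint8_t  evq_head;     /* written only by the ISR */
static volatile uint8_t  evq_tail;     /* written only by the main loop */
static volatile uint32_t evq_dropped;  /* overflow counter for diagnostics */

/* ISR side: capture, queue, exit. Never blocks; counts drops when full.
   One slot stays empty to tell "full" from "empty", so the usable
   capacity is EVQ_SIZE - 1. */
bool evq_push(uint32_t ts, uint8_t source) {
  uint8_t head = evq_head;
  uint8_t next = (uint8_t)((head + 1u) & (EVQ_SIZE - 1u));
  if (next == evq_tail) {
    evq_dropped++;
    return false;
  }
  evq_buf[head].ts = ts;
  evq_buf[head].source = source;
  evq_head = next;  /* publish only after the entry is fully written */
  return true;
}

/* Main-loop side: returns false when the queue is empty. */
bool evq_pop(evq_entry_t *out) {
  uint8_t tail = evq_tail;
  if (tail == evq_head) {
    return false;
  }
  out->ts = evq_buf[tail].ts;
  out->source = evq_buf[tail].source;
  evq_tail = (uint8_t)((tail + 1u) & (EVQ_SIZE - 1u));
  return true;
}
```

During a storm the queue fills and evq_dropped climbs, but the ISR cost per event stays constant and the main loop still gets to run. The drop counter turns a silent overload into something a log line can report.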

Noise-aware testing strategy

A strong test suite includes adversarial input timing:

  1. burst edges near threshold boundaries
  2. delayed acknowledgments
  3. missing edges
  4. duplicate events
  5. out-of-order event injections

If your machine cannot survive these, it is not ready for hardware reality.
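
Because the transition function shown earlier is pure, several of these cases can be replayed on a host machine with no hardware. The sketch below repeats that step() function so it stands alone and adds a hypothetical run_sequence helper plus a few scripted sequences; timing-dependent cases such as bursts near a threshold need a simulated clock and are not covered here.

```c
#include <stddef.h>

typedef enum { ST_IDLE, ST_ARMED, ST_ACTIVE, ST_FAULT } state_t;
typedef enum { EV_EDGE, EV_TIMEOUT, EV_CRC_FAIL, EV_RESET } event_t;

/* Transition function from earlier in the article, repeated for
   self-containment. */
state_t step(state_t s, event_t e) {
  switch (s) {
    case ST_IDLE:   return (e == EV_EDGE)    ? ST_ARMED  : ST_IDLE;
    case ST_ARMED:  return (e == EV_TIMEOUT) ? ST_ACTIVE : (e == EV_CRC_FAIL ? ST_FAULT : ST_ARMED);
    case ST_ACTIVE: return (e == EV_RESET)   ? ST_IDLE   : ST_ACTIVE;
    case ST_FAULT:  return (e == EV_RESET)   ? ST_IDLE   : ST_FAULT;
  }
  return ST_FAULT;
}

/* Replays a scripted event sequence and returns the final state. */
state_t run_sequence(state_t start, const event_t *events, size_t n) {
  state_t s = start;
  for (size_t i = 0; i < n; i++) {
    s = step(s, events[i]);
  }
  return s;
}

/* Adversarial scripts (hypothetical examples). */
const event_t seq_duplicate_edges[] = { EV_EDGE, EV_EDGE, EV_EDGE };
const event_t seq_out_of_order[]    = { EV_TIMEOUT, EV_EDGE };
const event_t seq_missing_edge[]    = { EV_TIMEOUT };
const event_t seq_fault_recovery[]  = { EV_EDGE, EV_CRC_FAIL, EV_TIMEOUT, EV_RESET };
```

Starting from ST_IDLE, the duplicate-edge and out-of-order scripts should both end in ST_ARMED, the missing-edge script should stay in ST_IDLE, and the fault-recovery script should pass through ST_FAULT and end in ST_IDLE. A harness asserts those end states and fails the build when a transition change breaks one of them.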

Cross references for this design style

These pieces describe the same principle at different layers: uncertainty is part of the interface contract.

Implementation details that pay off

  • Keep state enum in one header, shared by firmware and test harness.
  • Use an explicit “unexpected event” handler; never ignore events silently.
  • Version your transition table so behavior changes are reviewable.
  • Add build-time switch for transition tracing in debug builds.

This sounds procedural because reliability is procedural.
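
Two of those points, the unexpected-event handler and the build-time trace switch, can be sketched together. The macro name FSM_TRACE, the counter, and step_traced are hypothetical; the transitions are the same as in the earlier step() function, except that every unhandled event now goes through one handler instead of being ignored inline.

```c
#include <stdio.h>

typedef enum { ST_IDLE, ST_ARMED, ST_ACTIVE, ST_FAULT } state_t;
typedef enum { EV_EDGE, EV_TIMEOUT, EV_CRC_FAIL, EV_RESET } event_t;

/* Compile with -DFSM_TRACE to log every transition; otherwise the
   macro expands to nothing and costs no code space. */
#ifdef FSM_TRACE
#define FSM_TRACE_LOG(old_s, ev, new_s) \
  printf("%d | %d | %d\n", (int)(old_s), (int)(ev), (int)(new_s))
#else
#define FSM_TRACE_LOG(old_s, ev, new_s) ((void)0)
#endif

static unsigned long unexpected_events;  /* visible to tests and diagnostics */

/* Single place that decides what an unhandled event means. Here it
   counts the event and keeps the current state. */
static state_t on_unexpected(state_t s, event_t e) {
  (void)e;
  unexpected_events++;
  return s;
}

state_t step_traced(state_t s, event_t e) {
  state_t next;
  switch (s) {
    case ST_IDLE:
      next = (e == EV_EDGE) ? ST_ARMED : on_unexpected(s, e);
      break;
    case ST_ARMED:
      next = (e == EV_TIMEOUT)  ? ST_ACTIVE
           : (e == EV_CRC_FAIL) ? ST_FAULT
           : on_unexpected(s, e);
      break;
    case ST_ACTIVE:
      next = (e == EV_RESET) ? ST_IDLE : on_unexpected(s, e);
      break;
    case ST_FAULT:
      next = (e == EV_RESET) ? ST_IDLE : on_unexpected(s, e);
      break;
    default:
      next = ST_FAULT;  /* corrupted state value */
      break;
  }
  FSM_TRACE_LOG(s, e, next);
  return next;
}
```

The resulting states match the original function. What changes is that "nothing happened" becomes a counted, reviewable decision, and the policy for unexpected events can be tightened later in one function.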

Final thought

Embedded systems do not get judged by elegance under ideal inputs. They get judged by behavior under messy electrical and timing conditions. State machines that survive noise are not conservative design. They are aggressive risk management.

If you are choosing between adding one more feature and hardening transitions around existing behavior, harden first. Field failures almost always happen at transitions, not in the center of stable states.

Document each state transition in one sentence that an on-call engineer can understand at 3 AM. If the sentence is unclear, the transition is probably underspecified in code as well.

2026-02-22