Debugging Eventual Consistency With A Deterministic “Outbox Storm” Simulator

I ran into a bug that felt haunted: data looked correct most of the time, then occasionally—usually right after a deploy or a load spike—it “snapped” into the wrong state for a few minutes. No errors in logs, no failed requests, just users seeing stale data long enough to file tickets.

What finally helped wasn’t a better dashboard. It was a mental model I could simulate: how an “outbox” (an event queue) can create an “outbox storm” that temporarily breaks the assumptions of the code that reads from a downstream store.

Below is a tiny deterministic simulator I built in Python that makes this failure mode obvious. I use it like a microscope: step through the system, watch the message backlog grow, and see how read-after-write assumptions collapse.

The mental model: “Outbox storms” happen when time is a participant

Many teams implement event-driven updates with an outbox: when you change a record in your primary database, you also store an “event message” in an outbox table in the same transaction. A background worker later forwards those outbox rows to a message bus (or directly to a consumer).

A simple (but often false) assumption is:

“After I write, the read model will be updated soon enough that my next read will be correct.”

Here’s the twist: the system has time dynamics. If the worker processing the outbox lags, the consumer reads will race against the backlog. That’s an outbox storm: the backlog grows faster than it drains, and downstream reads can observe an older world.

To see that clearly, I made a deterministic simulator.

The simulator: a primary write model + a delayed read model

I model three things:

Primary store: the source of truth (what you write to).
Outbox: event messages created on each write, waiting to be processed.
Read model: a denormalized projection built from events, but with processing delay.

The system loop is simple:

Each “tick” represents a unit of time.
Writes generate outbox messages.
Each tick, the worker processes a limited number of outbox messages (capacity).
The consumer updates the read model when messages are processed.
I also simulate “reads” that happen right after writes.

When outbox capacity drops for a few ticks, stale reads appear. That’s the outbox storm.

Working code (deterministic): step through the storm

from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class OutboxEvent:
    order_id: str
    new_status: str
    # message "created at" tick for visibility
    created_tick: int


class OutboxStormSimulator:
    def __init__(
        self,
        worker_capacity_by_tick: Dict[int, int],
        tick_total: int,
    ):
        self.tick_total = tick_total

        # Primary source of truth: current status of each order
        self.primary: Dict[str, str] = {}

        # Outbox backlog: events waiting to be processed
        self.outbox: List[OutboxEvent] = []

        # Downstream read model: status as of last processed event
        self.read_model: Dict[str, str] = {}

        # Worker capacity: how many outbox events can be processed per tick
        self.worker_capacity_by_tick = worker_capacity_by_tick

        # For debugging: record mismatches between primary and read_model
        self.mismatches: List[Tuple[int, str, str, str]] = []

    def enqueue_write(self, tick: int, order_id: str, new_status: str) -> None:
        # This mimics the transactional behavior: primary write + outbox insert
        self.primary[order_id] = new_status
        self.outbox.append(OutboxEvent(order_id, new_status, tick))

    def process_outbox(self, tick: int) -> None:
        # Worker processes up to capacity; remaining events stay in the backlog
        capacity = self.worker_capacity_by_tick.get(tick, 0)
        processed = 0

        while processed < capacity and self.outbox:
            ev = self.outbox.pop(0)
            # Consumer updates the read model based on the event
            self.read_model[ev.order_id] = ev.new_status
            processed += 1

    def simulate_read_after_write(self, tick: int, order_id: str) -> None:
        # A "read" checks what the UI sees (read model) vs source of truth (primary)
        primary_status = self.primary.get(order_id)
        read_status = self.read_model.get(order_id)

        # If read model hasn't caught up yet, read_status can be None or older
        if read_status != primary_status:
            self.mismatches.append((tick, order_id, primary_status, read_status))

    def run(self, writes: List[Tuple[int, str, str]], reads: List[Tuple[int, str]]) -> None:
        # Pre-load writes/reads into per-tick buckets
        writes_by_tick: Dict[int, List[Tuple[str, str]]] = {}
        for t, order_id, status in writes:
            writes_by_tick.setdefault(t, []).append((order_id, status))

        reads_by_tick: Dict[int, List[str]] = {}
        for t, order_id in reads:
            reads_by_tick.setdefault(t, []).append(order_id)

        for tick in range(self.tick_total + 1):
            # 1) Writes happen
            for order_id, status in writes_by_tick.get(tick, []):
                self.enqueue_write(tick, order_id, status)

            # 2) Worker processes outbox (possibly limited or stalled)
            self.process_outbox(tick)

            # 3) Reads happen (UI reads from read model)
            for order_id in reads_by_tick.get(tick, []):
                self.simulate_read_after_write(tick, order_id)

            # Optional: print a small trace for specific ticks
            # (kept minimal here, but still deterministic)
            if tick in (0, 1, 2, 3, 4, 5, 6, 7, 8):
                print(
                    f"tick={tick} "
                    f"primary={self.primary.get('A')} "
                    f"read_model={self.read_model.get('A')} "
                    f"outbox_backlog={len(self.outbox)}"
                )

    def report(self) -> None:
        print("\n--- MISMATCHES (stale reads) ---")
        for tick, order_id, primary_status, read_status in self.mismatches:
            print(
                f"tick={tick} order={order_id} "
                f"primary={primary_status} read_model={read_status}"
            )
        print(f"\nTotal mismatches: {len(self.mismatches)}")


def demo_outbox_storm() -> None:
    # Capacity is high at first, then drops to near zero for a few ticks (the storm).
    # This simulates a deploy, GC pause, noisy neighbor, DB slowdown, etc.
    worker_capacity_by_tick = {
        0: 2,
        1: 2,
        2: 0,  # storm begins: worker can't keep up
        3: 1,
        4: 0,  # storm persists
        5: 3,  # recovers
        6: 3,
        7: 3,
        8: 3,
    }

    sim = OutboxStormSimulator(worker_capacity_by_tick=worker_capacity_by_tick, tick_total=8)

    # Writes: multiple status transitions for the same order A in quick succession.
    # Each write enqueues an outbox event.
    writes = [
        (0, "A", "CREATED"),
        (1, "A", "PAID"),
        (2, "A", "PACKED"),
        (3, "A", "SHIPPED"),
        (4, "A", "DELIVERED"),
    ]

    # Reads: UI tries to read right after each write tick.
    reads = [
        (0, "A"),
        (1, "A"),
        (2, "A"),
        (3, "A"),
        (4, "A"),
        # and one later check to show convergence
        (6, "A"),
    ]

    sim.run(writes=writes, reads=reads)
    sim.report()


if __name__ == "__main__":
    demo_outbox_storm()

What each block is doing (and why it matters)

enqueue_write(...) updates the primary state and appends an OutboxEvent to the outbox. This is the whole point of the outbox pattern: you don’t “fire and forget” an event outside the transaction.
process_outbox(...) consumes outbox events at a fixed per-tick capacity. This is the “worker” behavior. When capacity drops, backlog grows.
simulate_read_after_write(...) compares what the UI reads (read_model) to what the system wrote (primary). That mismatch is the concrete symptom.

Run it: watch stale reads appear during capacity collapse

When I run the demo, the trace shows something like this (exact values depend only on the deterministic script):

At tick 0 and 1, the worker keeps up, so read_model matches primary.
At tick 2, capacity is 0, so outbox events pile up.
Reads at tick 2/3/4 occur while the read model is behind.
By tick 6, the worker catches up and the system converges.

That’s the mental model in action: the system’s behavior is governed by the queue dynamics between “write” and “projection update,” not just by code paths.

A concrete takeaway: the “correctness boundary” is queue health

This simulator made a counterintuitive thing feel obvious:

Your write path can be perfectly correct.
Your projection logic can be perfectly correct.
And you can still serve wrong answers temporarily because your read model is governed by backlog.

In other words, the mental model shifts from:

“Is my code wrong?”

to:

“Is my system fast enough (end-to-end) to meet the timing assumptions of the UI?”

That’s why incident response often lands on worker throughput, consumer lag, and backlog growth rates—not just application exceptions.

Closing thoughts

I learned to treat eventual consistency like a system with time and queue dynamics, not just a messaging detail. By simulating an outbox storm deterministically, I can literally see how capacity collapse creates stale reads—even when every individual component is “working.”