Not All Race Conditions Are Threads — Race Conditions in Distributed Systems

When engineers hear “race condition”, most imagine two threads modifying the same variable.

That is the smallest version of the problem.

In distributed systems, race conditions are far more dangerous because they don’t depend on shared memory.
They depend on time, ordering and partial knowledge.

No locks.
No stack traces.
No deterministic reproduction.

And the system can be perfectly healthy from an infrastructure perspective.

This post is about the kinds of race conditions that actually appear in production backend systems.


1) The Double-Execution Race (Duplicate Processing)

This is the most common distributed race condition.

A worker processes a message.
The message broker doesn’t receive the acknowledgement in time.
The broker redelivers.

Now two workers execute the same operation.

Typical scenario:

  • order creation

  • payment capture

  • email sending

  • inventory reservation

  • coupon redemption

Nothing crashed.

The system did exactly what it was designed to do: deliver every message at least once.

But the business operation was not idempotent.

What makes this dangerous

The second execution is not a retry from the same process.
It is a concurrent logical operation.

You now have:

  • two payment captures

  • two shipments

  • two state transitions

  • inconsistent accounting

And logs look completely valid.

Typical mistaken fixes

  • increasing visibility timeout

  • reducing consumer concurrency

  • adding delays

Those reduce the probability of the race. They do not remove it.

Real fix

You need idempotency at the business boundary, not at the infrastructure layer.

Examples:

  • idempotency keys stored with unique constraints

  • operation tokens

  • deduplication tables

  • state transition guards

The system must be able to answer:

“Has this operation already been logically completed?”

Not:

“Has this message already been seen by this worker?”
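As a minimal sketch of the first option in the list above (an idempotency key stored under a unique constraint), the following uses an in-memory SQLite database. The table name `payment_captures`, the function `capture_payment`, and the key format are hypothetical and chosen only for illustration; the real side effect (calling a payment provider) is left as a comment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payment_captures ("
    " idempotency_key TEXT PRIMARY KEY,"  # the unique constraint is the guard
    " order_id TEXT NOT NULL,"
    " amount_cents INTEGER NOT NULL)"
)


def capture_payment(conn, idempotency_key, order_id, amount_cents):
    """Return True if this call recorded the capture, False if it was a duplicate."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO payment_captures VALUES (?, ?, ?)",
                (idempotency_key, order_id, amount_cents),
            )
            # The actual capture (a hypothetical payment provider call) would
            # happen here, ideally forwarding the same idempotency key.
    except sqlite3.IntegrityError:
        # Another delivery of the same logical operation already claimed the key.
        return False
    return True


# The broker delivers the same message twice.
first = capture_payment(conn, "order-42:capture", "order-42", 1999)
second = capture_payment(conn, "order-42:capture", "order-42", 1999)
rows = conn.execute("SELECT COUNT(*) FROM payment_captures").fetchone()[0]
print(first, second, rows)
```

The key identifies the business operation, not the message or the worker, so the check still holds when the redelivery lands on a different consumer.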


2) The Lost Update Race (Concurrent Writers)

Two services read the same entity state and both decide to modify it.

Timeline:

  1. Service A reads balance = 100

  2. Service B reads balance = 100

  3. A subtracts 40 → writes 60

  4. B subtracts 80 → writes 20

Final state: 20
Correct state: −20, or the second debit rejected

No conflicts detected.
Database behaved correctly.
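The timeline above can be replayed in a few lines. This is only an illustration: the dictionary `store` stands in for a database row, and the helper names are hypothetical.

```python
# A shared record standing in for a database row (hypothetical).
store = {"balance": 100}


def read_balance():
    return store["balance"]


def write_balance(value):
    store["balance"] = value


# Steps 1 and 2: both services read before either one writes.
seen_by_a = read_balance()
seen_by_b = read_balance()

# Step 3: A writes a value derived from its own read.
write_balance(seen_by_a - 40)

# Step 4: B writes a value derived from its now stale read, overwriting A.
write_balance(seen_by_b - 80)

final_balance = store["balance"]
total_debited = 40 + 80
print(final_balance)        # 20: A's debit has been lost
print(100 - total_debited)  # -20: what the balance would be if both debits applied
```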

This happens frequently with:

  • wallets

  • inventory

  • quotas

  • rate limits

  • seat reservations

Why transactions don’t automatically save you

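Because each transaction is valid on its own: it reads a committed value and writes a committed value. At common default isolation levels such as READ COMMITTED, nothing stops both from basing their write on the same stale read.

The race starts at the reads, not at the writes.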

Correct approaches

  • optimistic locking (version column)

  • compare-and-swap updates

  • conditional writes

  • atomic database operations

  • append-only ledgers instead of mutable state

The real solution is not stronger transactions.

It is state transition control.
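A minimal sketch of optimistic locking with a version column, again using in-memory SQLite. The `wallets` schema and the helpers `read_wallet` and `try_debit` are assumptions made for this example. The write is conditional on the version the caller read, so a stale writer updates zero rows instead of silently overwriting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wallets ("
    " id TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL,"
    " version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO wallets VALUES ('w1', 100, 0)")
conn.commit()


def read_wallet(conn, wallet_id):
    """Return (balance, version) as currently stored."""
    return conn.execute(
        "SELECT balance, version FROM wallets WHERE id = ?", (wallet_id,)
    ).fetchone()


def try_debit(conn, wallet_id, amount, seen_balance, seen_version):
    """Apply the debit only if the row is unchanged since it was read."""
    if seen_balance < amount:
        return False  # business rule assumed here: overdrafts are rejected
    with conn:
        cur = conn.execute(
            "UPDATE wallets SET balance = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (seen_balance - amount, wallet_id, seen_version),
        )
    return cur.rowcount == 1  # 0 rows means someone else wrote first


# Same timeline as before: both services read, then both try to write.
a = read_wallet(conn, "w1")
b = read_wallet(conn, "w1")
a_ok = try_debit(conn, "w1", 40, *a)  # succeeds, version becomes 1
b_ok = try_debit(conn, "w1", 80, *b)  # fails, B read version 0

# B re-reads and retries against the current state.
b_retry = try_debit(conn, "w1", 80, *read_wallet(conn, "w1"))
final_balance, final_version = read_wallet(conn, "w1")
print(a_ok, b_ok, b_retry, final_balance, final_version)
```

On retry, B sees a balance of 60 and the debit of 80 is rejected, which is the "rejected" outcome from the timeline. When the new value does not depend on application logic, a single atomic statement such as `UPDATE wallets SET balance = balance - ? WHERE id = ? AND balance >= ?` achieves the same guard without a version column.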


3) The Out-of-Order Event Race

Distributed systems do not guarantee global ordering.

Even Kafka does not: it guarantees ordering only within a partition.

Typical example:

  1. OrderCancelled

  2. `Order

Leandro Maia

Notes on Backend Systems and Software Architecture