Not All Race Conditions Are Threads — Race Conditions in Distributed Systems
When engineers hear “race condition”, most imagine two threads modifying the same variable.
That is the smallest version of the problem.
In distributed systems, race conditions are far more dangerous because they don’t depend on shared memory.
They depend on time, ordering and partial knowledge.
No locks.
No stack traces.
No deterministic reproduction.
And the system can be perfectly healthy from an infrastructure perspective.
This post is about the kinds of race conditions that actually appear in production backend systems.
1) The Double-Execution Race (Duplicate Processing)
This is the most common distributed race condition.
A worker processes a message.
The message broker doesn’t receive the acknowledgement in time.
The broker redelivers.
Now two workers execute the same operation.
Typical scenarios:
order creation
payment capture
email sending
inventory reservation
coupon redemption
Nothing crashed.
The system did exactly what it was designed to do: at-least-once delivery.
But the business operation was not idempotent.
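To make the mechanism concrete, here is a minimal, single-threaded simulation. Everything in it is hypothetical: `ToyBroker` and its `visibility_timeout` stand in for a real broker, `capture_payment` stands in for a real payment call, and time is a scripted counter rather than a wall clock. A worker that stalls past the visibility timeout causes the same message to be handed to a second worker. The non-idempotent handler then charges twice, and no step raises an error.

```python
from dataclasses import dataclass, field

@dataclass
class ToyBroker:
    """Hypothetical at-least-once broker: an unacked message becomes visible again after the timeout."""
    visibility_timeout: int
    queue: list = field(default_factory=list)
    in_flight: dict = field(default_factory=dict)  # message id -> (message, delivered_at)

    def publish(self, msg_id, payload):
        self.queue.append((msg_id, payload))

    def receive(self, now):
        # Redeliver any in-flight message whose visibility timeout has expired.
        for msg_id, (msg, delivered_at) in list(self.in_flight.items()):
            if now - delivered_at >= self.visibility_timeout:
                self.in_flight[msg_id] = (msg, now)
                return msg
        if self.queue:
            msg = self.queue.pop(0)
            self.in_flight[msg[0]] = (msg, now)
            return msg
        return None

    def ack(self, msg_id):
        self.in_flight.pop(msg_id, None)

captures = []  # side effects observed by the hypothetical payment provider

def capture_payment(order_id, amount):
    # Not idempotent: every call is a new charge.
    captures.append((order_id, amount))

broker = ToyBroker(visibility_timeout=30)
broker.publish("m1", {"order_id": "o-42", "amount": 100})

# t=0: worker A receives the message, then stalls (e.g. a long GC pause) before acking.
msg_a = broker.receive(now=0)
# t=30: the visibility timeout expires and worker B receives the same message.
msg_b = broker.receive(now=30)

# Both workers eventually finish and ack; neither sees an error.
for msg in (msg_b, msg_a):
    capture_payment(**msg[1])
    broker.ack(msg[0])

print(captures)  # [('o-42', 100), ('o-42', 100)]
```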
What makes this dangerous
The second execution is not a retry from the same process.
It is a concurrent logical operation.
You now have:
two payment captures
two shipments
two state transitions
inconsistent accounting
And logs look completely valid.
Typical mistaken fixes
increasing visibility timeout
reducing consumer concurrency
adding delays
Those reduce the probability of the race; they do not remove it.
Real fix
You need idempotency at the business boundary, not at the infrastructure layer.
Examples:
idempotency keys stored with unique constraints
operation tokens
deduplication tables
state transition guards
The system must be able to answer:
“Has this operation already been logically completed?”
Not:
“Has this message already been seen by this worker?”
2) The Lost Update Race (Concurrent Writers)
Two services read the same entity state and both decide to modify it.
Timeline:
Service A reads balance = 100
Service B reads balance = 100
A subtracts 40 → writes 60
B subtracts 80 → writes 20
Final state: 20
Correct outcome: −20 if overdrafts are allowed, otherwise B's withdrawal is rejected
No conflicts detected.
Database behaved correctly.
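The timeline above can be replayed deterministically. This sketch uses an in-memory SQLite database, through Python's standard `sqlite3` module, as a stand-in for the real store. The `wallets` table and the helper names are hypothetical. Both "services" share one connection and the interleaving is scripted. That is enough to show the race, because every individual statement succeeds and commits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wallets (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO wallets (id, balance) VALUES (1, 100)")
conn.commit()

def read_balance():
    return conn.execute("SELECT balance FROM wallets WHERE id = 1").fetchone()[0]

def write_balance(new_balance):
    # Blind write: overwrites the row based on a value read earlier, possibly stale.
    with conn:
        conn.execute("UPDATE wallets SET balance = ? WHERE id = 1", (new_balance,))

# Replay the timeline with a fixed interleaving.
a_seen = read_balance()     # Service A reads 100
b_seen = read_balance()     # Service B reads 100
write_balance(a_seen - 40)  # A writes 60
write_balance(b_seen - 80)  # B writes 20, silently discarding A's update

print(read_balance())  # 20
```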
This happens frequently with:
wallets
inventory
quotas
rate limits
seat reservations
Why transactions don’t automatically save you
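Because both transactions are individually valid.
At common default isolation levels such as READ COMMITTED, nothing tells B that the balance it read is stale by the time it writes.
The race is between each service's read and its write, not between the writes themselves.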
Correct approaches
optimistic locking (version column)
compare-and-swap updates
conditional writes
atomic database operations
append-only ledgers instead of mutable state
The real solution is not stronger transactions.
It is state transition control.
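Here is one way to put state transition control into practice, sketched under assumptions. It uses a hypothetical order lifecycle (`PENDING`, `CAPTURED`, `CANCELLED`, `SHIPPED`), a hypothetical `transition` helper, and SQLite via Python's `sqlite3`. Each write names the state it expects to leave. A duplicate or late transition therefore becomes a no-op the caller can detect, instead of a silent overwrite. The same guard is also a form of the "state transition guards" idempotency technique from the first section.

```python
import sqlite3

# Hypothetical order lifecycle: the transitions the business allows.
ALLOWED = {
    ("PENDING", "CAPTURED"),
    ("PENDING", "CANCELLED"),
    ("CAPTURED", "SHIPPED"),
}

def transition(conn, order_id, from_state, to_state):
    """Apply from_state -> to_state only if the row is still in from_state.

    The check and the write happen in a single UPDATE, so there is no window
    between reading the current state and changing it.
    """
    if (from_state, to_state) not in ALLOWED:
        raise ValueError(f"illegal transition {from_state} -> {to_state}")
    with conn:
        cur = conn.execute(
            "UPDATE orders SET status = ? WHERE id = ? AND status = ?",
            (to_state, order_id, from_state),
        )
    return cur.rowcount == 1

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
conn.execute("INSERT INTO orders VALUES ('o-42', 'PENDING')")
conn.commit()

print(transition(conn, "o-42", "PENDING", "CAPTURED"))   # True
print(transition(conn, "o-42", "PENDING", "CAPTURED"))   # False: duplicate capture is a no-op
print(transition(conn, "o-42", "PENDING", "CANCELLED"))  # False: already captured
print(transition(conn, "o-42", "CAPTURED", "SHIPPED"))   # True
```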
3) The Out-of-Order Event Race
Distributed systems do not guarantee global ordering.
Even Kafka does not; it guarantees ordering only within a partition.
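Here is a broker-free simulation of what "only per partition" means. The partitioner is hypothetical; real Kafka clients hash the record key. What matters is that each key maps deterministically to one partition and that partitions are consumed independently. Events for the same key keep their relative order. Events across keys do not.

```python
from collections import defaultdict

NUM_PARTITIONS = 2

def partition_for(key):
    # Hypothetical partitioner: any deterministic key -> partition mapping
    # gives the same per-partition ordering property.
    return int(key.split("-")[1]) % NUM_PARTITIONS

# Events in the order the producers emitted them.
produced = [
    ("order-1", "OrderCreated"),
    ("order-2", "OrderCreated"),
    ("order-1", "OrderPaid"),
    ("order-2", "OrderCancelled"),
]

partitions = defaultdict(list)
for key, event in produced:
    partitions[partition_for(key)].append((key, event))

# Partitions are consumed independently (different consumers, different lag);
# here partition 0 happens to be drained before partition 1.
consumed = partitions[0] + partitions[1]

def per_key(events, key):
    return [e for k, e in events if k == key]

print(consumed == produced)  # False: no global order
print(all(per_key(consumed, k) == per_key(produced, k) for k in ("order-1", "order-2")))  # True
```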
Typical example:
OrderCancelled`Order