What Technical Interviews in Distributed Systems Actually Test
Modern backend engineering increasingly revolves around distributed systems.
As a consequence, many technical interviews — even for senior and leadership roles — are designed around deceptively simple scenarios: a text editor, a counter, a cart, a document, a status update.
Then the interviewer asks:
“Why did the system end up with the wrong value?”
Very often, the correct answer is not about architecture diagrams, microservices, or cloud providers.
It is about concurrency.
Below are some of the core concepts these interviews tend to probe, and why they matter in real systems.
1. Race Conditions: The Default State of Distributed Systems
A race condition occurs when multiple operations access and modify shared state concurrently, and the final result depends on the timing of execution rather than the logical order of events.
Consider a simple pattern:
read current value
apply change
write new value
If two requests execute simultaneously across two backend instances, both may read the same previous value and overwrite each other.
This is known as the lost update problem.
The system did not crash.
No exception occurred.
Every operation “succeeded”.
Yet the state is incorrect.
This is one of the most common real production bugs in multi-instance services.
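The lost update is easy to reproduce. Below is a minimal Python sketch (an in-process dict stands in for the database; the barrier only exists to make the unlucky interleaving deterministic so the demonstration is repeatable):

```python
import threading

# A toy "database": one shared record, no locking.
store = {"counter": 0}

# The barrier forces every thread to finish its read before any thread
# writes, simulating ten requests in flight at the same moment.
barrier = threading.Barrier(10)

def increment():
    value = store["counter"]      # 1. read current value
    barrier.wait()                # 2. (all threads have now read)
    store["counter"] = value + 1  # 3. write new value — clobbers concurrent writes

threads = [threading.Thread(target=increment) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(store["counter"])  # 1, not 10 — nine updates were lost
```

Every thread read 0, so every thread wrote 1. No error was raised anywhere.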
2. The Illusion of Ordering
Engineers often intuitively assume that requests arrive and are processed in order.
In practice:
clients retry
networks reorder packets
load balancers distribute requests
UDP is unordered
mobile devices reconnect
clocks differ
The system does not process “events”.
It processes arrivals.
These are not the same thing.
A later user action can be processed before an earlier one.
Without safeguards, the system may persist an older state after a newer one.
3. Why “Read Then Write” Is Dangerous
Many naive implementations rely on:
SELECT state
compute new state
UPDATE state
In a single-threaded program this is safe.
In distributed systems, this is a critical section — but there is no lock.
Two processes can execute this sequence simultaneously and overwrite each other.
This is not a performance issue. It is a correctness issue.
Scaling stateless services horizontally amplifies this risk because concurrency increases with capacity.
4. Typical Solutions
There is no single universal fix. Instead, systems use different consistency strategies.
4.1 Optimistic Concurrency Control (Versioning)
Each record carries a version:
UPDATE document
SET content = ?, version = version + 1
WHERE id = ? AND version = ?
Only one writer succeeds. Others must retry.
This is effectively a compare-and-swap (CAS).
Widely used in:
relational databases
DynamoDB conditional writes
document stores
It prevents lost updates without heavy locking.
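A minimal sketch of this CAS pattern, using SQLite with an illustrative schema — the `WHERE ... AND version = ?` clause is what turns a plain UPDATE into a conditional write:

```python
import sqlite3

# In-memory database with a versioned document row (schema is illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document (id INTEGER PRIMARY KEY, content TEXT, version INTEGER)")
conn.execute("INSERT INTO document VALUES (1, 'draft', 1)")

def save(doc_id, new_content, expected_version):
    # Compare-and-swap: the write applies only if the version we read
    # is still the version in the database.
    cur = conn.execute(
        "UPDATE document SET content = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_content, doc_id, expected_version),
    )
    # rowcount == 1 means we won; 0 means someone else wrote first
    return cur.rowcount == 1

# Two writers both read version 1; only the first CAS succeeds.
print(save(1, "edit A", 1))  # True
print(save(1, "edit B", 1))  # False — version is now 2; re-read and retry
```

The losing writer is not silently overwritten — it is told it lost, which is the entire point.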
4.2 Idempotency
Requests should be safe to repeat.
If the same operation arrives twice (retries, network duplication), the system should not produce a different result.
This is essential in:
payment systems
event consumers
APIs behind unreliable networks
Idempotency keys or operation identifiers allow systems to detect duplicates.
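A minimal sketch of the idempotency-key idea (names are hypothetical): each operation's result is recorded under its key, and a duplicate delivery replays the stored result instead of executing the side effect again:

```python
account = {"balance": 100}
seen = {}  # idempotency key -> stored result

def charge(key, amount):
    if key in seen:
        # Duplicate delivery (client retry, redelivered message):
        # replay the recorded result, do not charge again.
        return seen[key]
    account["balance"] -= amount  # the side effect happens exactly once
    seen[key] = {"status": "charged", "amount": amount}
    return seen[key]

charge("req-42", 30)
charge("req-42", 30)  # the network delivers the same request twice
print(account["balance"])  # 70, not 40 — the duplicate was absorbed
```

In a real service the `seen` map would live in durable storage with a TTL, and the key would come from the client.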
4.3 Event Ordering
Sometimes the state should reflect the latest logical event, not the last write.
Solutions include:
timestamps (careful: clocks drift)
logical clocks
sequence numbers per entity
monotonic versioning
The key insight:
Last write ≠ most recent action.
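As a sketch of the sequence-number approach: keep a per-entity sequence and reject any arrival carrying an older one, so a delayed message cannot overwrite newer state:

```python
# Per-entity state with a monotonically increasing sequence number.
state = {"status": "offline", "seq": 0}

def apply_update(new_status, seq):
    if seq <= state["seq"]:
        return False  # stale arrival: a newer event was already applied
    state["status"] = new_status
    state["seq"] = seq
    return True

apply_update("online", 2)      # event 2 happens to arrive first
apply_update("connecting", 1)  # event 1 arrives late — rejected
print(state["status"])  # "online": the latest event wins, not the latest write
```

Without the guard, the late arrival of event 1 would have "won" simply by being processed last.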
4.4 Serialization via Queues
Instead of multiple concurrent writers:
clients → queue → single consumer → database
Queues provide ordering and eliminate write races, at the cost of added latency and a throughput ceiling at the single consumer.
Common in:
collaborative editing
inventory systems
financial ledgers
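The clients → queue → single consumer shape can be sketched with Python's standard-library queue. Because exactly one thread ever touches the state, no interleaving of writes is possible:

```python
import queue
import threading

q = queue.Queue()
ledger = []  # the shared state — written by exactly one thread

def consumer():
    while True:
        op = q.get()
        if op is None:       # shutdown sentinel
            break
        ledger.append(op)    # the only writer — no races possible
        q.task_done()

worker = threading.Thread(target=consumer)
worker.start()

# Many producers can enqueue concurrently; the queue serializes them.
for i in range(5):
    q.put(f"entry-{i}")
q.put(None)
worker.join()

print(ledger)  # entries applied one at a time, in queue order
```

The trade is explicit: all write concurrency is exchanged for a total order.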
5. Why Timestamps Alone Are Not Enough
A common first instinct is to store timestamps and keep the “latest”.
This works only if:
clocks are synchronized
events are monotonic
no client is offline
no retries occur
In real systems:
client clocks lie
mobile reconnects happen
messages are delayed
Relying solely on timestamps often just replaces a race condition with a clock-skew bug.
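A small sketch of how last-write-wins by client timestamp goes wrong when one client's clock runs behind:

```python
# Last-write-wins keyed on the timestamp the client reports.
record = {"value": None, "ts": 0.0}

def write(value, client_ts):
    if client_ts > record["ts"]:   # "keep the latest"
        record["value"] = value
        record["ts"] = client_ts

write("first edit", 1000.0)   # client A, clock is correct
write("second edit", 995.0)   # client B acted later, but its clock is 5s slow
print(record["value"])  # "first edit" — the newer action was silently dropped
```

Nothing raced here; both writes were processed in order. The clock itself was the lie.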
6. Databases Do Not Automatically Save You
Developers often assume the database guarantees correctness.
Databases guarantee atomicity per operation, not per workflow.
This is atomic:
UPDATE row SET value = 5
This is not:
READ row
MODIFY
WRITE row
Without isolation (locks or conditional updates), the database cannot detect a logical conflict.
The bug lives above the database layer.
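To make the distinction concrete, here is a sketch using SQLite (schema illustrative). Both patterns produce the same result when run alone, but only the first is safe under concurrency, because its read and write happen inside a single statement the database serializes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, value INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 10)")

# Atomic: read and write are one statement; concurrent executions
# are serialized by the database.
conn.execute("UPDATE account SET value = value + 5 WHERE id = 1")

# Not atomic as a workflow: between the SELECT and the UPDATE,
# another process can slip in its own write, which is then lost.
(value,) = conn.execute("SELECT value FROM account WHERE id = 1").fetchone()
conn.execute("UPDATE account SET value = ? WHERE id = 1", (value + 5,))

(final,) = conn.execute("SELECT value FROM account WHERE id = 1").fetchone()
print(final)  # 20 in this single-process run — but only the first pattern is safe
```

The database cannot see that the SELECT and the UPDATE belong to one logical operation unless you tell it so (a transaction with appropriate isolation, a lock, or a conditional write).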
7. What These Questions Really Evaluate
Concurrency questions in interviews are not about memorizing definitions.
They evaluate whether you understand:
the difference between scaling and correctness
state vs events
arrival vs ordering
retries vs duplicates
atomic operations vs atomic workflows
In other words:
Do you design systems assuming the network is unreliable and multiple things happen at once?
Because in production, they always do.
8. A Useful Mental Model
Single-machine programming assumes:
“Things happen one after another.”
Distributed systems require assuming:
“Everything happens at the same time, out of order, and at least once.”
Once you adopt this model, many design decisions change:
APIs
database writes
caching
retries
message processing
Concurrency is not an edge case.
It is the baseline.
Closing Thoughts
Many system design discussions focus on scale, cloud architecture, and service boundaries.
But some of the most critical failures in real systems come from simpler issues:
two valid operations interacting in an invalid way.
Before worrying about microservices, queues, or multi-region deployments, systems must answer a more fundamental question:
What happens when two users change the same thing at the same time?
The answer to that question often defines whether a system is merely scalable — or actually correct.