What Technical Interviews in Distributed Systems Actually Test
Modern backend engineering increasingly revolves around distributed systems.
As a consequence, many technical interviews — even for senior and leadership roles — are designed around deceptively simple scenarios: a text editor, a counter, a cart, a document, a status update.
Then the interviewer asks:
“Why did the system end up with the wrong value?”
Very often, the correct answer is not about architecture diagrams, microservices, or cloud providers.
It is about concurrency.
Below are some of the core concepts these interviews tend to probe, and why they matter in real systems.
1. Race Conditions: The Default State of Distributed Systems
A race condition occurs when multiple operations access and modify shared state concurrently, and the final result depends on the timing of execution rather than the logical order of events.
Consider a simple pattern:
read current value
apply change
write new value
If two requests execute simultaneously across two backend instances, both may read the same previous value and overwrite each other.
This is known as the lost update problem.
The system did not crash.
No exception occurred.
Every operation “succeeded”.
Yet the state is incorrect.
This is one of the most common real production bugs in multi-instance services.
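The lost update is easy to reproduce. Below is a minimal Python sketch (an in-process dict stands in for the database; the barrier only exists to make the unlucky interleaving deterministic so the demonstration is repeatable):

```python
import threading

# A toy "database": one shared record, no locking.
store = {"counter": 0}

# The barrier forces every thread to finish its read before any thread
# writes, simulating ten requests in flight at the same moment.
barrier = threading.Barrier(10)

def increment():
    value = store["counter"]      # 1. read current value
    barrier.wait()                # 2. (all threads have now read)
    store["counter"] = value + 1  # 3. write new value — clobbers concurrent writes

threads = [threading.Thread(target=increment) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(store["counter"])  # 1, not 10 — nine updates were lost
```

Every thread read 0, so every thread wrote 1. No error was raised anywhere.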
2. The Illusion of Ordering
Engineers often intuitively assume that requests arrive and are processed in order.
In practice:
clients retry
networks reorder packets
load balancers distribute requests
UDP is unordered
mobile devices reconnect
clocks differ
The system does not process “events”.
It processes arrivals.
These are not the same thing.
A later user action can be processed before an earlier one.
Without safeguards, the system may persist an older state after a newer one.
3. Why “Read Then Write” Is Dangerous
Many naive implementations rely on:
SELECT state
compute new state
UPDATE state
In a single-threaded program this is safe.
In distributed systems, this is a critical section — but there is no lock.
Two processes can execute this sequence simultaneously and overwrite each other.
This is not a performance issue. It is a correctness issue.
Scaling stateless services horizontally amplifies this risk because concurrency increases with capacity.
4. Typical Solutions
There is no single universal fix. Instead, systems use different consistency strategies.
4.1 Optimistic Concurrency Control (Versioning)
Each record carries a version:
UPDATE document
SET content = ?, version = version + 1
WHERE id = ? AND version = ?
Only one writer succeeds. Others must retry.
This is effectively a compare-and-swap (CAS).
Widely used in:
relational databases
DynamoDB conditional writes
document stores
It prevents lost updates without heavy locking.
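A minimal sketch of this CAS pattern, using SQLite with an illustrative schema — the `WHERE ... AND version = ?` clause is what turns a plain UPDATE into a conditional write:

```python
import sqlite3

# In-memory database with a versioned document row (schema is illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document (id INTEGER PRIMARY KEY, content TEXT, version INTEGER)")
conn.execute("INSERT INTO document VALUES (1, 'draft', 1)")

def save(doc_id, new_content, expected_version):
    # Compare-and-swap: the write applies only if the version we read
    # is still the version in the database.
    cur = conn.execute(
        "UPDATE document SET content = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_content, doc_id, expected_version),
    )
    # rowcount == 1 means we won; 0 means someone else wrote first
    return cur.rowcount == 1

# Two writers both read version 1; only the first CAS succeeds.
print(save(1, "edit A", 1))  # True
print(save(1, "edit B", 1))  # False — version is now 2; re-read and retry
```

The losing writer is not silently overwritten — it is told it lost, which is the entire point.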
4.2 Idempotency
Requests should be safe to repeat.
If the same operation arrives twice (retries, network duplication), the system should not produce a different result.
This is essential in:
payment systems
event consumers
APIs behind unreliable networks
Idempotency keys or operation identifiers allow systems to detect duplicates.
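A minimal sketch of the idempotency-key idea (names are hypothetical): each operation's result is recorded under its key, and a duplicate delivery replays the stored result instead of executing the side effect again:

```python
account = {"balance": 100}
seen = {}  # idempotency key -> stored result

def charge(key, amount):
    if key in seen:
        # Duplicate delivery (client retry, redelivered message):
        # replay the recorded result, do not charge again.
        return seen[key]
    account["balance"] -= amount  # the side effect happens exactly once
    seen[key] = {"status": "charged", "amount": amount}
    return seen[key]

charge("req-42", 30)
charge("req-42", 30)  # the network delivers the same request twice
print(account["balance"])  # 70, not 40 — the duplicate was absorbed
```

In a real service the `seen` map would live in durable storage with a TTL, and the key would come from the client.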
4.3 Event Ordering
Sometimes the state should reflect the latest logical event, not the last write.
Solutions include:
timestamps (careful: clocks drift)
logical clocks
sequence numbers per entity
monotonic versioning
The key insight:
Last write ≠ most recent action.
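As a sketch of the sequence-number approach: keep a per-entity sequence and reject any arrival carrying an older one, so a delayed message cannot overwrite newer state:

```python
# Per-entity state with a monotonically increasing sequence number.
state = {"status": "offline", "seq": 0}

def apply_update(new_status, seq):
    if seq <= state["seq"]:
        return False  # stale arrival: a newer event was already applied
    state["status"] = new_status
    state["seq"] = seq
    return True

apply_update("online", 2)      # event 2 happens to arrive first
apply_update("connecting", 1)  # event 1 arrives late — rejected
print(state["status"])  # "online": the latest event wins, not the latest write
```

Without the guard, the late arrival of event 1 would have "won" simply by being processed last.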
4.4 Serialization via Queues
Instead of multiple concurrent writers:
clients → queue → single consumer → database
Queues provide ordering and eliminate write races, at the cost of added latency and a throughput ceiling at the single consumer.
Common in:
collaborative editing
inventory systems
financial ledgers
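The clients → queue → single consumer shape can be sketched with Python's standard-library queue. Because exactly one thread ever touches the state, no interleaving of writes is possible:

```python
import queue
import threading

q = queue.Queue()
ledger = []  # the shared state — written by exactly one thread

def consumer():
    while True:
        op = q.get()
        if op is None:       # shutdown sentinel
            break
        ledger.append(op)    # the only writer — no races possible
        q.task_done()

worker = threading.Thread(target=consumer)
worker.start()

# Many producers can enqueue concurrently; the queue serializes them.
for i in range(5):
    q.put(f"entry-{i}")
q.put(None)
worker.join()

print(ledger)  # entries applied one at a time, in queue order
```

The trade is explicit: all write concurrency is exchanged for a total order.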
5. Why Timestamps Alone Are Not Enough
A common first instinct is to store timestamps and keep the “latest”.
This works only if:
clocks are synchronized
events are monotonic
no client is offline
no retries occur
In real systems:
client clocks lie
mobile reconnects happen
messages are delayed
Relying solely on timestamps often just replaces a race condition with a clock-skew bug.
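A small sketch of how last-write-wins by client timestamp goes wrong when one client's clock runs behind:

```python
# Last-write-wins keyed on the timestamp the client reports.
record = {"value": None, "ts": 0.0}

def write(value, client_ts):
    if client_ts > record["ts"]:   # "keep the latest"
        record["value"] = value
        record["ts"] = client_ts

write("first edit", 1000.0)   # client A, clock is correct
write("second edit", 995.0)   # client B acted later, but its clock is 5s slow
print(record["value"])  # "first edit" — the newer action was silently dropped
```

Nothing raced here; both writes were processed in order. The clock itself was the lie.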
6. Databases Do Not Automatically Save You
Developers often assume the database guarantees correctness.
Databases guarantee atomicity per operation, not per workflow.
This is atomic:
UPDATE row SET value = 5
This is not:
READ row
MODIFY
WRITE row
Without isolation (locks or conditional updates), the database cannot detect a logical conflict.
The bug lives above the database layer.
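To make the distinction concrete, here is a sketch using SQLite (schema illustrative). Both patterns produce the same result when run alone, but only the first is safe under concurrency, because its read and write happen inside a single statement the database serializes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, value INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 10)")

# Atomic: read and write are one statement; concurrent executions
# are serialized by the database.
conn.execute("UPDATE account SET value = value + 5 WHERE id = 1")

# Not atomic as a workflow: between the SELECT and the UPDATE,
# another process can slip in its own write, which is then lost.
(value,) = conn.execute("SELECT value FROM account WHERE id = 1").fetchone()
conn.execute("UPDATE account SET value = ? WHERE id = 1", (value + 5,))

(final,) = conn.execute("SELECT value FROM account WHERE id = 1").fetchone()
print(final)  # 20 in this single-process run — but only the first pattern is safe
```

The database cannot see that the SELECT and the UPDATE belong to one logical operation unless you tell it so (a transaction with appropriate isolation, a lock, or a conditional write).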
7. What These Questions Really Evaluate
Concurrency questions in interviews are not about memorizing definitions.
They evaluate whether you understand:
the difference between scaling and correctness
state vs events
arrival vs ordering
retries vs duplicates
atomic operations vs atomic workflows
In other words:
Do you design systems assuming the network is unreliable and multiple things happen at once?
Because in production, they always do.
8. A Useful Mental Model
Single-machine programming assumes:
“Things happen one after another.”
Distributed systems require assuming:
“Everything happens at the same time, out of order, and at least once.”
Once you adopt this model, many design decisions change:
APIs
database writes
caching
retries
message processing
Concurrency is not an edge case.
It is the baseline.
Closing Thoughts
Many system design discussions focus on scale, cloud architecture, and service boundaries.
But some of the most critical failures in real systems come from simpler issues:
two valid operations interacting in an invalid way.
Before worrying about microservices, queues, or multi-region deployments, systems must answer a more fundamental question:
What happens when two users change the same thing at the same time?
The answer to that question often defines whether a system is merely scalable — or actually correct.