What Technical Interviews in Distributed Systems Actually Test

Modern backend engineering increasingly revolves around distributed systems.
As a consequence, many technical interviews — even for senior and leadership roles — are designed around deceptively simple scenarios: a text editor, a counter, a cart, a document, a status update.

Then the interviewer asks:

“Why did the system end up with the wrong value?”

Very often, the correct answer is not about architecture diagrams, microservices, or cloud providers.

It is about concurrency.

Below are some of the core concepts these interviews tend to probe, and why they matter in real systems.


1. Race Conditions: The Default State of Distributed Systems

A race condition occurs when multiple operations access and modify shared state concurrently, and the final result depends on the timing of execution rather than the logical order of events.

Consider a simple pattern:

read current value
apply change
write new value

If two requests execute simultaneously on two backend instances, both may read the same previous value, and whichever writes last silently overwrites the other's change.

This is known as the lost update problem.

The system did not crash.
No exception occurred.
Every operation “succeeded”.

Yet the state is incorrect.

This is a common class of production bug in multi-instance services.
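
The lost update can be replayed deterministically by interleaving the steps by hand. The following is a minimal sketch, not a real service: the `store` dictionary and the `read`/`write` helpers are hypothetical stand-ins for a shared database row and its access layer.

```python
# Deterministic replay of the lost update: two request handlers
# interleave their read / apply / write steps on one shared value.
store = {"counter": 10}  # hypothetical stand-in for a shared database row


def read(key):
    return store[key]


def write(key, value):
    store[key] = value


# Both requests read before either one writes.
seen_by_a = read("counter")  # request A reads 10
seen_by_b = read("counter")  # request B reads 10

write("counter", seen_by_a + 1)  # A writes 11
write("counter", seen_by_b + 1)  # B also writes 11, erasing A's increment

final_value = store["counter"]
print(final_value)  # 11, although two increments were requested
```

Every call returned normally, which is why this class of bug tends to surface as bad data rather than as an error in the logs.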


2. The Illusion of Ordering

Engineers often intuitively assume that requests are processed in the same order in which users issued them.

In practice:

  • clients retry

  • networks reorder packets

  • load balancers distribute requests

  • UDP is unordered

  • mobile devices reconnect

  • clocks differ

The system does not process “events”.
It processes arrivals.

These are not the same thing.

A later user action can be processed before an earlier one.
Without safeguards, the system may persist an older state after a newer one.


3. Why “Read Then Write” Is Dangerous

Many naive implementations rely on:

SELECT state
compute new state
UPDATE state

In a single-threaded program that is the only writer to the state, this is safe.

In distributed systems, this is a critical section — but there is no lock.

Two processes can execute this sequence simultaneously and overwrite each other.
This is not a performance issue. It is a correctness issue.

Scaling stateless services horizontally amplifies this risk because concurrency increases with capacity.


4. Typical Solutions

There is no single universal fix. Instead, systems use different consistency strategies.

4.1 Optimistic Concurrency Control (Versioning)

Each record carries a version:

UPDATE document
SET content = ?, version = version + 1
WHERE id = ? AND version = ?

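Only one writer succeeds. For the others, the WHERE clause no longer matches, the update affects zero rows, and they must re-read the record and retry.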

This is effectively a compare-and-swap (CAS).

Widely used in:

  • relational databases

  • DynamoDB conditional writes

  • document stores

It prevents lost updates without heavy locking.
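
The versioned update above can be tried end to end with Python's built-in `sqlite3` module. This is a sketch under assumptions: the `document` table layout and the `save` helper are illustrative, and an in-memory database stands in for a real one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE document (id INTEGER PRIMARY KEY, content TEXT, version INTEGER)"
)
conn.execute("INSERT INTO document VALUES (1, 'draft', 1)")
conn.commit()


def save(conn, doc_id, new_content, expected_version):
    """Write only if the row still has the version the caller read."""
    cur = conn.execute(
        "UPDATE document SET content = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_content, doc_id, expected_version),
    )
    conn.commit()
    # rowcount is 0 when another writer already bumped the version.
    return cur.rowcount == 1


# Two writers both read version 1, then both try to save.
first = save(conn, 1, "edit from A", expected_version=1)
second = save(conn, 1, "edit from B", expected_version=1)

row = conn.execute(
    "SELECT content, version FROM document WHERE id = 1"
).fetchone()
print(first, second, row)  # True False ('edit from A', 2)
```

The losing writer learns about the conflict from the row count, so it can re-read, merge or recompute, and try again with the new version.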


4.2 Idempotency

Requests should be safe to repeat.

If the same operation arrives twice (retries, network duplication), the system should apply its effect only once and return the same result both times.

This is essential in:

  • payment systems

  • event consumers

  • APIs behind unreliable networks

Idempotency keys or operation identifiers allow systems to detect duplicates.
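
Duplicate detection with an idempotency key might look like the sketch below. The `charge` function, the key format, and the in-memory `processed` dictionary are all hypothetical; a real service would keep the keys in durable storage and make the check-and-record step atomic, for example with a unique constraint.

```python
balances = {"acct-1": 100}
processed = {}  # idempotency key -> result of the first execution


def charge(idempotency_key, account, amount):
    """Apply the charge once; replay the stored result for duplicates."""
    if idempotency_key in processed:
        return processed[idempotency_key]
    balances[account] -= amount
    result = {"status": "charged", "balance": balances[account]}
    processed[idempotency_key] = result
    return result


first = charge("req-123", "acct-1", 30)
retry = charge("req-123", "acct-1", 30)  # same key: no second debit

print(first == retry, balances["acct-1"])  # True 70
```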


4.3 Event Ordering

Sometimes the state should reflect the latest logical event, not the last write.

Solutions include:

  • timestamps (careful: clocks drift)

  • logical clocks

  • sequence numbers per entity

  • monotonic versioning

The key insight:
Last write ≠ most recent action.
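
A per-entity sequence number is one way to enforce this. In the sketch below the client is assumed to attach an increasing `seq` to each action; the `apply_update` helper and the in-memory `state` dictionary are illustrative names, not part of any particular framework.

```python
state = {}  # entity id -> (sequence number, value)


def apply_update(entity_id, seq, value):
    """Accept an update only if it is newer than what is stored."""
    current = state.get(entity_id)
    if current is not None and seq <= current[0]:
        return False  # stale or duplicate arrival: ignore it
    state[entity_id] = (seq, value)
    return True


# The user set "online" (seq 1) and then "away" (seq 2),
# but the two requests arrive in the opposite order.
accepted_late_action = apply_update("user-7", 2, "away")
accepted_early_action = apply_update("user-7", 1, "online")

print(state["user-7"])  # (2, 'away')
```

The stored state reflects the most recent action even though the older request was the last one to arrive.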


4.4 Serialization via Queues

Instead of multiple concurrent writers:

clients → queue → single consumer → database

With a single consumer per entity (or per partition), writes are applied one at a time in queue order. This eliminates write races at the cost of added latency and a throughput ceiling.

Common in:

  • collaborative editing

  • inventory systems

  • financial ledgers
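
The single-writer pattern can be sketched in one process with the standard `queue` and `threading` modules. Here threads stand in for separate clients, and the `balance` variable stands in for the database; a production system would use a message broker rather than an in-process queue.

```python
import queue
import threading

balance = 0
commands = queue.Queue()


def consumer():
    """The only code path that is allowed to mutate balance."""
    global balance
    while True:
        amount = commands.get()
        if amount is None:  # sentinel: no more work
            break
        balance += amount


def producer(count):
    """A client: it never touches balance, it only enqueues commands."""
    for _ in range(count):
        commands.put(1)


worker = threading.Thread(target=consumer)
worker.start()

producers = [threading.Thread(target=producer, args=(1000,)) for _ in range(4)]
for p in producers:
    p.start()
for p in producers:
    p.join()

commands.put(None)  # enqueued after all producers finished
worker.join()

print(balance)  # 4000
```

Because only the consumer writes, no lock around `balance` is needed; the queue is the serialization point.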


5. Why Timestamps Alone Are Not Enough

A common first instinct is to store timestamps and keep the “latest”.

This works only if:

  • clocks are synchronized

  • timestamps are monotonic (they never move backwards)

  • no client is offline

  • no retries occur

In real systems:

  • client clocks lie

  • mobile reconnects happen

  • messages are delayed

Relying solely on timestamps often replaces a race condition with a time consistency bug.
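
The failure mode is easy to reproduce. In this sketch the 120-second skew, the device names, and the `last_write_wins` helper are assumptions chosen for illustration.

```python
def last_write_wins(current, incoming):
    """Keep whichever record carries the larger timestamp."""
    return incoming if incoming["ts"] > current["ts"] else current


# Real order of events: the phone edits first, the laptop edits 5 seconds later.
# The phone's clock is assumed to run 120 seconds fast.
phone_edit = {"value": "old title", "ts": 1000 + 120}
laptop_edit = {"value": "new title", "ts": 1005}

stored = last_write_wins(phone_edit, laptop_edit)
print(stored["value"])  # old title
```

The comparison logic is correct; the input it trusts is not. The newer edit is discarded because the older one claims a later time.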


6. Databases Do Not Automatically Save You

Developers often assume the database guarantees correctness.

Databases guarantee atomicity per statement (or per transaction), not per application workflow that reads, computes in application code, and writes back.

This is atomic:

UPDATE row SET value = 5

This is not:

READ row
MODIFY
WRITE row

Without sufficient isolation (row locks, a serializable transaction, or a conditional update), the database cannot detect a logical conflict.

The bug lives above the database layer.
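
One way to close the gap is to push the computation into the statement itself, so the read and the write happen inside one atomic operation. The sketch below uses the built-in `sqlite3` module; the `counters` table and helper names are hypothetical, and the stale reads are sequenced by hand to imitate two concurrent workflows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, value INTEGER)")
conn.execute("INSERT INTO counters VALUES ('likes', 0)")


def current(conn):
    return conn.execute(
        "SELECT value FROM counters WHERE name = 'likes'"
    ).fetchone()[0]


def write_back(conn, previously_read):
    # Workflow style: the new value was computed in application code
    # from an earlier read that may already be stale.
    conn.execute(
        "UPDATE counters SET value = ? WHERE name = 'likes'",
        (previously_read + 1,),
    )


def increment_in_database(conn):
    # Statement style: the database reads and writes in one atomic step.
    conn.execute("UPDATE counters SET value = value + 1 WHERE name = 'likes'")


# Two workflows both read 0 before either one writes.
read_a = current(conn)
read_b = current(conn)
write_back(conn, read_a)
write_back(conn, read_b)
workflow_total = current(conn)

conn.execute("UPDATE counters SET value = 0 WHERE name = 'likes'")
increment_in_database(conn)
increment_in_database(conn)
statement_total = current(conn)

print(workflow_total, statement_total)  # 1 2
```

This only works when the new value can be expressed in SQL; when the computation has to happen in application code, a conditional update or explicit locking is still required.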


7. What These Questions Really Evaluate

Concurrency questions in interviews are not about memorizing definitions.

They evaluate whether you understand:

  • the difference between scaling and correctness

  • state vs events

  • arrival vs ordering

  • retries vs duplicates

  • atomic operations vs atomic workflows

In other words:

Do you design systems assuming the network is unreliable and multiple things happen at once?

Because in production, both are always true.


8. A Useful Mental Model

Single-machine programming assumes:

“Things happen one after another.”

Distributed systems require assuming:

“Everything happens at the same time, out of order, and at least once.”

Once you adopt this model, many design decisions change:

  • APIs

  • database writes

  • caching

  • retries

  • message processing

Concurrency is not an edge case.
It is the baseline.


Closing Thoughts

Many system design discussions focus on scale, cloud architecture, and service boundaries.

But some of the most critical failures in real systems come from simpler issues:
two valid operations interacting in an invalid way.

Before worrying about microservices, queues, or multi-region deployments, systems must answer a more fundamental question:

What happens when two users change the same thing at the same time?

The answer to that question often defines whether a system is merely scalable — or actually correct.