
Working With an AI Coding Assistant (Codex) as a Backend Engineer


Over the last few months I have been using an AI coding assistant powered by large language models (a Codex-style system).

I did not approach it as a novelty or productivity experiment.

I approached it the same way I approach any new piece of infrastructure:
with skepticism and with a production mindset.

The interesting discovery was this:

The assistant is not a faster autocomplete.

It behaves much closer to a very fast junior engineer with perfect recall and zero operational experience.

Once I started treating it that way, it became genuinely useful.

This post is not about whether AI will replace engineers.
It is about how it actually changes day-to-day backend work.


What It Is Actually Good At

The first surprise was not code generation.

It was code navigation.

In large systems, a lot of time is not spent writing code.
It is spent reconstructing intent.

Typical tasks:

  • understanding an unfamiliar module

  • finding where a side effect originates

  • tracing request flows

  • reconstructing configuration behavior

  • mapping DTOs across layers

The assistant is very good at building a mental index of a codebase quickly.

You can ask questions like:

“Where could a timeout be happening in this flow?”

And it will point to:

  • HTTP client configuration

  • thread pool limits

  • retry wrappers

  • circuit breaker policies

Not always correctly — but almost always usefully.

The real productivity gain is not typing less code.

It is reducing search time.
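
To make one of those pointers concrete: here is a minimal, self-contained sketch (Python, purely illustrative, not from any real incident) of how a thread pool limit becomes a hidden timeout source. The caller's deadline expires while its task is still queued behind a slow one, so the timeout fires far from the code that actually caused it.

```python
import concurrent.futures
import time

# A pool with fewer workers than concurrent requests is a classic hidden
# timeout source: new tasks queue behind slow ones, and the caller's
# deadline expires while its task is still waiting for a worker.
pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)

def slow_call():
    time.sleep(0.3)
    return "ok"

first = pool.submit(slow_call)    # occupies the only worker for ~300 ms
second = pool.submit(slow_call)   # queued: cannot even start yet

try:
    # The caller's deadline is shorter than the queue wait, so this times
    # out even though slow_call itself would have succeeded.
    second.result(timeout=0.1)
    timed_out = False
except concurrent.futures.TimeoutError:
    timed_out = True

pool.shutdown(wait=True)
assert timed_out
```

The point of the sketch is that nothing in `slow_call` is broken; the timeout lives entirely in the pool configuration, which is exactly the kind of indirection the assistant is good at surfacing.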


The Refactoring Multiplier

The second strong use case is mechanical refactoring.

Things engineers postpone for months:

  • renaming confusing interfaces

  • splitting large classes

  • extracting validation logic

  • migrating method signatures

  • removing duplication

These tasks are cognitively easy but operationally expensive.

They require attention, but not deep design thinking.

The assistant is extremely effective here.

You still review every change.

But the cost of attempting a refactor drops dramatically.

The interesting side effect:

I started performing refactors earlier.

Not because the assistant is perfect — but because the activation energy disappeared.
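
For a sense of scale, this is the kind of refactor meant here (a hypothetical example, with made-up names): extracting inline validation into a shared function. Behavior-preserving, easy to review, and exactly the sort of change that used to sit in the backlog.

```python
def create_user_inline(name: str, age: int) -> dict:
    # Before: validation inline, duplicated wherever users are created.
    if not name:
        raise ValueError("name required")
    if age < 0:
        raise ValueError("age must be non-negative")
    return {"name": name, "age": age}

def validate_user(name: str, age: int) -> None:
    # After: validation extracted, so every call site shares one rule set.
    if not name:
        raise ValueError("name required")
    if age < 0:
        raise ValueError("age must be non-negative")

def create_user(name: str, age: int) -> dict:
    validate_user(name, age)
    return {"name": name, "age": age}

# The refactor is behavior-preserving: both paths agree.
assert create_user("ada", 36) == create_user_inline("ada", 36)
```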


Where It Fails (Consistently)

The assistant writes correct-looking code far more often than correct systems.

This is the most important observation.

It is strong at:

  • syntax

  • API usage

  • small local logic

It is weak at:

  • concurrency

  • distributed systems

  • failure handling

  • timeouts

  • idempotency

  • partial failure

In other words, it struggles exactly where real backend incidents happen.

If you ask it to implement a retry mechanism, it will produce one.

If you ask it to design a safe retry mechanism, it will often produce a system that can duplicate side effects.

This is a critical difference.

The assistant optimizes for plausibility, not for operability.
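
To make the retry point concrete, here is a minimal sketch (Python, with hypothetical service names) of what "safe" means in practice: the retry wrapper generates an idempotency key once, before the first attempt, so a replayed request cannot duplicate the side effect even when a response is lost.

```python
import uuid

class PaymentService:
    """Toy in-memory service. Names are hypothetical, for illustration only."""
    def __init__(self):
        self.charges = []   # actual side effects
        self._seen = {}     # idempotency_key -> stored response

    def charge(self, amount, idempotency_key):
        # Safety hinges on this check: a replayed request with the same key
        # returns the stored response instead of charging again.
        if idempotency_key in self._seen:
            return self._seen[idempotency_key]
        self.charges.append(amount)
        response = {"status": "ok", "amount": amount}
        self._seen[idempotency_key] = response
        return response

def charge_with_retry(service, amount, attempts=3):
    # The key is generated ONCE, before the first attempt, so every retry
    # replays the same logical request rather than issuing a new one.
    key = str(uuid.uuid4())
    lost_responses = 1          # simulate one timed-out/lost response
    for _ in range(attempts):
        response = service.charge(amount, idempotency_key=key)
        if lost_responses:      # the charge happened, but the client never
            lost_responses -= 1 # saw the answer, so it retries
            continue
        return response
    raise RuntimeError("all retries failed")

svc = PaymentService()
charge_with_retry(svc, 100)
assert len(svc.charges) == 1   # one side effect, despite the retry
```

A naive version would generate the key inside the loop (or skip it entirely), and the simulated lost response would produce two charges. That is the gap between "a retry mechanism" and "a safe retry mechanism".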


The Illusion of Correctness

The most dangerous property is fluency.

Bad code used to look suspicious.

AI-generated code often looks clean, documented, and well structured.

Which makes engineers trust it more than they should.

The failure mode is subtle:

You stop questioning decisions that you did not consciously make.

Over time, this introduces architectural drift.

Not dramatic failures — but many small design decisions that nobody truly owns.

I’ve seen:

  • retries added in three layers

  • hidden blocking calls inside async flows

  • silent error swallowing

  • accidental N+1 queries

All of them reasonable locally.
All of them problematic systemically.
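
The second item on that list is easy to reproduce. A minimal sketch, assuming Python's asyncio (the handler names are invented): a blocking call inside an async handler stalls every other coroutine on the event loop, which a heartbeat task makes visible.

```python
import asyncio
import time

async def handler_blocking():
    # Looks innocent, but time.sleep blocks the entire event loop:
    # every other coroutine stalls until it returns.
    time.sleep(0.2)

async def handler_correct():
    # Offload the blocking call to a worker thread; the loop stays responsive.
    await asyncio.to_thread(time.sleep, 0.2)

async def worst_heartbeat_gap(handler):
    # Run the handler next to a heartbeat that should tick every 50 ms,
    # and report the largest observed gap between ticks.
    ticks = []
    async def heartbeat():
        for _ in range(3):
            ticks.append(time.monotonic())
            await asyncio.sleep(0.05)
    await asyncio.gather(heartbeat(), handler())
    return max(b - a for a, b in zip(ticks, ticks[1:]))

blocking_gap = asyncio.run(worst_heartbeat_gap(handler_blocking))
correct_gap = asyncio.run(worst_heartbeat_gap(handler_correct))
# The blocking handler stretches the heartbeat gap to roughly the full
# 200 ms of the sleep; the offloaded one keeps it near 50 ms.
assert blocking_gap > correct_gap
```

Both handlers pass a unit test of their own return value, which is why this class of bug survives review: it is only visible systemically, under concurrency.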


The Real Productivity Shift

The assistant does not remove the need for senior engineers.

It increases the value of judgment.

Before:

Senior engineers wrote more code correctly.

Now:

Senior engineers reject more code correctly.

A large part of using an AI assistant well is knowing when not to accept its solution.

You stop thinking of it as a coding tool and start thinking of it as a proposal generator.


A Practical Workflow That Worked For Me

What worked best for me was separating tasks into two categories.

Tasks I delegate to the assistant

  • boilerplate

  • DTO mapping

  • test scaffolding

  • refactor mechanics

  • documentation drafts

  • codebase exploration

Tasks I never delegate

  • concurrency control

  • transactional boundaries

  • caching strategy

  • retries

  • idempotency

  • API contracts

  • state machines

Interestingly, this boundary maps almost exactly to the boundary between programming and engineering.

The assistant is good at programming.

Engineering still requires responsibility for behavior in production.


Unexpected Benefit: Thinking More Explicitly

One side effect I did not expect:

I started writing clearer code.

Because I needed to describe intent precisely when prompting, I became more explicit about:

  • invariants

  • failure modes

  • assumptions

  • data ownership

The tool forces you to articulate reasoning you previously kept in your head.

That alone improved code reviews and documentation.


Final Thought

AI coding assistants change how code is produced.

They do not change what reliable systems require.

Production systems are constrained by latency, partial failure, concurrency and time.

The assistant does not experience incidents, on-call or operational consequences.

Engineers do.

The most useful mental model I found is this:

The assistant can generate solutions.

The engineer is still accountable for reality.

And in backend systems, reality is what eventually wins.

Leandro Maia

Notes on Backend Systems and Software Architecture