AI made code cheap.
Not free. Not correct. Not maintainable. Just cheap.
This is not a small thing. For years, one of software’s plain old bottlenecks was typing the thing into existence. Today a plausible diff can arrive from autocomplete, chat, an agentic IDE, or a small squad of coding agents before your coffee has cooled to a legally defensible temperature.
At first, it looks like productivity because artifacts appear faster: more code, more text, more PRs, more “works on my machine” energy.
But software has never been just text in a repository. The expensive parts orbit the text: understanding the system, choosing trade-offs, checking weird failure modes, preserving intent, and changing the thing safely six months later, when everyone has forgotten why finalFinal2.ts was allowed to survive code review.
AI does not erase those costs. It moves them.
The Diff Is Not the Finish Line
Imagine an agent writes in five minutes what would have taken you an hour. Nice. Very shiny. A tiny productivity banner unfurls somewhere in Jira.
Then the real questions arrive.
Who understands why the solution has this shape? Who checked that it does not break the system in some boring but expensive corner? Who can explain it three months later? Who owns it when the agent has wandered off into another context window? Who pays when a plausible answer turns out to be confidently wrong?
The argument about whether AI can write code is mostly boring now. It can. The interesting part is what becomes scarce next. When generation gets cheaper, the bottleneck moves to review, tests, architecture, documentation of intent, and trust.
Cheap code is not the same thing as cheap software. A generated diff is a starting point. Sometimes it is a very good starting point. But it is not a signed receipt from reality.
Feeling Fast Is a Terrible Benchmark
The early-2025 METR study was useful because it was rude to everyone’s intuition. Sixteen experienced open-source developers worked on 246 real tasks in mature projects they knew well. Before the experiment, they expected AI to speed them up by 24%. Afterward, they still felt like it had sped them up by about 20%.
Measured time told a different story: the AI-assisted tasks took 19% longer.
Do not overread this as “AI makes developers slower” carved into a stone tablet. The setup mattered: early-2025 tools, experienced developers, mature codebases, and a high quality bar. Later METR notes are more cautious and suggest the effect is probably improving, while also becoming harder to measure because developers now dislike being forced to work without AI. Understandable. Once you have a chainsaw, being handed a butter knife feels personal.
The useful lesson is more boring, which usually means more useful: subjective speed is not productivity.
AI can remove blank-page friction. It can offer options, scaffold code, and keep work moving. But the total effect depends on integration, review load, tests, codebase knowledge, task quality, and the skill of the human holding the steering wheel.
“Does AI speed up programming?” is too vague to carry much weight. Ask instead whether it lowers the total cost of changing software over the whole lifecycle.
Debt Learned New Tricks
We already had technical debt: messy abstractions, bugs, security issues, tangled code, and all the other little invoices future-you receives with interest.
AI adds new invoices.
Cognitive debt is what happens when the system grows faster than the team’s shared mental model. Everyone has more code to look at, but not necessarily more understanding. The map gets bigger while the cartographers are in meetings.
Intent debt is code without a reason attached. The goal, constraints, rejected alternatives, and assumptions are trapped in a chat transcript or, worse, were never made explicit in the first place. Future maintainers can read the diff, but they cannot reconstruct why this shape was chosen.
Epistemic debt is subtler: you get a working artifact without building the internal ability to understand, debug, or reproduce it. This is especially dangerous for beginners, but not only for them. Senior people can also outsource the learning loop. We just do it with better variable names.
These labels are not just academic furniture. A paper on cognitive and intent debt gives names to a thing many teams already feel. The paper “Debt Behind the AI Boom” looked at verified AI-authored commits and found hundreds of thousands of static-analysis issues, mostly code smells. Static analysis is a blunt instrument, but the signal is hard to dismiss: cheap generation can push maintenance cost downstream.
Research on novice programming shows the learning version of the same pattern. Unrestricted AI helped participants get functional results faster, but they performed much worse later on maintenance tasks without AI. Scaffolded AI, where learners had to explain back what was happening, reduced that failure.
I don’t think the mechanism is subtle. If AI acts as a contractor, it can finish the task and steal the lesson. If it acts as a tutor, critic, and reviewer, it can strengthen the loop.
I try to treat every task as having two outputs. One is external: code, document, diff, closed ticket. The other is internal: what I can now explain, diagnose, and do again without the tool. AI optimizes the first by default. The second needs protection.
Maintenance Is Where the Bill Arrives
James Shore has a blunt practical test: you need AI that reduces your maintenance costs.
That sounds less exciting than “10x developer”, but it is the part that decides whether the magic survives contact with a real codebase. If AI doubles output but makes every change slightly harder to maintain, the win can disappear quickly. A codebase is already a machine for generating future obligations. Feeding it more code with less understanding is not acceleration. It is a credit card with syntax highlighting.
A good AI workflow cannot stop at code. It has to leave evidence behind.
I want the agent to explain the rationale, name assumptions, and show at least one rejected alternative when the choice is non-obvious. Tests should cover failure modes, not only the happy path where the demo gods smile. Review should ask whether the intention is clear, not only whether the style checker has stopped yelling. Important decisions should land somewhere durable: docs, AGENTS.md, decision notes, task history, or whatever your team actually reads when production starts making eye contact.
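One cheap way to make that evidence non-optional is a CI check on the PR description. The sketch below assumes a team template with level-2 markdown headings; the section names (Rationale, Assumptions, Alternatives, Failure modes, Decision record) are hypothetical and should be renamed to whatever your template actually uses. It only checks that each section exists and is non-empty, which proves presence, not quality.

```python
from __future__ import annotations

import re

# Hypothetical section headings, one per kind of evidence described above.
REQUIRED_SECTIONS = (
    "Rationale",
    "Assumptions",
    "Alternatives",     # may honestly say "none, the choice was obvious"
    "Failure modes",    # which tests cover the unhappy paths
    "Decision record",  # where the durable note lives (docs, AGENTS.md, a decision note)
)

# Matches a markdown level-2 heading such as "## Rationale".
HEADING = re.compile(r"^##\s+(.+?)\s*$")


def missing_sections(pr_body: str) -> list[str]:
    """Return required sections that are absent or left empty in a PR description."""
    sections: dict[str, str] = {}
    current: str | None = None
    for line in pr_body.splitlines():
        match = HEADING.match(line)
        if match:
            current = match.group(1).lower()
            sections[current] = ""
        elif current is not None:
            sections[current] += line.strip()
    # Preserve the template order so the report reads like the checklist.
    return [name for name in REQUIRED_SECTIONS if not sections.get(name.lower())]


body = (
    "## Rationale\nTotals were rounded per line, not per invoice.\n"
    "## Assumptions\n\n"
    "## Failure modes\nAdded tests for zero and negative amounts.\n"
)
print(missing_sections(body))  # ['Assumptions', 'Alternatives', 'Decision record']
```

A CI job could fail the build whenever the returned list is non-empty. The check is deliberately dumb: a reviewer still has to judge whether the rationale is any good, but at least nobody has to ask for it.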
After the merge, the team should understand the system better than before. If the repository changed but the humans learned nothing, you did not get free velocity. You got a deferred invoice.
Open Source Shows Who Pays
Open source makes the cost transfer visible.
When AI-generated issues or PRs become cheap to produce, maintainers pay with attention. Daniel Stenberg’s post about ending the curl bug bounty on HackerOne was not “we hate security reports now”. It was about removing an incentive that produced too much low-value noise. Godot maintainers described a related problem with AI-generated PRs: they can look plausible enough to demand review, while the author may not understand or have tested the change.
That breaks the trust model.
A maintainer is no longer only asking “is this code correct?” They are asking “is there a responsible author behind this diff?” That author needs to understand consequences, respond to feedback, and own the work after the first prompt has cooled down.
The same pattern appears in research on vibe-coding PRs. Less experienced AI-assisted contributors produced larger PRs, got much more review feedback, had lower acceptance rates, and kept PRs open longer. This is not an argument for keeping beginners away from AI. It is a reminder that AI democratizes production much faster than it democratizes accountability.
AI can give someone the ability to generate more code. It does not automatically give them taste, context, restraint, or system-level judgment. Sadly, npm install experience remains unavailable.
The Answer Is a Harness, Not a Ban
I don’t think the solution is to avoid AI. That feels like banning forklifts because warehouses became dangerous.
My preferred answer is boring: make the hidden costs visible.
An agent needs a harness: clear instructions, scoped tools, permissions, small tasks, verifiable outputs, context files, memory, handoffs, and explicit done criteria. Boring? Yes. Also the difference between “assistant” and “junior developer with shell access and suspicious confidence”.
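To make “scoped tools, permissions, and explicit done criteria” concrete, here is a minimal sketch of a deny-by-default policy an orchestrator might consult before executing an agent’s tool call. Everything in it is hypothetical: `AgentTask`, the tool names (`read_file`, `write_file`, `run_tests`, `run_shell`), and the check names are illustrations, not the API of any real agent framework.

```python
from __future__ import annotations

import posixpath
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class AgentTask:
    """Hypothetical task spec: one small goal with explicit limits."""
    goal: str
    allowed_tools: frozenset[str]    # scoped tools the agent may call
    writable_paths: tuple[str, ...]  # directories the agent may modify
    done_criteria: tuple[str, ...]   # named checks that must all pass


def is_permitted(task: AgentTask, tool: str, path: str | None = None) -> bool:
    """Deny by default: the tool must be allowlisted and writes must stay in scope."""
    if tool not in task.allowed_tools:
        return False
    if tool == "write_file":
        if path is None:
            return False
        # Collapse "a/../b" so the agent cannot climb out of its directory.
        # This is purely lexical; it does not resolve symlinks.
        target = PurePosixPath(posixpath.normpath(path))
        return any(target.is_relative_to(p) for p in task.writable_paths)
    return True


def is_done(task: AgentTask, passed_checks: set[str]) -> bool:
    """Done means every explicit criterion passed, not that the agent stopped talking."""
    return all(check in passed_checks for check in task.done_criteria)


task = AgentTask(
    goal="Fix rounding in invoice totals",
    allowed_tools=frozenset({"read_file", "write_file", "run_tests"}),
    writable_paths=("src/billing",),
    done_criteria=("unit_tests", "lint"),
)
print(is_permitted(task, "run_shell"))                                   # False
print(is_permitted(task, "write_file", "src/billing/totals.py"))         # True
print(is_permitted(task, "write_file", "src/billing/../auth/keys.py"))   # False
print(is_done(task, {"unit_tests"}))                                     # False
```

The design choice worth copying is the default: anything not explicitly allowed is refused, and “done” is a set of checks the harness can verify rather than a claim the agent makes about itself. A real harness would also need sandboxing, symlink resolution, and audit logs.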
Context engineering is how this quietly becomes ordinary software engineering. AGENTS.md, CLAUDE.md, local notes, project conventions, test commands, permissions, and tool descriptions are no longer just documentation. They are runtime inputs. The quality of that context directly affects the quality of the work.
Personal notes and project docs stop being an archive. They become a control surface.
Once agents have tools, this also stops being a chat UX problem. A text-only assistant can be wrong in annoying ways. An agent that reads your filesystem, runs shell commands, opens PRs, calls MCP servers, or touches external systems can be wrong in operational ways. Prompt injection is not a funny chatbot trick in that world. It is a vulnerability class.
At some point the agent stops looking like autocomplete and starts looking like an actor in your engineering system. Actors need onboarding, permissions, tests, review, and rollback paths.
The New Expensive Parts
So yes, AI made code cheap. That is real. It is useful. I like it. I use it constantly.
But the keyboard was never the deepest cost center.
The scarce resource is not syntax anymore. It is justified belief: knowing what should exist, checking that it actually works, preserving why it was done, making the next change safe, and keeping enough trust that humans do not spend all day reverse-engineering machine confidence.
Before merging an AI-generated change, I want to know a few things: what problem does this solve, why this approach, what assumptions does it depend on, what evidence proves it works, what can still fail, and whether future-me can recover the intent without reading a 40k-token chat log titled “quick fix”.
For me, that is the test.
Not whether it lowers the cost of typing.
Whether it lowers the total cost of change.