
For about a month this year I let myself believe the dream. Write a spec, hand it to the agent, never read the code. Whether you call it vibe coding or dress it up as spec-driven development, the pitch is the same and it is intoxicating: you describe the what, the model handles the how, and you stay at altitude forever. I run a production MCP server, a PR-reviewer agent, an EdTech platform — I generate a lot of code I never typed. So I wanted it to be true.
It isn't. Or rather, it's true right up until the now-familiar three-month wall, where the codebase you never looked at hardens into spaghetti that neither you nor the AI can safely change. I've stood at that wall. Which is why Matt Pocock's talk — “Why Software Fundamentals Matter More Than Ever” — landed like a confession I'd been avoiding.
Pocock's whole argument is an inversion of the slogan everyone repeats. “Code is cheap now,” we say. His correction: code might be cheap, but bad code is the most expensive it has ever been — because a tangled codebase is precisely the thing that stops you from reaping the leverage AI is supposed to hand you. The model gets lost in your mess too. And his fix isn't a better model or a cleverer prompt. It's the boring stuff: the software fundamentals a lot of us quietly assumed AI had made optional.
He's packaged those fundamentals as a set of Claude Code skills — mattpocock/skills, on skills.sh and GitHub — and the install counts tell their own story. I pressure-tested his five AI failure modes against my own scars. Here's what held up.
Start with the reframe, because everything else hangs off it. The thing AI made cheap is typing. The thing that was always the real asset — and is now scarcer, not more abundant — is a codebase you can still change without fear.
Here's the mechanism people skip. Let an agent build continuously with no design and you don't get neutral code, you get entropy: every change makes the next change harder to reason about. Eventually the system gets complex enough that asking the AI to modify it produces garbage, because the AI is navigating the same maze you are, with less context than you have. The leverage evaporates exactly when you need it most. That's the three-month wall in one sentence: ships in an afternoon, unmaintainable by the quarter.
I learned this the expensive way. An early version of one of my agent systems was a sprawl of cooperating sub-agents — each tiny, each “simple,” generated fast. It demoed beautifully. Then I tried to change one behavior and found it touched nine files and three implicit contracts nobody had written down. I eventually tore the whole thing down to a single agent — a migration that took far longer than the original build. That rewrite was the tax on code I'd never actually designed. “Code is cheap” had quietly billed me for the most expensive month of the project.
So when Pocock says fundamentals matter more now, this is what he means: the faster you can generate code, the more it costs you to generate the wrong code. The five failure modes below are just the specific ways “the wrong code” happens — and the decades-old principle that fixes each one.
The most common waste isn't buggy code. It's perfectly working code that solves a problem you didn't have. It happens because you and the model never shared a design concept — the invisible, agreed-upon picture of what you're building. Worse, the default models are eager to a fault: ask for anything and they'll start emitting a plan before they understand the requirements, because producing output reads as being helpful.
Pocock's fix is gleefully adversarial. His grill-me skill — by his numbers installed over 322,000 times — flips the model from eager assistant to relentless interrogator. Instead of answering you, it interviews you: sometimes 40 questions, sometimes 100, refusing to converge until every branch of the design is resolved. Then you turn that transcript into a PRD. The point isn't the document; it's that you can't survive a 60-question grilling about a feature you haven't actually thought through. It forces the thinking before the typing.
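The skill itself is a prompt, not a program. Still, the rule it enforces is easy to model, and modeling it makes the "refuse to converge" idea concrete. This is a toy sketch of my own. `DesignQuestion`, `openQuestions`, and `readyForPrd` are hypothetical names and do not come from Pocock's repo. The design is a tree of questions. An unanswered question blocks its whole branch, and the PRD only gets drafted once nothing is left open.

```typescript
// Hypothetical model of "refuse to converge until every branch is resolved".
// Not part of mattpocock/skills; the real grill-me skill is a prompt.

interface DesignQuestion {
  question: string;
  answer?: string;             // undefined until the human answers it
  followUps: DesignQuestion[]; // new branches opened by the answer
}

// Walk the tree and collect every question still waiting for an answer.
// An unanswered question blocks its branch, so we don't descend past it.
function openQuestions(q: DesignQuestion): string[] {
  if (q.answer === undefined) return [q.question];
  return q.followUps.flatMap(openQuestions);
}

// The PRD is only drafted once the interview has nothing left to ask.
function readyForPrd(root: DesignQuestion): boolean {
  return openQuestions(root).length === 0;
}

// Usage: one answered question that opened a branch nobody has resolved yet.
const design: DesignQuestion = {
  question: "What does the reviewer post on a pull request?",
  answer: "Findings only, never style chatter.",
  followUps: [{ question: "What is a finding worth posting?", followUps: [] }],
};

console.log(openQuestions(design)); // [ 'What is a finding worth posting?' ]
console.log(readyForPrd(design));   // false
```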
I felt the absence of this every time I skipped it. My PR-reviewer's first version went sideways precisely because I'd never forced the question: what is a finding worth posting? The model and I had different answers, and neither of us had said so out loud. A grilling would have surfaced that in minute three instead of pull request thirty.
Even once you agree on what, you can still spend the whole build speaking slightly different languages. It feels like a developer and a domain expert failing to communicate for lack of shared vocabulary — one says “user,” another says “account,” a third file says “member,” and they all mean the same thing until the day they don't.
This is the oldest fix in the book: Eric Evans called it a ubiquitous language in Domain-Driven Design twenty years ago. Pick one word per concept, write it down, make everyone — now including the model — use it. Pocock's ubiquitous-language skill scans your codebase and emits a Markdown glossary of defined terms. Keep that file open, pass it to the agent, and two things happen: the model burns fewer tokens “thinking” because it isn't reconciling synonyms, and its output stays glued to your plan instead of drifting into its own dialect.
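Here is a minimal sketch of a glossary treated as data, paired with a drift check. It assumes that for each canonical term the team records the synonyms it has agreed to retire. The `Glossary` shape and the `findDrift` function are hypothetical illustrations and not what Pocock's skill emits, since his produces a Markdown file. The terms reuse the user/account/member example.

```typescript
// Hypothetical glossary format: canonical term -> definition + retired synonyms.
// Pocock's ubiquitous-language skill emits Markdown; this shape is an assumption.
type Glossary = Record<string, { definition: string; avoid: string[] }>;

const glossary: Glossary = {
  user: {
    definition: "A person who signs in and owns a workspace.",
    avoid: ["account", "member"],
  },
};

interface Drift {
  line: number;      // 1-based line number where a retired synonym appears
  found: string;     // the retired synonym
  preferred: string; // the canonical term to use instead
}

// Report every whole-word, case-insensitive use of a retired synonym.
// Assumes synonyms are plain words (no regex metacharacters).
// Note: \b treats camelCase as one word, so "memberId" is not flagged.
function findDrift(source: string, terms: Glossary): Drift[] {
  const drift: Drift[] = [];
  source.split("\n").forEach((text, i) => {
    for (const [preferred, entry] of Object.entries(terms)) {
      for (const synonym of entry.avoid) {
        if (new RegExp(`\\b${synonym}\\b`, "i").test(text)) {
          drift.push({ line: i + 1, found: synonym, preferred });
        }
      }
    }
  });
  return drift;
}

// Usage: the second line slipped back into the old vocabulary.
const snippet = "const user = loadUser(id);\n// email the member a receipt";
console.log(findDrift(snippet, glossary));
// [ { line: 2, found: 'member', preferred: 'user' } ]
```

Handing the agent the glossary gives it the vocabulary. A check like this tells you when the code has drifted away from that vocabulary anyway.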
My reviewer blurred “finding,” “comment,” and “issue” until the day the distinction actually mattered, and the fuzziness leaked into both the prompts and the code. A glossary I should have written on day one would have cost ten minutes and saved a refactor.
Give an agent good feedback loops — a typechecker, a browser, a test suite — and it will still floor it. The characteristic AI failure is writing an enormous amount of code all at once and only then checking whether any of it works. It outruns its headlights: generating faster than it can verify, so errors compound three layers deep before the first signal comes back.
The throttle is decades old, and it's called test-driven development. Kent Beck's red-green-refactor loop forces one deliberate step at a time: write the failing test, make it pass, clean up, repeat. Pointed at an agent, TDD does something subtle and important: it caps the model's speed at the rate of your feedback loop. The agent literally cannot outrun its headlights if it has to produce a passing test before each increment. Pocock leans on this hard. It's also the one fundamental I'd already adopted on my own, because my reviewer taught me the hard way: the version that wrote tests first was the first version I trusted to touch its own logic.
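Here is red-green-refactor in miniature, using my reviewer's old sore spot as the domain. Everything in it is hypothetical: the `Finding` shape, the severity scale, and the rule that only verified findings of severity major or above get posted. A tiny `expectEqual` helper stands in for a real test runner so that the sketch runs on its own.

```typescript
// Red-green-refactor in miniature. The domain (which review findings are
// worth posting) and every name below are hypothetical.

type Severity = "nit" | "minor" | "major" | "blocker";

interface Finding {
  severity: Severity;
  verified: boolean; // did a second verification pass confirm it?
}

// A tiny assertion helper standing in for a real test runner.
function expectEqual<T>(actual: T, expected: T, label: string): void {
  if (actual !== expected) {
    throw new Error(`${label}: expected ${String(expected)}, got ${String(actual)}`);
  }
}

// Step 1 (red): write the tests first. Against a missing or stubbed
// implementation they fail, and that failure is the signal the agent waits for.
function testIsWorthPosting(): void {
  expectEqual(isWorthPosting({ severity: "blocker", verified: true }), true, "verified blocker");
  expectEqual(isWorthPosting({ severity: "nit", verified: true }), false, "nits stay quiet");
  expectEqual(isWorthPosting({ severity: "major", verified: false }), false, "unverified finding");
}

// Step 2 (green): the smallest implementation that passes.
// Step 3 (refactor): ranking severities in one table keeps the rule readable,
// and the tests above guard the cleanup.
const RANK: Record<Severity, number> = { nit: 0, minor: 1, major: 2, blocker: 3 };

function isWorthPosting(f: Finding): boolean {
  return f.verified && RANK[f.severity] >= RANK.major;
}

testIsWorthPosting(); // throws if any expectation fails; silent on success
```

The agent can't move to the next increment until `testIsWorthPosting` runs clean. That requirement is the speed cap.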
Here's the one I underrated. Left alone, AI doesn't build clean code — it builds lots of code. Specifically, it loves what John Ousterhout, in A Philosophy of Software Design, calls shallow modules: many small units whose interface is nearly as complicated as the implementation they hide. Each looks tidy in isolation. Collectively they're a hairball, and the AI chokes on them — it can't explore the system, gets lost tracing dependencies, and loses the plot of what the whole thing does.
The antidote is Ousterhout's deep modules: fewer, larger modules that hide a great deal of complexity behind a small, clean interface.
```typescript
// Shallow: the caller orchestrates everything.
// The interface is as complex as the work.
const conn = openConnection(cfg);
const tx = beginTransaction(conn);
const stmt = prepare(tx, sql);
bindParams(stmt, params);
const rows = execute(stmt);
commit(tx);
closeConnection(conn);
```

```typescript
// Deep: one obvious door, a world of complexity behind it.
const rows = await db.query(sql, params);
```

The second version isn't “less code that does less.” It does more (pooling, transactions, retries, cleanup) and exposes almost none of it. That asymmetry is the whole game: maximize the functionality hidden, minimize the interface exposed. Pocock's improve-codebase-architecture skill is built to drive exactly this. It sends the agent to explore the codebase and wrap scattered, related functions into deep, independently testable boundaries.
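What might sit behind that one `db.query` door? Here is a minimal sketch. It assumes a hypothetical synchronous `Driver` that exposes the same shallow steps as the orchestrated version: connect, begin, run, commit, and close. The `Db` class and its pooling and retry policy are my own illustration, not any real database library's API. The point is only how much can hide behind `query(sql, params)`.

```typescript
// Hypothetical low-level driver: the shallow, many-step interface.
interface Connection { close(): void }
interface Transaction { commit(): void; rollback(): void }
interface Driver {
  connect(): Connection;
  begin(conn: Connection): Transaction;
  run(tx: Transaction, sql: string, params: unknown[]): unknown[];
}

// The deep module: one small method hiding connection reuse, transactions,
// rollback on failure, and a bounded retry.
class Db {
  private readonly idle: Connection[] = []; // minimal connection pool
  private readonly driver: Driver;
  private readonly maxAttempts: number;

  constructor(driver: Driver, maxAttempts = 3) {
    this.driver = driver;
    this.maxAttempts = maxAttempts;
  }

  async query(sql: string, params: unknown[] = []): Promise<unknown[]> {
    let lastError: unknown;
    for (let attempt = 1; attempt <= this.maxAttempts; attempt++) {
      const conn = this.idle.pop() ?? this.driver.connect();
      let tx: Transaction | undefined;
      try {
        tx = this.driver.begin(conn);
        const rows = this.driver.run(tx, sql, params);
        tx.commit();
        this.idle.push(conn); // healthy connection goes back to the pool
        return rows;
      } catch (err) {
        tx?.rollback();
        conn.close(); // drop a connection that just failed
        lastError = err;
      }
    }
    throw lastError;
  }
}

// Usage with a fake in-memory driver whose first run() fails once.
let runs = 0;
const fakeDriver: Driver = {
  connect: () => ({ close() {} }),
  begin: () => ({ commit() {}, rollback() {} }),
  run: (_tx, sql, params) => {
    runs++;
    if (runs === 1) throw new Error("transient failure");
    return [{ sql, params }];
  },
};

const db = new Db(fakeDriver);
db.query("SELECT * FROM users WHERE id = $1", [42])
  .then((rows) => console.log(rows)); // succeeds on the second attempt
```

Notice what the caller never sees: the pool, the rollback, and the retry count. Changing any of them touches one class, and neither you nor the agent has to re-read every call site to make the change.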
This is, in retrospect, exactly what my multi-agent-to-single-agent migration was. Those sub-agents were shallow modules with a tangle of implicit contracts between them. Collapsing them behind a single clean entry point wasn't dumbing it down — it was burying the complexity where it belonged, behind a door the model could actually open.
The last failure mode is the one nobody warns you about, because it shows up even when everything is going right. The AI is shipping, the tests are green, features are landing — and you are exhausted in a way you've never been before. Pocock names it directly: developers are burning out trying to hold all this freshly generated context in their heads. You didn't write the code, so you have no muscle memory for it, yet you're still on the hook for understanding all of it. That's a new and brutal kind of fatigue.
Deep modules are the cure here too, and this is the part that reframed how I work. Once your system is a set of deep modules, you can treat each one as a gray box: design its interface from the outside and deliberately stop caring about the inside. You become the strategic programmer — the one who decides what the boundaries are and how they fit together — and you let the AI be the tactical programmer handling the messy implementation behind the interface you specified.
```typescript
// You own this: the strategic surface. Small, stable, yours.
export interface ReviewEngine {
  review(pr: PullRequest): Promise<Finding[]>;
}

// The AI owns the inside: grounding, the verification pass,
// severity gating, token budgeting. A gray box you don't hold in your head.
```

This is how I now keep my MCP server in my head: by not keeping most of it in my head. I spend my limited attention on the interfaces and let the implementations be someone else's problem, where “someone else” is the model. The energy I save by designing instead of memorizing is the difference between shipping for a week and shipping for a year.
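Here is what that split might look like from end to end. The `PullRequest` and `Finding` shapes, the `createReviewEngine` factory, its `minSeverity` option, and the TODO-flagging heuristic are all hypothetical stand-ins. In a real reviewer, the heuristic is where grounding and model calls would go. Only the `ReviewEngine` interface comes from the article.

```typescript
// Hypothetical domain types; the article does not define these.
interface PullRequest { title: string; diff: string }
interface Finding { line: number; message: string; severity: "minor" | "major" }

// The strategic surface: small, stable, owned by you.
export interface ReviewEngine {
  review(pr: PullRequest): Promise<Finding[]>;
}

// One possible gray-box implementation. Everything inside can be regenerated
// by the agent as long as the ReviewEngine contract still holds.
function createReviewEngine(minSeverity: Finding["severity"] = "major"): ReviewEngine {
  const rank: Record<Finding["severity"], number> = { minor: 0, major: 1 };

  // Toy heuristic standing in for model calls: flag leftover TODOs.
  function draftFindings(pr: PullRequest): Finding[] {
    const findings: Finding[] = [];
    pr.diff.split("\n").forEach((text, i) => {
      if (text.includes("TODO")) {
        findings.push({ line: i + 1, message: "Unresolved TODO", severity: "major" });
      }
    });
    return findings;
  }

  return {
    async review(pr) {
      // Severity gating happens inside the box, invisible to callers.
      return draftFindings(pr).filter((f) => rank[f.severity] >= rank[minSeverity]);
    },
  };
}

// Callers depend only on the interface, never on the internals.
async function gate(engine: ReviewEngine, pr: PullRequest): Promise<boolean> {
  return (await engine.review(pr)).length === 0;
}

// Usage: a diff with a leftover TODO fails the gate.
gate(createReviewEngine(), { title: "Add cache", diff: "const x = 1;\n// TODO: evict" })
  .then((ok) => console.log(ok ? "ship it" : "needs work")); // logs "needs work"
```

The design choice is that `gate` compiles against `ReviewEngine` alone. The agent can rewrite everything inside `createReviewEngine`, and the rest of the system never notices.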
Step back and the pattern is almost funny. Ubiquitous language. Test-driven development. Deep modules. A real design before you build. None of these are new — they're decades old, the kind of thing dismissed as “stuff that didn't break.” Pocock's point, and mine after a year of shipping AI code, is that they didn't break and that's exactly why they matter more now. The thing that didn't break is the thing you can lean your whole weight on when everything around it is moving fast.
It almost doesn't matter whether you're Team Vibe Coding or Team Spec-Driven Development. Both camps share the same failure the moment they stop reading the code, and both are rescued by the same fundamentals. That's the tell that Pocock is pointing at something real: his skills aren't a methodology you buy into, they're old engineering wisdom wearing new prompts. grill-me is a design review. ubiquitous-language is a glossary. improve-codebase-architecture is a refactor. The hundreds of thousands of people who installed the first one are hundreds of thousands of people who, like me, learned that the agent doesn't replace the discipline — it raises the price of not having it.
So here's my honest verdict. Go install the skills from skills.sh/mattpocock/skills or the mattpocock/skills repo; they're a genuinely good on-ramp. But install the fundamentals first. Let the AI do the typing — it's faster and better at it than you'll ever be. Just don't let it do the thinking about structure, naming, and design, because that's the part that was never cheap, and in the age of cheap code it's the only part left worth your name.