When AI Multiplies Velocity — and Chaos: Multi-Agent Lessons from a Real Codebase
I’ve been building a game called All Is Vanity in my spare time. This week I tried something ambitious: running eight AI coding agents simultaneously against the codebase.
At first, it felt like unlocking a superpower.
Then everything broke.
And the root cause was a one-line null check.
The Part That Worked
My game is structured so each level lives in its own reasonably isolated set of files. I set up one Claude Code agent per level, each restricted to modifying only its own directory. No shared system edits. No touching other levels.
Because the boundaries were clear, this scaled surprisingly well.
My iteration loop looked like this:
- Build the project
- Load a level in the browser
- Identify what was wrong
- Prompt that specific agent instance to fix or improve it
- Repeat
Multiple levels evolved simultaneously. Commits stayed small. Builds stayed green. It felt like having a small team of junior engineers, each owning their own module.
This was parallelized iteration with hard boundaries, and it was genuinely productive.
Then I Got Greedy
A machine update wiped my open agent windows. I had to restart everything.
Instead of recreating the clean per-level setup, I changed tactics. I created agents organized by task type:
- UI refactor
- Controls overhaul
- Tutorial improvements
- Boss tweaks
- Debug systems
- Bug fixes
- General improvements
Eight agents. Running simultaneously. Modifying overlapping files across the entire codebase.
Commits exploded in size. Everything still “kind of” worked.
Until it didn’t.
The Red Herring
The UI mostly worked fine. Menus loaded. Buttons responded. Animations played.
But when loading a level, the app froze.
That sent me down the wrong path entirely.
I started debugging level loading. Asset pipeline? Scene transitions? Async issues? State corruption from the overlapping agent changes?
Nope.
Here was the actual error:
```
TypeError: Cannot read properties of null (reading 'length')
    at GameProgressManager.getTotalUnlockedLevels (AllIsVanity.js:12708:18)
    at DebugPanel.buildUI (AllIsVanity.js:83101:135)
    at new DebugPanel (AllIsVanity.js:83050:7)
    at Game.init (AllIsVanity.js:11401:22)
```
The crash was happening during debug panel construction – long before level loading logic even ran.
The UI freezing on level load was a red herring.
The real problem? getTotalUnlockedLevels() was reading .length from a value that was still null. Something like:

```javascript
return this.unlockedLevels.length;
```

When unlockedLevels hadn’t been initialized yet.
The fix:
```javascript
return this.unlockedLevels ? this.unlockedLevels.length : 0;
```
One line.
Hours lost.
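For what it’s worth, on modern runtimes the same guard reads more tersely with optional chaining and nullish coalescing. A minimal sketch; the constructor shape is my assumption, only the method and field names come from the stack trace:

```javascript
class GameProgressManager {
  constructor(unlockedLevels = null) {
    // Deliberately allows null, to mirror the bug scenario.
    this.unlockedLevels = unlockedLevels;
  }

  getTotalUnlockedLevels() {
    // ?. short-circuits to undefined when unlockedLevels is null/undefined,
    // and ?? then supplies the 0 default. Same behavior as the ternary fix.
    return this.unlockedLevels?.length ?? 0;
  }
}
```

Either form works; the point is that the guard exists at all.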
Why This Was Harder Than It Should Have Been
If this had been a small, isolated commit, I would have caught it in minutes.
But instead I had:
- Massive commits touching dozens of files
- Cross-system edits from multiple agents
- Agents modifying initialization order without coordinating
- Debug UI intertwined with progress tracking logic
- No clear atomic change to revert to
Reverting became guesswork. The system had drifted too far from a clean, stable baseline.
AI didn’t cause the bug. One of the agents introduced it as a side effect of a larger refactor, and the commit was so big I couldn’t isolate the change.
AI amplified my lack of containment.
The Real Lesson
AI is a multiplier.
If your structure is good, it multiplies velocity. If your structure is loose, it multiplies chaos.
In this case, I violated some fundamentals that I know better than to skip:
- Small, atomic commits
- Clear ownership boundaries
- Build verification after every batch of changes
- Controlled initialization order
As a solo developer, it’s easy to skip ceremony. “I know the whole system.” “I’ll clean it up later.” “It’s fine.”
When AI accelerates output, skipping ceremony becomes dangerous. You can generate dozens of file changes in seconds. But if you can’t trace what changed and why, you’ve traded velocity for drift.
The Containment Model
Here’s the framework I’m now using for multi-agent AI development.
1. Surface Area Ownership
Each agent owns a bounded module. No shared system edits without explicit coordination. The level-scoped approach worked because agents couldn’t step on each other.
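Ownership is easy to enforce mechanically. Here’s a sketch of a pre-commit guard in Node; the directory layout and agent names are hypothetical, not from my actual repo:

```javascript
// Reject any staged file that falls outside the directory an agent owns.
// Returns the verdict plus the offending paths, for a readable error message.
function checkOwnership(agentDir, changedFiles) {
  const violations = changedFiles.filter(
    (file) => !file.startsWith(agentDir + "/")
  );
  return { ok: violations.length === 0, violations };
}

// In a git pre-commit hook you might feed it the staged file list, e.g.:
//   const { execSync } = require("child_process");
//   const files = execSync("git diff --cached --name-only")
//     .toString().trim().split("\n");
//   const { ok, violations } = checkOwnership("levels/level-3", files);
//   if (!ok) { console.error("Out-of-bounds edits:", violations); process.exit(1); }
```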
2. Atomic Change Units
One intention, one commit, build must pass. If a commit touches 40 files across 6 systems, it’s too big. Period.
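This one is also checkable. A rough size gate, with thresholds that are arbitrary examples rather than anything I’ve tuned:

```javascript
// Flag a commit as too big if it touches more than maxFiles files or
// spans more than maxSystems top-level directories ("systems" here is
// just the first path segment, which assumes one directory per system).
function commitTooBig(changedFiles, maxFiles = 15, maxSystems = 2) {
  const systems = new Set(changedFiles.map((f) => f.split("/")[0]));
  return changedFiles.length > maxFiles || systems.size > maxSystems;
}
```

A 40-file, 6-system commit fails this instantly, which is exactly when you want the interruption.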
3. Initialization Contracts
Any system accessed at startup must be non-null, have safe defaults, and fail predictably. AI loves to assume things are initialized. Reality disagrees.
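The contract above can be pushed into the constructor, so violations surface at construction instead of deep inside a getter. A sketch with hypothetical field names; only unlockedLevels comes from my actual bug:

```javascript
class GameProgress {
  constructor({ unlockedLevels = [], currentLevel = 0, saveSlot } = {}) {
    // Fail predictably: required input missing means a loud error now,
    // not a TypeError three subsystems later.
    if (saveSlot === undefined) {
      throw new Error("GameProgress requires a saveSlot");
    }
    this.unlockedLevels = unlockedLevels; // safe default: empty array, never null
    this.currentLevel = currentLevel;     // safe default: 0
    this.saveSlot = saveSlot;
  }

  getTotalUnlockedLevels() {
    // Safe by construction: unlockedLevels is always an array here.
    return this.unlockedLevels.length;
  }
}
```

With this shape, the null check that cost me hours isn’t needed at the call site at all.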
4. Red Herring Awareness
When something breaks after a large batch of AI changes: verify the stack trace first. Confirm execution order. Don’t trust where it looks broken – trust where it actually breaks.
5. Always Preserve a Clean Anchor
Never let the branch drift so far that reverting becomes painful. Tag known-good states. Commit early, commit often.
What This Sprint Actually Produced
Before I sound like I’m just complaining about process, let me be clear: the output this week was real.
All Is Vanity alone saw 111 commits in 9 days – after sitting dormant since 2020. The work included a complete boss combat system rework across 10 worlds, bullet-hell patterns, a 2D overworld map, cinematic death sequences, tutorial system, debug tooling, hero select screen, pause menu overhaul, wind/kite VFX, enemy death gibs, and a full UI polish pass. The game went from version 1.0 to 1.1.82.
But it wasn’t just one project. During this same week, using the same multi-agent workflow across Claude and Codex, I also built:
- Beat Them Up – a Godot 4.6 beat-em-up game from scratch in 4 days. 11 commits. Six new enemy types (Cockroach Ninja, Rat Biker, Mosquito Drone, Trash Panda, Slug Bouncer, Pigeon Bomber). Hub-based district progression, XP system, co-op support, equipment visuals, cutscene intro, 268 passing tests.
- Taxi in Space – a complete space taxi game built in 2 days. 18 commits. Physics-based flying, story mode and gauntlet mode (16 levels), 4 voiced passenger characters, controller support, touch controls, CRT overlay, fuel system, slow-mo death effects, web export. Playable on my site.
- MindSpasm – a modernization of my old Flash-based comic builder. Rebuilt from scratch as React/TypeScript frontend + Node/Express backend with Docker. Character editor with armature/IK posing, comic editor with panels and speech bubbles, SVG asset library, user auth, PostgreSQL database. 9 commits over 4 days.
Four projects. One week. ~150 commits. Two AI tools.
The multi-agent approach works. The containment model is what makes it work safely.
Where I Landed
The worst part of this whole story? The bug wasn’t complex. The architecture wasn’t flawed. The system wasn’t fundamentally broken.
It was a null check.
That’s the paradox of modern AI tooling: we can generate complexity instantly, but we still lose hours to the simplest things – especially when our own process makes them hard to find.
There’s a thread on Hacker News right now debating how to meaningfully evaluate AI tool capabilities. The community keeps circling the same tension: impressive demos prove nothing about genuine productivity. The real test isn’t what the AI can do in isolation – it’s what you actually ship when you use it under real conditions, with real constraints, on real codebases.
I agree. The metric that matters isn’t tokens generated or lines changed. It’s validated features per dollar per week. And that metric is as much about your process as it is about the model.
AI doesn’t replace engineering discipline. It makes discipline the bottleneck.