Token Limits Reveal Organizational Debt

Token limits are usually discussed as a model constraint.

I have started to see them as organizational feedback.

Figure: messy hidden context transformed into a clean shared map.

When an AI session needs an enormous amount of context to make a safe change, that is not only a model problem. It may be a sign that the system is hard to explain, hard to test, or carrying too much implicit knowledge.

In late March, I was pushing through technically dense work: rendering experiments, runtime differences, visual effects, UI systems, and long-running AI sessions. That week had 76 commits touching 352 files. There were moments where the model could help a lot, but only if the relevant history was clear. What had been tried? What failed? Which approach had been rejected? Which runtime behaved differently?

That is where token limits stopped feeling like an annoyance and started feeling like a mirror.

Context Windows Punish Ambiguity

A human team has context windows too. We just call them memory, onboarding, documentation, meeting time, and review bandwidth.

If a senior engineer needs a two-hour explanation before touching a system, the system has context cost. If a new teammate keeps making locally reasonable changes that break global assumptions, the organization has hidden rules. If every bug requires someone to remember why an old experiment failed, the team has poor institutional memory.

AI makes that visible faster.

The model asks, in effect: what do I need to know to make this change safely?

If the answer is “almost everything,” the problem is not only AI.

Legibility Becomes A Competitive Advantage

The more I work this way, the more I value legibility.

Legible code is not just pleasant code. It is code that can be safely operated on by people and tools with limited context.

Legible organizations have the same property. Decisions are written down. Ownership is clear. Important rules have names. Tests describe expectations. APIs carry intent. Workflows are repeatable.

That kind of legibility directly improves AI leverage.

An AI agent can do more useful work when it does not have to infer every unwritten rule. A human reviewer can move faster when the change is bounded. A new session can start cleanly when handoffs are explicit.

This is why I think AI will increase the return on good engineering hygiene.

Messy systems can still use AI. Clean systems will compound with it.

The Hidden Cost Of “Just Put It In The Prompt”

Long prompts can be useful, but they can also hide process debt.

If every AI session needs a long instruction block explaining how not to break the system, some of that instruction probably belongs somewhere more durable.

Maybe it belongs in tests. Maybe in a schema. Maybe in a doc. Maybe in a smaller module boundary. Maybe in a command that does the right thing by default.
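As a minimal sketch of the "maybe it belongs in tests" option (the rule, function, and names here are hypothetical, invented for illustration): an unwritten rule like "all stored timestamps are UTC" can move out of every prompt and into a test that enforces it by default.

```python
from datetime import datetime, timedelta, timezone

def normalize_timestamp(dt: datetime) -> datetime:
    """Convert any datetime to UTC; naive datetimes are treated as UTC."""
    if dt.tzinfo is None:
        return dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)

def test_timestamps_are_always_utc():
    # The rule "storage is UTC-only" now lives here,
    # not in a paragraph of every AI session's prompt.
    eastern = timezone(timedelta(hours=-5))
    local = datetime(2024, 3, 25, 9, 30, tzinfo=eastern)
    stored = normalize_timestamp(local)
    assert stored.tzinfo == timezone.utc
    assert stored.hour == 14  # 9:30 at UTC-5 is 14:30 UTC
```

Once a rule is encoded like this, an AI session that violates it fails fast instead of needing the rule restated each time.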

The point is not to eliminate context. The point is to stop paying the same context cost every time.

In my own workflow, I started moving repeated context into repo rules, build docs, harnesses, and source-of-truth files. That was not busywork. It reduced the cost of every later AI session.

Good systems are token-efficient.

What Leaders Should Do Differently

If you are leading AI adoption, do not only measure tool usage.

Measure context friction.

Where do engineers have to explain the same thing repeatedly? Where do AI sessions make the same class of mistake? Where do reviews get bogged down because the system’s intent is not obvious? Where does one expert have to bless every change because the rules are not encoded anywhere?

Those are the places to invest.

AI does not make context free. It makes context management more important.

The teams that win will reduce the amount of private memory required to do safe work.

The Executive Read

Token limits are not just a cap on AI.

They are a forcing function for clearer systems.

If the model cannot hold the whole organization in its context window, good. Neither can your people. The answer is not to wait for infinite context. The answer is to make the work more modular, more explicit, and easier to verify.

That is not an AI trick.

That is engineering leadership.

This post is licensed under CC BY 4.0 by the author.