The Whole Codebase in Context: What Changes When a 12M-Token Window Is Cheap Enough to Use
Subquadratic models put a 12-million-token window within affordable reach. For builders, the interesting part isn't the size; it's that "just give the model the whole codebase" stops being a luxury and becomes a default worth designing around.
Large context windows aren't new. Affordable large context windows are. SubQ's subquadratic model offers a 12-million-token window at roughly a fifth the cost of frontier models, and that changes the part of the equation that actually constrained builders: not whether you could fit a big codebase in context, but whether you could afford to do it on every request. When holding the whole codebase costs a fifth of what it used to and runs dramatically faster, "just give the model everything" shifts from an occasional luxury to a default you can design your workflow around.
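The reason "subquadratic" matters is easiest to see with a little arithmetic. Standard attention compares every token with every other token, so its compute grows with the square of the input length. The sketch below compares that with linear growth, roughly the floor for any method that has to read every token. The 100,000-token baseline is an arbitrary assumption for illustration, and the ratios describe the attention term only, not SubQ's pricing or any real bill.

```python
# Illustrative only: how attention compute scales with context length under
# two idealized cost models. Real inference cost also includes terms that
# grow linearly (feed-forward layers, reading the input), so these ratios
# describe the attention term alone, not prices.

def relative_attention_cost(new_len: int, old_len: int, exponent: float) -> float:
    """Cost ratio when attention cost grows as length ** exponent."""
    return (new_len / old_len) ** exponent


if __name__ == "__main__":
    old, new = 100_000, 12_000_000  # hypothetical baseline window vs. a 12M window
    quadratic = relative_attention_cost(new, old, 2.0)  # standard attention
    linear = relative_attention_cost(new, old, 1.0)     # floor for reading every token
    print(f"length grows {new / old:.0f}x")
    print(f"quadratic attention: {quadratic:,.0f}x the compute")
    print(f"linear-scaling attention: {linear:,.0f}x the compute")
```

Running it prints a 120x growth in length against a 14,400x growth in attention compute under quadratic scaling, versus 120x under linear scaling. That gap is the cost a subquadratic design sets out to shrink.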
This matters because so much of the friction in AI-assisted development comes from the model not having enough context. It edits a file without seeing how that file is used elsewhere. It suggests a pattern that contradicts conventions established in a part of the codebase it never saw. It re-implements something that already exists because it didn't know it was there. Most of these failures are context failures, and cheap large context is the most direct fix for them, provided you rebuild your workflow to take advantage of it rather than carry forward habits built around expensive context.
Why Cheap Context Beats Clever Retrieval for Many Tasks
The standard workaround for limited context was retrieval — fetch the relevant pieces and feed only those. Cheap large context changes when that's worth it.
Retrieval can miss what matters. Feeding the model only the slices a retrieval system judged relevant means the model sees what the retriever guessed it needed. When the retriever guesses wrong — misses a dependency, omits a convention, skips the one file that mattered — the model works blind in exactly the spot where it shouldn't. Giving it the whole codebase removes that guessing.
Codebases are densely interconnected. Code's meaning lives in its relationships — how a function is called, what depends on it, what conventions the surrounding modules follow. Those relationships span the codebase, and slicing it for retrieval can sever exactly the connections the model needs to reason correctly. Whole-context reasoning preserves them.
Cheap enough means default, not exception. When large context was expensive, you reserved it for special cases and engineered retrieval for everything else. When it's a fifth the cost, you can make whole-codebase context the normal path and reserve the engineering effort for the genuinely huge cases. The default flips, and the default is where most of the value is.
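Making whole-codebase context the default can be as plain as concatenating the repository into one prompt. The sketch below is a minimal version under stated assumptions: the set of file extensions, the skipped directories, and the `=== FILE: path ===` header are hypothetical choices rather than a format SubQ or any other model requires, and `build_whole_codebase_context` is an illustrative name.

```python
# A minimal sketch of "whole codebase as the default context": walk a repo,
# concatenate every text source file under a path header, and return one
# string to send as context. Extensions, skipped directories, and the header
# format are illustrative assumptions, not a required format.
from pathlib import Path

SOURCE_SUFFIXES = {".py", ".ts", ".go", ".rs", ".java", ".md", ".toml", ".yaml"}  # hypothetical
SKIP_DIRS = {".git", "node_modules", "__pycache__", "dist", "build"}              # hypothetical


def build_whole_codebase_context(root: str) -> str:
    root_path = Path(root)
    parts: list[str] = []
    # Sort so the same repo always yields the same prompt (stable diffs, caching).
    for path in sorted(root_path.rglob("*")):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        rel_parts = path.relative_to(root_path).parts
        if any(part in SKIP_DIRS for part in rel_parts):
            continue  # vendored, generated, or VCS files add noise, not meaning
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # skip files that are not valid UTF-8
        rel = path.relative_to(root_path).as_posix()
        parts.append(f"=== FILE: {rel} ===\n{text}")
    return "\n\n".join(parts)
```

Keeping each file's path next to its contents lets the model connect call sites to definitions by name, and sorting the walk makes the prompt deterministic from run to run.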
What Changes for Builders
Fewer context-failure bugs. The edits that broke something elsewhere, the duplicated implementations, the convention violations: many of these trace to the model not seeing enough. Give it the whole codebase affordably and a class of frustrating, hard-to-explain failures becomes much less common.
Less retrieval machinery to maintain. A lot of AI-dev tooling is elaborate retrieval pipelines built to work around context limits. Cheaper context lets you simplify or remove some of that, eliminating a source of complexity that was also a source of bugs in its own right. Less machinery between you and the model means fewer places for things to go wrong.
Better reasoning about cross-cutting changes. Changes that touch many parts of the system — the ones most likely to introduce inconsistency — benefit most from the model seeing the whole picture. Whole-context reasoning makes the model a more reliable partner exactly where it was least reliable before.
Where to Stay Careful
Capacity isn't comprehension. A model accepting 12 million tokens doesn't guarantee it reasons well over all of them. The old problem of models attending poorly to the middle of long inputs doesn't vanish with a bigger window. Verify the model actually uses the whole codebase well, not just that it accepts it.
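One way to check comprehension rather than capacity is a "needle in a haystack" style probe: plant a fact the model can only answer from the context, at several depths, and see where answers start failing. This is a minimal sketch; `ask_model` is a hypothetical callable standing in for whatever client you actually use, and the needle, question, and depth list are yours to choose.

```python
# A minimal "needle in a haystack" style probe: plant a known fact at several
# relative depths of a large context and build one prompt per depth.
# `ask_model` is a hypothetical stand-in for your real model client.
from typing import Callable


def build_depth_probes(haystack: str, needle: str, question: str,
                       depths: list[float]) -> dict[float, str]:
    probes: dict[float, str] = {}
    for depth in depths:
        if not 0.0 <= depth <= 1.0:
            raise ValueError("depth must be in [0, 1]")
        cut = int(len(haystack) * depth)
        # Snap back to a line start so the needle never splits a line of code
        # (falls back to the very start if there is no earlier newline).
        newline = haystack.rfind("\n", 0, cut)
        cut = newline + 1 if newline != -1 else 0
        context = haystack[:cut] + needle + "\n" + haystack[cut:]
        probes[depth] = f"{context}\n\nQuestion: {question}"
    return probes


def score_probes(probes: dict[float, str], expected: str,
                 ask_model: Callable[[str], str]) -> dict[float, bool]:
    # True where the model's answer contains the planted fact.
    return {depth: expected in ask_model(prompt) for depth, prompt in probes.items()}
```

For a codebase, a distinctive constant or function defined nowhere else makes a good needle. Passing a probe like this shows the model can find a planted fact at every depth; that is necessary for good whole-codebase reasoning, not sufficient.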
Some retrieval was adding focus, not just saving cost. Narrowing the model's context sometimes improved results by removing noise. Dumping the entire codebase in can dilute the model's attention with irrelevant code. The skill is knowing when whole-context helps and when focused context is still better.
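When focused context is the better choice, it can still respect how code is connected instead of slicing by text similarity alone. The sketch below is one illustrative technique, not the only way to focus: it selects the files within a fixed number of import hops of the files a task touches, following edges in both directions. `imports` is a hypothetical precomputed map from each file to the files it imports; building it for a real language is left out.

```python
# A minimal sketch of relationship-preserving focused context: starting from
# the files a task touches, collect every file within `hops` steps of the
# import graph, in both directions (what they import, and what imports them).
# `imports` is a hypothetical precomputed map: file -> files it imports.
from collections import deque


def dependency_neighborhood(imports: dict[str, set[str]], seeds: set[str],
                            hops: int) -> set[str]:
    # Reverse the edges so we can also find importers (callers) of a file.
    imported_by: dict[str, set[str]] = {}
    for src, targets in imports.items():
        for dst in targets:
            imported_by.setdefault(dst, set()).add(src)

    selected = set(seeds)
    frontier = deque((seed, 0) for seed in seeds)
    # Breadth-first search: each file is first reached at its shortest distance.
    while frontier:
        node, dist = frontier.popleft()
        if dist == hops:
            continue
        neighbors = imports.get(node, set()) | imported_by.get(node, set())
        for nxt in neighbors:
            if nxt not in selected:
                selected.add(nxt)
                frontier.append((nxt, dist + 1))
    return selected
```

Raising `hops` trades focus for coverage; at a large enough value it converges on the whole connected part of the codebase.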
New architecture, validate on your work. Subquadratic attention makes different trade-offs than standard quadratic attention. Whether the cost and quality claims hold on your codebase is something only your own testing tells you. Treat it as a reason to evaluate, not to rebuild on faith.
How to Adapt the Workflow
Try whole-context on your context-failure cases. Take the bugs and frustrations that came from the model not seeing enough, and rerun them with the whole codebase in context. The improvement tells you where cheap large context pays off in your specific work.
Simplify retrieval where it only fought cost. Audit your retrieval machinery: which parts existed to save money versus to improve focus? The cost-driven parts are candidates to retire now that context is cheap; the focus-driven parts stay.
Keep focused context for noisy or huge cases. Whole-codebase context isn't always best. For very large codebases or noise-sensitive tasks, focused context still wins. Use the cheap large window as the default and fall back to focus deliberately.
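The "default, then deliberate fallback" rule can be written as a small policy function. This sketch assumes a rough four-characters-per-token estimate (real tokenizers, especially on code, will differ) and a hypothetical 200,000-token reserve for instructions and the model's reply; only the 12-million-token window figure comes from SubQ's offering.

```python
# A minimal sketch of "whole codebase by default, focus deliberately".
# The chars-per-token ratio and the reserve are assumptions to tune; use your
# model's real tokenizer for counts if one is available.

CONTEXT_WINDOW_TOKENS = 12_000_000
RESERVED_TOKENS = 200_000  # hypothetical headroom for instructions and the reply
CHARS_PER_TOKEN = 4        # rough heuristic, not exact for code


def estimate_tokens(text: str) -> int:
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def choose_context(whole_codebase: str, focused: str,
                   noise_sensitive: bool = False) -> tuple[str, str]:
    """Return (strategy_name, context) following the default-then-fallback rule."""
    budget = CONTEXT_WINDOW_TOKENS - RESERVED_TOKENS
    if noise_sensitive:
        return "focused", focused       # task known to suffer from irrelevant code
    if estimate_tokens(whole_codebase) <= budget:
        return "whole", whole_codebase  # the new default
    return "focused", focused           # too big for the window: fall back
```

Making `noise_sensitive` an explicit argument keeps the fallback a decision someone made on purpose, rather than a side effect of the retrieval pipeline.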
Measure quality, not just feasibility. The question isn't whether you can fit the codebase in context — it's whether doing so makes the model's output better on your tasks. Measure the outcome, and let that decide how much whole-context to adopt.
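Measuring the outcome amounts to running the same tasks under each context strategy and comparing how often the result is acceptable. In this minimal sketch, `run_task` and `passes` are hypothetical hooks: the first calls the model with a given strategy, the second judges the output, for example by checking whether the project's tests pass after applying the model's edit. The context-failure cases collected earlier make a natural task set.

```python
# A minimal sketch of "measure quality, not just feasibility": run the same
# tasks under several context strategies and compare pass rates.
# `run_task` and `passes` are hypothetical hooks you supply.
from typing import Callable, Iterable


def compare_strategies(tasks: Iterable[str],
                       strategies: list[str],
                       run_task: Callable[[str, str], str],
                       passes: Callable[[str, str], bool]) -> dict[str, float]:
    task_list = list(tasks)
    if not task_list:
        raise ValueError("need at least one task")
    rates: dict[str, float] = {}
    for strategy in strategies:
        # Count tasks whose output under this strategy is judged acceptable.
        wins = sum(passes(task, run_task(task, strategy)) for task in task_list)
        rates[strategy] = wins / len(task_list)
    return rates
```

With only a handful of tasks, a gap of one or two passes can be noise, so grow the task set before letting a small difference decide how much whole-context to adopt.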
The Default Worth Reconsidering
For years, the right instinct in AI-assisted development was to be frugal with context — retrieve narrowly, compress aggressively, give the model only what it strictly needed. That instinct was a response to cost, and the cost just dropped by a factor of five. The builders who benefit will be the ones who notice that the constraint shaping their workflow has loosened, and who let the model see more of the codebase where seeing more makes it better.
A 12-million-token window doesn't matter because it's big. It matters because it's finally cheap enough to use as a default, and most of the model's failures as a coding partner were failures of not seeing enough. Cheap, broad context is the most direct answer to that — for the builders willing to design around it instead of carrying forward the frugality that expensive context demanded.