You don’t consciously switch between your cerebellum and your prefrontal cortex. You just think. The right subsystem activates for the right task – automatically, invisibly. Your breathing is handled by something that doesn’t ask your permission. You experience “thinking” as a single thing. The brain experiences it as a coordinated system of very different organs, each picked for the job, none of them on a dropdown menu.
So why does every AI app put a dropdown menu on your head?
The chat UI makes you pick a tier before you ask the question. Claude Haiku or Opus. Gemini Flash or Pro. You are effectively asked “how hard should I think?” before you’ve heard what you need to think about. The compute budget is decided in advance, by the person who knows the least about the upcoming task: you.
The result is predictable. You overpay for “thanks” and underpower “debug this pipeline.” The interface inverts the natural economy of thinking.
Why don’t the providers fix this? Because they can’t.
The economics don’t work at their layer. Per-token pricing collapses if the provider auto-downgrades – the $20 monthly cap shrinks their revenue per query to match, and no public company will ship that.
The incentives point the wrong way. Providers make more when you pick expensive tiers. A true auto-router would send “what’s 2+2?” to the cheapest thing in the world. No incumbent will build the switch that downgrades their own sale.
And the context is thin. The provider sees your prompt, maybe a system prompt, maybe a thread. It does not see your terminal state, your domain, your last six sessions on this server, or the procedures that tell it what you’re trying to do. The layer that knows least gets the decision.
Which gives you the observation the whole piece rests on. The application is the only layer with all the context.
An application-layer router knows what kind of turn is happening. “thanks” after a tool execution means “acknowledged, task complete” – not “tell me about gratitude.” Five “no”s in a row mean STOP, not “try harder.” The app knows whether the user is drafting a cron schedule or debugging a kernel panic, because it was just involved in both. It has the procedures the operator is following. It has the server state. It has the history.
With all that context, routing becomes a real decision rather than a guess. A greeting gets the smallest cheapest model. An exploratory question gets a mid-tier model. A multi-step debugging session gets the big model, with procedural context injected and tools attached.
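To make that concrete, here is a minimal sketch of what such a routing decision could look like. Everything in it is an assumption for illustration: the Tier names, the Turn fields, and the ACKS word list are hypothetical, and a real app would use far richer signals than a string match.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    SMALL = "small"   # smallest, cheapest model
    MID = "mid"       # mid-tier model
    LARGE = "large"   # big model, with procedures and tools attached


@dataclass
class Turn:
    text: str
    # Hypothetical app-level signal: is a multi-step debugging task in progress?
    open_debug_session: bool = False


# Hypothetical list of greetings and acknowledgements.
ACKS = {"hi", "hello", "ok", "thanks", "thank you"}


def route(turn: Turn) -> Tier:
    """Pick a tier from application context, not from the prompt alone."""
    msg = turn.text.strip().lower().rstrip("!.")
    if msg in ACKS:
        # A greeting or a "thanks" stays cheap, even in the middle of a debug session.
        return Tier.SMALL
    if turn.open_debug_session:
        return Tier.LARGE
    # Everything else is treated as an exploratory question.
    return Tier.MID
```

The ordering is the design choice here: the acknowledgement check runs first, so a “thanks” at the end of an expensive session does not inherit the expensive model.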
The user doesn’t see any of this. They just talk to the brain.
I spent some time building this, and the surprise was that the gap between cheap and expensive models is not mostly model quality. It is mostly context assembly.
A few hundred characters of prompt to a small model handle a greeting as well as fourteen thousand characters to a large one, because most of those fourteen thousand are irrelevant to a greeting.
Conversely, a small model with the right procedural context and the right tools can handle what looked like “expensive” tasks at a fraction of the cost.
The routing question is not “which model is smarter.”
It is “which model, with which context, is correctly sized for this turn.”
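A rough sketch of context assembly sized to the turn might look like this. The turn kinds, BASE_PROMPT, and the history cut-offs are assumptions chosen for illustration, not measurements from a real system.

```python
# Hypothetical base prompt; a real one would be app-specific.
BASE_PROMPT = "You are an operations assistant."


def assemble_context(kind: str, procedures: list[str], history: list[str]) -> str:
    """Build a prompt that carries only what this kind of turn needs."""
    parts = [BASE_PROMPT]
    if kind == "debug":
        # Multi-step work gets the procedures plus recent history.
        parts += procedures
        parts += history[-6:]
    elif kind == "question":
        # An exploratory question gets a little history and nothing else.
        parts += history[-2:]
    # Any other kind (for example a greeting) gets the base prompt only.
    return "\n".join(parts)
```

A greeting ends up with a prompt one line long, while a debug turn carries the procedures and history with it. The model choice and the context size are decided together.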
And it is not just cheaper. It is more resilient. An application-layer brain can fall back to a local model when the cloud is down, to a procedural step-by-step when even the local model is unavailable, and to a manual handoff when nothing is. Each step is still useful. The cloud stops being load-bearing. The brain stops being a single point of failure.
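The fallback order above can be sketched as a simple chain. The four handlers below are hypothetical stand-ins that simulate an outage; they are not real clients for any provider.

```python
from typing import Callable, Optional, Sequence

Handler = Callable[[str], Optional[str]]


def cloud_model(prompt: str) -> Optional[str]:
    raise ConnectionError("cloud unreachable")  # simulated outage


def local_model(prompt: str) -> Optional[str]:
    return None  # simulated: no local model available


def procedure(prompt: str) -> Optional[str]:
    return "Step 1: check the service logs."


def manual_handoff(prompt: str) -> Optional[str]:
    return "Escalating to a human operator."


def answer(prompt: str, chain: Sequence[Handler]) -> str:
    """Try each handler in order and return the first usable reply."""
    for handler in chain:
        try:
            reply = handler(prompt)
        except (ConnectionError, TimeoutError):
            continue  # this layer is down, fall through to the next one
        if reply is not None:
            return reply
    raise RuntimeError("no handler produced a reply")
```

Called with the chain [cloud_model, local_model, procedure, manual_handoff], the cloud call fails, the local model returns nothing, and the procedural step answers. Each layer degrades the answer rather than removing it.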
So here is the reframe.
The best AI system isn’t the smartest model.
It’s the layer that knows which brain to use for the turn you’re in – without you having to think about it.
Your app shouldn’t ask you to pick a model.
Your cerebellum doesn’t ask your permission to breathe.
The dropdown is the bug.
Bogdan Susala, April 2026
