Controlled Mayhem
Lab open
Work with us
Now shipping - Kodus Legal v0.4 RAG layer Lab note - Memory routing in TaskHive Open source - Hecate primitives v0.2 San Jose, CR - UTC-6 Now shipping - Kodus Legal v0.4 RAG layer Lab note - Memory routing in TaskHive Open source - Hecate primitives v0.2 San Jose, CR - UTC-6
LN/000Lab note - Infrastructure

Why we built Delegate: MCP servers as a context tax

Two months in, our agents had access to fourteen MCP servers. Every prompt was a fat envelope of tool descriptions before the model had even seen the user's question.

InfrastructureMCPSystems

Two months in, our agents had access to fourteen MCP servers. Linear, Slack, Google Calendar, GitHub, Asana, Notion, Stripe, half a dozen internal tools. Every prompt was a fat envelope of tool descriptions before the model had even seen the user's question. We were paying token tax on capability we weren't using on most calls.

MCP solved discovery. It did not solve scale.

Once you connect more than a handful of servers, every agent call shoulders the descriptive cost of every tool. Input prompts inflate, latency creeps up, and the model itself starts to drift because the context is now mostly metadata.

We measured it. Across our internal stack, MCP tool definitions accounted for roughly 60 to 80 percent of input tokens on routine calls. The actual reasoning was a tenant in someone else's apartment.

The thesis: tools should not all show up to the meeting

The agent should describe what it needs. A router should pick the right tool, fetch its definition, and hand it over for that one call. That's it. That's Delegate.

It's a cloud-native MCP proxy that sits between agents and their connected servers. The agent talks to Delegate. Delegate decides which underlying tool can serve the intent, loads only that tool's definition into the call, and proxies the request.

The result, on our own workloads:

  • 88% reduction in input tokens on tool-equipped agent calls
  • Connectable to any MCP-compliant server, no per-server adapter
  • One auth surface for the agent, regardless of how many backends are wired in
  • Stack: Railway for runtime, Supabase for state and auth

Three things we learned in the build

Smart routing is mostly a retrieval problem. We tried letting the model pick the tool itself. It worked, but ate the tokens we were trying to save. Embedding tool descriptions and routing via semantic search was the move — same accuracy, an order of magnitude cheaper.

Latency is more sensitive than people admit. Adding a router adds a hop. We had to ruthlessly cache tool definitions and parallelize the fetch so the agent never waits more than ~80ms longer than a direct connection.

The agent's mental model has to stay clean. The agent shouldn't know there's a router. From inside the call, it looks like one MCP server with a perfect memory of which tools to surface for which request.

Where it sits now

Delegate is in active testing with a small group of AI-forward founders. The 88% number holds across their setups too, with variation depending on how many servers they're connecting. We're working toward a public release and a hosted tier.

If you're running a multi-server MCP stack and want to test it against your own workload, get in touch.

The takeaway

MCP is the right protocol for the agentic era. The way most of us connect it today is not sustainable past a few servers. The fix isn't fewer tools. It's smarter routing.

- Suggested citation

Andrea Phillips. (April 27, 2026). Why we built Delegate: MCP servers as a context tax. Controlled Mayhem - Lab Notes.

AP
- About the author

Andrea Phillips

Senior engineer with deep experience building AI agent infrastructure — persistent memory, multi-agent orchestration, and MCP tooling. Designs and ships production-grade systems that make AI agents reliable, persistent, and genuinely useful. Fifteen years of full-stack and real-time engineering underpinning a focused practice in applied AI.

§02 - Logbook subscription

New notes in your inbox.

Roughly weekly, written when something breaks or surprises us. No marketing, no roundups - just the working notes. Unsubscribe anytime.

→ 1,240 readers · monthly cadence · no list selling