Why we built Delegate: MCP servers as a context tax

Two months in, our agents had access to fourteen MCP servers. Every prompt was a fat envelope of tool descriptions before the model had even seen the user's question.

InfrastructureMCPSystems

APAndrea Phillips·AI Engineer & Founder, Controlled Mayhem

04/27/2026Published

6 minRead time

v1.0Version

04/27/2026Last revised

Two months in, our agents had access to fourteen MCP servers. Linear, Slack, Google Calendar, GitHub, Asana, Notion, Stripe, half a dozen internal tools. Every prompt was a fat envelope of tool descriptions before the model had even seen the user's question. We were paying token tax on capability we weren't using on most calls.

MCP solved discovery. It did not solve scale.

Once you connect more than a handful of servers, every agent call shoulders the descriptive cost of every tool. Input prompts inflate, latency creeps up, and the model itself starts to drift because the context is now mostly metadata.

We measured it. Across our internal stack, MCP tool definitions accounted for roughly 60 to 80 percent of input tokens on routine calls. The actual reasoning was a tenant in someone else's apartment.

The thesis: tools should not all show up to the meeting

The agent should describe what it needs. A router should pick the right tool, fetch its definition, and hand it over for that one call. That's it. That's Delegate.

It's a cloud-native MCP proxy that sits between agents and their connected servers. The agent talks to Delegate. Delegate decides which underlying tool can serve the intent, loads only that tool's definition into the call, and proxies the request.

The result, on our own workloads:

88% reduction in input tokens on tool-equipped agent calls
Connectable to any MCP-compliant server, no per-server adapter
One auth surface for the agent, regardless of how many backends are wired in
Stack: Railway for runtime, Supabase for state and auth

Three things we learned in the build

Smart routing is mostly a retrieval problem. We tried letting the model pick the tool itself. It worked, but ate the tokens we were trying to save. Embedding tool descriptions and routing via semantic search was the move — same accuracy, an order of magnitude cheaper.

Latency is more sensitive than people admit. Adding a router adds a hop. We had to ruthlessly cache tool definitions and parallelize the fetch so the agent never waits more than ~80ms longer than a direct connection.

The agent's mental model has to stay clean. The agent shouldn't know there's a router. From inside the call, it looks like one MCP server with a perfect memory of which tools to surface for which request.

Where it sits now

Delegate is in active testing with a small group of AI-forward founders. The 88% number holds across their setups too, with variation depending on how many servers they're connecting. We're working toward a public release and a hosted tier.

If you're running a multi-server MCP stack and want to test it against your own workload, get in touch.

The takeaway

MCP is the right protocol for the agentic era. The way most of us connect it today is not sustainable past a few servers. The fix isn't fewer tools. It's smarter routing.

Why we built Delegate: MCP servers as a context tax

MCP solved discovery. It did not solve scale.

The thesis: tools should not all show up to the meeting

Three things we learned in the build

Where it sits now

The takeaway

- Suggested citation

- About the author

Andrea Phillips

New notes in your inbox.

Why we built Delegate: MCP servers as a context tax

MCP solved discovery. It did not solve scale.

The thesis: tools should not all show up to the meeting

Three things we learned in the build

Where it sits now

The takeaway

- Suggested citation

- About the author

Andrea Phillips

More from the logbook.

Your agents have amnesia. I gave mine a memory.

Beyond SaaS: introducing our paper on Delegate

Hecate is now an internal CLI

New notes in your inbox.