Two months in, our agents had access to fourteen MCP servers. Linear, Slack, Google Calendar, GitHub, Asana, Notion, Stripe, half a dozen internal tools. Every prompt was a fat envelope of tool descriptions before the model had even seen the user's question. We were paying token tax on capability we weren't using on most calls.
MCP solved discovery. It did not solve scale.
Once you connect more than a handful of servers, every agent call shoulders the descriptive cost of every tool. Input prompts inflate, latency creeps up, and the model itself starts to drift because the context is now mostly metadata.
We measured it. Across our internal stack, MCP tool definitions accounted for roughly 60 to 80 percent of input tokens on routine calls. The actual reasoning was a tenant in someone else's apartment.
The thesis: tools should not all show up to the meeting
The agent should describe what it needs. A router should pick the right tool, fetch its definition, and hand it over for that one call. That's it. That's Delegate.
It's a cloud-native MCP proxy that sits between agents and their connected servers. The agent talks to Delegate. Delegate decides which underlying tool can serve the intent, loads only that tool's definition into the call, and proxies the request.
The result, on our own workloads:
- 88% reduction in input tokens on tool-equipped agent calls
- Connectable to any MCP-compliant server, no per-server adapter
- One auth surface for the agent, regardless of how many backends are wired in
- Stack: Railway for runtime, Supabase for state and auth
Three things we learned in the build
Smart routing is mostly a retrieval problem. We tried letting the model pick the tool itself. It worked, but ate the tokens we were trying to save. Embedding tool descriptions and routing via semantic search was the move — same accuracy, an order of magnitude cheaper.
Latency is more sensitive than people admit. Adding a router adds a hop. We had to ruthlessly cache tool definitions and parallelize the fetch so the agent never waits more than ~80ms longer than a direct connection.
The agent's mental model has to stay clean. The agent shouldn't know there's a router. From inside the call, it looks like one MCP server with a perfect memory of which tools to surface for which request.
Where it sits now
Delegate is in active testing with a small group of AI-forward founders. The 88% number holds across their setups too, with variation depending on how many servers they're connecting. We're working toward a public release and a hosted tier.
If you're running a multi-server MCP stack and want to test it against your own workload, get in touch.
The takeaway
MCP is the right protocol for the agentic era. The way most of us connect it today is not sustainable past a few servers. The fix isn't fewer tools. It's smarter routing.