MCP routing on a $40 budget

What is Model Context Protocol (MCP)? Model Context Protocol (MCP) is an open standard, released by Anthropic in 2024, that defines how AI agents discover and call external tools and services through a consistent interface. MCP separates tool definitions from the agent runtime, allowing agents to access APIs, databases, and services without custom integration code for each.

TL;DR: MCP works at small scale. At 10+ tools, loading all definitions into every prompt creates a context overhead problem. A semantic routing layer, which embeds the agent's intent and retrieves only the matching tool definitions, cut input tokens by 88% on routine agent calls in our measurements, with no measurable accuracy loss.

Model Context Protocol is a good idea. A standard way for agents to discover and call tools, with a consistent interface regardless of what's on the other side — that's infrastructure worth building on.

But there's a gap between the protocol working and the protocol working at scale. We hit that gap and built Delegate to close it.

The problem with naive MCP adoption

MCP servers describe each tool with a name, a description, and an input schema, and most clients pass all of those definitions to the model, typically alongside the system prompt. That's simple enough for two or three tools. At fourteen, you're loading every tool definition on every call, even when the agent is doing something that needs exactly one of them.
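To make the overhead concrete, here is a rough sketch. The fourteen tools are invented placeholders shaped like MCP tool entries (`name`, `description`, `inputSchema`), and the token estimate is a crude four-characters-per-token heuristic, not a real tokenizer, so treat the numbers as illustrative only.

```python
import json


def make_tool(name: str, description: str) -> dict:
    """Build a hypothetical tool definition shaped like an MCP tool entry."""
    return {
        "name": name,
        "description": description,
        "inputSchema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Free-text input"}
            },
            "required": ["query"],
        },
    }


# Fourteen placeholder tools; real definitions are usually longer than these.
TOOLS = [
    make_tool(f"tool_{i}", f"Hypothetical tool number {i} that does one narrow job.")
    for i in range(14)
]


def estimate_tokens(obj) -> int:
    """Assumption: roughly 4 characters per token. Not a real tokenizer."""
    return len(json.dumps(obj)) // 4


def definition_overhead(tools: list[dict]) -> int:
    """Estimated tokens spent on tool definitions for a single call."""
    return sum(estimate_tokens(tool) for tool in tools)


all_tools = definition_overhead(TOOLS)
one_tool = definition_overhead(TOOLS[:1])
print(f"all 14 definitions: ~{all_tools} tokens per call")
print(f"one definition:     ~{one_tool} tokens per call")
```

The overhead is paid on every call, so it grows with the number of connected tools rather than with the number of tools the agent actually uses.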

This is not a flaw in the protocol. It's a flaw in how most implementations use it.

What a routing layer buys you

Delegate sits between the agent and its connected MCP servers. The agent sends a natural language description of what it needs. Delegate embeds that description, retrieves the closest matching tool definitions from a vector index, and injects only those definitions into the call.
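The embed, retrieve, inject flow can be sketched in a few lines. This is a minimal illustration, not Delegate's actual code: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the list scan stands in for a vector index, and the tool names and the `ToolIndex` class are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToolIndex:
    """Hypothetical in-memory index; a real system would use a vector store."""

    def __init__(self, tools: list[dict]):
        # Embed each tool once, at registration time.
        self.entries = [
            (tool, embed(tool["name"] + " " + tool["description"])) for tool in tools
        ]

    def route(self, intent: str, top_k: int = 1) -> list[dict]:
        """Return the top_k tool definitions closest to the agent's stated intent."""
        query = embed(intent)
        ranked = sorted(
            self.entries, key=lambda entry: cosine(query, entry[1]), reverse=True
        )
        return [tool for tool, _ in ranked[:top_k]]


# Invented example tools.
TOOLS = [
    {"name": "search_issues", "description": "Search issues in the bug tracker by keyword"},
    {"name": "send_email", "description": "Send an email message to a recipient"},
    {"name": "query_database", "description": "Run a read-only SQL query against the database"},
]

index = ToolIndex(TOOLS)
selected = index.route("send an email to the team about the release")
print([tool["name"] for tool in selected])
```

Only the definitions in `selected` would be injected into the model call. With a real embedding model the matching is semantic rather than lexical, but the shape of the step is the same: embed once per tool, embed once per request, rank, keep the top few.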

From the agent's perspective: one MCP server, seemingly unlimited capability, minimal context overhead.

From our measurements: an 88% reduction in input tokens on routine agent calls. The model starts with far more of its context window available for actual reasoning.

What we got wrong in V1

V1 used the LLM to do the routing — we asked it to pick the right tool from a summary list. It worked. It was also expensive and slower than a direct call. We were using a model to avoid using the model's context window, which is circular and wasteful.

V2 moved routing to a dedicated embedding + similarity search step. Same accuracy, about 40x cheaper per routing decision, and fast enough to keep the overall latency delta under 80ms.

The open question: authorization

The hard unsolved problem in MCP at scale is authorization. When an agent calls a tool, what is it allowed to do? On behalf of whom? MCP doesn't specify this — it's left to the implementation.

Delegate handles it at the proxy layer: each agent gets an auth context, and tool calls are filtered against it before proxying to the backend. It's not perfect — there are edge cases with multi-tenant setups — but it's a starting point.
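A proxy-layer check of this kind might look roughly like the sketch below. Everything here is an assumption made for illustration: the `AuthContext` fields, the allow-list model, and the `proxy_call` helper are hypothetical and are not Delegate's actual API. It also does nothing about the multi-tenant edge cases mentioned above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AuthContext:
    """Hypothetical per-agent auth context: who the agent acts for, what it may call."""

    agent_id: str
    on_behalf_of: str
    allowed_tools: frozenset[str] = field(default_factory=frozenset)


class ToolCallDenied(Exception):
    """Raised when a tool call falls outside the agent's auth context."""


def authorize(ctx: AuthContext, tool_name: str) -> None:
    """Reject the call unless the tool is on the agent's allow-list."""
    if tool_name not in ctx.allowed_tools:
        raise ToolCallDenied(f"agent {ctx.agent_id!r} may not call {tool_name!r}")


def proxy_call(ctx: AuthContext, tool_name: str, arguments: dict, backend):
    """Filter the call against the auth context, then forward it to the backend."""
    authorize(ctx, tool_name)
    return backend(tool_name, arguments)


def fake_backend(tool_name: str, arguments: dict) -> dict:
    """Stand-in for a real MCP server; it just echoes the request."""
    return {"tool": tool_name, "arguments": arguments, "status": "ok"}


ctx = AuthContext(
    agent_id="support-bot",
    on_behalf_of="user-123",
    allowed_tools=frozenset({"search_issues"}),
)

result = proxy_call(ctx, "search_issues", {"query": "login bug"}, fake_backend)
print(result["status"])

try:
    proxy_call(ctx, "send_email", {"query": "hello"}, fake_backend)
except ToolCallDenied as err:
    print(f"denied: {err}")
```

Putting the check in the proxy means a denied call never reaches the backend, and the same auth context could also be used to filter which tool definitions the router is allowed to return in the first place.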

Where this is headed

We think the long-term shape of agent tooling looks like Delegate: a semantic routing layer that acts as a single interface for an agent's full capability surface, handling auth, observability, and tool lifecycle management in one place.

We're building toward that. If you're connecting more than a few MCP servers today, it's worth thinking about this architecture now rather than after you've hit the context wall.

Key questions about MCP at scale

Q: What is Model Context Protocol used for? MCP is used to connect AI agents to external tools and services — APIs, databases, file systems, code executors — without writing custom integration code for each. It provides a standard discovery and invocation interface so agents can call any MCP-compliant server in the same way.

Q: What are the limitations of MCP? The main scaling limitation is context overhead: most MCP clients load every tool definition into the model's context. At 10–15 tools, this consumes a significant portion of the context window on every call, even when the agent only needs one tool. Authorization is also underspecified: MCP leaves access control to the implementation.

Q: How do you scale MCP beyond 10 tools? Add a semantic routing layer between the agent and its MCP servers. Embed the agent's request, retrieve the closest-matching tool definitions from a vector index, and inject only those into the prompt. In our implementation (Delegate), this reduces input tokens by 88% on routine agent calls with no measurable accuracy loss.

Q: Does MCP handle authorization? Not natively. MCP specifies how agents call tools, not what they're allowed to do. Authorization must be implemented at the proxy or server layer. Delegate handles this by assigning auth contexts per agent and filtering tool calls against them before proxying to the backend.

Delegate is open-source and in active development. If you're building multi-tool agent systems and running into context overhead or auth problems, we'd be interested in talking.

Andrea Phillips
