Every time you start a new conversation with an AI agent, it starts from zero. No memory of what you worked on last week. No awareness of decisions you made, preferences you set, or context that took twenty minutes to establish. You rebuild the context every time.
This is the agent memory problem. It's not a research problem anymore — it's an engineering problem, and it has a known shape.
Why context windows don't solve this
The obvious answer is: just put everything in the context window. Modern models support large contexts. Why not use them?
Three reasons. First, cost: large contexts are expensive per call. Sending 100k tokens of history with every agent interaction adds up fast. Second, attention: models don't weight every part of a long context equally, and information buried in the middle of a large context window tends to get less attention than information at the edges. Third, relevance: not all history is relevant to every call. Loading everything loads noise.
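To make the cost point concrete, here is a back-of-the-envelope sketch in Python. Only the 100k-token history size comes from the text above. The price per million tokens, the call volume, and the 2k-token selective context are placeholder assumptions, not figures from any provider or from MegaBrain.

```python
def daily_input_cost(tokens_per_call: int, calls_per_day: int,
                     usd_per_million_tokens: float) -> float:
    """Input-token cost per day, ignoring output tokens and caching discounts."""
    return tokens_per_call * calls_per_day * usd_per_million_tokens / 1_000_000


# Hypothetical numbers, for illustration only.
PRICE = 3.0  # assumed USD per million input tokens
CALLS = 500  # assumed agent calls per day

full_history = daily_input_cost(100_000, CALLS, PRICE)  # whole history every call
selective = daily_input_cost(2_000, CALLS, PRICE)       # assumed recalled context

print(f"full history: ${full_history:.2f}/day, selective: ${selective:.2f}/day")
```

Under these assumptions the gap is a factor of fifty, and it scales linearly with call volume.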
The right architecture isn't a larger window. It's selective recall: surface the right memory for the current context, not all memory.
What a semantic memory layer does
MegaBrain is our answer to this. The idea is straightforward: instead of storing raw conversation history and dumping it into every prompt, you store facts, preferences, and decisions as structured, retrievable records. When an agent starts a session, it queries the memory layer for context relevant to the current task and injects only that.
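As a rough illustration of that flow, the sketch below stores a few structured records, recalls the ones relevant to a task, and injects only those into the prompt. This is not MegaBrain's API: `MemoryRecord`, `recall`, and `build_prompt` are hypothetical names, and plain keyword overlap stands in for real semantic retrieval such as embedding similarity.

```python
from dataclasses import dataclass

# Words too common to signal relevance in this toy scorer.
STOPWORDS = {"the", "for", "in", "to", "a", "of"}


@dataclass
class MemoryRecord:
    kind: str  # "fact", "preference", or "decision"
    text: str


def keywords(text: str) -> set[str]:
    """Lowercased words with edge punctuation and stopwords removed."""
    words = {w.strip(".,:;!?").lower() for w in text.split()}
    return words - STOPWORDS - {""}


def recall(records: list[MemoryRecord], task: str, top_k: int = 2) -> list[MemoryRecord]:
    """Return up to top_k records sharing at least one keyword with the task."""
    task_words = keywords(task)
    scored = [(len(task_words & keywords(r.text)), r) for r in records]
    relevant = [(score, r) for score, r in scored if score > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in relevant[:top_k]]


def build_prompt(records: list[MemoryRecord], task: str) -> str:
    """Inject only the recalled records, not the whole store."""
    lines = [f"- ({r.kind}) {r.text}" for r in recall(records, task)]
    return "Relevant memory:\n" + "\n".join(lines) + f"\n\nTask: {task}"


store = [
    MemoryRecord("preference", "User prefers concise status updates in bullet form"),
    MemoryRecord("decision", "Team decided to use Postgres for the billing service"),
    MemoryRecord("fact", "The staging cluster runs in eu-west-1"),
]
task = "Draft the migration plan for the billing service database"
prompt = build_prompt(store, task)
print(prompt)
```

Only the Postgres decision reaches the prompt here. The other two records stay in the store until a task makes them relevant.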
It's the same insight as Delegate for tools, applied to memory: don't load everything. Load what's needed.
The hard parts
Write-time extraction is the bottleneck. Deciding what to remember is harder than remembering it. Not every exchange contains a durable fact. The extraction layer needs to be good at identifying what's worth persisting — preferences, decisions, important context — and ignoring noise. We're using a small, fast model for extraction to keep cost down.
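One way to picture the extraction step is as a filter with a pluggable decision function. In the sketch below, a crude rule-based function stands in for the small model. The marker phrases and all names are assumptions made for illustration, not how MegaBrain decides what to keep.

```python
from typing import Callable, Optional

# An extractor maps one message to a memory string,
# or None when nothing in it is worth persisting.
Extractor = Callable[[str], Optional[str]]

# Assumed phrases that hint at a durable preference or decision.
DURABLE_MARKERS = ("i prefer", "we decided", "always", "never")


def rule_based_extractor(message: str) -> Optional[str]:
    """Stand-in for a model call: keep messages containing a marker phrase."""
    lowered = message.lower()
    if any(marker in lowered for marker in DURABLE_MARKERS):
        return message.strip()
    return None


def extract_memories(messages: list[str], extractor: Extractor) -> list[str]:
    """Run the extractor at write time and keep only non-None results."""
    kept = []
    for message in messages:
        result = extractor(message)
        if result is not None:
            kept.append(result)
    return kept


conversation = [
    "Thanks, that looks good.",
    "I prefer TypeScript over JavaScript for new services.",
    "Can you rerun the tests?",
    "We decided to ship the beta on Friday.",
]
memories = extract_memories(conversation, rule_based_extractor)
print(memories)
```

Because the extractor is just a callable, the rule-based version can be swapped for a model-backed one without touching the write path.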
Identity and scope matter. Memory needs to be scoped: per-user, per-project, per-agent. A memory that bleeds across contexts is worse than no memory at all. Getting the namespacing right is mostly a data modeling problem.
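A minimal sketch of that data modeling idea: make the scope part of the key, so a read can only ever see what was written under the same user, project, and agent. `Scope` and `ScopedMemory` are hypothetical names, and a real store would sit on a database rather than an in-memory dict.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen makes Scope hashable, so it can be a dict key
class Scope:
    user_id: str
    project_id: str
    agent_id: str


class ScopedMemory:
    def __init__(self) -> None:
        self._by_scope: dict[Scope, list[str]] = defaultdict(list)

    def write(self, scope: Scope, text: str) -> None:
        self._by_scope[scope].append(text)

    def read(self, scope: Scope) -> list[str]:
        # .get avoids creating empty entries for scopes that were never written.
        return list(self._by_scope.get(scope, []))


memory = ScopedMemory()
alice_billing = Scope("alice", "billing", "planner")
alice_search = Scope("alice", "search", "planner")

memory.write(alice_billing, "Use Postgres for the billing service")

print(memory.read(alice_billing))  # the billing decision
print(memory.read(alice_search))   # empty: nothing bleeds across projects
```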
Freshness and forgetting. Some facts have a shelf life. A user's preferred communication style is durable. The current sprint priorities are not. The memory layer needs a way to expire or deprecate facts, not just accumulate them.
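One simple way to model shelf life is an optional time-to-live on each record, checked at read time. The sketch below assumes that approach. The field names and the 14-day sprint window are illustrative, and other policies (explicit deprecation, superseding facts) would work too.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Memory:
    text: str
    created_at: datetime
    ttl: Optional[timedelta] = None  # None means the fact is treated as durable

    def is_live(self, now: datetime) -> bool:
        return self.ttl is None or now < self.created_at + self.ttl


def live_memories(memories: list[Memory], now: datetime) -> list[Memory]:
    """Filter out expired records at read time instead of letting them accumulate."""
    return [m for m in memories if m.is_live(now)]


written = datetime(2024, 1, 1)
memories = [
    Memory("Prefers short, direct replies", written),                       # durable
    Memory("Sprint priority: fix checkout bug", written, timedelta(days=14)),
]

during_sprint = live_memories(memories, datetime(2024, 1, 5))
a_month_later = live_memories(memories, datetime(2024, 2, 1))
print([m.text for m in a_month_later])
```

Passing `now` in explicitly keeps the expiry logic easy to test and avoids hidden dependence on the system clock.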
Where MegaBrain sits now
We're running it internally across our agent stack. It's the reason our Paperclip agents maintain context across sessions. Early results are good — agents with access to MegaBrain context require significantly less re-briefing on repeated tasks.
We're working toward an open-source release. If you're building agents that need persistent context, reach out — we're interested in early testing partners.