What is spreading activation?

Spreading activation is an idea from cognitive psychology: memory is a graph of concepts, and activating one concept spreads activation to connected concepts with decay over distance. Hearing 'doctor' surfaces 'nurse' and 'hospital' without conscious recall. Evermind applies that mechanism to a language model's memory store.

What did the Evermind paper find?

At production scale, spreading activation barely helps. Aggressive spreading significantly degrades retrieval (ΔF1 = −0.017). Minimal spreading produces a borderline-significant positive effect on semantic queries (ΔF1 = +0.006). Entity-grounded queries are completely unaffected. The mechanism works as designed, but the marginal benefit does not justify the engineering cost on a corpus with adequate baseline retrieval.

Is a negative result worth publishing?

Yes. The paper resolves a question that has been hard to answer at meaningful scale: it runs 1,000 retrieval scenarios over a 100,000-chunk legal corpus with bootstrap confidence intervals. A clean, statistically powered null result tells production teams not to spend engineering effort on a mechanism that will not move their numbers.

Should production retrieval systems adopt spreading activation?

The paper recommends against it as a next improvement. Hybrid graph signals — entity co-occurrence and citation links — and investment in query understanding and reranking are more promising directions. If you do adopt spreading, the paper recommends a minimal decay setting and gating it to low-confidence queries only.

Evermind: what happened when we tested associative memory at scale

TL;DR: We published a paper testing a cognitively-inspired memory architecture — Evermind — at production scale. The honest result: spreading activation over an embedding-derived memory graph has a small, possibly-positive ceiling that we have now characterized empirically. It is not the retrieval improvement we hoped for. We publish the null result anyway, because a clean answer at scale is worth more than a hopeful one.

Human memory does not wait for a query. Hearing "doctor" surfaces "nurse" and "hospital" without any conscious recall effort. Every memory system we ship for language models works the other way: it waits for an explicit trigger — a user query, an LLM decision, a task prediction — before it retrieves anything.

The paper, "Evermind: Context-Triggered Spreading Activation Memory for Large Language Models" (Daniel Phillips, Controlled Mayhem, May 2026), asks whether closing that gap actually helps.

The question the paper asks

Evermind is a memory architecture that combines two ideas: context-triggered retrieval, which surfaces relevant memories before the model generates rather than after it asks, and spreading activation over a weighted memory graph, so that a strongly-matching memory makes its neighbours slightly more likely to surface too.
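As a minimal sketch of the second idea, the snippet below runs one hop of spreading over a small weighted graph. The update rule, the name `spread_once`, and the reading of γ as the share of a memory's final score that comes from its own direct match are assumptions made for illustration, not the paper's implementation. They were chosen only so that γ close to 1 means minimal spreading, which matches how the settings are described in this post.

```python
from typing import Dict

Graph = Dict[str, Dict[str, float]]  # node -> {neighbour: edge weight}


def spread_once(direct: Dict[str, float], graph: Graph, gamma: float) -> Dict[str, float]:
    """One hop of spreading activation (illustrative update rule, not the paper's).

    final[i] = gamma * direct[i] + (1 - gamma) * weighted mean of neighbours' direct scores
    """
    final: Dict[str, float] = {}
    for node, score in direct.items():
        neighbours = graph.get(node, {})
        total_weight = sum(neighbours.values())
        if total_weight > 0:
            incoming = sum(w * direct.get(nb, 0.0) for nb, w in neighbours.items()) / total_weight
        else:
            incoming = score  # isolated node: nothing to receive, so its score is unchanged
        final[node] = gamma * score + (1.0 - gamma) * incoming
    return final


# Toy data (invented): "nurse" matches the context weakly but is linked to "doctor".
direct_scores = {"doctor": 0.9, "nurse": 0.2, "invoice": 0.25}
memory_graph: Graph = {"doctor": {"nurse": 1.0}, "nurse": {"doctor": 1.0}, "invoice": {}}

for gamma in (0.95, 0.70):
    scores = spread_once(direct_scores, memory_graph, gamma)
    ranking = sorted(scores, key=scores.get, reverse=True)
    print(gamma, ranking)
```

In this toy example the lower γ lifts "nurse" above the unrelated "invoice", while the higher γ leaves the baseline order intact. That is the intended effect of spreading, and also the way it can displace a better direct match.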

The question is empirical, not architectural. Spreading activation is a natural candidate for improving conceptual retrieval — but how much does it actually help, on a real corpus, at real scale? If the answer is "meaningfully," production retrieval systems should adopt it. If the answer is "negligibly," they should not.

This work was motivated by a concrete operational need. The author operates Kodus, a Spanish-language legal-intelligence platform indexing over four million chunks of case law across five Costa Rican and Guatemalan legal corpora. Conceptual semantic queries there work passably, but users report missing relevant documents when their phrasing diverges from the source text. Spreading activation looked like a fix worth testing properly.

What we found

We tested the spreading hypothesis at scale: 1,000 retrieval scenarios over a 100,000-chunk subset of the Kodus corpus, with bootstrap 95% confidence intervals over per-scenario F1. To our knowledge this is the largest single-corpus evaluation of spreading activation on contemporary LLM-augmented retrieval.
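The interval construction can be sketched as a percentile bootstrap over paired per-scenario differences. This is an illustration under assumptions: the function name `bootstrap_delta_ci`, the number of resamples, and the percentile method are choices made for the example, and the paper's exact procedure may differ. The toy scores are invented and are not the paper's data.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple


def bootstrap_delta_ci(
    baseline_f1: Sequence[float],
    treated_f1: Sequence[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (mean delta, lower bound, upper bound) for paired per-scenario F1."""
    if len(baseline_f1) != len(treated_f1) or not baseline_f1:
        raise ValueError("need two non-empty score lists of equal length")
    rng = random.Random(seed)
    # Pair each scenario with itself so per-query difficulty cancels out.
    deltas = [t - b for b, t in zip(baseline_f1, treated_f1)]
    n = len(deltas)
    resampled_means: List[float] = []
    for _ in range(n_resamples):
        # Resample scenarios with replacement and record the mean delta.
        sample = [deltas[rng.randrange(n)] for _ in range(n)]
        resampled_means.append(mean(sample))
    resampled_means.sort()
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return mean(deltas), resampled_means[lo_idx], resampled_means[hi_idx]


# Toy scores (invented) for six scenarios with and without spreading.
baseline = [0.40, 0.50, 0.00, 1.00, 0.67, 0.50]
with_spreading = [0.50, 0.50, 0.00, 0.80, 0.67, 0.67]
delta, lower, upper = bootstrap_delta_ci(baseline, with_spreading)
print(f"delta={delta:+.3f}, 95% CI [{lower:+.3f}, {upper:+.3f}]")
```

An interval that includes zero means the data cannot rule out "no effect" at the chosen level, which is why the sign of the lower bound matters more than the point estimate.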

The results are clean, and they are mostly negative:

Aggressive spreading significantly degrades retrieval. At a decay setting of γ = 0.70, ΔF1 = −0.017 (95% CI [−0.027, −0.007]). Systems that switch spreading on naïvely should expect their retrieval to get worse.
Minimal spreading is borderline beneficial. At γ = 0.95, ΔF1 = +0.006 (95% CI [−0.0004, +0.0132]). The point estimate is positive, but the interval narrowly includes zero, and the effect is small enough that the practical case is weak.
Entity-grounded queries are immune. For all 300 queries referencing specific laws, articles, or IDs, no value of γ changes the top-3 retrieval set at all. Spreading activation is categorically inert for "find-all" lookups.
The ceiling is robust. A graph-topology ablation that replaces the mutual k-NN graph with an open k-NN graph, eliminating a 30% isolated-node rate, does not break the +0.006 ceiling. The bottleneck is similarity geometry, not graph density.
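To make the topology ablation concrete, here is a small sketch of two k-NN graph constructions and of how an isolated-node rate can be measured. It assumes cosine similarity over toy vectors, and it assumes that "open" k-NN keeps an edge when either endpoint lists the other among its k nearest neighbours while mutual k-NN requires both. The function names are hypothetical and the paper's construction may differ in detail.

```python
import math
from typing import Dict, List, Sequence, Set

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_lists(vectors: Sequence[Vector], k: int) -> List[Set[int]]:
    """For each vector, the indices of its k most similar other vectors."""
    nearest: List[Set[int]] = []
    for i, v in enumerate(vectors):
        others = [j for j in range(len(vectors)) if j != i]
        others.sort(key=lambda j: cosine(v, vectors[j]), reverse=True)
        nearest.append(set(others[:k]))
    return nearest


def build_graph(vectors: Sequence[Vector], k: int, mutual: bool) -> Dict[int, Set[int]]:
    """Undirected k-NN graph; mutual=True keeps only reciprocated edges."""
    nearest = knn_lists(vectors, k)
    graph: Dict[int, Set[int]] = {i: set() for i in range(len(vectors))}
    for i, neighbours in enumerate(nearest):
        for j in neighbours:
            if not mutual or i in nearest[j]:
                graph[i].add(j)
                graph[j].add(i)
    return graph


def isolated_rate(graph: Dict[int, Set[int]]) -> float:
    """Fraction of nodes with no edges at all."""
    return sum(1 for nbrs in graph.values() if not nbrs) / len(graph)


# Toy vectors (invented): the third one is an outlier that nobody picks as nearest.
toy_vectors = [(1.0, 0.0), (1.0, 0.1), (0.0, 1.0)]
print("mutual:", isolated_rate(build_graph(toy_vectors, k=1, mutual=True)))
print("open:  ", isolated_rate(build_graph(toy_vectors, k=1, mutual=False)))
```

Under these definitions the mutual graph leaves the outlier with no neighbours, so spreading can never reach it, while the open graph connects every node that has at least one nearest neighbour.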

Per-scenario, 5.1% of semantic queries improve and 3.5% degrade. The mechanism works exactly as designed. It just does not move the needle far enough to justify building, maintaining, and serving a k-NN memory graph in production.
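The per-scenario comparison can be sketched as follows. The snippet computes a set-based F1 over a top-3 retrieval list and tallies which scenarios improve, degrade, or stay unchanged. The metric definition (precision and recall over the retrieved top-k set), the function names, and the toy data are assumptions for illustration; the paper's scoring may differ.

```python
from typing import Dict, Sequence, Set


def f1_at_k(retrieved: Sequence[str], relevant: Set[str], k: int = 3) -> float:
    """Set-based F1 between the top-k retrieved ids and the relevant ids."""
    top = set(retrieved[:k])
    hits = len(top & relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(top)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


def breakdown(baseline_f1: Sequence[float], treated_f1: Sequence[float], tol: float = 1e-12) -> Dict[str, float]:
    """Share of scenarios whose F1 improved, degraded, or stayed the same."""
    if len(baseline_f1) != len(treated_f1) or not baseline_f1:
        raise ValueError("need two non-empty score lists of equal length")
    counts = {"improved": 0, "degraded": 0, "unchanged": 0}
    for base, treated in zip(baseline_f1, treated_f1):
        if treated - base > tol:
            counts["improved"] += 1
        elif base - treated > tol:
            counts["degraded"] += 1
        else:
            counts["unchanged"] += 1
    return {label: n / len(baseline_f1) for label, n in counts.items()}


# Toy scenario (invented ids): one of the top-3 results is relevant, two relevant docs exist.
print(f1_at_k(["a", "b", "c"], {"a", "d"}))
print(breakdown([0.4, 0.5, 0.0, 1.0], [0.5, 0.5, 0.0, 0.8]))
```

Because a metric like this depends only on the top-3 set, any query whose top-3 set is unchanged by spreading contributes a difference of exactly zero.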

Why we published a null result

The honest scientific contribution of this paper is the negative result with strong bounds. Spreading activation on embedding-derived memory graphs has a small, possibly-positive ceiling that nobody had characterized at this scale before.

That is worth publishing. A vague "it might help" sends production teams down a months-long engineering path. A clean, statistically powered "+0.006, here are the confidence intervals" tells them not to. The paper is deliberately readable as either a cautiously positive result (minimal spreading is safe and marginally beneficial on semantic queries) or a negative one (the benefit is too small to deploy) — both readings are defensible from the data, and the paper presents them honestly rather than picking the flattering one.

What we recommend instead

For Kodus and similar production legal-intelligence systems, the paper makes four recommendations:

Do not adopt spreading activation as the next improvement. The change is below the noise floor of user-perceived quality, and exactly zero on the entity-grounded queries that dominate production traffic.
Invest in hybrid graph signals first. Entity co-occurrence and citation links are signals an embedding-derived k-NN graph cannot see. The corpus already supports them.
Invest in query understanding and reranking. The dominant failure mode is missed relevant documents, not mis-ordered ones — query rewriting and cross-encoder reranking address recall more directly.
If you do adopt spreading, set γ = 0.95 and gate it. Use it only when baseline confidence is low. Never run aggressive decay; the degradation at γ = 0.70 is real.
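A minimal sketch of the gating idea in the last recommendation. It assumes that baseline confidence is measured by the top baseline similarity score and that a hypothetical threshold of 0.5 separates confident from low-confidence queries. Neither the confidence measure nor the threshold comes from the paper, and the `spread` callable stands in for whatever spreading step a system actually uses.

```python
from typing import Callable, Dict, List

Scores = Dict[str, float]

GAMMA_MINIMAL = 0.95  # the minimal setting recommended above


def top_k(scores: Scores, k: int = 3) -> List[str]:
    """Ids of the k highest-scoring chunks, ties broken by id for determinism."""
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))[:k]


def gated_retrieve(
    baseline: Scores,
    spread: Callable[[Scores, float], Scores],
    confidence_threshold: float = 0.5,  # hypothetical value, tune per system
    k: int = 3,
) -> List[str]:
    """Apply spreading only when the baseline looks unsure of itself."""
    if not baseline:
        return []
    confidence = max(baseline.values())  # assumed proxy for baseline confidence
    if confidence >= confidence_threshold:
        # Confident baseline: return it untouched, so spreading cannot degrade it.
        return top_k(baseline, k)
    return top_k(spread(baseline, GAMMA_MINIMAL), k)


def toy_spread(scores: Scores, gamma: float) -> Scores:
    """Placeholder spreading step that nudges every score toward the mean."""
    avg = sum(scores.values()) / len(scores)
    return {cid: gamma * s + (1.0 - gamma) * avg for cid, s in scores.items()}


print(gated_retrieve({"art-12": 0.91, "art-40": 0.35, "case-7": 0.10}, toy_spread))
print(gated_retrieve({"art-12": 0.31, "art-40": 0.30, "case-7": 0.10}, toy_spread))
```

The point of the gate is that high-confidence queries, such as entity-grounded lookups, skip the spreading step entirely and pay no latency or quality cost for it.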

How to read it

The paper runs 15 pages. If you have five minutes, read the abstract and Section 7 (Conclusion). If you have twenty, Section 4 covers the architecture and Section 6 reports the at-scale benchmark, the per-scenario breakdown, and the recommendations. Section 6.7 lists the limitations plainly — one corpus, one language, one embedding model, top-3 metrics only.

Read the full paper →

The takeaway

Cognitively-inspired memory mechanisms are appealing because the cognitive science is appealing. But faithful replication of a brain mechanism is not the same as an engineering win. Spreading activation works as designed on a 100,000-chunk legal corpus — and still does not earn its keep against a strong baseline.

The useful direction is not more iteration on graph construction within the k-NN family. It is hybrid signals the embedding graph cannot derive on its own. We would rather publish that clearly than ship a memory feature that looks principled and changes nothing.

If you are building retrieval or memory infrastructure and weighing associative recall against simpler wins, we would be interested in comparing notes.
