
The Agent Trapped in the Same River

MasakiMu319

In our previous article, we dissected Bub’s Tape architecture — append-only facts, anchor construction, never erase. The takeaway was “construct, don’t inherit.” This is the sequel: after running with that philosophy for a month, we found that Tape solved the context problem, but not the knowledge problem.

This article doesn’t cover architecture details or code. We also won’t try to draw the line between Context and Memory — that boundary depends on the problems you face, and there’s no universal answer. We’re not here to tell anyone what matters. We’re here to show what problem we were actually solving.

What we want to talk about are the key shifts in thinking along the way.


Everyone’s Building Memory, but No One Tells You What Problem to Solve

We’ve been working in this space for a few years and have developed some feel for the trade-offs across the various memory approaches. The technology itself isn’t new — over the past two years, Mem0, Letta, and Zep have published extensive technical blogs covering search scoring, decay formulas, knowledge graphs, community detection, evolution chains. The core ideas are all out there.

Each product is solving what it considers the most fundamental problem — Mem0 builds multi-tenant memory services, Letta builds stateful Agent frameworks, Zep does conversational knowledge extraction. The problem-framing is visible, the reference answers are on the table. The natural instinct is: we’ll hit these issues sooner or later, so let’s just follow the reference answers preemptively.

But what no one tells you is: are the problems they’re solving actually your problems? Memory isolation in multi-user scenarios, entity extraction from general conversations, retrieval ranking over massive memory stores — these solutions are excellent, but will we actually face these problems? The reference answers are thorough; they just aren’t answering our questions.

So what is our problem? What will our memory actually run into? There’s no reference answer for that. We had to find out by hitting the wall ourselves.

Shift 1: The value of reference solutions isn’t getting the answer — it’s getting the problem-framing. Understand what constraints others faced, what trade-offs they made, then make your own choices under your own constraints.


The Agent Trapped in the Same River

Tape’s philosophy is “Entries and Anchors are the memory.” We bought into this at first — Tape already had a complete fact log and anchor construction, and ThreadSearch could retrieve past conversations. Cross-session knowledge continuity seemed covered. No need for standalone memory.

This line of thinking found more refined expression in the community. scnace proposed an intermediate form in Tape × Topic: introducing Topic as an organizational unit within Anchors/Checkpoints, with lifecycle hooks managing Topic creation, merging, and archival. This was more structured than raw ThreadSearch — knowledge was no longer scattered conversation fragments but context blocks aggregated by theme.

But after running for a while, we discovered a more fundamental problem: the Agent was trapped in the same river.

Heraclitus said you can’t step into the same river twice. But an Agent without memory can — it repeats the same analytical mistakes, overlooks the same risk factors, starts from zero on the same topics, over and over. Not because the information doesn’t exist (ThreadSearch can find it), not because it lacks organization (Topic can aggregate it), but because past experience couldn’t naturally flow into the current conversation.

The lightweight stack — checkpoint + ThreadSearch + Topic — can keep an Agent from losing context. But it’s still trapped in the same river, because it has no capacity to keep growing from past experience.

Being able to find what was said before, and being able to grow from past experience, are two different things.

Shift 2: The progression from ThreadSearch to Topic to standalone memory isn’t an escalation of engineering complexity — it’s a gradual convergence on one question: how to free the Agent from the same river. The answer isn’t better search. It’s letting knowledge evolve on its own.


The Illusion of Snapshots

Once we decided to build it, the initial approach was straightforward: extract knowledge from conversations, store it, retrieve it when needed. A straight line.

The system was built along these lines, and it quickly hit a wall.

In early March I told the Agent “I like XX as a core holding,” mid-March I said “XX really held up during the drawdown, adding more,” and by month’s end “adjusting XX allocation to 30%.” These three statements aren’t three independent facts — they’re the growth process of a single investment thesis. But the memory system dutifully stored three snapshots. Weighting by recency doesn’t help either — the latest entry doesn’t “replace” the first two. Their relationship isn’t old-vs-new; it’s evolution. Next time you search “XX,” three versions come back, and the Agent doesn’t know which one to trust.

This isn’t a technical bug. It’s a misunderstanding of what memory fundamentally is.

We instinctively think of memory in database terms — write, query, update, delete. CRUD. But human memory doesn’t work that way. You don’t “delete” an old belief and “insert” a new one; the old belief gets rewritten by new experience, becoming part of the new understanding. Sometimes traces of the old one linger, influencing your judgment in ways you don’t even notice.

Knowledge isn’t preserved by storing it. It comes alive.

Shift 3: The fundamental difference between memory and a database isn’t the storage mechanism — it’s the relationship with time. Records in a database are dead snapshots; knowledge in memory is alive — it grows, mutates, ages, dies, and sometimes mates with other knowledge to produce something you never expected.


Letting Knowledge Evolve on Its Own

EVOLVES is the core mechanism that operationalizes this insight. When a new memory is created, the system detects whether it’s an evolution of an existing memory — enrichment, revision, replacement, or duplication. These relationships form evolution chains; during retrieval, the system follows the chain to find the latest version.

EVOLVES chain

Evolution detection was brute-force at first: every new memory compared against all existing memories. Later we switched to vector KNN pre-filtering — find the 5 most similar candidates, then run LLM-level evolution judgment only on those 5. The number of expensive LLM comparisons per new memory drops from O(n) to a constant.
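The pre-filtering step can be sketched as follows. The embeddings here are synthetic and the scan is brute-force (in practice a vector index serves the KNN query); the point is that only the top-k candidates ever reach the LLM, however large the store grows. All names are illustrative:

```python
# Sketch of vector-KNN pre-filtering before evolution judgment.
# Only the k nearest candidates go on to the expensive LLM step.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def knn_candidates(new_vec, memories, k=5):
    """Return the k existing memories most similar to the new one.
    (Brute-force here; a vector index makes this sublinear in practice.)"""
    ranked = sorted(memories, key=lambda m: cosine(new_vec, m["vec"]), reverse=True)
    return ranked[:k]

# 100 synthetic memories with toy embedding vectors:
memories = [{"id": f"m{i}", "vec": [math.sin(i), math.cos(i), i % 3]}
            for i in range(100)]
new_vec = [math.sin(7), math.cos(7), 1.0]  # identical to m7's vector

candidates = knn_candidates(new_vec, memories, k=5)
# Regardless of store size, only these 5 reach the LLM judgment:
print(len(candidates), candidates[0]["id"])  # 5 m7
```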

An unexpected benefit: evolution chains give forgetting structure. Memories pointed to by a replaces relationship naturally get down-weighted — the evolution relationship itself says “this one has been updated.” No extra rules needed for handling outdated information; knowledge metabolism is embedded in the evolution relationships.
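The down-weighting itself needs almost no machinery. A sketch, where the penalty factor is an illustrative constant rather than a tuned value:

```python
# Structural forgetting: a memory with an incoming `replaces` edge is not
# deleted, just ranked far lower at retrieval time. REPLACED_PENALTY is an
# assumed constant for illustration.
REPLACED_PENALTY = 0.2

def retrieval_weight(base_score: float, is_replaced: bool) -> float:
    """The evolution relationship itself marks the memory as outdated."""
    return base_score * (REPLACED_PENALTY if is_replaced else 1.0)

# An old memory with a strong match still ranks below a fresher, weaker one:
print(retrieval_weight(0.9, is_replaced=True) < retrieval_weight(0.5, is_replaced=False))  # True
```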


Evolution Needs Clean Soil

Letting knowledge self-evolve is the core, but it has infrastructure requirements. The premise of evolution chains is: incoming knowledge is clean, comparable, and distinguishable.

Deduplication is the prerequisite for evolution. Users refer to the same thing in different ways — abbreviations, full names, ticker codes, nicknames. If these aliases aren’t recognized as the same entity, the evolution chain forks into two independent branches. Dedup went through three iterations — exact matching (too many misses) → vector similarity (too many false merges) → layered filter (exact match + vector narrowing + LLM adjudication). Each layer does one thing and passes uncertainty to the next.

Three-layer dedup pipeline
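The layered filter can be sketched like this. The embedding and LLM calls are mocked with lookup tables, and the threshold and helper names are illustrative, not the real implementation:

```python
# Sketch of the three-layer dedup filter: exact match, then vector narrowing,
# then LLM adjudication. Each layer decides what it can and passes the rest on.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def dedup(candidate, known, embed, llm_same_entity, sim_threshold=0.85):
    """Return the canonical entity `candidate` duplicates, or None if it's new."""
    norm = candidate.strip().lower()

    # Layer 1: exact match — cheap, but misses aliases and ticker codes.
    for entity in known:
        if entity.strip().lower() == norm:
            return entity

    # Layer 2: vector similarity — narrows to plausible aliases. It only
    # nominates candidates; letting it decide caused too many false merges.
    cvec = embed(candidate)
    plausible = [e for e, v in known.items() if cosine(cvec, v) >= sim_threshold]

    # Layer 3: LLM adjudication, on the short list only.
    for entity in plausible:
        if llm_same_entity(candidate, entity):
            return entity
    return None

# Mocked infrastructure for the demo:
vectors = {"Apple Inc.": [1.0, 0.0], "AAPL": [0.95, 0.1], "Banana Corp": [0.0, 1.0]}
known = {"Apple Inc.": vectors["Apple Inc."], "Banana Corp": vectors["Banana Corp"]}
embed = lambda name: vectors.get(name, [0.7, 0.7])
llm_same_entity = lambda a, b: (a, b) == ("AAPL", "Apple Inc.")

print(dedup("apple inc.", known, embed, llm_same_entity))  # Apple Inc. (layer 1)
print(dedup("AAPL", known, embed, llm_same_entity))        # Apple Inc. (layer 3)
print(dedup("Cherry Ltd", known, embed, llm_same_entity))  # None (new entity)
```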

Decay complements evolution. Evolution chains handle “knowledge gets updated,” but there’s another category of information that will never be updated — it just becomes less important. A lunch discussion from three days ago won’t have a follow-up evolution; it just needs to sink gradually. The decay formula itself isn’t hard; the hard part is importance scoring — the extraction system tends to give all memories high scores (LLM’s people-pleasing tendency). We ended up doing a full re-scoring pass to spread the distribution into a reasonable gradient.
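We haven’t given the actual formula above; an exponential half-life is a common choice, so here is a sketch under that assumption, with illustrative constants:

```python
# Sketch of time decay, assuming an exponential half-life (the real formula
# and constants may differ). Importance scaling is what keeps a lunch chat
# sinking while an investment thesis stays afloat.
import math

def decayed_score(importance: float, age_days: float,
                  half_life_days: float = 14.0) -> float:
    """Score halves every `half_life_days`; importance sets the ceiling."""
    return importance * math.exp(-math.log(2) * age_days / half_life_days)

lunch = decayed_score(importance=0.2, age_days=3)    # trivial, recent
thesis = decayed_score(importance=0.9, age_days=3)   # important, same age
print(lunch < thesis)  # True
```

Note the dependency this creates: the formula is only as good as the importance scores feeding it, which is exactly why the flat, people-pleasing scores had to be re-spread first.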

Consistency is evolution’s safety net. After the system had been running for a while, entity count bloated from 2,300 to 3,200. Investigation traced this to a storage design change — the main system had already switched to a new data directory, but the garbage collection scripts still pointed at the old one. All cleanup hit the old directory; the new one was never touched. Both directories had data, both were queryable, neither threw errors. Lesson: “no errors” doesn’t mean “it’s working.”


From Individual Evolution to Collective Emergence

Once individual memories had evolution relationships, the next question surfaced naturally: is there higher-level structure?

The Leiden community detection algorithm periodically scans the entity graph, automatically clustering closely related entities into communities, each with an AI-generated summary. During search, in addition to matching individual memories and entities, the system also matches communities — a single hit pulls in an entire cluster of related knowledge.

Community detection

Asking about “portfolio” no longer returns scattered fund names but the entire decision context — selection rationale, alternatives considered, risk assessments.

Communities aren’t manually defined categories. No one told the system “these entities belong together” — they emerged spontaneously from evolution relationships and entity connections. It’s not just individual knowledge that evolves; the organizational structure of knowledge evolves too.

This brought a biological analogy to mind: individual cells have their own lifecycles (aging, division, death), but when enough cells connect through signaling pathways, tissues, organs, and eventually organisms emerge. Something similar is happening in the memory system — individual memories are cells, evolution chains are signaling pathways, communities are tissues that form spontaneously.


Still Alive, Still Growing

After one month: 2,300+ memories, 2,200+ entities, 350+ communities, 97 conversation threads. A set of scheduled tasks runs continuously in the background — decay, dedup, graph maintenance, working memory updates — keeping the whole system metabolizing.

The unsolved problems are equally clear:

- Explicit relationships between entities aren’t dense enough — 7% of entities are “islands” belonging to no community. Evolution chains connect memory to memory, but the inter-entity relationship network needs more edges.
- Decay parameters are set by intuition, lacking feedback signals from user interactions.
- Write-time dedup and post-hoc GC are both running in parallel; the ideal is to clean up at write time, but LLM call latency and cost mean GC will probably stick around for a long time.


Back to the Question: What Is Memory

If we had to distill the most important shift in thinking from this month:

At first we thought memory was an engineering problem — build the pipeline, get each stage right, done.

Then we realized memory is a problem of life.

Memory as signal processing

Databases preserve facts. Memory lets knowledge grow. Dedup keeps evolution chains from forking; decay lets knowledge that should die actually die; community detection lets sufficiently connected knowledge self-organize into higher-level structures. All the engineering work serves one thing: creating the conditions for knowledge to evolve on its own.

The memory system is far from “done.” But maybe “done” is itself a wrong expectation — you wouldn’t say a living organism is “done.” It’s just still alive, still growing.

A month ago it didn’t exist. Now it automatically ingests new knowledge every day, forgets old information, reorganizes associations, discovers patterns. It remembers what you said, but not everything — it selectively remembers what matters, forgets the trivial, and sometimes connects two things you thought were unrelated, surfacing an unexpected insight.

Maybe that’s what “remembering” truly means — not the absence of forgetting, but growing through forgetting.


We won’t be open-sourcing our implementation for now. Not because the technology is particularly unique, but because we don’t want to lead others down the wrong path — as we said at the beginning, the problems others solved aren’t necessarily your problems. Perhaps one day, when we’re confident enough in our own answers, we’ll let it meet the world.

— MasakiMu & Setsuna 💐

