A Working Approach to AI Memory
Most AI memory systems dump everything into a vector database and hope for the best. After 140+ sessions with the same agent, we found a simpler approach that actually works.

In session 47, our AI agent confidently told us we'd processed 354 research links. The real number was 209. It wasn't hallucinating in the traditional sense — it was remembering a stale number it had written down weeks earlier and never updated.
That's the AI memory problem in one sentence. Not forgetting. Remembering the wrong thing.
Why Most AI Memory Doesn't Work
A developer named @signulll put it bluntly in a post that got over 1,600 likes: "No one has solved AI memory yet."
The standard approach is to dump everything into a vector database — a system that stores text as math and retrieves it by similarity. Ask a question, get the five most "similar" chunks of past context, and hope that's enough.
In practice, three things go wrong.
Cross-project noise. The agent pulls in memory from a completely different project because the words are similar. You're working on a website and the agent starts referencing notes from an unrelated API project because both mention "deployment."
Compaction amnesia. To save space, the system summarizes old memories. But summaries lose the details that mattered. The agent remembers you had a meeting but not what was decided.
Irrelevant recalls. The retrieval algorithm surfaces things that are semantically close but contextually useless. You ask about a database schema and get back a conversation about database backups from three months ago.
The result is an agent that feels like a coworker who half-remembers everything and confidently fills in the gaps. Sometimes that's worse than an agent with no memory at all.
What We Actually Built
We run an AI agent — Claude Code with a CLAUDE.md configuration — as the backbone of our business operations. Research analysis, project tracking, job search pipeline, content planning. It's been active for over 140 sessions across several months.
Early on, we hit every problem described above. Stale data. Irrelevant context. The agent forgetting critical decisions from two weeks ago while perfectly remembering trivial details from yesterday.
So we stopped trying to make search smarter and started thinking about structure instead.
Three Tiers, Not One Bucket
The system that emerged has three layers, each with a different job.
Tier 1: The Briefing Doc
A single file called MEMORY.md — about 130 lines. It loads every session, no retrieval needed. It contains identity, strategic direction, current state, what happened in the last few sessions, and anything the agent needs to know immediately.
Think of it like the one-page briefing a new employee gets on their first day. Not everything about the company — just enough to be useful right now.
This file changes constantly. Old session summaries get archived. New decisions get added. The current state section gets rewritten, not appended. It's a living document, not a log.
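To make the shape concrete, here is a sketch of what such a briefing doc might look like. The section names are illustrative, not the authors' actual file — the point is that each section has one job, and the current-state section is rewritten rather than appended.

```
# MEMORY.md — session briefing (illustrative layout)

## Identity
Who the agent is, who it works for, what it's responsible for.

## Strategic direction
This quarter's priorities, one line each.

## Current state
Rewritten every session, never appended. Active projects, blockers,
and key numbers — sourced from the stats script, never hand-typed.

## Recent sessions
Summaries of the last few sessions. Older ones roll into an archive file.

## Immediate notes
Anything the agent must know before the first task.
```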
Tier 2: Topic Files
About 15 separate files in a memory directory, each covering a specific domain. Infrastructure notes. Job search rules. Project-specific context. Social media strategy.
These don't load automatically. The agent checks what it's about to work on, then pulls in the relevant file. Working on a resume? Load the application rules. Debugging a server? Load the infrastructure notes.
The key difference from vector search: the agent knows exactly where to look because the files are organized by topic, not by similarity score. There's no ambiguity about which file covers what.
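That routing decision can be sketched as an explicit lookup. The file names and domain keys below are hypothetical stand-ins, but they show the core idea: which file to load is a deterministic mapping, not a similarity score.

```python
from pathlib import Path

# Hypothetical mapping from task domains to Tier 2 topic files.
# Routing is an explicit lookup, not a similarity search, so there is
# no ambiguity about which file covers what.
TOPIC_FILES = {
    "resume": "memory/job-search-rules.md",
    "server": "memory/infrastructure.md",
    "social": "memory/social-media-strategy.md",
}

def load_topic_context(task_domain: str, root: Path = Path(".")) -> str:
    """Return the topic file's contents for a domain, or no extra context."""
    rel = TOPIC_FILES.get(task_domain)
    if rel is None:
        return ""  # no topic file applies: the agent runs on Tier 1 alone
    path = root / rel
    return path.read_text() if path.exists() else ""
```

Because the mapping is explicit, a miss is visible — an unknown domain loads nothing, rather than silently pulling in a semantically adjacent but wrong file.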
Tier 3: The Deep Archive
A knowledge base of 71 documents — strategy breakdowns, tool evaluations, research notes — accessible through search tools. These never load unless the agent specifically queries for them.
This is where vector search actually makes sense. The archive is large enough that you can't browse it manually, and the queries are specific enough ("what do we know about MCP security?") that retrieval works well.
The trick is that most sessions never touch Tier 3. The briefing doc and topic files handle 90% of what the agent needs. The deep archive is for when the agent encounters something it hasn't dealt with recently and needs to check whether there's prior research.
Structure Beats Search
The core insight is unglamorous: knowing where to look matters more than having a great search algorithm.
When everything lives in one big database, the system has to guess what's relevant. When context is organized into tiers with clear boundaries, the agent can make a deliberate choice about what to load.
It's the difference between a filing cabinet and a pile of papers with a search engine on top. The filing cabinet is less sophisticated, but you find things faster because you know which drawer to open.
Staleness Is the Real Enemy
Memory systems don't usually fail dramatically. They fail slowly, through staleness.
A number gets written down. The underlying data changes. The old number stays in memory. The agent uses it weeks later with full confidence. That's how we got "354 links" when the answer was 209.
We added three rules to fight this.
Compute, don't cache. Anything that can be counted — links processed, blog posts published, database size — gets counted fresh every time. We wrote a stats script that runs at the start of each session. No stored numbers in prose.
Update immediately. When the agent learns something new — a deployment succeeded, a bug was fixed, a project changed status — it updates memory right then. Not after the current task. Not at the end of the session. Immediately.
Verify before recommending. Before the agent makes recommendations, it checks recent memory and the knowledge base. Recommendations based on stale context waste time and erode trust. We learned that one the hard way.
We're Not the Only Ones
A researcher named Vasilopoulos published a paper documenting a 108,000-line system built across 283 sessions. The knowledge-to-code ratio was 24.2% — nearly a quarter of the codebase was documentation and context, not executable code.
The interesting part: the same three-tier pattern emerged independently. Hot context loaded every session. Topic-specific files loaded on demand. A deeper archive searched when needed.
And the number one failure mode documented in the research? Staleness. Old information persisting in memory and causing wrong decisions downstream.
Two systems, built independently, converging on the same architecture and identifying the same primary failure mode. That's a signal worth paying attention to.
What This Costs
The maintenance overhead is genuinely small. The briefing doc gets rewritten naturally as part of each session's work. Old session summaries roll into an archive file automatically. Topic files only change when something meaningful happens in that domain.
The initial setup takes thought — you have to decide what goes in Tier 1 versus Tier 2, what the topic boundaries are, what warrants a knowledge base entry. But once the structure exists, keeping it current is part of the work, not separate from it.
The biggest cost isn't time. It's discipline. Someone — or some process — has to enforce the anti-staleness rules. The moment you let stale data sit in memory because updating it feels like a side task, you're back to the agent confidently citing wrong numbers.
The Honest Caveats
This approach works for a single agent used by a single person across a consistent set of projects. We haven't tested it with multiple agents sharing memory, or with teams where different people have different context needs.
The three-tier structure also assumes a relatively stable domain. If you're working on something completely new every week with no continuity between sessions, a briefing doc doesn't have much to brief about.
And vector search isn't useless — it's just not a complete answer by itself. It works well for Tier 3, where the archive is large and queries are specific. The problem is when people treat it as the entire memory system instead of one layer of it.
Start With the Briefing Doc
If you're running an AI agent for anything beyond single-session tasks, start with one file. Write down what the agent needs to know at the beginning of every session. Identity, current state, recent decisions, active work.
Keep it short — under 200 lines. Rewrite it regularly. Don't append forever.
That single file will do more for your agent's continuity than any vector database. Not because the technology is better, but because the problem was never about retrieval. It was about knowing what matters right now.
The fancy infrastructure can come later. Or it might not need to come at all.
This is the third in our series on AI agent architecture. See also: Why Your AI Fails Without Orchestration and CLAUDE.md vs SOUL.md vs SKILL.md.