Mastering Agentic AI: The Memory Management Pattern

In the previous post, Multi-Agent Collaboration closed out the foundational patterns — the building blocks of any non-trivial agentic system. Today we move into Part 2 of the book: Advanced Capabilities. And the first capability without which everything else falls apart is memory.

Here’s the uncomfortable truth at the heart of every LLM-based system: the model itself is stateless. Each API call is a clean slate. The illusion of conversation is just you (or your framework) feeding the prior turns back into the prompt every time. Take that scaffolding away and your “agent” greets the user as a stranger every single message.

Memory is what turns a smart text predictor into something that feels like an assistant.

Pattern #8: Memory Management
#

The Problem
#

An LLM has no built-in concept of “before.” It doesn’t remember that you introduced yourself in turn 1, that you mentioned you were a CTO in turn 3, or that you already tried the approach it’s about to suggest again. Every call is independent.

For a one-shot Q&A agent that doesn’t matter. For anything else — chatbots, copilots, long-running workflows, agents that learn from past interactions — it’s the whole game. You need a deliberate strategy for:

Short-term memory — what was just said. The current conversation context.
Persistent memory — what was said last week. Sessions, multi-turn workflows, agents that survive restarts.
Semantic memory (briefly, since it overlaps with RAG) — what’s relevant from a much larger body of past interactions.

Each one has different mechanics, different costs, and different failure modes. Let me walk through how I think about all three.

The Solution
#

1. Short-term memory: conversation history

The simplest form. Keep a list of past messages and inject it into every prompt. In LangChain, the building blocks are ChatMessageHistory (the raw list) and ConversationBufferMemory (the wrapper that integrates with chains):

from langchain.memory import ConversationBufferMemory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

prompt = ChatPromptTemplate(messages=[
    SystemMessagePromptTemplate.from_template("You are a friendly travel assistant."),
    MessagesPlaceholder(variable_name="chat_history"),
    HumanMessagePromptTemplate.from_template("{question}")
])

conversation = LLMChain(llm=llm, prompt=prompt, memory=memory)

conversation.predict(question="Hi, I'm Jane.")
conversation.predict(question="Do you remember my name?")  # → "Yes, you said you're Jane."

MessagesPlaceholder is the trick. The chain inserts the running history at exactly that point in the prompt each turn. The memory object handles append and retrieval automatically. Two lines of setup, full conversational context.

This is the right starting point — but it has a hard ceiling. Every turn the history grows, every turn you pay for every token of it, and eventually you hit the context window. The patterns to handle that (summarization buffer, sliding window, vector retrieval over old turns) all exist in LangChain, but in production I lean on LangGraph for anything beyond a toy chatbot.

2. Memory as explicit state

The shift in mindset that LangGraph forces is healthy: instead of “memory” as a separate concept bolted onto a chain, state becomes the memory. Whatever fields you put in your TypedDict are what the agent remembers across nodes.

class AppState(TypedDict):
    user_name: str
    login_count: int
    last_login: str
    task_status: str
    greeting: str
    summary: str

def login_tracker(state: AppState) -> dict:
    """Increments login count and timestamps."""
    return {
        "login_count": state.get("login_count", 0) + 1,
        "last_login": datetime.now(timezone.utc).isoformat(),
        "task_status": "logged_in",
    }

There’s no separate “memory store” here. The state object is the memory. Each node reads what it needs and writes what it changed. The greeter that runs next sees login_count: 42 in its state and can produce “Welcome back, Charlie — visit number 42 today” without any retrieval logic.

This is far closer to how application state works in any normal piece of software. You don’t think of a counter in your database as “long-term memory.” You think of it as state. Treating agent memory as state makes the whole thing much easier to reason about.

3. Persistent memory: LangGraph checkpointing

Where it gets interesting is when you need that state to survive across invocations. The user logs in today, comes back tomorrow, and the agent should pick up where it left off. LangGraph’s checkpointer does exactly this:

from langgraph.checkpoint.memory import MemorySaver
from langchain_core.messages import HumanMessage

class ConversationState(TypedDict):
    messages: Annotated[list[BaseMessage], operator.add]

memory = MemorySaver()
graph = builder.compile(checkpointer=memory)

# Thread ID groups messages into a single conversation session
config = {"configurable": {"thread_id": "session-001"}}

# Turn 1
graph.invoke({"messages": [HumanMessage(content="Hi! I'm Charlie, a CTO.")]}, config)

# Turn 2 — the model remembers Turn 1 via checkpointed state
graph.invoke({"messages": [HumanMessage(content="What's my name and role?")]}, config)
# → "You're Charlie, and you're a CTO."

# Turn 3 — different thread = fresh session
config2 = {"configurable": {"thread_id": "session-002"}}
graph.invoke({"messages": [HumanMessage(content="Do you know my name?")]}, config2)
# → "I don't have that information."

Two things to notice. First, thread_id is the session boundary. Same thread = continuation; new thread = clean slate. This is how you give each user (or each conversation, or each workflow run) its own memory. Second, MemorySaver is the in-memory backend — fine for development. For production you swap it for SqliteSaver or PostgresSaver (same interface, persisted on disk or in a real database) and your agent’s memory survives restarts.

The reducer Annotated[list[BaseMessage], operator.add] — same trick from Parallelization and Multi-Agent — is what lets each new turn append to the message history instead of overwriting it.

4. Long-term semantic memory (briefly)

The fourth flavor is what RAG handles: when memory grows beyond what fits in the context window, you don’t keep all of it — you retrieve the relevant pieces. Past conversations get summarized and embedded into a vector store; on each new turn the agent searches for related context and pulls only what matters. We’ll get there properly in Chapter 14 (Knowledge Retrieval / RAG); for now just know that “memory” eventually becomes a retrieval problem, not a storage problem.

Why This Matters
#

Memory is not a feature you bolt on at the end. It’s an architectural choice that determines what your agent can do:

Personalization — remembering preferences, history, context across sessions.
Multi-turn workflows — long tasks that span days or weeks (incident triage, onboarding, research projects).
Continuity — agents that survive process restarts without losing what they know.
Learning — every interaction becomes data the agent can build on. Which is exactly the foundation for the next chapter, Learning and Adaptation.

Trade-offs worth being honest about:

Context window pressure. Conversation buffers grow without bound. Eventually you hit the model’s limit. Plan for it from day one: cap the history, summarize older turns, or move to retrieval.
Cost. Every token of history is a token you pay for, on every single turn. A long-running chatbot can quietly become expensive.
Privacy. Persistent memory means you’re storing user data. That data has to be encrypted, scoped, deletable on request, and auditable. The technical pattern is easy; the compliance work around it is not.
Staleness. What was true three weeks ago might not be true now. Long-term memory needs a way to age, update, or invalidate facts.

Rule of thumb: start with the smallest memory that solves the problem. A single-turn agent needs no memory. A chatbot needs conversation buffer. A multi-session assistant needs persistent checkpointing. An agent that learns from years of interactions needs vector retrieval. Don’t over-engineer.

The Bigger Picture
#

This is post #8 in my series documenting Antonio Gulli’s Agentic Design Patterns. As always, full credit for the conceptual framework goes to him.

We’ve now stepped into Part 2: Advanced Capabilities. Memory is the first of these because everything that follows — learning, persistent goals, long-running workflows — depends on the agent being able to hold state across time. The patterns we covered in Part 1 (Chaining, Routing, Parallelization, Reflection, Tool Use, Planning, Multi-Agent) compose with Memory naturally: a router can remember the last route, a planner can remember the last plan, a multi-agent system can share long-term state across nodes.

All the code from this post is in my repository: carlosprados/Agentic_Design_Patterns, under 08_Memory_Management/. Three runnable examples covering LangChain conversation memory, LangGraph explicit state management, and persistent checkpointing with thread IDs. All work with Gemini and Ollama through the shared get_llm() abstraction.

What’s Next
#

In the next post we’ll tackle Learning and Adaptation — once you have memory, the natural next step is using it. Agents that adjust their behavior based on past interactions, feedback, and outcomes. The line between “agent with memory” and “agent that learns” turns out to be thinner than you might expect.

Stay tuned.

Pattern #8: Memory Management#

The Problem#

The Solution#

Why This Matters#

The Bigger Picture#

What’s Next#