C carlos.enredando.me CTO · Advisor · Builder

Mastering Agentic AI: The Reflection Pattern

Carlos Prados
Telecommunications Engineer, Entrepreneur, CTO & CIO, Team Leader & Manager, IoT-M2M-Big Data Consultant, Pre-sales Engineer, Product-Service Manager & Strategist.

In the previous post, we saw how Parallelization buys speed by running independent work concurrently. Speed is great, but speed without quality is just a fast way to produce mediocre output. And LLMs, as anyone who has shipped one to production knows, have a deeply human flaw: their first draft is rarely good enough.

That’s the gap the Reflection pattern fills. It’s the moment an agent stops being a one-shot text generator and starts behaving like a junior developer who actually re-reads their pull request before opening it.

Pattern #4: Reflection

The Problem

You ask an LLM to write a Python function that has a docstring, handles edge cases, and raises on bad input. The model produces something that looks right. It runs. But it silently returns 1 for negative numbers, the docstring omits the exception, and the variable names are unhelpful.
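
To make that failure concrete, here is a hypothetical first draft of the kind described above, next to what a refined version might look like. The post does not say which function was requested; a factorial is assumed here purely for illustration, and both function names are made up.

```python
def factorial_draft(n):
    """Return the factorial of n."""
    r = 1
    # For negative n this range is empty, so the loop never runs
    # and the function silently returns 1.
    for i in range(2, n + 1):
        r *= i
    return r


def factorial_refined(number: int) -> int:
    """Return the factorial of a non-negative integer.

    Raises:
        ValueError: if number is negative or not an integer.
    """
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(number, bool) or not isinstance(number, int):
        raise ValueError("number must be an integer")
    if number < 0:
        raise ValueError("number must be non-negative")
    result = 1
    for factor in range(2, number + 1):
        result *= factor
    return result
```

Both versions agree on valid input; they differ only on the inputs a reviewer would think to probe, which is exactly the gap a critique step is meant to surface.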

What you actually wanted was for the model to do what any senior engineer would: write a first version, look at it critically, find the gaps, and rewrite. A single forward pass through an LLM doesn’t include that step. The model is optimized to produce plausible output now, not to second-guess itself.

You can keep throwing bigger models at the problem, but you’ll plateau. The real fix is architectural: build a loop that generates, critiques, and refines until the output meets the bar.

The Solution

Reflection introduces a feedback loop with three roles — Producer, Critic, and (optional) Refiner. The Critic can be the same model with a different system prompt, or a separate agent entirely. There are three flavors of this pattern I’ve found useful in practice, going from cheap to expensive.

1. Single-pass LCEL chain — Generate, critique, refine, once. No loop. Good when you know one round of refinement is enough:

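from langchain_core.runnables import RunnablePassthrough

# generation_chain, critique_chain, and refinement_chain are defined elsewhere.
full_reflection_chain = (
    # Adds the first draft to the running dict under "initial_description".
    RunnablePassthrough.assign(initial_description=generation_chain)
    # Adds the critique of that draft under "critique".
    | RunnablePassthrough.assign(critique=critique_chain)
    | refinement_chain  # Rewrites the draft using the critique.
)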

RunnablePassthrough.assign is the elegant trick here. After each step, the new field (initial_description, then critique) is added to the running dict, so the next step has everything it needs. Three LLM calls, one clean chain, predictable cost. This is the right starting point for product descriptions, blog drafts, marketing copy — anywhere a single critique round measurably lifts the result.
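
To see what `RunnablePassthrough.assign` contributes, it can help to write out the same data flow with plain dictionaries. The sketch below is not LangChain code: `assign` is a hand-rolled stand-in, the three `fake_*` functions are hypothetical placeholders for the LLM-backed chains, and the `product` input key is an assumption.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]


def assign(**steps: Callable[[State], Any]) -> Callable[[State], State]:
    """Hand-rolled stand-in: copy the input dict and add one key per step."""
    def run(state: State) -> State:
        return {**state, **{key: step(state) for key, step in steps.items()}}
    return run


# Hypothetical placeholders for the three LLM-backed chains.
def fake_generation(state: State) -> str:
    return f"Draft about {state['product']}"


def fake_critique(state: State) -> str:
    return f"Too generic: '{state['initial_description']}'"


def fake_refinement(state: State) -> str:
    return f"{state['initial_description']} (revised after: {state['critique']})"


def full_reflection(state: State) -> str:
    state = assign(initial_description=fake_generation)(state)
    state = assign(critique=fake_critique)(state)  # sees the draft
    return fake_refinement(state)                  # sees draft and critique


result = full_reflection({"product": "a steel water bottle"})
print(result)
```

The point of the design is that no step has to know about the others: each one reads the keys it needs from the dict and contributes one new key.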

2. Iterative loop with stopping condition — When one round isn’t enough, you keep going until the critic says it’s good:

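from langchain_core.messages import HumanMessage, SystemMessage

# llm, max_iterations, and message_history (seeded with the task) are set up earlier.
for i in range(max_iterations):
    # Generate or refine based on previous critique
    response = llm.invoke(message_history)
    current_code = response.content
    message_history.append(response)

    # Critic with its own persona and stopping signal, rebuilt each round so it
    # reviews the latest code. REVIEWER_PERSONA is an assumed name for the
    # system prompt quoted below.
    reflector_prompt = [
        SystemMessage(content=REVIEWER_PERSONA),
        HumanMessage(content=f"Code to review:\n{current_code}"),
    ]
    critique = llm.invoke(reflector_prompt).content
    if "CODE_IS_PERFECT" in critique:
        break
    message_history.append(HumanMessage(content=f"Critique:\n{critique}"))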

The Critic is the same model with a different system prompt: “You are a senior software engineer doing a meticulous code review. If the code is perfect, respond exactly with CODE_IS_PERFECT. Otherwise, list your critiques.” The sentinel phrase is the stopping condition. Note that the check is a substring match, so a critique that merely mentions the sentinel while listing problems would also end the loop; comparing the stripped response for equality is stricter. The message_history carries the full context forward, so each generation call sees the earlier code and the earlier critiques. This is how you push generated code toward something that would survive a senior review.
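
The control flow of that loop can be exercised without any model at all. The following is a minimal sketch under stated assumptions: `reflect`, `scripted_generate`, and `scripted_critique` are hypothetical names, the history is a plain list of strings rather than LangChain messages, and the sentinel is compared for equality instead of with `in`, which is a deliberate deviation from the snippet above.

```python
from typing import Callable, List, Tuple

SENTINEL = "CODE_IS_PERFECT"


def reflect(
    task: str,
    generate: Callable[[List[str]], str],
    critique: Callable[[str], str],
    max_iterations: int = 3,
) -> Tuple[str, int]:
    """Generate, critique, and refine until the sentinel or the cap is hit.

    Returns the final code and the number of generation rounds used.
    """
    history = [task]
    current_code = ""
    rounds = 0
    for _ in range(max_iterations):
        current_code = generate(history)
        history.append(current_code)
        rounds += 1
        review = critique(current_code)
        # Equality after stripping, so a review that only quotes the
        # sentinel while listing problems does not stop the loop.
        if review.strip() == SENTINEL:
            break
        history.append(f"Critique:\n{review}")
    return current_code, rounds


# Scripted stand-ins so the loop can run offline.
def scripted_generate(history: List[str]) -> str:
    drafts = ["v1", "v2", "v3"]
    # Pick the draft that corresponds to the number of critiques seen so far.
    return drafts[sum(m.startswith("Critique:") for m in history)]


def scripted_critique(code: str) -> str:
    return SENTINEL if code == "v2" else "Missing input validation."


final_code, rounds_used = reflect("Write a function.", scripted_generate, scripted_critique)
print(final_code, rounds_used)
```

Passing the generator and the critic in as callables keeps the loop testable: the real LLM calls can be swapped in later without touching the stopping logic.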

3. LangGraph two-agent pipeline — When the Producer and the Critic should be structurally separate (different models, different tools, different state), model them as graph nodes:

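from typing import TypedDict

from langgraph.graph import END, START, StateGraph

class ReflectionState(TypedDict):
    topic: str
    draft_text: str
    review_output: str

# draft_writer and fact_checker are node functions defined elsewhere.
builder = StateGraph(ReflectionState)
builder.add_node("draft_writer", draft_writer)
builder.add_node("fact_checker", fact_checker)

# Linear pipeline: START -> draft_writer -> fact_checker -> END
builder.add_edge(START, "draft_writer")
builder.add_edge("draft_writer", "fact_checker")
builder.add_edge("fact_checker", END)

graph = builder.compile()  # needed before the graph can be invoked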

The DraftWriter and the FactChecker are independent components. They can use different LLMs (a creative one for drafting, a lower-temperature, more deterministic one for fact-checking), different prompts, different tools. The state object carries the draft from one to the other. From here, adding a loop is just adding a conditional edge from fact_checker back to draft_writer when the review flags issues. I prefer to start with the linear pipeline, though, and add the loop only when the data tells me it’s needed.
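
The loop-back mentioned above comes down to a routing function that reads the state and names the next node. A minimal sketch in plain Python follows; the `draft_count` field, the `MAX_DRAFTS` cap, the `"APPROVED"` marker, and the stub nodes are all assumptions for illustration, and the small `run` driver only imitates what a graph runtime would do. In LangGraph, a function like `route_after_review` would typically be registered with `builder.add_conditional_edges("fact_checker", route_after_review)` in place of the fixed edge to `END`, returning LangGraph's own `END` constant rather than the local stand-in used here.

```python
from typing import Callable, Dict, TypedDict

END = "end"      # local stand-in for the graph's terminal marker
MAX_DRAFTS = 3   # assumed cap so the loop cannot run forever


class ReflectionState(TypedDict):
    topic: str
    draft_text: str
    review_output: str
    draft_count: int  # assumed extra field, not in the post's state


def draft_writer(state: ReflectionState) -> ReflectionState:
    # Stub: a real node would call an LLM with the topic and the last review.
    n = state["draft_count"] + 1
    return {**state, "draft_text": f"{state['topic']} draft v{n}", "draft_count": n}


def fact_checker(state: ReflectionState) -> ReflectionState:
    # Stub: approves only from the second draft onward.
    verdict = "APPROVED" if state["draft_count"] >= 2 else "ISSUES: unsupported claim"
    return {**state, "review_output": verdict}


def route_after_review(state: ReflectionState) -> str:
    """Loop back to the writer unless approved or the draft cap is reached."""
    if state["review_output"].startswith("APPROVED"):
        return END
    if state["draft_count"] >= MAX_DRAFTS:
        return END
    return "draft_writer"


def run(state: ReflectionState) -> ReflectionState:
    nodes: Dict[str, Callable[[ReflectionState], ReflectionState]] = {
        "draft_writer": draft_writer,
        "fact_checker": fact_checker,
    }
    current = "draft_writer"
    while current != END:
        state = nodes[current](state)
        # Fixed edge writer -> checker; conditional edge after the checker.
        current = "fact_checker" if current == "draft_writer" else route_after_review(state)
    return state


final_state = run({"topic": "Reflection", "draft_text": "", "review_output": "", "draft_count": 0})
print(final_state["draft_text"], "|", final_state["review_output"])
```

Putting the cap inside the router rather than inside a node keeps the stopping policy in one place, which makes it easy to tune once real review data is available.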

Why This Matters

The Producer-Critic split is more than a productivity trick. It reduces the bias of a model reviewing its own work. The Critic, given a different persona and a fresh prompt, approaches the output as an outsider and catches things the Producer would defend. When both roles share one model the separation is only partial, which is one reason to consider a different model for the Critic.

Where Reflection actually pays off:

  • Code generation: write → review → fix until tests pass or critic is satisfied.
  • Content workflows: draft → fact-check → rewrite for blog posts, reports, marketing copy.
  • Multi-step reasoning: propose a step, evaluate if it leads closer to the solution, backtrack if not.
  • Summarization: draft a summary, check it against the source for missing key points, refine.

The trade-offs are real and worth being honest about:

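  • Latency: every iteration adds sequential LLM round trips. In the iterative loop, each round is a generation call plus a critique call, so three rounds means up to six calls back to back.
  • Cost: token usage grows at least as fast, and usually faster, because the growing history is re-sent as input on every generation call. A reflection loop on a frontier model can get expensive fast.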
  • Context window: each iteration appends to history. On long tasks, you’ll hit the model’s context limit if you’re not careful.
  • Diminishing returns: after two or three rounds, the critic often starts inventing problems just to have something to say. Cap the iterations.
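
One way to keep the context-window problem in check is to resend only the original task plus the most recent rounds. The sketch below assumes the history is a plain list of strings whose first element is the task; `trim_history` and `keep_last` are hypothetical names, and a real implementation would more likely budget by token count than by message count.

```python
from typing import List


def trim_history(history: List[str], keep_last: int = 4) -> List[str]:
    """Keep the task (first message) plus the newest `keep_last` messages."""
    if keep_last < 0:
        raise ValueError("keep_last must be non-negative")
    if len(history) <= keep_last + 1:
        return list(history)  # nothing to drop
    tail = history[len(history) - keep_last:] if keep_last else []
    return [history[0]] + tail


history = ["task", "code v1", "critique 1", "code v2", "critique 2", "code v3", "critique 3"]
trimmed = trim_history(history, keep_last=4)
print(trimmed)
```

Choosing an even `keep_last` keeps each code version together with its critique, so the model never sees a critique without the code it refers to.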

Rule of thumb: start with the single-pass chain. Move to the loop only when you can point to a class of bugs that survives one critique round. Move to the two-agent pipeline only when the roles are genuinely different enough to justify the extra orchestration.


The Bigger Picture

This is post #4 in my series documenting Antonio Gulli’s Agentic Design Patterns. As always, full credit for the conceptual framework goes to him.

Reflection composes beautifully with the patterns we’ve already covered: a Router can dispatch to a chain that internally uses Reflection, a parallel branch can run a Critic alongside a Producer, and — as we’ll see in later chapters — Reflection is the foundation for Goal Setting and Self-Correction patterns where the agent doesn’t just refine an output, it refines its own plan.

All the code from this post is in my repository: carlosprados/Agentic_Design_Patterns, under 04_Reflection/. Three runnable examples (single-pass chain, iterative loop with stopping condition, LangGraph two-agent pipeline). All work with both Gemini and Ollama via the shared get_llm() abstraction.


What’s Next

In the next post we’ll tackle Tool Use — the pattern that turns an LLM from a text generator into something that can actually do things in the world. Function calling, external APIs, real side effects. This is where agentic systems stop being closed-world reasoners and start interacting with reality.

Stay tuned.