Chapter 38 — Agent Patterns & Memory
Overview
Agentic systems represent a paradigm shift from single-inference models to autonomous systems that plan, use tools, maintain memory, and achieve complex goals across multiple steps. This chapter explores proven architectural patterns, memory strategies, and safety mechanisms that make agents reliable in production environments.
Core Focus:
- Architectural patterns for agent design (ReAct, Plan-Execute, Reflection, Multi-Agent)
- Memory layer design for context retention and personalization
- Safety controls and governance frameworks
- Reliability techniques and evaluation methodologies
Why It Matters
Agents compose tools and knowledge to accomplish goals across steps. The right patterns improve reliability, constrain cost, and make behavior auditable in enterprise contexts.
Key Benefits:
- Autonomy: Handle multi-step tasks without constant human intervention
- Adaptability: Adjust plans based on intermediate results and feedback
- Capability Extension: Leverage external tools and knowledge sources
- Context Awareness: Maintain memory across interactions for personalized experiences
- Cost Efficiency: Cache intermediate results and optimize tool usage
Real-World Impact:
- Customer service agents reduce resolution time by 45% through autonomous tool usage
- Research assistants accelerate literature reviews from days to hours
- Coding agents automate 60-70% of routine development tasks
- Operations agents handle incident triage and remediation 24/7
Core Agent Patterns
Pattern Selection Framework
graph TD
  A[Task Requirements] --> B{Task Complexity}
  B -->|Simple, Tool-Based| C[Tool-Calling Pattern]
  B -->|Interactive, Iterative| D[ReAct Pattern]
  B -->|Complex, Multi-Step| E{Need Planning?}
  E -->|Yes, Clear Steps| F[Plan-and-Execute]
  E -->|Yes, Quality Focus| G[Reflection Pattern]
  B -->|Multi-Domain Expertise| H[Multi-Agent Pattern]
  C --> I[Fast, Deterministic<br/>Low Cost]
  D --> J[Observable Reasoning<br/>Good Tool Integration]
  F --> K[Structured Execution<br/>Parallelizable]
  G --> L[Self-Improving<br/>High Quality]
  H --> M[Specialized Agents<br/>Scalable]
  style C fill:#e1f5ff
  style D fill:#fff4e1
  style F fill:#e8f5e9
  style G fill:#ffe1e1
  style H fill:#f5e1ff
Pattern Comparison Matrix
| Pattern | Use Case | Strengths | Limitations | Cost Profile | Best For |
|---|---|---|---|---|---|
| Tool-Calling | Direct function execution | Fast, deterministic, low-cost | Limited reasoning, rigid | $ | Structured data tasks, calculations |
| ReAct | Interactive tasks requiring tool calls | Simple, interpretable, good tool integration | Can loop indefinitely, verbose | $$ | Web search, API interactions |
| Plan-and-Execute | Complex multi-step workflows | Clear structure, parallelizable | Brittle to plan changes, overhead | $$$ | Document processing, research |
| Reflection | Tasks requiring quality iteration | Self-improving, catches errors | Higher latency and cost | $$$$ | Content generation, code review |
| Multi-Agent | Tasks needing diverse expertise | Specialized agents, scalable | Coordination overhead, complexity | $$$$$ | Enterprise workflows, creative tasks |
1. ReAct Pattern (Reasoning + Acting)
Architecture Overview:
graph TD
  A[User Query] --> B[LLM: Reasoning Step]
  B --> C{Action Needed?}
  C -->|Yes| D[Select Tool from Registry]
  D --> E[Execute Tool with Params]
  E --> F[Observe Result]
  F --> G[Update Context]
  G --> B
  C -->|No| H[Generate Final Answer]
  H --> I[Return to User]
  J[Safety Layer] --> D
  J --> E
  K[Audit Log] --> E
  L[Budget Manager] --> B
  style B fill:#e1f5ff
  style D fill:#fff4e1
  style J fill:#ffe1e1
  style H fill:#e8f5e9
Implementation Example (Minimal):
from typing import Callable, Dict
import json

class ReActAgent:
    """ReAct agent with a reasoning-and-action loop."""

    def __init__(self, llm, tools: Dict[str, Callable], max_iterations: int = 10):
        self.llm = llm
        self.tools = tools
        self.max_iterations = max_iterations
        self.trace = []

    def run(self, task: str) -> str:
        """Execute the task using the ReAct loop."""
        context = f"Task: {task}\n\nAvailable tools: {list(self.tools.keys())}\n\n"
        for i in range(self.max_iterations):
            # Reasoning step
            prompt = f"""{context}
What should I do next? Respond in JSON:
{{"thought": "your reasoning", "action": "tool_name or FINISH", "action_input": "input"}}"""
            response = self.llm.generate(prompt)
            step = json.loads(response)
            self.trace.append(step)

            if step["action"] == "FINISH":
                return step["action_input"]

            # Acting step: execute the selected tool and record the observation
            if step["action"] in self.tools:
                try:
                    observation = self.tools[step["action"]](step["action_input"])
                    context += f"\nThought: {step['thought']}\n"
                    context += f"Action: {step['action']}({step['action_input']})\n"
                    context += f"Observation: {observation}\n"
                except Exception as e:
                    context += f"\nError: {str(e)}\n"
            else:
                # Guard against tool hallucination: surface unknown tools as an observation
                context += f"\nError: unknown tool '{step['action']}'\n"

        return "Max iterations reached"
When to Use ReAct:
- Tasks requiring multiple tool calls with intermediate reasoning
- Need for observable decision-making process
- Workflows where context builds iteratively
- Debugging and explainability are important
2. Plan-and-Execute Pattern
Architecture Overview:
graph TD
  A[User Goal] --> B[Planner LLM:<br/>Generate Step Plan]
  B --> C[Plan Validator]
  C --> D[Execution Plan]
  D --> E[Step 1 Executor]
  D --> F[Step 2 Executor]
  D --> G[Step N Executor]
  E --> H[Results Aggregator]
  F --> H
  G --> H
  H --> I{Plan Complete?}
  I -->|No, Failed Step| J[Replanner:<br/>Adjust Plan]
  J --> D
  I -->|Yes| K[Final Output Synthesizer]
  L[Dependency Resolver] --> E
  L --> F
  L --> G
  style B fill:#e1f5ff
  style E fill:#fff4e1
  style F fill:#fff4e1
  style G fill:#fff4e1
  style K fill:#e8f5e9
Key Advantages:
- Parallelization: Execute independent steps concurrently
- Error Recovery: Failed steps trigger replanning without full restart
- Transparency: Clear execution plan for stakeholder review
- Checkpointing: Save progress for long-running workflows
When to Use:
- Tasks with clear sequential or parallel steps
- Long-running workflows requiring checkpoints
- Scenarios where planning overhead is justified by execution efficiency
- Multi-tool coordination with complex dependencies
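Illustrative Sketch:
To make the planner/executor/replanner split concrete, here is a minimal sketch assuming an `llm` callable (prompt in, text out) and a JSON plan format; the tool registry, prompts, and replanning policy are illustrative assumptions rather than any framework's API, and parallel execution, dependency resolution, and checkpointing are omitted for brevity.

```python
import json
from typing import Callable, Dict, List

class PlanAndExecuteAgent:
    """Illustrative planner/executor: the plan is a JSON list of tool steps."""

    def __init__(self, llm: Callable[[str], str], tools: Dict[str, Callable], max_replans: int = 2):
        self.llm = llm          # assumed interface: prompt string in, text out
        self.tools = tools      # tool name -> callable(input) -> result
        self.max_replans = max_replans
        self._goal = ""

    def create_plan(self, goal: str) -> List[dict]:
        """Planner step: ask the LLM for steps like {"id": 1, "tool": "search", "input": "..."}."""
        self._goal = goal
        prompt = (
            f"Goal: {goal}\nAvailable tools: {list(self.tools)}\n"
            'Return a JSON list of steps: [{"id": 1, "tool": "...", "input": "..."}]'
        )
        return json.loads(self.llm(prompt))

    def execute_plan(self, plan: List[dict]) -> Dict[int, str]:
        """Executor step: run steps in order; on a failure, ask the LLM to replan the remainder."""
        results, replans, i = {}, 0, 0
        while i < len(plan):
            step = plan[i]
            try:
                results[step["id"]] = self.tools[step["tool"]](step["input"])
                i += 1
            except Exception as err:
                if replans >= self.max_replans:
                    raise
                replans += 1
                # Replanner: keep completed steps, regenerate the rest given the failure
                plan = plan[:i] + json.loads(self.llm(
                    f"Goal: {self._goal}\nStep {step} failed with: {err}.\n"
                    f"Available tools: {list(self.tools)}\n"
                    'Return a JSON list of remaining steps: [{"id": ..., "tool": "...", "input": "..."}]'
                ))
        return results
```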
3. Reflection Pattern
Architecture Overview:
graph LR
  A[Initial Task] --> B[Generator LLM:<br/>Create Draft]
  B --> C[Draft Output v1]
  C --> D[Critic LLM:<br/>Evaluate Quality]
  D --> E{Meets Criteria?}
  E -->|No| F[Identify Issues &<br/>Provide Feedback]
  F --> G[Refiner LLM:<br/>Improve Draft]
  G --> C
  E -->|Yes, Score ≥ 0.9| H[Final Output]
  E -->|Iteration Limit| I[Return Best Attempt]
  J[Quality Rubric] --> D
  K[Iteration Counter] --> E
  style B fill:#e1f5ff
  style D fill:#ffe1e1
  style G fill:#fff4e1
  style H fill:#e8f5e9
Quality Criteria Framework:
| Criterion | Measurement | Weight | Threshold |
|---|---|---|---|
| Clarity | Readability score, jargon ratio | 20% | > 0.8 |
| Completeness | Required elements present | 30% | 100% |
| Accuracy | Fact verification, citation quality | 30% | > 0.95 |
| Conciseness | Word count vs. target | 10% | ±20% |
| Style Compliance | Style guide adherence | 10% | > 0.9 |
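Illustrative Sketch:
A generator → critic → refiner loop can be expressed in a few lines. The sketch below assumes an `llm` callable and a critic that returns a JSON score against the rubric above; the threshold and "return best attempt" behavior mirror the diagram.

```python
import json
from typing import Callable

def reflect_and_refine(task: str, llm: Callable[[str], str],
                       score_threshold: float = 0.9, max_iterations: int = 3) -> str:
    """Generator -> critic -> refiner loop; returns the best draft seen."""
    draft = llm(f"Write a draft for: {task}")
    best_draft, best_score = draft, 0.0
    for _ in range(max_iterations):
        # Critic: grade the draft against the rubric and list concrete issues
        critique = json.loads(llm(
            "Score this draft from 0 to 1 against the rubric "
            "(clarity 20%, completeness 30%, accuracy 30%, conciseness 10%, style 10%) "
            'and list issues. Respond as JSON: {"score": 0.0, "issues": ["..."]}\n\n'
            f"Draft:\n{draft}"
        ))
        if critique["score"] > best_score:
            best_draft, best_score = draft, critique["score"]
        if critique["score"] >= score_threshold:
            return draft                   # meets the quality bar
        # Refiner: rewrite the draft to address the critic's feedback
        draft = llm(f"Revise the draft to fix these issues: {critique['issues']}\n\n{draft}")
    return best_draft                      # iteration limit reached: return best attempt
```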
Cost-Benefit Analysis:
| Iterations | Cost Multiplier | Quality Improvement | Recommended For |
|---|---|---|---|
| 1 (no reflection) | 1x | Baseline | Low-stakes outputs |
| 2 | 2.5x | +15-25% | Standard content |
| 3 | 4x | +25-35% | High-quality content |
| 4+ | 6x+ | +30-40% (diminishing) | Critical documents only |
4. Multi-Agent Teams
Architecture Overview:
graph TB
  A[User Request] --> B[Coordinator Agent:<br/>Task Decomposition]
  B --> C{Route to Specialists}
  C -->|Research Query| D[Research Agent<br/>Tools: Search, PDF]
  C -->|Coding Task| E[Coding Agent<br/>Tools: File I/O, Test]
  C -->|Writing Task| F[Writing Agent<br/>Tools: Grammar, Style]
  C -->|Analysis Task| G[Analysis Agent<br/>Tools: SQL, Viz]
  D --> H[Shared Memory Bus]
  E --> H
  F --> H
  G --> H
  H --> I[Result Synthesizer Agent]
  I --> J[Quality Validator]
  J --> K{Quality OK?}
  K -->|No| B
  K -->|Yes| L[Final Output]
  style B fill:#e1f5ff
  style D fill:#fff4e1
  style E fill:#fff4e1
  style F fill:#fff4e1
  style G fill:#fff4e1
  style I fill:#e8f5e9
Coordination Patterns:
| Pattern | Description | Communication | Use Case | Complexity |
|---|---|---|---|---|
| Sequential | Agents execute in fixed order | Pipeline | Document processing | Low |
| Hierarchical | Coordinator delegates to specialists | Hub-and-spoke | Complex research | Medium |
| Peer-to-Peer | Agents collaborate directly | Mesh | Brainstorming, debate | High |
| Market-Based | Agents bid for tasks | Auction | Dynamic resource allocation | Very High |
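Illustrative Sketch (Hierarchical Coordination):
The hub-and-spoke pattern reduces to a coordinator that decomposes the request, routes subtasks to specialist callables, and synthesizes the results. The `llm` callable, specialist registry, and JSON decomposition format below are assumptions, not a specific framework's API.

```python
import json
from typing import Callable, Dict, List

class CoordinatorAgent:
    """Hub-and-spoke coordinator: decompose, route to specialists, synthesize."""

    def __init__(self, llm: Callable[[str], str], specialists: Dict[str, Callable[[str], str]]):
        self.llm = llm                  # assumed: prompt in, text out
        self.specialists = specialists  # e.g. {"research": ..., "coding": ..., "writing": ...}

    def decompose(self, request: str) -> List[dict]:
        """Split the request into subtasks, each tagged with a specialist name."""
        prompt = (
            f"Request: {request}\nSpecialists: {list(self.specialists)}\n"
            'Return JSON: [{"specialist": "...", "subtask": "..."}]'
        )
        return json.loads(self.llm(prompt))

    def run(self, request: str) -> str:
        subtasks = self.decompose(request)
        # Shared memory bus: every specialist's result is visible to the synthesizer
        shared_memory = [
            {"specialist": t["specialist"],
             "result": self.specialists[t["specialist"]](t["subtask"])}
            for t in subtasks
        ]
        return self.llm(f"Synthesize a single answer from: {shared_memory}\nRequest: {request}")
```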
Memory Architecture
Memory Layer Taxonomy
graph TB
  A[Agent Core] --> B[Working Memory<br/>Short-term, Session-scoped]
  A --> C[Episodic Memory<br/>Past Interactions, Events]
  A --> D[Semantic Memory<br/>Facts, Knowledge Graph]
  A --> E[Procedural Memory<br/>Learned Patterns, Skills]
  B --> F[Implementation:<br/>Cache/Redis<br/>TTL: 1 hour]
  C --> G[Implementation:<br/>Vector DB<br/>Embeddings + Metadata]
  D --> G
  E --> H[Implementation:<br/>Relational DB<br/>Structured Patterns]
  G --> I[Retrieval:<br/>Hybrid Score<br/>0.7×Relevance + 0.3×Recency]
  H --> J[Retrieval:<br/>Success Rate<br/>Usage Count]
  style B fill:#e1f5ff
  style C fill:#fff4e1
  style D fill:#e8f5e9
  style E fill:#ffe1e1
Memory Retrieval Strategies
| Strategy | Description | Use Case | Implementation | Score Formula |
|---|---|---|---|---|
| Recency | Most recent memories first | Conversational context | Sort by timestamp DESC | age_score = 1 - (hours_old / 720) |
| Relevance | Semantic similarity to query | Knowledge retrieval | Vector similarity search | cosine_similarity(query, memory) |
| Hybrid | Weighted recency + relevance | General agent memory | Combined scoring | 0.7×relevance + 0.3×recency |
| Entity-based | Memories about specific entities | User/product context | Filter by entity tags | Match on entity ID |
| Success-based | Prioritize successful outcomes | Procedural learning | Filter by confidence/success | success_rate > 0.8 |
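Illustrative Sketch (Hybrid Retrieval):
The hybrid score from the table (0.7×relevance + 0.3×recency, with a 720-hour recency horizon) is straightforward to implement over in-memory embeddings; a production system would push the relevance part down into a vector database.

```python
import math
import time
from dataclasses import dataclass, field
from typing import List

@dataclass
class Memory:
    text: str
    embedding: List[float]
    created_at: float = field(default_factory=time.time)

def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)) or 1.0)

def hybrid_retrieve(query_embedding: List[float], memories: List[Memory], top_k: int = 5,
                    w_relevance: float = 0.7, w_recency: float = 0.3) -> List[Memory]:
    """Rank memories by 0.7×relevance + 0.3×recency and return the top k."""
    def score(m: Memory) -> float:
        relevance = cosine_similarity(query_embedding, m.embedding)
        hours_old = (time.time() - m.created_at) / 3600
        recency = max(0.0, 1 - hours_old / 720)     # age_score from the table (30-day horizon)
        return w_relevance * relevance + w_recency * recency
    return sorted(memories, key=score, reverse=True)[:top_k]
```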
Memory Compaction Strategy
Problem: Long-running agents accumulate thousands of memories, causing:
- Slow retrieval (O(n) searches)
- Context window overflow
- High storage costs
- Privacy concerns (data retention)
Solution: Tiered compaction with summarization
graph LR
  A[Raw Memories<br/>Age < 7 days] --> B{Compaction Trigger}
  B -->|Age 7-30 days| C[Cluster Similar Memories]
  C --> D[LLM Summarization]
  D --> E[Summary Memory<br/>Replaces 10-100 Originals]
  B -->|Age > 30 days| F{Retention Policy}
  F -->|High Value| G[Archive to Cold Storage]
  F -->|Low Value| H[Delete]
  I[Access Frequency] --> F
  J[User Consent] --> F
  style A fill:#e1f5ff
  style E fill:#e8f5e9
  style H fill:#ffe1e1
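Illustrative Sketch (Tiered Compaction):
A simplified version of the policy above: fresh memories are kept, mid-age memories are summarized, and old low-value memories are dropped. Clustering of similar memories before summarization is omitted (all mid-age memories go into one summary), and the memory dict fields and `summarize` callable are assumptions.

```python
import time
from typing import Callable, List, Optional

SEVEN_DAYS = 7 * 86400
THIRTY_DAYS = 30 * 86400

def compact_memories(memories: List[dict], summarize: Callable[[List[str]], str],
                     now: Optional[float] = None) -> List[dict]:
    """Tiered compaction: keep fresh memories, summarize mid-age ones, keep only high-value old ones."""
    now = now or time.time()
    fresh, to_summarize, kept_old = [], [], []
    for m in memories:  # each m: {"text": ..., "created_at": epoch seconds, "value": "high" | "low"}
        age = now - m["created_at"]
        if age < SEVEN_DAYS:
            fresh.append(m)
        elif age < THIRTY_DAYS:
            to_summarize.append(m)
        elif m.get("value") == "high":
            kept_old.append(m)  # a real system would archive these to cold storage instead
        # low-value memories older than 30 days are dropped (the deletion branch in the diagram)
    compacted = fresh + kept_old
    if to_summarize:
        # One summary memory replaces many originals (the 10-100x reduction described above)
        summary = summarize([m["text"] for m in to_summarize])
        compacted.append({"text": summary, "created_at": now, "value": "summary"})
    return compacted
```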
Compaction ROI:
- Storage: 80-90% reduction after 30 days
- Retrieval Speed: 5-10x faster searches
- Quality: Minimal information loss (<5%) with good summarization
- Privacy: Reduces PII exposure window
Safety & Governance
Multi-Layer Safety Architecture
graph TB
  A[Agent Request] --> B[Layer 1:<br/>Input Validation]
  B --> C{Safe Input?}
  C -->|No| D[Reject/Sanitize]
  C -->|Yes| E[Layer 2:<br/>Tool Authorization]
  E --> F{Tool Allowed<br/>for User/Role?}
  F -->|No| G[Deny Access]
  F -->|Yes| H[Layer 3:<br/>Parameter Validation]
  H --> I{Params Valid<br/>& Within Bounds?}
  I -->|No| J[Reject Call]
  I -->|Yes| K[Execute Tool]
  K --> L[Layer 4:<br/>Output Validation]
  L --> M{Safe Output?}
  M -->|No| N[Filter/Redact]
  M -->|Yes| O[Layer 5:<br/>Audit Log]
  O --> P[Return Result]
  style B fill:#ffe1e1
  style E fill:#ffe1e1
  style H fill:#ffe1e1
  style L fill:#ffe1e1
  style O fill:#e8f5e9
Tool Risk Classification Framework
| Risk Level | Examples | Authorization | Rate Limit | Approval Required | Output Filtering |
|---|---|---|---|---|---|
| READ_ONLY | Search, fetch data | All users | 100/hour | No | PII redaction |
| WRITE | Create file, send email | Verified users | 50/hour | No | Content safety |
| SENSITIVE | Database query, API call | Role-based | 20/hour | Yes (manager) | Full redaction |
| DESTRUCTIVE | Delete data, charge payment | Admin only | 5/hour | Yes (multi-party) | Complete audit trail |
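Illustrative Sketch (Tool Authorization):
The classification above maps naturally onto a static policy table checked before every tool call. The tool names, roles, and limits below are hypothetical examples of that mapping, not a prescribed policy.

```python
from dataclasses import dataclass
from enum import Enum

class RiskLevel(Enum):
    READ_ONLY = 1
    WRITE = 2
    SENSITIVE = 3
    DESTRUCTIVE = 4

@dataclass
class ToolPolicy:
    risk: RiskLevel
    allowed_roles: set          # roles permitted to call the tool at all
    rate_limit_per_hour: int
    requires_approval: bool

# Hypothetical policy table mirroring the classification above
POLICIES = {
    "search_web":    ToolPolicy(RiskLevel.READ_ONLY,   {"user", "admin"},     100, False),
    "send_email":    ToolPolicy(RiskLevel.WRITE,       {"verified", "admin"},  50, False),
    "query_db":      ToolPolicy(RiskLevel.SENSITIVE,   {"analyst", "admin"},   20, True),
    "delete_record": ToolPolicy(RiskLevel.DESTRUCTIVE, {"admin"},               5, True),
}

def authorize(tool: str, role: str, calls_this_hour: int, approved: bool) -> bool:
    """Layer-2 check: tool must exist, role must be allowed, rate limit and approval must hold."""
    policy = POLICIES.get(tool)
    if policy is None:                      # unknown tools are denied (prevents tool hallucination)
        return False
    if role not in policy.allowed_roles:
        return False
    if calls_this_hour >= policy.rate_limit_per_hour:
        return False
    if policy.requires_approval and not approved:
        return False
    return True
```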
Safety Controls Checklist
Input Layer:
- Prompt injection detection (regex + ML)
- Maximum input length enforcement
- PII detection and redaction
- Malicious content filtering
Tool Layer:
- Tool allowlist per user/role
- Parameter schema validation (JSON Schema)
- Rate limiting with sliding window
- Tool execution timeout (30s default)
- Sandbox isolation for code execution
Output Layer:
- PII redaction (Presidio or similar)
- Toxic content filtering
- Factuality verification (for critical domains)
- Citation requirement for claims
Audit Layer:
- Every tool call logged (timestamp, user, params, result)
- Trace ID for request correlation
- Retention policy (90 days default, 7 years for financial)
- Tamper-evident logging (append-only, cryptographic hashing)
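Illustrative Sketch (Tamper-Evident Audit Log):
One way to make the audit layer tamper-evident is hash chaining: each entry commits to the hash of the previous entry, so editing any historical record invalidates every later hash. The sketch below is a minimal in-memory version; a production system would persist entries to append-only storage.

```python
import hashlib
import json
import time
from typing import Any, Dict, List

class AuditLog:
    """Append-only, hash-chained log of tool executions."""

    def __init__(self):
        self._entries: List[Dict[str, Any]] = []
        self._last_hash = "0" * 64          # genesis hash

    def record(self, user: str, tool: str, params: Dict[str, Any], result: str, trace_id: str) -> None:
        entry = {
            "timestamp": time.time(), "trace_id": trace_id,
            "user": user, "tool": tool, "params": params, "result": result,
            "prev_hash": self._last_hash,
        }
        self._last_hash = hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()
        entry["hash"] = self._last_hash
        self._entries.append(entry)

    def verify(self) -> bool:
        """Recompute the chain; any edit to an earlier entry breaks every later hash."""
        prev = "0" * 64
        for e in self._entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            if body["prev_hash"] != prev:
                return False
            prev = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
            if prev != e["hash"]:
                return False
        return True
```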
Reliability Techniques
Reliability Patterns
graph TD
  A[Tool Execution Request] --> B[Retry Controller]
  B --> C{Attempt Count}
  C -->|< Max Retries| D[Execute Tool]
  D --> E{Success?}
  E -->|Yes| F[Return Result]
  E -->|No, Retriable Error| G[Exponential Backoff]
  G --> H[Wait: 2^attempt seconds]
  H --> B
  E -->|No, Non-Retriable| I[Compensation Logic]
  C -->|≥ Max Retries| I
  I --> J[Rollback/Notify]
  K[Circuit Breaker] --> D
  L[Validation] --> F
  style D fill:#e1f5ff
  style F fill:#e8f5e9
  style I fill:#ffe1e1
Retry Strategy Matrix:
| Error Type | Retriable? | Max Retries | Backoff | Example |
|---|---|---|---|---|
| Network timeout | Yes | 3 | Exponential (2^n) | API request failed |
| Rate limit (429) | Yes | 5 | Fixed (60s) | "Retry-After" header |
| Authentication (401) | No | 0 | N/A | Invalid API key |
| Bad request (400) | No | 0 | N/A | Invalid parameters |
| Server error (500) | Yes | 3 | Exponential | Temporary server issue |
| Not found (404) | No | 0 | N/A | Resource doesn't exist |
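Illustrative Sketch (Retry with Backoff):
The retry matrix above reduces to a small wrapper: fail fast on non-retriable errors, retry transient ones with exponential backoff. The `NonRetriableError` type and the default retriable exception set are assumptions; map your tool or HTTP errors onto them as appropriate.

```python
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")

class NonRetriableError(Exception):
    """Errors such as 400/401/404: retrying will not help."""

def call_with_retries(fn: Callable[[], T], max_retries: int = 3, base_delay: float = 2.0,
                      retriable: Iterable[type] = (TimeoutError, ConnectionError)) -> T:
    """Retry transient failures with exponential backoff (2, 4, 8, ... seconds)."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except NonRetriableError:
            raise                                    # bad request / auth errors fail fast
        except tuple(retriable):
            if attempt == max_retries:
                raise                                # retry budget exhausted: surface the error
            time.sleep(base_delay ** (attempt + 1))  # exponential backoff: 2^1, 2^2, ...
    raise RuntimeError("unreachable")
```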
Circuit Breaker Pattern
States:
| State | Condition | Behavior | Transition |
|---|---|---|---|
| CLOSED | Normal operation | Execute all requests | → OPEN after 5 consecutive failures |
| OPEN | Service failing | Reject all requests immediately | → HALF_OPEN after 60s timeout |
| HALF_OPEN | Testing recovery | Allow 1 probe request | → CLOSED on success, → OPEN on failure |
Benefits:
- Prevents cascade failures
- Reduces load on failing services
- Fast failure instead of hanging
- Automatic recovery testing
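Illustrative Sketch (Circuit Breaker):
The three states and transitions in the table map onto a small state machine wrapped around each tool call; the thresholds below mirror the table (5 consecutive failures, 60-second cooldown, single probe in HALF_OPEN).

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

class CircuitOpenError(Exception):
    """Raised when the breaker is OPEN and calls are rejected immediately."""

class CircuitBreaker:
    """CLOSED -> OPEN after N consecutive failures; OPEN -> HALF_OPEN after a cooldown;
    HALF_OPEN allows one probe that either closes or re-opens the circuit."""

    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 60.0):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "OPEN":
            if time.time() - self.opened_at < self.recovery_timeout:
                raise CircuitOpenError("rejecting call: circuit is open")
            self.state = "HALF_OPEN"                 # cooldown elapsed: allow one probe request
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
                self.state, self.opened_at = "OPEN", time.time()
            raise
        self.failures = 0
        self.state = "CLOSED"                        # success closes the circuit
        return result
```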
Evaluation Framework
Comprehensive Metrics
graph LR
  A[Agent Performance] --> B[Success Metrics]
  A --> C[Cost Metrics]
  A --> D[Quality Metrics]
  A --> E[Safety Metrics]
  B --> F[Task Success Rate<br/>Target: > 95%]
  B --> G[First-Attempt Success<br/>Target: > 85%]
  C --> H[Avg Cost per Task<br/>Track: $0.10-$2.00]
  C --> I[Tool Call Efficiency<br/>Minimize: Redundant calls]
  D --> J[Output Quality Score<br/>LLM-graded: 0-1]
  D --> K[User Satisfaction<br/>CSAT/NPS]
  E --> L[Safety Violation Rate<br/>Target: 0%]
  E --> M[Audit Completeness<br/>Target: 100%]
  style F fill:#e8f5e9
  style H fill:#fff4e1
  style J fill:#e1f5ff
  style L fill:#ffe1e1
Evaluation Metrics Detailed
| Metric Category | Metric | Target | Measurement Method | Business Impact |
|---|---|---|---|---|
| Success | Task Completion Rate | > 95% | (Successful / Total) × 100 | Revenue, user satisfaction |
| Success | First-Attempt Success | > 85% | Successful without retry / Total | User experience, cost |
| Cost | Average Cost per Task | < $2.00 | (Total LLM cost + tool cost) / tasks | Profitability |
| Cost | Tool Call Efficiency | Minimize | Redundant calls / Total calls | Infrastructure cost |
| Quality | Output Quality Score | > 0.8 | LLM-graded rubric (0-1) | User satisfaction |
| Quality | Hallucination Rate | < 2% | Fact-checked claims / Total claims | Trust, liability |
| Speed | Latency (p95) | < 10s | 95th percentile response time | User experience |
| Speed | Throughput | > 100/min | Concurrent tasks handled | Scalability |
| Safety | Safety Violation Rate | 0% | Violations detected / Total | Legal, reputation |
| Safety | Audit Trail Completeness | 100% | Logged actions / Total actions | Compliance |
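Illustrative Sketch (Metric Aggregation):
Given per-task results from an evaluation run, the headline metrics reduce to simple aggregates. The `TaskResult` fields below are assumptions about what an evaluation harness records per task.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TaskResult:
    success: bool
    cost_usd: float
    latency_s: float
    safety_violation: bool = False

def summarize_eval(results: List[TaskResult]) -> dict:
    """Aggregate the headline metrics from the table above over an evaluation task suite."""
    n = len(results)
    latencies = sorted(r.latency_s for r in results)
    p95 = latencies[min(n - 1, int(0.95 * n))]  # approximate 95th-percentile latency
    return {
        "task_success_rate": sum(r.success for r in results) / n,               # target > 0.95
        "avg_cost_per_task_usd": sum(r.cost_usd for r in results) / n,           # track $0.10-$2.00
        "latency_p95_s": p95,                                                    # target < 10 s
        "safety_violation_rate": sum(r.safety_violation for r in results) / n,   # target 0
    }

# Example: summarize_eval([TaskResult(True, 0.42, 6.1), TaskResult(False, 0.80, 12.3)])
```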
Case Study: Research Assistant Agent
Problem Statement
A research team conducted literature reviews manually, taking 3-5 days per topic. They needed:
- Automated search across 10+ academic databases
- PDF extraction and key claim identification
- Citation management and deduplication
- Draft summary generation with proper attribution
Solution Architecture
Pattern Choice: Plan-and-Execute (for structured, parallelizable steps)
Implementation:
from typing import List, Tuple

class ResearchAssistantAgent:
    """Plan-and-execute agent for literature reviews."""

    def __init__(self):
        self.planner = PlanAndExecuteAgent(...)
        self.memory = MemoryManager(...)
        self.tools = {
            "search_papers": self._search_papers,
            "fetch_pdf": self._fetch_pdf,
            "extract_claims": self._extract_claims,
            "deduplicate": self._deduplicate,
            "generate_summary": self._generate_summary,
        }

    def conduct_literature_review(self, topic: str, databases: List[str],
                                  date_range: Tuple[str, str], max_papers: int = 50):
        """Execute the complete literature-review workflow."""
        # Plan generation
        plan = self.planner.create_plan(f"""
        Conduct literature review on: {topic}
        Search databases: {databases}
        Date range: {date_range}
        Maximum papers: {max_papers}
        Steps: Search → Dedupe → Fetch PDFs → Extract Claims → Summarize
        """)
        # Execute with caching
        results = self.planner.execute_plan(plan, cache_enabled=True)
        return results
Results
Before Agent:
- Manual process: 3-5 days per review
- Coverage: 10-15 papers per review
- Citation quality: Inconsistent
- Cost: $800 in researcher time
After Agent (6 months):
- Automated reviews: 2-4 hours
- Coverage: 40-50 papers per review
- Success rate: 86% (validated by researchers)
- Cost: $30 per review (LLM + APIs)
- Citation quality: Consistent, proper formatting
- ROI: 93% cost reduction
Key Optimizations:
| Optimization | Technique | Impact |
|---|---|---|
| Caching | Store search results & PDF extractions | -45% redundant API calls |
| Reranking | Lightweight model filters papers before expensive extraction | -18% costs |
| Memory Summarization | Compress old reviews into summaries | -60% context size |
| Schema Validation | Strict tool contracts | -92% error retry loops |
Implementation Checklist
Phase 1: Pattern Selection (Week 1)
- Analyze use case requirements and complexity
- Choose primary pattern (ReAct, Plan-Execute, Reflection, Multi-Agent)
- Identify required tools and their risk levels
- Define success criteria and evaluation metrics
- Document expected task flows and edge cases
Phase 2: Core Implementation (Weeks 2-3)
- Implement chosen agent pattern with basic tools
- Define JSON schemas for all tool parameters
- Build working memory system with session management
- Add basic error handling and retries
- Create initial test suite with 5-10 example tasks
Phase 3: Memory & Context (Week 4)
- Implement episodic memory with vector storage
- Add semantic memory for facts and entities
- Build memory retrieval with recency + relevance scoring
- Implement privacy controls (PII detection, redaction)
- Add memory compaction and summarization
Phase 4: Safety & Governance (Week 5)
- Implement tool authorization framework
- Add parameter validation and constraints
- Build output filtering and content safety
- Create audit logging for all tool executions
- Define and enforce rate limits
- Set up approval workflows for sensitive tools
Phase 5: Reliability (Week 6)
- Add retry logic with exponential backoff
- Implement critique-and-improve loops
- Build tool output validation
- Add circuit breakers for failing tools
- Implement graceful degradation strategies
- Create monitoring dashboards
Phase 6: Evaluation & Optimization (Weeks 7-8)
- Build comprehensive evaluation framework
- Create task suite with gold standards
- Measure success rate, cost, and latency
- Conduct human evaluation for quality
- Optimize prompts based on failure analysis
- Implement caching for expensive operations
- Profile and optimize slow paths
Phase 7: Production Readiness (Week 9)
- Set up production monitoring and alerting
- Implement A/B testing framework
- Create runbooks for common issues
- Build human-in-the-loop review queue
- Document all tools and patterns
- Train operations team
- Conduct security review and penetration testing
Common Pitfalls & Solutions
| Pitfall | Impact | Solution | Prevention |
|---|---|---|---|
| Infinite loops | Agent cycles without progress, wastes cost | Set max iteration limits (5-10), detect repetition | Track unique states, halt on cycles |
| Tool hallucination | Agent invents non-existent tools | Strict tool allowlists, validate before execution | Provide tool list in every prompt |
| Context overflow | Exceeds token limits, fails | Memory summarization, sliding window | Monitor context size, compact proactively |
| Brittle plans | Agent can't adapt to failures | Use reflection pattern, add replanning capability | Test with failure injection |
| Excessive tool calls | High cost, slow execution | Cache results, batch operations, use lightweight models | Profile tool usage, set budgets |
| Poor error handling | Crashes on tool failures | Structured error taxonomy, fallback strategies | Classify errors as retriable/fatal |
| Missing audit trail | Can't debug or explain decisions | Log all steps with timestamps, inputs, outputs | Make logging mandatory |
| PII leakage | Privacy violations, compliance risk | PII detection, redaction before memory storage | Scan all inputs/outputs |
| Prompt injection | Security vulnerabilities | Input validation, sandboxing, content filtering | Treat all inputs as untrusted |
| Cost runaway | Budget overruns | Hard budget caps, cost monitoring, early stopping | Alert on anomalous spend |
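Illustrative Sketch (Loop and Budget Guard):
Several of these pitfalls (infinite loops, cost runaway) are cheap to guard against with a per-run watchdog that tracks iterations, spend, and repeated states. The fingerprinting scheme and limits below are assumptions; a hash of the agent's last action and observation works as a simple state fingerprint.

```python
class RunGuard:
    """Per-run guard combining an iteration cap, repeated-state detection, and a hard cost cap."""

    def __init__(self, max_iterations: int = 10, max_cost_usd: float = 2.0):
        self.max_iterations = max_iterations
        self.max_cost_usd = max_cost_usd
        self.iterations = 0
        self.spent_usd = 0.0
        self.seen_states = set()

    def check(self, state_fingerprint: str, step_cost_usd: float) -> None:
        """Call once per agent step; raises RuntimeError when any limit is hit."""
        self.iterations += 1
        self.spent_usd += step_cost_usd
        if self.iterations > self.max_iterations:
            raise RuntimeError("iteration limit reached")
        if self.spent_usd > self.max_cost_usd:
            raise RuntimeError("cost budget exceeded")
        if state_fingerprint in self.seen_states:
            raise RuntimeError("repeated state detected: likely loop")   # halt on cycles
        self.seen_states.add(state_fingerprint)
```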
Best Practices Summary
Pattern Selection:
- Start simple (ReAct or tool-calling), add complexity only when needed
- Prefer deterministic execution over creative reasoning when possible
- Use multi-agent only when task truly requires diverse expertise
- Prototype with cheap models, optimize later
Memory Design:
- Hybrid retrieval (0.7×relevance + 0.3×recency) works well for most cases
- Compact memories older than 30 days to reduce costs
- Separate working memory (cache) from long-term memory (vector DB)
- Always filter PII before storage
Safety:
- Layer security (input → authorization → execution → output)
- Log everything (the audit trail is non-negotiable)
- Use circuit breakers to prevent cascade failures
- Test with adversarial inputs (prompt injection, jailbreaks)
Reliability:
- Retry only on transient errors (network, rate limits)
- Use exponential backoff to avoid overwhelming services
- Validate tool outputs before feeding them to the next step
- Implement timeouts for all external calls
Cost Optimization:
- Cache aggressively (search results, API calls, LLM outputs)
- Use smaller models for planning/routing
- Batch operations when possible
- Monitor per-task cost and set budgets
Further Reading
Frameworks:
- LangChain (Python, TypeScript) - Most popular, broad ecosystem
- LangGraph - Stateful multi-agent workflows
- AutoGPT - Autonomous task execution
- CrewAI - Role-based multi-agent teams
Academic Papers:
- ReAct: Yao et al., 2022 - "ReAct: Synergizing Reasoning and Acting in Language Models"
- Reflexion: Shinn et al., 2023 - "Reflexion: Language Agents with Verbal Reinforcement Learning"
- Tree of Thoughts: Yao et al., 2023 - Deliberate problem-solving with LMs
Memory Systems:
- MemGPT - Virtual context management
- Mem0 - Persistent memory for agents
- Conversation memory patterns in LangChain
Safety:
- Constitutional AI (Anthropic)
- Prompt injection defenses (Simon Willison's research)
- Tool sandboxing best practices
Next Chapter Preview: Chapter 39 dives deep into task-oriented agents for specific domains (coding, research, operations), with practical tooling and evaluation methodologies.