Part 7: Agentic Systems & Orchestration

Chapter 38: Agent Patterns & Memory

Overview

Agentic systems represent a paradigm shift from single-inference models to autonomous systems that plan, use tools, maintain memory, and achieve complex goals across multiple steps. This chapter explores proven architectural patterns, memory strategies, and safety mechanisms that make agents reliable in production environments.

Core Focus:

  • Architectural patterns for agent design (ReAct, Plan-Execute, Reflection, Multi-Agent)
  • Memory layer design for context retention and personalization
  • Safety controls and governance frameworks
  • Reliability techniques and evaluation methodologies

Why It Matters

Agents compose tools and knowledge to accomplish goals across steps. The right patterns improve reliability, constrain cost, and make behavior auditable in enterprise contexts.

Key Benefits:

  • Autonomy: Handle multi-step tasks without constant human intervention
  • Adaptability: Adjust plans based on intermediate results and feedback
  • Capability Extension: Leverage external tools and knowledge sources
  • Context Awareness: Maintain memory across interactions for personalized experiences
  • Cost Efficiency: Cache intermediate results and optimize tool usage

Real-World Impact:

  • Customer service agents reduce resolution time by 45% through autonomous tool usage
  • Research assistants accelerate literature reviews from days to hours
  • Coding agents automate 60-70% of routine development tasks
  • Operations agents handle incident triage and remediation 24/7

Core Agent Patterns

Pattern Selection Framework

graph TD
    A[Task Requirements] --> B{Task Complexity}
    B -->|Simple, Tool-Based| C[Tool-Calling Pattern]
    B -->|Interactive, Iterative| D[ReAct Pattern]
    B -->|Complex, Multi-Step| E{Need Planning?}
    E -->|Yes, Clear Steps| F[Plan-and-Execute]
    E -->|Yes, Quality Focus| G[Reflection Pattern]
    B -->|Multi-Domain Expertise| H[Multi-Agent Pattern]
    C --> I[Fast, Deterministic<br/>Low Cost]
    D --> J[Observable Reasoning<br/>Good Tool Integration]
    F --> K[Structured Execution<br/>Parallelizable]
    G --> L[Self-Improving<br/>High Quality]
    H --> M[Specialized Agents<br/>Scalable]
    style C fill:#e1f5ff
    style D fill:#fff4e1
    style F fill:#e8f5e9
    style G fill:#ffe1e1
    style H fill:#f5e1ff

Pattern Comparison Matrix

| Pattern | Use Case | Strengths | Limitations | Cost Profile | Best For |
|---|---|---|---|---|---|
| Tool-Calling | Direct function execution | Fast, deterministic, low-cost | Limited reasoning, rigid | $ | Structured data tasks, calculations |
| ReAct | Interactive tasks requiring tool calls | Simple, interpretable, good tool integration | Can loop indefinitely, verbose | $$ | Web search, API interactions |
| Plan-and-Execute | Complex multi-step workflows | Clear structure, parallelizable | Brittle to plan changes, overhead | $$$ | Document processing, research |
| Reflection | Tasks requiring quality iteration | Self-improving, catches errors | Higher latency and cost | $$$$ | Content generation, code review |
| Multi-Agent | Tasks needing diverse expertise | Specialized agents, scalable | Coordination overhead, complexity | $$$$$ | Enterprise workflows, creative tasks |

1. ReAct Pattern (Reasoning + Acting)

Architecture Overview:

graph TD
    A[User Query] --> B[LLM: Reasoning Step]
    B --> C{Action Needed?}
    C -->|Yes| D[Select Tool from Registry]
    D --> E[Execute Tool with Params]
    E --> F[Observe Result]
    F --> G[Update Context]
    G --> B
    C -->|No| H[Generate Final Answer]
    H --> I[Return to User]
    J[Safety Layer] --> D
    J --> E
    K[Audit Log] --> E
    L[Budget Manager] --> B
    style B fill:#e1f5ff
    style D fill:#fff4e1
    style J fill:#ffe1e1
    style H fill:#e8f5e9

Implementation Example (Minimal):

from typing import Callable, Dict
import json

class ReActAgent:
    """ReAct agent with reasoning and action loop."""

    def __init__(self, llm, tools: Dict[str, Callable], max_iterations: int = 10):
        self.llm = llm
        self.tools = tools
        self.max_iterations = max_iterations
        self.trace = []

    def run(self, task: str) -> str:
        """Execute task using ReAct loop."""
        context = f"Task: {task}\n\nAvailable tools: {list(self.tools.keys())}\n\n"

        for i in range(self.max_iterations):
            # Reasoning step
            prompt = f"""{context}
What should I do next? Respond in JSON:
{{"thought": "your reasoning", "action": "tool_name or FINISH", "action_input": "input"}}"""

            response = self.llm.generate(prompt)
            try:
                step = json.loads(response)
            except json.JSONDecodeError:
                context += "\nError: last response was not valid JSON; respond with JSON only.\n"
                continue
            self.trace.append(step)

            if step["action"] == "FINISH":
                return step["action_input"]

            # Execute tool
            if step["action"] in self.tools:
                try:
                    observation = self.tools[step["action"]](step["action_input"])
                    context += f"\nThought: {step['thought']}\n"
                    context += f"Action: {step['action']}({step['action_input']})\n"
                    context += f"Observation: {observation}\n"
                except Exception as e:
                    context += f"\nError: {str(e)}\n"
            else:
                context += f"\nError: unknown tool '{step['action']}'\n"

        return "Max iterations reached"

When to Use ReAct:

  • Tasks requiring multiple tool calls with intermediate reasoning
  • Need for observable decision-making process
  • Workflows where context builds iteratively
  • Debugging and explainability are important

2. Plan-and-Execute Pattern

Architecture Overview:

graph TD
    A[User Goal] --> B[Planner LLM:<br/>Generate Step Plan]
    B --> C[Plan Validator]
    C --> D[Execution Plan]
    D --> E[Step 1 Executor]
    D --> F[Step 2 Executor]
    D --> G[Step N Executor]
    E --> H[Results Aggregator]
    F --> H
    G --> H
    H --> I{Plan Complete?}
    I -->|No, Failed Step| J[Replanner:<br/>Adjust Plan]
    J --> D
    I -->|Yes| K[Final Output Synthesizer]
    L[Dependency Resolver] --> E
    L --> F
    L --> G
    style B fill:#e1f5ff
    style E fill:#fff4e1
    style F fill:#fff4e1
    style G fill:#fff4e1
    style K fill:#e8f5e9

Key Advantages:

  • Parallelization: Execute independent steps concurrently
  • Error Recovery: Failed steps trigger replanning without full restart
  • Transparency: Clear execution plan for stakeholder review
  • Checkpointing: Save progress for long-running workflows

When to Use:

  • Tasks with clear sequential or parallel steps
  • Long-running workflows requiring checkpoints
  • Scenarios where planning overhead is justified by execution efficiency
  • Multi-tool coordination with complex dependencies
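
A minimal planner/executor sketch, assuming an llm client with a generate(prompt) method and treating each plan step as a single tool call; the replanner, dependency resolver, and parallel execution from the diagram above are omitted:

import json
from typing import Callable, Dict, List

class PlanAndExecuteAgent:
    """Minimal planner/executor: plan once up front, run each step, collect results."""

    def __init__(self, llm, tools: Dict[str, Callable]):
        self.llm = llm
        self.tools = tools

    def create_plan(self, goal: str) -> List[dict]:
        # Ask the planner LLM for an ordered list of tool calls as JSON.
        prompt = (
            f"Goal: {goal}\nAvailable tools: {list(self.tools.keys())}\n"
            'Return only a JSON list of steps: [{"tool": "...", "input": "..."}]'
        )
        return json.loads(self.llm.generate(prompt))

    def execute_plan(self, plan: List[dict]) -> List[dict]:
        results = []
        for step in plan:
            tool = self.tools.get(step["tool"])
            if tool is None:
                # A fuller implementation would trigger the replanner here.
                results.append({"step": step, "error": "unknown tool"})
                continue
            try:
                results.append({"step": step, "output": tool(step["input"])})
            except Exception as exc:
                results.append({"step": step, "error": str(exc)})
        return results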

3. Reflection Pattern

Architecture Overview:

graph LR
    A[Initial Task] --> B[Generator LLM:<br/>Create Draft]
    B --> C[Draft Output v1]
    C --> D[Critic LLM:<br/>Evaluate Quality]
    D --> E{Meets Criteria?}
    E -->|No| F[Identify Issues &<br/>Provide Feedback]
    F --> G[Refiner LLM:<br/>Improve Draft]
    G --> C
    E -->|Yes, Score ≥ 0.9| H[Final Output]
    E -->|Iteration Limit| I[Return Best Attempt]
    J[Quality Rubric] --> D
    K[Iteration Counter] --> E
    style B fill:#e1f5ff
    style D fill:#ffe1e1
    style G fill:#fff4e1
    style H fill:#e8f5e9

Quality Criteria Framework:

| Criterion | Measurement | Weight | Threshold |
|---|---|---|---|
| Clarity | Readability score, jargon ratio | 20% | > 0.8 |
| Completeness | Required elements present | 30% | 100% |
| Accuracy | Fact verification, citation quality | 30% | > 0.95 |
| Conciseness | Word count vs. target | 10% | ±20% |
| Style Compliance | Style guide adherence | 10% | > 0.9 |
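
A sketch of the generate-critique-refine loop, assuming an llm client with generate(prompt) and a critic that returns a single numeric score; the 0.9 threshold and criteria names mirror the rubric above, and the prompts are illustrative:

import json

def reflect_and_refine(llm, task: str, max_iterations: int = 3, threshold: float = 0.9) -> str:
    """Generator -> critic -> refiner loop; returns the best draft produced."""
    draft = llm.generate(f"Write a draft for: {task}")
    best_draft, best_score = draft, 0.0

    for _ in range(max_iterations):
        critique = json.loads(llm.generate(
            "Score this draft from 0 to 1 against clarity, completeness, accuracy, "
            "conciseness, and style compliance. Return JSON "
            '{"score": 0.0, "feedback": "..."}' + f"\n\nDraft:\n{draft}"
        ))
        if critique["score"] > best_score:
            best_draft, best_score = draft, critique["score"]
        if critique["score"] >= threshold:
            return draft                      # meets the quality bar
        draft = llm.generate(
            f"Improve the draft using this feedback.\nFeedback: {critique['feedback']}\n\nDraft:\n{draft}"
        )
    return best_draft                         # iteration limit reached: return best attempt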

Cost-Benefit Analysis:

| Iterations | Cost Multiplier | Quality Improvement | Recommended For |
|---|---|---|---|
| 1 (no reflection) | 1x | Baseline | Low-stakes outputs |
| 2 | 2.5x | +15-25% | Standard content |
| 3 | 4x | +25-35% | High-quality content |
| 4+ | 6x+ | +30-40% (diminishing) | Critical documents only |

4. Multi-Agent Teams

Architecture Overview:

graph TB
    A[User Request] --> B[Coordinator Agent:<br/>Task Decomposition]
    B --> C{Route to Specialists}
    C -->|Research Query| D[Research Agent<br/>Tools: Search, PDF]
    C -->|Coding Task| E[Coding Agent<br/>Tools: File I/O, Test]
    C -->|Writing Task| F[Writing Agent<br/>Tools: Grammar, Style]
    C -->|Analysis Task| G[Analysis Agent<br/>Tools: SQL, Viz]
    D --> H[Shared Memory Bus]
    E --> H
    F --> H
    G --> H
    H --> I[Result Synthesizer Agent]
    I --> J[Quality Validator]
    J --> K{Quality OK?}
    K -->|No| B
    K -->|Yes| L[Final Output]
    style B fill:#e1f5ff
    style D fill:#fff4e1
    style E fill:#fff4e1
    style F fill:#fff4e1
    style G fill:#fff4e1
    style I fill:#e8f5e9

Coordination Patterns:

| Pattern | Description | Communication | Use Case | Complexity |
|---|---|---|---|---|
| Sequential | Agents execute in fixed order | Pipeline | Document processing | Low |
| Hierarchical | Coordinator delegates to specialists | Hub-and-spoke | Complex research | Medium |
| Peer-to-Peer | Agents collaborate directly | Mesh | Brainstorming, debate | High |
| Market-Based | Agents bid for tasks | Auction | Dynamic resource allocation | Very High |
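
A hedged sketch of the hierarchical (hub-and-spoke) pattern: a coordinator decomposes the request, routes subtasks to specialist agents, and synthesizes the results. The specialist names, routing prompt, and the assumption that each specialist exposes run(task) -> str are illustrative:

import json
from typing import Dict

class CoordinatorAgent:
    """Hub-and-spoke sketch: decompose a request, route subtasks, synthesize results."""

    def __init__(self, llm, specialists: Dict[str, object]):
        self.llm = llm
        self.specialists = specialists   # e.g. {"research": ReActAgent(...), "writing": ...}

    def run(self, request: str) -> str:
        # Decompose into (specialist, subtask) pairs; names must match the registry keys.
        plan = json.loads(self.llm.generate(
            f"Split this request into subtasks for the specialists {list(self.specialists)}.\n"
            'Return only JSON: [{"specialist": "...", "subtask": "..."}]\n\n' + request
        ))
        outputs = [self.specialists[p["specialist"]].run(p["subtask"]) for p in plan]
        # Synthesize specialist outputs into one coherent answer.
        return self.llm.generate("Combine these partial results into one answer:\n" + "\n---\n".join(outputs))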

Memory Architecture

Memory Layer Taxonomy

graph TB
    A[Agent Core] --> B[Working Memory<br/>Short-term, Session-scoped]
    A --> C[Episodic Memory<br/>Past Interactions, Events]
    A --> D[Semantic Memory<br/>Facts, Knowledge Graph]
    A --> E[Procedural Memory<br/>Learned Patterns, Skills]
    B --> F[Implementation:<br/>Cache/Redis<br/>TTL: 1 hour]
    C --> G[Implementation:<br/>Vector DB<br/>Embeddings + Metadata]
    D --> G
    E --> H[Implementation:<br/>Relational DB<br/>Structured Patterns]
    G --> I[Retrieval:<br/>Hybrid Score<br/>0.7×Relevance + 0.3×Recency]
    H --> J[Retrieval:<br/>Success Rate<br/>Usage Count]
    style B fill:#e1f5ff
    style C fill:#fff4e1
    style D fill:#e8f5e9
    style E fill:#ffe1e1

Memory Retrieval Strategies

| Strategy | Description | Use Case | Implementation | Score Formula |
|---|---|---|---|---|
| Recency | Most recent memories first | Conversational context | Sort by timestamp DESC | age_score = 1 - (hours_old / 720) |
| Relevance | Semantic similarity to query | Knowledge retrieval | Vector similarity search | cosine_similarity(query, memory) |
| Hybrid | Weighted recency + relevance | General agent memory | Combined scoring | 0.7×relevance + 0.3×recency |
| Entity-based | Memories about specific entities | User/product context | Filter by entity tags | Match on entity ID |
| Success-based | Prioritize successful outcomes | Procedural learning | Filter by confidence/success | success_rate > 0.8 |
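
A sketch of the hybrid score from the table above, combining cosine relevance with the 720-hour recency decay; it assumes memory timestamps are timezone-aware and that embeddings come from whatever encoder the system already uses:

import math
from datetime import datetime, timezone
from typing import Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hybrid_score(query_vec: Sequence[float], memory_vec: Sequence[float],
                 created_at: datetime,
                 w_relevance: float = 0.7, w_recency: float = 0.3) -> float:
    """Weighted relevance + recency; 720 hours (30 days) maps recency onto the 0-1 range."""
    hours_old = (datetime.now(timezone.utc) - created_at).total_seconds() / 3600
    recency = max(0.0, 1 - hours_old / 720)
    relevance = cosine_similarity(query_vec, memory_vec)
    return w_relevance * relevance + w_recency * recency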

Memory Compaction Strategy

Problem: Long-running agents accumulate thousands of memories, causing:

  • Slow retrieval (O(n) searches)
  • Context window overflow
  • High storage costs
  • Privacy concerns (data retention)

Solution: Tiered compaction with summarization

graph LR
    A[Raw Memories<br/>Age < 7 days] --> B{Compaction Trigger}
    B -->|Age 7-30 days| C[Cluster Similar Memories]
    C --> D[LLM Summarization]
    D --> E[Summary Memory<br/>Replaces 10-100 Originals]
    B -->|Age > 30 days| F{Retention Policy}
    F -->|High Value| G[Archive to Cold Storage]
    F -->|Low Value| H[Delete]
    I[Access Frequency] --> F
    J[User Consent] --> F
    style A fill:#e1f5ff
    style E fill:#e8f5e9
    style H fill:#ffe1e1

Compaction ROI:

  • Storage: 80-90% reduction after 30 days
  • Retrieval Speed: 5-10x faster searches
  • Quality: Minimal information loss (<5%) with good summarization
  • Privacy: Reduces PII exposure window
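
A simplified compaction sketch: it assumes memory records are dicts with a "text" field and a timezone-aware "created_at", and it batches old memories in order rather than clustering by similarity as in the diagram above; archival and deletion tiers are omitted:

from datetime import datetime, timedelta, timezone
from typing import List

def compact_memories(llm, memories: List[dict], min_age_days: int = 7,
                     batch_size: int = 20) -> List[dict]:
    """Tiered compaction sketch: replace old raw memories with LLM-written summaries."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=min_age_days)
    old = [m for m in memories if m["created_at"] < cutoff]
    fresh = [m for m in memories if m["created_at"] >= cutoff]

    summaries = []
    for i in range(0, len(old), batch_size):
        batch = old[i:i + batch_size]
        joined = "\n".join(m["text"] for m in batch)
        summaries.append({
            "text": llm.generate("Summarize these related memories, keeping key facts and entities:\n" + joined),
            "created_at": max(m["created_at"] for m in batch),
            "kind": "summary",
            "replaces": len(batch),
        })
    return fresh + summaries   # raw memories younger than the cutoff survive untouched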

Safety & Governance

Multi-Layer Safety Architecture

graph TB
    A[Agent Request] --> B[Layer 1:<br/>Input Validation]
    B --> C{Safe Input?}
    C -->|No| D[Reject/Sanitize]
    C -->|Yes| E[Layer 2:<br/>Tool Authorization]
    E --> F{Tool Allowed<br/>for User/Role?}
    F -->|No| G[Deny Access]
    F -->|Yes| H[Layer 3:<br/>Parameter Validation]
    H --> I{Params Valid<br/>& Within Bounds?}
    I -->|No| J[Reject Call]
    I -->|Yes| K[Execute Tool]
    K --> L[Layer 4:<br/>Output Validation]
    L --> M{Safe Output?}
    M -->|No| N[Filter/Redact]
    M -->|Yes| O[Layer 5:<br/>Audit Log]
    O --> P[Return Result]
    style B fill:#ffe1e1
    style E fill:#ffe1e1
    style H fill:#ffe1e1
    style L fill:#ffe1e1
    style O fill:#e8f5e9

Tool Risk Classification Framework

| Risk Level | Examples | Authorization | Rate Limit | Approval Required | Output Filtering |
|---|---|---|---|---|---|
| READ_ONLY | Search, fetch data | All users | 100/hour | No | PII redaction |
| WRITE | Create file, send email | Verified users | 50/hour | No | Content safety |
| SENSITIVE | Database query, API call | Role-based | 20/hour | Yes (manager) | Full redaction |
| DESTRUCTIVE | Delete data, charge payment | Admin only | 5/hour | Yes (multi-party) | Complete audit trail |
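
A hedged sketch of an authorization check built from the matrix above; the role names and the POLICY table are placeholders, and the rate counter is assumed to be maintained elsewhere:

from enum import Enum
from typing import Set, Tuple

class RiskLevel(Enum):
    READ_ONLY = 1
    WRITE = 2
    SENSITIVE = 3
    DESTRUCTIVE = 4

# Illustrative policy table mirroring the matrix above; role names are placeholders.
POLICY = {
    RiskLevel.READ_ONLY:   {"roles": {"user", "verified", "admin"}, "per_hour": 100, "approval": False},
    RiskLevel.WRITE:       {"roles": {"verified", "admin"},         "per_hour": 50,  "approval": False},
    RiskLevel.SENSITIVE:   {"roles": {"analyst", "admin"},          "per_hour": 20,  "approval": True},
    RiskLevel.DESTRUCTIVE: {"roles": {"admin"},                     "per_hour": 5,   "approval": True},
}

def authorize(tool_risk: RiskLevel, user_roles: Set[str], calls_this_hour: int) -> Tuple[bool, str]:
    """Return (allowed, reason); approval-required tools still need a human sign-off."""
    policy = POLICY[tool_risk]
    if not (user_roles & policy["roles"]):
        return False, "role not permitted for this tool"
    if calls_this_hour >= policy["per_hour"]:
        return False, "rate limit exceeded"
    if policy["approval"]:
        return True, "allowed, pending human approval"
    return True, "allowed"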

Safety Controls Checklist

Input Layer:

  • Prompt injection detection (regex + ML)
  • Maximum input length enforcement
  • PII detection and redaction
  • Malicious content filtering

Tool Layer:

  • Tool allowlist per user/role
  • Parameter schema validation (JSON Schema)
  • Rate limiting with sliding window
  • Tool execution timeout (30s default)
  • Sandbox isolation for code execution

Output Layer:

  • PII redaction (Presidio or similar)
  • Toxic content filtering
  • Factuality verification (for critical domains)
  • Citation requirement for claims

Audit Layer:

  • Every tool call logged (timestamp, user, params, result)
  • Trace ID for request correlation
  • Retention policy (90 days default, 7 years for financial)
  • Tamper-evident logging (append-only, cryptographic hashing)
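
A minimal sketch of tamper-evident, append-only logging via hash chaining: each entry stores the previous entry's hash, so editing or deleting any record breaks verification. Field names are illustrative; production systems would persist entries rather than hold them in memory:

import hashlib
import json
import time

class AuditLog:
    """Append-only audit log sketch: each entry chains the previous entry's hash."""

    def __init__(self):
        self.entries = []
        self._last_hash = "0" * 64   # genesis hash

    def record(self, trace_id: str, user: str, tool: str, params: dict, result_summary: str):
        entry = {
            "ts": time.time(), "trace_id": trace_id, "user": user,
            "tool": tool, "params": params, "result": result_summary,
            "prev_hash": self._last_hash,
        }
        entry["hash"] = hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()
        self._last_hash = entry["hash"]
        self.entries.append(entry)

    def verify(self) -> bool:
        """Recompute the chain; any edited or deleted entry breaks verification."""
        prev = "0" * 64
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
            if e["prev_hash"] != prev or e["hash"] != expected:
                return False
            prev = e["hash"]
        return True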

Reliability Techniques

Reliability Patterns

graph TD
    A[Tool Execution Request] --> B[Retry Controller]
    B --> C{Attempt Count}
    C -->|< Max Retries| D[Execute Tool]
    D --> E{Success?}
    E -->|Yes| F[Return Result]
    E -->|No, Retriable Error| G[Exponential Backoff]
    G --> H[Wait: 2^attempt seconds]
    H --> B
    E -->|No, Non-Retriable| I[Compensation Logic]
    C -->|≥ Max Retries| I
    I --> J[Rollback/Notify]
    K[Circuit Breaker] --> D
    L[Validation] --> F
    style D fill:#e1f5ff
    style F fill:#e8f5e9
    style I fill:#ffe1e1

Retry Strategy Matrix:

| Error Type | Retriable? | Max Retries | Backoff | Example |
|---|---|---|---|---|
| Network timeout | Yes | 3 | Exponential (2^n) | API request failed |
| Rate limit (429) | Yes | 5 | Fixed (60s) | "Retry-After" header |
| Authentication (401) | No | 0 | N/A | Invalid API key |
| Bad request (400) | No | 0 | N/A | Invalid parameters |
| Server error (500) | Yes | 3 | Exponential | Temporary server issue |
| Not found (404) | No | 0 | N/A | Resource doesn't exist |
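
A retry sketch matching the matrix above. TransientToolError is a hypothetical error type that tool wrappers would raise for timeouts, 429s, and 5xx responses; non-retriable errors simply propagate:

import time
from typing import Optional

class TransientToolError(Exception):
    """Illustrative error type for timeouts, 429s, and 5xx responses raised by tool wrappers."""
    def __init__(self, message: str, retry_after: Optional[float] = None):
        super().__init__(message)
        self.retry_after = retry_after

def call_with_retry(fn, *args, max_retries: int = 3, base_delay: float = 1.0, **kwargs):
    """Retry transient failures with exponential backoff (2^attempt seconds)."""
    for attempt in range(max_retries + 1):
        try:
            return fn(*args, **kwargs)
        except TransientToolError as exc:
            if attempt == max_retries:
                raise                                   # budget exhausted: surface the error
            # Honor an explicit Retry-After (rate limits), else back off exponentially.
            time.sleep(exc.retry_after or base_delay * (2 ** attempt))
        # Non-retriable errors (400, 401, 404) are not caught and propagate immediately.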

Circuit Breaker Pattern

States:

| State | Condition | Behavior | Transition |
|---|---|---|---|
| CLOSED | Normal operation | Execute all requests | → OPEN after 5 consecutive failures |
| OPEN | Service failing | Reject all requests immediately | → HALF_OPEN after 60s timeout |
| HALF_OPEN | Testing recovery | Allow 1 probe request | → CLOSED on success, → OPEN on failure |

Benefits:

  • Prevents cascade failures
  • Reduces load on failing services
  • Fast failure instead of hanging
  • Automatic recovery testing
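
A compact circuit-breaker sketch implementing the three states above; the 5-failure threshold and 60-second recovery timeout follow the table, while everything else (in-memory counters, a single probe in HALF_OPEN) is a simplification:

import time

class CircuitBreaker:
    """Three-state circuit breaker sketch: CLOSED -> OPEN -> HALF_OPEN -> CLOSED."""

    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 60.0):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failures = 0
        self.state = "CLOSED"
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state == "OPEN":
            if time.time() - self.opened_at < self.recovery_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.state = "HALF_OPEN"          # allow one probe request
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
                self.state, self.opened_at = "OPEN", time.time()
            raise
        self.failures = 0
        self.state = "CLOSED"                 # success closes the circuit
        return result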

Evaluation Framework

Comprehensive Metrics

graph LR
    A[Agent Performance] --> B[Success Metrics]
    A --> C[Cost Metrics]
    A --> D[Quality Metrics]
    A --> E[Safety Metrics]
    B --> F[Task Success Rate<br/>Target: > 95%]
    B --> G[First-Attempt Success<br/>Target: > 85%]
    C --> H[Avg Cost per Task<br/>Track: $0.10-$2.00]
    C --> I[Tool Call Efficiency<br/>Minimize: Redundant calls]
    D --> J[Output Quality Score<br/>LLM-graded: 0-1]
    D --> K[User Satisfaction<br/>CSAT/NPS]
    E --> L[Safety Violation Rate<br/>Target: 0%]
    E --> M[Audit Completeness<br/>Target: 100%]
    style F fill:#e8f5e9
    style H fill:#fff4e1
    style J fill:#e1f5ff
    style L fill:#ffe1e1

Evaluation Metrics Detailed

| Metric Category | Metric | Target | Measurement Method | Business Impact |
|---|---|---|---|---|
| Success | Task Completion Rate | > 95% | (Successful / Total) × 100 | Revenue, user satisfaction |
| Success | First-Attempt Success | > 85% | Successful without retry / Total | User experience, cost |
| Cost | Average Cost per Task | $0.10-$2.00 | (Total LLM cost + Tool cost) / Tasks | Profitability |
| Cost | Tool Call Efficiency | Minimize | Redundant calls / Total calls | Infrastructure cost |
| Quality | Output Quality Score | > 0.8 | LLM-graded rubric (0-1) | User satisfaction |
| Quality | Hallucination Rate | < 2% | Fact-checked claims / Total claims | Trust, liability |
| Speed | Latency (p95) | < 10s | 95th percentile response time | User experience |
| Speed | Throughput | > 100/min | Concurrent tasks handled | Scalability |
| Safety | Safety Violation Rate | 0% | Violations detected / Total | Legal, reputation |
| Safety | Audit Trail Completeness | 100% | Logged actions / Total actions | Compliance |
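
A small evaluation-harness sketch for three of the metrics above. The task and result shapes are assumptions: each task carries an input and a check function, and agent_run is assumed to return its output together with cost and latency:

from statistics import quantiles
from typing import Callable, List

def evaluate_agent(agent_run: Callable[[str], dict], tasks: List[dict]) -> dict:
    """Run an agent over a task suite and report completion rate, cost, and p95 latency.

    Each task is {"input": ..., "check": callable(output) -> bool}; agent_run is assumed
    to return {"output": ..., "cost_usd": float, "latency_s": float}.
    """
    successes, costs, latencies = 0, [], []
    for task in tasks:
        result = agent_run(task["input"])
        costs.append(result["cost_usd"])
        latencies.append(result["latency_s"])
        if task["check"](result["output"]):
            successes += 1
    return {
        "task_completion_rate": successes / len(tasks),
        "avg_cost_per_task": sum(costs) / len(costs),
        "latency_p95_s": quantiles(latencies, n=20)[-1],   # 95th percentile (needs >= 2 samples)
    }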

Case Study: Research Assistant Agent

Problem Statement

A research team conducted literature reviews manually, taking 3-5 days per topic. They needed:

  • Automated search across 10+ academic databases
  • PDF extraction and key claim identification
  • Citation management and deduplication
  • Draft summary generation with proper attribution

Solution Architecture

Pattern Choice: Plan-and-Execute (for structured, parallelizable steps)

Implementation:

from typing import List, Tuple

class ResearchAssistantAgent:
    """Plan-and-execute agent for literature reviews."""

    def __init__(self):
        self.planner = PlanAndExecuteAgent(...)
        self.memory = MemoryManager(...)
        self.tools = {
            "search_papers": self._search_papers,
            "fetch_pdf": self._fetch_pdf,
            "extract_claims": self._extract_claims,
            "deduplicate": self._deduplicate,
            "generate_summary": self._generate_summary
        }

    def conduct_literature_review(self, topic: str, databases: List[str],
                                   date_range: Tuple[str, str], max_papers: int = 50):
        """Execute complete literature review workflow."""
        # Plan generation
        plan = self.planner.create_plan(f"""
        Conduct literature review on: {topic}
        Search databases: {databases}
        Date range: {date_range}
        Maximum papers: {max_papers}

        Steps: Search → Dedupe → Fetch PDFs → Extract Claims → Summarize
        """)

        # Execute with caching
        results = self.planner.execute_plan(plan, cache_enabled=True)
        return results

Results

Before Agent:

  • Manual process: 3-5 days per review
  • Coverage: 10-15 papers per review
  • Citation quality: Inconsistent
  • Cost: $500-$800 in researcher time

After Agent (6 months):

  • Automated reviews: 2-4 hours
  • Coverage: 40-50 papers per review
  • Success rate: 86% (validated by researchers)
  • Cost: $15-$30 per review (LLM + APIs)
  • Citation quality: Consistent, proper formatting
  • ROI: 93% cost reduction

Key Optimizations:

| Optimization | Technique | Impact |
|---|---|---|
| Caching | Store search results & PDF extractions | -45% redundant API calls |
| Reranking | Lightweight model filters papers before expensive extraction | -18% costs |
| Memory Summarization | Compress old reviews into summaries | -60% context size |
| Schema Validation | Strict tool contracts | -92% error retry loops |

Implementation Checklist

Phase 1: Pattern Selection (Week 1)

  • Analyze use case requirements and complexity
  • Choose primary pattern (ReAct, Plan-Execute, Reflection, Multi-Agent)
  • Identify required tools and their risk levels
  • Define success criteria and evaluation metrics
  • Document expected task flows and edge cases

Phase 2: Core Implementation (Weeks 2-3)

  • Implement chosen agent pattern with basic tools
  • Define JSON schemas for all tool parameters
  • Build working memory system with session management
  • Add basic error handling and retries
  • Create initial test suite with 5-10 example tasks

Phase 3: Memory & Context (Week 4)

  • Implement episodic memory with vector storage
  • Add semantic memory for facts and entities
  • Build memory retrieval with recency + relevance scoring
  • Implement privacy controls (PII detection, redaction)
  • Add memory compaction and summarization

Phase 4: Safety & Governance (Week 5)

  • Implement tool authorization framework
  • Add parameter validation and constraints
  • Build output filtering and content safety
  • Create audit logging for all tool executions
  • Define and enforce rate limits
  • Set up approval workflows for sensitive tools

Phase 5: Reliability (Week 6)

  • Add retry logic with exponential backoff
  • Implement critique-and-improve loops
  • Build tool output validation
  • Add circuit breakers for failing tools
  • Implement graceful degradation strategies
  • Create monitoring dashboards

Phase 6: Evaluation & Optimization (Weeks 7-8)

  • Build comprehensive evaluation framework
  • Create task suite with gold standards
  • Measure success rate, cost, and latency
  • Conduct human evaluation for quality
  • Optimize prompts based on failure analysis
  • Implement caching for expensive operations
  • Profile and optimize slow paths

Phase 7: Production Readiness (Week 9)

  • Set up production monitoring and alerting
  • Implement A/B testing framework
  • Create runbooks for common issues
  • Build human-in-the-loop review queue
  • Document all tools and patterns
  • Train operations team
  • Conduct security review and penetration testing

Common Pitfalls & Solutions

| Pitfall | Impact | Solution | Prevention |
|---|---|---|---|
| Infinite loops | Agent cycles without progress, wastes cost | Set max iteration limits (5-10), detect repetition | Track unique states, halt on cycles |
| Tool hallucination | Agent invents non-existent tools | Strict tool allowlists, validate before execution | Provide tool list in every prompt |
| Context overflow | Exceeds token limits, fails | Memory summarization, sliding window | Monitor context size, compact proactively |
| Brittle plans | Agent can't adapt to failures | Use reflection pattern, add replanning capability | Test with failure injection |
| Excessive tool calls | High cost, slow execution | Cache results, batch operations, use lightweight models | Profile tool usage, set budgets |
| Poor error handling | Crashes on tool failures | Structured error taxonomy, fallback strategies | Classify errors as retriable/fatal |
| Missing audit trail | Can't debug or explain decisions | Log all steps with timestamps, inputs, outputs | Make logging mandatory |
| PII leakage | Privacy violations, compliance risk | PII detection, redaction before memory storage | Scan all inputs/outputs |
| Prompt injection | Security vulnerabilities | Input validation, sandboxing, content filtering | Treat all inputs as untrusted |
| Cost runaway | Budget overruns | Hard budget caps, cost monitoring, early stopping | Alert on anomalous spend |

Best Practices Summary

Pattern Selection:

  1. Start simple (ReAct or tool-calling), add complexity only when needed
  2. Prefer deterministic execution over creative reasoning when possible
  3. Use multi-agent only when task truly requires diverse expertise
  4. Prototype with cheap models, optimize later

Memory Design:

  5. Hybrid retrieval (0.7×relevance + 0.3×recency) works well for most cases
  6. Compact memories older than 30 days to reduce costs
  7. Separate working memory (cache) from long-term (vector DB)
  8. Always filter PII before storage

Safety:

  9. Layer security (input → authorization → execution → output)
  10. Log everything (audit trail is non-negotiable)
  11. Use circuit breakers to prevent cascade failures
  12. Test with adversarial inputs (prompt injection, jailbreaks)

Reliability:

  13. Retry only on transient errors (network, rate limits)
  14. Use exponential backoff to avoid overwhelming services
  15. Validate tool outputs before feeding to next step
  16. Implement timeouts for all external calls

Cost Optimization:

  17. Cache aggressively (search results, API calls, LLM outputs)
  18. Use smaller models for planning/routing
  19. Batch operations when possible
  20. Monitor per-task cost, set budgets

Further Reading

Frameworks:

  • LangChain (Python, TypeScript) - Most popular, broad ecosystem
  • LangGraph - Stateful multi-agent workflows
  • AutoGPT - Autonomous task execution
  • CrewAI - Role-based multi-agent teams

Academic Papers:

  • ReAct: Yao et al., 2022 - "ReAct: Synergizing Reasoning and Acting in Language Models"
  • Reflexion: Shinn et al., 2023 - "Reflexion: Language Agents with Verbal Reinforcement Learning"
  • Tree of Thoughts: Yao et al., 2023 - Deliberate problem-solving with LMs

Memory Systems:

  • MemGPT - Virtual context management
  • Mem0 - Persistent memory for agents
  • Conversation memory patterns in LangChain

Safety:

  • Constitutional AI (Anthropic)
  • Prompt injection defenses (Simon Willison's research)
  • Tool sandboxing best practices

Next Chapter Preview: Chapter 39 dives deep into task-oriented agents for specific domains (coding, research, operations), with practical tooling and evaluation methodologies.