Context Windows and AI Memory: Why Long-Running Agents Need Better Recall for Business
AI agents forget. Context windows fill up, conversations reset, and institutional knowledge vanishes. Here's how UK businesses are solving the AI memory problem to build agents that actually learn and improve over time.
Here's an uncomfortable truth about AI agents in 2026: they're brilliant in the moment and terrible over time.
An AI agent can analyse a complex contract, write a nuanced strategy document, or debug tricky code — all in a single conversation. But ask it tomorrow what it did yesterday? Blank. Ask it to remember the preferences your client expressed last month? Gone. Ask it to learn from the mistake it made on Tuesday? What mistake?
This is the AI memory problem, and it's the biggest gap between where AI agents are today and where businesses need them to be.
The Context Window: AI's Working Memory
Every AI model has a context window — the amount of text it can "hold in mind" at once. Think of it as working memory:
- GPT-4o: 128,000 tokens (~96,000 words)
- Claude 3.5: 200,000 tokens (~150,000 words)
- Gemini 1.5 Pro: 2,000,000 tokens (~1.5 million words)
These numbers sound huge — and for single-session tasks, they are. You can paste an entire codebase into Claude and ask it to refactor. You can give Gemini a full book and ask for analysis.
But context windows have three critical limitations for business use:
1. They Reset Between Sessions
Every new conversation starts from zero. The agent has no memory of previous interactions unless you explicitly provide them. This means:
- A support agent re-learns customer preferences every single time
- A sales agent forgets deal context between calls
- A reporting agent can't track trends across months
2. They Degrade With Length
Even within a single long session, performance drops as the context fills up. Models attend less reliably to information buried early or in the middle of a long context (the "lost in the middle" effect), and once the window overflows, the earliest content is truncated entirely. Studies show accuracy on early-context information can drop by 20-40% in very long conversations.
3. They're Expensive to Fill
Sending 200K tokens costs real money — roughly £2-3 per request with top-tier models. If your agent needs historical context, replaying entire conversation histories becomes prohibitively expensive at scale.
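The arithmetic behind that cost is worth making explicit. A minimal sketch, where the per-million-token rate is an illustrative assumption rather than any provider's actual price:

```python
# Rough cost of replaying a full conversation history on every request.
# The rate below is an illustrative assumption, not a real price quote.
def replay_cost_gbp(history_tokens: int, rate_gbp_per_million: float = 12.0) -> float:
    """Cost of sending `history_tokens` of input at the assumed input rate."""
    return history_tokens / 1_000_000 * rate_gbp_per_million

per_request = replay_cost_gbp(200_000)  # ≈ £2.40 at the assumed rate
per_month = per_request * 1_000         # 1,000 requests/month → ≈ £2,400
```

At even modest volumes, replaying history dwarfs the cost of storing and loading a compact summary instead.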
The Memory Architecture Stack
Smart businesses are solving this with a layered memory architecture. Think of it like human memory:
Layer 1: Working Memory (Context Window)
- What it is: The current conversation context.
- Analogy: What you're thinking about right now.
- Use: Immediate task execution, real-time reasoning.
- Capacity: 128K-2M tokens depending on model.
Layer 2: Short-Term Memory (Session Summaries)
- What it is: Compressed summaries of recent interactions, stored between sessions.
- Analogy: What you did this morning — you remember the gist, not every word.
- Use: Continuity between conversations. "Last time we discussed X, you asked me to Y."
- Implementation: After each session, the agent generates a structured summary. Next session, it loads the summary instead of the full transcript.
Layer 3: Long-Term Memory (Vector Databases)
- What it is: Searchable embeddings of all past interactions, documents, and decisions.
- Analogy: Your ability to recall "I've seen something like this before" and retrieve the relevant memory.
- Use: Pattern recognition, preference learning, institutional knowledge.
- Implementation: Store interaction summaries, decisions, and outcomes as vector embeddings. When relevant context is needed, search for semantically similar past events.
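The retrieval logic can be shown end to end with a toy embedding. In production the vectors come from an embedding model and live in Pinecone, Weaviate, or pgvector; here simple bag-of-words vectors stand in so the similarity search itself is visible:

```python
import math
from collections import Counter

# Toy embedding: word counts stand in for a real embedding model's vectors.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Stored memories: interaction summaries, decisions, outcomes.
memories = [
    "Client prefers executive summary before financials",
    "Resolved API timeout by raising rate limits in October",
]
index = [(m, embed(m)) for m in memories]

def search_long_term(query: str, top_k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda mv: cosine(q, mv[1]), reverse=True)
    return [m for m, _ in ranked][:top_k]

# A new ticket about rate limits surfaces the semantically related memory:
hit = search_long_term("API rate limits issue")
```

A real vector database does exactly this ranking, just with learned embeddings and approximate nearest-neighbour search at scale.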
Layer 4: Procedural Memory (Learned Workflows)
- What it is: Documented processes the agent has learned from experience.
- Analogy: Skills you've practiced until they're automatic — like driving or typing.
- Use: The agent learns that "when Client X requests a report, they always want the executive summary first, financials second, and appendices as a separate PDF."
- Implementation: Explicit rules and workflows extracted from successful past interactions, stored as structured instructions.
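A minimal sketch of procedural memory, assuming learned rules are keyed by client and task and injected into the agent's instructions at startup. The keys and rule text mirror the Client X example above and are illustrative:

```python
# Learned workflow rules stored as structured instructions, keyed by
# (client, task). In practice these might live in markdown files or a
# knowledge base the agent reads at startup; names here are illustrative.
PROCEDURES = {
    ("client_x", "report"): [
        "Executive summary first",
        "Financials second",
        "Appendices as a separate PDF",
    ],
}

def build_instructions(client: str, task: str) -> str:
    # Turn stored rules into a prompt fragment; empty if nothing learned yet.
    steps = PROCEDURES.get((client, task), [])
    if not steps:
        return ""
    return "Learned workflow:\n" + "\n".join(f"- {s}" for s in steps)

prompt_fragment = build_instructions("client_x", "report")
```

Because the rules are plain data, they can be reviewed, corrected, and versioned like any other business process documentation.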
Building Business Memory: A Practical Approach
Step 1: Define What's Worth Remembering
Not everything should be stored. Define memory categories:
Always remember:
- Client preferences and communication styles
- Decisions made and their rationale
- Errors and how they were resolved
- Workflow optimisations discovered
- Compliance requirements encountered
Summarise and store:
- Meeting notes and action items
- Research findings and sources
- Project status snapshots
- Performance metrics over time
Don't store:
- Routine small talk
- Temporary debugging context
- Draft iterations (store only finals)
- Sensitive data that should expire
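These categories can be enforced in code as a simple retention policy. A sketch, with the category keywords as illustrative stand-ins for whatever classification your agent produces:

```python
# Retention policy mirroring the categories above: remember, summarise,
# or discard. The category labels are illustrative.
REMEMBER = {"preference", "decision", "error_resolution", "optimisation", "compliance"}
SUMMARISE = {"meeting_notes", "research", "status_snapshot", "metric"}

def retention(category: str) -> str:
    if category in REMEMBER:
        return "remember"
    if category in SUMMARISE:
        return "summarise"
    return "discard"  # small talk, debugging context, draft iterations
```

Routing every candidate memory through a policy function like this keeps the store small and makes retention rules auditable.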
Step 2: Choose Your Storage Stack
For most UK SMEs, the practical stack is:
- Summaries: Simple key-value store (Redis, or even a JSON file for small-scale)
- Long-term memory: Vector database (Pinecone, Weaviate, or pgvector if you're already on PostgreSQL)
- Procedures: Markdown files or a structured knowledge base that the agent reads at startup
- Audit trail: Append-only log in your existing database
Total infrastructure cost: £20-100/month for most small to mid-sized deployments.
Step 3: Implement Memory Retrieval
The agent needs to know when and what to remember. This is where retrieval-augmented generation (RAG) comes in:
- At session start: Load the most recent summary for this client/project/workflow
- On demand: When the agent encounters something it should know about, search long-term memory for relevant context
- Proactively: Before generating a response, check if there are stored preferences or past decisions that should inform the output
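The three retrieval moments above can be combined into a single context-builder. A self-contained sketch, with keyword overlap standing in for a real vector-store query:

```python
# Assemble the prompt context at session start: recent summary first,
# then any semantically relevant long-term memories. `search_memory` is
# a stand-in for a real vector-store query.
def search_memory(query: str, store: list[str]) -> list[str]:
    terms = set(query.lower().split())
    return [m for m in store if terms & set(m.lower().split())]

def build_context(last_summary: str, query: str, long_term: list[str]) -> str:
    parts = [f"Previous session: {last_summary}"]
    recalled = search_memory(query, long_term)
    if recalled:
        parts.append("Relevant history: " + "; ".join(recalled))
    return "\n".join(parts)

ctx = build_context(
    "Discussed Q3 report; client asked for a revised draft",
    "rate limits",
    ["October ticket fixed by raising API rate limits"],
)
```

The agent then reasons over `ctx` plus the live conversation, rather than a replayed transcript.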
Step 4: Memory Maintenance
Like any knowledge system, AI memory needs maintenance:
- Compaction: Periodically summarise and compress older memories
- Correction: When the agent makes a mistake, update the relevant memory with the correction
- Expiry: Set TTLs on time-sensitive information (project deadlines, temporary preferences)
- Review: Periodically audit what the agent "knows" about key clients and processes
Real-World Impact: Before and After
Customer Support Agent
Before memory architecture:
- Every ticket starts fresh
- Customer repeats their setup, preferences, and history
- Agent suggests solutions already tried and rejected
- Average resolution: 4 interactions
After memory architecture:
- Agent recalls previous tickets, solutions tried, and customer preferences
- "I can see we resolved a similar issue in October by adjusting your API rate limits. Shall I check if that applies here?"
- Average resolution: 1.7 interactions
- Customer satisfaction: +34%
Business Development Agent
Before:
- Generic outreach templates
- No memory of previous conversations with prospects
- Follows up on deals already closed or lost
After:
- Remembers conversation history, objections raised, and interests expressed
- Tailors follow-ups based on past interactions
- Flags when it's been too long since last contact
- Knows that "this prospect always responds better to case studies than pricing sheets"
The Compliance Dimension
For UK businesses, AI memory creates specific regulatory obligations:
UK GDPR:
- AI memories containing personal data are subject to data subject rights
- Customers can request to see what the AI "remembers" about them
- Right to erasure applies — you need a way to delete specific memories
- Data retention policies must cover AI memory stores
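Both the access and erasure rights become tractable if every stored memory carries a data-subject identifier. A sketch, with record fields as illustrative assumptions:

```python
# Every memory is tagged with the data subject it concerns, so subject
# access requests and erasure requests can be served directly.
# Field names are illustrative.
memories = [
    {"subject": "alice@example.com", "fact": "prefers phone calls"},
    {"subject": "bob@example.com", "fact": "renewal due in May"},
]

def export_for(subject: str, store: list[dict]) -> list[str]:
    # Subject access request: what does the AI "remember" about this person?
    return [m["fact"] for m in store if m["subject"] == subject]

def erase(subject: str, store: list[dict]) -> list[dict]:
    # Right to erasure: drop every memory tied to the subject.
    return [m for m in store if m["subject"] != subject]

after_erasure = erase("alice@example.com", memories)
```

Without a subject key on every record, honouring an erasure request means an expensive search across free-text memories.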
Financial Services:
- FCA expects auditable records of AI-assisted decisions
- Memory systems provide this naturally — but only if logging is comprehensive
- Model your memory retention around existing regulatory requirements
Professional Services:
- Client confidentiality extends to AI memory
- Memories from Client A must never leak into responses for Client B
- Implement strict memory partitioning by client/matter
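Partitioning is simplest when the client/matter key is part of every read path, never optional. A sketch; in a real vector store this becomes a mandatory metadata filter on every query:

```python
# Memory reads are always scoped by a (client, matter) partition key, so
# Client A's memories structurally cannot surface for Client B.
# Keys and contents are illustrative.
store = {
    ("client_a", "matter_1"): ["prefers fixed-fee billing"],
    ("client_b", "matter_7"): ["dispute ongoing with supplier"],
}

def recall(client: str, matter: str) -> list[str]:
    # Only the caller's own partition is ever visible.
    return store.get((client, matter), [])

a_view = recall("client_a", "matter_1")
leak_check = recall("client_a", "matter_7")  # empty: no cross-client access
```

Making the partition key a required function argument, rather than a filter the caller may forget, turns confidentiality into a structural property instead of a convention.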
What's Coming: Memory-Native Models
The next generation of AI models will have built-in memory capabilities:
- Persistent memory APIs — models that maintain state between sessions natively
- Memory-aware fine-tuning — models that learn from your specific interactions over time
- Collaborative memory — multiple agents sharing a common memory pool (with access controls)
- Forgetting mechanisms — models that can be instructed to genuinely forget specific information, not just "pretend" to
These features are already in preview with several major AI providers. By late 2026, memory management will be a platform feature rather than something you build yourself.
Getting Started Today
- Pick one agent — your most active AI-assisted workflow
- Add session summaries — after each interaction, store a structured summary
- Load summaries at startup — give the agent its recent history at the beginning of each session
- Measure the difference — track resolution time, accuracy, and user satisfaction before and after
- Expand gradually — add vector search, procedural memory, and cross-agent memory as you prove value
The businesses that solve AI memory first will have agents that compound in value over time — agents that know your clients, understand your processes, and improve with every interaction. Everyone else will keep starting from scratch.
Building AI agents that remember? Talk to us about memory architecture and long-running agent design for your business.
