Technical Guide

AI Agent Infrastructure: Choosing & Running Orchestration Platforms in Production

A practical guide to AI agent infrastructure for UK businesses — comparing orchestration platforms like CrewAI, LangGraph, AutoGen, and custom stacks. Covers hosting, scaling, monitoring, and the real costs of running autonomous AI agents in production.

Rod Hill·10 February 2026·12 min read


You've built an AI agent that works on your laptop. Congratulations — that was the easy part.

The hard part? Running it reliably, 24/7, handling real customer data, scaling when demand spikes, and doing it all without your monthly API bill requiring a second mortgage.

This guide covers the infrastructure decisions that separate a clever demo from a production system. We'll compare the major orchestration platforms, walk through hosting options, and give you the real numbers on what it costs to run AI agents at business scale in the UK.

The Agent Infrastructure Stack

Every production AI agent system has the same core layers, whether you're running a single assistant or a swarm of autonomous agents:

Layer 1: The Foundation Model (LLM)

Your agent's brain. The choice here cascades through everything else.

Current landscape (early 2026):

  • Claude (Anthropic) — Best for complex reasoning, tool use, and long-context tasks. Claude Opus 4 and Sonnet 4 have set new benchmarks for agentic reliability.
  • GPT-4o and o3 (OpenAI) — Strong all-rounder. Excellent tool calling, good vision capabilities.
  • Gemini 2.5 (Google) — Massive context windows (1M+ tokens). Good for document-heavy workflows.
  • DeepSeek V3 / R1 — Remarkable performance at a fraction of the cost. Strong for reasoning tasks where you can self-host.
  • Llama 3.3 and Mistral Large — Open-weight options for on-premise deployment.

The practical decision: Most businesses use a tiered approach. Complex orchestration tasks get Claude or GPT-4o. Simple classification and routing gets a smaller, cheaper model. Batch processing uses open-weight models to control costs.
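The tiered approach can be sketched as a simple router. Model names and per-token prices below are illustrative placeholders, not vendor pricing:

```python
# Illustrative tiers: model names and per-1k-token prices are placeholders,
# not real vendor pricing.
TIERS = {
    "complex": {"model": "frontier-model", "gbp_per_1k_tokens": 0.0025},
    "simple": {"model": "small-model", "gbp_per_1k_tokens": 0.0002},
    "batch": {"model": "open-weight-model", "gbp_per_1k_tokens": 0.00005},
}

def route(task_type: str, urgent: bool = True) -> str:
    """Send each task to the cheapest tier that can handle it."""
    if task_type in ("orchestration", "planning", "multi-step"):
        return TIERS["complex"]["model"]  # complex reasoning: frontier model
    if not urgent:
        return TIERS["batch"]["model"]    # non-urgent: open-weight batch lane
    return TIERS["simple"]["model"]       # classification/routing: small model
```

In practice the routing signal comes from a cheap classifier or from which workflow invoked the agent, but the shape is the same: default cheap, escalate deliberately.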

Layer 2: The Orchestration Framework

This is where agents get their structure — how they plan, use tools, collaborate, and maintain state.

Layer 3: Tool & API Integration

How your agents connect to the real world — databases, APIs, file systems, web browsers.

Layer 4: Infrastructure & Hosting

Where it all runs, how it scales, and how you keep it alive.

Layer 5: Observability & Control

Monitoring, logging, cost tracking, and the kill switch you hope you never need.

Comparing Orchestration Platforms

CrewAI

What it is: A framework for orchestrating role-based AI agents that collaborate like a team.

Best for: Multi-agent workflows where each agent has a clear role (researcher, writer, analyst, etc.)

Strengths:

  • Intuitive mental model — you define agents with roles, goals, and backstories
  • Built-in task delegation and collaboration patterns
  • Good for content pipelines, research workflows, and sequential processes
  • Active community and growing ecosystem

Weaknesses:

  • Less flexible for complex, dynamic routing
  • Can be opinionated about workflow structure
  • Performance overhead for simple single-agent tasks

Production readiness: Good for well-defined workflows. Less suited for highly dynamic agent interactions.

Typical use case: A content agency running a research → write → edit → SEO pipeline with different agents handling each stage.

LangGraph

What it is: A library for building stateful, multi-actor applications with LLMs, built on top of LangChain.

Best for: Complex workflows that need conditional routing, loops, and persistent state.

Strengths:

  • Graph-based architecture gives fine-grained control over agent flow
  • Excellent state management — checkpoint, resume, and branch conversations
  • Human-in-the-loop patterns built in
  • Strong typing and debugging tools
  • LangSmith integration for observability

Weaknesses:

  • Steeper learning curve than CrewAI
  • Tied to the LangChain ecosystem (which some developers find heavy)
  • Can be over-engineered for simple use cases

Production readiness: Strong. LangGraph Cloud provides managed hosting with persistence.

Typical use case: A financial services firm running a compliance review agent that routes documents through different specialist sub-agents based on content type, with human approval gates.
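The routing-with-approval-gates pattern can be hand-rolled in a few lines to show its shape. This is not the LangGraph API, just the underlying idea of nodes, conditional edges, and a human gate:

```python
# A minimal, hand-rolled version of the compliance-routing pattern.
# Each function plays the role of a graph node.
def classify(state: dict) -> dict:
    # Conditional edge: route by document content
    state["route"] = "aml" if "transaction" in state["document"].lower() else "general"
    return state

def aml_review(state: dict) -> dict:
    state["finding"] = "flagged for AML specialist review"
    state["needs_human"] = True   # human-in-the-loop gate
    return state

def general_review(state: dict) -> dict:
    state["finding"] = "no compliance issues found"
    state["needs_human"] = False
    return state

def run(state: dict) -> dict:
    state = classify(state)
    state = aml_review(state) if state["route"] == "aml" else general_review(state)
    state["status"] = "awaiting approval" if state["needs_human"] else "complete"
    return state
```

In LangGraph proper, each function becomes a node, the if/else becomes a conditional edge, and the approval gate becomes an interrupt backed by checkpointed state you can resume later.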

AutoGen (Microsoft)

What it is: A framework for building multi-agent conversational systems.

Best for: Agent-to-agent conversation patterns, research and analysis workflows.

Strengths:

  • Flexible conversation patterns between agents
  • Good integration with Microsoft ecosystem (Azure, Office 365)
  • Support for code execution environments
  • GroupChat pattern for multi-agent discussions

Weaknesses:

  • More research-oriented than production-focused
  • Less structured than CrewAI for business workflows
  • Documentation can lag behind rapid development

Production readiness: Improving, but still more experimental than LangGraph or CrewAI for business use.

Custom Stacks (Direct API + Your Code)

What it is: Building agent logic directly using LLM APIs, without a framework.

Best for: Simple, focused agents. Teams with strong engineering capabilities. Performance-critical applications.

Strengths:

  • Complete control over every aspect
  • No framework overhead or abstractions
  • Easier to debug — it's just your code
  • Can be significantly faster and cheaper

Weaknesses:

  • You build everything yourself — state management, tool routing, error handling
  • No community patterns to lean on
  • More maintenance burden

Production readiness: As good as your engineering team.

Typical use case: A SaaS company that needs a single, highly optimised customer support agent integrated into their existing Node.js backend.
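A custom stack at its smallest is a loop: call the model, execute any tool it asks for, return the answer. A sketch with a stubbed model, where `stub_llm` stands in for your provider's SDK call and the order-lookup tool is hypothetical:

```python
def stub_llm(messages: list[dict]) -> dict:
    """Stand-in for a real chat-completions call (replace with your provider's SDK)."""
    last = messages[-1]["content"].lower()
    if "order" in last:
        return {"tool": "lookup_order", "args": {"order_id": "A123"}}
    return {"answer": "Passing you to a human agent."}

# Hypothetical tool registry; in production these wrap your real backend.
TOOLS = {
    "lookup_order": lambda order_id: f"Order {order_id} has shipped.",
}

def agent(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]
    decision = stub_llm(messages)
    if "tool" in decision:
        # Execute the requested tool; a real loop would feed the result
        # back to the model for a final natural-language reply.
        return TOOLS[decision["tool"]](**decision["args"])
    return decision["answer"]
```

Everything a framework gives you (state, retries, tracing) bolts onto this loop, which is exactly why starting here teaches you what the frameworks are actually doing.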

The Decision Matrix

| Factor | CrewAI | LangGraph | AutoGen | Custom |
|---|---|---|---|---|
| Ease of start | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | ⭐ |
| Flexibility | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Production ready | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Multi-agent | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| Cost control | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Observability | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐ |

Our recommendation for most UK SMEs: Start with a custom stack for your first agent. It forces you to understand every piece. Graduate to LangGraph or CrewAI when you need multi-agent orchestration.

Tool Integration: MCP and Beyond

The Model Context Protocol (MCP) has fundamentally changed how agents connect to tools. Instead of building custom integrations for every API, you connect to MCP servers that expose standardised tool interfaces.

What this means practically:

  • Your agent can connect to Slack, Google Drive, GitHub, databases, and hundreds of other services through a single protocol
  • Tool discovery is automatic — the agent can see what tools are available and how to use them
  • You can swap underlying implementations without changing agent code
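MCP itself is a JSON-RPC protocol, but the core idea (tools behind a uniform, discoverable interface) can be sketched without the wire format:

```python
# Not the MCP wire protocol; a sketch of the idea behind it: tools expose
# a discoverable, uniform interface the agent can query at runtime.
class ToolServer:
    def __init__(self):
        self._tools = {}

    def register(self, name: str, description: str, fn):
        self._tools[name] = {"description": description, "fn": fn}

    def list_tools(self) -> dict:
        # "Tool discovery": the agent can ask what is available and what it does
        return {name: t["description"] for name, t in self._tools.items()}

    def call(self, name: str, **kwargs):
        return self._tools[name]["fn"](**kwargs)

server = ToolServer()
server.register(
    "search_files", "Search file names in a directory",
    lambda query: [f for f in ["report.pdf", "invoice.pdf"] if query in f],
)
```

Because the agent only sees `list_tools` and `call`, you can swap the implementation behind `search_files` without touching agent code, which is the property MCP standardises across vendors.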

The MCP ecosystem in 2026:

  • 500+ community MCP servers available
  • Major SaaS platforms shipping official MCP servers
  • Growing marketplace of commercial MCP servers for enterprise integrations

For UK businesses: MCP means your agents can integrate with Xero, Companies House, HMRC APIs, and UK-specific services without custom development. Check the MCP server registry before building any integration from scratch.

Hosting Options

Option 1: Serverless (AWS Lambda, Vercel, Cloudflare Workers)

Best for: Event-driven agents, webhook handlers, simple request-response patterns.

Pros: Zero infrastructure management. Pay per invocation. Scales automatically.
Cons: Cold starts (2-10 seconds). Execution time limits (15 min on Lambda). No persistent connections. Stateless by default.

Cost example: An agent handling 10,000 invocations/day with average 30s execution: ~£40-80/month on Lambda.
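Sanity-checking that figure, assuming Lambda's published on-demand pricing (roughly $0.0000167 per GB-second plus $0.20 per million requests), 512 MB of memory, and an exchange rate of £0.80 to the dollar:

```python
PER_GB_SECOND = 0.0000166667   # USD, on-demand compute (assumed pricing)
PER_MILLION_REQUESTS = 0.20    # USD (assumed pricing)
GBP_PER_USD = 0.80             # assumed exchange rate

invocations = 10_000 * 30              # per month
gb_seconds = invocations * 30 * 0.5    # 30s each at 512 MB
compute_usd = gb_seconds * PER_GB_SECOND
requests_usd = invocations / 1_000_000 * PER_MILLION_REQUESTS
monthly_gbp = (compute_usd + requests_usd) * GBP_PER_USD   # roughly £60
```

More memory or longer executions push this toward the top of the band quickly, which is why serverless suits short, bursty tasks rather than long-running agents.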

Option 2: Containers (ECS, Cloud Run, Railway, Fly.io)

Best for: Long-running agents, agents that need persistent connections, multi-step workflows.

Pros: Full control. Persistent connections. No cold starts. Can run background processes.
Cons: You manage scaling. Always-on cost even when idle.

Cost example: A single container running 24/7 on Railway: ~£15-25/month. On AWS ECS: ~£30-60/month.

Option 3: Managed Agent Platforms

Best for: Teams without DevOps expertise. Rapid deployment.

Pros: Built-in state management, tool hosting, monitoring. Minimal infrastructure knowledge needed.
Cons: Vendor lock-in. Less flexibility. Can be expensive at scale.

Cost example: LangGraph Cloud starts around $100/month for production workloads.

Option 4: Self-Hosted (Your Own Servers / VPS)

Best for: Data sovereignty requirements. Maximum control. Cost optimisation at scale.

Pros: Complete control. No vendor lock-in. Can run open-weight models locally. UK data residency guaranteed.
Cons: Full ops burden. Hardware management. Scaling is manual.

Cost example: A Hetzner dedicated server with GPU (RTX 4090): ~€200/month. Can run local models + orchestration.

The Hosting Decision

For most UK SMEs starting out:

  1. Start with containers on Railway or Fly.io — simple, affordable, good developer experience
  2. Move to AWS/GCP when you need enterprise compliance (ISO 27001, SOC 2)
  3. Add serverless for specific event-driven patterns alongside your main infrastructure
  4. Consider self-hosted only when API costs exceed £2,000/month and you have the ops capability

The Real Cost of Running AI Agents

Let's get specific. Here's what a mid-complexity agent system actually costs for a UK SME:

The Customer Service Agent

  • Function: Handles email inquiries, routes to human for complex cases
  • Volume: 500 inquiries/day
  • LLM cost: ~£150-300/month (Claude Sonnet for triage, Haiku for simple replies)
  • Infrastructure: ~£30/month (container hosting)
  • Monitoring: ~£20/month (LangSmith or equivalent)
  • Total: ~£200-350/month
  • Replaces: ~0.5 FTE of support staff time

The Research & Analysis Pipeline

  • Function: Monitors competitors, summarises industry news, generates reports
  • Volume: Daily reports, 50+ sources monitored
  • LLM cost: ~£80-150/month (mix of models)
  • Infrastructure: ~£20/month
  • Web scraping: ~£30/month (proxy/API costs)
  • Total: ~£130-200/month
  • Replaces: ~4-6 hours/week of analyst time

The Document Processing System

  • Function: Extracts data from invoices, contracts, forms
  • Volume: 200 documents/day
  • LLM cost: ~£200-400/month (vision models for complex layouts)
  • Infrastructure: ~£40/month
  • Storage: ~£10/month
  • Total: ~£250-450/month
  • Replaces: ~1 FTE of data entry

Cost Optimisation Strategies

  1. Model routing: Use cheap models for simple tasks, expensive models only when needed. A good router saves 40-60% on LLM costs.
  2. Caching: Cache common responses. Semantic caching (similar queries → cached results) can cut costs by 30%.
  3. Prompt optimisation: Shorter prompts = lower costs. A 50% prompt reduction halves your input token spend.
  4. Batch processing: Run non-urgent tasks in batches during off-peak hours. Some providers offer 50% discounts for batch API access.
  5. Open-weight models: Run Llama or Mistral locally for high-volume, lower-complexity tasks. After hardware costs, marginal cost per request approaches zero.
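Strategy 2 in its simplest form is an exact-match cache keyed on a normalised prompt. True semantic caching replaces the hash with an embedding lookup and a similarity threshold, but the structure is the same:

```python
import hashlib

_cache: dict[str, str] = {}
calls = {"llm": 0}

def expensive_llm(prompt: str) -> str:
    """Stand-in for a real, billable LLM call."""
    calls["llm"] += 1
    return f"answer to: {prompt}"

def cached_llm(prompt: str) -> str:
    # Exact-match caching on a normalised prompt. Semantic caching would key
    # on an embedding of the prompt plus a similarity threshold instead.
    key = hashlib.sha256(prompt.strip().lower().encode()).hexdigest()
    if key not in _cache:
        _cache[key] = expensive_llm(prompt)
    return _cache[key]
```

Even this crude version pays for itself on FAQ-style traffic, where a handful of phrasings account for most volume.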

Observability: Knowing What Your Agents Are Doing

Production agents need monitoring. Not optional. Here's the minimum:

What to Track

  • Latency: Per-step and end-to-end. Set alerts for >30s responses.
  • Token usage: Per agent, per task, per model. This is your biggest variable cost.
  • Success rate: Percentage of tasks completed without errors or human intervention.
  • Tool call patterns: Which tools are used most? Which fail most?
  • Cost per task: The metric that matters most for ROI tracking.

Tools

  • LangSmith / LangFuse: Purpose-built for LLM observability. Trace every step.
  • Helicone: LLM proxy that adds logging and analytics with minimal code changes.
  • OpenTelemetry: Open standard. Works with existing monitoring stacks (Grafana, Datadog).
  • Custom logging: At minimum, log every LLM call, tool call, and decision point to structured JSON.
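For the custom-logging floor, one JSON line per LLM call is enough to reconstruct costs and traces later. The field names here are our own convention, not a standard:

```python
import json
import time
import uuid

def log_llm_call(emit, *, agent, model, prompt_tokens, completion_tokens,
                 latency_ms, tool_calls=None, gbp_per_1k=0.002):
    """Emit one structured JSON line per LLM call.

    `emit` is any sink (print, a file write, a log shipper); the cost rate
    is an illustrative placeholder, not vendor pricing.
    """
    record = {
        "ts": time.time(),
        "trace_id": str(uuid.uuid4()),
        "agent": agent,
        "model": model,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "latency_ms": latency_ms,
        "tool_calls": tool_calls or [],
        "est_cost_gbp": round((prompt_tokens + completion_tokens) / 1000 * gbp_per_1k, 6),
    }
    emit(json.dumps(record))
    return record
```

Summing `est_cost_gbp` grouped by `agent` gives you the cost-per-task metric directly from your logs, with no extra tooling.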

The Dashboard You Need

Build or configure a dashboard that shows:

  1. Tasks processed today (vs. yesterday, vs. last week)
  2. Current error rate (with 5-minute rolling window)
  3. Total LLM spend today (projected monthly)
  4. Agent utilisation (% time active vs. idle)
  5. Human escalation rate (lower is better, to a point)

Security Considerations for UK Businesses

Data Residency

  • If processing personal data, understand where it flows. Most LLM APIs process data in the US.
  • For sensitive data: consider EU-region deployments (Azure UK South, AWS eu-west-2 London) with local models.
  • Document your data flows for GDPR compliance.

API Key Management

  • Never hardcode API keys. Use environment variables or secrets managers.
  • Rotate keys quarterly.
  • Implement per-agent key scoping — each agent should have minimal permissions.

Input/Output Filtering

  • Validate all inputs before they reach your agent.
  • Filter outputs for PII leakage, especially in customer-facing agents.
  • Implement content moderation for any agent that generates customer-facing text.
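A regex pass for obvious PII is a floor, not a ceiling (production systems should layer a proper detection service on top), but it catches careless leaks cheaply:

```python
import re

# Deliberately narrow patterns: email addresses and UK mobile numbers only.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
UK_MOBILE = re.compile(r"\b(?:\+44\s?7\d{3}|07\d{3})\s?\d{3}\s?\d{3}\b")

def redact_pii(text: str) -> str:
    """Redact obvious emails and UK mobile numbers from agent output."""
    text = EMAIL.sub("[EMAIL]", text)
    text = UK_MOBILE.sub("[PHONE]", text)
    return text
```

Run this on every outbound message from a customer-facing agent; anything it catches is a bug in your prompt or retrieval pipeline worth investigating.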

Audit Trails

  • Log every decision an agent makes, with reasoning.
  • This isn't just good practice — UK regulators increasingly expect AI decisions to be explainable.
  • Retain logs for a minimum of 12 months.

Getting Started: Your First Production Agent

Here's the path we recommend for UK SMEs:

Week 1: Define & Prototype

  • Identify one high-value, low-risk workflow to automate
  • Build a prototype using direct API calls (no framework)
  • Test with real data in a sandbox environment

Week 2: Harden

  • Add error handling and retry logic
  • Implement basic logging and monitoring
  • Set up cost alerts (hard limit at 2x expected spend)
  • Add human escalation paths
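Retry with exponential backoff is the single highest-value piece of hardening in this list. A minimal sketch, with `sleep` injectable so tests don't actually wait:

```python
import time

def call_with_retry(fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Retry transient failures with exponential backoff (1s, 2s, 4s, ...)."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except (TimeoutError, ConnectionError):
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to escalation paths
            sleep(base_delay * 2 ** attempt)

# Demo: a call that fails twice before succeeding.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("simulated transient failure")
    return "ok"

result = call_with_retry(flaky, sleep=lambda s: None)
```

Only retry errors you know are transient (timeouts, rate limits, dropped connections); retrying a malformed request just burns tokens four times.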

Week 3: Deploy

  • Containerise your agent
  • Deploy to Railway or Fly.io
  • Set up health checks and auto-restart
  • Run in shadow mode (agent runs alongside human, results compared)
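Shadow mode can be as simple as a wrapper that runs the agent on each real inquiry, logs whether it agreed with the human, and still sends the human's reply. Exact-match agreement is crude; in practice you would score similarity or hand-review the disagreements:

```python
def shadow_compare(inquiry: str, agent_fn, human_reply: str, log: list) -> str:
    """Run the agent alongside the human; the human reply is still what ships."""
    agent_reply = agent_fn(inquiry)
    agreed = agent_reply.strip().lower() == human_reply.strip().lower()
    log.append({"inquiry": inquiry, "agent": agent_reply,
                "human": human_reply, "agreed": agreed})
    return human_reply  # customer always gets the human's answer in shadow mode

shadow_log = []
reply = shadow_compare("Where is my order?",
                       lambda q: "It shipped yesterday.",
                       "It shipped yesterday.", shadow_log)
```

The agreement rate over a week or two of shadow traffic is the number that tells you whether the agent is ready for Week 4.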

Week 4: Go Live

  • Route real traffic to your agent
  • Monitor closely for first 48 hours
  • Adjust thresholds and routing rules based on real performance
  • Document everything for your team

What's Next

The infrastructure landscape is moving fast. Three trends to watch:

  1. Agent-native hosting platforms — Purpose-built platforms for running AI agents are emerging, with built-in state management, tool hosting, and multi-tenant isolation.
  2. Hybrid local-cloud architectures — Running sensitive processing on local hardware with cloud LLMs for complex reasoning. Best of both worlds for data sovereignty.
  3. Agent marketplaces — Pre-built agents you can deploy and customise, rather than building from scratch. Think WordPress plugins, but for AI agents.

The businesses that build this infrastructure now won't just have better AI — they'll have a platform that compounds. Every new agent you deploy benefits from the tools, monitoring, and patterns you've already built.

Start with one agent. Get the infrastructure right. Then scale.


Need help designing your agent infrastructure? Get in touch — we help UK businesses build production AI systems that actually work.

Tags

AI agents · orchestration · infrastructure · CrewAI · LangGraph · AutoGen · production AI · AI platforms · UK business · agent deployment · MLOps · LLMOps

Rod Hill

The Caversham Digital team brings 20+ years of hands-on experience across AI implementation, technology strategy, process automation, and digital transformation for UK businesses.

