Technical Guide

AI Agent Infrastructure: Choosing & Running Orchestration Platforms in Production

A practical guide to AI agent infrastructure for UK businesses — comparing orchestration platforms like CrewAI, LangGraph, AutoGen, and custom stacks. Covers hosting, scaling, monitoring, and the real costs of running autonomous AI agents in production.

Rod Hill·10 February 2026·12 min read


You've built an AI agent that works on your laptop. Congratulations — that was the easy part.

The hard part? Running it reliably, 24/7, handling real customer data, scaling when demand spikes, and doing it all without your monthly API bill requiring a second mortgage.

This guide covers the infrastructure decisions that separate a clever demo from a production system. We'll compare the major orchestration platforms, walk through hosting options, and give you the real numbers on what it costs to run AI agents at business scale in the UK.

The Agent Infrastructure Stack

Every production AI agent system has the same core layers, whether you're running a single assistant or a swarm of autonomous agents:

Layer 1: The Foundation Model (LLM)

Your agent's brain. The choice here cascades through everything else.

Current landscape (early 2026):

  • Claude (Anthropic) — Best for complex reasoning, tool use, and long-context tasks. Claude Opus 4 and Sonnet 4 have set new benchmarks for agentic reliability.
  • GPT-4o and o3 (OpenAI) — Strong all-rounder. Excellent tool calling, good vision capabilities.
  • Gemini 2.5 (Google) — Massive context windows (1M+ tokens). Good for document-heavy workflows.
  • DeepSeek V3 / R1 — Remarkable performance at a fraction of the cost. Strong for reasoning tasks where you can self-host.
  • Llama 3.3 and Mistral Large — Open-weight options for on-premise deployment.

The practical decision: Most businesses use a tiered approach. Complex orchestration tasks get Claude or GPT-4o. Simple classification and routing gets a smaller, cheaper model. Batch processing uses open-weight models to control costs.
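The tiered approach can be sketched as a simple router. Model names and per-token prices below are illustrative placeholders, not vendor pricing:

```python
# Illustrative tiers: model names and per-1k-token prices are placeholders,
# not real vendor pricing.
TIERS = {
    "complex": {"model": "frontier-model", "gbp_per_1k_tokens": 0.0025},
    "simple": {"model": "small-model", "gbp_per_1k_tokens": 0.0002},
    "batch": {"model": "open-weight-model", "gbp_per_1k_tokens": 0.00005},
}

def route(task_type: str, urgent: bool = True) -> str:
    """Send each task to the cheapest tier that can handle it."""
    if task_type in ("orchestration", "planning", "multi-step"):
        return TIERS["complex"]["model"]  # complex reasoning: frontier model
    if not urgent:
        return TIERS["batch"]["model"]    # non-urgent: open-weight batch lane
    return TIERS["simple"]["model"]       # classification/routing: small model
```

In practice the routing signal comes from a cheap classifier or from which workflow invoked the agent, but the shape is the same: default cheap, escalate deliberately.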

Layer 2: The Orchestration Framework

This is where agents get their structure — how they plan, use tools, collaborate, and maintain state.

Layer 3: Tool & API Integration

How your agents connect to the real world — databases, APIs, file systems, web browsers.

Layer 4: Infrastructure & Hosting

Where it all runs, how it scales, and how you keep it alive.

Layer 5: Observability & Control

Monitoring, logging, cost tracking, and the kill switch you hope you never need.

Comparing Orchestration Platforms

CrewAI

What it is: A framework for orchestrating role-based AI agents that collaborate like a team.

Best for: Multi-agent workflows where each agent has a clear role (researcher, writer, analyst, etc.)

Strengths:

  • Intuitive mental model — you define agents with roles, goals, and backstories
  • Built-in task delegation and collaboration patterns
  • Good for content pipelines, research workflows, and sequential processes
  • Active community and growing ecosystem

Weaknesses:

  • Less flexible for complex, dynamic routing
  • Can be opinionated about workflow structure
  • Performance overhead for simple single-agent tasks

Production readiness: Good for well-defined workflows. Less suited for highly dynamic agent interactions.

Typical use case: A content agency running a research → write → edit → SEO pipeline with different agents handling each stage.

LangGraph

What it is: A library for building stateful, multi-actor applications with LLMs, built on top of LangChain.

Best for: Complex workflows that need conditional routing, loops, and persistent state.

Strengths:

  • Graph-based architecture gives fine-grained control over agent flow
  • Excellent state management — checkpoint, resume, and branch conversations
  • Human-in-the-loop patterns built in
  • Strong typing and debugging tools
  • LangSmith integration for observability

Weaknesses:

  • Steeper learning curve than CrewAI
  • Tied to the LangChain ecosystem (which some developers find heavy)
  • Can be over-engineered for simple use cases

Production readiness: Strong. LangGraph Cloud provides managed hosting with persistence.

Typical use case: A financial services firm running a compliance review agent that routes documents through different specialist sub-agents based on content type, with human approval gates.
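The routing-with-approval-gates pattern can be hand-rolled in a few lines to show its shape. This is not the LangGraph API, just the underlying idea of nodes, conditional edges, and a human gate:

```python
# A minimal, hand-rolled version of the compliance-routing pattern.
# Each function plays the role of a graph node.
def classify(state: dict) -> dict:
    # Conditional edge: route by document content
    state["route"] = "aml" if "transaction" in state["document"].lower() else "general"
    return state

def aml_review(state: dict) -> dict:
    state["finding"] = "flagged for AML specialist review"
    state["needs_human"] = True   # human-in-the-loop gate
    return state

def general_review(state: dict) -> dict:
    state["finding"] = "no compliance issues found"
    state["needs_human"] = False
    return state

def run(state: dict) -> dict:
    state = classify(state)
    state = aml_review(state) if state["route"] == "aml" else general_review(state)
    state["status"] = "awaiting approval" if state["needs_human"] else "complete"
    return state
```

In LangGraph proper, each function becomes a node, the if/else becomes a conditional edge, and the approval gate becomes an interrupt backed by checkpointed state you can resume later.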

AutoGen (Microsoft)

What it is: A framework for building multi-agent conversational systems.

Best for: Agent-to-agent conversation patterns, research and analysis workflows.

Strengths:

  • Flexible conversation patterns between agents
  • Good integration with Microsoft ecosystem (Azure, Office 365)
  • Support for code execution environments
  • GroupChat pattern for multi-agent discussions

Weaknesses:

  • More research-oriented than production-focused
  • Less structured than CrewAI for business workflows
  • Documentation can lag behind rapid development

Production readiness: Improving, but still more experimental than LangGraph or CrewAI for business use.

Custom Stacks (Direct API + Your Code)

What it is: Building agent logic directly using LLM APIs, without a framework.

Best for: Simple, focused agents. Teams with strong engineering capabilities. Performance-critical applications.

Strengths:

  • Complete control over every aspect
  • No framework overhead or abstractions
  • Easier to debug — it's just your code
  • Can be significantly faster and cheaper

Weaknesses:

  • You build everything yourself — state management, tool routing, error handling
  • No community patterns to lean on
  • More maintenance burden

Production readiness: As good as your engineering team.

Typical use case: A SaaS company that needs a single, highly optimised customer support agent integrated into their existing Node.js backend.
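A custom stack at its smallest is a loop: call the model, execute any tool it asks for, return the answer. A sketch with a stubbed model, where `stub_llm` stands in for your provider's SDK call and the order-lookup tool is hypothetical:

```python
def stub_llm(messages: list[dict]) -> dict:
    """Stand-in for a real chat-completions call (replace with your provider's SDK)."""
    last = messages[-1]["content"].lower()
    if "order" in last:
        return {"tool": "lookup_order", "args": {"order_id": "A123"}}
    return {"answer": "Passing you to a human agent."}

# Hypothetical tool registry; in production these wrap your real backend.
TOOLS = {
    "lookup_order": lambda order_id: f"Order {order_id} has shipped.",
}

def agent(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]
    decision = stub_llm(messages)
    if "tool" in decision:
        # Execute the requested tool; a real loop would feed the result
        # back to the model for a final natural-language reply.
        return TOOLS[decision["tool"]](**decision["args"])
    return decision["answer"]
```

Everything a framework gives you (state, retries, tracing) bolts onto this loop, which is exactly why starting here teaches you what the frameworks are actually doing.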

The Decision Matrix

| Factor | CrewAI | LangGraph | AutoGen | Custom |
|---|---|---|---|---|
| Ease of start | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | ⭐ |
| Flexibility | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Production ready | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Multi-agent | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| Cost control | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Observability | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐ |

Our recommendation for most UK SMEs: Start with a custom stack for your first agent. It forces you to understand every piece. Graduate to LangGraph or CrewAI when you need multi-agent orchestration.

Tool Integration: MCP and Beyond

The Model Context Protocol (MCP) has fundamentally changed how agents connect to tools. Instead of building custom integrations for every API, you connect to MCP servers that expose standardised tool interfaces.

What this means practically:

  • Your agent can connect to Slack, Google Drive, GitHub, databases, and hundreds of other services through a single protocol
  • Tool discovery is automatic — the agent can see what tools are available and how to use them
  • You can swap underlying implementations without changing agent code
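MCP itself is a JSON-RPC protocol, but the core idea (tools behind a uniform, discoverable interface) can be sketched without the wire format:

```python
# Not the MCP wire protocol; a sketch of the idea behind it: tools expose
# a discoverable, uniform interface the agent can query at runtime.
class ToolServer:
    def __init__(self):
        self._tools = {}

    def register(self, name: str, description: str, fn):
        self._tools[name] = {"description": description, "fn": fn}

    def list_tools(self) -> dict:
        # "Tool discovery": the agent can ask what is available and what it does
        return {name: t["description"] for name, t in self._tools.items()}

    def call(self, name: str, **kwargs):
        return self._tools[name]["fn"](**kwargs)

server = ToolServer()
server.register(
    "search_files", "Search file names in a directory",
    lambda query: [f for f in ["report.pdf", "invoice.pdf"] if query in f],
)
```

Because the agent only sees `list_tools` and `call`, you can swap the implementation behind `search_files` without touching agent code, which is the property MCP standardises across vendors.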

The MCP ecosystem in 2026:

  • 500+ community MCP servers available
  • Major SaaS platforms shipping official MCP servers
  • Growing marketplace of commercial MCP servers for enterprise integrations

For UK businesses: MCP means your agents can integrate with Xero, Companies House, HMRC APIs, and UK-specific services without custom development. Check the MCP server registry before building any integration from scratch.

Hosting Options

Option 1: Serverless (AWS Lambda, Vercel, Cloudflare Workers)

Best for: Event-driven agents, webhook handlers, simple request-response patterns.

Pros: Zero infrastructure management. Pay per invocation. Scales automatically.
Cons: Cold starts (2-10 seconds). Execution time limits (15 min on Lambda). No persistent connections. Stateless by default.

Cost example: An agent handling 10,000 invocations/day with average 30s execution: ~£40-80/month on Lambda.
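Sanity-checking that figure, assuming Lambda's published on-demand pricing (roughly $0.0000167 per GB-second plus $0.20 per million requests), 512 MB of memory, and an exchange rate of £0.80 to the dollar:

```python
PER_GB_SECOND = 0.0000166667   # USD, on-demand compute (assumed pricing)
PER_MILLION_REQUESTS = 0.20    # USD (assumed pricing)
GBP_PER_USD = 0.80             # assumed exchange rate

invocations = 10_000 * 30              # per month
gb_seconds = invocations * 30 * 0.5    # 30s each at 512 MB
compute_usd = gb_seconds * PER_GB_SECOND
requests_usd = invocations / 1_000_000 * PER_MILLION_REQUESTS
monthly_gbp = (compute_usd + requests_usd) * GBP_PER_USD   # roughly £60
```

More memory or longer executions push this toward the top of the band quickly, which is why serverless suits short, bursty tasks rather than long-running agents.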

Option 2: Containers (ECS, Cloud Run, Railway, Fly.io)

Best for: Long-running agents, agents that need persistent connections, multi-step workflows.

Pros: Full control. Persistent connections. No cold starts. Can run background processes.
Cons: You manage scaling. Always-on cost even when idle.

Cost example: A single container running 24/7 on Railway: ~£15-25/month. On AWS ECS: ~£30-60/month.

Option 3: Managed Agent Platforms

Best for: Teams without DevOps expertise. Rapid deployment.

Pros: Built-in state management, tool hosting, monitoring. Minimal infrastructure knowledge needed.
Cons: Vendor lock-in. Less flexibility. Can be expensive at scale.

Cost example: LangGraph Cloud starts around $100/month for production workloads.

Option 4: Self-Hosted (Your Own Servers / VPS)

Best for: Data sovereignty requirements. Maximum control. Cost optimisation at scale.

Pros: Complete control. No vendor lock-in. Can run open-weight models locally. UK data residency guaranteed.
Cons: Full ops burden. Hardware management. Scaling is manual.

Cost example: A Hetzner dedicated server with GPU (RTX 4090): ~€200/month. Can run local models + orchestration.

The Hosting Decision

For most UK SMEs starting out:

  1. Start with containers on Railway or Fly.io — simple, affordable, good developer experience
  2. Move to AWS/GCP when you need enterprise compliance (ISO 27001, SOC 2)
  3. Add serverless for specific event-driven patterns alongside your main infrastructure
  4. Consider self-hosted only when API costs exceed £2,000/month and you have the ops capability

The Real Cost of Running AI Agents

Let's get specific. Here's what a mid-complexity agent system actually costs for a UK SME:

The Customer Service Agent

  • Function: Handles email inquiries, routes to human for complex cases
  • Volume: 500 inquiries/day
  • LLM cost: ~£150-300/month (Claude Sonnet for triage, Haiku for simple replies)
  • Infrastructure: ~£30/month (container hosting)
  • Monitoring: ~£20/month (LangSmith or equivalent)
  • Total: ~£200-350/month
  • Replaces: ~0.5 FTE of support staff time

The Research & Analysis Pipeline

  • Function: Monitors competitors, summarises industry news, generates reports
  • Volume: Daily reports, 50+ sources monitored
  • LLM cost: ~£80-150/month (mix of models)
  • Infrastructure: ~£20/month
  • Web scraping: ~£30/month (proxy/API costs)
  • Total: ~£130-200/month
  • Replaces: ~4-6 hours/week of analyst time

The Document Processing System

  • Function: Extracts data from invoices, contracts, forms
  • Volume: 200 documents/day
  • LLM cost: ~£200-400/month (vision models for complex layouts)
  • Infrastructure: ~£40/month
  • Storage: ~£10/month
  • Total: ~£250-450/month
  • Replaces: ~1 FTE of data entry

Cost Optimisation Strategies

  1. Model routing: Use cheap models for simple tasks, expensive models only when needed. A good router saves 40-60% on LLM costs.
  2. Caching: Cache common responses. Semantic caching (similar queries → cached results) can cut costs by 30%.
  3. Prompt optimisation: Shorter prompts = lower costs. A 50% prompt reduction halves your input token spend.
  4. Batch processing: Run non-urgent tasks in batches during off-peak hours. Some providers offer 50% discounts for batch API access.
  5. Open-weight models: Run Llama or Mistral locally for high-volume, lower-complexity tasks. After hardware costs, marginal cost per request approaches zero.
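Strategy 2 in its simplest form is an exact-match cache keyed on a normalised prompt. True semantic caching replaces the hash with an embedding lookup and a similarity threshold, but the structure is the same:

```python
import hashlib

_cache: dict[str, str] = {}
calls = {"llm": 0}

def expensive_llm(prompt: str) -> str:
    """Stand-in for a real, billable LLM call."""
    calls["llm"] += 1
    return f"answer to: {prompt}"

def cached_llm(prompt: str) -> str:
    # Exact-match caching on a normalised prompt. Semantic caching would key
    # on an embedding of the prompt plus a similarity threshold instead.
    key = hashlib.sha256(prompt.strip().lower().encode()).hexdigest()
    if key not in _cache:
        _cache[key] = expensive_llm(prompt)
    return _cache[key]
```

Even this crude version pays for itself on FAQ-style traffic, where a handful of phrasings account for most volume.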

Observability: Knowing What Your Agents Are Doing

Production agents need monitoring. Not optional. Here's the minimum:

What to Track

  • Latency: Per-step and end-to-end. Set alerts for >30s responses.
  • Token usage: Per agent, per task, per model. This is your biggest variable cost.
  • Success rate: Percentage of tasks completed without errors or human intervention.
  • Tool call patterns: Which tools are used most? Which fail most?
  • Cost per task: The metric that matters most for ROI tracking.

Tools

  • LangSmith / LangFuse: Purpose-built for LLM observability. Trace every step.
  • Helicone: LLM proxy that adds logging and analytics with minimal code changes.
  • OpenTelemetry: Open standard. Works with existing monitoring stacks (Grafana, Datadog).
  • Custom logging: At minimum, log every LLM call, tool call, and decision point to structured JSON.
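For the custom-logging floor, one JSON line per LLM call is enough to reconstruct costs and traces later. The field names here are our own convention, not a standard:

```python
import json
import time
import uuid

def log_llm_call(emit, *, agent, model, prompt_tokens, completion_tokens,
                 latency_ms, tool_calls=None, gbp_per_1k=0.002):
    """Emit one structured JSON line per LLM call.

    `emit` is any sink (print, a file write, a log shipper); the cost rate
    is an illustrative placeholder, not vendor pricing.
    """
    record = {
        "ts": time.time(),
        "trace_id": str(uuid.uuid4()),
        "agent": agent,
        "model": model,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "latency_ms": latency_ms,
        "tool_calls": tool_calls or [],
        "est_cost_gbp": round((prompt_tokens + completion_tokens) / 1000 * gbp_per_1k, 6),
    }
    emit(json.dumps(record))
    return record
```

Summing `est_cost_gbp` grouped by `agent` gives you the cost-per-task metric directly from your logs, with no extra tooling.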

The Dashboard You Need

Build or configure a dashboard that shows:

  1. Tasks processed today (vs. yesterday, vs. last week)
  2. Current error rate (with 5-minute rolling window)
  3. Total LLM spend today (projected monthly)
  4. Agent utilisation (% time active vs. idle)
  5. Human escalation rate (lower is better, to a point)

Security Considerations for UK Businesses

Data Residency

  • If processing personal data, understand where it flows. Most LLM APIs process data in the US.
  • For sensitive data: consider EU-region deployments (Azure UK South, AWS eu-west-2 London) with local models.
  • Document your data flows for GDPR compliance.

API Key Management

  • Never hardcode API keys. Use environment variables or secrets managers.
  • Rotate keys quarterly.
  • Implement per-agent key scoping — each agent should have minimal permissions.

Input/Output Filtering

  • Validate all inputs before they reach your agent.
  • Filter outputs for PII leakage, especially in customer-facing agents.
  • Implement content moderation for any agent that generates customer-facing text.
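A regex pass for obvious PII is a floor, not a ceiling (production systems should layer a proper detection service on top), but it catches careless leaks cheaply:

```python
import re

# Deliberately narrow patterns: email addresses and UK mobile numbers only.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
UK_MOBILE = re.compile(r"\b(?:\+44\s?7\d{3}|07\d{3})\s?\d{3}\s?\d{3}\b")

def redact_pii(text: str) -> str:
    """Redact obvious emails and UK mobile numbers from agent output."""
    text = EMAIL.sub("[EMAIL]", text)
    text = UK_MOBILE.sub("[PHONE]", text)
    return text
```

Run this on every outbound message from a customer-facing agent; anything it catches is a bug in your prompt or retrieval pipeline worth investigating.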

Audit Trails

  • Log every decision an agent makes, with reasoning.
  • This isn't just good practice — UK regulators increasingly expect AI decisions to be explainable.
  • Retain logs for a minimum of 12 months.

Getting Started: Your First Production Agent

Here's the path we recommend for UK SMEs:

Week 1: Define & Prototype

  • Identify one high-value, low-risk workflow to automate
  • Build a prototype using direct API calls (no framework)
  • Test with real data in a sandbox environment

Week 2: Harden

  • Add error handling and retry logic
  • Implement basic logging and monitoring
  • Set up cost alerts (hard limit at 2x expected spend)
  • Add human escalation paths
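Retry with exponential backoff is the single highest-value piece of hardening in this list. A minimal sketch, with `sleep` injectable so tests don't actually wait:

```python
import time

def call_with_retry(fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Retry transient failures with exponential backoff (1s, 2s, 4s, ...)."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except (TimeoutError, ConnectionError):
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to escalation paths
            sleep(base_delay * 2 ** attempt)

# Demo: a call that fails twice before succeeding.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("simulated transient failure")
    return "ok"

result = call_with_retry(flaky, sleep=lambda s: None)
```

Only retry errors you know are transient (timeouts, rate limits, dropped connections); retrying a malformed request just burns tokens four times.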

Week 3: Deploy

  • Containerise your agent
  • Deploy to Railway or Fly.io
  • Set up health checks and auto-restart
  • Run in shadow mode (agent runs alongside human, results compared)
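Shadow mode can be as simple as a wrapper that runs the agent on each real inquiry, logs whether it agreed with the human, and still sends the human's reply. Exact-match agreement is crude; in practice you would score similarity or hand-review the disagreements:

```python
def shadow_compare(inquiry: str, agent_fn, human_reply: str, log: list) -> str:
    """Run the agent alongside the human; the human reply is still what ships."""
    agent_reply = agent_fn(inquiry)
    agreed = agent_reply.strip().lower() == human_reply.strip().lower()
    log.append({"inquiry": inquiry, "agent": agent_reply,
                "human": human_reply, "agreed": agreed})
    return human_reply  # customer always gets the human's answer in shadow mode

shadow_log = []
reply = shadow_compare("Where is my order?",
                       lambda q: "It shipped yesterday.",
                       "It shipped yesterday.", shadow_log)
```

The agreement rate over a week or two of shadow traffic is the number that tells you whether the agent is ready for Week 4.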

Week 4: Go Live

  • Route real traffic to your agent
  • Monitor closely for first 48 hours
  • Adjust thresholds and routing rules based on real performance
  • Document everything for your team

What's Next

The infrastructure landscape is moving fast. Three trends to watch:

  1. Agent-native hosting platforms — Purpose-built platforms for running AI agents are emerging, with built-in state management, tool hosting, and multi-tenant isolation.
  2. Hybrid local-cloud architectures — Running sensitive processing on local hardware with cloud LLMs for complex reasoning. Best of both worlds for data sovereignty.
  3. Agent marketplaces — Pre-built agents you can deploy and customise, rather than building from scratch. Think WordPress plugins, but for AI agents.

The businesses that build this infrastructure now won't just have better AI — they'll have a platform that compounds. Every new agent you deploy benefits from the tools, monitoring, and patterns you've already built.

Start with one agent. Get the infrastructure right. Then scale.


Need help designing your agent infrastructure? Get in touch — we help UK businesses build production AI systems that actually work.

Tags

AI agents · orchestration · infrastructure · CrewAI · LangGraph · AutoGen · production AI · AI platforms · UK business · agent deployment · MLOps · LLMOps

Rod Hill

The Caversham Digital team brings 20+ years of hands-on experience across AI implementation, technology strategy, process automation, and digital transformation for UK businesses.

