Production Deployment on Akamai Connected Cloud
Manikandan Meenakshi Sundaram¹, Sharanya Badrinarayanan¹, Neha Save¹, Javier Solis Vindas², John Zinky, Ph.D.², *Hema Seshadri, Ph.D.¹˒²
¹ Northeastern University · ² Akamai Technologies *Principal Investigator
In Parts I through V of this series, we examined the MBTA Transit Conversational Intelligence system from multiple perspectives. Part I introduced the vision of the Internet of AI Agents and the need for new infrastructure enabling specialized agents to discover, authenticate, and collaborate across organizational boundaries. Part II demonstrated the system’s four agents (exchange, alerts, planner, stopfinder) communicating through three protocols (A2A, MCP, SLIM). Part III explored federated agent discovery through the Northeastern registry and MIT-NANDA’s switchboard architecture, enabling cross-organizational collaboration. Part V examined the orchestration logic determining when to use direct API access via MCP versus multi-agent coordination via A2A.
Those discussions focused on what the system does and how it works. This post examines where it runs and how we deploy it. The architecture worked perfectly on localhost. Four services coordinated flawlessly. The registry discovered agents instantly. Traces appeared beautifully in Jaeger. Then we tried deploying to actual servers across network boundaries.
It was 11 PM when port 4317 refused all connections. The OpenTelemetry (OTel) Collector started successfully according to its logs. Jaeger also claimed it started successfully. Yet traces vanished into the void. Two hours of debugging later, we discovered both services were fighting over the same port. Jaeger's all-in-one image bundles an OTel receiver that conflicts with running a separate collector. The fix involved switching to Jaeger-Query (UI only) and letting the dedicated collector handle ingestion.
That was night one. Night two brought permission errors where Nginx couldn’t read files in /home/ubuntu/. Night three revealed that MongoDB connection strings with special characters need URL encoding. By night four, we had automation that worked reliably.
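Night three's fix is worth a concrete example. Python's standard library handles the encoding; the credentials below are placeholders, not our real ones:

```python
from urllib.parse import quote_plus

# Placeholder credentials containing characters that are reserved in URIs
user = "mbta_admin"
password = "p@ss/word:2024"

# Percent-encode each component before building the connection string
uri = f"mongodb://{quote_plus(user)}:{quote_plus(password)}@cluster0.example.net/registry"
print(uri)
```

Without the encoding, the unescaped @ in the password ends the credential section early, so the driver either rejects the URI outright or tries to connect to the wrong host.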
Those struggles proved valuable. Every problem we encountered in testing, we fixed before production. Every deployment script we wrote for the test infrastructure, we refined for production use. Every lesson learned on simple virtual machines informed our production Kubernetes architecture.
This post examines our complete deployment journey on Linode infrastructure (part of Akamai’s distributed cloud platform): how we built testing environments using virtual machines with Supervisor process management and Docker containers, how we evolved to production deployment on Kubernetes for orchestration and scalability, and what we learned from operating both approaches simultaneously. These deployment patterns apply beyond transit systems to any federated multi-agent architecture requiring distributed services, protocol coordination, and production reliability.
The Four-Server Testing Topology

Figure 1: Four-server testing deployment topology on Linode infrastructure
Our testing deployment is distributed across four independent Linode instances, each managing services through mechanisms suited to rapid iteration and clear failure visibility (Fig. 1).
The exchange server at 50.116.53.133 runs two services via Supervisor: the exchange agent on port 8100 performing the intelligent routing decisions described in Part V, and the frontend UI on port 3000 serving the chat interface introduced in Part II. This g6-standard-4 instance (8GB RAM, 4 CPUs) handles LLM classification operations and WebSocket connections for real-time user interaction.
The agent server at 96.126.111.107 hosts the three specialized agents described in Part II, each as an independent Supervisor-managed process. The alerts agent operates on port 50051, providing service disruption monitoring and historical pattern analysis from the 41,970 incident dataset. The planner agent runs on port 50052, generating routes with LLM-enhanced alternatives. The stopfinder agent listens on port 50053, resolving Boston landmarks to MBTA stations via a 50-entry database. Another g6-standard-4 instance supports concurrent agent execution and simultaneous MBTA API calls.
The registry server at 97.107.132.213 provides the agent discovery infrastructure examined in Part III. The Northeastern registry Flask API runs on port 6900, exposing registration and search endpoints. A web dashboard on port 80 visualizes agent health with color-coded status indicators. The agent facts service operates on port 8000, serving detailed metadata. This g6-standard-2 instance (4GB RAM, 2 CPUs) connects to MongoDB Atlas for persistent storage of agent digital passports (the metadata records introduced in Part I).
The observability server at 66.228.45.25 runs the distributed tracing infrastructure in Docker containers. The OTel Collector receives traces on ports 4317 and 4318. Jaeger provides trace visualization on port 16686. ClickHouse stores analytics on ports 8123 and 9000. Grafana displays monitoring dashboards on port 3001. This g6-nanode-1 instance (1GB RAM, 1 CPU) proves sufficient because observability workloads are IO-bound rather than CPU-intensive.
This separation achieves fault isolation where agent crashes don’t affect request routing, independent scaling where high-load services get appropriately sized instances, flexible updates enabling agent deployments without exchange downtime, and clear operational boundaries confining problems to specific servers.
Testing with Supervisor: When Simplicity Enables Velocity
Our testing deployment uses Supervisor process control, and there’s a practical reason why: during active development, you deploy changes constantly.
Uploading a file via SCP and typing supervisorctl restart mbta-exchange takes 10 seconds. Building a Docker image, pushing to a registry, and pulling on the server takes 3 minutes. When you are debugging why the routing logic misclassified a query (as discussed in Part V), those 3 minutes compound quickly. On some days we deployed to testing 8 to 10 times. Supervisor's simplicity kept us moving.
The configuration looks straightforward, and that's the point:

```ini
[program:mbta-exchange]
command=/opt/mbta-agentcy/venv/bin/python -m src.exchange_agent.exchange_server
directory=/opt/mbta-agentcy
autostart=true
autorestart=true
stdout_logfile=/var/log/mbta-exchange.log
environment=REGISTRY_URL="http://97.107.132.213:6900",OTEL_EXPORTER_OTLP_ENDPOINT="http://66.228.45.25:4317"
```
If the exchange agent crashes (and during testing, it crashed often while we refined the heuristic pattern detection and LLM routing described in Part V), Supervisor restarts it automatically. Logs go to a single file. No container layers to debug. No image builds. Just Python processes that start, run, and restart when needed.
The agent server follows the same pattern. Three agents, three Supervisor configs, three log files. When the planner agent consumed excessive memory during route generation with specific query patterns, we saw it crash in the logs immediately. No Kubernetes abstractions hiding the problem. Just clear cause and effect: bad query, memory spike, crash, restart. We fixed the memory leak because we could see it clearly.
The registry server serves both the discovery API (described in Part III’s discussion of federated registries) and a web dashboard. The dashboard became unexpectedly useful during testing. We kept it open in a browser tab, refreshing occasionally to verify agents registered correctly with their capabilities and endpoints. Green dots meant agents were alive. Gray dots meant something was wrong. This visual feedback caught registration problems faster than checking API responses manually.
The observability server took a different path. We tried running the OpenTelemetry Collector, Jaeger, ClickHouse, and Grafana via Supervisor initially. That experiment lasted about two hours before we gave up. The configuration complexity (OTel config files, Jaeger storage backends, ClickHouse schemas, Grafana datasources) made manual installation painful. We switched to Docker Compose, and suddenly everything just worked. One docker compose up command started all four services with correct networking and storage. Sometimes the right tool for the job is the one that makes problems disappear.
The Testing Deployment Workflow
Here’s what actually happens when we deploy a fix to testing.
You’re on your Windows laptop. You just fixed a bug where the LLM routing (Part V’s unified classification) misclassified accessibility queries. You want to test it on the real server, not just localhost. You open Git Bash:
```bash
scp -i mbta-exchange-key \
  src/exchange_agent/exchange_server.py \
  root@50.116.53.133:/opt/mbta-agentcy/src/exchange_agent/
```
Two seconds later, the file is on the server. Then:
```bash
ssh -i mbta-exchange-key root@50.116.53.133 'supervisorctl restart mbta-exchange'
```
Three seconds later, the service is running your new code. You watch the logs in another terminal:
```bash
ssh -i mbta-exchange-key root@50.116.53.133 'tail -f /var/log/mbta-exchange.log'
```
The exchange agent starts. It connects to the registry at 97.107.132.213:6900 (Part III’s Northeastern registry). It initializes the MCP client (Part II’s tool protocol). It’s serving requests. Total time from “I just fixed this” to “it’s running in testing”: under a minute.
This speed mattered during live demonstrations. When presenting the system to Akamai, AGNTCY-ADS and MIT-NANDA collaborators, we deployed an improvement to the registry-side semantic filtering (Part III’s discovery mechanism) between demos. Nobody noticed. The system stayed up. The fix went live. That’s the value of simple deployment mechanisms when you need velocity.
Rollback follows the same rhythm. Something’s broken. Check the git history. Find the last working version. Three commands:
```bash
git checkout HEAD~1 src/exchange_agent/exchange_server.py
scp -i mbta-exchange-key src/exchange_agent/exchange_server.py root@50.116.53.133:/opt/…
ssh -i mbta-exchange-key root@50.116.53.133 'supervisorctl restart mbta-exchange'
```
Sixty seconds later, you’re back to working code. This safety net meant we could try experimental changes to the StateGraph orchestrator (Part V’s multi-agent coordination) confidently, knowing recovery was trivial.
After deploying the registry manually three times and hitting different problems each time (first: permissions, second: MongoDB URI encoding, third: Nginx proxy configuration), we stopped and wrote automation:
```bash
#!/bin/bash
# deploy-registry.sh
MONGO_URI=$1

# Create firewall
FIREWALL_ID=$(linode-cli firewalls create \
  --label "mbta-registry-firewall" \
  --rules.inbound '[{"protocol": "TCP", "ports": "22,80,6900,8000", "addresses": {"ipv4": ["0.0.0.0/0"]}}]' \
  --json | jq -r '.[0].id')

# Generate SSH key
ssh-keygen -t rsa -b 4096 -f registry-key -N "" -q

# Launch instance
INSTANCE_ID=$(linode-cli linodes create \
  --label "mbta-registry" \
  --region us-east \
  --type g6-standard-2 \
  --image linode/ubuntu22.04 \
  --authorized_keys "$(cat registry-key.pub)" \
  --firewall_id $FIREWALL_ID \
  --json | jq -r '.[0].id')

sleep 60  # Wait for boot

IP=$(linode-cli linodes list --json | jq -r ".[] | select(.id==$INSTANCE_ID) | .ipv4[0]")

# Deploy application
ssh -o StrictHostKeyChecking=no -i registry-key root@$IP << REMOTE
apt-get update && apt-get install -y python3.11 python3-pip supervisor nginx
mkdir -p /home/ubuntu/registry && cd /home/ubuntu/registry
python3.11 -m venv venv
source venv/bin/activate
pip install flask flask-cors pymongo
# Configure supervisor, nginx, start services…
REMOTE

echo "✅ Registry deployed at http://$IP"
```
This script transforms 45 minutes of manual configuration into 90 seconds of automated execution. More importantly, it does the same thing every time. No forgetting to open port 6900 in the firewall. No typos in configuration files. No wondering “did I install supervisor yet?” Deployment knowledge lives in version-controlled scripts rather than scattered Slack messages.
Network Security Through Cloud Firewall
Every server needs a firewall. We created firewalls for each server, allowing only what each service actually needs. The exchange server gets SSH (22), exchange API (8100), and frontend (3000). The agent server gets SSH and the three agent ports (50051-50053) for SLIM communication. The registry server gets SSH, HTTP (80), registry API (6900), and agent facts (8000). The observability server needs more ports for its multiple services: Jaeger (16686), Grafana (3001), ClickHouse (8123, 9000), and OTel Collector (4317, 4318).
During deployment, we forgot to open port 50051 for the alerts agent. The exchange agent kept timing out when trying to establish SLIM connections. The error message said, “connection refused.” We checked agent logs (agent running fine), network connectivity (ping worked), and service status (Supervisor showed running). Twenty minutes later, someone said, “did we open the port in the firewall?” We hadn’t. One firewall rule addition later, everything worked.
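A quick connectivity probe from the caller's side would have surfaced this in seconds. A minimal sketch, standard library only:

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers both "connection refused" and firewall-induced timeouts
        return False
```

Running port_open("96.126.111.107", 50051) from the exchange server separates "agent is down" from "network path is blocked": Supervisor already showed the agent running, so a False here points straight at the firewall.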
Now the deployment script creates firewalls with all required ports before launching instances. Mistakes you make once become automation that prevents recurrence.
The Production Environment: Kubernetes on Linode
Testing taught us what worked. Production demanded what scaled.
While VMs with Supervisor served testing perfectly, production needed capabilities Supervisor couldn’t provide: automatic scaling when traffic spikes during Boston’s morning commute, rolling updates without downtime when deploying agent improvements, health monitoring with automatic recovery when pods crash, and declarative configuration documenting infrastructure in version-controlled manifests.

Figure 2: Production Kubernetes deployment architecture on Linode Kubernetes Engine
We deployed production to Kubernetes using Linode Kubernetes Engine (LKE). LKE provides managed Kubernetes, where Linode operates the control plane infrastructure (API server, scheduler, etcd database) while we manage worker nodes running application containers. The platform’s CNCF certification ensures workload portability to other Kubernetes providers, avoiding vendor lock-in while benefiting from Linode’s developer-friendly interface and predictable pricing (flat monthly rates without surprise egress fees) (Fig. 2).
Creating the production cluster took minutes through Linode Cloud Manager: select the us-east region for proximity to MBTA data sources, define a node pool with three g6-standard-4 instances providing 24GB total memory, enable automatic node replacement for hardware fault tolerance, and click create. Within 5 minutes, we had a working cluster with kubectl access.
The shift from VMs to Kubernetes meant thinking differently about deployment. On VMs, you have servers running processes. On Kubernetes, you have pods that can move between nodes, services that route traffic across pod replicas, and manifests that declare desired state rather than executing imperative commands. This abstraction provides powerful capabilities at the cost of conceptual complexity.
Production deployment is now live on this Kubernetes cluster, demonstrating how the multi-agent coordination patterns described in Part V operate at scale with automatic failover and health monitoring (Fig. 2).
Deploying to Kubernetes: The Complete Manifest Structure
Kubernetes deployment involves writing YAML manifests that describe what you want rather than scripts that do what you want. This declarative approach feels strange initially but proves valuable for production operations.
We organized manifests into logical groups. A ConfigMap holds shared configuration like the registry URL (Part III’s discovery service) and agent endpoints. A Secret stores API keys for OpenAI (used in Part V’s LLM routing) and MBTA (accessed by all agents). Deployments define application replicas and container specifications. Services provide stable network endpoints. An Ingress routes external traffic based on domain names.
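The Secret is the one manifest we don't commit with real values. A minimal sketch, assuming these key names (the values shown are placeholders supplied at deploy time, never checked into version control):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: mbta-api-secrets
  namespace: mbta
type: Opaque
stringData:
  OPENAI_API_KEY: "sk-placeholder"   # placeholder, injected at deploy time
  MBTA_API_KEY: "placeholder"        # placeholder, injected at deploy time
```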
The ConfigMap became our single source of truth for configuration:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: mbta-config
  namespace: mbta
data:
  REGISTRY_URL: "http://registry:6900"
  ALERTS_AGENT_URL: "http://alerts-agent:50051"
  PLANNER_AGENT_URL: "http://planner-agent:50052"
  STOPFINDER_AGENT_URL: "http://stopfinder-agent:50053"
  OTEL_EXPORTER_OTLP_ENDPOINT: "http://otel-collector:4317"
  USE_SLIM: "true"
```
Every service references this ConfigMap via envFrom. Change the registry URL once, and all services pick up the change on their next restart. This centralization eliminated configuration drift where different services had different registry URLs because someone updated one deployment but forgot the others.
The Exchange Agent: Production Deployment with Replicas
The exchange agent deployment demonstrates production Kubernetes patterns with high availability:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: exchange
  namespace: mbta
spec:
  replicas: 2
  selector:
    matchLabels:
      app: exchange
  template:
    metadata:
      labels:
        app: exchange
    spec:
      containers:
        - name: exchange
          image: registry.northeastern.edu/mbta/exchange:latest
          ports:
            - containerPort: 8100
          envFrom:
            - configMapRef:
                name: mbta-config
            - secretRef:
                name: mbta-api-secrets
          resources:
            requests:
              memory: "512Mi"
              cpu: "200m"
            limits:
              memory: "1Gi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /
              port: 8100
            periodSeconds: 15
          readinessProbe:
            httpGet:
              path: /
              port: 8100
            periodSeconds: 5
```
Two replicas mean two pods running the exchange agent simultaneously. If one crashes while processing a complex routing decision, the other handles incoming requests while Kubernetes restarts the failed pod. Users experience no interruption. During our production deployment, we triggered an intentional crash by sending a malformed request. One pod died. The other kept serving. Within 10 seconds, Kubernetes restarted the crashed pod. The system recovered automatically.
The liveness probe checks health every 15 seconds via HTTP GET to the root endpoint. Three consecutive failures trigger a pod restart. The readiness probe determines whether the pod should receive traffic. A pod failing readiness checks gets removed from the service load balancer until it passes again. This matters during rolling updates: new pods don’t receive traffic until they initialize successfully, preventing errors from hitting pods still loading the MCP client or connecting to the registry.
Resource requests (512Mi memory, 200m CPU) tell Kubernetes what the pod needs to run. Kubernetes schedules pods only on nodes with sufficient available resources. Resource limits (1Gi memory, 500m CPU) cap what the pod can consume. A pod exceeding memory limits gets killed and restarted. A pod exceeding CPU limits gets throttled. These constraints prevent the exchange agent from monopolizing cluster resources during expensive LLM operations.
Agent Deployments: Dual-Container Pods for Multi-Protocol Support
Each specialized agent runs two containers in one pod, supporting both HTTP A2A and SLIM transport (the two communication mechanisms described in Part II).
This felt overcomplicated initially. Why not separate pods for each protocol? We tried that during initial Kubernetes testing. It created problems. With separate pods, each protocol needed its own service, its own endpoint URL, and its own configuration. The exchange agent had to track two URLs per agent. Worse, the HTTP and SLIM versions could desync during rolling updates, where one pod updated successfully but the other failed. Requests via HTTP got the new version while requests via SLIM got the old version. Debugging this split-brain scenario consumed hours.
Putting both containers in one pod solved this:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alerts-agent
  namespace: mbta
spec:
  selector:
    matchLabels:
      app: alerts-agent
  template:
    metadata:
      labels:
        app: alerts-agent
    spec:
      containers:
        - name: alerts-http
          image: registry.northeastern.edu/mbta/agent:latest
          command: ["uvicorn", "src.agents.alerts.main:app", "--port", "8001"]
          ports:
            - containerPort: 8001
              name: http
          envFrom:
            - configMapRef:
                name: mbta-config
            - secretRef:
                name: mbta-api-secrets
        - name: alerts-slim
          image: registry.northeastern.edu/mbta/agent:latest
          command: ["python", "-m", "src.agents.alerts.slim_alerts_wrapper_fixed"]
          ports:
            - containerPort: 50051
              name: slim
          envFrom:
            - configMapRef:
                name: mbta-config
            - secretRef:
                name: mbta-api-secrets
```
Both containers share the pod’s network namespace, enabling localhost communication if needed. Both reference the same ConfigMap and Secret. Both deploy together, scale together, fail together. Updates happen atomically. The exchange agent tracks one service name (alerts-agent) that routes to both protocol endpoints. This co-location eliminated the synchronization nightmares we experienced with separate pods.
The planner and stopfinder agents follow identical dual-container patterns with different ports. Planner uses 8002 for HTTP and 50052 for SLIM. Stopfinder uses 8003 for HTTP and 50053 for SLIM. This consistency simplifies operations: every agent deployment looks structurally similar, reducing cognitive load when troubleshooting.
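The single service name works because one Kubernetes Service can expose both protocol ports. A sketch, assuming the pod template carries an app: alerts-agent label:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: alerts-agent
  namespace: mbta
spec:
  selector:
    app: alerts-agent   # assumes this label on the dual-container pod
  ports:
    - name: http        # A2A over HTTP
      port: 8001
      targetPort: 8001
    - name: slim        # SLIM transport
      port: 50051
      targetPort: 50051
```

The exchange agent resolves alerts-agent once and picks the port for whichever protocol the request needs.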
Registry Deployment: MongoDB Sidecar Pattern
The registry deployment bundles MongoDB and the Flask API in the same pod using the sidecar pattern:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: registry
  namespace: mbta
spec:
  selector:
    matchLabels:
      app: registry
  template:
    metadata:
      labels:
        app: registry
    spec:
      containers:
        - name: mongodb
          image: mongo:7
          ports:
            - containerPort: 27017
          volumeMounts:
            - name: mongo-data
              mountPath: /data/db
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "300m"
        - name: registry
          image: registry.northeastern.edu/mbta/registry:latest
          ports:
            - containerPort: 6900
              name: http
          env:
            - name: MONGO_URI
              value: "mongodb://localhost:27017"
            - name: ENABLE_FEDERATION
              value: "false"
      volumes:
        - name: mongo-data
          persistentVolumeClaim:
            claimName: mongo-data   # assumes a PVC; volume definition was elided in the original
```
The MongoDB container runs in the same pod as the registry, making it accessible via localhost without network configuration. The registry stores agent digital passports (Part I’s metadata records), including capabilities, endpoints, and protocol support. The volume mount persists MongoDB data beyond container restarts, ensuring agent registrations survive pod failures.
This sidecar approach works for our current scale (36 registered agents across 6 domains, as mentioned in Part III). Larger deployments would use managed databases (MongoDB Atlas, Linode Managed Databases) for automated backups, replication, and operational support. For a research demonstration serving Boston users, the sidecar pattern provides adequate reliability without operational overhead.
Observability Stack: Four Services Coordinating in Kubernetes
The observability infrastructure deploys as four separate Kubernetes deployments, demonstrating how containerized services coordinate through cluster networking.
The OTel Collector receives traces from all application services (exchange agent, three specialized agents, registry):
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
  namespace: mbta
spec:
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:0.91.0
          args: ["--config=/etc/otel/otel-collector-config.yaml"]
          ports:
            - containerPort: 4317
              name: otlp-grpc
            - containerPort: 4318
              name: otlp-http
          volumeMounts:
            - name: config
              mountPath: /etc/otel
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "300m"
      volumes:
        - name: config
          configMap:
            name: otel-collector-config   # assumes this ConfigMap name; elided in the original
```
The configuration mounts from a ConfigMap defining receivers, processors, and exporters. The collector receives OTLP traces on ports 4317 (gRPC) and 4318 (HTTP), processes them through batching to reduce network overhead, and exports to both Jaeger for trace visualization and ClickHouse for analytics storage.
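The pipeline described above might look like this in the mounted ConfigMap. This is a sketch: the exporter names and endpoints are illustrative assumptions, not our exact file:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch: {}            # batch spans to cut per-export network overhead

exporters:
  otlp/jaeger:         # forward traces to Jaeger's OTLP receiver
    endpoint: jaeger:4317
    tls:
      insecure: true
  clickhouse:          # persist traces for analytics queries
    endpoint: tcp://clickhouse:9000

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger, clickhouse]
```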
Jaeger provides the distributed trace visualization, enabling the request flow analysis shown in Part V:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jaeger
  namespace: mbta
spec:
  selector:
    matchLabels:
      app: jaeger
  template:
    metadata:
      labels:
        app: jaeger
    spec:
      containers:
        - name: jaeger
          image: jaegertracing/all-in-one:1.52
          ports:
            - containerPort: 16686
              name: ui
            - containerPort: 4317
              name: otlp-grpc
          env:
            - name: COLLECTOR_OTLP_ENABLED
              value: "true"
            - name: SPAN_STORAGE_TYPE
              value: "badger"
          volumeMounts:
            - name: jaeger-data
              mountPath: /badger
      volumes:
        - name: jaeger-data
          emptyDir: {}   # assumption: volume definition was elided in the original
```
ClickHouse stores both the raw traces exported by the collector and custom application analytics (conversation logs, agent invocations, LLM calls). Grafana queries ClickHouse to display real-time dashboards for monitoring system health. Each service has a ClusterIP Service enabling DNS-based discovery, where otel-collector.mbta.svc.cluster.local resolves automatically to the collector's service IP.
This observability infrastructure captures the complete execution flow described in Part V: LLM routing decisions, StateGraph orchestration steps, individual agent invocations via SLIM, and response synthesis. The traces reconstruct how the system coordinates three agents to answer “Should I wait for Red Line delays?” with historical analysis and alternative routes.
External Access Through Ingress
The Ingress resource routes external traffic to internal services based on domain names:
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: mbta-ingress
  namespace: mbta
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-http-version: "1.1"
spec:
  ingressClassName: nginx
  rules:
    - host: mbta.agent.mitdataworksai.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend
                port:
                  number: 3000
    - host: exchange.agent.mitdataworksai.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: exchange
                port:
                  number: 8100
    - host: grafana.agent.mitdataworksai.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: grafana
                port:
                  number: 3001
    - host: jaeger.agent.mitdataworksai.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: jaeger
                port:
                  number: 16686
```
Users access the chat interface at mbta.agent.mitdataworksai.com instead of remembering IP addresses and ports. Operators access Grafana at grafana.agent.mitdataworksai.com for monitoring dashboards tracking the routing decisions and agent performance discussed in Part V. Developers access Jaeger at jaeger.agent.mitdataworksai.com for detailed trace inspection showing the complete StateGraph execution flow.
Domain-based routing provides clean URLs and enables SSL/TLS termination at the Ingress for HTTPS. The Nginx Ingress Controller (deployed to the cluster) implements these routing rules, acting as a reverse proxy that terminates external connections and forwards requests to appropriate backend services.
Why Both Deployments: Testing on VMs, Production on Kubernetes
Running both deployment strategies simultaneously felt redundant initially. Why maintain two infrastructure approaches for the same system? Because each solves different problems.
Testing on VMs catches issues that Kubernetes would hide. During testing, the planner agent consumed excessive memory when generating routes with complex transfer constraints (Part V’s context-aware route planning). On Kubernetes with automatic restarts, the pod would crash and restart without necessarily revealing why. On VMs with Supervisor, the service crashed visibly in logs, forcing an investigation. We found and fixed a memory leak in the LLM route generation that would have been harder to diagnose in Kubernetes, where crashes trigger automatic recovery masking root causes.
Production on Kubernetes provides safeguards that testing lacks. When the exchange agent crashed during a demonstration due to an uncaught exception in the heuristic pattern detection (Part V's domain expertise detection), Kubernetes restarted the pod within 10 seconds. The audience saw a brief loading state. The system recovered. Testing on VMs would have required someone to SSH in and manually restart the service, resulting in visible downtime during the presentation.
The dual strategy also enables graduated deployment, validating changes in simple environments before deploying to sophisticated orchestration. New improvements to the semantic matching algorithm (Part III’s registry-side filtering) deploy to testing VMs first, where rapid iteration and straightforward debugging outweigh orchestration benefits. After validation through actual use serving test queries, changes are deployed to production Kubernetes, where automatic scaling and health monitoring activate. This reduces production risk while maintaining development velocity.
One more benefit emerged unexpectedly. When Kubernetes node failures took the production cluster offline during a particularly aggressive load test, we redirected traffic to the testing infrastructure within 10 minutes. The testing VMs became an emergency backup, maintaining service availability while we recovered the production cluster. This redundancy proved its value exactly once, but that once justified the operational overhead of maintaining both deployments.
These deployment lessons generalize beyond MBTA transit to any multi-agent system requiring distributed services. Organizations deploying AGNTCY-compatible agents or federated registry architectures face similar trade-offs between deployment simplicity (VMs) and production capabilities (Kubernetes). Starting simple enables rapid validation. Migrating to orchestration happens when benefits justify complexity.
Agent Health Monitoring Through Registry Integration
The registry (Part III’s discovery infrastructure) tracks agent health through metadata maintained in both testing and production deployments.
Each registered agent has metadata including an alive field indicating operational status, a last_update timestamp showing recent activity, capabilities and tags describing functionality (part of the agent digital passport from Part I), and assigned_to for future multi-tenant scenarios.
The web dashboard at http://97.107.132.213 displays agent status with color-coded indicators (green for alive, gray for offline), update timestamps, and registered capabilities. During testing, we monitor this dashboard to verify that agents register correctly after deployment. During production operation, the dashboard provides a quick health assessment without running kubectl commands or checking pod status across the cluster.
Future enhancements would implement automatic heartbeat monitoring:
```python
import threading
import time

import httpx

REGISTRY_URL = "http://97.107.132.213:6900"

def heartbeat_loop():
    while True:
        try:
            httpx.put(
                f"{REGISTRY_URL}/agents/mbta-alerts/status",
                json={"alive": True},
                timeout=10,
            )
            time.sleep(30)
        except Exception:
            time.sleep(5)  # Retry sooner on failure

threading.Thread(target=heartbeat_loop, daemon=True).start()
```
Agents would send status updates every 30 seconds to the registry. The registry would mark agents offline if no heartbeat is received within 90 seconds. The dashboard would refresh automatically, showing real-time status. This automatic detection would catch crashed pods faster than manual monitoring.
We haven’t implemented automatic heartbeats yet because our three-agent deployment rarely experiences failures where automatic detection would significantly improve response time over manual monitoring via the dashboard. As systems scale to dozens of agents across multiple organizations (the federated model from Part III), automatic health monitoring becomes essential rather than optional.
Comparing Deployment Approaches: When to Use Which
The question isn’t whether VMs or Kubernetes is better. The question is which problems you’re solving.
VMs with Supervisor solve operational simplicity where services run as processes with obvious lifecycle management, rapid deployment where changes take seconds not minutes, straightforward debugging using familiar tools (SSH, tail, grep), and geographic flexibility deploying services to different regions independently (relevant for the future multi-city expansion discussed in Part III).
But you’re managing dependencies manually (is Python 3.11 installed? What about the required pip packages?), scaling manually (traffic spike during morning commute? SSH and start another instance), and handling failures manually (agent crashed? Restart it yourself).
Kubernetes solves production scale where traffic varies unpredictably throughout the day, services need high availability tolerating pod failures, teams want deployment automation reducing human error, and configuration requires documentation in version-controlled manifests enabling reproducible deployments.
But you’re learning orchestration concepts (pods, services, deployments, ingress), debugging container abstractions (why won’t this pod schedule? why can’t services find endpoints?), and accepting that all components must live in one cluster unless you’re ready for multi-cluster complexity.
We chose both. Testing uses simple tools because development velocity matters more than production features when refining the intelligent routing logic (Part V). Production uses sophisticated orchestration because reliability matters more than deployment simplicity when serving actual users querying real MBTA data. This isn’t indecision. It’s using appropriate tools for different problems.
This pattern applies broadly to multi-agent deployments. Small research projects (fewer than 10 agents) often find VMs sufficient and Kubernetes overcomplicated. Medium-scale deployments (10 to 50 agents) benefit from Kubernetes orchestration but require infrastructure expertise. Large federated deployments (hundreds of agents across organizations, as envisioned in Part I’s Internet of AI Agents) essentially require orchestration to manage complexity. We currently operate in the small-to-medium range and use tooling appropriate for our scale.
Lessons from Operating Both Deployments
Here’s what we learned from running testing on VMs and production on Kubernetes simultaneously.
| Aspect | Testing (VMs + Supervisor) | Production (Kubernetes) |
| --- | --- | --- |
| Deploy Speed | 10 seconds | 2-3 minutes |
| Rollback Speed | 60 seconds | Automatic (10s) |
| Scaling | Manual | Automatic |
| Failure Recovery | Manual restart | Auto-restart pods |
| Debugging | Clear (tail logs) | Abstract (kubectl) |
| Complexity | Simple | Sophisticated |
| Best For | Rapid iteration | Production reliability |
| Geographic Distribution | Yes (4 regions) | No (single cluster) |
Figure 3: Deployment Strategy Comparison – Testing VMs versus Production Kubernetes
Start simple, add complexity only when you feel the pain. Our initial plan was “everything on Kubernetes from day one” because that’s what modern cloud-native architecture should use. We’re glad we didn’t follow that plan. The Kubernetes learning curve (why won’t pods schedule? why can’t the exchange agent reach the registry service? why does ingress return 404 for valid routes?) would have blocked progress for weeks. Starting with VMs let us validate the multi-agent coordination patterns (Part V’s StateGraph orchestration) quickly. We migrated to Kubernetes only after understanding what we actually needed from orchestration: automatic scaling during traffic spikes, rolling updates for zero-downtime deployments, and health probes for automatic recovery (Fig. 3).
Separate observability regardless of deployment mechanism. Whether running on VMs or Kubernetes, observability infrastructure should be separate from applications. We initially ran the ClickHouse analytics database on the exchange server to save infrastructure costs (one fewer instance to manage and pay for). ClickHouse memory spikes during trace ingestion occasionally affected exchange agent response times, adding 200 to 300 milliseconds to the routing decisions described in Part V. Moving observability to dedicated infrastructure eliminated this contention. Monitoring costs resources, but those resources shouldn’t come from user-facing services.
Automate deployment even when it seems like overkill. Writing deployment scripts for four testing servers felt unnecessary. Then we rebuilt the registry three times in one week while experimenting with different MongoDB schemas for storing agent digital passports (Part I’s metadata records). The automation script paid for itself immediately. What took 45 minutes manually now takes 90 seconds automatically. More importantly, the script documents the deployment process. New team members joining the project don’t need to ask, “How do I set up the registry?” They run deploy-registry.sh with the MongoDB connection string.
Test in the environment that shows problems clearly. VMs with Supervisor expose failures visibly. Services crash, logs show exceptions, processes stop. Kubernetes hides failures behind automatic restarts and pod abstractions. For testing and debugging, visible failures are features, not bugs. They force you to understand and fix root causes. For production serving real users, automatic recovery is essential because uptime matters more than learning opportunities. Use both environments appropriately for their strengths.
Version control infrastructure configuration, not just application code. Our testing deployment scripts (bash + Linode CLI) and production manifests (Kubernetes YAML) both live in git alongside application code. Infrastructure changes go through the same review process as code changes. Deployments are reproducible from clean checkouts. Configuration is documented in executable form. This discipline prevents “it works on my test server but not on production” problems, where environments drift due to manual changes nobody documented.
These lessons apply to any distributed multi-agent system requiring coordination across services, whether implementing AGNTCY-compatible agents, federated registry architectures like the switchboard (Part III), or intelligent orchestration patterns like StateGraph (Part V). The specific technologies differ (VMs versus containers, Supervisor versus Kubernetes, Linode versus other clouds), but the principles remain: start simple, automate early, test where problems show clearly, and version control everything.
Future Enhancements
Several improvements would benefit production robustness as the system scales beyond research demonstration toward production operation serving actual Boston commuters.
Blue-green deployment on Kubernetes would maintain two complete namespace sets (blue serving traffic, green idle), deploy updates to the green namespace, run automated tests against the green environment, switch Ingress traffic to green if tests pass, and keep blue running for instant rollback capability. This strategy enables zero-downtime updates with reduced deployment risk, important for maintaining availability during the morning and evening commute peaks.
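One common mechanism for the traffic switch repoints a Service selector at the green pods rather than editing the Ingress directly. A sketch under that assumption (the service name, label keys, and `version=blue|green` labeling are illustrative, not the system's actual manifests):

```python
import json
import subprocess

def selector_patch(color: str, app: str = "exchange") -> str:
    """Build the JSON patch that repoints a Service at pods labeled
    version=<color>. Label keys here are illustrative assumptions."""
    return json.dumps({"spec": {"selector": {"app": app, "version": color}}})

def switch_traffic(color: str, app: str = "exchange") -> None:
    """Apply the patch with kubectl. Blue pods keep running, so
    switch_traffic("blue") is an instant rollback."""
    subprocess.run(
        ["kubectl", "patch", "service", app, "-p", selector_patch(color, app)],
        check=True,
    )
```

Because the patch only changes the selector, the switch is effectively atomic from the client's perspective, and no pods restart during it.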
Horizontal Pod Autoscaling would automatically adjust exchange agent replicas based on request rate or CPU utilization, ensuring the system handles traffic spikes (morning rush hour queries) without manual intervention while reducing costs during low-traffic periods (late night) by scaling down to minimum replicas.
GitOps workflow using Argo CD or Flux would deploy Kubernetes changes automatically when manifests merge to the main branch, maintaining complete deployment history in git and enabling rollback to any previous version via git revert rather than manual kubectl commands.
Secret management integration with HashiCorp Vault would eliminate base64-encoded API keys in Kubernetes Secrets, provide automatic credential rotation for OpenAI and MBTA API keys, enable fine-grained access control limiting which services access which secrets, and maintain audit logs of secret access for security compliance.
These enhancements matter at scale but add complexity that our current deployment doesn’t yet justify. We operate what we understand, monitor what we operate, and automate what we repeat. As the system grows from research demonstration to production service, appropriate sophistication follows naturally.
Conclusion
Deploying distributed multi-agent systems transforms elegant architectures into messy reality, revealing challenges you didn’t anticipate from local testing.
Our journey from localhost to production involved debugging port conflicts at 2 AM (OTeL Collector versus Jaeger, both claiming port 4317), rebuilding servers three times to fix permission errors (Nginx reading files in /home/ubuntu/), learning that MongoDB connection strings need special character encoding, and recognizing that deployment expertise differs fundamentally from development skills. These struggles taught us more than smooth deployments would have.
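The connection-string lesson is worth making concrete: a password containing characters like @ or : gets misparsed as part of the host unless percent-encoded. A minimal sketch using the standard library (the credentials shown are placeholders):

```python
from urllib.parse import quote_plus

def build_mongo_uri(user: str, password: str, host: str, db: str) -> str:
    """Percent-encode credentials so special characters (@, :, /, %)
    don't break MongoDB URI parsing."""
    return f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}/{db}"

# Without encoding, the '@' in the password would be read as the
# start of the host portion of the URI.
uri = build_mongo_uri("registry", "p@ss:word", "localhost:27017", "agents")
# -> mongodb://registry:p%40ss%3Aword@localhost:27017/agents
```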
The dual deployment strategy (testing on VMs, production on Kubernetes) balanced rapid iteration with production reliability. Testing provided fast deploy-restart cycles (enabling 8 to 10 deployments daily) and clear failure visibility that accelerated debugging. Production provided automatic pod restarts that maintained availability, rolling updates that enabled zero-downtime deployments, and replica management ensuring no single pod failure interrupts service. Operating both simultaneously enabled graduated deployment: changes are validated in the simple environment before deploying to sophisticated orchestration.
We started with perfect architecture diagrams showing the exchange agent, specialized agents, registry, and observability stack communicating seamlessly. We ended with working systems running on real infrastructure distributed across four Linode servers, serving actual users querying real MBTA data, handling actual pod failures with automatic recovery, and teaching us actual lessons about distributed systems operation (Fig. 4). The diagrams didn’t show the late nights or the cryptic errors or the humbling recognition that we didn’t know as much as we thought. But that’s where the learning happened.

Figure 4: Linode Cloud Manager showing Active Instances across Testing and Production Environments
The deployment patterns described here apply beyond MBTA transit to any federated multi-agent architecture. Organizations building AGNTCY-compatible agents face similar decisions about deployment mechanisms, orchestration sophistication, and operational trade-offs. The protocols differ (A2A, MCP, SLIM as discussed in Part II), the discovery mechanisms vary (federated registries as explored in Part III), and the coordination patterns diverge (StateGraph orchestration examined in Part V), but the deployment fundamentals remain: start simple enough to understand, automate enough to scale, test where problems show clearly, and migrate to sophistication when benefits justify complexity.
The path from localhost to production isn’t smooth. It involves struggles, mistakes, and learning. But with reasonable infrastructure, appropriate automation, and a willingness to learn from failures, the path is navigable. We walked it. This post shares the map, including the parts where we got lost.
Technical Summary
Testing deployment uses four Linode instances: exchange server (50.116.53.133, g6-standard-4, 8GB RAM) running Supervisor-managed exchange agent and frontend, agents server (96.126.111.107, g6-standard-4, 8GB RAM) running three Supervisor-managed agents, registry server (97.107.132.213, g6-standard-2, 4GB RAM) running Supervisor-managed registry services, and observability server (66.228.45.25, g6-nanode-1, 1GB RAM) running Docker-containerized telemetry stack.
Production deployment uses Linode Kubernetes Engine with three-node pool (g6-standard-4 instances, 8GB RAM each, 24GB total cluster memory), seven application deployments (exchange with 2 replicas for availability, three dual-container agents supporting HTTP and SLIM protocols, registry with MongoDB sidecar, frontend), four observability deployments (OTeL Collector, Jaeger, ClickHouse, Grafana), ConfigMap storing shared configuration (registry URL, agent endpoints, OTeL endpoint), Secret storing base64-encoded credentials (OpenAI, MBTA, Anthropic API keys), Services providing stable DNS names for pod-to-pod communication, and Ingress routing external traffic via domain names (mbta.agent.mitdataworksai.com).
Application process management in testing uses Supervisor with autostart and autorestart policies, environment variable injection, stdout and stderr capture to log files, and supervisorctl interface for lifecycle control (status, restart, reread, update commands). Production uses Kubernetes with replica management ensuring high availability, rolling update strategy deploying new versions gradually, liveness probes checking pod health every 15 seconds, readiness probes controlling traffic routing during updates, and resource limits preventing excessive consumption (1Gi memory, 500m CPU per exchange pod).
Network security implements Cloud Firewall with service-specific inbound rules per server (exchange: 22, 8100, 3000; agents: 22, 50051-50053; registry: 22, 80, 6900, 8000; observability: 22, 16686, 3001, 8123, 9000, 4317, 4318), unrestricted outbound for API access (MBTA V3, OpenAI, registry queries), and SSH key-based authentication eliminating password access.
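A quick way to sanity-check firewall rules like these after provisioning is a TCP reachability probe. A sketch (run it from a host that should have access; hosts and ports are passed in, so the server IPs above are just example inputs):

```python
import socket

def check_ports(host: str, ports, timeout: float = 2.0):
    """Return {port: bool} indicating TCP reachability of each port."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results
```

Running it against, say, the observability server with its expected list (22, 16686, 4317, ...) quickly distinguishes a firewall problem from a service that never started.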
Deployment automation uses Linode CLI for infrastructure provisioning (firewalls, instances, node pools), bash scripts for VM configuration (dependencies, services, environment), kubectl for Kubernetes deployments (apply manifests, rolling updates), and SCP for code transfer from Windows development machine to remote servers.
Testing update workflow involves SCP upload (2 to 3 seconds transferring changed files), supervisorctl restart (3 to 5 seconds restarting service), and log verification watching initialization messages, completing in under 60 seconds from code change to verified operation. Rollback completes in under 60 seconds via git checkout, retrieving the previous version and redeployment.
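That three-step workflow is simple enough to script end to end. A sketch in Python (the actual deploy scripts are bash; the ubuntu user, paths, and log location here are illustrative assumptions):

```python
import subprocess

def deploy_commands(host: str, local_path: str, remote_path: str, service: str):
    """Commands for one testing-VM deploy: SCP upload, supervisorctl
    restart, then tail the log to verify initialization messages."""
    return [
        ["scp", "-r", local_path, f"ubuntu@{host}:{remote_path}"],
        ["ssh", f"ubuntu@{host}", f"sudo supervisorctl restart {service}"],
        ["ssh", f"ubuntu@{host}", f"tail -n 20 /var/log/{service}.log"],
    ]

def deploy(host: str, local_path: str, remote_path: str, service: str) -> None:
    for cmd in deploy_commands(host, local_path, remote_path, service):
        subprocess.run(cmd, check=True)  # abort on the first failure
```

Rollback is the same script pointed at a `git checkout` of the previous version, which is how the sub-60-second rollback above works.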
Production update workflow involves building container images from updated source code, pushing images to container registry, updating Kubernetes manifests with new image tags, and kubectl apply triggering rolling updates, completing in 2 to 3 minutes with zero downtime as new pods start before old pods terminate.
Agent health tracking includes registry status fields (alive, last_update, capabilities, tags stored in MongoDB), web dashboard visualization at http://97.107.132.213 with color-coded status and capability display, and planned heartbeat monitoring enabling automatic failure detection within 90 seconds.
Total infrastructure costs approximately $125 monthly across testing and production environments, demonstrating that production-grade multi-agent systems can operate on modest budgets when appropriately architected.
Acknowledgments
We thank Dr. Hema Seshadri, Dr. John Zinky, and Javier Solis Vindas from Akamai Technologies for infrastructure guidance, deployment pattern discussions, and for providing research access to Linode cloud resources.
The MBTA Transit Conversational Intelligence system runs on Linode infrastructure, demonstrating production deployment patterns for federated multi-agent systems using emerging AI protocols.
References
- Linode. “Linode Cloud Manager Documentation.” https://www.linode.com/docs/ (2024).
- Linode. “Linode Kubernetes Engine.” https://www.linode.com/products/kubernetes/ (2025).
- AGNTCY Project. “OASF” (GitHub repository). https://github.com/agntcy/oasf
- Project NANDA. “nanda-index” (GitHub repository). https://github.com/projnanda/nanda-index
- Project NANDA. “nanda-index: Switchboard README.” https://github.com/projnanda/nanda-index/blob/main/switchboard/README.md
- Supervisor Project. “Supervisor: A Process Control System.” http://supervisord.org/ (2024).
- Docker, Inc. “Docker Compose Documentation.” https://docs.docker.com/compose/ (2024).
- Cloud Native Computing Foundation. “Kubernetes Documentation.” https://kubernetes.io/docs/ (2024).
- MongoDB, Inc. “MongoDB: The Developer Data Platform.” https://www.mongodb.com/docs/ (2024).
- OpenTelemetry Project. “OpenTelemetry Documentation.” https://opentelemetry.io/docs/ (2024).
- DataWorks AI. “Deploying the Internet of AI Agents, Part I.” https://dataworksai.com/deploying-the-internet-of-ai-agents-part-1/
- DataWorks AI. “Deploying the Internet of AI Agents, Part II.” https://dataworksai.com/deploying-the-internet-of-ai-agents-part-ii/
- DataWorks AI. “Deploying the Internet of AI Agents, Part III.” https://dataworksai.com/deploying-the-internet-of-ai-agents-part-iii/
- DataWorks AI. “Deploying the Internet of AI Agents, Part V.” https://dataworksai.com/deploying-the-internet-of-ai-agents-part-v/