Deploying the Internet of AI Agents: Part V

MBTA Transit Conversational Intelligence: Orchestration Through Semantic Discovery

Manikandan Meenakshi Sundaram¹, Sharanya Badrinarayanan¹, Neha Save¹, Javier Solis Vindas², John Zinky, Ph.D.², *Hema Seshadri, Ph.D.¹˒² 

¹ Northeastern University · ² Akamai Technologies  *Principal Investigator

Here is a scenario I want you to imagine. You’re standing on a snowy platform at Fenway, phone in hand, staring at yet another Red Line delay alert. You need to reach Harvard in forty minutes for a meeting. Multiple questions form in your mind, but the real one is not “What are the delays?”, “How do I get to Harvard?”, or “Where’s the nearest station to where I’m heading?” The real question is layered with context and constraint: “Given these specific delays, their likely duration based on past patterns, my current location at a landmark rather than a station, and my time deadline, what should I actually do right now?”

No single AI agent holds all the pieces to answer this. The Alerts agent is aware of current disruptions and can predict durations based on 41,970 historical incidents. The StopFinder agent resolves “Fenway Park” to its nearest MBTA station. The RoutePlanner agent generates alternative routes avoiding disrupted lines. The answer requires all three, but only if they share their findings with each other. This is the coordination problem at the heart of multi-agent systems. How do you discover which agents are relevant when dozens are registered? How do you order their execution when they depend on each other? How do you pass context from one agent’s analysis to another’s routing logic? And how do you do all of this at production scale, where agent catalogs might contain not three agents but three thousand?

In Part III, we examined the federated registry model, specifically how the Northeastern registry stores agent metadata and how the switchboard enables cross-organizational agent collaboration through automatic schema translation. This post examines what happens after discovery, exploring how agents coordinate to answer prompts, how the system routes between MCP tools and A2A agents, and how specialized domain knowledge shapes the architecture.

While we trace through the entire system architecture, the user prompt we use to do so is: “I’m at Fenway Park and need to get to Harvard. Should I wait for the Red Line delays to clear or take an alternative?”

System Architecture: Tracing the Data Flow

The system comprises four architectural layers that transform user prompts into intelligent responses (Fig. 1). The user agent wrapper manages frontend interactions. The chat UI provides a browser-based interface where users type queries. Event logs capture conversation history for analytics and debugging. The frontend communicates with backend services via WebSocket, enabling real-time bidirectional communication without HTTP request-response overhead.

The exchange agent serves as the central orchestrator. It contains an LLM performing intent classification and routing decisions, an MCP client for direct tool access to MBTA data, and a StateGraph orchestrator coordinating multi-agent workflows (Fig. 1). This component decides whether queries route through fast MCP tools for simple lookups or sophisticated A2A agent coordination for complex requests requiring historical analysis and decision support.

Figure 1: Tracing the Data Flow

In Part II, we discussed the protocols used. The agent communications layer provides two protocol paths. The MCP server exposes MBTA API functionality as callable tools, including mbta_get_alerts for service disruptions, mbta_get_vehicles for real-time positions, mbta_get_routes for line information, and mbta_get_schedules for timing data. The SLIM transport on A2A enables secure, low-latency messaging between the exchange agent and specialized agents using encrypted channels with authentication.

The domain knowledge layer comprises two data sources. The MBTA official API provides real-time data, including current alerts, vehicle positions, stop information, and route schedules through REST endpoints (Fig. 1). The specialized agents (Alerts agent on port 8001, Planner agent on port 8002, StopFinder agent on port 8003) augment API data with historical pattern analysis from 41,970 incidents spanning 2020 through 2023, landmark-to-station resolution via a 50-entry Boston database mapping locations like Fenway Park to Kenmore station, and LLM-enhanced route generation providing alternative paths when disruptions occur.

The user types their prompt into the MBTA Transit Conversational Intelligence application’s browser-based chat interface. The frontend app, written in JavaScript, captures the input and transmits it via WebSocket, sending a JSON message containing the prompt text and a protocol selection parameter set to auto, indicating that the user has not manually selected a specific routing path.

WebSocket was selected for its support of real-time bidirectional communication without HTTP request-response overhead. This enables future streaming response capability, where partial results could be displayed as agents complete their work, and maintains connection state for conversation context across multiple exchanges.
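As a hedged sketch of that exchange (the field names are assumptions, since the exact wire format is not shown in the source), the JSON message sent over the WebSocket might look like:

```python
import json

# Hypothetical payload shape; "prompt" and "protocol" are assumed field names.
message = {
    "prompt": "I'm at Fenway Park and need to get to Harvard. "
              "Should I wait for the Red Line delays to clear or take an alternative?",
    "protocol": "auto",  # "auto" unless the user forces MCP or A2A via the UI buttons
}
wire = json.dumps(message)  # transmitted over the open WebSocket connection
```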

The FastAPI backend server receives this WebSocket message, proxies it to the Exchange agent, and forwards the prompt along with the protocol selection parameter, setting a timeout for the request. The architectural separation between the frontend app and Exchange agent serves multiple purposes. The lightweight frontend app handles UI concerns, including rendering, WebSocket management, and weather effects, while the Exchange agent focuses exclusively on routing logic and agent coordination. This separation enables independent scaling of each layer as demand grows. 

The Exchange agent functions as a central orchestrator, directing the workflow, delegating tasks, and planning. It follows the supervisor architecture. This orchestrator uses multi-agent planning to decompose tasks and assign them to specialized “worker” agents or protocols. The plan is often static or semi-static, reflecting a structured business process. 

There are two execution plans implemented: automatic routing and a user manual override. The prompt now reaches the critical decision point, where the system must determine which execution plan to use for this request.

Execution Paths

The first execution path is through automatic routing. The prompt arrives at the Exchange agent, implemented in exchange_server.py, which serves as the central orchestration component responsible for routing decisions. The Exchange agent must quickly and accurately determine which of three possible execution paths should handle this particular prompt. There are three execution paths a user prompt could take to generate a response: Shortcut, MCP, and A2A Domain Path (Fig. 2).

  • Shortcut Path: The shortcut path handles pattern-matched responses for simple greetings via regex matching against known greeting phrases, requiring zero LLM calls and completing in approximately 10 milliseconds.
  • MCP Path: The MCP Path manages direct MBTA API queries for current state lookups. This path can access real-time data, including alerts, predictions, vehicle positions, and routes, but cannot perform historical analysis, decision support, or leverage domain expertise beyond what the API provides.
  • A2A Domain Path: The A2A domain path, typically requiring six to eight seconds, coordinates multiple specialized agents with access to domain expertise. This path can perform historical analysis using the 41,970 incident dataset, provide decision support, and generate predictive recommendations for prompts that require capabilities beyond the current API data.

Figure 2: Different Execution Paths on the MBTA Transit Conversational Intelligence UI
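The shortcut path’s greeting match can be sketched as follows (the phrase list is an assumption; the source states only that regex matching against known greetings is used):

```python
import re

# Assumed greeting phrases; zero LLM calls are needed on this path.
GREETING_RE = re.compile(r"^\s*(hi|hello|hey|good (morning|afternoon|evening))\b",
                         re.IGNORECASE)

def shortcut_response(prompt: str):
    """Return a canned reply for greetings, or None to fall through to routing."""
    if GREETING_RE.match(prompt):
        return "Hello! Ask me anything about MBTA service."
    return None  # not a greeting: continue to MCP / A2A routing
```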

The Exchange agent also supports manual protocol override through user interface (UI) control buttons, allowing users to force specific routing paths for testing or preference (Fig. 2). When a user selects the MCP button in the UI, the system sets the force protocol parameter to MCP, bypassing intelligent routing and executing the prompt through MCP tools regardless of complexity. Similarly, selecting the A2A button forces agent-based coordination even for simple prompts that could be handled through MCP.

This manual override capability serves multiple purposes during development and operation. It enables comparative testing of both paths with identical prompts, validating that routing logic correctly identifies prompt characteristics. For the frontend user, it provides transparency and control, allowing them to understand how different execution paths handle their requests. The system internals panel displays the selected path and whether manual override was applied, maintaining full visibility into routing decisions. During system development, these manual controls proved invaluable for debugging routing logic and comparing response quality across execution paths. 

But how does the system decide which path to take? The answer lies in a two-stage detection process that combines heuristic pattern matching with LLM semantic understanding. Figure 2 shows how the same Fenway-to-Harvard prompt is answered through the different execution paths and protocols, including the directions, alerts, and stops each one returns.

Routing Decision Principles

At step one, before invoking an LLM, the system applies regex and substring matching to identify prompts that require domain expertise. The detection function examines the prompt text for several pattern categories, including decision-support phrases such as “should I” or “recommend,” predictive analytics phrases such as “how long will” or “when will,” and multi-agent coordination patterns such as location pairs matching “from X to Y.” The implementation uses keyword lists to detect each pattern category (Fig. 3):

import re
from typing import List

def needs_domain_expertise(query: str) -> tuple[bool, str, List[str]]:
    query_lower = query.lower()
    detected_patterns = []

    # Decision support patterns
    DECISION = ["should i", "recommend", "suggest", "better to"]
    if any(kw in query_lower for kw in DECISION):
        detected_patterns.append("decision_support")
        return True, "Query needs decision support", detected_patterns

    # Predictive analysis patterns
    PREDICTIVE = ["how long will", "when will", "worth waiting"]
    if any(kw in query_lower for kw in PREDICTIVE):
        detected_patterns.append("predictive")
        return True, "Query requires predictive analysis", detected_patterns

    # Multi-agent coordination patterns
    if re.search(r"from .+ to .+", query_lower):
        detected_patterns.append("routing")
        return True, "Query requires multi agent coordination", detected_patterns

    return False, "Simple fact lookup", detected_patterns

Figure 3: Keyword lists to detect each pattern category

For our example prompt, the detection identifies two patterns. The phrase “should I” matches decision support patterns, while the structure “from Fenway Park to Harvard” matches routing patterns. The function returns true along with the reasoning “query needs decision support” and the detected pattern list containing decision support and routing.

This heuristic serves as a safety mechanism for the subsequent LLM routing decision. If the LLM incorrectly selects MCP for a prompt exhibiting domain-expertise patterns, this detection overrides that decision and forces A2A routing, providing defense-in-depth for routing reliability.

At step two of the routing decision principles, the system performs a single LLM call using OpenAI’s GPT-4o-mini, consolidating multiple classification tasks that would otherwise require separate LLM invocations. One such task is intent extraction: the router analyzes the text and assigns a structured semantic intent of “alerts”, “trip_planning”, or “stops” (and implicitly “none/other” if the prompt is unrelated to the MBTA domain). The system prompt instructs the model to classify intent, then select the appropriate execution path based on whether the prompt requires analysis, prediction, historical data, or expertise.

The routing implementation constructs a comprehensive system prompt and calls GPT-4o-mini by OpenAI (Fig. 4):

system_prompt = """You are an intelligent MBTA query routing system.
STEP 1: CLASSIFY INTENT
- "alerts": Anything about MBTA service, delays, or disruptions
- "trip_planning": Route planning, directions, how to get somewhere
- "stops": Station/stop information

STEP 2: CHOOSE PATH
Is it MBTA-related?
  └─ YES → Does it need analysis/prediction/historical data?
    ├─ YES → path="a2a" (Domain experts needed)
    └─ NO → Can MCP tool provide the answer?
              ├─ YES → path="mcp" + select tool
              └─ NO → path="a2a"
PRINCIPLE:
- Current fact → MCP
- Analysis/Prediction/Historical/Expertise → A2A
"""
response = await openai_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f'Query: "{query}"'},
    ],
    temperature=0.3,
    max_tokens=300,
)

Figure 4: Comprehensive System prompt to LLM

GPT-4o-mini by OpenAI analyzes our example prompt and returns a JSON response classifying the intent as trip planning with 95 percent confidence, selecting the A2A path with complexity rated at 0.85 (Fig. 5):

{
  "intent": "trip_planning",
  "confidence": 0.95,
  "path": "a2a",
  "reasoning": "Query requires decision support using domain expertise. 'Should I wait' needs historical pattern analysis to predict delay duration.",
  "complexity": 0.85
}

Figure 5: Intent Classification & Reasoning

The reasoning explicitly states that the prompt requires decision support using domain expertise, noting that “should I wait” needs historical pattern analysis to predict delay duration, and this data exists in agent memory rather than in MCP tools. The model’s reasoning correctly identifies that answering this question requires the historical incident dataset, demonstrating that GPT-4o-mini has learned to recognize when prompts need capabilities beyond API access.

Routing Validation: Heuristic Override & Direct API Access through MCP

Let’s see how routing validation occurs through heuristic override. Step one showed that, before the LLM is invoked, the system detects prompts that require domain expertise; step two showed the LLM’s classification and reasoning (Fig. 6). The domain expertise detection from step one now validates the LLM’s decision from step two:

if needs_expertise and not decision.get("manual_override"):
    original_path = decision["path"]
    decision["path"] = "a2a"
    decision["reasoning"] = f"EXPERTISE REQUIRED: {expertise_reasoning}"

Figure 6: Domain Expertise Detection

If domain expertise is needed and no manual override has been applied, the system can override the original path selection and force A2A routing, with an updated rationale explaining the need for expertise.

For this prompt, both the heuristic and LLM agree on A2A routing, so no override occurs. This two-stage approach provides defense-in-depth, where the heuristic can catch cases where the LLM might incorrectly route domain-expertise prompts to MCP.

The routing decision is made. For our example prompt requiring decision support and multi-agent coordination, the system selected the A2A path. But before examining that complex orchestration, let us first understand the simpler alternative that handles most queries.

Let’s look at the MCP path. Not all prompts require multi-agent orchestration. Simple real-time data lookups bypass the StateGraph entirely, routing through MCP tools for sub-second responses. When a user asks “Red Line delays?” the Exchange agent’s unified LLM classifier using GPT-4o-mini by OpenAI returns a routing decision indicating the intent is alerts, the path is MCP, the selected tool is mbta_get_alerts, and the parameters specify route_id as Red Line.

The Exchange agent invokes the MCP client, which communicates with the mbta-mcp server via stdio transport. The MCP client spawns the server as a Python subprocess and establishes a session over stdin and stdout streams. This stdio approach differs from HTTP-based communication, providing lower latency and simpler process management for local tool execution.

The MCP server receives the tool call and queries the MBTA V3 API directly, making an HTTP GET request to the alerts endpoint with the API key and route filter parameters. The MBTA API returns current alert data in JSON format, which the MCP server packages and returns to the client through the stdio stream.
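For illustration, the tools/call request the client writes to the server’s stdin follows JSON-RPC 2.0 framing per the MCP specification; a minimal sketch (the argument value “Red” is the MBTA V3 API’s identifier for the Red Line, assumed from context):

```python
import json

# MCP tool invocation as a newline-delimited JSON-RPC 2.0 request.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "mbta_get_alerts",          # tool exposed by the mbta-mcp server
        "arguments": {"route_id": "Red"},   # assumed parameter name
    },
}
frame = json.dumps(request) + "\n"  # written to the subprocess's stdin
```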

The Exchange agent receives the structured API response and uses GPT-4o-mini by OpenAI to synthesize it into natural language. The synthesis prompt instructs the model to convert technical API data into a conversational answer, maintaining accuracy while improving readability. The final response states, “Red Line is experiencing delays due to a signal problem at Park Street.”

Total latency is approximately 1000 milliseconds, compared to 6 to 8 seconds for the A2A path. The MCP path trades domain expertise for speed, handling prompts where current API data is sufficient without requiring historical analysis or multi-agent coordination.

The MCP path handles current alerts and predictions, vehicle locations in real time, stop information and nearby stations, route schedules and service patterns, and fare information and trip planning for direct routes. These queries require only API access without domain expertise. The MCP path cannot handle decision support questions such as “Should I wait for delays to clear?”, historical predictions such as “How long will delays typically last?”, or multi-agent coordination such as “Route from X to Y considering current delays and crowding.” These prompts require access to historical datasets, predictive analysis capabilities, or coordination between multiple specialized agents, and therefore automatically route to the A2A path for domain expertise.

Our example prompt, asking for directions from Fenway to Harvard, cannot use this fast path. It requires historical prediction (“should I wait?”) and multi-agent coordination (location resolution plus route planning with disruption constraints). The prompt now enters the StateGraph orchestration layer, where specialized agents will collaborate to build an intelligent answer.

Having determined A2A routing, the Exchange agent transfers control to the StateGraph orchestrator, a LangGraph-based workflow coordination engine implemented in stategraph_orchestrator.py. This component manages the discovery, ordering, execution, and synthesis of specialized agent interactions.

The StateGraph Execution Model

StateGraph implements a directed acyclic graph (DAG) where nodes represent state transformation functions, edges define transitions that can be conditional or deterministic, state is an immutable dictionary propagated through the graph, and execution proceeds from an entry point through conditional branches to an end node. Our graph topology consists of four nodes arranged sequentially: 

  • The discovery node queries the registry for semantic agent filtering, receives the top N ranked candidates based on relevance scores, extracts the origin and destination via regex, and outputs the matched agents, intent classification, and parsed locations. 
  • The routing node determines the optimal execution order for agents, sets a routing decision flag, and outputs the ordered agent list. 
  • The execute node calls agents sequentially via SLIM or HTTP protocols, extracts domain analysis from their responses, and passes context to downstream agents, outputting the agents’ responses and the extracted domain analysis. 
  • The synthesis node combines responses with minimal transformation, preserving agent-specific details, to produce the final response for the user.

This four-node structure enables context-aware agent coordination where each agent contributes information consumed by subsequent agents, building toward a comprehensive answer rather than operating in isolation. We now know which agents to call. But order matters because calling them in the wrong sequence would produce inferior results. This is where execution orchestration becomes critical.
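A minimal plain-Python analogue of this four-node pipeline illustrates the state-threading pattern (this is not the LangGraph API; node bodies are stubbed assumptions):

```python
# Each node is a function from state dict to state dict, run in DAG order.
def discovery_node(state):
    state["matched_agents"] = ["mbta-stopfinder", "mbta-alerts", "mbta-planner"]
    return state

def routing_node(state):
    state["routing_decision"] = "FULL_CHAIN"  # keep the dependency order
    return state

def execute_node(state):
    # Stub: a real implementation calls each agent via SLIM or HTTP.
    state["responses"] = [f"{agent}: ok" for agent in state["matched_agents"]]
    return state

def synthesis_node(state):
    state["final_response"] = " | ".join(state["responses"])
    return state

def run_graph(state):
    for node in (discovery_node, routing_node, execute_node, synthesis_node):
        state = node(state)
    return state
```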

The first step: finding the right agents for this prompt. But with dozens of registered agents in production, how does the system efficiently identify which ones are relevant?

Node 1: Discovery

The discovery node receives an initial state containing the user message and conversation identifier. Unlike traditional approaches that retrieve all agents and filter client-side, the node delegates semantic filtering to the registry through a specialized search endpoint.

The node’s first action constructs a semantic search query based on the user’s intent. For trip-planning prompts, it generates keywords such as “route planning, trip directions, navigation path finding.” For alert prompts, it generates “transit service alerts, delays, disruptions, status monitoring.” This query is sent as an HTTP POST request to the registry’s /search/semantic endpoint, along with parameters specifying the maximum number of results and whether to filter for only alive agents.

The registry performs keyword extraction and relevance scoring entirely on the registry server side, through semantic matching, eliminating the need to transmit full agent catalogs to the orchestrator. This architecture scales efficiently from three agents to thousands.

The registry’s semantic search algorithm extracts meaningful keywords from the query by removing stopwords (common words like “the,” “a,” “is”) and retaining only substantive terms longer than two characters. For the query “route planning trip directions navigation,” the extracted keywords are [route, planning, trip, directions, navigation]. The registry then calculates a relevance score for each registered agent using a weighted scoring system (Fig. 7):

def calculate_relevance_score(query, agent_id, description, capabilities, tags):
    score = 0.0
    query_keywords = extract_keywords(query)

    # Agent ID matching (weight: 2.0)
    for keyword in query_keywords:
        if keyword in normalize_text(agent_id):
            score += 2.0

    # Description matching (weight: 1.5)
    for keyword in query_keywords:
        if keyword in normalize_text(description):
            score += 1.5

    # Capabilities matching (weight: 3.0 highest priority)
    for capability in capabilities:
        for keyword in query_keywords:
            if keyword in normalize_text(capability):
                score += 3.0

    # Tags matching (weight: 1.0)
    for tag in tags:
        for keyword in query_keywords:
            if keyword in normalize_text(tag):
                score += 1.0

    return score

Figure 7: Weighted Scoring System

Capabilities receive the highest weight (3.0) because they directly represent what an agent can do. Agent identifiers receive moderate weight (2.0) because they often include descriptive keywords such as “planner” or “alerts.” Descriptions receive lower weight (1.5) as they may contain generic transit terminology. Tags receive the lowest weight (1.0) as they provide supplementary categorization.
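The extract_keywords and normalize_text helpers referenced in Figure 7 are not shown in the source; a plausible sketch consistent with the stated rules (stopword removal, terms longer than two characters) would be:

```python
# Assumed helper implementations; the real stopword list may differ.
STOPWORDS = {"the", "a", "an", "is", "to", "for", "and", "of", "on", "in"}

def normalize_text(text: str) -> str:
    return text.lower().strip()

def extract_keywords(query: str) -> list:
    return [
        word for word in normalize_text(query).split()
        if word not in STOPWORDS and len(word) > 2
    ]
```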

For MBTA Transit Conversational Intelligence with three agents, a query containing “route planning trip directions” would score the RoutePlanner agent at 7.5 points (matching “route” in description for 1.5 points, “planning” in capabilities for 3.0 points, “trip” in capabilities for 3.0 points), the Alerts Agent at 0.0 points (no keyword matches), and the StopFinder agent at 0.0 points (no keyword matches).

The registry sorts agents by relevance score in descending order and returns only the top N results. For typical prompts, N equals five, but the orchestrator can adjust this parameter. Each returned agent includes its relevance score and match reason, showing which keywords matched which fields, such as “desc:route,cap:planning” indicating matches in description and capabilities fields.

This registry-side filtering architecture provides linear scaling as the agent catalog grows (Fig. 8):

| Agent Count | Without Filtering | With Agent Filtering |
| --- | --- | --- |
| 3 agents | Fetch 3, process 3 | Filter 3, return 3 |
| 100 agents | Fetch 100, process 100 | Filter 100, return 5 |
| 1000 agents | Fetch 1000, process 1000 | Filter 1000, return 5 |

Figure 8: Registry-Side Filtering Architecture

For a catalog of 1000 agents, the orchestrator receives only five highly relevant candidates rather than the full catalog. This reduces token consumption in subsequent LLM calls by approximately 100x and eliminates network transfer of 995 irrelevant agent descriptions (Fig. 9).

The HTTP response from the registry’s /search/semantic endpoint contains:

{
  "query": "route planning trip directions navigation path finding",
  "total_candidates": 3,
  "filtered_count": 3,
  "returned_count": 1,
  "results": [
    {
      "agent_id": "mbta-planner",
      "agent_url": "http://96.126.111.107:8002",
      "relevance_score": 7.5,
      "match_reason": "desc:route,cap:planning,cap:trip",
      "alive": true,
      "capabilities": ["trip-planning", "routing", "directions", "navigation"],
      "description": "Plans optimal routes and trips on Boston MBTA transit network"
    }
  ]
}

Figure 9: Registry’s Semantic Endpoint Filter

The orchestrator parses this response to extract agent configurations, including network endpoints, capabilities, and relevance scores. The match reason provides explainability, showing exactly why each agent was selected.

Concurrent with semantic matching, the system extracts origin and destination using regular expressions. The regex pattern matches structures like “I am at X and get to Y” in the lowercased prompt text, extracting location names with high accuracy in under one millisecond for these grammatical patterns. The discovery node completes by outputting an updated state containing the matched agents (in this case, mbta-planner with a relevance score of 7.5), the trip-planning intent with 85% confidence, parsed location texts, and the semantic scores dictionary for observability.
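The exact pattern is not shown in the source; one regex that covers the example phrasing (an assumption, tuned to “I’m at X and … to Y” sentences) is:

```python
import re

# Hypothetical extraction pattern over the lowercased prompt text.
text = "i'm at fenway park and need to get to harvard. should i wait?"
match = re.search(r"at (.+?) and .*\bto ([a-z ]+?)[.?!]", text)
origin, destination = match.group(1), match.group(2)
# origin -> "fenway park", destination -> "harvard"
```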

Node 2: Routing

The routing node addresses a critical question about whether agents should execute in parallel or sequentially. Parallel execution would launch all agents simultaneously, gather results, and synthesize them afterward. This approach minimizes the total time to the duration of the slowest agent but prevents context passing between agents.

Sequential execution with context passing, which is our implementation, first calls StopFinder to get resolved locations, then calls Alerts Agent to get disruption analysis, and finally calls RoutePlanner agent with both resolved locations and the alerts context. This approach enables agents to collaborate with shared context but requires total time equal to the sum of all agent execution times.

We chose sequential execution because our agents have data dependencies. The RoutePlanner agent needs resolved station names from the StopFinder agent to ensure accurate routing, and critically, the RoutePlanner agent needs disruption constraints from the Alerts agent to avoid suggesting routes on lines with major delays. The Alerts agent’s analysis also benefits from knowing the user’s intended route, allowing it to prioritize relevant disruptions over unrelated service notices.
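The trade-off can be made concrete with illustrative per-agent latencies (assumed for the sketch, not measured values from the system):

```python
# Sequential total is the sum of agent latencies; parallel would be the max.
latencies_ms = {"stopfinder": 200, "alerts": 2800, "planner": 3000}

sequential_ms = sum(latencies_ms.values())  # the cost paid for context passing
parallel_ms = max(latencies_ms.values())    # lower bound if agents ran independently
```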

The routing node enforces dependency order by reordering the matched agents list (Fig. 10):

def routing_node(state: AgentState) -> AgentState:
    matched = state["matched_agents"]
    ordered = []

    # Add stopfinder first (location resolution)
    for a in matched:
        if "stopfinder" in a:
            ordered.append(a)
            break

    # Add alerts second (disruption analysis)
    if not any("alerts" in a for a in ordered):
        ordered.append("mbta-alerts")

    # Add planner last (routing with context)
    for a in matched:
        if "planner" in a:
            ordered.append(a)
            break

    state["matched_agents"] = ordered
    state["routing_decision"] = "FULL_CHAIN"
    return state

Figure 10: Routing Node enforces Dependency Order

The implementation ensures the StopFinder agent appears first for location resolution, adds the Alerts agent second for disruption analysis, and places the RoutePlanner agent last for routing with full context. For our prompt, the agents were already in the correct order from semantic matching, but this enforcement prevents errors if the discovery process returns them in a different sequence.

The routing node sets a routing decision flag to indicate full chain execution, meaning all three agents will execute sequentially with context passing enabled. With the execution order established, agents now invoke sequentially, each building upon the work of its predecessors.

Node 3: Execute

In Node 3, the workflow consists of three primary parts. Let’s examine each one systematically.

Part 1: Intelligent Location Resolution
Before invoking the StopFinder agent, the system applies a station-name recognition heuristic to determine whether location resolution is necessary. The function checks whether the text contains known station indicators, such as “station,” “square,” or specific station names like “Harvard,” “Central,” or “Park Street.”

For our prompt, checking “Fenway Park” returns false because it contains no station indicators and requires a lookup, while checking “Harvard” returns true because it contains a recognized station name and can be used directly. This optimization eliminates one of two StopFinder invocations, saving approximately 200 milliseconds of execution time.
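Sketching that heuristic (the indicator and station lists below are assumptions based on the examples given above):

```python
# Assumed lists; the real system likely covers many more stations.
KNOWN_STATIONS = {"harvard", "central", "park street", "kenmore"}
STATION_INDICATORS = ("station", "square")

def looks_like_station(location: str) -> bool:
    """True if the text can be used directly, skipping a StopFinder lookup."""
    text = location.lower()
    return text in KNOWN_STATIONS or any(ind in text for ind in STATION_INDICATORS)
```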

The system constructs a query for StopFinder using the format “Find station: Fenway Park” and invokes the agent via SLIM transport. Inside the StopFinder agent, a landmark database lookup occurs using an in-memory dictionary mapping common Boston landmarks to their nearest MBTA stations (Fig. 11):

LANDMARK_TO_STATION = {
    "fenway park": "Kenmore",
    "td garden": "North Station",
    "mit": "Kendall/MIT",
    "northeastern": "Northeastern University",
    # ... 50+ Boston landmarks
}
station = LANDMARK_TO_STATION.get("fenway park")  # Returns: "Kenmore"

Figure 11: In-memory Dictionary Mapping common Boston Landmarks

The database maps Fenway Park to Kenmore Station, along with dozens of other landmarks, including TD Garden to North Station, MIT to Kendall/MIT, and Northeastern to Northeastern University Station.

The mapping is MBTA-specific, associating landmarks with actual transit stations rather than just geometric proximity. The StopFinder Agent returns the result indicating that the Kenmore station was found, and the state updates to track that mbta-stopfinder has been called, the resolved origin is Kenmore, and the resolved destination is Harvard, which was used directly without requiring a lookup. Now comes the critical question: Should the user wait for delays to clear, or take an alternative route? This is where domain expertise makes the difference between information and intelligence.

Part 2: Where Historical Data Becomes Decision Support
The Alerts agent demonstrates the core value proposition of A2A agents: access to proprietary domain knowledge beyond public APIs. The execute node constructs a query for the Alerts agent, and here we make an important architectural decision regarding query construction.

        Query Construction: Full Context vs Decomposition
        The system passes the complete original user prompt to the Alerts agent rather than a decomposed subtask. The Alerts agent receives “I’m at Fenway Park and need to get to Harvard. Should I wait for the Red Line delays to clear or take an alternative?” in its entirety, not merely “Are there Red Line delays?”

        This full context approach enables agents to receive complete user intent, prioritize relevant information, such as focusing on the Red Line for this specific route, understand decision support needs where “should I wait” triggers prediction mode, and provide contextually appropriate responses.

        An alternative decomposed subquery approach would extract only “Are there Red Line delays?” and send just that portion to the agent. The decomposition approach creates several problems. First, intent is lost when “Should I wait?” becomes simply “Check delays,” thereby eliminating the decision-support requirement. Second, prioritization suffers because the Alerts agent doesn’t know how to focus on routes relevant to the user’s journey. Third, orchestration becomes more complex as the system must track which decomposed tasks belong to which original prompt. Fourth, synthesis becomes more difficult as it is harder to reintegrate decomposed results into a coherent answer.

        To illustrate this difference concretely, consider a prompt stating, “I’m going to a Red Sox game at Fenway, will Red Line delays affect me?” With full context, the Alerts agent sees “going to Fenway” and knows to check delays specifically affecting Fenway-accessible routes, such as the Green Line to Kenmore with a possible Red Line transfer. With a decomposed query of merely “Are there Red Line delays?”, the agent doesn’t understand why the user cares about the Red Line specifically and provides generic Red Line status without route-specific context.

        Alerts Agent: Historical Pattern Analysis
        The Alerts agent detects decision-support keywords (“should I wait”) and fetches the current alerts from the MBTA API. The API returns a Red Line signal problem at Park Street, categorized as a technical problem, severity 5, created at 08:30.

        The agent filters out non-transit alerts (elevator outages, maintenance) and proceeds to historical analysis using its dataset of 41,970 MBTA incidents from 2020-2023. The in-memory dictionary contains statistical distributions by incident cause (Fig. 12):

        HISTORICAL_PATTERNS = {
            "TECHNICAL_PROBLEM": {
                "median": 41,  # minutes
                "min": 25, "max": 73,
                "sample_size": 23104
            },
            "MEDICAL_EMERGENCY": {
                "median": 33,
                "min": 23, "max": 63,
                "sample_size": 1953
            }
        }

        # Lookup and calculate
        pattern = HISTORICAL_PATTERNS["TECHNICAL_PROBLEM"]
        elapsed = 15  # minutes since 08:30
        remaining = pattern["median"] - elapsed  # 41 - 15 = 26 minutes

        Figure 12: In-memory Dictionary of Statistical Distributions by Incident Cause

        The dataset contains 23,104 technical problems (41 min median), 1,953 medical emergencies (33 min median), and 149 weather incidents (268 min median). Medians are used rather than means because outliers skew averages upward.

        The agent calculates: 15 minutes elapsed, 26 minutes remaining (37% through typical duration). Recommendation: take the alternative route, as the 26-minute wait exceeds the 20-25 minutes it takes to travel from Kenmore to Harvard.
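        The decision arithmetic described above can be sketched as follows (the variable names and the 25-minute alternative travel time are illustrative, taken from the example in the text):

```python
# Wait-vs-reroute comparison, using the example's figures.
MEDIAN_DURATION = 41   # minutes, median for technical problems (n=23,104)
elapsed = 15           # minutes since the 08:30 alert
alt_route_time = 25    # minutes, Kenmore to Harvard via an alternative

remaining = MEDIAN_DURATION - elapsed    # 26 minutes of expected delay left
progress = elapsed / MEDIAN_DURATION     # ~0.37, i.e. 37% through typical duration

# Reroute when the expected remaining wait exceeds the alternative's travel time.
recommendation = "take_alternative" if remaining > alt_route_time else "wait"
```

        Because the 26 remaining minutes exceed the 25-minute alternative, the comparison tips toward rerouting.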

        Domain Analysis Extraction
        The StateGraph extracts structured analysis rather than passing raw text (Fig. 13):

        import re

        def extract_alerts_domain_analysis(alerts_response: str):
            analysis = {"overall_recommendation": "unknown", "severity": "unknown",
                        "affected_routes": [], "should_avoid_routes": [],
                        "delay_impact": "unknown"}

            # Extract impact range (25-73 minutes)
            impact_match = re.search(r"(\d+)-(\d+)\s*minutes?", alerts_response)
            if impact_match:
                impact_max = int(impact_match.group(2))
                analysis["severity"] = "major" if impact_max > 20 else "minor"
                analysis["delay_impact"] = f"{impact_match.group(1)}-{impact_max} min"

            # Extract affected routes (Red)
            for line in ["Red Line", "Orange Line", "Blue Line", "Green Line"]:
                if line in alerts_response:
                    analysis["affected_routes"].append(line.split()[0])

            # Extract recommendation
            if "take alternative" in alerts_response.lower():
                analysis["overall_recommendation"] = "take_alternative"
                analysis["should_avoid_routes"] = analysis["affected_routes"]

            return analysis

        Result: {"overall_recommendation": "take_alternative", "severity": "major", "should_avoid_routes": ["Red"], "delay_impact": "25-73 min"}

        Figure 13: Domain Analysis Extraction

        This structured format enables programmatic consumption by downstream agents without LLM parsing overhead. The RoutePlanner agent receives explicit constraints (e.g., avoid Red, major severity) for direct use in route-generation logic.

        The Alerts agent provides real-time information to the user: “Avoid Red Line, major severity, 26 minutes expected.” The RoutePlanner agent must now translate this into actionable alternatives.

        Part 3: Context-Aware Route Planning
        The RoutePlanner agent receives not just the user prompt but also the enriched context from both preceding agents. The execute node constructs the planner query by combining resolved stations from StopFinder with domain analysis from the Alerts agent (Fig. 14):

        import re
        from typing import List

        def needs_domain_expertise(query: str) -> tuple[bool, str, List[str]]:
            query_lower = query.lower()
            detected_patterns = []

            # Decision support patterns
            DECISION = ["should i", "recommend", "suggest", "better to"]
            if any(kw in query_lower for kw in DECISION):
                detected_patterns.append("decision_support")
                return True, "Query needs decision support", detected_patterns

            # Predictive analysis patterns
            PREDICTIVE = ["how long will", "when will", "worth waiting"]
            if any(kw in query_lower for kw in PREDICTIVE):
                detected_patterns.append("predictive")
                return True, "Query requires predictive analysis", detected_patterns

            # Multi-agent coordination patterns
            if re.search(r"from .+ to .+", query_lower):
                detected_patterns.append("routing")
                return True, "Query requires multi agent coordination", detected_patterns

            return False, "Simple fact lookup", detected_patterns

        Figure 14: Context-Aware Route Planning

        The constructed query uses a structured format beginning with “IMPORTANT: Plan route using these EXACT station names,” followed by origin Kenmore and destination Harvard. It then includes an alerts analysis context section stating the overall recommendation as take alternative, severity as major, and routes to avoid as Red, with a note that major disruptions have been detected and alternative routes should be prioritized.
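        A hedged reconstruction of this query assembly (the dictionary shapes and exact wording are assumptions based on the description, not the actual code):

```python
# Illustrative assembly of the RoutePlanner query from upstream agent state.
resolved = {"origin": "Kenmore", "destination": "Harvard"}   # from StopFinder
alerts = {
    "overall_recommendation": "take_alternative",            # from Alerts agent
    "severity": "major",
    "should_avoid_routes": ["Red"],
}

planner_query = (
    "IMPORTANT: Plan route using these EXACT station names.\n"
    f"Origin: {resolved['origin']}\n"
    f"Destination: {resolved['destination']}\n"
    "ALERTS ANALYSIS CONTEXT:\n"
    f"- Overall recommendation: {alerts['overall_recommendation']}\n"
    f"- Severity: {alerts['severity']}\n"
    f"- Avoid routes: {', '.join(alerts['should_avoid_routes'])}\n"
    "NOTE: Major disruptions detected; prioritize alternative routes."
)
```

        The point is that the planner receives pre-resolved stations and explicit constraints rather than raw landmark text and unparsed alert prose.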

        This query demonstrates three forms of context transfer that improve routing quality. Location context from StopFinder provides the resolved station name, Kenmore, instead of the landmark name, Fenway Park, preventing the RoutePlanner agent from attempting landmark resolution itself and ensuring consistency in station resolution across agents. Disruption context from Alerts agent informs the planner to avoid the Red Line due to detected delays, enabling intelligent route filtering that automatically excludes disrupted lines. Severity context from Alerts agent helps the planner understand this is a major rather than minor disruption, affecting route ranking to prioritize reliability over pure speed considerations.

        Without context passing, the RoutePlanner agent would face several problems. It would attempt to resolve “Fenway Park” itself, creating redundant work and potential inconsistency if a different resolution occurs. It would generate routes without disruption awareness and might suggest Red Line options that are currently experiencing delays. It would rank routes solely by speed, without accounting for reliability factors introduced by current service disruptions.

        RoutePlanner Agent: API-First Routing with LLM Enhancement

        The RoutePlanner agent combines deterministic API queries with generative LLM routing to prevent hallucination while enabling creative alternatives. The agent first resolves station identifiers from names because the MBTA API requires stop identifiers rather than human-readable names. Finding Kenmore returns the identifier place-kencl, while finding Harvard returns place-harsq.

        The planner then checks for direct routes via API before attempting any generative routing. This API-first principle queries routes serving the origin station, receiving Green-B, Green-C, and Green-D as the routes that stop at Kenmore, then queries routes serving the destination station, receiving only Red as the route serving Harvard. Computing the intersection of these route sets yields an empty set, confirming no direct route exists between Kenmore and Harvard.

        Before asking a large language model to generate routes, the system verifies connectivity through authoritative API data. This prevents the LLM from confidently suggesting impossible connections that don’t exist in the actual transit network. In early system versions without this API-first approach, GPT-4o-mini would sometimes hallucinate direct routes such as “Take the Green Line from Kenmore to Harvard,” which is impossible because these stations are on different lines with no direct connection.
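        The direct-route check described above reduces to a set intersection; a minimal sketch using the route lists from the example:

```python
# API-first connectivity check: intersect the routes serving each station.
origin_routes = {"Green-B", "Green-C", "Green-D"}   # routes serving Kenmore
destination_routes = {"Red"}                         # routes serving Harvard

direct_routes = origin_routes & destination_routes   # empty: no shared line
has_direct = bool(direct_routes)                     # False: transfers required
```

        Only when this check comes back empty does the system fall through to LLM-generated transfer routes, constrained by the alerts analysis.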

        Since no direct route exists and the alerts context indicates the Red Line should be avoided, the RoutePlanner agent uses GPT-4o-mini by OpenAI to generate alternative routes. The prompt instructs the model to generate two different route options that avoid the Red line due to service disruptions and provides details about the MBTA system, including line endpoints and major transfer stations. The prompt asks for different routes with transfer details and time estimates, and to format each option with the number of MBTA lines used, transfers, and timing information.

        The system uses a temperature of 0.7 for route generation, which is higher than the 0.3 used for routing classification. Route generation benefits from some creativity to explore different options. Temperature values too low, such as 0.1, cause the model to always suggest the same route, while values too high, like 0.9, risk suggesting nonsensical line combinations that don’t match the network topology. The temperature of 0.7 balances consistency with creative exploration of the solution space.
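        A minimal sketch of the generation prompt described above (the helper name and exact wording are assumptions; the real prompt also includes line endpoints and transfer stations):

```python
# Hypothetical prompt builder for the GPT-4o-mini route-generation call.
def build_route_prompt(origin: str, destination: str, avoid: list[str]) -> str:
    avoided = ", ".join(avoid)
    return (
        f"Generate two different MBTA route options from {origin} to "
        f"{destination} that avoid the {avoided} Line due to service "
        "disruptions. For each option, list the MBTA lines used, transfer "
        "stations, and estimated travel time."
    )

# Sent to GPT-4o-mini at temperature 0.7 (vs 0.3 for routing classification).
prompt = build_route_prompt("Kenmore", "Harvard", ["Red"])
```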

        GPT-4o-mini generates two route options. Option one uses Green and Orange lines with a transfer at North Station, taking approximately 25 minutes from Kenmore via Green-D to North Station, then Orange to Harvard. Option two uses Green Line and Bus 1 with a transfer at Hynes Convention Center, taking approximately 28 minutes from Kenmore via Green-C to Hynes, then Bus 1 to Harvard Square. The LLM recommends option one as faster, while avoiding Red entirely and providing reliable transit connections.

        The planner prepends a context explanation stating that routing is based on current conditions, noting major disruptions on Red with a predicted wait time of 25-73 minutes, with 26 minutes most likely. The final response combines this context with the generated route options and recommendation, providing users with both the constraint explanation and the actionable alternatives.

        Node 4: Synthesis

        The final node combines agent responses into a user-facing answer. Early implementations employed aggressive LLM synthesis, leading to quality issues. When the synthesis LLM received full responses from StopFinder stating “Found: Kenmore,” from Alerts agent providing detailed historical context and recommendations, and from RoutePlanner agent providing comprehensive route options with details, it would generate overly condensed output.

        The aggressive synthesis would produce text like “There are Red Line delays. Alternative routes are available from Fenway to Harvard,” which achieved brevity but lost critical information. The predicted 26-minute duration disappeared; the resolved station name Kenmore was replaced by the original Fenway; specific route options with transfer points vanished; and the recommendation rationale explaining why the alternative is better was eliminated.

        The LLM’s summarization removed the actionable details that make responses useful, prioritizing conciseness over informativeness. Our synthesis architecture implements three preservation rules (Fig. 15):

        async def synthesize_node(state: AgentState) -> AgentState:
            agents_called = state["agents_called"]
            responses = [r.get("response") for r in state["agent_responses"]]
            routing = state["routing_decision"]

            # RULE 1: Single agent = direct return (zero synthesis)
            if len(agents_called) == 1:
                return {**state, "final_response": responses[0], "should_end": True}

            # RULE 2: Two agents = conditional concatenation
            if len(agents_called) == 2:
                alerts_response, planner_response = responses
                if "no" in alerts_response.lower()[:100]:
                    return {**state, "final_response": planner_response}

                # Minimal synthesis for clean output
                prompt = f"""Combine alerts and route concisely (3-4 sentences):

                Alerts: {alerts_response[:250]}
                Route: {planner_response[:400]}

                State recommendation clearly, provide route details.
                Don't lose: predicted duration, route specifics, decision reasoning."""

                synthesis = await openai_client.chat.completions.create(
                    model="gpt-4o-mini",
                    temperature=0.2,
                    max_tokens=300,
                    messages=[{"role": "user", "content": prompt}]
                )

                return {**state, "final_response": synthesis.choices[0].message.content}

            # RULE 3: Full chain (3+ agents) = minimal LLM synthesis
            if routing == "FULL_CHAIN":
                # Similar synthesis logic for 3+ agents
                pass

        Figure 15: Three Preservation Rules in Synthesis Architecture 

        For multi-agent responses, the system uses GPT-4o-mini with a temperature of 0.2 to preserve fidelity, truncating inputs to focus on key points while reducing token costs. The synthesized output states that the Red Line signal delays will likely last approximately 26 more minutes (against a 41-minute median drawn from 23,104 past incidents) and recommends taking an alternative route rather than waiting. It then lists the alternatives from Kenmore to Harvard: option one via the Green and Orange lines through North Station in approximately 25 minutes, and option two via the Green Line and Bus 1 through Hynes in approximately 28 minutes, recommending option one as faster than waiting and more reliable than the bus option.

        Quality validation confirms that the synthesis preserves the decision answer, telling the user to take an alternative, provides quantitative justification explaining 25 minutes is less than the 26 minutes wait time, maintains historical context referencing 41 minutes median from 23,104 incidents, keeps route details specifying lines, transfers, and timing, and achieves concise integration connecting all elements in four sentences.

        The Architectural Foundation: Where Does Domain Knowledge Live?

        Our hybrid MCP and A2A architecture makes a fundamental distinction that drives all routing decisions: the data access architecture. The routing logic does not optimize for speed or simplicity; rather, it reflects where domain knowledge physically resides in the system.

        MCP server capabilities provide tools that query the MBTA V3 API, offering access to current alerts, real-time vehicle positions, arrival predictions, and route schedules. However, these tools cannot access historical incident patterns, statistical distributions, or proprietary datasets compiled and processed over time.

        A2A agent capabilities include direct access to the MBTA V3 API and a curated dataset of 41,970 historical incidents from 2020 through 2023, along with preprocessed statistical distributions and domain-specific analysis logic. These agents can provide current status information, historical predictions, and decision-support recommendations.

        When a user asks whether they should wait for delays to clear, answering requires predicting delay duration from historical patterns. In our current implementation, this capability exists in the Alerts agent’s in-memory dataset, which represents a curated compilation of 41,970 MBTA incidents from 2020 through 2023, preprocessed into statistical distributions for rapid analysis. While this historical data originates from public MBTA sources, it has been processed, categorized, and optimized for predictive queries in ways that the standard MBTA API does not currently provide. The routing decision reflects this data availability: prompts requiring historical pattern analysis use A2A to access the agents’ enhanced datasets, while prompts that need only current status can use MCP’s direct API.

        This distinction is based on current data availability rather than fundamental protocol limitations. As data sources evolve, whether through new MCP tools accessing processed historical datasets or enhanced API endpoints providing statistical summaries, routing logic can adapt accordingly. Understanding this data-driven routing approach is essential to understanding the system architecture.

        The Real Distinction: Data Access Architecture

        MCP server capabilities include tools that expose MBTA API functionality. These tools provide mbta_get_alerts for current alerts, mbta_get_predictions for arrival predictions, mbta_search_stops for stop data, and mbta_plan_trip for trip planning. All of these draw from the MBTA V3 API as their sole data source.

        Alerts agent capabilities include both an MBTA API client for real-time data and the historical patterns dataset containing 41,970 incidents from 2020 through 2023. This dual data-source architecture enables the agent to provide current status along with historical prediction and decision-support capabilities.

        The historical dataset exists in the Alerts agent’s Python code as an in-memory dictionary. Routing decisions reflect these data access patterns: prompts about current MBTA state can be answered by MCP because the API provides that information directly, while prompts requiring historical predictions route to A2A agents because they have access to the processed historical dataset.

        This data organization is a design choice rather than a fundamental limitation of either protocol. Future architectures could provide MCP tools with database access to historical datasets or create enhanced API endpoints that serve statistical summaries directly. As data sources evolve, routing logic can adapt to reflect new capability availability across both execution paths.

        Key Architectural Principles

        The MBTA Transit Conversational Application utilizes four architectural principles:

        1. The first principle is that data architecture drives routing decisions. Routing reflects where domain knowledge resides rather than prompt complexity or desired latency. Prompts answerable from the MBTA API alone are eligible for MCP routing. Prompts requiring the historical dataset must use A2A. Prompts requiring multi-agent context passing must use A2A because of coordination requirements.
        2. The second principle is full context over decomposition. Each agent receives a complete user context to enable intent understanding, including why the user is asking, prioritized information to emphasize what matters most for this specific prompt, and an appropriate response format that distinguishes between decision-support requests and informational queries.
        3. The third principle is sequential execution when dependencies exist, but parallel when agents are independent. Agent execution order follows data dependencies rather than performance optimization. The StopFinder agent provides input to the RoutePlanner agent and therefore executes before it. Alerts agent provides constraints to the RoutePlanner agent and, therefore, executes before the RoutePlanner agent. StopFinder and Alerts agents are independent of each other and could be parallelized in future optimizations without affecting answer quality.
        4. The fourth principle is structured context extraction over natural language parsing. Domain analysis, extracted as structured data with fields such as avoid routes and severity levels, enables deterministic consumption by downstream agents, eliminates ambiguity in constraint interpretation, and allows programmatic integration without brittle text parsing that could fail due to variations in natural language expression.
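        The parallelization noted in the third principle can be sketched with asyncio (the agent functions here are placeholders standing in for the SLIM transport calls, and the returned strings are illustrative):

```python
import asyncio

async def call_stopfinder(query: str) -> str:
    return "Found: Kenmore"                    # placeholder for the SLIM call

async def call_alerts(query: str) -> str:
    return "Avoid Red Line, major severity"    # placeholder for the SLIM call

async def resolve_independent(query: str) -> list[str]:
    # StopFinder and Alerts share no data dependency, so they can run
    # concurrently; RoutePlanner then consumes both results sequentially.
    return await asyncio.gather(call_stopfinder(query), call_alerts(query))

stops, alerts = asyncio.run(resolve_independent("Fenway Park to Harvard"))
```

        The dependency structure, not raw latency, determines what can run in parallel: RoutePlanner must still wait for both results.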

        This architectural examination revealed that effective multi-agent coordination depends on several interconnected design decisions. Routing must be based on data access patterns, asking where required knowledge lives rather than optimizing for speed alone. Full context should be preserved so agents understand user intent rather than processing decomposed tasks in isolation. Sequential execution should be used when agents have dependencies, accepting latency costs for context-aware collaboration that produces better answers. Structured analysis should be extracted from natural language to enable programmatic consumption by downstream agents without brittle parsing. Finally, synthesis should be kept to a minimum to preserve detail, rather than prioritizing brevity over useful information.

        The result is a system where domain expertise embedded in specialized agents enables intelligent decision support, allowing the system to answer “should I wait” with “take an alternative because a 25-minute wait is less than a 26-minute wait” rather than merely reporting “delays exist.” This goes beyond what current state API queries can provide, demonstrating the value of agent-based architectures with access to processed historical knowledge.

        References

        1. MBTA Transit Conversational Intelligence application
          https://github.com/DataWorksAI-com/MBTA-Transit-Conversational-Intelligence-application
        2. Google Developer Blog. “Announcing the agent2agent protocol (A2A).” https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/  (2025).
        3. Anthropic. “Model Context Protocol.” https://modelcontextprotocol.io/introduction  (2024).
        4. Muscariello, L., Pandey, V., and Polic, R. “The AGNTCY agent directory service: Architecture and implementation.” arXiv preprint arXiv:2509.18787 (2025).
        5. Raskar, R., et al. “Beyond DNS: Unlocking the internet of AI agents via the NANDA index and verified AgentFacts.” arXiv preprint arXiv:2507.14263 (2025).
        6. Massachusetts Bay Transportation Authority. “MBTA V3 API Documentation.” https://api-v3.mbta.com/ (2024).
        7. LangChain. “LangGraph Documentation.” https://langchain-ai.github.io/langgraph/ (2024).
        8. DataWorksAI. “Deploying the Internet of AI Agents: Part II.” https://dataworksai.com/deploying-the-internet-of-ai-agents-part-ii/
        9. DataWorksAI. “Deploying the Internet of AI Agents: Part III.” https://dataworksai.com/deploying-the-internet-of-ai-agents-part-iii/
