Deploying the Internet of AI Agents: Part VIII

0

Manikandan Meenakshi Sundaram

Deploying the Internet of AI Agents: Part VIII

Designing DANS, A Dynamic Agent Naming System

Manikandan Meenakshi Sundaram¹, Sharanya Badrinarayanan¹, John Zinky, Ph.D.², Javier Solis Vindas², *Hema Seshadri, Ph.D.¹˒²

¹ Northeastern University · ² Akamai Technologies  *Principal Investigator

Part IV of this series argued that AI agents require resolvable Agent Names instead of static URLs. The architecture used Domain Name System (DNS) concepts to explain how AI Agent names transform into requester-specific communication channels between agents. This post presents the Dynamic Agent Naming System (DANS), which is a single server that implements dynamic resolution and is designed for experimentation.

DANS exposes a small, stable REST API, and that stability is the organizing principle of its design: callers depend only on the external interfaces, while the registry, the cache, the ranking logic, and the storage behind them remain free to evolve. This post follows that API and the lifecycle it serves: Record → Resolve → Make Comms → Invoke (Fig. 1).

Figure 1:  DANS interacts with external components using the life cycle of: Record,  Resolve,  Make Comms, and Invoke

The four phases below structure this blog (Fig. 2):

PhaseActivity
1. RecordThe target agent records its deployment, its base URL, the protocols it speaks, its region, and a health URL.
2. ResolveA requester queries with the Agent Name together with its own context, and DANS returns a single tailored endpoint.
3. Make CommsTo deliver scale and security, the Authoritative Name Server creates or selects the components that manage the communication channel between the requester and the target, and returns the endpoint to that tailored channel.
4. InvokeThe requester sends its request to the endpoint; the channel carries it to the target agent, and the response is returned.

Figure 2: The DANS lifecycle, Record, Resolve, Make Comms, Invoke

1. Record: Placing an Agent into DANS

On startup, the target agent needs to deploy itself and advertise its existence so it can be found. The agent acquires an Agent Name of the form namespace:path_name from a Name Space Server. It then records its deployment metadata: its base URL, the protocols it speaks, its region, and its connection hints. 

POST /register      # records an agent’s deployment# Header: X-API-Key: <key>   (required when DANS_AUTH=on){  “label”: “planner”,  “endpoint”: “http://96.126.111.107:50052”,  “namespace”: “public”,  “region”: “us-east”,  “location”: { “city”: “Newark”, “latitude”: 40.7357, “longitude”: -74.1724 },  “protocols”: [“a2a”, “slim”],  “protocol_metadata”: {    “a2a”:  { “version”: “0.3.0”, “path”: “/”, “format”: “google_a2a” },    “slim”: { “identity”: “public/mbta-transit-ci/planner” }  }}

The deployment metadata records where and how the agent may be reached. This is distinct from registering the agent’s capabilities in the agent discovery service described in Part III. DANS validates each submission on entry; the scheme must be http or https, the protocols must be recognized, and all fields are length-capped and reject invalid input with a 400. Each record is written through to MongoDB and held in memory; the in-memory state is reconstructed from MongoDB on restart.

Recording is idempotent per endpoint: re-recording the same label and endpoint updates the existing entry rather than creating a duplicate, whereas a new endpoint submitted under an existing label is added as a replica. This is the mechanism by which DANS represents regional replicas; a replica that omits protocol_metadata will inherit metadata from its siblings.

The base URL is sufficient to reach the agent, but it is not tailored to any particular requester; it is a single address that is the same for every caller. Converting it into a tailored channel is the work of the phases that follow.

Health check. A background loop polls every health URL at a fixed interval (AGENTS_HEALTH_INTERVAL, 30 seconds by default), recording each endpoint’s status, latency, and load, so that resolution never returns an agent replica that has silently failed.

2. Resolve: Converting a Name into a Tailored Endpoint

A requester agent gets a hold of an Agent Name, a tuple of (namespace, path_name), obtained either from the discovery registry of Part III or out of band (for example, from an email, a social profile, or an SMS message). With a single call it converts that name, together with its own context, into a usable endpoint to communicate to the target agent.

POST /resolve{  “agent_name”: “urn:agents.dataworksai.com:public:planner”,  “requester_context”: { “protocols”: [“a2a”], “location”: { “city”: “Boston” } }}

The name parameter. agent_name is the only required field; it identifies the agent being sought. It is the Agent Name minted at Record-Time, expressed either as a full URN (urn:agents.dataworksai.com:public:planner) or as a bare label (planner, which DANS expands under its own namespace). The name carries no protocol, version, or location information; those are resolution concerns, supplied via requester_context.

The requester context. requester_context is the mechanism by which the caller describes itself so that the response may be tailored to it. It is an open set of key–value hints; the two fields upon which DANS acts are protocols, the transports the caller is able to speak, listed in order of preference, which drive protocol selection, and location, the caller’s city or coordinates, which drive geographic ranking. The structure is intentionally extensible; further hints, such as device, network, or required security posture, may be added without altering the contract. Both fields are optional; an empty context resolves to the target agent’s defaults.

An unknown name returns 404; a known name with no healthy endpoints still returns a result, flagged emergency_fallback, so that the caller may determine whether to proceed.

The return value. DANS returns the tailored endpoint together with the reasoning behind its selection:

{  “endpoint”:      “http://97.107.132.213/firewall/proxy/planner”,  “protocol”:      “a2a”,  “negotiated_by”: “intersection”,  “fallback_protocol”: “http”,  “protocol_metadata”: { “version”: “0.3.0”, “path”: “/”, “format”: “google_a2a” },  “ttl”:       60,  “selected_by”:   “geo_nearest”,  “region”:        “Newark, NJ”,  “cached”:        false,  “metadata”:  { “direct_endpoint”: “http://96.126.111.107:50052”, “total_candidates”: 2 }}

The fields that govern a caller’s behavior are summarized below (Fig. 3):

FieldDefinition
endpointThe invoke URL, where the caller sends its request (entry to a tailored channel, not the base URL).
protocol / negotiated_byThe chosen transport and how it was chosen.
protocol_metadataConnection hints for that protocol (A2A version/path/format, or a SLIM identity).
ttlHow long the caller may cache (60s healthy → 5s unhealthy).
selected_byWhy this endpoint was chosen (geo_nearest / lowest_latency / only_available / *_fallback).
metadata.direct_endpointThe agent’s raw address, before the tailored channel.

Figure 3: The /resolve return value—the top-level fields specify where and how to connect; metadata explains why

Because DANS is built on FastAPI, the authoritative and always-current contract is the OpenAPI document served at /openapi.json (and, in human-readable form, at /docs).

3. Make Comms: Building the Communication Channel

Traffic management between the requesting agent and the target is handled by components created and maintained by the Authoritative Name Server. This occurs in two parts: the server first finds or makes a channel appropriate to the specific requester (3a), after which the request is carried by a separate, tailored communication channel that manages the scale and security of the agent-to-agent interaction (3b).

3a. Internals of the Authoritative Name Server: Finding or Making a Communication Channel

Several policies shape the response the Authoritative Name Server produces.

Protocol selection. Every agent advertises which communication methods it supports, and every requester states which ones it can use. DANS automatically picks the best match between the two using a “first applicable” rule (Fig. 4). Along with that match, DANS also sends back everything the requester needs to actually open a connection  so two agents built on completely different frameworks can still talk to each other, with no custom wiring required.

negotiated_byWhenResult
intersectionRequester preferences overlap the Target’s protocolsRequester’s first preference, for which the Target also supports
agent_defaultRequester sent no preferenceTarget’s primary (first-recorded) protocol
fallbackNo overlaphttp, with warning: “no_protocol_match”

Figure 4: Protocol negotiation, first matching rule wins.

Health monitoring. A background loop polls every recorded endpoint every 30 seconds, recording status, latency, and load. On resolution, DANS consults the cached health state and performs a live check of any endpoint not yet present in the cache, bounded by a concurrency semaphore so that a single request cannot generate an unbounded number of outbound connections. This adds the cost of one probe to a cold-cache resolution.

Geographic ranking. Among healthy candidates DANS selects exactly one replica and reports its rationale through selected_by (Fig. 5). Selection occurs at resolution time, and the caller receives a single chosen endpoint. Results are cached by name and context for the computed TTL.

Conditionselected_by
The caller supplied a locationgeo_nearest (the nearest healthy endpoint by great-circle distance)
No location was suppliedlowest_latency (the lowest-latency healthy endpoint)
A single healthy endpoint remainsonly_available
No healthy endpoint remainsemergency_fallback

Figure 5: How DANS determines selected_by.

Failover. Because a name may have several replica endpoints, an unhealthy endpoint is simply removed from consideration. The fares agent, for example, is a single name with two endpoints: Newark (us-east) and Frankfurt (eu-central). A requester in Boston resolves to Newark; a requester in London resolves to Frankfurt, based on geographic distance. Should Newark fail its health check, the next resolution returns Frankfurt, with no change to the caller, the name, or any configuration (Fig. 6).

Figure 6: Cross-region failover, one name & two registered endpoints

Resolution is single-pass: resolve once, invoke the returned URL, and re-resolve when the TTL expires.

A notable caching defect. The initial implementation cached the raw result and applied the tailored-URL rewrite afterward, so that a cache miss returned the tailored URL while a cache hit returned the raw endpoint. Identical calls therefore behaved differently according to cache state, and callers that relied on the rewrite intermittently bypassed the channel. The remedy was a single reordering: apply the rewrite before writing to the cache. The general principle is that a derived value should be derived first and cached second.

3b. The Tailored Communication Channel

The tailored communication channel is where the system’s scale and security are delivered. The channel consists of a chain of stages that manage the interaction between agents. Some of the stages may be reused between different requesters and some state may be needed to separate the requests. The Authoritative Name Server creates and configures the chain of servers that implement the communication channel.

A tailored channel needs an entry point and path through the components. The Tailored URL can represent both the location of the initial entry server (DNS name and port) and the path through the channel’s components  …/firewall/proxy/planner.

A2A Proxy. At present, the A2A proxy and the Prompt Firewall constitute a single component and therefore a single server. It terminates the inbound A2A connection, inspects the prompt (blocking injection, redacting sensitive responses, and rate-limiting), and forwards the request to the endpoint that DANS selected at resolution time. It does not re-rank or load-balance, as selection has already occurred at channel setup time.

Future Stages Richer stages of the chain are planned, such as compliance logging and a dedicated SLIM gateway.

4. Invoke: How to Invoke the URL

At invocation time, the caller sends the agent’s native request to the invoke URL. For an A2A agent, this is the ordinary A2A JSON-RPC message/send it would otherwise send directly. The proxy applies policy and forwards the request to the selected agent, and the response is returned along the same channel.

Two properties merit emphasis. First, the proxy guards both transports: HTTP A2A passes through it, and when SLIM is negotiated the same policy is applied on the SLIM path, ensuring that the alternative transport does not constitute a means of bypass. Second, because the proxy fronts a URL issued by DANS, health and geographic selection occur upstream at resolution time; a fresh resolution following a failure directs the proxy to a healthy replica.

A key idea is the separation of the control plane and the data plane. The Authoritative Name Server sets up the components for a connection (control plane).  The A2A proxy is the entry to the connection (data plane). The two meet at the single tailored URL that resolution returns. The detailed design of the data plane, the firewall’s rule engine, redaction, and rate-limiting internals, is a substantial subject in its own right, to which a later post is devoted.

Conclusion

DANS transforms an Agent Name into a customized communication channel through a four-stage lifecycle: Record, Resolve, Make Comms, and Invoke. This entire process is anchored by a stable REST API, which provides a consistent interface while allowing the underlying technical components to evolve independently. Since resolution returns a tailored URL, DANS can return a communication channel that allows agent-to-agent interactions to scale and be secure.

The upcoming installment in this series will shift from conceptual description to practical application. We will explore the DANS client libraries, specifically the target and requester SDK, alongside system deployment and operability. Furthermore, we will examine the MBTA integration case study, demonstrating how the Exchange agent utilizes DANS to record, resolve, and invoke the Planner, StopFinder, and Alerts agents, facilitating real-time regional failover.

References

  1. Part I: Stop Building Bots, Start Building AI Agent Networks 
  2. Part II: Implementing MBTA Transit Conversational Intelligence with Emerging AI Protocols
  3. Part III: Federated Architecture for Global AI Agent Registries
  4. Part IV: Need for Dynamic Resolution of AI Agent Names
  5. Part V: MBTA Transit Conversational Intelligence: Orchestration Through Semantic Discovery
  6. Part VI: Production Deployment on Akamai Connected Cloud
  7. Raskar et al., “Beyond DNS: Unlocking the Internet of AI Agents via the NANDA Index and Verified AgentFacts,” arXiv:2507.14263 (2025)
  8. Project NANDA Adaptive Resolver
  9. FastAPI
  10. https://mintlify.wiki/agentgateway/agentgateway/guides/a2a-proxy

Author

Leave a Reply

Your email address will not be published. Required fields are marked *