Token Delegation and MCP server orchestration for multi-user AI systems

#ai #mcp #security #identity

Written by Jakub Hrozek and Michelangelo Mori

We’ve been developing ToolHive to run and deploy MCP servers in a safe and consistent way. So, we are constantly asking ourselves, "how can a client use an MCP server more securely?" Recently, that led us to two more specific questions:

How do we maintain accountability and an audit trail when acting on behalf of users?
How do we serve multiple users with different access levels from the same client?

In this post, we'll illustrate how you can address these questions with the help of token delegation and an MCP server orchestrator.

Note that we ran this exploration before the recent update to the MCP authorization spec. The update does ease things a bit, especially by making the MCP server an OAuth 2.1 resource server. That said, the spec doesn’t solve how the MCP server authenticates to the backing system, doesn’t address auditability, and doesn’t concern itself with the identity of the caller.

The identity problem in agentic systems

If we’re working to ensure a client can use an MCP server more securely, we have to acknowledge that there are different kinds of clients. For the sake of identity, we’ll split them into agents and assistants. With assistants, such as ChatGPT desktop, the problem is simpler, as the assistant is really just an extension of the user and can assume their identity. The situation is very different with agents, though.

Let’s illustrate the problem with an example: a DevOps engineer asks an agent to review an Infrastructure-As-Code configuration and identify security risks.

To do that, the agent needs to:

Read the Infrastructure-As-Code repository
Analyze the deployed infrastructure
Generate a report to an imaginary reporting system

The agent can’t just inherit the calling user's permissions. In this case, the DevOps engineer might have write access to the repository or the infrastructure, which the agent doesn’t need. In other cases, the agent might even need permissions the user doesn’t currently possess.

We also need to consider the audit trail. When the report is generated and saved, or when the agent accesses cloud resources and thus triggers auditable events, we need to record both who requested the analysis (the user) and who performed the actual action (the agent). If we used only the agent’s credentials, we’d lose track of who triggered the agent run. If we used only the user’s credentials, we’d attribute the report or the resource access to the user, who merely invoked the agent and didn’t perform the actions. Christian Posta articulated this well in his blog post on agent identity: “The user authorizes the goal, the agent chooses the implementation”.
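To make the audit requirement concrete, here is a minimal sketch of an audit record that keeps the two identities apart. The `AuditEvent` type, its field names, and the sample values are all hypothetical; they are not taken from ToolHive or from any standard.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AuditEvent:
    """Hypothetical audit record that never merges user and agent."""
    requested_by: str  # the user who authorized the goal
    performed_by: str  # the agent that chose and executed the action
    action: str
    resource: str


def to_log_line(event: AuditEvent) -> str:
    """Serialize the event as a single JSON line with a stable key order."""
    return json.dumps(asdict(event), sort_keys=True)


# Sample values are made up for illustration.
event = AuditEvent(
    requested_by="alice",
    performed_by="iac-review-agent",
    action="report.create",
    resource="reports/iac-security-review",
)
print(to_log_line(event))
```

With a record shaped like this, a query such as "everything the agent did for this user" needs no guessing about which identity a log entry refers to.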

Why token delegation matters for agents

To solve this challenge, we chose to implement OAuth 2.0 Token Exchange (RFC 8693) with a delegation pattern. RFC 8693 defines two distinct patterns:

Impersonation: Where "one principal to take on the rights and identity of another principal" and becomes "indistinguishable from" the original party
Delegation: Where "the acting party is still distinguishable from the party on whose behalf it is acting"

For the agentic use case, delegation is the clear choice: the agent must remain distinguishable from the user while acting on their behalf. For an assistant use case, such as ChatGPT desktop, simple impersonation would suffice because the assistant lacks agency. Arguably, the assistant could simply exchange a user token whose intended audience is the assistant for a token whose intended audience is the MCP server.

The full discussion of token delegation is outside the scope of this post, but for the sake of illustration, a token for a user “alice” using a “github-issue-agent” might look as follows:

{
    "iss": "https://token-exchange-service", // to verify the token
    "sub": "[email protected]",              // The user (subject)
    "aud": ["mcp-orchestrator"],
    "act": {                                 // The agent (actor)
        "sub": "system:serviceaccount:github-issue-agent",
        "iss": "https://kubernetes-cluster"
    },
    "email": "[email protected]",            // User's claims for authorization
    "groups": ["devel", "engineering"]
}
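A consumer of such a token reads the user from the top-level sub claim and the agent from the nested act claim. The sketch below assumes the token's signature, issuer, and audience have already been verified elsewhere and only shows the claim handling. The helper name `split_identities` is hypothetical, and the subject value `alice` is a placeholder.

```python
def split_identities(claims: dict) -> tuple[str, str]:
    """Return (user, agent) from already-verified token claims.

    Raises ValueError when the subject or the actor is missing, because a
    token without an `act` claim is not a delegated token.
    """
    user = claims.get("sub")
    actor = claims.get("act")
    if not user:
        raise ValueError("token has no subject")
    if not isinstance(actor, dict) or not actor.get("sub"):
        raise ValueError("token has no actor; not a delegated token")
    return user, actor["sub"]


# Placeholder claims mirroring the shape of the example token.
claims = {
    "iss": "https://token-exchange-service",
    "sub": "alice",
    "aud": ["mcp-orchestrator"],
    "act": {
        "sub": "system:serviceaccount:github-issue-agent",
        "iss": "https://kubernetes-cluster",
    },
}
user, agent = split_identities(claims)
print(f"user={user} agent={agent}")
```

Rejecting tokens that lack an act claim is a deliberate choice in this sketch: it keeps a plain user token from being mistaken for a delegated one.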

MCP Orchestration

We have been using out-of-the-box MCP servers, in particular the GitHub MCP server in our setup, so the server doesn’t implement the new authorization specification or the streamable HTTP transport. Instead, the GitHub MCP server assumes that it is provisioned with the right credentials and the HTTP transport is provided by a proxy that the ToolHive Kubernetes operator runs alongside the MCP server.

Now, which credentials are right is often unknown until a user invokes the agent. To that end, we implemented a simple MCP server orchestrator which, given a delegated token provided by a calling agent, makes a decision using a simple policy engine. On a policy match, the MCP orchestrator provisions an MCP server with a matching credential and returns the URL of the server to the client.

Before an agent connects to the MCP server, it acquires a delegated token using its own identity. The policy engine first decides whether the agent (identified by the act claim in the delegated token) is allowed to access the MCP server it requests. Then, given the user (identified by the sub claim in the delegated token), the policy engine decides which secret, if any, the user is allowed to access. If both access control checks succeed, an MCP server is provisioned for this user+agent combination with the help of the ToolHive operator. Regardless of what protocol the underlying MCP server uses (such as stdio in the case of the GitHub MCP server), ToolHive exposes the MCP server through HTTP and returns a URL to the caller.

An end-to-end implementation

To verify the usability of the whole flow end-to-end, we built a proof-of-concept system consisting of:

Python Agent: A simple GitHub issue summarization agent that demonstrates the complete delegation flow.
Token Exchange Service: An RFC 8693-compliant service that creates delegated tokens. Although RFC 8693 is a standard, we ended up implementing our own simple token delegation service because, as we found out, not many IdPs support issuing delegated tokens (several, including Keycloak, do support token exchange, though).
MCP Orchestrator ("MCop"): Policy engine and provisioning service for MCP servers which leverages the ToolHive operator for MCP server management.

As a reference architecture, we deployed the whole system into a Kubernetes cluster, which allows us to use Kubernetes service accounts as agent identities.

On a high level, the complete flow works as follows:

User authenticates with their identity provider (we used Dex for testing)
User calls the agent with their JWT
Agent requests a delegated token combining its service account with the user's identity
Agent calls the orchestrator with the delegated token to request an MCP server
Orchestrator validates policies and provisions an isolated MCP server instance
Agent uses the provisioned MCP server to complete the user's request

Let’s illustrate the flow with more examples including the policy decisions:

After receiving the user request, the agent first validates the user token. Then, before kicking off its logic, the agent requests a delegated token:

delegated_token = await token_exchange_client.exchange_tokens(
    subject_token=user_jwt,          # DevOps engineer's token
    actor_token=agent_jwt,           # Agent's service account token
    audience="mcp-orchestrator",     # Target service
)
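On the wire, a call like this typically turns into a form-encoded POST to the token endpoint. The sketch below builds such a request body using the parameter names and URNs defined by RFC 8693. The function name `build_exchange_body` is hypothetical, and treating both input tokens as JWTs is an assumption; a given deployment might use the access-token or ID-token type URNs instead.

```python
from urllib.parse import parse_qs, urlencode

# Grant type and token type identifiers defined by RFC 8693.
TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
JWT_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:jwt"


def build_exchange_body(subject_token: str, actor_token: str, audience: str) -> str:
    """Form-encode an RFC 8693 token exchange request.

    subject_token identifies the user; actor_token identifies the party
    acting on the user's behalf (here, the agent).
    """
    return urlencode({
        "grant_type": TOKEN_EXCHANGE_GRANT,
        "subject_token": subject_token,
        "subject_token_type": JWT_TOKEN_TYPE,
        "actor_token": actor_token,
        "actor_token_type": JWT_TOKEN_TYPE,
        "audience": audience,
    })


# Placeholder token strings; real tokens would be signed JWTs.
body = build_exchange_body("user.jwt.value", "agent.jwt.value", "mcp-orchestrator")
print(parse_qs(body)["grant_type"][0])
```

The body would be sent with the `application/x-www-form-urlencoded` content type, together with whatever client authentication the token exchange service requires.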

The agent then contacts the MCP orchestrator which makes two policy decisions:

Agent Authorization: Can this agent access the requested MCP server? The orchestrator checks the agent's identity from the act.sub claim against a policy configuration:

# Agent binding policy
github:
  allowedAgents:
    - "system:serviceaccount:agents:github-issue-summarizer"

This ensures that only authorized agents can access specific MCP servers.
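A check against this kind of policy can be very small. The sketch below assumes the YAML has already been loaded into a dictionary of the same shape. The function name `agent_allowed` and the deny-by-default behavior are assumptions of this sketch, not a description of the orchestrator's actual code.

```python
# The agent binding policy, as it might look once parsed from YAML.
AGENT_BINDINGS = {
    "github": {
        "allowedAgents": [
            "system:serviceaccount:agents:github-issue-summarizer",
        ],
    },
}


def agent_allowed(policy: dict, server: str, claims: dict) -> bool:
    """Return True only if the token's actor is listed for the server.

    Unknown servers, missing `act` claims, and unlisted agents are denied.
    """
    actor = claims.get("act")
    if not isinstance(actor, dict):
        return False
    allowed = policy.get(server, {}).get("allowedAgents", [])
    return actor.get("sub") in allowed


claims = {
    "sub": "alice",  # placeholder user
    "act": {"sub": "system:serviceaccount:agents:github-issue-summarizer"},
}
print(agent_allowed(AGENT_BINDINGS, "github", claims))
```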

User-Specific Provisioning: What secrets should the MCP server receive?

The orchestrator examines the user's claims (email, groups) to determine which secrets they can access:

github-public-only-token:
  allowedPrincipals:
    - email: [email protected]

github-private-token:
  allowedPrincipals:
    - email: [email protected]
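The secret selection step can be sketched the same way. The example below matches only on the email claim, as in the policy above; group-based matching would be an extension. The email addresses are placeholders, and the function name `select_secrets` and the choice to return every permitted secret are assumptions of this sketch.

```python
# The secret policy, as it might look once parsed from YAML.
# Email addresses are placeholders for illustration.
SECRET_POLICY = {
    "github-public-only-token": {
        "allowedPrincipals": [{"email": "alice@example.com"}],
    },
    "github-private-token": {
        "allowedPrincipals": [{"email": "bob@example.com"}],
    },
}


def select_secrets(policy: dict, claims: dict) -> list[str]:
    """Return the names of all secrets the user's email may access.

    A token without an email claim is granted nothing.
    """
    email = claims.get("email")
    if not email:
        return []
    return [
        name
        for name, rule in policy.items()
        if any(p.get("email") == email for p in rule.get("allowedPrincipals", []))
    ]


print(select_secrets(SECRET_POLICY, {"email": "alice@example.com"}))
```

An empty result means no MCP server should be provisioned for the user, even if the agent itself passed the agent authorization check.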

Once authorization passes, the orchestrator provisions a new MCP server instance by creating an MCPServer custom resource that is subsequently picked up by the ToolHive operator. Each instance is configured with the secrets selected by the orchestrator and isolated in its own deployment. The URL of the MCP server is then returned to the agent, which passes it to the usual MCP client libraries.
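Because each instance belongs to one user+agent combination, the orchestrator needs some way to name instances so that they neither collide nor leak identities into resource names. The sketch below shows one possible scheme, which is purely an assumption and not how the orchestrator necessarily does it: hash the pair and append the digest to the server name. The `instance_name` helper and the `mcp-` prefix are hypothetical.

```python
import hashlib
import re

# Lowercase DNS label: starts with a letter, ends with a letter or digit.
DNS_LABEL = re.compile(r"^[a-z]([-a-z0-9]*[a-z0-9])?$")


def instance_name(server: str, user: str, agent: str) -> str:
    """Derive a stable per-user, per-agent name that fits a DNS label.

    The same (server, user, agent) triple always yields the same name, and
    the user's identity does not appear in clear text.
    """
    if not DNS_LABEL.match(server) or len(server) > 40:
        raise ValueError("server must be a DNS label of at most 40 characters")
    # The newline separator keeps ("ab", "c") distinct from ("a", "bc").
    digest = hashlib.sha256(f"{user}\n{agent}".encode("utf-8")).hexdigest()[:16]
    return f"mcp-{server}-{digest}"


name = instance_name("github", "alice", "github-issue-summarizer")
print(name)
```

A deterministic name also lets the orchestrator detect that an instance for the same pair already exists instead of creating a duplicate.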

Adapting the flow to the new MCP authorization specification

If the GitHub MCP server implemented the new authorization spec, the situation would be somewhat different: the server itself would enforce access to its interface. However, we would still need a policy that acquires or selects the right credential for talking to the back-end service.

For an agent, acquiring or selecting the credentials for the back-end connection would still benefit from a delegated token, which keeps the audit trail and captures both who the agent is and on whose behalf it is acting.

For MCP server authors, we would like to make ToolHive the piece that makes deploying your MCP servers easy, fun, and secure.

Key takeaways

Token delegation solves the fundamental identity problem for multi-user agents by preserving both the user's authorization context and the agent's identity throughout the request chain.

Being able to select the right credentials for the MCP server to connect to the services exposed by its tools is a problem that needs to be addressed with the new authorization spec as well. Our solution for orchestrating just-in-time servers with pre-configured credentials is a way to better secure existing MCP servers used by agentic systems. More work is needed to enable developers to create servers that are easy to run and secure out of the box.

At Stacklok, we will be continuing the effort to address these challenges. If you’re working in this space, we’d love to hear from you! Please join our Discord and let’s talk.