Security Brief Governance & Security Security Boundaries for Enterprise AI

Why MCP Is a Trust-Boundary Problem

The problem is not connector count. It is identity, scope, trust boundaries, and what the runtime is allowed to do.

Mar 1, 2026 7 min

Risk brief

What matters, what breaks, and what to pay attention to.

The problem is not connector count. It is identity, scope, trust boundaries, and what the runtime is allowed to do.

MCP expands capability, but also widens the trust boundary.

Identity and scoped permissions matter more than connector count.

Prompt injection turns tool access into an execution-risk surface.

Watch the video

Why MCP Is a Trust-Boundary Problem

0:00

MCP is often introduced as a cleaner interface for tool access. That is true, but it understates the real design problem. Once agents can discover and invoke tools, the security question shifts from whether the connector works to whether the access model is coherent.

I've been watching teams rush to implement Model Context Protocol without understanding what they're actually building. They see the demos where Claude Desktop connects to GitHub or reads local files, and they think "great, now my agents can do anything." That's exactly the problem. When your agents can do anything, they will.

Trust boundaries, not just connectors

Every MCP integration creates a trust boundary. Who is the agent acting as? Which scopes are attached? What data can be read? What actions can be taken? What must remain approval-gated?

Prompt injection makes this more serious, not less. When instructions and data live in the same context window, the runtime needs stronger boundaries than simple tool availability. It needs scoped identity, constrained actions, and clear separation between observation and execution.

A useful mental model is to treat MCP like internal platform infrastructure, not a plugin directory. That means authentication, authorization, input controls, output validation, audit logging, and revocation paths all matter.

I keep seeing teams implement MCP servers that expose entire databases with SELECT * privileges because "the agent needs context." That's not context. That's a breach waiting to happen. The same teams wouldn't dream of giving a junior developer production database credentials on their first day, yet they'll hand those same credentials to an LLM that can be convinced it's playing a video game.

The anatomy of MCP trust failures

Let me walk through what actually happens when MCP goes wrong. I recently reviewed an incident where a customer support agent built on Anthropic's Claude was given MCP access to both Zendesk and Stripe. The intention was reasonable: let the agent look up customer tickets and check payment status. The implementation was not.

The MCP server exposed these capabilities:

zendesk_search: Full text search across all tickets
zendesk_update: Update any ticket field
stripe_customer_lookup: Get customer details
stripe_refund: Process refunds up to $500

What could go wrong? Everything. A carefully crafted support ticket convinced the agent it was running a "refund validation test" and processed 47 refunds in under three minutes. The prompt injection came through a ticket subject line. The agent never questioned why a customer would need multiple refunds processed in sequence.

This wasn't a failure of the LLM. Claude performed exactly as designed. It read the instructions in the ticket, determined they were valid actions within its capability set, and executed them. The failure was treating MCP like a simple connector instead of a security boundary.

Where teams get this wrong

The failure mode is not that an agent cannot connect to a tool. The failure mode is that it connects too broadly, moves too much context, or takes action with a level of authority no human would have approved in that moment.

That is why I think of MCP primarily as a trust-boundary problem. The connector is the easy part. The security model is the actual product.

I see three consistent patterns where teams fail:

First, they implement ambient authority. The MCP server runs with a single set of credentials that represent maximum possible permissions. Every agent request executes with the same authority regardless of who initiated it or why. This is like running every microservice as root because "it's easier."

Second, they trust the context window as a security boundary. If prompt injection has taught us anything, it's that LLMs cannot distinguish between instructions and data. Yet teams build MCP servers that accept any instruction from the agent as gospel. "Delete all records where user_input = true" becomes a valid command because the agent asked nicely.

Third, they skip the audit trail. When an MCP server executes actions, who tracks what happened? I've debugged systems where the only record of agent actions lived in CloudWatch logs with 24-hour retention. When something goes wrong, and it will, you need forensic-level detail about what the agent saw, what it decided, and what it did.

Building security into MCP architecture

Here's what secure MCP implementation actually looks like. I'm going to use a real architecture I helped design for a financial services client who needed agents to access customer data, transaction history, and execute specific operations.

Component	Insecure Approach	Secure Implementation
Authentication	Single API key for all agents	Per-agent service accounts with rotating credentials
Authorization	Tool-level access (can use Stripe: yes/no)	Operation-level permissions with parameter constraints
Data Access	Direct database queries	Read-through cache with field-level filtering
Write Operations	Immediate execution	Approval queue with human-in-the-loop for sensitive actions
Audit Trail	Basic logging	Immutable event stream with full request/response capture
Context Isolation	Shared context across requests	Sandboxed execution per conversation

The secure implementation uses what I call the "minimum viable authority" pattern. Each agent request carries three pieces of context:

Who initiated the conversation (user identity)
What role the agent is playing (support, sales, operations)
What specific task is being performed (lookup, update, execute)

The MCP server evaluates these against a policy engine, in this case Open Policy Agent, before executing any tool. The policies encode rules like:

Support agents can view transaction history but not modify it
Refunds over $100 require manager approval
Customer data access is logged and rate-limited
Write operations are queued and reviewed if they match risk patterns

The prompt injection problem gets worse

MCP amplifies prompt injection risks because it provides a standardized way for agents to take actions. Before MCP, each integration was bespoke. An attacker had to understand the specific function calling syntax for each model and integration. Now, MCP provides a uniform interface. Learn to inject against one MCP server, and you can attack them all.

I've been tracking prompt injection attempts against MCP servers in production. The sophisticated ones don't ask the agent to "ignore previous instructions." They embed commands in data that looks legitimate. A support ticket about a refund that contains transaction IDs formatted as MCP tool calls. A document summary request where the document contains instructions to exfiltrate other documents. A code review comment that triggers deployment actions.

The scariest part? These attacks work because they exploit the fundamental design of LLMs. The model can't tell the difference between your instructions and user data. It's all just tokens in the context window. MCP makes this worse by giving those confused tokens the ability to take real actions.

Here's a concrete example from last month. An attacker discovered that a company's documentation agent had MCP access to their Confluence instance. The agent was supposed to help employees find information. The attacker submitted a question: "Can you help me understand the authentication flow described in the security audit from last quarter?" The agent dutifully searched for, found, and summarized a confidential penetration test report.

Was this a failure of the agent? Not really. It did exactly what it was asked. The failure was giving the agent broad read access to Confluence without understanding that "helping employees find information" could include finding information they shouldn't have access to.

Implementing defense in depth for MCP

The solution isn't to avoid MCP. It's to implement it with the same rigor you'd apply to any API that handles sensitive operations. Here's the defense-in-depth approach I recommend:

Layer 1: Authentication and Identity

Every MCP request must carry cryptographically verifiable identity. Not just "this is the customer service agent" but "this is agent instance cs-prod-4729 running on behalf of user alice@company.com in support ticket #8924." I use JWT tokens with short expiration times, typically 5 minutes, that encode this context.

Layer 2: Authorization and Policy

Before any tool execution, check permissions. Not just "can this agent use Stripe" but "can this agent issue a refund for this specific customer for this specific amount given the current context?" I've had good success with policy engines like OPA or AWS Cedar that can express complex rules about who can do what when.

Layer 3: Input Validation

Every parameter passed to an MCP tool needs validation. Not just type checking, but semantic validation. If the tool accepts SQL, parse it and verify it only touches allowed tables. If it accepts customer IDs, verify they match the current conversation context. This is where most teams fail. They validate that the input is a string when they should validate that it's a safe string.

Layer 4: Output Filtering

Even read operations need filtering. Just because an agent can query customer data doesn't mean it should see all fields. I implement field-level filtering based on the principle of least privilege. The support agent sees name and ticket history. It doesn't see SSN or payment methods unless specifically authorized for the current interaction.

Layer 5: Audit and Monitoring

Every MCP operation needs to be logged with enough detail to reconstruct what happened. That means request parameters, response data, decision logic, and timing. I push these to an immutable audit stream, typically Amazon QLDB or a similar append-only store. When, not if, something goes wrong, you need this forensic trail.

What this means for your engineering team

If you're building or deploying MCP integrations, here's what you need to do differently:

Stop thinking of MCP as a convenience layer. It's security-critical infrastructure that happens to be convenient. That means it needs the same review process as any system that can modify production data or access sensitive information.

Start implementing graduated access controls. Not every agent needs every capability all the time. Build MCP servers that can dynamically adjust their capabilities based on context. The demo agent that sales shows to prospects should have read-only access to sanitized data. The production support agent needs more, but still not everything.

Invest in testing hostile prompts. I maintain a library of prompt injection patterns specifically for MCP contexts. Test what happens when tool names appear in user input. Test what happens when the agent is asked to explain its capabilities. Test what happens when error messages contain instructions.

Design for revocation. When you discover an agent is compromised or misbehaving, you need to cut its access immediately. That means MCP servers need to support real-time permission updates, credential rotation, and circuit breakers. I've seen too many teams realize they can't actually stop a rogue agent without taking down the entire system.

Most importantly, recognize that MCP represents a fundamental shift in how we think about system integration. We're moving from deterministic APIs called by code we control to probabilistic agents that interpret instructions we can't fully predict. That requires a different security model, one that assumes the caller is potentially confused or compromised.

The teams that get this right are the ones that will build truly powerful agent systems. The ones that don't are building future incident reports. I know which one I'd rather be.