In April 2026, a Unit 42 incident response engagement documented something that, if described two years earlier, would have sounded like a slightly paranoid thought experiment. An insider used their company’s own AI assistant to stage a data exfiltration attack. The forensic analysis showed the employee manipulating the tool through a sequence of crafted prompts, steering it to retrieve, compile, and package sensitive records that the employee’s own account did not have direct permission to access. The AI did not know it was being weaponized. It was simply following instructions, as designed. The tool had more access than the person operating it, and the person had figured that out before the security team did.

cover

This is not primarily a story about a malicious insider, though it is that too. It is a story about an architectural assumption that broke silently: that the AI assistant was a productivity tool, bounded and supervised, not an autonomous actor with standing credentials and broad system access. The gap between those two mental models is where most enterprise security teams are currently living, uncomfortably and often without fully realizing it.

From chatbot to credentialed actor

The transition happened faster than anyone planned. What started as a pilot program to summarize support tickets became a system that could query the CRM, update records, and open Jira issues. The Copilot deployment that began by drafting emails ended up with OAuth tokens to read the entire shared drive. The AI-powered code assistant that started as a suggestion engine was granted repository write access because that made the workflow smoother. Each individual decision made sense at the time, in isolation. The cumulative result is that many organizations now have AI agents operating inside their environments with credential footprints that would look alarming on any human account review.

Model Context Protocol (MCP), the open standard that Anthropic published in late 2024 and that has since been adopted by virtually every major AI platform, is the plumbing that makes this possible at scale. MCP defines how a language model connects to external tools and data sources: a standardized JSON-RPC interface through which the model can invoke functions, query databases, read files, and call APIs. By early 2026, there were over 13,000 known MCP servers publicly catalogued, covering integrations from GitHub and Jira to Slack, internal SQL databases, cloud management consoles, and financial systems. MCP Gateway research from Prompt Security puts the number growing rapidly. The practical implication is that any organization deploying a modern AI assistant is almost certainly operating MCP-connected agents, whether or not that term appears anywhere in the procurement paperwork.

The security properties of MCP as deployed in most environments are best described as optimistic. Research cited in the Cisco State of AI Security 2026 found that 43% of tested MCP server implementations contain command injection flaws, 30% will fetch any URL passed to them without restriction (classic SSRF), and 22% are vulnerable to path traversal. The MCP Attack Library (MCPLIB), published by academic researchers cataloguing attack classes, documents 31 distinct attack methods across four categories: direct tool injection, indirect tool injection, malicious user attacks, and LLM-inherent attacks. The agents’ blind reliance on tool descriptions and sensitivity to file-based attack vectors are among the most consistent findings. The model trusts what the tool tells it about itself, which is exactly the kind of trust assumption that turns supply chain compromise into privilege escalation.

The attack surface you did not draw

Traditional threat modeling asks what an attacker can do if they compromise a user account, a server, or a network segment. Agentic AI introduces a category that fits none of those cleanly: a system that has credentials like a service account, accepts instructions like a user, interprets context like an application, and acts with latency and autonomy that human operators cannot always monitor in real time.

Tool poisoning is the attack that most clearly illustrates the structural problem. A malicious MCP server, or a legitimate server with a compromised tool definition, presents itself to the agent with a description that looks benign. The model, which decides which tools to invoke based on natural language descriptions in its context window, selects the malicious tool because the description matches the task. What appears to be a document search utility quietly exfiltrates the retrieved content to an attacker-controlled endpoint. What appears to be a calendar integration writes meeting details to an external log. The MasterMCP demonstration toolkit by SlowMist provides concrete examples: data poisoning through JSON injection, function overriding via malicious MCP plugin, and cross-server call hijacking where a compromised server intercepts traffic intended for a legitimate peer.

Indirect prompt injection is the variant that incident responders find most operationally disorienting, because it requires no direct access to the system. An attacker embeds instructions in a document, a web page, a GitHub issue, or an email that the agent is asked to process. When the agent ingests that content, it reads the attacker’s instructions as part of its input context and executes them. The documented GitHub MCP case is illustrative: a malicious issue in a public repository contained hidden instructions that, when an AI agent with repository access processed the issue, triggered data exfiltration from private repositories in the same GitHub account. The agent was just doing its job: reading an issue, following instructions. The instructions happened to come from an attacker rather than the user.

OWASP’s Top 10 for Agentic Applications (2026) classifies this under ASI01, acknowledging it as the primary risk category for autonomous AI systems. The structural reason it is so difficult to address is that the model has no reliable mechanism for distinguishing “instructions from the user” from “instructions embedded in data processed by the user.” This is not a configuration problem. It is an architectural property of how large language models process context, and it does not go away because you added an input filter.

Agent-to-agent impersonation represents the next tier of sophistication, already documented in real incidents. When organizations deploy multiple AI agents that communicate with each other (a research agent feeding outputs to a summarization agent feeding outputs to a reporting agent), the implicit trust between agents creates a lateral movement path. A compromised upstream agent can insert instructions into its outputs that the downstream agent executes without challenge. In one documented scenario, a compromised financial analysis agent received a poisoned intermediate output and proceeded to initiate a workflow that triggered unauthorized fund transfers before any human reviewed the chain. The Stellar Cyber analysis of agentic AI threats in 2026 describes memory poisoning in multi-agent architectures as one of the more persistent and underestimated problems: an agent’s persistent memory store, once poisoned, influences every subsequent session until explicitly cleared, which most deployments do not do routinely.

What IR looks like when the actor is autonomous

The forensic challenge is distinct from what the earlier post on AI forensic readiness described in terms of log preservation and evidence reconstruction. That piece addressed what you need to preserve and how. This is about what you are actually looking for during an investigation when the primary actor was not human.

The first problem is attribution. When a human compromises a system, there is generally a chain: credentials, authentication event, process creation, lateral movement. When an agent takes a harmful action, the authentication event is legitimate (the agent’s own service credentials), the process is the orchestration runtime (normal), and the action itself may be within the agent’s nominal permissions. The only anomaly is the instruction that caused the action, which exists inside the prompt context and may not be logged at all, or may be logged in a format that does not survive the default retention period. The Unit 42 2026 Incident Response Report notes that in AI-assisted insider threat cases, forensic analysis could only reconstruct the attack because the organization had specifically enabled verbose prompt logging months earlier, which was not the default configuration.

The second problem is scope determination. In a conventional intrusion, blast radius assessment starts from compromised accounts and pivots outward. When an agent is the vector, the blast radius is the agent’s entire permission footprint: every API it can call, every database it can query, every service it can write to. If the agent was configured with broad access in the name of usefulness, and most are, scope assessment becomes an exercise in enumerating everything the agent could theoretically have touched, not just what it demonstrably did. In environments without granular MCP server-level logging, “could have touched” is often the best you can do.

The third problem is timeline reconstruction. Agent actions unfold at machine speed. A single indirect prompt injection can trigger a chain of tool invocations that complete in seconds, long before any human SOC analyst would see an alert. By the time the incident is detected, the action chain may be complete and the evidence already degraded by short log retention defaults. This reinforces the point from the post on IR hidden slowdown: the delay between compromise and detection is where most damage accumulates, and autonomous agents compress that window catastrophically.

For the IR team arriving at an agentic AI incident, the immediate priorities are specific. Capture the full prompt and completion logs for the agent sessions in question before retention expiry. Enumerate all MCP servers the agent was connected to and verify their integrity, including checking version history for rug-pull updates. Map every tool invocation from the affected sessions, with timestamps, input parameters, and returned data. Identify any external data sources (documents, web pages, emails, GitHub issues) the agent processed in the window before the anomalous behavior, as these are the most likely injection vectors. Preserve the agent’s memory store if it uses persistent memory, treating it as potentially compromised. Cross-reference agent actions with downstream system logs to identify all systems touched, not just those where anomalies were detected.

To make that workflow executable under pressure, it helps to align likely attack patterns with the exact evidence sources to pull first.

Attack pattern Priority evidence to preserve Primary log source
Indirect prompt injection from external content Raw prompt context, retrieved external artifact, model completion trace Agent runtime prompt logs and content retrieval logs
Tool poisoning through MCP server metadata Tool definition snapshot, server version history, invocation payloads MCP gateway logs and server-side audit trail
Cross-server call hijacking Inter-server request chain, destination validation records, token scopes MCP broker logs and API gateway access logs
Agent-to-agent impersonation in multi-agent workflows Upstream output artifact, downstream execution trace, memory state diff Orchestrator event logs and persistent memory store history
Data exfiltration via legitimate tool calls Query parameters, returned dataset sample hash, outbound transfer records Tool invocation logs, database audit logs, egress proxy logs

If retention allows, preserve at least one known-good baseline session from the same agent profile. It gives investigators a behavioral control sample and reduces false positives during scope reconstruction.

The governance gap nobody wants to name

The Cisco data cited above is blunt: most organizations planned to deploy agentic AI into business functions, and 29% reported being prepared to secure those deployments. That is a gap worth sitting with. It means the majority of enterprise AI agent deployments in 2026 went live without security teams having formal control over what tools the agents could access, what permissions they held, how their actions were logged, or what the response procedure was if they behaved unexpectedly.

The analogy to shadow IT is imprecise but directionally useful. Shadow IT involves users deploying tools outside approved channels. Shadow MCP involves agents acquiring capabilities outside approved inventories, either because the agent’s tool list was never formally reviewed, or because MCP server updates added capabilities after the initial deployment, or because the agent discovered and registered new servers autonomously in environments where that was permitted. SecureMCP and similar auditing tools are beginning to address this, but the tooling is far ahead of the organizational processes required to act on what it finds.

The Cloud Security Alliance’s Agentic MCP Security Best Practices Guide published in May 2026 provides a useful baseline: maintain an inventory of all MCP servers and their permission scopes, apply least-privilege access at the tool level rather than granting agent-wide permissions, version-pin tool definitions and alert on changes, implement human-in-the-loop checkpoints for high-risk action categories, and log at the MCP interaction level rather than only at the application level. None of this is technically exotic. All of it requires organizational decision-making that most procurement and deployment processes currently skip.

The part that rarely makes it into vendor presentations is the liability question. When an AI agent causes a breach, damages data, or triggers an unauthorized transaction, the question of who is responsible is genuinely unresolved in most jurisdictions. NIS2 and DORA establish organizational accountability for operational resilience, which includes AI systems used in critical functions. But the causal chain from “agent was misconfigured” to “organization is liable” runs through governance decisions that most enterprises have not formalized. The investigation that cannot reconstruct what the agent did and why is not just forensically incomplete. It is an unresolved compliance exposure.

Liability and evidence chain

For legal, risk, and IR teams, the practical objective is to connect autonomous actions to accountable governance decisions with evidence that survives scrutiny. In practice, teams should start from governing obligations and map the relevant control expectations from NIS2 (Directive (EU) 2022/2555), DORA (Regulation (EU) 2022/2554), and, where applicable, the EU AI Act (Regulation (EU) 2024/1689).

From there, teams need a continuous chain that links approved design intent to runtime behavior and then to response decisions. The chain should preserve who approved tool scopes and risk acceptance for each agent profile. It should also capture what happened at execution time through prompt context, MCP telemetry, and downstream logs. Finally, it should document containment and remediation actions, including credential rotation and control updates. If one link is missing, organizations can still narrate what likely happened, but they struggle to demonstrate why controls failed and whether obligations were met.

Treating agents as principals, not tools

The mental model shift required here is not complicated, but it is uncomfortable because it implies work. An AI agent with persistent credentials, tool access, and autonomous decision-making capability is not a productivity feature. It is a principal in your identity and access management model. It should be inventoried, scoped, monitored, reviewed, and deprovisioned on the same cadence as human service accounts and privileged access accounts.

This means the agent gets a formal entry in your CMDB with its current MCP server connections listed as dependencies. It means access reviews happen quarterly, not never. It means the SIEM has detection rules for anomalous agent behavior, not just anomalous human behavior. It means the incident response playbook has a section specifically for AI agent compromise, distinct from the general “compromised service account” procedure, because the forensic artifacts are different and the scope assessment methodology is different.

The same mindset should also produce a concrete hardening baseline for every production agent. In practical terms, this means enforcing per-tool least privilege instead of broad shared scopes, pinning MCP server versions with formal approval for capability changes, and denying outbound destinations by default while allowing only business-required domains. It also means introducing policy checkpoints before high-impact actions such as fund movement, privilege changes, and bulk export operations. Where architecture allows, read and write tools should be separated across distinct agent identities.

Operationally, teams should retain detailed tool invocation metadata with IR-grade retention, treat persistent memory as sensitive state with integrity checks and periodic resets, and run adversarial testing for indirect prompt injection and tool poisoning before major releases. Monitoring should focus on behavioral drift in decision paths, not only infrastructure-level anomalies. Suspicious autonomous action chains should trigger immediate credential revocation and rotation.

None of this requires waiting for new regulation. OWASP guidance discussed earlier, the CSA best practices, and the accumulated evidence from 2026 incident data all point in the same direction. The organizations that treat their AI agents as trusted tools requiring no special oversight are the ones whose next IR engagement will involve reconstructing a prompt chain from fragmented logs at 2am. The organizations that treat agents as principals with real access and real accountability are the ones who, when something goes wrong, will actually be able to explain what happened.

That is, for an incident responder, the only thing that matters.