Most enterprises can't stop stage-three AI agent threats, VentureBeat survey finds

Last updated: April 17, 2026 7:15 pm

In March, Meta experienced a significant security incident when a rogue AI agent bypassed all identity checks and exposed sensitive data to unauthorized personnel. Shortly after, Mercor, a $10 billion AI startup, revealed a supply-chain breach via LiteLLM. Both incidents highlight a common structural weakness: monitoring without proper enforcement and enforcement without isolation. A VentureBeat survey involving 108 qualified enterprises identified this as the prevalent security architecture in use today.

Gravitee’s State of AI Agent Security 2026 survey of 919 executives and practitioners illustrates this disconnect. Despite 82% of executives believing their policies protect against unauthorized agent actions, 88% reported AI security incidents in the past year, and only 21% have visibility into their agents’ runtime activities. According to Arkose Labs’ 2026 Agentic AI Security Report, 97% of enterprise security leaders anticipate a significant AI-agent-driven incident within 12 months, yet just 6% of security budgets address this risk.

The VentureBeat survey shows a rebound in monitoring investment to 45% of security budgets in March, up from 24% in February, as organizations redirected funds to runtime enforcement and sandboxing. However, enterprises remain focused on observation while their agents require isolation. CrowdStrike’s Falcon sensors have identified over 1,800 distinct AI applications on enterprise endpoints, and the fastest adversary breakout time has decreased to 27 seconds. Monitoring systems designed for human-speed workflows are unable to match the pace of machine-speed threats.

The subsequent audit outlines three stages: observe, enforce, and isolate. Observation is the first stage, followed by enforcement, where IAM integration and cross-provider controls transform observation into action. The final stage, isolation, involves sandboxed execution to limit the impact when safeguards fail. VentureBeat Pulse data from 108 qualified enterprises connects each stage to investment signals, OWASP ASI threat vectors, regulatory surfaces, and actionable steps for security leaders.

The threat surface stage-one security cannot see

The OWASP Top 10 for Agentic Applications 2026 identified the attack surface last December. The ten risks are goal hijack (ASI01), tool misuse (ASI02), identity and privilege abuse (ASI03), agentic supply chain vulnerabilities (ASI04), unexpected code execution (ASI05), memory poisoning (ASI06), insecure inter-agent communication (ASI07), cascading failures (ASI08), human-agent trust exploitation (ASI09), and rogue agents (ASI10). These risks don’t have counterparts in traditional LLM applications. The audit links six of these risks to the stages where they are most likely to appear and outlines controls to address them.

In April 2025, Invariant Labs revealed the MCP Tool Poisoning Attack, in which malicious instructions embedded in an MCP server's tool description can lead an agent to exfiltrate files or hijack a trusted server. CyberArk later expanded this into Full-Schema Poisoning. A command-injection flaw in the mcp-remote OAuth proxy (CVE-2025-6514) put roughly 437,000 downloads at risk before a patch shipped.

Merritt Baer, CSO at Enkrypt AI, stated in a VentureBeat interview that enterprises often misinterpret their approval of AI vendors as approval of the underlying system. The real dependencies lie deeper and are prone to failure under stress. CrowdStrike CTO Elia Zaitsev highlighted the visibility issue at RSAC 2026, explaining that it is difficult to discern if an agent or a human is running a browser without tracing the process tree, a distinction most enterprise logging setups cannot make.

The regulatory clock and the identity architecture

Auditability followed a similar arc. Half of respondents rated it a top concern in January, the figure fell to 28% in February as teams rushed deployments, and by March it had surged to 65% as teams realized they lacked a forensic trail of their agents' actions.

HIPAA’s 2026 Tier 4 penalty for willful neglect stands at $2.19M per violation category per year. Gravitee’s survey found that 92.7% of healthcare organizations reported AI agent security incidents compared to the 88% average across all industries. For healthcare systems with agents handling PHI, this ratio differentiates a reportable breach from an uncontested finding of willful neglect. FINRA’s 2026 Oversight Report recommends human checkpoints before agents execute actions or transactions, alongside narrow scopes, granular permissions, and comprehensive audit trails of agent actions.

Mike Riemer, Field CISO at Ivanti, discussed the speed issue in a VentureBeat interview, noting that threat actors reverse engineer patches within 72 hours. Enterprises that fail to patch within this window remain vulnerable. Agents operating at machine speed extend this exposure indefinitely.

The identity issue is structural. Gravitee’s survey of 919 practitioners showed that only 21.9% of teams treat agents as identity-bearing entities, 45.6% continue using shared API keys, and 25.5% of deployed agents can create and task other agents. In one-fourth of enterprises, agents can spawn agents that security teams never provisioned. That is ASI08 as architecture.


Guardrails alone are not a strategy

A 2025 study by Kazdan and colleagues demonstrated a fine-tuning attack that evaded model-level guardrails in 72% of attempts against Claude 3 Haiku and 57% against GPT-4o. This attack was recognized as a vulnerability by Anthropic and earned a $2,000 bug bounty from OpenAI. Guardrails limit what an agent is instructed to do, not what a compromised agent can access.

Chief Information Security Officers (CISOs) are aware of this. In VentureBeat’s survey, preventing unauthorized actions consistently ranked as the top capability priority, between 68% and 72% across waves. The demand is for permissioning, not prompting, indicating that guardrails target the wrong control surface.
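The distinction is concrete: a guardrail filters what the model says, while a permission layer decides whether a call executes at all. A minimal deny-by-default sketch of the latter, with all agent and tool names hypothetical:

```python
# Minimal deny-by-default permission check for agent tool calls.
# All identifiers here are illustrative, not any specific product's API.

AGENT_SCOPES = {
    "report-summarizer": {"search_docs", "read_file"},   # read-only agent
    "db-maintenance":    {"read_table", "write_table"},  # scoped writer
}

def authorize(agent_id: str, tool: str) -> bool:
    """Allow a tool call only if it is explicitly in the agent's scope."""
    # Unknown agents get an empty scope, so every call is denied.
    return tool in AGENT_SCOPES.get(agent_id, set())

# A guardrail would inspect the prompt; the permission layer blocks the call:
assert authorize("report-summarizer", "read_file") is True
assert authorize("report-summarizer", "write_table") is False  # not in scope
assert authorize("unregistered-agent", "read_file") is False   # deny by default
```

The point of the sketch is the default: anything not explicitly granted is denied, regardless of what the prompt asks for.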

At RSAC 2026, Zaitsev discussed the impending identity shift: “AI agents and non-human identities will proliferate across enterprises, quickly outnumbering human identities. Each agent will act as a privileged entity with OAuth tokens, API keys, and uninterrupted access to previously isolated data sets.” Identity security designed for humans will not withstand this change. Cisco President Jeetu Patel compared agents to “teenagers, highly intelligent but with no fear of consequences” in a VentureBeat interview.

VentureBeat Prescriptive Matrix: AI Agent Security Maturity Audit

Stage 1: Observe
  • Attack scenario: Attacker embeds a goal-hijack payload in a forwarded email (ASI01). The agent summarizes the email and silently exfiltrates credentials to an external endpoint. See: Meta, March 2026 incident.
  • What breaks: No runtime log captures the exfiltration. The SIEM never sees the API call. The security team learns from the victim. Zaitsev: agent activity is “indistinguishable” from human activity in default logging.
  • Detection test: Inject a canary token into a test document and route it through your agent. If the token leaves your network, stage one has failed.
  • Blast radius: Single agent, single session. With shared API keys (45.6% of enterprises): unlimited lateral movement.
  • Recommended control: Deploy agent API call logging to the SIEM. Baseline normal tool-call patterns per agent role. Alert on the first outbound call to an unrecognized endpoint.

Stage 2: Enforce
  • Attack scenario: A compromised MCP server poisons a tool description (ASI04). The agent invokes the poisoned tool and writes an attacker payload to the production database using inherited service-account credentials. See: Mercor/LiteLLM, April 2026 supply-chain breach.
  • What breaks: IAM allows the write because the agent uses a shared service account. There is no approval gate on write operations, and the poisoned tool is indistinguishable from a clean tool in logs. Riemer: the “72-hour patch window” collapses to zero when agents auto-invoke.
  • Detection test: Register a test MCP server with a benign-looking poisoned description. Confirm your policy engine blocks the tool call before execution reaches the database. Run mcp-scan on all registered servers.
  • Blast radius: Production database integrity. If the agent holds DBA-level credentials: full schema compromise, plus lateral movement via trust relationships to downstream agents.
  • Recommended control: Assign a scoped identity per agent. Require an approval workflow for all write operations. Revoke every shared API key. Run mcp-scan on all MCP servers weekly.

Stage 3: Isolate
  • Attack scenario: Agent A spawns Agent B to handle a subtask (ASI08). Agent B inherits Agent A’s permissions, escalates to admin, and rewrites the org’s security policy. Every identity check passes. Source: CrowdStrike CEO George Kurtz, RSAC 2026 keynote.
  • What breaks: No sandbox boundary between agents, no human gate on agent-to-agent delegation, and security policy modification is a valid action for an admin-credentialed process. Kurtz disclosed at RSAC 2026 that the agent “wanted to fix a problem, lacked permissions, and removed the restriction itself.”
  • Detection test: Spawn a child agent from a sandboxed parent. The child should inherit zero permissions by default and require explicit human approval for each capability grant.
  • Blast radius: Organizational security posture. A rogue policy rewrite disables controls for every subsequent agent. 97% of enterprise leaders expect a material incident within 12 months (Arkose Labs 2026).
  • Recommended control: Sandbox all agent execution. Apply zero trust to agent-to-agent delegation: spawned agents inherit nothing. Require human sign-off before any agent modifies security controls. Maintain a kill switch per OWASP ASI10.

Sources: OWASP Top 10 for Agentic Applications 2026; Invariant Labs MCP Tool Poisoning (April 2025); CrowdStrike RSAC 2026 Fortune 50 disclosure; Meta March 2026 incident (The Information/Engadget); Mercor/LiteLLM breach (Fortune, April 2, 2026); Arkose Labs 2026 Agentic AI Security Report; VentureBeat Pulse Q1 2026.
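The stage-one detection test above can be automated. A sketch, assuming outbound agent traffic is already captured in some greppable log; the token format, log shape, and endpoint allowlist are illustrative assumptions:

```python
# Stage-one canary test: plant a unique token in a document the agent will
# process, then scan outbound-traffic logs for that token. If it appears in
# a request to a non-allowlisted host, stage-one monitoring has a blind spot.
# The log format and allowlist here are illustrative, not a real product's.
import secrets

ALLOWED_ENDPOINTS = {"api.internal.example.com"}

def make_canary() -> str:
    # High-entropy token that cannot appear in traffic by accident.
    return f"CANARY-{secrets.token_hex(16)}"

def scan_outbound_log(log_lines: list[str], canary: str) -> list[str]:
    """Return log lines where the canary left for a non-allowlisted host."""
    leaks = []
    for line in log_lines:
        if canary in line and not any(h in line for h in ALLOWED_ENDPOINTS):
            leaks.append(line)
    return leaks

canary = make_canary()
log = [
    f"POST api.internal.example.com/summary body={canary}",  # expected path
    f"POST evil.example.net/collect body={canary}",          # exfiltration
]
assert len(scan_outbound_log(log, canary)) == 1  # only the exfiltration line
```

If the same scan against your real egress logs ever returns a non-empty list, the matrix's stage-one verdict applies: the token left your network and monitoring alone did not stop it.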

The stage-one attack scenario in this matrix is not hypothetical. Unauthorized tool or data access ranked as the most feared failure mode in every wave of VentureBeat’s survey, growing from 42% in January to 50% in March. That trajectory and the 70%-plus priority rating for prevention of unauthorized actions are the two most mutually reinforcing signals in the entire dataset. CISOs fear the exact attack this matrix describes, and most have not deployed the controls to stop it.


Hyperscaler stage readiness: observe, enforce, isolate

The maturity audit tells you where your security program stands. The next question is whether your cloud platform can get you to stage two and stage three, or whether you are building those capabilities yourself. Patel put it bluntly: “It’s not just about authenticating once and then letting the agent run wild.” A stage-three platform running a stage-one deployment pattern gives you stage-one risk.

VentureBeat Pulse data surfaces a structural tension in this grid. OpenAI leads enterprise AI security deployments at 21% to 26% across the three survey waves, making the same provider that creates the AI risk also the primary security layer. The provider-as-security-vendor pattern holds across Azure, Google, and AWS. Zero-incremental-procurement convenience is winning by default. Whether that concentration is a feature or a single point of failure depends on how far the enterprise has progressed past stage one.

Microsoft Azure
  • Identity primitive (stage 2): Entra ID agent scoping. Agent 365 maps agents to owners. GA.
  • Enforcement control (stage 2): Copilot Studio DLP policies. Purview for agent output classification. GA.
  • Isolation primitive (stage 3): Azure Confidential Containers for agent workloads. Preview. No per-agent sandbox at GA.
  • Gap as of April 2026: No agent-to-agent identity verification. No MCP governance layer. Agent 365 monitors but cannot block in-flight tool calls.

Anthropic
  • Identity primitive: Managed Agents: per-agent scoped permissions and credential management. Beta (April 8, 2026). $0.08 per session-hour.
  • Enforcement control: Tool-use permissions, system prompt enforcement, and built-in guardrails. GA.
  • Isolation primitive: Managed Agents sandbox: isolated containers per session with execution-chain auditability. Beta. Allianz, Asana, Rakuten, and Sentry are in production.
  • Gap as of April 2026: Beta pricing and SLA not public. Session data lives in an Anthropic-managed database (lock-in risk per VentureBeat research). GA timing TBD.

Google Cloud
  • Identity primitive: Vertex AI service accounts for model endpoints. IAM Conditions for agent traffic. GA.
  • Enforcement control: VPC Service Controls for agent network boundaries. Model Armor for prompt/response filtering. GA.
  • Isolation primitive: Confidential VMs for agent workloads. GA. Agent-specific sandbox in preview.
  • Gap as of April 2026: Agent identity ships as a service account, not an agent-native principal. No agent-to-agent delegation audit. Model Armor does not inspect tool-call payloads.

OpenAI
  • Identity primitive: Assistants API: function-call permissions, structured outputs. Agents SDK. GA.
  • Enforcement control: Agents SDK guardrails with input/output validation. GA.
  • Isolation primitive: Agents SDK Python sandbox. Beta (API and defaults subject to change before GA, per OpenAI docs). TypeScript sandbox confirmed but not shipped.
  • Gap as of April 2026: No cross-provider identity federation. Agent memory forensics limited to session scope. No kill switch API. No MCP tool-description inspection.

AWS
  • Identity primitive: Bedrock model invocation logging. IAM policies for model access. CloudTrail for agent API calls. GA.
  • Enforcement control: Bedrock Guardrails for content filtering. Lambda resource policies for agent functions. GA.
  • Isolation primitive: Lambda isolation per agent function. GA. Bedrock agent-level sandboxing on the roadmap, not shipped.
  • Gap as of April 2026: No unified agent control plane across Bedrock, SageMaker, and Lambda. No agent identity standard. Guardrails do not inspect MCP tool descriptions.

Status as of April 15, 2026. GA = generally available. Preview/Beta = not production-hardened. The “Gap as of April 2026” entries reflect VentureBeat’s analysis of publicly documented capabilities; gaps may narrow as vendors ship updates.

No provider in this grid ships a complete stage-three stack today. Most enterprises assemble isolation from existing cloud building blocks. That is a defensible choice if it is a deliberate one. Waiting for a vendor to close the gap without acknowledging the gap is not a strategy.

The grid above covers hyperscaler-native SDKs. A large segment of AI builders deploys through open-source orchestration frameworks like LangChain, CrewAI, and LlamaIndex that bypass hyperscaler IAM entirely. These frameworks lack native stage-two primitives. There is no scoped agent identity, no tool-call approval workflow, and no built-in audit trails. Enterprises running agents through open-source orchestration need to layer enforcement and isolation on top, not assume the framework provides it.

VentureBeat’s survey quantifies the pressure. Policy enforcement consistency grew from 39.5% to 46% between January and February, the largest consistent gain of any capability criterion. Enterprises running agents across OpenAI, Anthropic, and Azure need enforcement that works the same way regardless of which model executes the task. Provider-native controls enforce policy within that provider’s runtime only. Open-source orchestration frameworks enforce it nowhere.
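One way to retrofit a stage-two control onto an orchestration framework that lacks one is to wrap every tool before registering it, so write operations pass through an approval gate no matter which model or provider invokes them. A framework-agnostic sketch; the tool names, the write-tool list, and the approval callback are all hypothetical:

```python
# Approval gate wrapped around framework tools: read calls pass through,
# write calls require an explicit approval callback before executing.
# Everything here is illustrative; no specific framework API is assumed.
from typing import Callable

WRITE_TOOLS = {"update_record", "delete_record", "send_email"}

def with_approval(tool_name: str, tool_fn: Callable,
                  approve: Callable[[str, dict], bool]) -> Callable:
    """Wrap tool_fn so writes are gated by the approve() callback."""
    def guarded(**kwargs):
        if tool_name in WRITE_TOOLS and not approve(tool_name, kwargs):
            raise PermissionError(f"write tool '{tool_name}' denied by gate")
        return tool_fn(**kwargs)
    return guarded

# Usage: an auto-deny approver standing in for a human reviewer.
audit_log = []
def deny_all(tool: str, args: dict) -> bool:
    audit_log.append((tool, args))  # denied attempts are still recorded
    return False

lookup = with_approval("lookup_record", lambda **kw: "ok", deny_all)
delete = with_approval("delete_record", lambda **kw: "gone", deny_all)

assert lookup(id=1) == "ok"        # read: no gate
try:
    delete(id=1)                   # write: blocked before execution
except PermissionError:
    pass
assert audit_log == [("delete_record", {"id": 1})]
```

Because the gate lives in the wrapper rather than in any provider's runtime, the same policy applies whether the task executes on OpenAI, Anthropic, or Azure.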


One counterargument deserves acknowledgment: not every agent deployment needs stage three. A read-only summarization agent with no tool access and no write permissions may rationally stop at stage one. The sequencing failure this audit addresses is not that monitoring exists. It is that enterprises running agents with write access, shared credentials, and agent-to-agent delegation are treating monitoring as sufficient. For those deployments, stage one is not a strategy. It is a gap.

Allianz shows stage-three in production

Allianz, one of the world’s largest insurance and asset management companies, is running Claude Managed Agents across insurance workflows, with Claude Code deployed to technical teams and a dedicated AI logging system for regulatory transparency, per Anthropic’s April 8 announcement. Asana, Rakuten, Sentry, and Notion are in production on the same beta. Stage-three isolation, per-agent permissioning, and execution-chain auditability are deployable now, not roadmap. The gating question is whether the enterprise has sequenced the work to use them.

The 90-day remediation sequence

Days 1–30: Inventory and baseline. Map every agent to a named owner. Log all tool calls. Revoke shared API keys. Deploy read-only monitoring across all agent API traffic. Run mcp-scan against every registered MCP server. CrowdStrike detects 1,800 AI applications across enterprise endpoints; your inventory should be equally comprehensive. Output: agent registry with permission matrix, MCP scan report.
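The day-30 deliverable, an agent registry with a permission matrix, can start as something very small; completeness matters more than tooling. A sketch of the minimum fields, with all field names and example entries illustrative:

```python
# Minimal agent registry: every agent maps to a named owner, a per-agent
# identity, and an explicit permission set. Shared keys fail the audit.
# Field names and the example records are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class AgentRecord:
    agent_id: str
    owner: str                          # named human accountable for the agent
    identity: str                       # per-agent credential, never shared
    permissions: set = field(default_factory=set)

registry = [
    AgentRecord("summarizer-01", "j.doe", "svc-summarizer-01", {"read:docs"}),
    AgentRecord("billing-bot", "a.lee", "shared-api-key", {"read:db", "write:db"}),
]

def audit(records: list[AgentRecord]) -> list[str]:
    """Flag records that violate the day-30 baseline."""
    findings = []
    for r in records:
        if r.identity.startswith("shared-"):
            findings.append(f"{r.agent_id}: shared credential must be revoked")
        if any(p.startswith("write:") for p in r.permissions):
            findings.append(f"{r.agent_id}: write scope needs approval workflow")
    return findings

assert audit(registry) == [
    "billing-bot: shared credential must be revoked",
    "billing-bot: write scope needs approval workflow",
]
```

An inventory in this shape also feeds the day-60 work directly: the write-scope findings are the list of agents that need approval workflows.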

Days 31–60: Enforce and scope. Assign scoped identities to every agent. Deploy tool-call approval workflows for write operations. Integrate agent activity logs into existing SIEM. Run a tabletop exercise: What happens when an agent spawns an agent? Conduct a canary-token test from the prescriptive matrix. Output: IAM policy set, approval workflow, SIEM integration, canary-token test results.

Days 61–90: Isolate and test. Sandbox high-risk agent workloads (PHI, PII, financial transactions). Enforce per-session least privilege. Require human sign-off for agent-to-agent delegation. Red-team the isolation boundary using the stage-three detection test from the matrix. Output: sandboxed execution environment, red-team report, board-ready risk summary with regulatory exposure mapped to HIPAA tier and FINRA guidance.
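The stage-three red-team check, spawned agents inherit nothing, is easy to express as an invariant and test directly. A sketch under the same zero-trust delegation rule described above; class and capability names are hypothetical:

```python
# Zero-trust delegation: a child agent starts with an empty permission set,
# and every capability must be granted explicitly with human sign-off.
# Class, method, and capability names are illustrative.

class Agent:
    def __init__(self, name: str, permissions=None):
        self.name = name
        self.permissions = set(permissions or [])

    def spawn(self, child_name: str) -> "Agent":
        # Deliberately does NOT copy self.permissions: inheritance is the bug
        # the stage-three test exists to catch.
        return Agent(child_name, permissions=set())

def grant(child: Agent, capability: str, human_approved: bool) -> None:
    if not human_approved:
        raise PermissionError(f"grant of '{capability}' requires human sign-off")
    child.permissions.add(capability)

parent = Agent("agent-a", {"read:db", "write:db", "admin:policy"})
child = parent.spawn("agent-b")
assert child.permissions == set()          # inherits nothing by default

grant(child, "read:db", human_approved=True)
assert child.permissions == {"read:db"}    # explicit grant only

try:
    grant(child, "admin:policy", human_approved=False)
except PermissionError:
    pass
assert "admin:policy" not in child.permissions
```

If the equivalent check against your real delegation path ever finds a child holding a permission no human granted, that is the ASI08 escalation path from the matrix.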

What changes in the next 30 days

EU AI Act Article 14 human-oversight obligations take effect August 2, 2026. Programs without named owners and execution-trace capability face enforcement risk, not merely operational risk.

Anthropic’s Claude Managed Agents is in public beta at $0.08 per session-hour. GA timing, production SLAs, and final pricing have not been announced.

OpenAI’s Agents SDK will ship TypeScript support for its sandbox and harness capabilities in a future release, per the company’s April 15 announcement. The stage-three sandbox becomes available to JavaScript agent stacks when it does.

What the sequence requires

McKinsey’s 2026 AI Trust Maturity Survey pegs the average enterprise at 2.3 out of 4.0 on its RAI maturity model, up from 2.0 in 2025 but still an enforcement-stage number; only one-third of the ~500 organizations surveyed report maturity levels of three or higher in governance. Seventy percent have not finished the transition to stage three. ARMO’s progressive enforcement methodology gives you the path: behavioral profiles in observation, permission baselines in selective enforcement, and full least privilege once baselines stabilize. Monitoring investment was not wasted. It was stage one of three. The organizations stuck in the data treated it as the destination.

The budget data makes the constraint explicit. The share of enterprises reporting flat AI security budgets doubled from 7.9% in January to 16% in February in VentureBeat’s survey, with the March directional reading at 20%. Organizations expanding agent deployments without increasing security investment are accumulating security debt at machine speed. Meanwhile, the share reporting no agent security tooling at all fell from 13% in January to 5% in March. Progress, but one in twenty enterprises running agents in production still has zero dedicated security infrastructure around them.

About this research

Total qualified respondents: 108. VentureBeat Pulse AI Security and Trust is a three-wave VentureBeat survey run January 6 through March 15, 2026. Qualified sample (organizations 100+ employees): January n=38, February n=50, March n=20. Primary analysis runs from January to February; March is directional. Industry mix: Tech/Software 52.8%, Financial Services 10.2%, Healthcare 8.3%, Education 6.5%, Telecom/Media 4.6%, Manufacturing 4.6%, Retail 3.7%, other 9.3%. Seniority: VP/Director 34.3%, Manager 29.6%, IC 22.2%, C-Suite 9.3%.
