At RSAC 2026, four keynotes independently reached the same conclusion: zero trust principles must extend to AI. Microsoft’s Vasu Jakkal emphasized this extension, while Cisco’s Jeetu Patel advocated shifting from access control to action control, describing agents as “supremely intelligent, but with no fear of consequence” in a VentureBeat interview. CrowdStrike’s George Kurtz pinpointed AI governance as the largest gap in enterprise technology, and Splunk’s John Morgan argued for a trust and governance model for agents. Four companies, four stages, one consensus: this is the central issue.
In a separate VentureBeat interview during RSAC, Matt Caulfield, VP of Product for Identity and Duo at Cisco, stated, “While the concept of zero trust is good, we need to take it a step further. It’s not just about authenticating once and then letting the agent run wild. It’s about continuously verifying and scrutinizing every single action the agent’s trying to take, because at any moment, that agent can go rogue.”
A staggering 79% of organizations have already implemented AI agents, according to PwC’s 2025 AI Agent Survey. However, as reported in the Gravitee State of AI Agent Security 2026 report, only 14.4% have obtained complete security approval for their entire AI agent fleet. A CSA survey presented at RSAC indicated that merely 26% have AI governance policies in place. The CSA’s Agentic Trust Framework highlights the gap between rapid deployment and security readiness, labeling it a governance emergency.
Industry leaders at RSAC concurred on the problem, yet two companies offered distinct architectural solutions, exposing where real risks lie.
The Monolithic Agent Issue for Security Teams
In many enterprises, agents are deployed as monolithic containers, performing functions, executing code, and storing credentials in a single process. This design assumes mutual trust among all components. OAuth tokens, API keys, and git credentials coexist in an environment where code is executed immediately after it’s written.
A successful prompt injection hands attackers everything: they can exfiltrate tokens, spawn sessions, and compromise not just the agent but the entire container and every service it connects to.
A CSA and Aembit survey of 228 IT and security professionals reveals that 43% use shared service accounts for agents, 52% rely on workload identities instead of agent-specific credentials, and 68% cannot differentiate agent activity from human activity in logs. No single function takes ownership of AI agent access; security teams point to developers, and developers point back to security.
CrowdStrike CTO Elia Zaitsev, in an interview with VentureBeat, noted the familiarity of this pattern: “Securing agents is similar to securing highly privileged users. They have identities, access systems, and take actions. There’s rarely a single solution. It’s a defense in depth strategy.”
During his RSAC keynote, CrowdStrike CEO George Kurtz highlighted the ClawHavoc campaign, targeting the OpenClaw agentic framework. Named by Koi Security on February 1, 2026, Antiy CERT confirmed 1,184 malicious skills tied to 12 publisher accounts. Snyk’s ToxicSkills research showed 36.8% of 3,984 ClawHub skills scanned had security flaws, with 13.4% rated critical. The average breakout time fell to 29 minutes, with the fastest observed at 27 seconds, as reported in the CrowdStrike 2026 Global Threat Report.
Anthropic’s Approach: Separating the Brain from the Hands
Anthropic’s Managed Agents, launched on April 8 in public beta, divide each agent into three mutually distrusting components: a brain (Claude and its decision-routing harness), hands (disposable Linux containers for code execution), and a session (an external append-only event log).
This separation mirrors traditional software patterns such as microservices and serverless functions. Credentials never enter the sandbox; OAuth tokens are stored in an external vault. When the agent calls an MCP tool, it sends a session-bound token to a proxy, which retrieves real credentials from the vault for the external call, ensuring the agent never sees the actual token. For security directors, a compromised sandbox yields nothing for an attacker to reuse.
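The token-exchange pattern is easier to see in code. The sketch below is illustrative only, not Anthropic’s implementation; every name in it (`Vault`, `ToolProxy`, `issue_session_token`) is hypothetical. The point it demonstrates is structural: the sandbox only ever holds an opaque, session-bound handle, and the real credential is attached by a proxy outside the sandbox at call time.

```python
import secrets


class Vault:
    """External credential store; real tokens never enter the sandbox."""

    def __init__(self):
        self._real = {}      # service -> real credential
        self._handles = {}   # opaque session handle -> session_id

    def store(self, service, credential):
        self._real[service] = credential

    def issue_session_token(self, session_id):
        # Opaque, session-bound handle: the only "credential" the agent holds.
        handle = secrets.token_urlsafe(16)
        self._handles[handle] = session_id
        return handle


class ToolProxy:
    """Sits between the sandbox and external services, swapping handles
    for real credentials at call time."""

    def __init__(self, vault):
        self.vault = vault

    def call_tool(self, session_handle, service, request):
        if session_handle not in self.vault._handles:
            raise PermissionError("unknown or revoked session handle")
        credential = self.vault._real[service]  # the agent never sees this
        # Outbound call stubbed; a real proxy would attach `credential` here.
        return {"service": service, "request": request, "authed": bool(credential)}
```

A compromised sandbox in this model leaks only the handle, which is useless outside the proxy and dies with the session.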
The security improvement came alongside a performance enhancement. By decoupling the brain from the hands, inference can start before the container boots, reducing the median time to first token by roughly 60%. This design not only improves security but also enhances speed, addressing enterprise concerns about security-induced latency.
Session durability is another benefit. In a monolithic setup, a container crash results in complete state loss. With Managed Agents, the session log persists outside the brain and hands. If the harness crashes, a new one can boot up, read the event log, and continue with no loss of state, yielding productivity gains over time. Managed Agents offer built-in session tracing via the Claude Console.
Pricing is set at $0.08 per session-hour of active runtime, excluding idle time, plus standard API token fees. Security directors can now calculate the cost of agent compromise per session-hour against the architectural controls.
Nvidia’s Approach: Securing and Monitoring the Sandbox
Released on March 16 in early preview, Nvidia’s NemoClaw takes a different approach: rather than separating the agent from its execution environment, it encases the entire agent in five stacked security layers and monitors every action. Anthropic and Nvidia are the only vendors to have publicly launched zero-trust agent architectures so far, with others in development.
NemoClaw implements five enforcement layers between the agent and the host. Sandboxed execution uses Landlock, seccomp, and network namespace isolation at the kernel level. Default-deny outbound networking requires explicit operator approval for each external connection through YAML-based policy. Access operates under least privilege, and a privacy router directs sensitive queries to locally run Nemotron models, cutting token costs and keeping sensitive data on the host. Intent verification matters most for security teams: OpenShell’s policy engine intercepts every agent action before it reaches the host. Evaluating NemoClaw means weighing that runtime visibility against operator staffing costs.
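The default-deny networking layer reduces to a simple rule: a connection is permitted only if its destination appears on an operator-approved list, and everything else falls through to deny. A minimal sketch of that check, with a dict standing in for a parsed YAML policy file (the policy shape and function name are assumptions, not NemoClaw’s format):

```python
from urllib.parse import urlparse

# Approved destinations, as they might look after loading a YAML policy file.
# Anything not explicitly listed falls through to the default-deny rule.
POLICY = {
    "outbound": {
        "default": "deny",
        "allow": ["api.github.com", "pypi.org"],
    }
}


def check_outbound(url, policy=POLICY):
    """Allow the connection only if the host is explicitly approved."""
    host = urlparse(url).hostname
    if host in policy["outbound"]["allow"]:
        return True
    return policy["outbound"]["default"] == "allow"  # deny by default
```

The operational cost described below follows directly from this shape: every new legitimate endpoint an agent needs is a policy edit and an approval.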
The agent remains unaware of its containment within NemoClaw. Actions within policy proceed normally, while out-of-policy actions face configurable denials.
Observability is a strong point. A real-time Terminal User Interface (TUI) logs every action, network request, and blocked connection, providing a complete audit trail. The cost is significant, however: operator load grows linearly with agent activity, since each new endpoint requires manual approval. High-fidelity observation comes at the expense of autonomy, which becomes expensive quickly in production environments running many agents.
Durability is an overlooked issue. Agent state is stored as files inside the sandbox, and if the sandbox fails, the state vanishes. There is no external session recovery mechanism, posing a durability risk for long-running agent tasks. Security teams must factor this risk into deployment planning before production.
The Credential Proximity Gap
Both architectures improve upon the monolithic default, but they diverge on a critical point: the proximity of credentials to the execution environment.
Anthropic completely removes credentials from the blast radius. If an attacker breaches the sandbox through prompt injection, they obtain a disposable container with no tokens and no persistent state. Exfiltrating credentials necessitates a two-step attack: influencing the brain’s reasoning and convincing it to act through a worthless container. Single-step credential theft is structurally impossible.
NemoClaw limits the blast radius and monitors all internal actions. Five security layers restrict lateral movement, and default-deny networking blocks unauthorized connections. However, the agent and the code it generates share the same sandbox. While Nvidia’s privacy router keeps inference credentials on the host, messaging and integration tokens (Telegram, Slack, Discord) are injected as runtime environment variables. Inference API keys are proxied through the router rather than passed into the sandbox directly. Exposure therefore varies by credential type: credentials are policy-gated, not structurally removed.
This distinction is crucial for indirect prompt injection, where adversaries embed instructions in content the agent queries during legitimate work. A poisoned web page or manipulated API response could introduce malicious instructions into the reasoning chain as trusted context, with proximity to execution.
In Anthropic’s system, indirect injection can influence reasoning but cannot access the credential vault. In NemoClaw’s system, injected context is adjacent to both reasoning and execution within the shared sandbox, creating the most significant divergence between the two designs.
David Brauchler of NCC Group, Technical Director and Head of AI/ML Security, supports gated agent architectures based on trust segmentation principles, where AI systems inherit the trust level of the data they process. Untrusted input leads to restricted capabilities. Both Anthropic and Nvidia are moving toward this model, though neither has fully achieved it.
The Zero-Trust Architecture Audit for AI Agents
The audit framework encompasses three vendor patterns across six security dimensions, distilled into five key priorities:
- Audit every deployed agent for the monolithic pattern. Identify any agent that holds OAuth tokens within its execution environment. According to CSA data, 43% use shared service accounts, making these the primary targets.
- Require credential isolation in agent deployment RFPs. Clarify whether the vendor structurally removes credentials or uses policy gates. Both methods reduce risk but differ in effectiveness and failure modes.
- Test session recovery before production. Interrupt a sandbox mid-task and check whether state persists. If it doesn’t, long-duration tasks risk data loss that grows with task length.
- Plan for the observability model. Anthropic’s console tracing fits into current observability workflows, while NemoClaw’s TUI needs a dedicated operator. Staffing needs vary accordingly.
- Monitor indirect prompt injection mitigations. Neither architecture fully addresses this vulnerability. Anthropic limits the impact of a successful injection, while NemoClaw intercepts malicious actions but not data. Demand vendor commitments to improve here.
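The first audit item can be started mechanically. Below is a trivial, hypothetical helper (not a vendor tool) that flags credential-shaped environment variable names in a container’s environment, a cheap first pass for spotting the monolithic anti-pattern before a manual review:

```python
import re

# Name patterns that commonly mark credential-bearing environment variables.
CREDENTIAL_PATTERN = re.compile(r"TOKEN|SECRET|API_KEY|PASSWORD|OAUTH", re.I)


def flag_monolithic_credentials(container_env):
    """Return the env var names suggesting credentials live inside the
    agent's execution environment -- the monolithic anti-pattern."""
    return sorted(name for name in container_env
                  if CREDENTIAL_PATTERN.search(name))
```

A name-based scan produces false positives and misses credentials stored in files or baked into images, so treat the output as a triage list, not a verdict.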
With two architectures now shipping, zero trust for AI agents has moved from research topic to practical necessity. The monolithic default is a liability, and the 65-point gap between deployment speed and security approval is where future breaches will most likely originate.

