An attacker inserts a hidden directive within a forwarded email, which an OpenClaw agent then summarizes as part of its routine tasks. This covert instruction commands the agent to send credentials to an external endpoint. The agent complies, executing a sanctioned API call with its own OAuth tokens.
The firewall logs an HTTP 200, and endpoint detection and response (EDR) records a normal process. No alert fires anywhere in the stack. That blind spot became obvious when six security teams shipped six OpenClaw defense tools within 14 days and three attack surfaces still survived.
The situation is more severe than many security teams realize. Token Security reports that 22% of enterprise customers have employees using OpenClaw without IT’s consent. Meanwhile, Bitsight found over 30,000 public OpenClaw instances in just two weeks, a sharp increase from approximately 1,000. Additionally, Snyk’s ToxicSkills audit reveals that 36% of ClawHub skills have security vulnerabilities.
One key figure in addressing these issues is Jamieson O’Reilly, founder of Dvuln and a security adviser for the OpenClaw project. His early research into credential leaks was among the first alerts to the community. He has collaborated with founder Peter Steinberger to implement dual-layer malicious skill detection and is now advocating a capabilities specification proposal within the agentskills standards body.
O’Reilly acknowledges the security gaps, telling VentureBeat, “It wasn’t designed from the ground up to be as secure as possible,” and emphasizes that they are addressing these issues without making excuses.
Three attack surfaces your stack cannot see
The first vulnerability is runtime semantic exfiltration, where malicious behavior is encoded in meaning rather than binary patterns, eluding current defense mechanisms.
Palo Alto Networks has mapped OpenClaw to every category in the OWASP Top 10 for Agentic Applications, highlighting what researcher Simon Willison terms a “lethal trifecta”: access to private data, exposure to untrusted content, and external communication within a single process. EDR sees the agent’s actions as normal because the credentials are legitimate and the API calls are authorized. Nothing in the current defense stack tracks what the agent decided to do with that access, or why.
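To make the trifecta concrete, the Python sketch below flags any agent whose granted capabilities cover all three conditions at once. The capability labels and the `AgentProfile` type are hypothetical, invented for illustration; they are not part of OpenClaw or any vendor tool.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the three legs of the trifecta.
PRIVATE_DATA = "private_data_access"
UNTRUSTED_INPUT = "untrusted_content"
EXTERNAL_COMMS = "external_communication"
TRIFECTA = frozenset({PRIVATE_DATA, UNTRUSTED_INPUT, EXTERNAL_COMMS})


@dataclass
class AgentProfile:
    """Assumed inventory record: an agent name plus the capabilities it holds."""
    name: str
    capabilities: set = field(default_factory=set)


def has_lethal_trifecta(agent: AgentProfile) -> bool:
    """Return True when a single agent holds all three risky capabilities."""
    return TRIFECTA.issubset(agent.capabilities)


def missing_legs(agent: AgentProfile) -> set:
    """Return the trifecta capabilities this agent does NOT hold."""
    return set(TRIFECTA - agent.capabilities)
```

An inventory pass could call `has_lethal_trifecta` on each agent and prioritize those that return `True`. Removing any one leg, for example by routing outbound calls through a separate approved process, breaks the combination.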
The second issue is cross-agent context leakage. When agents or skills share session context, a prompt injection in one channel can influence decisions across the network. Giskard researchers demonstrated this in January 2026, showing agents appending attacker-controlled instructions to their own workspace files, where the instructions lay dormant until an external command triggered them. Palo Alto Networks researchers Sailesh Mishra and Sean P. Morgan caution that persistent memory turns these into stateful attacks with delayed execution: a hidden instruction can activate weeks after it was planted.
O’Reilly considers cross-agent context leakage the hardest gap to close because it is tied to prompt injection, a vulnerability affecting all LLM-powered agent systems. He explains that when context flows unchecked between agents and skills, a single injected prompt can corrupt the entire chain. Currently, no tool enforces cross-agent context isolation. IronClaw sandboxes individual skill execution, and ClawSec monitors file integrity, but neither tracks context propagation within workflows.
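As a rough picture of what context-propagation tracking might involve, the Python sketch below tags every context entry with its origin and, at handoff, passes untrusted entries to the next agent only as labeled data, never as instructions. All names here are hypothetical. This is not how OpenClaw, IronClaw, or ClawSec work, and labeling alone does not defeat prompt injection; it only makes the flow visible.

```python
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    OPERATOR = "operator"  # typed by the human who owns the agent
    EXTERNAL = "external"  # email bodies, web pages, skill output


@dataclass(frozen=True)
class ContextEntry:
    text: str
    origin: Origin
    source_agent: str


def handoff(entries, to_agent: str) -> dict:
    """Split context for a downstream agent into instructions and quoted data."""
    instructions, quoted = [], []
    for entry in entries:
        if entry.origin is Origin.OPERATOR:
            instructions.append(entry.text)
        else:
            # Untrusted text travels with a provenance label attached.
            quoted.append(f"[untrusted from {entry.source_agent}] {entry.text}")
    return {"agent": to_agent, "instructions": instructions, "data": quoted}
```

The design choice is that provenance is attached when content enters the system and is never dropped, so a downstream agent (or an auditor) can always tell which text came from an operator and which arrived from outside.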
The third vulnerability concerns agent-to-agent trust chains lacking mutual authentication. OpenClaw agents delegate tasks to other agents or external servers without identity verification. A compromised agent in a workflow can exploit the trust of every agent it contacts. By compromising one agent through prompt injection, an attacker can issue commands to all agents in the chain using pre-established trust relationships.
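For illustration, authenticated delegation could look like the Python sketch below, which signs each delegated task with HMAC-SHA256 under a key shared between the two agents and rejects anything that fails verification. The message layout and function names are assumptions, not an OpenClaw API. Note the limit: this stops impersonation, but an agent that has itself been compromised through prompt injection still holds a valid key and will still sign.

```python
import hashlib
import hmac
import json


def sign_task(shared_key: bytes, sender: str, task: dict) -> dict:
    """Wrap a delegated task with the sender identity and an HMAC-SHA256 tag."""
    payload = json.dumps({"sender": sender, "task": task}, sort_keys=True)
    tag = hmac.new(shared_key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def verify_task(keys_by_sender: dict, message: dict) -> dict:
    """Return the task if the tag matches the claimed sender's key, else raise."""
    claimed = json.loads(message["payload"])
    key = keys_by_sender.get(claimed["sender"])
    if key is None:
        raise PermissionError("unknown sender")
    expected = hmac.new(key, message["payload"].encode("utf-8"), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many characters matched via timing.
    if not hmac.compare_digest(expected, message["tag"]):
        raise PermissionError("signature mismatch")
    return claimed["task"]
```

A receiving agent would call `verify_task` before acting on any delegated work, so a task from an unknown sender or with a tampered payload is dropped instead of executed.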
Microsoft’s security team has labeled OpenClaw as untrusted code execution with persistent credentials, noting that it ingests untrusted text, downloads and executes skills from external sources, and acts with whatever credentials it holds. Kaspersky’s enterprise risk assessment adds that agents on personal devices are a threat because those devices store VPN configurations, browser tokens, and corporate service credentials. The Moltbook social network for OpenClaw agents showed the risk in practice when Wiz researchers discovered a misconfigured database exposing 1.5 million API authentication tokens and 35,000 email addresses.
What 14 days of emergency patching actually closed
The defense ecosystem responded with three strategies. Two tools aim to fortify OpenClaw internally. ClawSec, from Prompt Security, a SentinelOne company, continuously verifies agents, monitors file drift, and enforces zero-trust egress by default. Meanwhile, OpenClaw’s VirusTotal integration, developed by Steinberger, O’Reilly, and VirusTotal’s Bernardo Quintero, scans and blocks known malicious ClawHub packages.
Two tools involve architectural rewrites. IronClaw, developed by NEAR AI, is a Rust-based reimplementation that runs all untrusted tools in WebAssembly sandboxes, where tool code starts with zero permissions and must explicitly request access to networks, filesystems, or APIs. Credentials are injected at the host boundary, and agent code never touches them. Carapace, an independent project, reverses risky OpenClaw defaults by implementing fail-closed authentication and OS-level subprocess sandboxing.
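The deny-by-default model with host-side credential injection can be sketched in a few lines of Python. This is an illustration of the idea only, not IronClaw’s implementation (which is Rust and WebAssembly); the `HostBoundary` class, its methods, and the example host are all hypothetical.

```python
from urllib.parse import urlparse


class HostBoundary:
    """Holds secrets and grants; tool code never reads the secrets directly."""

    def __init__(self, secrets: dict):
        self._secrets = secrets  # host name -> token, e.g. {"api.example.com": "..."}
        self._grants = {}        # tool name -> set of hosts it may contact

    def grant_network(self, tool: str, host: str) -> None:
        """Record an explicit grant; tools start with none."""
        self._grants.setdefault(tool, set()).add(host)

    def build_request(self, tool: str, url: str) -> dict:
        """Check the grant, then attach the credential on the host side."""
        host = urlparse(url).hostname
        if host not in self._grants.get(tool, set()):
            raise PermissionError(f"{tool} has no network grant for {host}")
        headers = {}
        token = self._secrets.get(host)
        if token is not None:
            headers["Authorization"] = f"Bearer {token}"
        # A real host would send this request itself and hand back only the
        # response; the dict is returned here so the sketch stays offline.
        return {"url": url, "headers": headers}
```

A tool with no recorded grant gets a `PermissionError` on its first network call, and even a granted tool only names the URL it wants; the token is attached after the request has left the tool’s hands.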
Two other tools focus on scanning and auditability. Cisco’s open-source scanner integrates static, behavioral, and LLM semantic analysis, while NanoClaw reduces the codebase to about 500 lines of TypeScript, running each session in an isolated Docker container.
O’Reilly describes the supply chain failure bluntly: “Right now, the industry basically created a brand-new executable format written in plain human language and forgot every control that should come with it.” He shipped the VirusTotal integration before skills.sh, a larger repository, adopted a similar approach. Koi Security’s audit underscores the urgency: the 341 malicious skills found in early February had grown to 824 out of 10,700 on ClawHub (roughly 7.7%) by mid-month. The ClawHavoc campaign embedded the Atomic Stealer macOS infostealer in skills masquerading as cryptocurrency tools, harvesting crypto wallets, SSH credentials, and browser passwords.
OpenClaw Security Defense Evaluation Matrix
| Dimension | ClawSec | VirusTotal Integration | IronClaw | Carapace | NanoClaw | Cisco Scanner |
| --- | --- | --- | --- | --- | --- | --- |
| Discovery | Agents only | ClawHub only | No | mDNS scan | No | No |
| Runtime Protection | Config drift | No | WASM sandbox | OS sandbox + prompt guard | Container isolation | No |
| Supply Chain | Checksum verify | Signature scan | Capability grants | Ed25519 signed | Manual audit (~500 LOC) | Static + LLM + behavioral |
| Credential Isolation | No | No | WASM boundary injection | OS keychain + AES-256-GCM | Mount-restricted dirs | No |
| Auditability | Drift logs | Scan verdicts | Permission grant logs | Prometheus + audit log | 500 lines total | Scan reports |
| Semantic Monitoring | No | No | No | No | No | No |

Source: VentureBeat analysis based on published documentation and security audits, March 2026.
The capabilities spec that treats skills like executables
O’Reilly has submitted a skills specification update to the agentskills maintainers, a group led primarily by Anthropic and Vercel, and the proposal is currently under discussion. It would require each skill to declare its capabilities explicitly, and visibly to the user, before execution, much like a mobile app permission manifest. The security community has responded positively because the proposal treats skills as executables.
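The article does not give the proposal’s field names, so the Python sketch below uses an invented manifest layout purely to show the shape of the idea: a skill declares its capabilities up front, the user sees them before anything runs, and any capability outside the declaration is refused. The `capabilities` key, the vocabulary, and the function names are all assumptions.

```python
import json

# Hypothetical vocabulary; the real specification may define different terms.
KNOWN_CAPABILITIES = {"network", "filesystem", "shell", "credentials"}


def load_manifest(raw: str) -> dict:
    """Parse a skill manifest and reject capabilities outside the vocabulary."""
    manifest = json.loads(raw)
    declared = set(manifest.get("capabilities", []))
    unknown = declared - KNOWN_CAPABILITIES
    if unknown:
        raise ValueError(f"unknown capabilities: {sorted(unknown)}")
    return {"name": manifest["name"], "capabilities": declared}


def consent_prompt(manifest: dict) -> str:
    """Build the text shown to the user before the skill first runs."""
    caps = ", ".join(sorted(manifest["capabilities"])) or "nothing"
    return f"Skill '{manifest['name']}' requests: {caps}. Allow?"


def is_permitted(manifest: dict, capability: str) -> bool:
    """Runtime check: only declared capabilities may be exercised."""
    return capability in manifest["capabilities"]
```

Under this scheme a skill that presents itself as a price tracker but declares `credentials` and `shell` would have to say so in the consent prompt, which is the same visibility a mobile permission dialog provides.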
O’Reilly stated, “The other two gaps can be meaningfully hardened with better isolation primitives and runtime guardrails, but truly closing context leakage requires deep architectural changes to how untrusted multi-agent memory and prompting are handled.” He emphasized that the new capabilities spec is the first proactive step towards addressing these challenges, rather than applying temporary fixes.
What to do on Monday morning
Assume OpenClaw is already present in your organization. The 22% shadow deployment rate is just the starting point. These six steps can help mitigate the risks and document the unresolved issues.
- Inventory what is running. Scan for WebSocket traffic on port 18789 and mDNS broadcasts on port 5353. Monitor corporate authentication logs for new App ID registrations, OAuth consent events, and Node.js User-Agent strings. Any instance running a version prior to v2026.2.25 is susceptible to the ClawJacked remote takeover flaw.
- Mandate isolated execution. Ensure no agent operates on a device connected to production infrastructure. Require deployment in containers with scoped credentials and explicit tool whitelists.
- Deploy ClawSec on every agent instance and run every ClawHub skill through VirusTotal and Cisco’s open-source scanner before installation. Both tools are free. Treat skills as third-party executables, as that is their nature.
- Require human-in-the-loop approval for sensitive agent actions. OpenClaw’s exec approval settings have three modes: security, ask, and allowlist. Set sensitive tools to ask, so the agent pauses and requests confirmation before executing shell commands, writing to external APIs, or modifying files outside its workspace. Any action that involves credentials, changes configurations, or sends data externally should wait for human approval.
- Map the three surviving gaps against your risk register. Record whether your organization accepts, mitigates, or blocks each risk: runtime semantic exfiltration, cross-agent context leakage, and agent-to-agent trust chains.
- Bring the evaluation table to your next board meeting. Present it not as an AI experiment but as a critical bypass of your existing DLP and IAM investments. Every agentic AI platform that follows will undergo this same defensive cycle. The framework will apply to every agent tool your team assesses in the next two years.
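For the inventory step, once instance versions have been collected, flagging the ones that predate v2026.2.25 is a small comparison. The Python sketch below assumes versions are dotted integers with an optional leading `v`, and that the inventory is a simple mapping of hostname to version string; both are assumptions about your own data, not an OpenClaw interface.

```python
# First version the article describes as not susceptible to ClawJacked.
PATCHED = (2026, 2, 25)


def parse_version(tag: str) -> tuple:
    """Turn 'v2026.2.25' or '2026.2.25' into (2026, 2, 25)."""
    parts = tag.strip().lstrip("vV").split(".")
    return tuple(int(p) for p in parts)


def is_vulnerable(tag: str) -> bool:
    """True when the version is strictly earlier than the patched release."""
    return parse_version(tag) < PATCHED


def triage(inventory: dict) -> list:
    """Return hostnames whose reported version predates the patched release."""
    return sorted(host for host, tag in inventory.items() if is_vulnerable(tag))
```

Comparing integer tuples instead of raw strings matters here: as text, `"2026.2.9"` sorts after `"2026.2.25"`, which would wrongly mark an older build as safe.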
Your security stack is designed to catch malicious code, not agents that carry out malicious instructions through legitimate API calls. That is exactly where these three gaps sit.

