In 2024, researchers from the University of Illinois discovered that GPT-4 could autonomously exploit 87% of a curated 15-vulnerability one-day dataset when provided with a common vulnerabilities and exposures (CVE) description. Without this description, it could only exploit 7%. This discovery highlighted a “margin of safety” for the industry, indicating that while AI could exploit known vulnerabilities, it lacked the ability to discover them.
However, on April 7, Anthropic announced that the Claude Mythos Preview model had closed this margin. The model autonomously discovered thousands of zero-day vulnerabilities across major operating systems and browsers. Mythos also scored 83.1% on the CyberGym vulnerability reproduction benchmark. In a campaign targeting OpenBSD, the total compute cost across 1,000 scaffold runs was under $20,000.
The timeline for exploitation is rapidly decreasing. Langflow’s CVE-2026-33017 (CVSS 9.8) was exploited 20 hours after disclosure without a public proof-of-concept. Marimo’s CVE-2026-39987 (CVSS 9.3) was compromised in 9 hours and 41 minutes.
The defense infrastructure that most organizations depend on was not designed for such rapid exploitation timelines. Rapid7’s 2026 threat landscape report notes that the median time from CVE publication to CISA’s known exploited vulnerabilities (KEV) listing is five days. Google’s M-Trends 2026 report indicates that exploitation often occurs before a patch is released. Against that five-day median, the Langflow and Marimo advisories were exploited in 20 hours and under 10 hours, respectively.
The belief that a patch window provides safety because exploitation takes time is now obsolete. Here are the necessary steps to adapt:
Adopt a Three-Layer Filter Instead of CVSS-Only Prioritization
Most vulnerability management programs still prioritize vulnerabilities solely by CVSS score. CVSS measures a vulnerability’s “theoretical” severity without accounting for whether it is being actively exploited or how quickly it could be weaponized. For instance, a CVSS 8.8 vulnerability with a history of active exploitation (like Docker’s CVE-2026-34040) is often ranked lower than a CVSS 9.8 vulnerability that may never be exploited.
A recent study validated against 28,377 real-world vulnerabilities suggests a more effective approach: a three-layer decision tree using CISA KEV status, Exploit Prediction Scoring System (EPSS) scores, and CVSS to form a unified prioritization filter.
Three-Layer Vulnerability Prioritization Filter
| Layer | Data source | Threshold | Action | SLA |
| --- | --- | --- | --- | --- |
| 1. Active exploitation | CISA KEV catalog | Listed | Immediate patching | Hours |
| 2. Predicted exploitation | EPSS via FIRST.org | Score ≥ 0.088 | Escalate to Tier 0 pipeline | 24 hours |
| 3. Severity baseline | CVSS via NVD | Score ≥ 7.0 | Typical remediation | Per policy |
Validated result: 18x efficiency gain, 85.6% coverage of exploited vulnerabilities, and approximately 95% reduction in urgent remediation workload. All data sources are open and free.
This filter can be fully automated. A scheduled script can query the CISA KEV API, the EPSS API from FIRST.org, and the NVD for every published CVE, then match the results against your asset inventory. The human role should be approver, not initiator.
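The decision logic in the table can be sketched as a small pure function. This is a minimal illustration, not a reference implementation: the `VulnSignal` type, the `prioritize` function, and the "Backlog" fallthrough are hypothetical names introduced here, and fetching the KEV, EPSS, and CVSS values from their respective sources is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Thresholds copied from the three-layer table in the text.
EPSS_THRESHOLD = 0.088
CVSS_THRESHOLD = 7.0


@dataclass(frozen=True)
class VulnSignal:
    """Inputs for one CVE (hypothetical container; fetching is out of scope)."""
    cve_id: str
    in_kev: bool            # True if listed in the CISA KEV catalog
    epss: Optional[float]   # EPSS probability in [0, 1], None if unscored
    cvss: Optional[float]   # CVSS base score in [0, 10], None if unscored


def prioritize(v: VulnSignal) -> Tuple[str, str]:
    """Return (action, SLA) by checking the layers in table order."""
    if v.in_kev:
        return ("Immediate patching", "Hours")
    if v.epss is not None and v.epss >= EPSS_THRESHOLD:
        return ("Escalate to Tier 0 pipeline", "24 hours")
    if v.cvss is not None and v.cvss >= CVSS_THRESHOLD:
        return ("Typical remediation", "Per policy")
    # Assumption: the source table does not say what happens below all
    # three thresholds, so this fallthrough label is illustrative only.
    return ("Backlog", "No SLA")
```

Because the layers are checked in order, a KEV-listed CVE is routed to immediate patching even when its CVSS score is below the severity baseline, which is the behavior CVSS-only ranking misses.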
Address the Agent Authorization Gap
The ability to quickly create exploits affects not only patch prioritization but also the configuration of controls for agent-driven systems with privileged credentials. Current authorization policies have not been evaluated against AI agent behaviors, posing a measurable risk. CVE-2026-34040 revealed that Docker’s authorization plugin architecture can silently bypass every plugin when the request body exceeds 1MB. Common AuthZ plugins (OPA, Casbin, Prisma Cloud) are unaware of this bypass, as it occurs in Docker’s middleware before reaching the plugin.
When Cyera demonstrated this vulnerability, they showed that an AI agent could infer the bypass path while completing a legitimate task without explicit instructions to exploit anything.
The Internet Engineering Task Force (IETF) is developing authorization models for agents. The document draft-klrc-aiagent-auth-01, published in March by AWS, Zscaler, Ping Identity, and OpenAI participants, suggests using the Secure Production Identity Framework for Everyone (SPIFFE) and OAuth 2.0 for AI agents to obtain dynamically provisioned, short-lived credentials.
Separately, the IETF Agent Identity Protocol draft (draft-prakash-aip-00) indicates that out of about 2,000 surveyed Model Context Protocol (MCP) servers, none had authentication.
These standards are still months or years from implementation. Meanwhile, security teams must proactively incorporate agent-level test scenarios for all authorization boundaries, including oversized requests, burst frequency, and multi-step escalation of privileged requests.
Understand Your Credential Blast Radius
A survey conducted by CSA/Zenity and released on April 16 found that 53% of organizations had experienced AI agents exceeding their intended permissions, and 47% faced a security incident involving an agent.
When AI builder tools like Flowise (CVE-2025-59528, CVSS 10.0), Langflow, or n8n are compromised, the impact extends beyond the host. These tools hold API keys to frontier models, database credentials, vector store tokens, and OAuth tokens to business systems. A compromised AI builder host represents a credential harvest that unlocks authenticated access to all connected services.
Without a credential dependency map for each AI tool host, incident response for agent compromise is guesswork. For every instance, document each credential, what it can access, and how it is rotated. Begin migrating static API keys to short-lived tokens where downstream services allow.
Five Steps for This Quarter
1. Deploy the three-layer KEV-EPSS-CVSS filter.
Replace CVSS-only prioritization with the table above. Automate data collection from all three APIs in a scheduled script that runs against your asset inventory. Expected outcome, per the study: an 18x efficiency gain, 85.6% coverage of exploited vulnerabilities, and an approximately 95% reduction in urgent remediation workload.
2. Implement event-driven patching for Tier 0 services.
Identify services in the critical exposure tier: internet-facing services, AI builder hosts, and container orchestration control planes. For this tier, trigger patching on CVE publication instead of waiting for the next maintenance window.
Goal: Deploy patches to canary within four hours of a CVE being declared critical. Use CISA KEV and EPSS feeds to trigger event-driven patching. If meeting the four-hour patching goal is impossible due to legacy dependencies, change-freeze windows, or rollback risk, apply compensating controls immediately. These include removing internet exposure to the vulnerable service, rotating credentials for the vulnerable service, disabling affected functionality (if applicable), and identifying an exception owner for the exposure until a patch can be deployed.
Allowing unbounded exposures for extended periods while awaiting a maintenance window is unacceptable.
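The four-hour goal and its fallback can be expressed as a simple decision rule. The sketch below is an assumption-laden illustration: `next_actions`, its parameters, and the action strings are hypothetical names chosen here, and the `can_meet_sla` flag stands in for whatever process your team uses to judge legacy dependencies, change freezes, or rollback risk.

```python
from datetime import datetime, timedelta, timezone
from typing import List

CANARY_SLA = timedelta(hours=4)  # four-hour canary goal from the text

# Compensating controls listed in the text, as short labels.
COMPENSATING_CONTROLS = [
    "remove internet exposure",
    "rotate credentials",
    "disable affected functionality (if applicable)",
    "assign exception owner",
]


def next_actions(declared_critical_at: datetime, now: datetime,
                 canary_patched: bool, can_meet_sla: bool) -> List[str]:
    """Return the actions a Tier 0 service needs right now."""
    if canary_patched:
        return []  # goal met, nothing further from this rule
    deadline = declared_critical_at + CANARY_SLA
    # Fall back to compensating controls as soon as the goal is known to
    # be unreachable, or once the deadline has already passed.
    if not can_meet_sla or now >= deadline:
        return list(COMPENSATING_CONTROLS)
    return ["deploy patch to canary"]
```

For example, a service declared critical at midnight UTC, still unpatched at 05:00, returns the full compensating-control list rather than a patch instruction.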
3. Test authorization boundaries at agent scale.
Create test cases for every API that AI agents can reach through an AuthZ policy. Include request bodies exceeding 1MB, 5MB, and 10MB; burst rates above 100 requests per second; and unusual parameter combinations (privileged flags, host mounts, capability additions). Also upgrade to Docker Engine 29.3.1 to fix CVE-2026-34040.
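One way to build the oversized-body cases is to pad an otherwise valid JSON payload past each size boundary and pair it with each risky parameter set. This is a sketch only: the parameter names in `RISKY_PARAMS` are hypothetical placeholders rather than fields of any specific API, 1MB is assumed to mean 1024 × 1024 bytes, and burst-rate testing is not covered here. Sending the bodies and asserting that the policy denies them is left to your own test harness.

```python
import json
from itertools import product
from typing import Dict, Iterator, Tuple

MB = 1024 * 1024  # assumption: binary megabytes
BODY_SIZES = [1 * MB, 5 * MB, 10 * MB]  # size boundaries named in the text

# Hypothetical parameter variations; substitute the fields your API accepts.
RISKY_PARAMS = [
    {"privileged": True},
    {"host_mount": "/"},
    {"cap_add": ["SYS_ADMIN"]},
]


def oversized_body(params: Dict, min_bytes: int) -> bytes:
    """Serialize params as valid JSON padded to exactly min_bytes + 1 bytes."""
    base = json.dumps({**params, "padding": ""}).encode()
    pad_len = max(0, min_bytes - len(base) + 1)
    return json.dumps({**params, "padding": "x" * pad_len}).encode()


def iter_cases() -> Iterator[Tuple[Dict, int, bytes]]:
    """Yield (params, size boundary, body) for every combination."""
    for params, size in product(RISKY_PARAMS, BODY_SIZES):
        # Generated lazily so the large bodies are not all held in memory.
        yield params, size, oversized_body(params, size)
```

Each yielded body remains parseable JSON, so a denial (or a silent pass) can be attributed to size handling rather than to a malformed request.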
4. Map the credential blast radius for all AI builder hosts.
Document each credential for each Langflow, Flowise, n8n, and custom AI pipeline instance. Classify each credential by its lifespan (static key vs. short-lived token). Identify what each credential can access. Set up alerts for any credential use from an anomalous IP address or identity.
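A credential map can start as a flat inventory that answers two questions: what does a given host unlock if compromised, and which static keys should be migrated first. The sketch below assumes a hypothetical `Credential` record and made-up host and service names; it shows the shape of the data, not a specific tool.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Credential:
    """One credential stored on an AI builder host (hypothetical schema)."""
    name: str
    host: str                      # instance that stores the credential
    kind: str                      # "static" or "short_lived"
    grants: List[str] = field(default_factory=list)  # reachable services
    rotation_runbook: str = ""     # where the rotation steps are documented


def blast_radius(creds: Iterable[Credential], host: str) -> Set[str]:
    """Every downstream service reachable if `host` is compromised."""
    return {g for c in creds if c.host == host for g in c.grants}


def static_keys(creds: Iterable[Credential]) -> List[str]:
    """Names of credentials that are candidates for short-lived tokens."""
    return sorted(c.name for c in creds if c.kind == "static")


# Illustrative inventory; all names below are invented for the example.
inventory = [
    Credential("llm-api-key", "langflow-1", "static", ["model-api"]),
    Credential("pg-password", "langflow-1", "static", ["orders-db"]),
    Credential("crm-oauth", "n8n-1", "short_lived", ["crm"]),
]
```

Calling `blast_radius(inventory, "langflow-1")` on this sample returns the model API and the orders database, which is the list an incident responder needs in the first hour.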
5. Run a shadow AI discovery scan this week.
According to the CSA data, more than half of organizations have had AI agents exceed their intended permissions, so assume yours may be among them. Check your security information and event management (SIEM) and network monitoring tools for traffic to the default ports of common AI builders: Langflow 7860, Flowise 3000, and n8n 5678. Any unauthorized instance is an unmonitored attack surface.
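The port check can be prototyped against exported flow records before it is turned into a SIEM query. The sketch assumes each record is a `(src_ip, dst_ip, dst_port)` tuple and that you maintain a set of approved hosts; `find_shadow_instances` is a hypothetical helper, not part of any product. Port 3000 in particular is shared with other common software, so treat hits as candidates to verify rather than confirmed instances.

```python
from typing import Dict, Iterable, Set, Tuple

# Default ports named in the text.
AI_BUILDER_PORTS = {7860: "Langflow", 3000: "Flowise", 5678: "n8n"}

Flow = Tuple[str, str, int]  # (src_ip, dst_ip, dst_port), assumed format


def find_shadow_instances(flows: Iterable[Flow],
                          approved_hosts: Set[str]) -> Dict[Tuple[str, int], str]:
    """Map (dst_ip, dst_port) to the suspected tool for unapproved hosts."""
    hits: Dict[Tuple[str, int], str] = {}
    for _src, dst, port in flows:
        tool = AI_BUILDER_PORTS.get(port)
        if tool is not None and dst not in approved_hosts:
            hits[(dst, port)] = tool  # repeated flows collapse to one entry
    return hits
```

A follow-up step, not shown, would be to confirm each candidate by inspecting what actually answers on that port.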
The Takeaway
AI agents are emerging, and the standards bodies are responding. The IETF has multiple drafts related to agent authentication and authorization. The Coalition for Secure AI has published its MCP Security taxonomy and Secure-by-Design principles.
However, these standards progress at the pace of standards bodies, while the exploit window is now measured in hours. Organizations that implement the three-layer filter and event-driven patching this quarter will see a measurable reduction in exposure. Those that delay will face adversaries who move in under 20 hours.
Nik Kale is a principal engineer specializing in enterprise AI platforms and security.

