A bug that lay dormant for 27 years within OpenBSD’s TCP stack has finally been discovered. Despite numerous code audits and fuzzing attempts, this flaw persisted in a system renowned for its robust security, capable of crashing any server with just two packets. Anthropic’s discovery campaign uncovered this vulnerability, spending roughly $20,000, although the actual model run that identified the issue cost less than $50.
Anthropic’s Claude Mythos Preview autonomously detected the flaw without human intervention beyond the initial prompt.
The capability jump is not incremental
In the realm of exploit writing for Firefox 147, Mythos vastly outperformed Claude Opus 4.6, achieving success 181 times compared to Opus’s 2. This represents a 90-fold enhancement in a single generation. On the SWE-bench Pro test, Mythos scored 77.8% against Opus’s 53.4%, while in CyberGym vulnerability reproduction, it reached 83.1% versus Opus’s 66.6%. Mythos completed Anthropic’s Cybench CTF with a perfect score, leading the red team to pivot towards real-world zero-day discoveries due to the lack of further evaluative challenges. It subsequently identified thousands of zero-day vulnerabilities across all major operating systems and browsers, some dating back decades. Anthropic engineers with no security background tasked Mythos with finding remote code execution vulnerabilities, and by morning, they had a fully functional exploit, as per Anthropic’s red team assessment.
Anthropic has initiated Project Glasswing, a defensive coalition with 12 partners, including CrowdStrike, Cisco, Palo Alto Networks, Microsoft, AWS, Apple, and the Linux Foundation. This project is supported by $100 million in usage credits and $4 million in open-source grants. More than 40 other organizations involved in critical software infrastructure have gained access. The partners have been testing Mythos on their systems for weeks. Anthropic has promised a public report of the findings by early July 2026.
Security directors got the announcement. They didn’t get the playbook.
“I’ve been in this industry for 27 years,” Anthony Grieco, Cisco SVP and Chief Security and Trust Officer, stated in an exclusive interview with VentureBeat at RSAC 2026. “I have never been more optimistic for what we can do to change security because of the velocity. It’s also a little bit terrifying because we’re moving so quickly. It’s also terrifying because our adversaries have this capability as well, and so frankly, we must move this quickly.”
This week, security directors received the news in various formats, including an exclusive VentureBeat interview with Anthropic’s Newton Cheng. A widely shared X post summarizing the Mythos findings noted its breakthroughs in cryptography libraries, virtual machine monitors, and its ability to provide zero-security-training engineers with functional exploits overnight. Unanswered in these stories was the question of where the current detection methods’ limits lie and what needs to be revised before July.
Seven vulnerability classes that show where every detection method hits its ceiling
-
OpenBSD TCP SACK, 27 years old. Crafted packets crash any server. SAST, fuzzers, and auditors overlooked a logic flaw necessitating semantic reasoning about TCP options under adversarial situations. Campaign cost: ~$20,000, with the $50 per-run figure being retrospective.
-
FFmpeg H.264 codec, 16 years old. Despite fuzzers testing the vulnerable code path 5 million times, the flaw went undetected until Mythos identified it by analyzing code semantics. Campaign cost: ~$10,000.
-
FreeBSD NFS remote code execution, CVE-2026-4747, 17 years old. Unauthenticated root access from the internet, validated independently. Mythos constructed a 20-gadget ROP chain across multiple packets without human intervention.
-
Linux kernel local privilege escalation. Mythos combined two to four low-severity vulnerabilities into a full local privilege escalation through race conditions and KASLR bypasses. CSA’s Rich Mogull noted Mythos did not succeed in remote kernel exploitation but achieved local success. No tools today can automate vulnerability chaining.
-
Browser zero-days across every major browser. Thousands identified, some needing human-model collaboration. Mythos chained four vulnerabilities into a JIT heap spray, bypassing both the renderer and OS sandboxes. Firefox 147: 181 working exploits vs. two for Opus 4.6.
-
Cryptography library vulnerabilities (TLS, AES-GCM, SSH). Implementation flaws allow certificate forgery or decryption of encrypted communications, noted in Anthropic’s red team blog and Help Net Security. A critical Botan library certificate bypass was disclosed alongside the Glasswing announcement. These are flaws in code, not in the mathematical concepts.
-
Virtual machine monitor guest-to-host escape. Memory corruption in production VMMs breaks the assumption of workload isolation in cloud security architectures.
Nicholas Carlini, during Anthropic’s launch briefing: “I’ve found more bugs in the last couple of weeks than I found in the rest of my life combined.”
VentureBeat’s prescriptive matrix
|
Vulnerability Class |
Why Current Methods Miss It |
What Mythos Does |
Security Director Action |
|
OS kernel logic (OpenBSD 27yr, Linux 2-4 chain) |
SAST lacks semantic reasoning. Fuzzers miss logic flaws. Pen testers are time-limited. Bounties exclude kernel scope. |
Chains 2-4 low-severity issues into local privilege escalation. ~$20K campaign. |
Integrate AI-assisted kernel reviews into pen test RFPs. Expand bounty scopes. Request Glasswing findings from OS vendors by July. Reassess findings based on chainability. |
|
Media codec (FFmpeg 16yr H.264) |
SAST unflagged. Fuzzers ran path 5M times without triggering. |
Analyses semantics beyond brute force. ~$10K campaign. |
Audit FFmpeg, libwebp, ImageMagick, libpng. Stop equating fuzz coverage with security. Monitor Glasswing codec CVEs starting July. |
|
Network stack RCE (FreeBSD 17yr, CVE-2026-4747) |
DAST limited to protocol depth. Pen tests overlook NFS. |
Creates full autonomous chain to unauthenticated root with a 20-gadget ROP chain. |
Immediately patch CVE-2026-4747. Audit NFS/SMB/RPC services. Add protocol fuzzing to the 2026 cycle. |
|
Multi-vuln chaining (2-4 sequenced, local) |
No tool chains. Pen testers have limited hours. CVSS scores are isolated. |
Achieves autonomous local chaining through race conditions and KASLR bypass. |
Mandate AI-assisted chaining in pen test methods. Develop chainability scores. Allocate budgets for AI red teams for 2026. |
|
Browser zero-days (thousands, 181 Firefox exploits) |
Bounties and continuous fuzzing missed thousands. Some need human-model collaboration. |
Improves 90x over Opus 4.6. Chains 4 vulnerabilities into JIT heap spray escaping renderer and OS sandbox. |
Reduce patch SLA to 72 hours for critical issues. Prepare pipelines for July cycle. Urge vendors for Glasswing timelines. |
|
Crypto libraries (TLS, AES-GCM, SSH, Botan bypass) |
SAST limited on crypto logic. Pen testers rarely audit crypto depth. Formal verification isn’t standard. |
Detects certificate forgery and decryption flaws in battle-tested libraries. |
Audit all crypto library versions now. Monitor Glasswing crypto CVEs from July. Expedite PQC migration. |
|
VMM / hypervisor (guest-to-host memory corruption) |
Cloud security relies on isolation. Few pen tests target hypervisors. Bounties rarely include VMM. |
Identifies guest-to-host escape in production VMM. |
Audit hypervisor/VMM versions. Request Glasswing findings from cloud providers. Reevaluate multi-tenant isolation assumptions. |
Attackers are faster. Defenders are patching once a year.
The CrowdStrike 2026 Global Threat Report reveals a 29-minute average eCrime breakout time, 65% faster than in 2024, alongside an 89% annual increase in AI-augmented attacks. According to CrowdStrike CTO Elia Zaitsev in an exclusive interview with VentureBeat, “Adversaries leveraging agentic AI can perform those attacks at such a great speed that a traditional human process of look at alert, triage, investigate for 15 to 20 minutes, take an action an hour, a day, a week later, it’s insufficient.” A $20,000 Mythos discovery campaign operating over a few hours can replace what would take months for nation-states to research.
CrowdStrike CEO George Kurtz highlighted this urgency on LinkedIn on the same day as the Glasswing announcement, noting, “AI is creating the largest security demand driver since enterprises moved to the cloud.” The upcoming EU AI Act, effective August 2, 2026, adds regulatory pressure with its requirements for automated audit trails, cybersecurity standards for high-risk AI systems, incident reporting, and fines up to 3% of global revenue. Security directors are facing dual challenges: the Glasswing disclosure in July and the compliance deadline in August.
Mike Riemer, Field CISO at Ivanti and a 25-year US Air Force veteran, emphasized the industry’s vulnerability in a VentureBeat interview. He noted, “Threat actors are reverse engineering patches, and the speed at which they’re doing it has been enhanced greatly by AI. They’re able to reverse engineer a patch within 72 hours. So if I release a patch and a customer doesn’t patch within 72 hours of that release, they’re open to exploit.” Riemer pointed out the significant lead attackers have over defenders.
Grieco confirmed this disparity at RSAC 2026. “If you talk to an operational team and many of our customers, they’re only patching once a year,” Grieco told VentureBeat. “And frankly, even in the best of circumstances, that is not fast enough.”
CSA’s Mogull argues that while defenders ultimately have the upper hand—once a vulnerability is fixed, all deployments benefit—the current situation, where attackers can reverse-engineer patches in 72 hours while defenders patch annually, favors the attackers.
Mythos is not alone in this capability. Researchers at AISLE, an AI cybersecurity startup, tested Anthropic’s showcase vulnerabilities on smaller, open-weight models, and all eight models detected the FreeBSD exploit. AISLE reports that one model, with just 3.6 billion parameters, costs 11 cents per million tokens, and a 5.1-billion-parameter open model successfully replicated the analysis chain of the 27-year-old OpenBSD bug. AISLE concludes that “The moat in AI cybersecurity is the system, not the model.” This suggests that the detection ceiling is a broader issue, not specific to Mythos, and that inexpensive models can identify the same flaws. The timeline for addressing these vulnerabilities is, therefore, shrinking.
According to Anthropic’s red team blog, over 99% of the vulnerabilities identified by Mythos remain unpatched. The public Glasswing report, due in early July 2026, is expected to initiate a significant patching phase across operating systems, browsers, cryptography libraries, and critical infrastructure software. Security directors who have not yet expanded their patching capabilities, redefined their bug bounty programs, and developed chainability scoring will face an overwhelming wave. July is not just a disclosure event; it’s a patching crisis.
What to tell the board
Security directors often claim to their boards, “we have scanned everything.” However, Merritt Baer, CSO at Enkrypt AI, told VentureBeat that this statement needs qualification in the light of Mythos.
“What security leaders actually mean is: we have exhaustively scanned for what our tools know how to see,” Baer explained in an exclusive interview with VentureBeat. “That’s a very different claim.”
Baer suggested restructuring residual risk for boards into three tiers: known-knowns (vulnerability classes reliably detected by your stack), known-unknowns (classes that exist but are only partially covered by your tools, such as stateful logic flaws and authentication boundary confusion), and unknown-unknowns (vulnerabilities arising from how safe components interact in unsafe ways). “This is where Mythos is landing,” Baer asserted.
The recommended board-level statement is: “We have high confidence in detecting discrete, known vulnerability classes. Our residual risk is concentrated in cross-function, multi-step, and compositional flaws that evade single-point scanners. We are actively investing in capabilities that raise that detection ceiling.”
Regarding chainability, Baer was straightforward. “Chainability has to become a first-class scoring dimension,” she said. “CVSS was built to score atomic vulnerabilities. Mythos is exposing that risk is increasingly graph-shaped, not point-in-time.” Baer outlined three necessary shifts for security programs: from severity scoring to exploitability pathways, from vulnerability lists to graphs modeling relationships across identity, data flow, and permissions, and from remediation SLAs to path disruption, prioritizing nodes that break the chain over fixing the highest CVSS scores.
“Mythos isn’t just finding missed bugs,” Baer said. “It’s invalidating the assumption that vulnerabilities are independent. Security programs that don’t adapt, from coverage thinking to interaction thinking, will keep reporting green dashboards while sitting on red attack paths.”
VentureBeat will update this story with additional operational details from Glasswing’s founding partners as interviews are completed.

