Tuesday, 2 Jun 2026
  • Contact
  • Privacy Policy
  • Terms & Conditions
  • DMCA
logo logo
  • World
  • Politics
  • Crime
  • Economy
  • Tech & Science
  • Sports
  • Entertainment
  • More
    • Education
    • Celebrities
    • Culture and Arts
    • Environment
    • Health and Wellness
    • Lifestyle
  • 🔥
  • Trump
  • House
  • ScienceAlert
  • White
  • VIDEO
  • man
  • Trumps
  • Season
  • star
  • Years
Font ResizerAa
American FocusAmerican Focus
Search
  • World
  • Politics
  • Crime
  • Economy
  • Tech & Science
  • Sports
  • Entertainment
  • More
    • Education
    • Celebrities
    • Culture and Arts
    • Environment
    • Health and Wellness
    • Lifestyle
Follow US
© 2024 americanfocus.online – All Rights Reserved.
American Focus > Blog > Tech and Science > Anthropic’s browser agent got hijacked 31.5% of the time before safeguards engaged
Tech and Science

Anthropic’s browser agent got hijacked 31.5% of the time before safeguards engaged

Last updated: June 2, 2026 1:20 am
Share
Anthropic’s browser agent got hijacked 31.5% of the time before safeguards engaged
SHARE

This spring, Anthropic reported the highest prompt injection figures among frontier labs. When their latest model, Claude Opus 4.8, was tested in a browser, red-team attackers managed to exploit it 31.5% of the time before any safeguards were activated. Unlike Anthropic, OpenAI, Google, and Meta did not provide comparable figures, making Anthropic’s data a standout reference point rather than a liability.

Each of the four frontier labs released a prompt injection disclosure, but none are consistent with one another. Anthropic’s disclosure, dated May 28, spans 244 pages and covers four agentic surfaces. OpenAI, in contrast, reported on one surface, connectors. Google shifted its focus from the model card to a separate safety framework, whereas Meta did not release a closed-model card at all. The accompanying Cross-Vendor Prompt Injection Disclosure Grid outlines the varied testing and measurements from each lab, highlighting the discrepancies that undermine direct comparisons.

Prompt injections involve embedding malicious instructions in content that an agent processes, such as web pages or documents. This can lead to unauthorized actions, making the disclosure cards crucial for buyers as primary evidence of security measures.

The lack of a uniform industry standard for evaluating prompt injections poses a significant challenge. According to Carter Rees, VP of AI at Reputation, prompt injection disrupts the foundational assumptions of legacy tools. He notes, “A phrase as innocuous as, ‘ignore previous instructions’ can carry a payload as devastating as a buffer overflow, yet it shares no commonality with known malware signatures.” As a result, each lab devises its own metrics, leading to inconsistent results.

Adam Meyers, Senior Vice President of Counter Adversary Operations at CrowdStrike, emphasizes that managing exposure now falls to the buyer. “As you implement AI, it increases your attack surface, so now you have to be able to protect those AI models against adversary misuse or data poisoning or prompt injection.” CrowdStrike’s 2026 Financial Services Threat Landscape Report indicates that adversaries are using AI to accelerate the time from initial access to impact, outpacing traditional defenses.

See also  Is Now The Time to Attract Long-Term Investment for Domestic Production?

Anthropic’s Detailed Surface Analysis

Anthropic’s Opus 4.8 system card uniquely dissects prompt injection by surface, revealing significant variations in results. In a coding environment, Gray Swan’s Shade tool penetrated 7.03% of single attempts when thinking was enabled, which safeguards reduced to 2.09%.

When similar attacks targeted web browsers, such as those used by Claude in Chrome and Claude Cowork, the model’s vulnerability increased. Anthropic tested 129 web environments and documented the outcomes in Table 5.2.2.4.A on page 81 of the system card. The per-attempt rate, without safeguards and with thinking enabled, decreased from Sonnet 4.6’s 50.7% to Opus 4.8’s 31.5%. With safeguards activated, Opus 4.8’s rate fell to 0.5%, and with thinking disabled, it reached zero across all environments.

OpenAI’s Single-Surface Measurement

OpenAI’s GPT-5.5 card, released on April 23 and updated on April 24, addresses prompt injection through a single robustness score related to known attacks on connectors. This score, where higher values indicate better robustness, dropped from 0.998 for GPT-5.4-thinking to 0.963 for GPT-5.5. Anthropic, contrastingly, tested four surfaces with adaptive attackers and conducted a one-week bug bounty with live red-team challenges.

Comparing OpenAI’s 0.963 robustness score with Anthropic’s 31.5% per-attempt success rate is misleading. The former measures resistance to known attacks on a single surface, while the latter reflects success rates across multiple environments with dynamic attackers.

Google and Meta’s Lack of Specific Metrics

Google’s Gemini 3 addresses prompt injection under mitigations, claiming improved resistance without providing specific metrics. The associated Frontier Safety Framework report includes red teaming across capability domains, excluding prompt injection.

Meta, which releases open weights, places prompt injection defenses in a separate system, Purple Llama’s LlamaFirewall. Using the AgentDojo benchmark, the LlamaFirewall reduced attack success from 17.6% without defenses to 1.75% with combined measures. However, these results evaluate the defenses rather than the model on relevant deployment surfaces.

See also  Schumer keeps his job as Democrats wonder if he's on borrowed time

The Cross-Vendor Prompt Injection Disclosure Grid

The following grid aids security teams in evaluating frontier models. Each row highlights disparities among the four labs, where direct comparisons falter. Data for Anthropic is sourced from the Opus 4.8 system card, while the others rely on each vendor’s safety documentation.

Dimension

Anthropic, Opus 4.8

OpenAI, GPT-5.5

Google, Gemini 3.x

Meta, Llama stack

Safety document

System card, May 28 2026, 244 pages

System card, April 23 2026, updated April 24

Model card plus a separate Frontier Safety Framework report

No closed-model card. Open weights plus the Purple Llama stack

Injection benchmark or dataset

ART from Gray Swan and UK AISI, the Shade tool, plus an internal browser eval, 129 environments

Internal connectors evaluation, known attacks

None for injection

AgentDojo, 97 tasks

Surfaces with an injection eval

Four. Tool use, coding, computer use, browser

One. Connectors

None published for injection

One. AgentDojo agent tasks

Multi-attempt escalation shown

Yes. ART benchmark at 1, 10, 100. Coding and computer use at 1 and 200

No. A single score

No

No

Headline metric and unit

Attack-success rate. Browser, with thinking, 31.5% raw, 0.5% safeguarded

Robustness score, higher is better. 0.963, down from 0.998 for GPT-5.4-thinking

None published. Increased resistance claimed qualitatively

Attack-success rate on AgentDojo. 17.6% baseline to 1.75% combined

Live external bounty

Yes. One-week live injection bounty with external red-teamers

No injection bounty. Bio bounty only

None found

None found

Regression disclosed

Yes, explicit, with numbers

Number fell 0.998 to 0.963, not framed as a regression

Increased resistance claimed, no numbers

Not applicable

See also  Epstein Files Spark Harvard Investigation into Larry Summers

Five Considerations for Security Teams

Anthropic provided comprehensive testing across four surfaces, while OpenAI evaluated just one. Google did not disclose per-surface rates, and Meta focused on grading its defenses rather than the model itself. These varied disclosures don’t facilitate straightforward comparisons, but following these five steps can help build a comprehensive evaluation.

Identify and categorize all agents based on their interface—browser, code, connectors, or desktop. Anthropic’s Opus 4.8 shows a 2.09% rate for coding and 0.5% for browser applications. A generalized figure is inadequate. Obtain the vendor’s published rate for each surface. If unavailable, consider it untested.

Share the Cross-Vendor grid with all evaluated vendors. A 0.963 connectors score and a 31.5% browser rate should not be directly compared. Request detailed per-surface attack success rates, both raw and with safeguards, along with the attack methodology. Blank cells indicate lack of first-party data.

Clarify in writing which metrics apply to your integration. Anthropic’s 0.5% figure pertains to Claude in Chrome and Cowork with full safeguards. The API lacks these protections. Do not accept product figures for API deployments.

Include two specific clauses in the RFP. Ensure the vendor tested with adaptive attackers capable of rewriting payloads and that external parties attempted to breach the model. Anthropic used Gray Swan’s Shade tool and conducted a one-week paid bounty. OpenAI relied on known attacks for one surface. Real-world adversaries will not use known payloads.

Conduct an independent injection test prior to deploying any agent. Vendor figures are based on their system prompts and environments. Your configuration has its own prompts, permissions, and data access, requiring a unique evaluation. Set a pass threshold; anything exceeding it should not be deployed.

In conclusion, without an established standard, a vendor’s figures reveal what they chose to measure. Your red team’s evaluation will expose your vulnerabilities.

TAGGED:agentAnthropicsBrowserEngagedhijackedsafeguardstime
Share This Article
Twitter Email Copy Link Print
Previous Article The Garment Resort 2027 Collection The Garment Resort 2027 Collection
Next Article Global Health Meets Modern Travel Global Health Meets Modern Travel

Popular Posts

CU Buffs routed at Baylor

Fast Break: Colorado Buffaloes Fall to Baylor Bears Reasons for the Loss: The Colorado Buffaloes…

February 4, 2026

Marques’Almeida Pre-Fall 2025 Collection | Vogue

Marques Almeida: A Surprising Dive into Glamour Marques Almeida, known for their edgy and grunge…

April 15, 2025

Do we grow new brain cells as adults? The answer seems to be yes

Developing brain cells from the hippocampus growing in cultureARTHUR CHIEN/SCIENCE PHOTO LIBRARY Exploring the fascinating…

July 3, 2025

Sunderland vs Bristol City Prediction and Betting Tips

Sunderland will be facing off against Bristol City at the Stadium of Light in a…

December 15, 2024

OpenAI admits prompt injection is here to stay as enterprises lag on defenses

OpenAI Acknowledges the Permanence of Prompt Injection Threats OpenAI, a leading AI company, recently published…

December 24, 2025

You Might Also Like

Hyperbaric oxygen therapy is being explored as a long COVID treatment. Here’s what the research shows
Tech and Science

Hyperbaric oxygen therapy is being explored as a long COVID treatment. Here’s what the research shows

June 1, 2026
Google Home Speaker Potential Launch Date 25 June
Tech and Science

Google Home Speaker Potential Launch Date 25 June

June 1, 2026
9 Teacher-Approved Time Management Activities for High School Students
Education

9 Teacher-Approved Time Management Activities for High School Students

June 1, 2026
Fitbit Air Selling Out – but You Can Buy One Here
Tech and Science

Fitbit Air Selling Out – but You Can Buy One Here

June 1, 2026
logo logo
Facebook Twitter Youtube

About US


Explore global affairs, political insights, and linguistic origins. Stay informed with our comprehensive coverage of world news, politics, and Lifestyle.

Top Categories
  • Crime
  • Environment
  • Sports
  • Tech and Science
Usefull Links
  • Contact
  • Privacy Policy
  • Terms & Conditions
  • DMCA

© 2024 americanfocus.online –  All Rights Reserved.

Welcome Back!

Sign in to your account

Lost your password?