Three AI coding agents leaked secrets through a single prompt injection. One vendor's system card predicted it


A security researcher working with colleagues at Johns Hopkins University opened a GitHub pull request, typed a malicious instruction into the PR title, and watched Anthropic’s Claude Code Security Review action post its own API key as a comment. The same prompt injection worked against Google’s Gemini CLI Action and GitHub’s Copilot Agent (Microsoft), with no external infrastructure required.

Aonan Guan, the researcher who identified the vulnerability, and Johns Hopkins colleagues Zhengyu Liu and Gavin Zhong recently published the full technical disclosure under the name “Comment and Control.” By default, GitHub Actions does not expose secrets to forked pull requests under the pull_request trigger, but workflows that use pull_request_target, which most AI agent integrations require for secret access, do inject secrets into the runner environment. That default narrows the attack surface but does not eliminate it: collaborators, comment fields, and any repository that runs an AI coding agent under pull_request_target remain vulnerable.

According to Guan’s disclosure timeline, Anthropic rated the issue CVSS 9.4 Critical and offered a $100 bounty, Google provided a $1,337 bounty, and GitHub awarded $500 through the Copilot Bounty Program. The $100 bounty is strikingly low for a CVSS 9.4 rating; Anthropic’s HackerOne program categorizes agent-tooling findings separately from model-safety vulnerabilities. All three companies patched quietly, and as of the past weekend none had published a CVE in the NVD or an advisory through GitHub Security Advisories.

Comment and Control exploited a prompt injection vulnerability in Claude Code Security Review, a specific GitHub Action feature that Anthropic’s own system card acknowledges is “not hardened against prompt injection.” The feature is designed to process trusted first-party inputs by default; users who opt in to processing untrusted external PRs and issues take on additional risk and must restrict agent permissions accordingly. Anthropic revised its documentation to clarify this operating model after the disclosure. The same class of attack operates beneath OpenAI’s safeguard layer, at the agent runtime; that conclusion is inferred from what OpenAI’s system card does not document, not from a demonstrated exploit. The exploit is the proof case, but the larger story is what the three system cards reveal about the gap between what vendors document and what they actually protect.

OpenAI and Google did not provide comments by the time of publication.

“At the action boundary, not the model boundary,” Merritt Baer, CSO at Enkrypt AI and former Deputy CISO at AWS, told VentureBeat when asked where protection needs to be enforced. “The runtime is the blast radius.”

What the system cards reveal

Anthropic’s Opus 4.7 system card runs 232 pages and includes quantified hack rates and injection resistance metrics. It outlines a restricted model strategy (Mythos held back as a capability preview) and explicitly states that Claude Code Security Review is “not hardened against prompt injection.” In other words, the system card told readers the runtime was exposed, and Comment and Control confirmed it. Anthropic does restrict certain agent actions outside the system card’s scope (Claude Code Auto Mode, for instance, applies runtime-level protections), but the system card itself does not document these runtime safeguards or their extent.

OpenAI’s GPT-5.4 system card documents extensive red teaming and publishes model-layer injection evaluations, but it contains no agent-runtime or tool-execution resistance metrics. Trusted Access for Cyber expands access to thousands. The system card describes what red teamers tested but does not report how well the model resisted the attacks they identified.

Google’s Gemini 3.1 Pro model card, released in February, defers most safety methodology to older documentation, as noted in a VentureBeat review of the card. Google’s Automated Red Teaming program remains internal only, with no external cyber initiative.

Dimension	Anthropic (Opus 4.7)	OpenAI (GPT-5.4)	Google (Gemini 3.1 Pro)
System card depth	232 pages. Quantified hack rates, classifier scores, and injection resistance metrics.	Extensive. Red teaming hours documented. No injection resistance rates published.	Few pages. Defers to older Gemini 3 Pro card. No quantified results.
Cyber verification program	CVP. Removes cyber safeguards for vetted pentesters and red teamers doing authorized offensive work. Does not address prompt injection defense. Platform and data-retention exclusions not yet publicly documented.	TAC. Scaled to thousands. Constrains ZDR.	None. No external defender pathway.
Restricted model strategy	Yes. Mythos held back as a capability preview. Opus 4.7 is the testbed.	No restricted model. Full capability released, access gated.	No restricted model. No stated plan for one.
Runtime agent safeguards	Claude Code Security Review: system card states it is not hardened against prompt injection. The feature is designed for trusted first-party inputs. Anthropic applies additional runtime protections (e.g., Claude Code Auto Mode) not documented in the system card.	Not documented. TAC governs access, not agent operations.	Not documented. ART internal only.
Exploit response (Comment and Control)	CVSS 9.4 Critical. $100 bounty. Patched. No CVE.	Not directly exploited. Structural gap inferred from TAC design, not demonstrated.	$1,337 bounty per Guan disclosure. Patched. No CVE.
Injection resistance data	Published. Quantified rates in the system card.	Model-layer injection evals published. No agent-runtime or tool-execution resistance rates.	Not published. No quantified data available.

Baer offered specific procurement questions. “For Anthropic, ask how safety results actually transfer across capability jumps,” she told VentureBeat. “For OpenAI, inquire what ‘trusted’ means under compromise.” For both vendors, she said, directors should “demand clarity on whether safeguards extend into tool execution, not just prompt filtering.”

Seven threat classes neither safeguard approach closes

Each row identifies what breaks, why your controls miss it, what Comment and Control proved, and the recommended action for the coming week.

Threat Class	What Breaks	Why Your Controls Miss It	What Comment and Control Proved	Recommended Action
1. Deployment surface mismatch	CVP is designed for authorized offensive security research, not prompt injection defense. It does not extend to Bedrock, Vertex, or ZDR tenants. TAC constrains ZDR. Google has no program. Your team may be running a verified model on an unverified surface.	Launch announcements describe the program. Support documentation lists the exclusions. Security teams read the announcement. Procurement reads neither.	The exploit targets the agent runtime, not the deployment platform. A team running Claude Code on Bedrock is outside CVP coverage, but CVP was not designed to address this class of vulnerability in the first place.	Email your Anthropic and OpenAI reps today. One question, in writing: ‘Confirm whether [your platform] and [your data retention config] are covered by your runtime-level prompt injection protections, and describe what those protections include.’ File the response in your vendor risk register.
2. CI secrets exposed to AI agents	ANTHROPIC_API_KEY, GEMINI_API_KEY, GITHUB_TOKEN, and any production secret stored as a GitHub Actions env var are readable by every workflow step, including AI coding agents.	The default GitHub Actions config does not scope secrets to individual steps. Repo-level and org-level secrets propagate to all workflows. Most teams never audit which steps access which secrets.	The agent read the API key from the runner env var, encoded it in a PR comment body, and posted it through GitHub’s API. No attacker-controlled infrastructure required. Exfiltration ran through GitHub’s own API: the platform itself became the C2 channel.	Run: grep -r 'secrets\.' .github/workflows/ across every repo with an AI agent. List every secret the agent can access. Rotate all exposed credentials. Migrate to short-lived OIDC tokens (GitHub, GitLab, CircleCI).
3. Over-permissioned agent runtimes	AI agents granted bash execution, git push, and API write access at setup. Permissions never scoped down. No periodic least-privilege review. Agents accumulate access in the same way service accounts do.	Agents are configured once during onboarding and inherited across repos. No tooling flags unused permissions. The Comment and Control agent had bash, write, and env-read access for a code review task.	The agent had bash access it did not need for code review. It used that access to read env vars and post exfiltrated data. Stripping bash would have blocked the attack chain entirely.	Audit agent permissions repo by repo. Strip bash from code review agents. Set repo access to read-only. Gate write access (PR comments, commits, merges) behind a human approval step.
4. No CVE signal for AI agent vulnerabilities	CVSS 9.4 Critical. Anthropic, Google, and GitHub patched. Zero CVE entries in NVD. Zero advisories. Your vulnerability scanner, SIEM, and GRC tool all show green.	No CNA has yet issued a CVE for a coding agent prompt injection, and current CVE practices have not captured this class of failure mode. Vendors patch through version bumps. Qualys, Tenable, and Rapid7 have nothing to scan for.	A SOC analyst running a full scan on Monday morning would find zero entries for a Critical vulnerability that hit Claude Code Security Review, Gemini CLI Action, and Copilot simultaneously.	Create a new category in your supply chain risk register: ‘AI agent runtime.’ Assign a 48-hour check-in cadence with each vendor’s security contact. Do not wait for CVEs. None have come yet, and the taxonomy gap makes them unlikely without industry pressure.
5. Model safeguards do not govern agent actions	Opus 4.7 blocks a phishing email prompt. It does not block an agent from reading $ANTHROPIC_API_KEY and posting it as a PR comment. Safeguards gate generation, not operation.	Safeguards filter model outputs (text). Agent operations (bash, git push, curl, API POST) bypass safeguard evaluation entirely. The runtime is outside the safeguard perimeter. Anthropic applies some runtime-level protections in features like Claude Code Auto Mode, but these are not documented in the system card and their scope is not publicly defined.	The agent never generated prohibited content. It performed a legitimate operation (post a PR comment) containing exfiltrated data. Safeguards never triggered.	Map every operation your AI agents perform: bash, git, API calls, file writes. For each, ask the vendor in writing: does your safeguard layer evaluate this action before execution? Document the answer.
6. Untrusted input parsed as instructions	PR titles, PR body text, issue comments, code review comments, and commit messages are all parsed by AI coding agents as context. Any can contain injected instructions.	No input sanitization layer between GitHub and the agent instruction set. The agent cannot distinguish developer intent from attacker injection in untrusted fields. Claude Code GitHub Action is designed for trusted first-party inputs by default. Users who opt into processing untrusted external PRs accept additional risk.	A single malicious PR title became a complete exfiltration command. The agent treated it as a legitimate instruction and executed it without validation or confirmation.	Implement input sanitization as defense-in-depth, but do not rely on traditional WAF-style regex patterns. LLM prompt injections are non-deterministic and will evade static pattern matching. Restrict agent context to approved workflow configs and combine with least-privilege permissions.
7. No comparable injection resistance data across vendors	Anthropic publishes quantified injection resistance rates in 232 pages. OpenAI publishes model-layer injection evals but no agent-runtime resistance rates. Google publishes a few-page card referencing an older model.	No industry standard for AI safety metric disclosure. Vendors may have internal metrics and red-team programs, but published disclosures are not comparable. Procurement has no baseline and no framework to require one.	Anthropic, OpenAI, and Google were all approved for enterprise use without comparable injection resistance data. The exploit exposed what unmeasured risk looks like in production.	Write one sentence for your next vendor meeting: ‘Show me your quantified injection resistance rate for my model version on my platform.’ Document refusals for EU AI Act high-risk compliance. Deadline: August 2026.

OpenAI’s GPT-5.4 was not directly exploited in the Comment and Control disclosure. The gaps identified in the OpenAI and Google columns are inferred from what their system cards and program documentation do not publish, not from demonstrated exploits. That distinction matters. Absence of published runtime metrics is a transparency gap, not proof of a vulnerability. It does mean procurement teams cannot verify what they cannot measure.

Eligibility requirements for Anthropic’s Cyber Verification Program and OpenAI’s Trusted Access for Cyber are still evolving, as are platform coverage and program scope, so security teams should validate current vendor docs before treating any coverage described here as definitive. Anthropic’s CVP is designed for authorized offensive security research — removing cyber safeguards for vetted actors — and is not a prompt injection defense program. Security leaders mapping these gaps to existing frameworks can align threat classes 1–3 with NIST CSF 2.0 GV.SC (Supply Chain Risk Management), threat class 4 with ID.RA (Risk Assessment), and threat classes 5–7 with PR.DS (Data Security).

Comment and Control focuses on GitHub Actions today, but the seven threat classes generalize to most CI/CD runtimes where AI agents execute with access to secrets, including GitHub Actions, GitLab CI, CircleCI, and custom runners. Safety metric disclosure formats are in flux across all three vendors; Anthropic currently leads on published quantification in its system card documentation, but norms are likely to converge as EU AI Act obligations come into force. Comment and Control targeted Claude Code GitHub Action, a specific product feature, not Anthropic’s models broadly. The vulnerability class, however, applies to any AI coding agent operating in a CI/CD runtime with access to secrets.

What to do before your next vendor renewal

“Don’t standardize on a model. Standardize on a control architecture,” Baer told VentureBeat. “The risk is systemic to agent design, not vendor-specific. Maintain portability so you can swap models without reworking your security posture.”

Build a deployment map. Confirm your platform qualifies for the runtime protections you think cover you. If you run Opus 4.7 on Bedrock, ask your Anthropic account rep what runtime-level prompt injection protections apply to your deployment surface. Email your account rep today. (Anthropic Cyber Verification Program)

Audit every runner for secret exposure. Run grep -r 'secrets\.' .github/workflows/ across every repo with an AI coding agent. List every secret the agent can access. Rotate all exposed credentials. (GitHub Actions secrets documentation)
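
The grep command gives a flat list of matching lines. As a minimal sketch of the same audit grouped by workflow file, the Python below collects the names referenced through the secrets context. The function names find_secret_refs and audit_repo are illustrative, not part of any GitHub tooling, and the pattern assumes the usual ${{ secrets.NAME }} expression form; it will likely miss secrets passed implicitly, for example through reusable workflows.

```python
import re
from pathlib import Path

# Matches references such as ${{ secrets.ANTHROPIC_API_KEY }}.
# Assumption: secrets are referenced by name through the `secrets.` context.
SECRET_REF = re.compile(r"\bsecrets\.([A-Za-z_][A-Za-z0-9_]*)")


def find_secret_refs(workflow_text: str) -> list[str]:
    """Return the sorted, de-duplicated secret names referenced in one workflow file."""
    return sorted(set(SECRET_REF.findall(workflow_text)))


def audit_repo(repo_root: str) -> dict[str, list[str]]:
    """Map each workflow file under .github/workflows to the secrets it references."""
    workflows = Path(repo_root) / ".github" / "workflows"
    report: dict[str, list[str]] = {}
    if not workflows.is_dir():
        return report
    for path in sorted(workflows.iterdir()):
        if path.suffix in (".yml", ".yaml"):
            refs = find_secret_refs(path.read_text(encoding="utf-8"))
            if refs:
                report[str(path.relative_to(repo_root))] = refs
    return report


if __name__ == "__main__":
    # Run from the root of a checked-out repository.
    for workflow, names in audit_repo(".").items():
        print(f"{workflow}: {', '.join(names)}")
```

The per-file grouping is a design choice: it makes it easier to see which workflows hand credentials to an AI agent step, which is the list to rotate first.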

Start migrating credentials now. Switch stored secrets to short-lived OIDC token issuance. GitHub Actions, GitLab CI, and CircleCI all support OIDC federation. Set token lifetimes to minutes, not hours. Plan full rollout over one to two quarters, starting with repos running AI agents. (GitHub OIDC docs | GitLab OIDC docs | CircleCI OIDC docs)

Fix agent permissions repo by repo. Strip bash execution from every AI agent doing code review. Set repository access to read-only. Gate write access behind a human approval step. (GitHub Actions permissions documentation)
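
One way to start the permissions audit is to list every write-level grant in a workflow file. The sketch below is a line-based heuristic, not a YAML parser: the function name write_grants is illustrative, the scope list is an assumption that may lag behind GitHub's documentation, and inline mappings such as permissions: {contents: write} will not be detected.

```python
import re

# Scope names accepted under the `permissions` key of a GitHub Actions workflow.
# Assumption: this list is illustrative and may be incomplete.
KNOWN_SCOPES = {
    "actions", "checks", "contents", "deployments", "discussions", "id-token",
    "issues", "packages", "pages", "pull-requests", "repository-projects",
    "security-events", "statuses",
}

SCOPE_LINE = re.compile(r"^\s*([a-z-]+)\s*:\s*write\s*(?:#.*)?$")
WRITE_ALL = re.compile(r"^\s*permissions\s*:\s*write-all\s*(?:#.*)?$")
HAS_PERMISSIONS = re.compile(r"^\s*permissions\s*:", re.MULTILINE)


def write_grants(workflow_text: str) -> list[str]:
    """List write-level grants found by a line-based heuristic (not a YAML parse)."""
    findings = []
    if not HAS_PERMISSIONS.search(workflow_text):
        # Without the key, the token falls back to the repository or organization default.
        findings.append("no permissions key")
    for line in workflow_text.splitlines():
        if WRITE_ALL.match(line):
            findings.append("write-all")
            continue
        match = SCOPE_LINE.match(line)
        if match and match.group(1) in KNOWN_SCOPES:
            findings.append(f"{match.group(1)}: write")
    return findings
```

An empty result means the heuristic found no write grants, not that the workflow is safe. Note that id-token: write will be reported even though OIDC federation requires it, so treat each finding as a prompt for review.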

Add input sanitization as one layer, not the only layer. Filter pull request titles, comments, and review threads for instruction patterns before they reach agents. Combine with least-privilege permissions and OIDC. Static regex will not catch non-deterministic prompt injections on its own.
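
As a rough sketch of what that single layer might look like, the Python below flags untrusted fields that match a few instruction-like heuristics. The patterns and the name flag_untrusted_text are hypothetical; the actual Comment and Control payload is not reproduced here, and a determined attacker can rephrase around any static list like this one.

```python
import re

# Illustrative heuristics only. Assumption: a match routes the item to human
# review before an agent sees it; it does not prove an attack.
SUSPICIOUS = {
    "instruction_override": re.compile(
        r"\b(ignore|disregard)\b.{0,40}\b(previous|prior|above)\b"
        r".{0,40}\binstructions?\b",
        re.IGNORECASE,
    ),
    "env_reference": re.compile(r"\$\{?[A-Z][A-Z0-9_]*(_KEY|_TOKEN|_SECRET)\b"),
    "shell_command": re.compile(r"\b(curl|wget|printenv|base64)\b", re.IGNORECASE),
}


def flag_untrusted_text(text: str) -> list[str]:
    """Return the names of heuristics that match an untrusted field such as a PR title."""
    return [name for name, pattern in SUSPICIOUS.items() if pattern.search(text)]
```

A clean result here says little on its own, which is why the filter belongs alongside least-privilege permissions and short-lived credentials rather than in place of them.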

Add “AI agent runtime” to your supply chain risk register. Assign a 48-hour patch verification cadence with each vendor’s security contact. Do not wait for CVEs. None have come yet for this class of vulnerability.

Check which hardened GitHub Actions mitigations you already have in place. Hardened GitHub Actions configurations block this attack class today: the permissions key restricts GITHUB_TOKEN scope, environment protection rules require approval before secrets are injected, and first-time-contributor gates prevent external pull requests from triggering agent workflows. (GitHub Actions security hardening guide)
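
A quick triage pass can show which workflows combine the risky trigger with secret access. The sketch below is a text-based heuristic with an illustrative function name, triage. It assumes that a workflow using pull_request_target, referencing secrets, and declaring no environment deserves review first. Declaring an environment does not prove that protection rules are configured, since those live in repository settings, and commented-out lines may produce false matches.

```python
import re

TARGET_TRIGGER = re.compile(r"\bpull_request_target\b")
SECRET_REF = re.compile(r"\bsecrets\.[A-Za-z_][A-Za-z0-9_]*")
ENVIRONMENT_KEY = re.compile(r"^\s*environment\s*:", re.MULTILINE)


def triage(workflow_text: str) -> dict[str, bool]:
    """Summarize trigger, secret use, and environment declaration for one workflow."""
    uses_target = bool(TARGET_TRIGGER.search(workflow_text))
    reads_secrets = bool(SECRET_REF.search(workflow_text))
    has_environment = bool(ENVIRONMENT_KEY.search(workflow_text))
    return {
        "pull_request_target": uses_target,
        "references_secrets": reads_secrets,
        "declares_environment": has_environment,
        # Assumption: this combination is the one to inspect by hand first.
        "needs_review": uses_target and reads_secrets and not has_environment,
    }
```

Workflows flagged with needs_review are candidates for the environment approval and first-time-contributor gates described above; the rest still warrant a check of their permissions key.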

Prepare one procurement question per vendor before your next renewal. Write one sentence: “Show me your quantified injection resistance rate for the model version I run on the platform I deploy to.” Document refusals for EU AI Act high-risk compliance. The deadline is August 2026.

“Raw zero-days aren’t how most systems get compromised. Composability is,” Baer said. “It’s the glue code, the tokens in CI, the over-permissioned agents. When you wire a powerful model into a permissive runtime, you’ve already done most of the attacker’s work for them.”
