Imagine this situation: An Anthropic Skill scanner fully analyzes a Skill from ClawHub or skills.sh. The markdown instructions are clean, with no prompt injections or hidden shell commands in the SKILL.md. Everything appears perfect.
However, the scanner overlooks the .test.ts file in a nearby directory. It doesn’t check this file because test files aren’t considered part of the agent execution surface, and no publicly available scanner inspects them (as of this post’s publication). Yet, this file can execute through the test runner, gaining full access to the filesystem, environment variables, and SSH keys.
Gecko Security researcher Jeevan Jutla explained the attack, showing that when a developer runs npx skills add, the entire Skill directory is copied into the repo. If a malicious Skill includes a *.test.ts file, testing frameworks like Jest and Vitest discover and run it during npm test, or when the IDE automatically runs tests on save. Mocha, another open-source JavaScript test framework, follows a similar recursive discovery pattern by default. The payload fires before any tests run, with nothing in the test output indicating anything unusual. In CI, process.env contains deployment tokens, cloud credentials, and other secrets accessible to the pipeline.
This attack method is not new; malicious npm postinstall scripts and pytest plugins have exploited trust-on-install for years. The Skill vector is more concerning because installed Skills reside in a directory meant to be shared across teams, spreading to every team member who clones the repo and remaining outside the detection surface of most scanners.
The agent is never invoked, and the Anthropic Skill scanner examines the right files against the wrong threat model.
Three audits, one oversight
Gecko’s disclosure didn’t emerge in isolation. It came alongside two extensive security audits that had already mapped the problem’s scope from another angle: what scanners detect, rather than what they overlook. Both audits did what they were designed to do: they assessed threats on the execution surface that scanners already inspect. Gecko examined what lies outside it.
An academic study by SkillScan, published on January 15, evaluated 31,132 unique Anthropic Skills from two major marketplaces. The findings revealed that 26.1% of Skills had at least one vulnerability across 14 distinct patterns in four categories. Data exfiltration was found in 13.3% of Skills, while privilege escalation appeared in 11.8%. Skills containing executable scripts were 2.12 times more likely to have vulnerabilities than instruction-only Skills.
Three weeks later, Snyk released ToxicSkills, the first comprehensive security audit of the ClawHub and skills.sh marketplaces. Snyk’s team scanned 3,984 Skills (as of February 5), revealing that 13.4% had at least one critical-level security issue. Seventy-six confirmed malicious payloads were discovered through a combination of automated scanning and human-in-the-loop review. Eight of these malicious Skills remained publicly available on ClawHub when the research was published.
Then, Cisco introduced its AI Agent Security Scanner for IDEs on April 21, integrating its open-source Skill Scanner directly into VS Code, Cursor, and Windsurf. This scanner brings valuable capabilities to developers’ workflows but does not inspect bundled test files, as Cisco’s detection categories focus on the agent interaction layer, not the developer toolchain layer.
The three main Anthropic Skill scanners share a structural blind spot: they do not inspect bundled test files as an execution surface, even though Gecko Security demonstrated that these files execute with full local permissions through standard test runners.
Snyk Agent Scan, Cisco’s AI Agent Security Scanner, and VirusTotal Code Insight all function effectively. They detect prompt injection, shell commands, and data exfiltration in Skill definitions and agent-referenced scripts. However, they do not examine beyond the agent execution surface to the developer execution surface in the same directory.
How the attack chain operates
Understanding the attack chain’s mechanics is crucial because the solution is precise. When a developer runs npx skills add owner/repo-name, the installer clones the Skill repository and copies its contents into .agents/skills/<skill-name>/ within the project. Claude Code, Cursor, and other agent IDEs receive symlinks into their Skill directories. The only files excluded are .git, metadata.json, and files prefixed with _. Everything else is stored on disk.
Jest and Vitest both use dot: true in their glob engines, allowing them to find test files inside dot-prefixed directories like .agents/. Mocha’s behavior depends on configuration but generally follows similar recursive patterns by default. None of these frameworks exclude .agents/, .claude/, or .cursor/ from their default discovery paths.
An attacker publishes a Skill with a clean SKILL.md and a tests/reviewer.test.ts file containing a beforeAll block. This block reads process.env, .env files, ~/.ssh/ private keys, and ~/.aws/credentials, sending everything to an external endpoint. The test cases appear legitimate, while exfiltration occurs silently during setup, regardless of test outcomes.
This vector is not limited to TypeScript. Python repositories are similarly exposed through conftest.py, which pytest auto-executes during test collection. Excluding .agents from pytest’s collection paths in pyproject.toml prevents it.
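A minimal pyproject.toml sketch of that exclusion, assuming pytest and the .agents/ install path described above (the tests/ and src/ paths are illustrative; adjust to your layout):

```toml
[tool.pytest.ini_options]
# Collect only from first-party directories, so dot-prefixed Skill
# directories are never traversed and a bundled conftest.py is never imported.
testpaths = ["tests", "src"]
# Explicitly refuse to recurse into agent Skill directories.
norecursedirs = [".agents", ".claude", ".cursor", "node_modules"]
```

Note that setting norecursedirs replaces pytest’s built-in defaults rather than extending them, so list every directory you want excluded.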
The .agents/skills/ directory is designed for repo commitment so teammates can share Skills. GitHub’s default .gitignore templates do not include .agents/. Once a malicious test file is in the repo, every developer who clones and runs tests activates the payload. So does every CI pipeline on every branch and fork inheriting the test suite.
Scanners target the wrong threat surface
CrowdStrike CTO Elia Zaitsev framed the structural challenge during an exclusive VentureBeat interview at RSAC 2026. “Observing actual kinetic actions is a structured, solvable problem,” Zaitsev said. “Intent is not.”
This distinction is at the heart of the Anthropic Skill scanner gap. No publicly documented scanner operates beyond the assumption that threats exist in the SKILL.md and scripts an agent is instructed to run. These tools analyze intent: What does the Skill instruct the agent to do? Gecko’s findings fall on the kinetic side. The test file executes through the developer’s own toolchain. No agent is involved, no prompt is interpreted. The payload is TypeScript, running with full local permissions via a legitimate test runner. The scanner was addressing the wrong issue.
CrowdStrike’s Zaitsev also highlighted the identity dimension: “AI agents and non-human identities will explode across the enterprise, expanding exponentially and dwarfing human identities,” he told VentureBeat. “Each agent will operate as a privileged super-human with OAuth tokens, API keys, and continuous access to previously siloed data sets.”
CrowdStrike’s Charlotte AI and similar enterprise agents operate with these exact privileges. When credentials are stored in environment variables accessible to any process in the repo, a test-file payload doesn’t need agent privileges. It already has developer privileges, which in most CI configurations include deployment tokens and cloud access.
Mike Riemer, SVP of the network security group and field CISO at Ivanti, quantified the exploitation window in a VentureBeat interview. “Threat actors are reverse engineering patches within 72 hours,” Riemer said. “If a customer doesn’t patch within 72 hours of release, they’re open to exploit.”
Most enterprises take weeks. The Anthropic Skill scanner blind spot extends this window. A developer who installs a malicious Skill today will find the test file executing immediately, with no patch available because no scanner identified it.
The Anthropic Skill Audit Grid
VentureBeat has been covering the Anthropic Skill supply chain since the ClawHavoc campaign affected ClawHub in January. Every conversation with security leaders circles back to the same frustration: their teams purchased a scanner, it reports clean, yet they have no framework for questioning what it doesn’t check.
VentureBeat has surveyed development teams who install Anthropic Skills from ClawHub and skills.sh. The grid below links the published-audit half (Snyk, SkillScan) with the scanner-bypass half (Gecko). Each row represents a detection surface a security team should verify before approving any Skill scanning tool for Q2 procurement.
| Audit question | What scanners do today | The gap | Recommended action |
| --- | --- | --- | --- |
| Inspect SKILL.md and agent-invoked scripts | Covered by Snyk Agent Scan, Cisco AI Agent Security Scanner, VirusTotal Code Insight | This is the covered surface. Attackers shift payloads to files outside it. | Continue running current scanners. They catch real threats at the instruction layer. |
| Inspect bundled test files (`*.test.ts`, `*.spec.js`, `conftest.py`) | Not currently inspected as attack surface by any scanner | Gecko proved test files execute via Jest/Vitest (documented) and Mocha (config-dependent) with full local permissions. No agent invoked. | Add `.agents/` to `testPathIgnorePatterns` (Jest) or `exclude` (Vitest). One config line. |
| Flag Skills that bundle test files or build configs | Not flagged as higher-risk metadata by any scanner | Trivial static check. Skills with extra executables are 2.12x more likely to be vulnerable (SkillScan). | Add a CI gate that fails the build when `find .agents/ -name "*.test.*"` returns any match. Block merge on match. |
| Restrict test-runner globs to project-owned paths | Rare. Most CI configs use recursive globs. Jest/Vitest pass `dot: true` by default. | Default globs traverse `.agents/`, `.claude/`, `.cursor/` directories. Malicious test files are auto-discovered. | Scope test roots to first-party directories (`src/`, `app/`). Deny `.agents/`, `.claude/`, `.cursor/`. |
| Distinguish script-bundling Skills vs. instruction-only | Partial coverage via static and semantic analysis | SkillScan: script-bundling Skills are 2.12x more likely to contain vulnerabilities than instruction-only. | Require a structured audit entry: Skill type, execution surfaces, scanner coverage, residual risk. |
| Publish audit methodology with sample size | Snyk yes (3,984 Skills). SkillScan yes (31,132 Skills). | Cisco and emerging scanners have not published equivalent ecosystem-scale audits. | Ask vendors for methodology, sample size, and detection rate. No published audit means no independent baseline. |
| Pin Skill sources to immutable commits | Not enforced by any scanner or marketplace | Skill authors can push a clean version for review, then add a malicious test file after approval. | Pin to a specific commit hash. Review diffs on every update. The OWASP Agentic Skills Top 10 recommends this. |
Three CI hardening steps to implement immediately
Riemer highlighted in VentureBeat interviews that placing security controls at the perimeter invites every threat to that specific boundary. Anthropic Skill scanners placed the boundary at SKILL.md. Attackers positioned the payload one directory over. The following three changes move the boundary to where the code actually executes.
These changes take only minutes to implement. None requires replacing current tools or waiting for scanner vendors to close the gap.
Add .agents/ to the test runner’s ignore list. In Jest, add /\.agents/ to testPathIgnorePatterns in jest.config.js. In Vitest, add **/.agents/** to the exclude array in vitest.config.ts. This single line in one config file stops the test runner from discovering files within installed Skill directories. Implement this even if the team does not currently use Anthropic Skills. The directory may appear in a cloned repo without anyone installing the Skill directly.
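A sketch of the Jest side, assuming a CommonJS jest.config.js at the repo root (the .claude and .cursor patterns are additions based on the agent directories named earlier in this article):

```javascript
// jest.config.js (sketch): keep Jest out of agent Skill directories.
module.exports = {
  testPathIgnorePatterns: [
    "/node_modules/",
    "/\\.agents/",   // installed Skills, per the attack described above
    "/\\.claude/",
    "/\\.cursor/",
  ],
};
```

For Vitest, the equivalent is adding "**/.agents/**" to the test.exclude array in vitest.config.ts; note that overriding exclude replaces Vitest’s defaults, so spread configDefaults.exclude from vitest/config first.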
Audit every Skill install for non-instruction files before merging. Add a CI check that flags any file in .agents/skills/ matching *.test.*, *.spec.*, __tests__/, *.config.*, or conftest.py. These files have no legitimate reason to exist inside a Skill directory. The check is a shell one-liner: [ -d .agents ] && find .agents/ \( -name "*.test.*" -o -name "*.spec.*" -o -name "conftest.py" -o -name "*.config.*" -o -type d -name "__tests__" \) | grep -q . && exit 1. If it matches, block the merge. For any test files that do land in a PR, require a reviewer to check for shell invocations (exec, spawn, child_process), external network calls, and file operations accessing secrets or SSH keys.
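Expanded into a CI-friendly script, that same check might look like this (a sketch; the .agents default path and pattern list come from this article, and the check_skills_dir name is introduced here):

```shell
#!/bin/sh
# CI gate sketch: fail when an installed Skill bundles files that a test
# runner or build tool could execute. Extend the pattern list for your stack.
check_skills_dir() {
    dir="${1:-.agents}"
    [ -d "$dir" ] || return 0          # no Skills installed, nothing to gate
    matches=$(find "$dir" \
        \( -name "*.test.*" -o -name "*.spec.*" \
           -o -name "conftest.py" -o -name "*.config.*" \
           -o -type d -name "__tests__" \) -print)
    if [ -n "$matches" ]; then
        printf 'Blocked: executable artifacts inside Skill directory:\n%s\n' \
            "$matches" >&2
        return 1
    fi
}
```

In a pipeline step, call check_skills_dir and let the nonzero exit status block the merge.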
Pin Skill sources to specific commits, not the latest. The npx skills add command copies whatever the repo contains at the time of installation. A Skill author can push a clean version for scanner review, then add a malicious test file after approval. Pinning to a specific commit hash shifts from a trust-on-first-use model to a verify-on-every-change model. The OWASP Agentic Skills Top 10 recommends this approach.
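A self-contained sketch of why the pin matters, using a throwaway local repo (all paths, names, and commits below are illustrative; the real installer’s pinning syntax, if it has one, is not documented here):

```shell
# Simulate a Skill repo that was clean when reviewed, then quietly gained a
# test file afterward. Diffing against the pinned commit surfaces the change.
set -e
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "clean skill at review time"
pinned=$(git -C "$repo" rev-parse HEAD)     # hash your team records at review

mkdir -p "$repo/tests"
: > "$repo/tests/reviewer.test.ts"          # added after approval
git -C "$repo" add -A
git -C "$repo" -c user.email=demo@example.com -c user.name=demo \
    commit -q -m "docs tweak"

# The review gate: any file added since the pin shows up in the diff.
git -C "$repo" diff --name-only "$pinned" HEAD
```

The diff lists tests/reviewer.test.ts, which is exactly the artifact the scanner never saw.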
If Skills are already in your repo: Run the find command mentioned above on your current .agents/ directory now. If test files are found, treat them as a potential compromise: Change any credentials accessible to CI (deployment tokens, cloud keys, SSH keys), audit CI logs for unexpected outbound network calls during test execution, and review git history to determine when the test files entered the repo and which pipelines executed them.
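For the review step, a rough triage grep can prioritize files for human eyes (the pattern list is illustrative and far from exhaustive; no matches is not a clean bill of health):

```shell
#!/bin/sh
# Triage sketch: surface Skill files that touch shells, the network, or
# secret stores. Every hit still needs human review.
triage_skills() {
    dir="${1:-.agents}"
    [ -d "$dir" ] || { echo "no $dir directory found"; return 0; }
    grep -rnE \
        'child_process|execSync|spawn *\(|fetch *\(|https?://|\.ssh|\.aws|process\.env' \
        "$dir" || echo "no pattern matches (not proof of safety)"
}
```

Run it against the repo’s .agents/ directory and feed any flagged files into the credential-rotation and git-history review described above.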
Five questions to ask your Anthropic Skill scanner vendor
Security teams are signing contracts for their first dedicated Skill scanning tools. The Gecko bypass means the questions on those sales calls need to evolve. Don’t stop at “Do you detect prompt injection?” Consider asking:
- Which files and directories do you actually analyze in a Skill repo?
- Do you treat test files as potential execution surfaces?
- Can you flag Skills that bundle tests, CI configs, or build scripts as higher-risk? SkillScan demonstrated that script-bundling Skills are 2.12 times more likely to be vulnerable.
- Do you provide integration or guidance for restricting test-runner globs in CI? Cisco deserves credit for open-sourcing its Skill Scanner on GitHub, allowing security teams to see exactly which detection categories the tool implements. This transparency sets the baseline every vendor should meet. If your vendor won’t publish detection categories or open-source their scanning logic, you cannot verify what they check and skip.
- Have you published an ecosystem-scale audit with methodology and sample size? Snyk published at 3,984 Skills. SkillScan published at 31,132. Riemer noted the disclosure pattern: “They chose not to publish a CVE. They just quietly patched it and moved on with life,” he said. The Anthropic Skills ecosystem is showing early signs of the same pattern: scanners document what they detect without mapping surfaces they don’t reach. The gap between documented coverage and actual execution surface is where the test-file vector lives.
The audit grid is crucial because the scanner model is incomplete
The Anthropic Skills ecosystem is echoing the early npm supply chain story, but without the decade of incidents that prompted package registries to build security infrastructure. SkillScan’s 31,132-Skill dataset indicated a quarter of the ecosystem has vulnerabilities. Snyk found 76 confirmed malicious payloads in fewer than 4,000 Skills. Gecko demonstrated the scanner model’s structural gap that no vendor has publicly addressed.
Scanner evaluations consistently test the covered surface. The Anthropic Skill Audit Grid gives security teams seven audit surfaces to verify before making a purchase. The three CI steps are the fixes to implement before the next Skill installation. Riemer’s Ivanti team watches the patch-to-exploit cycle compress in real time across enterprise environments. The test-file vector compresses it further: no scanner flagged the threat, so no patch window exists.
The scanner is not malfunctioning. It is incomplete. The threat model stopped at the agent. The test runner did not.

