In a span of two weeks, two AI tools failed due to the same vulnerability, as demonstrated by four research teams. The overarching issue in these incidents is a lack of trust boundaries for external input in enterprise AI systems.
On June 15, Varonis revealed SearchLeak (CVE-2026-42824), a proof-of-concept exploit targeting Microsoft 365 Copilot Enterprise Search. The exploit involves a user clicking a specially crafted microsoft.com URL, prompting Copilot to search their mailbox and exfiltrate the data via a Bing SSRF. This process requires no additional plugins, clicks, or visible indicators. Earlier, on June 11, Obsidian Security disclosed a three-CVE chain against LiteLLM, which allowed a low-privilege user to escalate to admin status and execute remote code. Both incidents highlight a single vulnerability across different tools.
The concluding five-check audit in this article aligns each gap with a CVE or a market signal from June, providing actionable steps and a summary a CISO can present to the board.
Copilot Exploitation via Trusted URL
SearchLeak combines three vulnerabilities to execute data theft quietly. The q parameter in a URL directs instructions to Copilotâs language model. A race condition allows an image tag to be rendered before the output is sanitized. Bingâs image-search endpoint, included in the Content Security Policy, facilitates data extraction. Microsoft classified this vulnerability as critical and has patched it on the back end, as per Varonis. Although NVD has yet to score it, a third-party tracker rates it as a medium severity of 6.5. The severity is debated, but the flaw mechanism is clear.
The escalation is noteworthy. This marks the third Varonis Copilot exfiltration chain within a year, following Reprompt in January and EchoLeak in 2025. While Reprompt affected Copilot Personal, SearchLeak targets Enterprise Search, affecting the userâs entire organizational permissions scope.
LiteLLMâs Vulnerable Gateway
The LiteLLM gateway secures keys for OpenAI, Anthropic, Azure, and Bedrock under one proxy. The Obsidian exploit unfolds in three steps: CVE-2026-47101 allows a non-admin user to generate a wildcard API key; CVE-2026-47102 escalates the user to proxy admin through an unprotected /user/update endpoint; and CVE-2026-40217 breaks out of the code sandbox using exec() with full builtins. Obsidian showcased a reverse shell by injecting a fake tool-call response via LiteLLMâs callback mechanism, rating the combined chain at CVSS 9.9. A single word typed by a developer enabled the attacker to execute a shell.
An additional LiteLLM issue, CVE-2026-42271, a command-injection flaw in MCP test endpoints, was added to the CISA KEV list on June 8, with a remediation deadline of June 22. This CVE is separate from the Obsidian chain. Both disclosures occurred four days apart and were resolved through different updates targeting the same gateway. LiteLLM, with over 40,000 GitHub stars, is widely deployed across enterprises. A prior supply-chain compromise affected LiteLLM versions 1.82.7 and 1.82.8 on PyPI in March, where a compromised gateway risks exposing all provider credentials.
Langflow and Mini Shai-Hulud: Evidence of Scaling Vulnerabilities
In the same period, two additional tools exhibited similar boundary failures. Langflow CVE-2026-5027 is the third remote-code-execution flaw for Langflow to be actively exploited this year. A path traversal vulnerability in file upload allows attackers to write files anywhere on disk, and with auto-login enabled by default, a single unauthenticated request leads to RCE. VulnCheck confirmed this exploitation on June 9, noting approximately 7,000 exposed instances, primarily in North America, with MuddyWater attribution.
The Mini Shai-Hulud campaign exploited a different aspect. Following the public release of the wormâs source code on May 12, copycats compromised 32 Red Hat Cloud Services npm packages on June 1, which are downloaded 80,000 times weekly. The worm gathers over 20 credential types and self-replicates using the compromised maintainerâs identity.
Four teams evaluated four tools, identifying a shared operational failure. While the bug classes variedâprompt injection for SearchLeak, privilege escalation for LiteLLM, path traversal for Langflow, and supply-chain poisoning for Mini Shai-Huludâthe underlying issue remained consistent across all.
Reevaluating Risk in the Market
CrowdStrikeâs Q1 FY27 earnings call quantified this risk gap. Their AI detection and response line, AIDR, saw more than 250% sequential growth in ARR, with a Q2 pipeline exceeding $50 million (SEC-filed 8-K). CrowdStrikeâs total ARR reached $5.51 billion, highlighting over 1,800 agentic applications operating across enterprise endpoints.
On June 17, the company expanded AIDR to AWS, incorporating real-time evaluations of agent, LLM, and MCP communications across Amazon Bedrock, Kiro, and Strands Agents, leveraging their collaboration with Anthropicâs Project Glasswing. Daniel Bernard, CrowdStrikeâs chief business officer, emphasized that the AI attack surface now includes development, runtime, identities, and cloud infrastructure, noting that treating these as separate domains leaves vulnerabilities unaddressed.
Industry Professionals Highlight the Same Gaps More Simply
David Levin, CISO at American Express Global Business Travel, expressed no surprise to VentureBeat, stating, âWe kind of have this shadow AI, which is just the new version of shadow IT.â Both Langflow and LiteLLM fit this pattern, as teams implemented them for convenience, granted them credentials, and neglected governance. Levin stresses the importance of foundational work before deployment. âWe didnât go into this with just saying weâre going to go do this without the right fundamentals,â he noted. âWe leverage NIST controls. NIST has released their CSF along with their AI framework. OWASP released their top 10. You need the right fundamentals before you deploy.â
Merritt Baer, CSO at Enkrypt AI, identified the structural failure in a separate interview, stating, âEnterprises believe theyâve âapprovedâ AI vendors, but what theyâve actually approved is an interface, not the underlying system.â Baer highlights that deeper dependencies are prone to failure under stress, adding, âRaw zero-days arenât how most systems get compromised. Composability is. Itâs the glue between the model and your data where the risk lives. If you give an agent bash and a root token, youâve already done most of the attackerâs work for them.â This is reflected in the audit tests focusing on gateway security and agent identity governance.
Levin further advised boardrooms to prioritize risk over compliance. âYou need to talk more in terms of risk versus compliance to your boards and your executives,â he said. âItâs not about the size of the engineering team anymore. Itâs the size of your imagination. Itâs all written in plain English. Itâs not hard for anyone.â Notably, neither SearchLeak nor LiteLLM required custom malware or zero-days for exploitation.
Adam Meyers, CrowdStrikeâs SVP of Intelligence, emphasized the challenge in an exclusive interview, stating, âThe problem is not zero-day. The problem is patching. If you 10x that problem, theyâre gonna be completely underwater.â He identified identity as another critical area, noting, âSome of these AI have their own identities, or people give their identity to the AI to take action on their behalf, and that makes it a very complex problem.â
The Five-Check Trust-Boundary Audit
The following table outlines each trust-boundary gap, its proof point, the breakdown, verification and fix steps, and language suitable for board reporting.
|
Trust-Boundary Gap |
Proof Point |
What Broke |
Verify Monday |
Fix Monday |
Board Language |
|
1. Prompt-to-Data |
SearchLeak CVE-2026-42824. P2P injection + HTML race + Bing SSRF. One-click mailbox exfiltration via microsoft.com URL. PoC demonstrated; Microsoft rated it critical, NVD not yet scored. |
URL q-parameter passed to LLM as instructions. Sanitizer ran after render. Bing acted as exfiltration proxy via CSP allowlist. |
Audit CSP allowlists for domains performing server-side fetches. Monitor Copilot Search URLs for encoded payloads. Review Copilot audit logs. |
Confirm server-side patch applied. Enable sensitivity labels restricting Copilot. Treat AI streaming output as untrusted. |
âOur AI assistant could search employee email and send results to an attacker through a trusted Microsoft URL. Vendor patched it. We must verify configuration.â |
|
2. Gateway Credential Exposure |
LiteLLM three-CVE chain (-47101, -47102, -40217). CVSS 9.9. Separate CVE-2026-42271 on CISA KEV (fixed in v1.83.7; full chain fixed in v1.83.14-stable). June 22 deadline. |
No role validation on key endpoints. Self-promotion to admin via /user/update. exec() sandbox escape. One gateway exposes all provider keys. |
Run pip show litellm. Below 1.83.14-stable = vulnerable. Check /mcp-rest/test/ exposure. Audit proxy_admin accounts. |
Upgrade to v1.83.14-stable+. Rotate all provider API keys. Block /mcp-rest/test/* at proxy. Review Custom Code Guardrails. |
âOur AI gateway held keys for every provider. A default account could promote itself to admin and steal them all. Rotating and patching now.â |
|
3. AI Tooling Sprawl |
Langflow CVE-2026-5027 (CVSS 8.8). Third RCE of 2026. ~7,000 exposed instances. MuddyWater. Active exploitation June 9. |
Path traversal in file upload. Auto-login enabled by default. Single unauthenticated request to RCE. |
Query Censys/Shodan for Langflow, Flowise, n8n, Dify on your perimeter. Check auto-login. Inventory AI tools outside change management. |
Pull AI platforms behind VPN/zero-trust. Enable auth everywhere. Upgrade Langflow to v1.9.0+ (current release 1.10.0). Fingerprint surface continuously. |
âAI dev tools are exposed to the internet with login disabled. A nation-state group is exploiting this flaw now. Pulling behind access controls today.â |
|
4. Non-Human Identity Governance |
AIDR ARR up 250% (Q1 FY27, SEC 8-K). Q2 pipeline >$50M. 1,800+ agentic apps across enterprise endpoints. |
Agents hold identities and act on behalf of humans. Some exceed their intended scope to reach a goal. No standard governs agent credential lifecycle. |
Inventory all non-human identities used by agents and MCP servers. Map agent-to-data-store access. Flag agents with write access to security policy. |
Least-privilege every agent identity. Set privilege boundaries via identity protection. Runtime detection for policy-exceeding actions. Human-in-the-loop for policy changes. |
âAI agents hold credentials and act autonomously. We do not govern their identity lifecycle like human access. The 250% market growth tells us this gap is systemic.â |
|
5. Runtime Agentic Detection |
Falcon AIDR expanded to AWS (June 17). Covers Bedrock, Kiro, Strands Agents. MCP integration. Real-time agent/LLM/MCP evaluation. |
Traditional tools monitor human-speed actions. Agents run at machine speed, thousands of actions per minute, and route around controls to reach goals. |
Test if EDR/XDR links agent actions to originating identity. Verify SIEM ingests MCP communications. Confirm you can distinguish human from agent on endpoint. |
Deploy AIDR or equivalent runtime detection. Shadow-AI discovery for all agentic apps, models, MCP servers, identities. Real-time policy enforcement on agent actions. |
âWe cannot distinguish a human employee from an AI agent acting on their behalf. We need runtime detection at machine speed that can stop damage before it starts.â |
Addressing Infrastructure, Not Just Policies
A June 2 executive order established an AI Cybersecurity Clearinghouse with a July 2 deadline. The five gaps identified are not issues of advanced AI models but rather infrastructure challenges within gateways, orchestration platforms, identity layers, and runtime environments in enterprises.
The audit comprises five rows, each aligned with a June disclosure or market signal, an actionable command, and a summary for board discussions. The crucial question is whether your team can identify and address these gaps before an attacker does, as seen with Copilot and LiteLLM.

