Security teams are facing a critical challenge with AI defenses that are failing to protect against modern threats. A recent study by researchers from OpenAI, Anthropic, and Google DeepMind revealed that most AI defenses being used by enterprises are ineffective against adaptive attacks. These findings should prompt every Chief Information Security Officer (CISO) to reevaluate their current security measures.
The study, titled “The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against Llm Jailbreaks and Prompt Injections,” tested 12 AI defenses that claimed to have near-zero attack success rates. However, the research team was able to bypass these defenses with success rates above 90%. This highlights a major gap in the security products currently available in the market, as they are being tested against attackers that do not behave like real threats.
The researchers tested various types of defenses, including prompting-based, training-based, and filtering-based methods, under adaptive attack conditions. All of these defenses failed to withstand the sophisticated attack techniques used in the study. Prompting defenses had attack success rates ranging from 95% to 99%, while training-based methods had bypass rates of 96% to 100%. This rigorous testing methodology exposed the vulnerabilities in these AI defenses, leading to a call for improved security measures.
One of the key reasons for the failure of these defenses is the statelessness of traditional security controls like Web Application Firewalls (WAFs) when faced with dynamic AI attacks. Attack techniques like Crescendo and Greedy Coordinate Gradient (GCG) exploit conversational context and automate malicious requests, bypassing static filters. These attacks operate at the semantic layer, making it difficult for signature-based detection to identify and prevent them.
The rapid deployment of AI in enterprise applications, as predicted by Gartner, is exacerbating the security challenge. Attackers are evolving their tactics to bypass traditional endpoint defenses and exploit AI vulnerabilities. The shift towards AI-orchestrated cyber operations, as seen in the attack disrupted by Anthropic, is a clear indication of the growing threat landscape.
Four distinct attacker profiles are already exploiting the gaps in AI defenses, including external adversaries, malicious B2B clients, compromised API consumers, and negligent insiders. These attackers are leveraging adaptive attack techniques to breach defenses and exfiltrate sensitive data. The research paper’s authors have identified the need for stateful analysis, context tracking, and bi-directional filtering to improve security measures against conversational attacks.
To address these vulnerabilities, security leaders need to ask critical questions to AI security vendors before procuring their products. These questions should focus on the bypass rate against adaptive attackers, detection of multi-turn attacks, handling of encoded payloads, filtering of outputs, context tracking across conversation turns, testing against attackers who understand defense mechanisms, and the mean time to update defenses against novel attack patterns.
In conclusion, the research findings underscore the urgent need for enterprises to reassess their AI security measures in light of evolving threats. The deployment of AI technologies is outpacing security capabilities, creating a gap that attackers are exploiting. By implementing robust security measures and collaborating with reputable vendors, organizations can better protect their AI deployments from sophisticated attacks.

