Google’s latest AI model, Gemini 2.5 Flash, has raised concerns in the tech community after internal benchmarking showed it performs worse on certain safety tests than its predecessor, Gemini 2.0 Flash. A technical report released by Google this week disclosed that Gemini 2.5 Flash is more likely to generate text that violates the company’s safety guidelines, regressing 4.1% on “text-to-text safety” and 9.6% on “image-to-text safety.”
Text-to-text safety measures the frequency at which a model breaches safety guidelines when given a prompt, while image-to-text safety evaluates how closely the model adheres to these boundaries when prompted with an image. Both tests are automated and not human-supervised. A Google spokesperson confirmed that Gemini 2.5 Flash performs worse on these safety metrics.
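As a rough illustration of how an automated check like this can work, the sketch below computes a violation rate over a fixed prompt set. The function names and the judging logic are hypothetical placeholders, not Google’s actual evaluation pipeline.

```python
from typing import Callable, Iterable

def violation_rate(
    prompts: Iterable[str],
    generate_response: Callable[[str], str],
    violates_policy: Callable[[str, str], bool],
) -> float:
    """Fraction of prompts whose responses an automated judge flags as violative.

    All callables here (generate_response, violates_policy) are hypothetical
    stand-ins for a model endpoint and an automated policy classifier.
    """
    prompt_list = list(prompts)
    violations = sum(
        violates_policy(p, generate_response(p)) for p in prompt_list
    )
    return violations / len(prompt_list)

# A reported "regression" would then compare the new and old models' rates on
# the same prompt set; the report does not specify exactly how the percentage
# figures are computed, so this is only an illustrative reading.
```

The same structure applies to the image-to-text test, with the prompt replaced by an image and the judge evaluating the model’s textual response to it.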
The push to make AI models more permissive, allowing them to respond to controversial or sensitive subjects, has become a trend among AI companies. These efforts, however, have sometimes had unintended consequences. For example, OpenAI’s ChatGPT inadvertently allowed minors to generate erotic conversations, an outcome OpenAI attributed to a “bug.”
According to Google’s technical report, Gemini 2.5 Flash follows instructions more faithfully than its predecessor, including instructions that cross problematic lines. The company attributes some of the regressions to false positives but acknowledges that the model occasionally generates “violative content” when explicitly asked.
Scores on SpeechMap, a benchmark that tests how models respond to sensitive and controversial prompts, indicate that Gemini 2.5 Flash is less likely than Gemini 2.0 Flash to refuse to answer contentious questions. Testing of the model showed it would write essays in support of contentious positions such as replacing human judges with AI or implementing widespread government surveillance programs.
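For context on what a refusal-rate comparison of this kind can look like, the hypothetical sketch below counts refusals over the same set of contentious prompts for two models; it does not reproduce SpeechMap’s actual methodology or scoring.

```python
from typing import Callable, Iterable

def refusal_rate(
    prompts: Iterable[str],
    model_answer: Callable[[str], str],
    is_refusal: Callable[[str], bool],
) -> float:
    """Fraction of contentious prompts the model declines to answer.

    model_answer and is_refusal are hypothetical stand-ins for a model
    endpoint and an automated refusal detector.
    """
    prompt_list = list(prompts)
    refusals = sum(is_refusal(model_answer(p)) for p in prompt_list)
    return refusals / len(prompt_list)

# A lower refusal rate for the newer model on the same prompt set would match
# the pattern reportedly observed for Gemini 2.5 Flash versus 2.0 Flash.
```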
Thomas Woodside, co-founder of the Secure AI Project, emphasized the need for more transparency in model testing, citing a trade-off between instruction-following and policy adherence. Google’s reporting practices on model safety have previously drawn criticism, with delays in publishing technical reports and initial omissions of key safety testing details.
In response to these concerns, Google has released a more detailed report with additional safety information. The findings highlight the complexities of balancing instruction-following and policy compliance in AI models, underscoring the importance of ongoing evaluation and transparency in the development of AI technologies.