Google's Gemini 2.5 Flash introduces 'thinking budgets' that cut output costs by about 83% when thinking is turned off

Google has unveiled its latest AI model, Gemini 2.5 Flash, which offers businesses and developers unprecedented control over the level of “thinking” their AI performs. This new model, available in preview through Google AI Studio and Vertex AI, aims to enhance reasoning capabilities while keeping pricing competitive in the crowded AI market.

One of the key features of Gemini 2.5 Flash is the introduction of a “thinking budget,” allowing developers to specify the amount of computational power allocated to reasoning through complex problems before generating a response. This approach addresses the trade-off between sophisticated reasoning, latency, and pricing in AI systems.
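As a rough sketch of what setting such a budget might look like, the snippet below builds a JSON request body that carries a thinking budget. The field names (`generationConfig`, `thinkingConfig`, `thinkingBudget`) follow the publicly documented Gemini REST API and should be checked against the current reference. The helper name `build_request` and the 24,576-token cap are assumptions for illustration, not details from this article.

```python
import json

# Assumed upper bound on the thinking budget, in tokens. Verify against current docs.
ASSUMED_MAX_BUDGET = 24576


def build_request(prompt: str, thinking_budget: int) -> dict:
    """Build a request body with the budget clamped to [0, ASSUMED_MAX_BUDGET].

    Hypothetical helper: it only assembles the payload and sends nothing.
    """
    budget = max(0, min(thinking_budget, ASSUMED_MAX_BUDGET))
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {"thinkingConfig": {"thinkingBudget": budget}},
    }


if __name__ == "__main__":
    # A budget of 0 corresponds to thinking turned off.
    print(json.dumps(build_request("What is 2 + 2?", 0), indent=2))
```

Clamping in the helper keeps out-of-range values from reaching the API, which is one reasonable design choice; rejecting them with an error would be another.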

Tulsee Doshi, Product Director for Gemini Models at Google DeepMind, highlighted the importance of cost and latency for developers in various use cases. The flexibility to adjust the thinking capability according to needs makes Gemini 2.5 Flash Google’s “first fully hybrid reasoning model.”

The pricing structure for Gemini 2.5 Flash makes the cost of reasoning explicit. Developers pay $0.15 per million input tokens, while output pricing depends on the reasoning setting: $0.60 per million output tokens with thinking turned off and $3.50 per million with thinking enabled. That is a roughly 5.8-fold difference, so turning thinking off cuts the output price by about 83%.
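To make the arithmetic concrete, here is a small sketch that estimates the cost of a request from the preview prices quoted above. The function name `request_cost` and the example token counts are illustrative assumptions, and the sketch assumes that `output_tokens` already includes every token billed as output.

```python
# Preview prices quoted in the article, in US dollars per million tokens.
INPUT_PRICE = 0.15
OUTPUT_PRICE_NO_THINKING = 0.60
OUTPUT_PRICE_THINKING = 3.50


def request_cost(input_tokens: int, output_tokens: int, thinking: bool) -> float:
    """Estimate the dollar cost of one request (hypothetical helper)."""
    output_price = OUTPUT_PRICE_THINKING if thinking else OUTPUT_PRICE_NO_THINKING
    return (input_tokens * INPUT_PRICE + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Example workload (assumed): one million input and one million output tokens.
    off = request_cost(1_000_000, 1_000_000, thinking=False)
    on = request_cost(1_000_000, 1_000_000, thinking=True)
    ratio = OUTPUT_PRICE_THINKING / OUTPUT_PRICE_NO_THINKING
    saving = 1 - OUTPUT_PRICE_NO_THINKING / OUTPUT_PRICE_THINKING
    print(f"thinking off: ${off:.2f}, thinking on: ${on:.2f}")
    print(f"output price ratio: {ratio:.2f}x, saving when off: {saving:.1%}")
```

The output price ratio works out to about 5.8, which corresponds to a saving of roughly 83% on output tokens rather than 600%. A price cannot fall by more than 100%.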

Google claims that Gemini 2.5 Flash delivers competitive performance across benchmarks while maintaining a smaller model size than alternatives. According to the company, the model outperforms competitors on tests such as Humanity’s Last Exam, GPQA Diamond, and the AIME mathematics exams, showing strength in math, multimodal reasoning, and long-context tasks.

The ability to adjust reasoning levels based on the query represents a significant advancement in AI deployment. Simple queries can benefit from disabling thinking for cost efficiency, while complex tasks requiring multi-step reasoning can leverage the thinking function for optimal results.
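One way to picture this per-query adjustment is a small router that picks a budget before the model is called. The sketch below is a deliberately naive heuristic under stated assumptions: the keyword list, the 60-word threshold, the default budget of 4096, and the function name `choose_thinking_budget` are all hypothetical and do not come from Google.

```python
# Hypothetical hints that a query may need multi-step reasoning.
COMPLEX_HINTS = ("prove", "derive", "step by step", "debug", "optimize", "compare")


def choose_thinking_budget(query: str, high_budget: int = 4096) -> int:
    """Return 0 (thinking off) for simple queries, else `high_budget`.

    Naive heuristic: a query counts as complex if it contains a hint
    keyword or is longer than 60 words.
    """
    text = query.lower()
    looks_complex = any(hint in text for hint in COMPLEX_HINTS)
    is_long = len(text.split()) > 60
    return high_budget if (looks_complex or is_long) else 0


if __name__ == "__main__":
    for q in ("What is the capital of France?",
              "Prove that the sum of two even numbers is even."):
        print(f"{choose_thinking_budget(q):>5}  {q}")
```

A production system would likely replace the keyword check with a classifier or with feedback from observed answer quality, but the structure stays the same: decide the budget per query, then pass it to the model.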

In addition to the Gemini 2.5 Flash launch, Google has introduced Veo 2 video generation capabilities and announced free access to Gemini Advanced for U.S. college students until spring 2026. These moves align with Google’s strategy to compete in the AI market and build loyalty among future knowledge workers.

As Gemini 2.5 Flash evolves, businesses can expect more nuanced approaches to AI deployment, with reasoning capabilities tailored to specific tasks. Developers can start building with the model now, and refinements will continue based on feedback during the preview phase.

Overall, Google’s focus on customizable reasoning in AI reflects a maturing market where cost optimization and performance tuning are essential considerations, signaling a new phase in the commercialization of generative AI technologies.
