American Focus
© 2024 americanfocus.online – All Rights Reserved.
Tech and Science

Did xAI lie about Grok 3’s benchmarks?

Last updated: February 23, 2025 12:10 am

Debates Surrounding AI Benchmarks: A Closer Look

Debates over AI benchmarks, and over how AI labs report them, are increasingly playing out in public. A recent dispute between OpenAI and xAI has put the accuracy and transparency of published benchmark results in the spotlight.

The controversy began when an OpenAI employee accused xAI, Elon Musk’s AI company, of publishing misleading benchmark results for its latest AI model, Grok 3. In response, xAI co-founder Igor Babuschkin insisted that the company was in the right, sparking a heated debate within the AI community.

In a post on its blog, xAI published a graph showing Grok 3’s performance on AIME 2025, a set of challenging math questions from a recent invitational mathematics exam. Some experts have questioned the validity of AIME as an AI benchmark, but it is commonly used to probe a model’s mathematical ability.

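The graph showed two variants of Grok 3, Grok 3 Reasoning Beta and Grok 3 mini Reasoning, beating OpenAI’s best-performing available model, o3-mini-high, on AIME 2025. OpenAI employees were quick to point out, however, that the graph did not include o3-mini-high’s score at “cons@64.” Short for “consensus@64,” this setting gives a model 64 attempts at each problem and takes the answer it produces most often as its final answer. Because cons@64 tends to raise a model’s score considerably, leaving it off a graph can make one model appear to surpass another when it does not.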
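To make the difference concrete, here is a minimal Python sketch of the two scoring modes. It assumes that cons@k means a majority vote over k sampled answers and that a single-attempt score grades only the first sample. The function names, the toy problems, and the sampled answers are all hypothetical and are not taken from either company's evaluation code or results. The sketch uses k=4 instead of 64 to keep the data small.

```python
from collections import Counter

def score_at_1(samples, answer):
    """Grade a problem using only the first sampled answer."""
    return 1.0 if samples[0] == answer else 0.0

def score_cons_at_k(samples, answer, k=64):
    """Grade a problem by majority vote over the first k sampled answers."""
    votes = Counter(samples[:k])
    consensus, _count = votes.most_common(1)[0]
    return 1.0 if consensus == answer else 0.0

# Hypothetical toy data: each problem has a reference answer and
# several answers sampled from the same (imaginary) model.
problems = [
    {"answer": "204", "samples": ["113", "204", "204", "204"]},
    {"answer": "25", "samples": ["25", "25", "30", "25"]},
    {"answer": "7", "samples": ["9", "9", "7", "9"]},
]

at_1 = sum(score_at_1(p["samples"], p["answer"]) for p in problems) / len(problems)
cons = sum(score_cons_at_k(p["samples"], p["answer"], k=4) for p in problems) / len(problems)

print(f"single attempt: {at_1:.2f}")
print(f"cons@4:         {cons:.2f}")
```

On this toy data the same model scores higher under the consensus setting than on a single attempt, which is why a chart that mixes the two settings across models is not a like-for-like comparison.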

Grok 3 Reasoning Beta and Grok 3 mini Reasoning’s scores on AIME 2025 at “@1,” meaning the score from the models’ first attempt at the benchmark, fell below o3-mini-high’s score. Grok 3 Reasoning Beta also trailed slightly behind OpenAI’s o1 model set to “medium” computing. Despite this, xAI continued to promote Grok 3 as the “world’s smartest AI.”

The debate escalated as the two sides traded accusations of misleading benchmark charts. A more neutral party in the debate put together a graph showing nearly every model’s performance at cons@64, which illustrated how much a comparison depends on the settings chosen.

AI researcher Nathan Lambert pointed out that perhaps the most important figure remains unknown: the computational and monetary cost it took each model to achieve its best score. Without it, benchmark results say little about a model’s limitations or its strengths.

As the AI community continues to navigate the complexities of benchmarking practices, transparency and accuracy remain paramount. The ongoing debate between OpenAI and xAI serves as a reminder of the challenges and controversies inherent in assessing AI performance.


TAGGED: benchmarks, Grok, Lie, xAI