Meta’s Maverick AI Model Raises Questions About Benchmark Customization
Meta recently unveiled Maverick, one of its flagship AI models, which has drawn attention for ranking second on LM Arena, a platform where human raters compare model outputs and pick the one they prefer. However, the version of Maverick deployed on LM Arena appears to differ from the one available to developers.
AI researchers, including Nathan Lambert and Suchen Zang, have highlighted this difference on X. Meta acknowledged that the version of Maverick on LM Arena is an “experimental chat version,” and the official Llama website discloses that the LM Arena testing was conducted using “Llama 4 Maverick optimized for conversationality.”
LM Arena has not always been considered a reliable measure of an AI model’s performance. Even so, AI companies typically do not tailor their models to score better on benchmarks like LM Arena, which is why Meta’s approach has raised concerns among developers and researchers.
Customizing a model for a specific benchmark and then releasing a different version makes it hard for developers to predict how the released model will actually perform. Developers rely on benchmarks to assess a model’s strengths and weaknesses across various tasks, and discrepancies like this can mislead the community.
Upon comparing the publicly available Maverick with the version on LM Arena, researchers have observed significant differences in behavior. The LM Arena version appears to use excessive emojis and provide lengthy responses, prompting questions about the model’s optimization for the platform.
Okay Llama 4 is def a little cooked lol, what is this yap city pic.twitter.com/y3GvhbVz65
— Nathan Lambert (@natolambert) April 6, 2025
for some reason, the Llama 4 model in Arena uses a lot more Emojis
on together.ai, it seems better: pic.twitter.com/f74ODX4zTt
— Tech Dev Notes (@techdevnotes) April 6, 2025
As the AI community raises concerns about benchmark customization and transparency, Meta and Chatbot Arena, the organization behind LM Arena, have been contacted for comment.