Artificial intelligence (AI) faced its toughest math test yet in the “First Proof” challenge, in which experts gave AI models one week to solve 10 math problems. The challenge, organized by 11 top mathematicians, aimed to test whether large language models (LLMs) can perform mathematical research. The results, released on Valentine’s Day, showed that although the AI models attempted the problems, they did not come close to solving all of them.
The mathematicians behind First Proof gave the AI models 10 “lemmas,” or minor theorems, that required originality to prove. The challenge highlighted AI’s limitations in mathematics and showcased the growing interest in AI within the mathematics community. Online forums and social media were flooded with purported proofs from mathematicians of various levels.
OpenAI, one of the AI startups involved in the challenge, posted its solutions after a week-long sprint using its latest AI models and expert feedback from human mathematicians. However, the results were mixed, with only two of the ten solutions deemed correct. The style of the AI-generated proofs surprised the mathematicians: some resembled 19th-century mathematics rather than the cutting-edge mathematics of the 21st century.
While the challenge highlighted the progress AI has made in mathematics, it also raised questions about how much human assistance went into the solutions. Some submissions appeared to include varying degrees of human input, which the challenge rules prohibited. The submissions will undergo thorough vetting by experts to determine their validity and originality.
The First Proof team plans to conduct a second round with tighter controls, aiming to gather more feedback on AI’s capabilities in solving mathematical problems. While some mathematicians were impressed by the progress AI has made, others expressed disappointment with the results. The challenge served as an experiment to explore the intersection of AI and mathematics, paving the way for future collaborations and advancements in the field.

