Maybe AI agents can be lawyers after all

AI Advancements: Anthropic’s Opus 4.6 Raises the Bar

In a recent article, we discussed Mercor’s AI benchmarking efforts, which highlighted how limited AI agents remain at professional tasks such as law and corporate analysis. The scores were disappointing, with models from the major labs scoring below 25%, leading us to believe that lawyers were safe from AI displacement, at least for the time being.

However, AI capabilities can shift in a matter of weeks.

The recent introduction of Anthropic’s Opus 4.6 has significantly altered the leaderboards. Anthropic’s latest model scored nearly 30% in one-shot trials and averaged 45% when given additional attempts at the task. The new model incorporates new features like “agent swarms,” which seem to enhance multi-step problem-solving abilities.

This remarkable leap in performance from the previous state-of-the-art showcases the continuous progress in foundation models. Mercor CEO Brendan Foody expressed his astonishment, stating, “jumping from 18.4% to 29.8% in a few months is insane.”

The APEX-Agents Leaderboard. Image Credits: Mercor (screenshot)

A score of 30% is still far from perfection, so AI is not about to replace human professionals like lawyers. However, an advance of this size should prompt legal professionals to reconsider how confident they are in the face of evolving AI capabilities.