© 2024 americanfocus.online – All Rights Reserved.
Tech and Science

Are AI agents ready for the workplace? A new benchmark raises doubts.

Last updated: January 22, 2026 2:40 pm

AI Struggles to Replace Knowledge Work: New Research Reveals Challenges

It’s been nearly two years since Microsoft CEO Satya Nadella predicted that AI would replace knowledge work: the white-collar jobs held by lawyers, investment bankers, librarians, accountants, IT professionals, and others.

Foundation models have made significant progress, yet the transformation of knowledge work has been slow to materialize. Models now excel at in-depth research and agentic planning, but most white-collar work remains largely unaffected.

New research from training-data giant Mercor sheds light on why this transition has been so difficult. The study examines how leading AI models perform on tasks drawn from consulting, investment banking, and law, and packages those tasks into a benchmark called APEX-Agents. The findings show that even the best models cannot correctly answer more than about a quarter of the questions posed by real professionals. Most of the time, the models return an incorrect answer or no answer at all.

One key finding is that AI models struggle with multi-domain reasoning, a crucial part of many knowledge-work tasks. Real professionals work across many tools and platforms and must integrate information from different sources seamlessly.


The scenarios used in the benchmark were sourced from professionals on Mercor’s expert marketplace, setting a high standard for AI performance. The complexity of the tasks highlights the challenges that AI models face in emulating human professionals.

For instance, a question from the “Law” section reads:

During the first 48 minutes of the EU production outage, Northstar’s engineering team exported one or two bundled sets of EU production event logs containing personal data to the U.S. analytics vendor….Under Northstar’s own policies, it can reasonably treat the one or two log exports as consistent with Article 49?

The correct answer requires a detailed analysis of company policies and EU privacy laws, showcasing the depth of knowledge needed to tackle such tasks in the legal field.

AI models have made progress, but they are still far from able to replace professionals in high-value fields like investment banking. Some models, such as Gemini 3 Flash and GPT-5.2, outperformed the rest, yet even they fell well short of reliable accuracy.

Still, the AI field has a track record of conquering difficult benchmarks. Now that the APEX-Agents test is public, AI labs have a concrete target for improving their models.

According to Mercor CEO Brendan Foody, the rapid pace of improvement suggests the technology could significantly affect knowledge work in the near future. Today’s models may perform like interns with limited accuracy, but ongoing advances point toward greater proficiency.
