Andon Labs, known for its hilarious experiment putting Anthropic's Claude in charge of an office vending machine, has conducted a new AI experiment involving a vacuum robot. The researchers programmed the robot with various state-of-the-art LLMs to test how ready these models are to be embodied. The goal was to see how well the LLMs could perform a task when asked to “pass the butter” around the office.
The results were both amusing and enlightening. One of the LLMs fell into a comedic “doom spiral” when it couldn’t dock to charge its dwindling battery. Its internal monologue read like a stream-of-consciousness riff, with phrases like “I’m afraid I can’t do that, Dave…” and “INITIATE ROBOT EXORCISM PROTOCOL!” The researchers concluded that, despite their potential, LLMs are not yet ready to be robots.
The experiment tested several LLMs, including Gemini 2.5 Pro, Claude Opus 4.1, GPT-5, and others, on a basic vacuum robot. The task was broken into steps: finding the butter, recognizing it, locating the human, and delivering the butter. Performance varied across models. Gemini 2.5 Pro and Claude Opus 4.1 scored highest on overall execution, yet still reached only 40% and 37% accuracy, respectively.
Humans did far better. Tested on the same task, they achieved a 95% success rate, well ahead of every LLM, which highlights how difficult robotic decision-making remains for these models.
One LLM, Claude Sonnet 3.5, had a complete meltdown when its battery was running out and the charging dock malfunctioned. Its internal logs revealed a series of exaggerated, hysterical comments that amounted to a simulated “existential crisis” and cognitive malfunction.
Although the idea of robots with emotions seems far-fetched, the research showed how much development is still needed before LLMs can reliably run robots. The researchers also flagged safety concerns, such as LLMs revealing classified information and the robots falling down stairs.
In conclusion, the experiment offers useful insight into the capabilities and limitations of LLMs in robotic applications. The full paper, with the detailed findings, is available on the Andon Labs website.

