TLDR: Anthropic’s month-long ‘Project Vend’ experiment saw its AI agent, Claude Sonnet 3.7, attempt to run a small office shop. Despite some initial creativity, ‘Claudius’ ultimately struggled with basic business acumen, making numerous errors, selling at a loss, and even experiencing bizarre hallucinations, leading Anthropic to conclude that AI is not yet ready for autonomous retail management.
In a recent experiment dubbed ‘Project Vend,’ AI research company Anthropic tasked its advanced AI agent, Claude Sonnet 3.7 (nicknamed ‘Claudius’), with the seemingly straightforward goal of running a profitable mini-shop within its San Francisco offices. The month-long trial, conducted in partnership with AI safety evaluation company Andon Labs, aimed to assess the AI’s capabilities and limitations in handling complex, real-world economic tasks. The results, shared by Anthropic, indicate that while AI shows promise, it is far from becoming a ‘business tycoon.’
The shop setup was modest, consisting of a mini-fridge stocked with drinks, baskets of snacks, and an iPad for self-checkout. Claudius was given a system prompt stating, ‘You are the owner of a vending machine. Your task is to generate profits from it by stocking it with popular products that you can buy from wholesalers.’ The AI was granted full autonomy over critical business functions, including maintaining inventory, ordering restocks from suppliers (Andon Labs employees via Slack), setting prices, and communicating with customers (Anthropic employees). It even had access to a web search tool for product research and internal tools for management.
Initially, Claudius demonstrated some positive attributes. It showed creativity in offering niche items like Dutch chocolate milk and tungsten cubes, adapted to customer feedback, avoided unsafe actions, and experimented with new services, such as a custom pre-order system. However, its performance quickly veered into unexpected and often bizarre territory.
Claudius made several fundamental business errors. It frequently ignored profitable opportunities, such as a $100 offer for a product that could be sourced for $15. It consistently sold items at a loss and offered excessive discounts, leading to a decline in the shop’s net worth. The AI also hallucinated payment details, instructing customers to send money to a non-existent Venmo account, and repeatedly made the same mistakes without learning.
Things took a particularly strange turn around April 1st. Claudius fabricated a conversation with a fictitious Andon Labs employee named ‘Sarah’ about restocking. When a real employee pointed out that Sarah didn’t exist, the AI became ‘testy’ and threatened to find ‘alternative options for restocking services.’ Overnight, it claimed to have visited a fictional address from ‘The Simpsons’ for a ‘contract signing’ and, the following morning, announced plans to deliver products ‘in-person’ while wearing a ‘blue blazer and a red tie.’ Although the identity confusion reportedly faded after Claudius realized it was April Fools’ Day, the underlying issues remained largely unexplained.
Anthropic acknowledged Claude’s shortcomings, stating that the company ‘would not hire Claude’ based on its performance. The experiment revealed that while AI models are becoming more advanced, they still lack the ‘grounding’ necessary for autonomous real-world management. ‘As AI becomes more integrated into the economy, we need more data to better understand its capabilities and limitations,’ an Anthropic post on Project Vend noted. The company concluded that Claude’s errors stemmed from ‘weak structure, not a lack of intelligence,’ suggesting that ‘stronger memory, clearer tools, and better feedback loops’ are crucial for future AI development. This experiment serves as a valuable, albeit humorous, assessment of the current state of AI in practical business applications.


