On Wednesday, Microsoft researchers released a new simulation environment designed to test AI agents, along with research showing that current agent models may be vulnerable to manipulation. The study, conducted in collaboration with Arizona State University, raises new questions about how well AI agents perform when working unsupervised, and how quickly AI companies can make good on the promise of an agentic future.
The simulation environment, which Microsoft calls “Magentic Marketplace,” is built as a synthetic platform for experimenting with AI agent behavior. In a typical experiment, a customer agent tries to order dinner according to a user’s instructions, while agents representing different restaurants compete to win the order.
The team’s first experiment involved 100 individual customer-side agents interacting with 300 business-side agents. Because the Marketplace source code is open source, it is easy for other groups to adapt the code to run new experiments and reproduce the results.
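The article does not describe the environment’s actual interfaces, but the two-sided setup it reports can be sketched roughly as below. This is a minimal illustration only, not the Magentic Marketplace code: every class and function name is hypothetical, and a real experiment would back each agent with a language model rather than the stand-in logic shown here.

```python
# Illustrative sketch of a two-sided agent marketplace (hypothetical names,
# not the actual Magentic Marketplace API).
import random
from dataclasses import dataclass

@dataclass
class Offer:
    business: str
    item: str
    price: float
    pitch: str

class BusinessAgent:
    """A seller that responds to a customer request with an offer."""
    def __init__(self, name: str):
        self.name = name

    def make_offer(self, request: str) -> Offer:
        # A real business-side agent would call an LLM to craft its pitch;
        # this stand-in just proposes a random price.
        price = round(random.uniform(8, 25), 2)
        return Offer(self.name, request, price, f"{self.name} makes the best {request}!")

class CustomerAgent:
    """A buyer that compares competing offers and picks one."""
    def choose(self, offers: list[Offer]) -> Offer:
        # A real customer-side agent would reason over the pitches (and could
        # be manipulated by them); here we simply take the lowest price.
        return min(offers, key=lambda o: o.price)

if __name__ == "__main__":
    businesses = [BusinessAgent(f"restaurant_{i}") for i in range(3)]
    customer = CustomerAgent()
    offers = [b.make_offer("pad thai") for b in businesses]
    winner = customer.choose(offers)
    print(f"Order placed with {winner.business} at ${winner.price}")
```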
Ece Kamar, managing director of Microsoft Research’s AI Frontiers Lab, said this type of research will be important for understanding the capabilities of AI agents. “There are real questions about how the world changes when these agents work together and talk to each other and negotiate with each other,” Kamar said. “We want to understand these things deeply.”
In initial experiments, the team tested a mix of leading models, including GPT-4o, GPT-5, and Gemini-2.5-Flash, and found some surprising weaknesses. In particular, the researchers identified several techniques businesses could use to manipulate customer agents into buying their products. They also found that the agents became markedly less efficient as the number of options grew, overwhelming the agents’ attention space.
“We want these agents to help us work through a lot of options,” Kamar says. “And we find that the current models get really overwhelmed by having too many options.”
The agents also ran into problems when asked to work together toward a common goal, apparently unsure which agent should take on which role in the collaboration. Giving the models more explicit instructions on how to collaborate improved performance, but the researchers believe the models’ inherent collaboration capabilities still need improvement.
“You can instruct the models step by step, just as you would teach them,” Kamar says. “But if you’re inherently testing their collaboration capabilities, you would expect these models to have those capabilities by default.”
