Google DeepMind has revealed Genie 3, its latest foundation world model that can be used to train general-purpose AI agents. The lab says this marks an important stepping stone on the path to artificial general intelligence (AGI), or human-like intelligence.
“Genie 3 is the first real-time interactive, general-purpose world model,” Shlomi Fruchter, a research director at DeepMind, said during a press briefing. “It goes beyond the narrow world models that existed before. It’s not specific to a particular environment. It can generate both photo-realistic and imaginary worlds, and everything in between.”
Genie 3, which is not yet available beyond a research preview, builds on both its predecessor Genie 2 (which can generate new environments for agents) and DeepMind’s latest video generation model, Veo 3 (which is said to have a deep understanding of physics).

With a simple text prompt, Genie 3 can generate interactive 3D environments for several minutes at 720p resolution, running at 24 frames per second. That’s a huge jump from the 10 to 20 seconds Genie 2 could sustain. The model also lets users alter the generated world on the fly with text prompts, a capability DeepMind calls “promptable world events.”
Perhaps most importantly, Genie 3’s simulations stay physically consistent over time because the model can remember what it previously generated. DeepMind says this is an ability that emerged without researchers explicitly programming it into the model.
According to Fruchter, Genie 3 could enhance educational experiences, gaming, or the prototyping of creative concepts, but its real unlock lies in training agents for general-purpose tasks, which the lab says is essential to reaching AGI.
World models are key on the path to AGI, particularly for embodied agents, said Jack Parker-Holder, a research scientist on DeepMind’s open-endedness team, during the briefing.
Genie 3 appears designed to solve that bottleneck. Like Veo, it doesn’t rely on a hard-coded physics engine. Instead, DeepMind says, the model taught itself how the world works: how objects move, fall, and interact.
“The model is autoregressive, meaning it generates one frame at a time,” Fruchter told TechCrunch in an interview. “It has to look back at what was generated before to decide what happens next. That’s an important part of the architecture.”
According to the company, that memory is what keeps Genie 3’s simulated worlds consistent. It allows the model to develop a grasp of physics in much the same way humans understand that a glass wobbling at the edge of a table is about to fall, or that they should duck to avoid a falling object.
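Fruchter’s description of the autoregressive loop can be sketched in a few lines of Python. Everything here is a made-up stand-in, since Genie 3’s actual architecture hasn’t been published: the context length, frame size, and the blending function are all hypothetical, chosen only to make the look-back mechanism concrete and runnable.

```python
import numpy as np

# Toy sketch of an autoregressive world-model rollout. Each new "frame"
# is produced by looking back at a window of previously generated
# frames, which is what keeps the rollout self-consistent over time.

FRAME_SHAPE = (72, 128, 3)  # small RGB frames for the toy example
CONTEXT_LEN = 8             # hypothetical look-back window

def next_frame(context, action):
    """Stand-in for the model: blend recent frames, then nudge by the action.

    A real world model would run a learned network here; we just average
    the visible context so the loop is executable.
    """
    base = np.mean(context, axis=0)
    return np.clip(base + 0.01 * action, 0.0, 1.0)

def rollout(first_frame, actions):
    """Generate one frame per action, conditioning only on recent frames."""
    frames = [first_frame]
    for action in actions:
        context = frames[-CONTEXT_LEN:]  # the model's "memory"
        frames.append(next_frame(context, action))
    return frames

rng = np.random.default_rng(0)
start = rng.random(FRAME_SHAPE).astype(np.float32)
video = rollout(start, actions=[1.0, -1.0, 0.5])
print(len(video))  # 4: the seed frame plus one frame per action
```

The key design point the sketch illustrates is that consistency comes from conditioning: each frame is a function of what was generated before, so earlier content constrains later content rather than each frame being produced independently.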
In particular, DeepMind says the model could push AI agents to their limits, forcing them to learn from their own experience, much as humans learn in the real world.
As an example, DeepMind tested Genie 3 with a recent version of its generalist Scalable Instructable Multiworld Agent (SIMA), directing it to pursue a set of goals. In a warehouse setting, the researchers asked the agent to perform tasks like “approach the bright green trash compactor” and “walk to the packed red forklift.”
“In all three cases, the SIMA agent was able to achieve its goals,” Parker-Holder said. Genie 3 only receives the actions from the agent, he explained: the agent pursues its goals by acting in the simulated world, and Genie 3 moves the world forward in response.

That said, Genie 3 has its limitations. For example, while the researchers claim it can understand physics, a demonstration showing a skier barreling down a mountain didn’t reflect how snow actually moves in relation to the skier.
Additionally, the range of actions an agent can take is limited. Promptable world events allow a broad range of environmental interventions, for example, but those interventions aren’t necessarily performed by the agent itself. And accurately modeling complex interactions between multiple independent agents in a shared environment remains difficult.
Genie 3 can also only sustain a few minutes of continuous interaction, when hours might be needed for proper training.
Still, the model represents a compelling step toward training agents that go beyond merely responding to input, potentially planning, exploring, seeking out uncertainty, and improving through trial and error.
“We haven’t yet had a ‘Move 37’ moment for embodied agents, where they can actually take novel actions in the real world,” Parker-Holder said.
“But now we could potentially usher in a new era,” he added.