
Meta introduces new world model for robots and agents

The social media giant's new V-JEPA 2 helps robots and agents learn more about the real world. It comes as the vendor seeks to be more competitive in GenAI.

As it seeks to forge ahead in the generative AI race, Meta on Wednesday introduced a new world model that helps it compete against market leaders, including Nvidia.

The social media giant touted its Video Joint Embedding Predictive Architecture 2 (V-JEPA 2) model as the next step toward achieving advanced machine intelligence and building helpful AI agents that can operate in the physical world.

World models and Scale AI

Meta launched the 1.2 billion-parameter model a day after reports that it had agreed to pay nearly $15 billion for Scale AI, a startup founded in 2016 by entrepreneurs Alexandr Wang and Lucy Guo that specializes in data labeling services and supporting generative AI (GenAI) applications. Wang will also reportedly join Meta's new superintelligence lab.

Both the new world model and the expected acquisition of Scale AI result from Meta's growing frustration with its current position in the AI market, according to David Nicholson, an analyst with Futurum Group.

V-JEPA 2 was built with Meta's Joint Embedding Predictive Architecture, which Meta released in 2022 and designed for images. The release of V-JEPA 2 comes after AI hardware and software giant Nvidia released its world foundation model, Cosmos, in January. Nvidia trained Cosmos to understand the physical world using 20 million hours of video of physical activity, such as humans walking and hands moving and manipulating objects.

Meta said it trained V-JEPA 2 using self-supervised learning from video. The model was trained on more than a million hours of video and a million images from various sources, as well as on robot data, according to Meta.
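In broad strokes, JEPA-style self-supervised training masks part of a video and predicts the representations of the masked regions from the visible context, rather than reconstructing raw pixels. A toy numpy sketch of that idea (the shapes, the random "embeddings" and the single linear predictor are all illustrative assumptions, not Meta's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a video clip: 16 "patches", each an 8-dim embedding
# produced by some context encoder (illustrative values, not a real model).
patches = rng.normal(size=(16, 8))

# Mask out 4 target patches; the rest serve as context.
mask = np.zeros(16, dtype=bool)
mask[[3, 7, 11, 15]] = True
context, targets = patches[~mask], patches[mask]

# JEPA-style objective: predict the *embeddings* of the masked patches
# from a summary of the context, not the raw pixels. Here the predictor
# is a single linear map fit by least squares (a deliberately tiny sketch).
ctx_summary = np.tile(context.mean(axis=0), (targets.shape[0], 1))
W, *_ = np.linalg.lstsq(ctx_summary, targets, rcond=None)
pred = ctx_summary @ W

# Embedding-space prediction error, the quantity a JEPA-style loss drives down.
loss = float(np.mean((pred - targets) ** 2))
print(round(loss, 4))
```

The point of the sketch is the objective, not the architecture: the loss is computed between predicted and actual representations, which is what lets this style of training scale to unlabeled video.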

The model can be used for zero-shot robot planning in new environments. Zero-shot robot planning is an advanced AI capability that enables physical robots to perform tasks they've never encountered in training.
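One simple way to picture zero-shot planning with a world model is random-shooting model-predictive control: sample candidate action sequences, imagine each outcome with the model, and execute the best one. A toy numpy sketch under that framing (the additive "dynamics," goal vector and sample count are illustrative assumptions, not V-JEPA 2's actual planner):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy latent dynamics: next_state = state + action. This stands in for a
# learned world model that imagines outcomes without real-world trials.
def rollout(state, actions):
    for a in actions:
        state = state + a
    return state

start = np.zeros(4)                      # current latent state (toy)
goal = np.array([1.0, -2.0, 0.5, 3.0])   # desired latent state (toy)

# Zero-shot planning by random shooting: sample candidate action sequences,
# score each imagined outcome by distance to the goal, keep the best plan.
candidates = rng.normal(size=(256, 3, 4))  # 256 plans of 3 actions each
costs = [float(np.linalg.norm(rollout(start, plan) - goal))
         for plan in candidates]
best = candidates[int(np.argmin(costs))]

print(round(min(costs), 3))
```

Because the search happens entirely inside the model's imagined rollouts, a robot can attempt a task it never saw during training, which is the essence of the zero-shot claim.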

"That whole direction is kind of the other side of the coin from their investment in Scale AI," Nicholson said. He added that Meta's CEO, Mark Zuckerberg, has made it clear that to nudge the models where he wants them to go, there is a need for human-in-the-loop, which Scale AI specializes in.

"A lot of this is desperation from Meta," Nicholson said. While Meta has gained success with its open source Llama model, it appears to want to further expand its GenAI footprint.

"This is just Meta's attempt to grab market share in an area that is largely dominated by Nvidia," he added. Although Meta has a partnership with Nvidia and has previously revealed plans to work with Nvidia to build an AI supercomputer, the two vendors are still competitors in the GenAI market.

"I just don't see them achieving what they have sought out to achieve yet in terms of enterprise adoption," Nicholson continued. "They're working hard to create an ecosystem."

Next phase of AI

At the same time, V-JEPA 2 speaks more to the natural evolution of GenAI, said Tuong Huy Nguyen, an analyst with Gartner.

"I look at it more like this is the next frontier for AI, so we should expect to see more providers, especially the big ones, investing more in world models?" Nguyen said.

He added that while the current generation of AI is trained on data from the internet, it lacks grounding in the real world.

"This starts to touch on the applications of AI" in the physical world, Nguyen said. Such applications include autonomous vehicles, drones and robots. "The next era of AI is teaching systems the actual world we live in and how to interact with it," he added.

Meta has been working on the idea of physical AI and world models for a while, so this isn't entirely new for the social media giant. For instance, in the past, Meta introduced Project Spatial, which enables virtual reality enthusiasts to capture gameplay in 3D.

"They've been working on the concepts around world models, physical AI, spatial AI, I would say, for at least five years now," Nguyen said.

Despite Meta's experience with physical AI, the market is still new, making it especially important for Meta and others in the market to monitor safety, privacy and security, Nguyen said.

Meta has made V-JEPA 2 artifacts available on GitHub, Hugging Face and the V-JEPA 2 website.

The social media giant also introduced three new physical AI benchmarks. IntPhys 2 measures the ability of models to distinguish between physically plausible and implausible situations. Minimal Video Pairs measures video-language models' physical understanding through multiple-choice questions. CausalVQA measures video-language models' ability to answer questions based on physical cause and effect.

Esther Shittu is an Informa TechTarget news writer and podcast host covering artificial intelligence software and systems.
