Published: 2025-02-19

Microsoft on Wednesday presented its research into a foundation model that can generate video game visuals and controller actions, advancing generative AI applications.

The first World and Human Action Model (WHAM), also called Muse, was developed by the Microsoft Research Game Intelligence and Teachable AI Experiences teams with Microsoft Xbox Game Studios.

Microsoft is currently open sourcing weights and sample data. Developers can also learn and experiment with the weights, sample data and WHAM Demonstrator -- a prototype with a visual interface for interacting with WHAM models -- on Azure AI Foundry. Azure AI Foundry is a platform that developers can use to build AI applications.

World models

Microsoft's WHAM is the next set of foundation models in a generative AI market that already produces language models -- which imitate how humans write things -- and action models, which focus on applications or how people use things.

WHAM is evolving from concepts from Google DeepMind, where models can simulate the world, said Omdia analyst Bradley Shimmin.

"Muse is a continuation of that idea in world-building," Shimmin said. The models learn as they go without training and can change.

Microsoft is not the first vendor to create a model like this. In January, Nvidia introduced world foundation models, or Nvidia Cosmos. Nvidia Cosmos was trained to understand the physical world. Nvidia trained Cosmos on hours of video footage of humans walking, hands moving and objects being manipulated.

While Nvidia Cosmos is different from Microsoft's Muse, both models are important for building AI technology that can interact with any type of world, said Dion Hinchcliffe, an analyst at The Futurum Group.

"This is a very big, new dimension in foundation models and AI," Hinchcliffe said.