Date: 2024-09-25

The Allen Institute for AI on Wednesday introduced Molmo, a family of open multimodal AI models.

Molmo can understand visual data from everyday objects and signs, the Allen Institute for AI (Ai2) said.

In a video, the nonprofit research institute shows a Molmo model understanding and responding to various images and objects. When users show Molmo an image, such as a parking sign, and ask it a question, the model can understand what the sign means.

Molmo models can also point to what they perceive. For developers, the models can point to UI elements on a screen.

Ai2 said it plans to open up Molmo's language and vision training data, fine-tuning data, model weights and source code in the future.

However, some model weights, inference code and demos are available starting today.

The models come in different sizes: Molmo 72B, Molmo 7B-D, Molmo 7B-O and Molmo 1B-e.

The Molmo 1B-e model is tiny and can fit on most devices, Ai2 said.

Open models

The introduction of Molmo highlights the small gap in popularity between open and closed models in the generative AI market.

"Open in the world of AI is getting off to a running start in a way that open in, say, the operating system world didn't," Futurum Group analyst David Nicholson said.

In the operating system market, it took years before open source systems like Android and Linux caught up to proprietary systems like macOS and Windows.

In contrast, open source (in which the vendor releases source code) and near open source (or just open) models have already caught up with closed source models in the generative AI market, with open models from Meta and independent generative AI vendor Mistral having gained popularity.

For example, according to Ai2, its 72B Molmo model is on par with the proprietary OpenAI GPT-4o and Google Gemini 1.5 large language models (LLMs) in terms of performance.

Typically, a vendor that is truly open compromises on performance, Nicholson said.

"Unless they completely made this up, it's remarkable that they are willing to publish all of the information about their models while delivering the kind of performance that they claim that they are," Nicholson added.

Visual data

Ai2’s willingness to make its data open is also noteworthy, Gartner analyst Arun Chandrasekaran said.

"The more transparent companies are in this space, particularly academic institutions like the Allen Institute of AI, the better it is," Chandrasekaran said.

Ai2’s focus on vision, with Molmo models able to point to and understand the outside world, is the way for AI models to get better and smarter, said Nicholson, of the Futurum Group.

"Training these systems to understand what they ‘see’ is really, really critical to making them smarter," he said.

Ai2 is also focusing on the ability of the models to act as autonomous agents. In another video presentation, a Molmo model placed a food order and scheduled it for pickup.

"If what these folks are saying is true, then this is a completely open set of tools that people can use to build their own agentic AI, not just generative LLMs," Nicholson said.