Meta releases two Llama 3 models, more to come

The social media giant's new open source LLM reflects the challenges in the open source market and the company's future ambitions involving multimodal and multilingual capabilities.

Esther Shittu, News Writer

Published: 18 Apr 2024

Meta on Thursday introduced the next generation of its widely used open source LLM.

The social media giant revealed that the first two models in the Llama 3 family, the 8B- and 70B-parameter models, are now available for broad use. Llama 3 models with more than 400B parameters are still in training, but the vendor said it would release new models in the coming months.

Compared with Llama 2, the 8B and 70B models of Llama 3 have lower false refusal rates (a false refusal is when an LLM rejects a legitimate prompt); improved alignment, which is embedding human values and goals in an LLM; and more diversity in model responses, according to Meta.

Although the models, which succeed models in the Llama 2 family released last year, are text-based, Meta plans to make Llama 3 multilingual and multimodal with a bigger context window in the future.

Open source LLMs and challenges

Meta's release of the two Llama 3 models comes as more open source models enter the generative AI market.

In March, Databricks introduced its open source model DBRX and claimed the model is faster than the Llama 2 70B model.

French startup Mistral introduced its mixture of experts model Mixtral 8x22B in early April. Meanwhile, the Allen Institute for AI (AI2) recently updated its 7B parameter open language model OLMo 1.7-7B.

"There are a lot more good open source options," independent AI analyst Mark Beccue said.

While Meta has been leading the open source market, its latest model also highlights some of the challenges of open source LLMs.

Although Meta released the open source code for Llama 3, it did not disclose the data it used to train the model.

The social media giant revealed that Llama 3 is trained on more than 15 trillion tokens from publicly available data sources but did not say what those data sources are.

This approach is different from other vendors, such as AI2, which released the dataset it used for OLMo 1.7-7B.

IBM also released the data sources it used to train its proprietary LLM Granite 13B.

"One of the things that I think is important for open source is exposing or explaining your data," Beccue said. "The datasets that these models are trained on -- when you're able to tell what that source is, then you could trace it in a way. That's much more open source."

Knowing the datasets and recipes, or ways a model is created, is essential when evaluating the capabilities of the different models, said Luca Soldaini, senior applied research scientist at AI2.

For example, if vendors claim a model can pass the MCAT medical school entrance exam at a higher rate than a percentage of the population, it's important to know whether the model saw the MCAT answers during training. If it did, the score reflects memorization rather than ability, which amounts to cheating.
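A contamination check of this kind can be sketched in a few lines of Python. This is a minimal illustration, assuming access to both the benchmark questions and the training text, which is exactly what undisclosed datasets prevent. The function names `ngrams` and `contamination_rate`, the whitespace tokenization and the 8-word window are hypothetical choices made for this example, not a method described by AI2 or Meta.

```python
def ngrams(text, n):
    """Return the set of lowercase word n-grams in a piece of text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contamination_rate(benchmark_items, training_docs, n=8):
    """Flag benchmark items that share at least one n-gram with the training text.

    Returns (fraction of items flagged, list of flagged items).
    The window size n=8 is an assumption; real audits tune it.
    """
    train_grams = set()
    for doc in training_docs:
        train_grams |= ngrams(doc, n)

    flagged = [item for item in benchmark_items if ngrams(item, n) & train_grams]
    rate = len(flagged) / len(benchmark_items) if benchmark_items else 0.0
    return rate, flagged


if __name__ == "__main__":
    # Made-up data: the first question appears verbatim in the training text.
    training_docs = [
        "study guide: which enzyme unwinds the dna double helix during replication answer helicase"
    ]
    benchmark_items = [
        "Which enzyme unwinds the DNA double helix during replication",
        "What is the primary function of the loop of Henle in the kidney",
    ]
    rate, flagged = contamination_rate(benchmark_items, training_docs)
    print(rate, flagged)
```

Without the training data, an outside evaluator cannot run even a simple overlap test like this one, which is why dataset disclosure matters for benchmark claims.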

"To do proper science and discuss the opportunities of this model as well as the risk, we need to have more transparency of everything that goes into these models," Soldaini said.

The right partners

Despite the lack of uniformity in the open source community about what makes a model truly open source, Meta's release of Llama 3 shows the emphasis open source providers are placing on partnering with the right vendors or platform providers.

Llama 3 is currently available on Amazon SageMaker. Meta said Llama 3 will soon be available on a variety of platforms, including Databricks, Google Cloud, Dell, Intel, Nvidia, Hugging Face, Microsoft Azure and IBM Watsonx.

The partnerships also show the cross-pollination and cross-investment between the different model providers, Forrester Research analyst Rowan Curran said.

The multi-model ecosystem also makes it feasible for enterprises to use open source models such as Meta Llama 3, he added.

"When it comes to getting these in the hands of folks who want to build things that can be put into production, those partnerships really accelerate that part of the application development process," he said.

Enterprises will rely more on these curated sets of models provided through model gardens or libraries to know what is safe to use, Curran added.

"Llama 3 being released through these partners helps give more assurance to enterprises who might want to build these things that there is a certain level of vetting that's going on here," he said.

An ambitious future

Enterprises interested in Llama 3 can see that with the release of these two models, the social media giant has ambitious plans, Gartner Research analyst Arun Chandrasekaran said.

However, the new Llama models appear to some observers to have smaller context windows compared to OpenAI and Google Gemini models. Meta did not specify the size of context windows.

But the vendor said the models use a tokenizer with a vocabulary of 128,000 tokens that encodes language better than previous Meta models.
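A toy example can illustrate why a larger vocabulary tends to encode the same text in fewer tokens. The sketch below is not Llama 3's tokenizer. It assumes a simple greedy longest-match scheme, and the function `greedy_tokenize` and both vocabularies are made up for illustration.

```python
def greedy_tokenize(text, vocab):
    """Split text by repeatedly taking the longest vocabulary entry that matches.

    Falls back to a single character when nothing in the vocabulary matches.
    Toy scheme for illustration only; production tokenizers work differently.
    """
    max_len = max(len(entry) for entry in vocab)
    tokens = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if length == 1 or piece in vocab:
                tokens.append(piece)
                i += length
                break
    return tokens


if __name__ == "__main__":
    small_vocab = {"th", "mo"}                           # hypothetical small vocabulary
    large_vocab = small_vocab | {"the", " model"}        # hypothetical larger vocabulary

    text = "the model"
    print(greedy_tokenize(text, small_vocab))  # more, shorter pieces
    print(greedy_tokenize(text, large_vocab))  # fewer, longer pieces
```

Fewer tokens for the same text means more content fits in a fixed context window, which is one reason vocabulary size matters alongside context length.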

Meta also said it would release stronger overall capabilities for Llama 3 models, including multimodality for the soon-to-be-released 400B-plus parameter models and multilingual models.

That shows Meta's commitment to investing resources to keep its models open and to develop top-performing models. It also indicates that Meta wants to maintain its lead position in open source generative AI, Chandrasekaran said.

"Llama 2 was a big pivotal moment in enterprise open source, and it looks like Meta wants to capitalize and continue on that trend," he said.

A larger context window will help the models handle bigger volumes of data for analysis and summarization, Futurum Group analyst Paul Nashawaty said.

Meta's approach of releasing smaller models now and larger ones later indicates a sense of urgency and the company's need to maintain mindshare in the market, Chandrasekaran said.

"The cadence or the pace of development is becoming very important in this space, which is why you're probably seeing a staggered launch from Meta," he added.

Releasing both small and large Llama 3 models also makes sense for enterprises that need different options depending on the use case, Nashawaty said.

"Having access to the relevant information to make informed decisions provides their end users with the data needed to fulfill the requests," he said.

Meta also has incorporated Llama 3 into its AI assistant, Meta AI.

Meta AI is available on Facebook, Instagram, WhatsApp, Messenger and the web.

Esther Ajao is a TechTarget Editorial news writer and podcast host covering artificial intelligence software and systems.
