Date: 2024-04-18

Meta on Thursday introduced the next generation of its widely used open source large language model.

The social media giant revealed that the first two models in the Llama 3 family (the 8B- and 70B-parameter models) are now available for broad use. Llama 3 models with more than 400B parameters are still in training, but the vendor said it would release new models in the coming months.

The 8B and 70B models of Llama 3 have reduced false refusal rates (a false refusal is when an LLM rejects a legitimate prompt), improved alignment (embedding human values and goals in an LLM), and more diversity in model responses compared to Llama 2, according to Meta.

Although the models -- which succeed models in the Llama 2 family released last year -- are text-based, Meta plans to make Llama 3 multilingual and multimodal, with a bigger context window in the future.

Open source LLMs and challenges

Meta's release of the two Llama 3 models comes as more open source models enter the generative AI market. In March, Databricks introduced its open source model DBRX and claimed the model is faster than the Llama 2 70B model. French startup Mistral introduced its mixture of experts model Mixtral 8x22B in early April. Meanwhile, the Allen Institute for AI (AI2) recently updated its 7B parameter open language model OLMo 1.7-7B.

"There are a lot more good open source options," independent AI analyst Mark Beccue said.

While Meta has been leading the open source market, its latest model also highlights some of the challenges of open source LLMs. Meta released the open source code for Llama 3 but did not disclose the data it used to train the model. The social media giant revealed that Llama 3 is trained on more than 15 trillion tokens from publicly available data sources but did not say what those data sources are.

This approach is different from other vendors, like AI2, which released the dataset it used for OLMo 1.7-7B. IBM also released the data sources it used to train its proprietary LLM Granite 13B.

"One of the things that I think is important for open source is exposing or explaining your data," Beccue said. "The datasets that these models are trained on, when you're able to tell what that source is, then you could trace it in a way. That's much more open source."

Knowing the datasets and recipes, or ways a model is created, is essential when evaluating the capabilities of the different models, said Luca Soldaini, senior applied research scientist at AI2. For example, if vendors claim a model can pass the MCAT medical school entrance exam at a higher rate than a percentage of the population, it's important to know whether the model has seen the answers for the MCAT during training. That shows whether the model is cheating.

"To do proper science and discuss the opportunities of this model as well as the risk, we need to have more transparency of everything that goes into these models," Soldaini said.

The right partners

Despite the lack of uniformity in the open source community about what makes a model truly open source, Meta's release of Llama 3 shows the emphasis open source providers are placing on partnering with the right vendors or platform providers. Meta said Llama 3 will soon be available on a variety of platforms, including AWS, Databricks, Google Cloud, Dell, Intel, Nvidia, Hugging Face, Microsoft Azure and IBM WatsonX.

The partnerships also show the cross-pollination and cross-investment between the different model providers, Forrester Research analyst Rowan Curran said. The multi-model ecosystem also makes it feasible to use open source models such as Meta Llama 3 in an enterprise context, Curran added.

"When it comes to getting these in the hands of folks who want to build things that can be put into production, those partnerships really accelerate that part of the application development process," he said.

Enterprises will rely more on these curated sets of models provided through model gardens or libraries to know what is safe to use, Curran added.

"Llama 3 being released through these partners helps give more assurance to enterprises who might want to build these things that there is a certain level of vetting that's going on here," he said.