Small language models an emerging GenAI force

Enterprises are unwilling to pay for large language models to accomplish simple business tasks with generative AI. They're looking at cheaper small language models.

The expense of running large language models on cloud platforms is driving business interest in generative AI models a fraction of the size.

The LLMs powering GenAI services on AWS, Google Cloud and Microsoft Azure are capable of many tasks, ranging from writing programming code and predicting the 3D structure of proteins to answering questions on nearly every imaginable topic.

The breadth of the capabilities is awe-inspiring, but taming such massive AI models with hundreds of billions of parameters is expensive. Enterprises are asking whether training a small language model (SLM) to power, for example, a customer service chatbot is more cost-effective.

"Our favorite customer quote is that generalized intelligence might be great, but I don't need my point-of-sale system to recite French poetry," said Devvret Rishi, CEO of startup Predibase, during a presentation this week at The Linux Foundation's AI.dev Summit in San Jose, Calif. Predibase provides software tools for training SLMs.

Over the last several months, Gartner has noticed an increase in the number of enterprise clients evaluating SLMs to reduce the expense of inference -- the process of running a trained GenAI model to produce responses to natural language questions.

"We have started to see customers come to us and tell us that they are running these enormously powerful, large models, and the inferencing cost is just too high for trying to do something very simple," Gartner analyst Arun Chandrasekaran said.

As an alternative, enterprises are exploring models with 500 million to 20 billion parameters, Chandrasekaran said.

"That's kind of the sweet spot," he said. "Those models are starting to gain traction, primarily on the back of their price performance."

SLMs for small jobs

SLMs can't match the breadth of tasks performed by LLMs such as Cohere's models, Anthropic's Claude and OpenAI's GPT-4 on AWS, Google Cloud and Azure. However, SLMs trained on data for specific tasks, such as content generation from a specified knowledge base, show potential as a significantly less expensive alternative.

"Small models have limited model capacity. But if we concentrate their capacity on a specific target task, the model can achieve a decent improved performance," according to a paper from researchers at the University of Edinburgh in the United Kingdom and the Allen Institute for AI in Seattle.

In January, the consultancy Sourced Group, an Amdocs company, will help a few telecoms and financial services firms take advantage of GenAI using an open source SLM, lead AI consultant Farshad Ghodsian said. Initial projects include leveraging natural language to retrieve information from private internal documents.

Ghodsian experimented with FLAN-T5, an open source natural language model developed by Google and available on Hugging Face, to learn about SLMs. Ghodsian tested FLAN-T5's 248 million-parameter version.

"When you add resource document generation, it gives you way better results than using [LLMs], and it's a lot easier to run," he said. "You can even run it on a CPU. That's a big benefit."

Ghodsian used fine-tuning with retrieval augmented generation (RAG) to attain quality responses. RAG is an AI technique for retrieving information from a knowledge source and incorporating it into generated text.
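
The retrieval half of RAG can be illustrated with a minimal Python sketch. It is not Ghodsian's implementation. The sample documents, the function names and the word-overlap scoring are all assumptions made for illustration, and production systems typically use learned embeddings and a vector database instead. The sketch ranks a few stand-in documents against a question and places the best match into a prompt that would then be passed to a model such as FLAN-T5.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base standing in for private internal documents.
DOCUMENTS = [
    "Refunds are issued within 14 days of a returned purchase.",
    "Support is available by phone on weekdays from 9 a.m. to 5 p.m.",
    "Enterprise plans include single sign-on and audit logging.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, top_k=1):
    """Return the top_k documents most similar to the question."""
    query = Counter(tokenize(question))
    ranked = sorted(
        documents,
        key=lambda doc: cosine(query, Counter(tokenize(doc))),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, documents, top_k=1):
    """Combine the retrieved text and the question into one model prompt."""
    context = "\n".join(retrieve(question, documents, top_k))
    return (
        "Answer using only this context:\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    print(build_prompt("How many days until refunds are issued?", DOCUMENTS))
```

Because the model sees only the retrieved passage plus the question, a small model does not need to have memorized the knowledge base, which is one reason the approach suits SLMs.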

"You get a really good answer from [FLAN-T5]," Ghodsian said. "Really good."

The potential of SLMs has attracted mainstream enterprise vendors like Microsoft. Last month, the company's researchers introduced Phi-2, a 2.7-billion-parameter SLM that outperformed the 13-billion-parameter version of Meta's Llama 2, according to Microsoft. The company has released Phi-2 for research only.

SLM strengths, weaknesses

Providers of open source SLMs tout access to the models' inner workings as a crucial enterprise feature.

For example, users can access the parameters, or weights, that reveal how the models forge their responses. The inaccessible weights used by proprietary models concern enterprises fearful of discriminatory biases.

Another critical concern is data governance. Many organizations are worried about data leaks when fine-tuning a cloud-based LLM with sensitive information.

Open source technology also has its critics. In June, supply chain security company Rezilion reported that 50 of the most popular open source GenAI projects on GitHub had an average security score of 4.6 out of 10. Weaknesses found in the technology could lead to attackers bypassing access controls and compromising sensitive information or intellectual property, Rezilion wrote in a blog post.

Promising SLMs named by Chandrasekaran included Meta's Llama 2, the Technology Innovation Institute's Falcon, and Mistral AI's Mistral 7B and Mixtral 8x7B.

Mixtral 8x7B, which is in beta, has nearly 47 billion parameters but processes input and generates output at the speed and cost of a 13-billion-parameter model, according to Mistral. The French startup raised $415 million in funding this month, valuing the company at $2 billion.

Mistral's models and Falcon are commercially available under the Apache 2.0 license. Having a for-business certification is critical, Chandrasekaran said.

"We're starting to see more and more of these open source models being certified for commercial use, which is a pretty big deal for a lot of enterprises," he said.

Open source model providers have an opportunity next year as enterprises move from the learning stage to the actual deployment of GenAI.

"They're still deciding, but they're ready to jump as soon as January hits," Ghodsian said. "They've got new budgets and want to start implementing or at least do some [proofs of concept]."

Antone Gonsalves is an editor at large for TechTarget Editorial, reporting on industry trends critical to enterprise tech buyers. He has worked in tech journalism for 25 years and is based in San Francisco.
