Small language models an emerging GenAI force

Enterprises are unwilling to pay for large language models to accomplish simple business tasks with generative AI. They're looking at cheaper small language models.

By Antone Gonsalves, Editor at Large

Published: 15 Dec 2023

The expense of using large language models on cloud providers is driving interest in models a fraction of the size to utilize generative AI in business.

The LLMs powering GenAI services on AWS, Google Cloud and Microsoft Azure are capable of many tasks, ranging from writing programming code and predicting the 3D structure of proteins to answering questions on nearly every imaginable topic.

The breadth of the capabilities is awe-inspiring, but taming such massive AI models with hundreds of billions of parameters is expensive. Enterprises are asking whether training a small language model (SLM) to power, for example, a customer service chatbot is more cost-effective.

"Our favorite customer quote is that generalized intelligence might be great, but I don't need my point-of-sale system to recite French poetry," said Devvret Rishi, CEO of startup Predibase, during a presentation this week at The Linux Foundation's AI.dev Summit in San Jose, Calif. Predibase provides software tools for training SLMs.

Over the last several months, Gartner has noticed an increase in the number of enterprise clients evaluating SLMs to reduce the expense of inference -- the process of running a trained GenAI model to produce useful responses to natural language questions.

"We have started to see customers come to us and tell us that they are running these enormously powerful, large models, and the inferencing cost is just too high for trying to do something very simple," Gartner analyst Arun Chandrasekaran said.

As an alternative, enterprises are exploring models with 500 million to 20 billion parameters, Chandrasekaran said.

"That's kind of the sweet spot," he said. "Those models are starting to gain traction, primarily on the back of their price performance."

SLMs for small jobs

SLMs can't match the breadth of tasks performed by LLMs such as Cohere's models, Anthropic's Claude and OpenAI's GPT-4, which are available on AWS, Google Cloud and Azure. However, SLMs trained on data for specific tasks, such as content generation from a specified knowledge base, show potential as a significantly less expensive alternative.

"Small models have limited model capacity. But if we concentrate their capacity on a specific target task, the model can achieve a decent improved performance," according to a paper from researchers at the University of Edinburgh in the United Kingdom and the Allen Institute for AI in Seattle.

In January, the consultancy Sourced Group, an Amdocs company, will help a few telecoms and financial services firms take advantage of GenAI using an open source SLM, lead AI consultant Farshad Ghodsian said. Initial projects include leveraging natural language to retrieve information from private internal documents.

Ghodsian experimented with FLAN-T5, an open source natural language model developed by Google and available on Hugging Face, to learn about SLMs. Ghodsian tested FLAN-T5's 248 million-parameter version.

"When you add resource document generation, it gives you way better results than using [LLMs], and it's a lot easier to run," he said. "You can even run it on a CPU. That's a big benefit."
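The claim about running on a CPU can be checked with a back-of-the-envelope estimate of the memory a model's weights occupy: the parameter count multiplied by the bytes stored per parameter. The sketch below is illustrative only. The function name and the choice of numeric formats are assumptions rather than anything from Ghodsian's setup, and the estimate ignores activations and runtime overhead.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumption: only the weights are counted; activations, caches and
# framework overhead are ignored, so real usage will be higher.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params: int, dtype: str = "fp32") -> float:
    """Return approximate weight storage in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    sizes = [
        ("248M (FLAN-T5 version tested)", 248_000_000),
        ("20B (top of the range Gartner cites)", 20_000_000_000),
    ]
    for label, params in sizes:
        for dtype in BYTES_PER_PARAM:
            gb = weight_memory_gb(params, dtype)
            print(f"{label:38s} {dtype}: {gb:7.2f} GB")
```

Under these assumptions, a 248 million-parameter model needs roughly 1 GB for its weights at 32-bit precision, which fits in the memory of an ordinary server or laptop, while a 20 billion-parameter model needs about 40 GB even at 16-bit precision.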

Ghodsian used fine-tuning with retrieval-augmented generation (RAG) to attain quality responses. RAG is a technique for retrieving information from a knowledge source and incorporating it into the text the model generates.

"You get a really good answer from [FLAN-T5]," Ghodsian said. "Really good."
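The retrieval half of RAG can be sketched in a few lines. The example below is a toy, not Ghodsian's implementation: the documents are invented, the helper names are hypothetical, and simple word-overlap scoring stands in for the embedding models that production systems typically use. It shows only how a retrieved passage is placed into the prompt that would then be sent to a model such as FLAN-T5.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(question: str, docs: list[str], k: int = 1) -> str:
    """Place the retrieved passages ahead of the question in the prompt."""
    context = "\n".join(retrieve(question, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Invented stand-ins for private internal documents.
DOCS = [
    "Refunds are issued within 14 days of a returned purchase.",
    "Support is available by phone on weekdays from 9 to 5.",
    "Store credit never expires.",
]

if __name__ == "__main__":
    print(build_prompt("How many days do refunds take?", DOCS))
```

Because the model sees the relevant passage in its prompt, it does not need to have memorized the answer, which is one reason a small model can perform well on questions about a specific knowledge base.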

The potential of SLMs has attracted mainstream enterprise vendors like Microsoft. Last month, the company's researchers introduced Phi-2, a 2.7-billion-parameter SLM that outperformed the 13-billion-parameter version of Meta's Llama 2, according to Microsoft. The company has released Phi-2 for research only.

SLM strengths, weaknesses

Providers of open source SLMs tout access to the models' inner workings as a crucial enterprise feature.

For example, users can access the parameters, or weights, that reveal how the models forge their responses. The inaccessible weights used by proprietary models concern enterprises fearful of discriminatory biases.

Another critical concern is data governance. Many organizations are worried about data leaks when fine-tuning a cloud-based LLM with sensitive information.

Open source technology also has its critics. In June, supply chain security company Rezilion reported that 50 of the most popular open source GenAI projects on GitHub had an average security score of 4.6 out of 10. Weaknesses found in the technology could lead to attackers bypassing access controls and compromising sensitive information or intellectual property, Rezilion wrote in a blog post.

Promising SLMs named by Chandrasekaran included Meta's Llama 2, the Technology Innovation Institute's Falcon, and Mistral AI's Mistral 7B and Mixtral 8x7B.

Mixtral 8x7B, which is in beta, has nearly 47 billion parameters but processes input and generates output at the speed and cost of a 13-billion-parameter model, according to Mistral. The French startup raised $415 million in funding this month, valuing the company at $2 billion.

Mistral's models and Falcon are commercially available under the Apache 2.0 license. Having a for-business certification is critical, Chandrasekaran said.

"We're starting to see more and more of these open source models being certified for commercial use, which is a pretty big deal for a lot of enterprises," he said.

Open source model providers have an opportunity next year as enterprises move from the learning stage to the actual deployment of GenAI.

"They're still deciding, but they're ready to jump as soon as January hits," Ghodsian said. "They've got new budgets and want to start implementing or at least do some [proofs of concept]."

Antone Gonsalves is an editor at large for TechTarget Editorial, reporting on industry trends critical to enterprise tech buyers. He has worked in tech journalism for 25 years and is based in San Francisco.
