Snowflake targets enterprise AI with launch of Arctic LLM

The data cloud vendor's open source LLM was designed to excel at business-specific tasks, such as generating code and following instructions, to enable enterprise-grade development.

Snowflake on Wednesday launched Arctic, a new open source large language model designed to enable users to build enterprise-grade AI models and applications.

The release of Arctic comes just under a month after Snowflake rival Databricks launched DBRX. Like Arctic, DBRX is an open source large language model (LLM) aimed at helping users more easily use AI to inform business decisions.

The release also comes about two months after CEO Frank Slootman stepped down and Sridhar Ramaswamy was named the data cloud vendor's new leader.

While Slootman joined Snowflake in 2019 to grow the company's business, including taking Snowflake public in September 2020 with a record-setting IPO, Ramaswamy came to Snowflake in May 2023 when the vendor acquired Neeva, a search engine vendor whose tools were fueled by generative AI (GenAI).

Ramaswamy's ascension to CEO of Snowflake seemed to herald a new emphasis on AI, including generative AI. While Databricks was quick to embrace generative AI following OpenAI's launch of ChatGPT in late 2022, developing its first LLM in March 2023 and acquiring MosaicML for $1.3 billion -- along with smaller acquisitions -- to spur AI development, Snowflake was initially less aggressive.

The launch of Arctic now provides evidence of the vendor's commitment to giving users the tools to develop AI models and applications they can use to inform business-specific decisions, according to Doug Henschen, an analyst at Constellation Research.

"It's good to see Snowflake moving quickly under new CEO Sridhar Ramaswamy to catch up in the GenAI race," he said. "It's important for data platforms to cover all bases, including analytics, machine learning, AI and now GenAI. And it's particularly important for Snowflake, which is so associated with data warehousing and analytics, to step up on the AI and GenAI front where Databricks has a lead."

Based in Bozeman, Mont., Snowflake is a data cloud vendor whose platform enables users to query and analyze data.

In addition to Databricks, competitors include tech giants such as AWS, Google Cloud, IBM, Microsoft and Oracle, which also provide customers with a wide array of tools to manage and analyze data, including environments where developers can build AI models and applications.

Snowflake CEO Sridhar Ramaswamy speaks during a virtual press conference to introduce Arctic, the vendor's new open source large language model.

Arctic launch

Better decision-making and improved efficiency are two of the great promises of generative AI in the enterprise.

LLMs have vast vocabularies that previous natural language processing tools lacked. Those vast vocabularies let users interact with data in true natural language rather than code or business-specific language. By reducing the need to know code and lowering data literacy requirements, more employees within organizations can work with data, which leads to more informed decisions.

In addition, LLMs can generate code on their own and be trained to automate processes that previously had to be performed manually, making even trained data experts more productive.

"We have all lived by rigid, frustrating, difficult-to-use interfaces to computers and software for the better part of 50 years," Ramaswamy said on Monday during a virtual press conference. "AI is changing that because, for the first time, we can converse in fluent natural language and have the underlying software understand what we are saying."

LLMs, however, do not have any understanding of a specific business.

Whether proprietary models such as ChatGPT from OpenAI and Google Gemini or open source models such as Arctic and DBRX, LLMs are trained on public data. They can respond to intricate questions about historical events and generate entire treatises about those events on command, but they have no idea whether an individual business' sales are trending up or down.

To understand details about that business -- to be able to accurately respond to queries and enable users to make informed decisions -- models must be trained using that business' proprietary data. That can be done by either fine-tuning an existing LLM with proprietary data or developing a smaller, domain-specific model from the outset.

For those that elect to fine-tune rather than build from scratch, Snowflake has now added Arctic as a choice. In addition, by developing its own LLM, Snowflake is joining a trend that includes Databricks and Google, according to David Menninger, an analyst at ISG's Ventana Research.

"AI depends on data. As a result, all data platform vendors are investing in building out AI capabilities coupled with their data platforms," he said. "Initially, most vendors partnered with the early leaders in the LLM market. ... Now we're starting to see some of the data platform vendors offer their own LLMs."

Beyond simply adding a new LLM to the mix, there is a real advantage to Snowflake customers using a generative AI model built by Snowflake, Menninger continued. By using an LLM that exists within the same secure environment in which they store their data, customers do not have to move their data to an outside entity or import an outside entity into the Snowflake environment.

Integrating data with outside entities, irrespective of security measures, increases the risk of a data breach.

"Information architectures are complex," Menninger said. "Any time you can reduce the number of moving parts through tighter integration, it is a potential improvement for customers. In addition, it is always easier to get support for a single vendor solution than a multivendor solution."

Henschen likewise noted that Arctic provides a security advantage for Snowflake customers.

"Many customers want assurance that their data is not going outside the platform for training purposes," Henschen said. "Small, open source models can be used in much the same secure, platform-native way. But having a 'house brand' option still appeals from a security and cost perspective."

Before introducing Arctic, Snowflake established a partnership with Mistral AI and enabled integrations with other LLMs, such as Llama 2 and Document AI, through Snowflake's Cortex, a fully managed environment for AI development that is still in preview.

The vendor will still enable customers to use those LLMs. And given that some LLMs are better at certain tasks than others, many Snowflake customers might choose to integrate with a generative AI model built by another vendor.

For example, generative AI vendor Reka does well with multimodal AI, said Baris Gultekin, Snowflake's head of product for AI, during the media event.

Arctic was designed to be particularly good at enterprise tasks such as SQL code generation and instruction following, given that it was built to serve the needs of businesses rather than the public, Gultekin continued.

"LLMs in the market do well with world knowledge, but our customers want LLMs to do well with enterprise knowledge," he said.

Performance and productivity

Whenever a new LLM is launched, the performance of that LLM compared with existing models is crucial.

If the new LLM isn't as effective, it won't attract users.

Snowflake did not provide independent documentation. But according to the vendor's own benchmark testing, Arctic was similar to other open source models such as DBRX and Llama 3 70B in generating SQL and other code, following instructions, performing math, and applying common sense and knowledge.

The new model led the others in coding and was competitive in each of the other categories, according to Snowflake.

Independent testing, however, would be more instructive, according to Henschen. So would early customer success stories that demonstrate the quality of Arctic's performance when applied to real-world scenarios.

Menninger, meanwhile, noted that each new model that is introduced seems to outperform the last.

"Vendors are playing leapfrog right now, which is great for customers," he said. "We are seeing improvements in accuracy, cost of training and cost of inferencing."

Perhaps just as important as performance benchmarks is that Arctic was built with a mixture-of-experts architecture to enable the performance efficiency that can lead to cost savings.

Cloud computing has led to a surge in computing costs in recent years. As a result, performance efficiency has become a critical means of helping enterprises control spending. Now AI development is causing a similar surge in spending.

A massive amount of data is needed to train AI models. Without enough data, AI models are prone to inaccurate results. In the case of generative AI models, which are already susceptible to hallucinations that can mislead users, those false results can do significant harm both to an organization's balance sheet and to its reputation.

With Arctic, Snowflake is attempting to provide an open source generative AI model that not only matches the performance of other LLMs but also does so less expensively.

According to Gultekin, Arctic can perform model training tasks while activating a low number of parameters.

"The high training efficiency of Arctic means [users] can train custom models in a much more affordable way," he said.
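
As a rough illustration of how a mixture-of-experts design can keep the number of active parameters low, the sketch below routes a single input to only the top-scoring "experts" and skips the rest. It is a toy example, not Arctic's actual implementation: the expert functions, the router scores, the choice of two active experts and every function name are hypothetical assumptions made for illustration.

```python
import math

def softmax(scores):
    """Convert raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_scores, k):
    """Pick the k highest-scoring experts and weight them by softmax.

    Returns a list of (expert_index, weight) pairs. Ties keep the
    lower index first because Python's sort is stable.
    """
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_scores[i] for i in chosen])
    return list(zip(chosen, weights))

def moe_forward(x, experts, router_scores, k):
    """Run only the selected experts and blend their outputs."""
    routes = top_k_route(router_scores, k)
    output = sum(weight * experts[idx](x) for idx, weight in routes)
    return output, routes

def active_fraction(num_experts, k):
    """Share of expert parameters used per input (equal-size experts assumed)."""
    return k / num_experts

# Hypothetical setup: four tiny "experts", each just a scaling function.
experts = [lambda x, a=a: a * x for a in (1.0, 2.0, 3.0, 4.0)]
scores = [0.0, 3.0, 1.0, 3.0]  # assumed router scores for one input

output, routes = moe_forward(2.0, experts, scores, k=2)
print(routes, output, active_fraction(len(experts), 2))
```

In this toy setup only two of the four experts run for the input, so half of the expert parameters sit idle on that step. That is the general idea behind the efficiency claim, though the real savings depend on details Snowflake's description here does not cover.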

No matter how well Arctic performs compared to other open source models and how cost effective it might be, the perception remains that Snowflake trails its closest rival in enabling customers to build AI applications.

Snowflake has been a more aggressive developer of tools aimed at helping customers create AI applications since Slootman stepped down and Ramaswamy became CEO. But even with the launch of Arctic, Snowflake will not be providing a complete environment for AI development until Cortex is made generally available, according to Henschen.

Cortex, first unveiled in November 2023, is a managed service that includes access to LLMs, vector search to help customers discover the data needed to train AI models, universal search across an organization's data environments and an AI assistant.

The suite is scheduled to be generally available in less than two weeks, according to Ramaswamy. However, software is not officially backed by a vendor for production use until it's generally available. Without that backing, many customers are unwilling to use software, Henschen noted.

"It's important to note that Snowflake Cortex … is still in private preview at this point," he said.

Snowflake, however, is moving in the right direction with respect to AI, according to Menninger.

In about eight weeks under Ramaswamy's leadership, Snowflake has partnered with Mistral AI, introduced data clean rooms to keep sensitive data private and developed its first LLM. Such moves, along with the pending general availability of Cortex, demonstrate that Snowflake understands that enabling AI development is critical to retaining customers and attracting new ones.

"Snowflake was already committed to AI and GenAI," Menninger said. "However, these new announcements suggest they realize that AI and GenAI [represent] a control point within their accounts, and they don't want to give up that control point."

Next steps

With Arctic now developed and launched, Snowflake would be wise to make AI governance a priority, according to Menninger.

Just as data needs to be governed both to protect organizations from violating regulations and to enable employees to work with data confidently, AI now needs the same guidelines within organizations.

Without proper governance of AI, organizations will have to limit its use to make sure they don't accidentally violate quickly changing AI regulations. As a result, they won't get AI's full benefits.

"AI governance will continue to be an issue for all enterprises and therefore software vendors," Menninger said. "We expect that through 2026, model governance will remain a significant concern for more than one half of enterprises, limiting the deployment and, therefore, the realized value of AI and machine learning models."

Henschen, meanwhile, said that with Arctic now available to compete with DBRX and other open source LLMs, the focus turns to Cortex and seeing more evidence of Arctic's actual performance.

As Snowflake begins to deliver on its promise to provide users with the tools to develop AI applications, it needs to do more to back its AI environment so that customers feel secure building AI models and applications with those capabilities.

"It's still early days for Snowflake's AI, machine learning and GenAI journey," Henschen said. "As a customer, I'd want the assurance of the general availability of Cortex and proof points and knowledge of indemnification and risks around using Arctic."

Eric Avidon is a senior news writer for TechTarget Editorial and a journalist with more than 25 years of experience. He covers analytics and data management.
