Snowflake on Wednesday unveiled a host of new tools under development designed to help customers build and govern generative AI and machine learning models, including a managed service that provides access to the vendor's large language models.
The features were introduced during Snowday 2023, a virtual event for Snowflake users. Most are in various stages of the preview process and the tools together represent Snowflake's roadmap for the first half of 2024.
Based in Bozeman, Montana, but with no central headquarters, Snowflake is a data cloud vendor whose platform enables users to query and analyze data without many of the extract, transform and load processes that often slow data management and analysis.
For much of the past two years, one of Snowflake's main priorities was developing industry-specific versions of its platform tailored for customers in such industries as financial services, healthcare and telecommunications.
In recent months, however, like most other data management and analytics vendors, Snowflake has made generative AI a focal point of its product roadmap.
In May, the vendor acquired Neeva to add generative AI capabilities. A month later, Snowflake unveiled containerization capabilities aimed at enabling users to access generative AI software, as well as the private preview of its own large language model (LLM), Document AI.
New capabilities introduced at Snowday include tools resulting from Snowflake's acquisition of Neeva, further advancement of Document AI's development and numerous other AI and machine learning (ML) features.
Focus on AI
Snowflake's product development plans center around AI success, developing a strong data foundation and scaling with applications, said Christian Kleinerman, the vendor's senior vice president of product, during a virtual press conference on Oct. 26.
With respect to AI success, one of Snowflake's aims is to integrate AI/ML throughout its data cloud, according to Sridhar Ramaswamy, Neeva's co-founder and now Snowflake's senior vice president of AI. It also aims to enable customers to build their own AI/ML models using their own data.
Toward those ends, Snowflake is developing Cortex, a fully managed service now in private preview that provides users with access to LLMs, including Document AI; AI models; and vector search capabilities. In addition, Cortex provides an AI copilot and enables universal search across different data environments within an organization.
Given its potential to enable the use of generative AI, David Menninger, analyst at Ventana Research, called Cortex the most significant of all the capabilities Snowflake unveiled during its virtual event.
"Like every software vendor, Snowflake is trying to play a more active role in the artificial intelligence and machine learning world," he said. "Cortex provides a foundation for Snowflake customers to use LLMs and generative AI more easily. The new features enable organizations to combine a variety of LLMs, including specialized, domain-specific LLMs."
Doug Henschen, an analyst at Constellation Research, similarly said Cortex will be a significant addition for Snowflake customers once it's made available.
He noted, however, that almost everything Snowflake unveiled during the virtual event is still in the private preview stage. That suggests it could be many months -- perhaps not until the Snowflake Summit conference in June 2024 -- before the capabilities become part of the vendor's public offerings.
"Cortex is entirely in private preview. But we're seeing the vision for how generative AI and more conventional AI/ML will be made available on the Snowflake platform," Henschen said.
Among Cortex's capabilities are the following:
- Specialized Functions, a tool that allows users to access existing LLMs and AI models to accelerate analysis.
- General-Purpose Functions, a set of conversational capabilities that translate natural language text to SQL code so users can "converse" with their data and then contextualize responses with vector search and vector embedding.
- Snowflake Copilot, an LLM-powered assistant that enables natural language query and coding.
- Universal Search, a tool inherited from Neeva that lets users find relevant data across databases and other potentially disparate data storage repositories.
- Document AI, an LLM that helps users extract data from text in documents.
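The vector search that underpins capabilities like General-Purpose Functions boils down to a simple pattern: represent documents and a query as vectors, then rank documents by similarity to the query. The sketch below illustrates that idea in plain Python with toy bag-of-words vectors standing in for real LLM embeddings; it is not Snowflake's Cortex API, and the vocabulary and data are hypothetical.

```python
# Illustrative sketch of vector search for retrieval: embed documents and a
# query, then rank documents by cosine similarity to the query.
# NOT Snowflake's Cortex API; embeddings are toy word counts, not LLM vectors.
import numpy as np

VOCAB = ["revenue", "quarter", "churn", "customers", "forecast"]

def embed(text: str) -> np.ndarray:
    """Toy embedding: word counts over a tiny fixed vocabulary."""
    words = text.lower().split()
    return np.array([words.count(w) for w in VOCAB], dtype=float)

def top_match(query: str, docs: list[str]) -> str:
    """Return the document most similar to the query by cosine similarity."""
    q = embed(query)

    def cosine(doc: str) -> float:
        v = embed(doc)
        denom = np.linalg.norm(q) * np.linalg.norm(v)
        return float(q @ v) / denom if denom else 0.0

    return max(docs, key=cosine)

docs = [
    "revenue grew last quarter across all regions",
    "customer churn rose among small customers",
]
print(top_match("why did churn increase for customers", docs))
```

A production system would use learned embeddings from a language model and an indexed vector store rather than brute-force comparison, but the retrieval step is the same in spirit: the best-matching document is handed to the LLM as context for its answer.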
All are in private preview. But once generally available, Cortex has great potential to expand the use of data and make anyone who works with data more productive in their role, according to Ramaswamy.
"By bringing the core functionality right into Snowflake, we have vastly increased the scope of what every user of Snowflake can do," Ramaswamy said. "This democratizes access to language models without doing a lot of programming."
In particular, natural language processing capabilities such as the Snowflake Copilot and General-Purpose Functions stand to make data workers more productive, he continued.
"Many large companies literally have thousands of SQL analysts that sit all day long and write SQL for a living," he said. "We think we're going to be making them significantly more productive. Just as importantly … [it's going] to make it much easier for new people to get going with SQL and with data."
Governance on the Horizon
While Cortex is a critical part of Snowflake's plan to foster AI success, Horizon is a key part of the vendor's plan to help customers develop a strong foundation for the data that informs AI.
Horizon is Snowflake's new governance layer, unifying the vendor's compliance, security, privacy, interoperability and access capabilities in one environment.
Its introduction is designed to simplify data governance by bringing previously disparate capabilities together. With Horizon, data administrators don't have to navigate numerous different tools within Snowflake to oversee their organization's data -- something Kleinerman called one of the vendor's most significant foundational developments.
"We're simplifying how we structure [governance] and introducing the next generation of technologies," he said. "All of this [enables] customers to have a strong data foundation for AI, generative AI and machine learning."
AI/ML models, including generative AI models, require a data foundation that is both high quality and complete to generate accurate outputs.
Despite their artificial intelligence, the models can't differentiate between accurate and inaccurate data, so a model needs to be trained with accurate data to deliver accurate responses. Also, generative AI models deliver query responses regardless of whether they have the right data to inform those responses. To reduce the frequency of AI hallucinations that result when models don't have the data to correctly respond to a query, the data used to train the model must be complete.
Therefore, beyond bringing previously disparate data governance capabilities together to simplify administrative tasks, Snowflake is also adding new tools aimed at making data governance more comprehensive.
They include the following:
- Data Quality Monitoring that enables users to measure and record data quality metrics to make sure they're using good data to train models and inform decisions.
- Data Lineage UI so customers can see their data lineage and how data used in one part of the analytics process might affect that same data's usage later on.
- Trust Center to centralize cross-cloud security and compliance monitoring in a single location.
- New data classifications to help administrators define sensitive data.
- New privacy policies to protect sensitive data.
- New certifications, including compliance with the United Kingdom's Cyber Essentials Plus, the FBI's Criminal Justice Information Services, StateRAMP High, and the U.S. Department of Defense Impact Level 4 Provisional Authorization on AWS GovCloud.
The new certifications and data classifications are generally available, while Data Quality Monitoring and Data Lineage UI are in private preview. The privacy policies and Trust Center have not yet reached the preview stage.
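The kind of metric a data-quality monitor records can be illustrated with a simple completeness check: for each column, what fraction of rows is missing a value. The snippet below is a minimal sketch of that concept in plain Python with made-up rows; it is not Snowflake's Data Quality Monitoring API.

```python
# Hedged sketch of a data-quality metric: per-column null rate over a table.
# Illustrates the concept only; NOT Snowflake's Data Quality Monitoring API.
rows = [
    {"id": 1, "region": "EMEA", "revenue": 120.0},
    {"id": 2, "region": None,   "revenue": 95.5},
    {"id": 3, "region": "APAC", "revenue": None},
]

def null_rate(rows: list[dict], column: str) -> float:
    """Fraction of rows where the given column is missing."""
    missing = sum(1 for r in rows if r.get(column) is None)
    return missing / len(rows)

# Record one metric per column; a monitor would track these over time
# and alert when a rate crosses a threshold.
metrics = {col: null_rate(rows, col) for col in ("id", "region", "revenue")}
print(metrics)
```

Tracked over time, metrics like these flag incomplete training data before it degrades a model, which is the motivation the article gives for monitoring quality alongside lineage.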
The move to unify data governance capabilities, meanwhile, may be a response to competitors Databricks and Google, which have recently unified some of their own data governance capabilities, according to Henschen.
"Horizon is … partly a response to competitors like Databricks and Google that have put cataloging at the forefront," Henschen said. "Snowflake already had a catalog and multiple governance capabilities. But they're now pulling everything together under the Horizon umbrella and providing a clearer and more comprehensive vision of what's ahead."
In addition to the launch of Horizon, new data foundation capabilities include support for Apache Iceberg tables and a new tool that helps customers manage the cost of using Snowflake.
Support for Iceberg tables, which will soon be in public preview, is aimed at helping customers unite their data within Snowflake's data cloud, both to prevent it from becoming isolated and to enable emerging data architectures such as data mesh.
The vendor's Cost Management Interface, now in private preview, is designed to help users better predict and manage cloud computing costs that can quickly exceed expectations given the amount of time it takes to run certain workloads and the compute power required to run them.
In fact, given that Horizon is largely a repackaging of existing capabilities and the cost of cloud computing is a growing concern, the Cost Management Interface may be more attractive to some Snowflake users than the governance environment, according to Menninger.
"Customers may be more excited about the cost management interface that is being introduced," he said. "Cost management has been a concern with customers getting surprised by their Snowflake bills. These new capabilities will make it easier to track and manage costs and therefore avoid surprises."
Other new capabilities
Snowflake's third product development theme is enabling organizations to scale with applications. As a result, the vendor unveiled new tools for Snowpark, Snowflake's platform for developers.
Snowflake Notebooks provide users with a new programming environment in which Python and SQL coders can work with data, while the Snowpark ML Modeling API simplifies model development in Snowflake with prebuilt frameworks.
In addition, new features in Snowpark ML Operations include a model registry, so users can more easily deploy and manage models in Snowflake, as well as a store, so customers can manage and monetize their models.
All are in various stages of preview.
As far as Snowflake's overall product development roadmap is concerned, Henschen said that the vendor's focus on AI success, developing a data foundation and scaling with applications is appropriate.
However, the fact that most of the tools that make up the roadmap aren't even in the public preview stage suggests that Snowflake's AI/ML development may trail its peers, he continued.
"It's a good set of announcements," Henschen said. "But with so many elements being in private preview, my sense is that both Databricks and Google are ahead of [Snowflake] on having AI, machine learning and generative AI capabilities within the platform and supporting customers who want to develop their own AI, machine learning and generative AI capabilities."
Meanwhile, Henschen and Menninger noted that Snowflake unveils products earlier in the development cycle than many other vendors, which can be misleading.
As a result, both said Snowflake should wait before introducing new features that may not be available until next summer.
"Previews are great for developing robust capabilities but can be a little confusing to customers trying to understand which features are in what stage of development," Menninger said.
Henschen was more direct. "I'd like to see either a shorter lag time between private preview and public preview stages or a shift by Snowflake toward only announcing capabilities that are nearly ready for public preview."
Eric Avidon is a senior news writer for TechTarget Editorial and a journalist with more than 25 years of experience. He covers analytics and data management.