Databricks is making AI -- specifically generative AI -- a main part of its platform.
To reflect that shift, the vendor has unveiled the Data Intelligence Platform. The new offering combines the capabilities of Databricks' Data Lakehouse Platform, which had been the vendor's flagship offering, with generative AI capabilities, including those that Databricks inherited from its $1.3 billion acquisition of MosaicML in June 2023.
Databricks introduced the Data Intelligence Platform, now in private beta testing, in a blog post authored by co-founder and CEO Ali Ghodsi and seven others and published on Nov. 15. The other authors included co-founders Arsalan Tavakoli-Shiraji, Patrick Wendell, Reynold Xin and Matei Zaharia.
The vendor declined to give a timetable for general availability and said pricing details for the new platform won't be revealed until it is ready for use by all customers.
Once it is released, the Data Intelligence Platform will represent a logical evolution of Databricks' capabilities, according to Kevin Petrie, an analyst at Eckerson Group.
"Data management vendors across the ecosystem are integrating generative AI into their tools so that data teams can prepare and deliver data more easily to all analytical projects," he said. "We've seen this trend among BI tools, catalogs and data pipeline tools, among other segments. So Databricks is wise to integrate generative AI across its data stack as well."
Compared with its peers such as Snowflake and tech giants including AWS and Microsoft, Databricks is in a strong position to infuse its existing capabilities with generative AI as well as enable customers looking to use tools from other vendors that best serve their needs, Petrie continued.
"Databricks' acquisition of MosaicML, coupled with its heritage in data science, gives Databricks a home-field advantage," he said. "There will, of course, remain a market for platform-agnostic tools … because most companies manage their data on more than one platform. And Databricks continues to support this broader ecosystem of tools."
Based in San Francisco, Databricks was one of the pioneers of the data lakehouse, a data storage repository that combines the structured storage capabilities of data warehouses with the unstructured storage capabilities of data lakes.
The lakehouse format enables organizations to store all their data in one place so it can more easily be queried and analyzed.
In the years since Databricks initially launched its lakehouse, the vendor added a spate of industry-specific versions of its platform and critical capabilities such as a data catalog and machine learning model training.
But lakehouses still have their shortcomings, the authors noted in the blog post. Among them, lakehouses require significant technical skills that make the technology inaccessible to many employees, and relevant data is often difficult to find because of the massive quantity of data many organizations now collect.
To address those problems, Databricks is developing the Data Intelligence Platform, combining its longstanding lakehouse capabilities with AI.
While analytics has become a critical decision-making tool, helping organizations make informed decisions based on data, analytics has also largely remained the domain of data experts.
One of the goals of many analytics vendors is to make their tools -- and by extension, data -- usable by any worker who can benefit from data in their role. But despite their intentions, the use of BI tools has been stagnant for decades.
The simple reason is that even as vendors have developed low-code/no-code capabilities and added natural language processing (NLP) tools to their platforms, the platforms themselves remain complex. They take training even for modest self-service use, and any deep analysis takes the expertise of a data scientist or trained analyst.
The same is true for data management, whether curating data, integrating and governing it, or undertaking some other task that prepares data for analysis.
Generative AI has the potential to help data management and analytics vendors reduce or eliminate the complexity that has hindered use of their platforms.
Generative AI large language models (LLMs) boast extensive vocabularies and text-to-code translation capabilities that enable users to query and model data using conversational language rather than the business-specific language required by previous NLP tools. That not only enables more access to the tools but also helps experts work more efficiently by eliminating time-consuming tasks.
In addition, generative AI can be trained to understand an organization's data, which leads to improved data discovery and ultimately better-informed decisions.
As a result, Databricks is making generative AI key to its new platform.
The vendor defines data intelligence as the deployment of AI models to understand the semantics of enterprise data. Databricks' Data Intelligence Platform, therefore, is the combination of that enterprise data stored in its lakehouse with AI to understand that data.
According to the authors of the blog post, the most challenging problems data management and analytics platforms have historically faced include the following:
- A technical skills deficit, given that querying data requires knowledge of coding languages such as SQL and Python.
- Lack of data accuracy and difficulty discovering the most relevant data to inform a given model, report or dashboard, which leads to time-consuming planning, curation and preparation.
- Spiraling costs when not managed by highly trained technical experts.
- Difficulty keeping up with changing governance and privacy regulations.
- Separation of data storage environments and model training environments, which will become a greater issue as more organizations desire LLMs trained on their own data that answer domain-specific questions.
While it won't be known how well the Data Intelligence Platform solves those problems until it's available to customers, the combination of Databricks' lakehouse with generative AI makes sense as a way to overcome barriers that have made data management and analytics difficult, according to Donald Farmer, founder and principal of TreeHive Strategy.
"This is a good effort at addressing the existing challenges in data management that they identify themselves," he said. "Using AI to curate, catalog and automate this is a promising step."
Data intelligence, in general, includes natural language interactions, semantic cataloguing and discovery, automated usage optimization, support for AI workloads and automatically updated governance and privacy, according to Databricks.
Databricks already provided a unified governance layer and a unified query engine that includes machine learning, business intelligence and extract, transform and load capabilities.
Now, beyond natural language query and modeling, the vendor is adding a data intelligence engine called DatabricksIQ that includes MosaicML's generative AI model generation tools. DatabricksIQ integrates directly with Mosaic AI, which is Databricks' AI platform, to provide capabilities that combine data with AI systems, including the following:
- Retrieval-augmented generation (RAG) through the Databricks Vector Database to find the right data for building customized models.
- Custom model-training on an organization's data or retraining existing models to improve AI applications with domain-specific understanding.
- Connection with Databricks' Unity Catalog data governance and data quality monitoring features.
- End-to-end MLOps with all data that gets produced automatically monitored in the Databricks lakehouse.
Among the capabilities included in the Data Intelligence Platform, those that promote data management and data integration such as RAG and connection with the Unity Catalog stand out, according to Farmer.
"Compared to other vendors, I like the emphasis on the data management and data integration tasks where others are implementing more front-end capabilities like search, query and natural language explanation," he said. "This is difficult work that often does not get the attention of other, more visible efforts by other vendors."
The result will be that data experts will become more productive while new users will be able to use data management tools more quickly than they could with Databricks' current Lakehouse Platform, Farmer continued.
"Experts will find a lot of common tasks are now largely automated and efficient," he said. "Entry-level users will become more productive more quickly."
Petrie similarly highlighted improved efficiency as a benefit of the Data Intelligence Platform, particularly as a result of natural language processing.
"This reflects the symbiotic relationship of AI -- especially generative AI -- and data management," he said. "AI simplifies data management with a natural language interface and task automation. And data management supports AI by feeding the necessary inputs to models and algorithms. On both counts, innovations like this help companies improve productivity and innovate."
The road ahead
With the Data Intelligence Platform in beta testing and Databricks demonstrating how it is combining MosaicML's AI features with its own pre-existing capabilities, data governance needs to be a key part of the platform's roadmap, according to Petrie.
Data quality is imperative to analytics. Without good data, results can get skewed and lead to misinformed decisions.
The same is true for training AI models, particularly generative AI models that can deliver responses that seem logical but are actually incorrect if not trained on enough data or completely accurate data.
"The major challenge facing all companies, especially AI adopters, is data governance," Petrie said. "Many generative AI projects depend on text, which historically is not well governed. I'll be interested to see how Databricks helps companies improve their governance of text and other types of multi-structured data."
Farmer, meanwhile, noted that Databricks has room to improve the Data Intelligence Platform in various ways.
Among them are integration with other platforms and tight data privacy and security controls.
"What next … could be a long list," Farmer said. "Further development of AI models for more sophisticated analysis and predictive analytics -- I would love to see simulation and generation of test data. User experiences that cater to a wider range of user expertise. Greater compatibility with external data sources and platforms. And AI-enabled security and privacy features will be essential."
Eric Avidon is a senior news writer for TechTarget Editorial and a journalist with more than 25 years of experience. He covers analytics and data management.