Databricks acquisition of Lilac targets GenAI development

The data platform vendor's latest purchase expands its generative AI development capabilities by adding improved access to unstructured data that can be used to train models.

Databricks has added a new item to its shopping cart, purchasing Lilac AI to complete its fourth acquisition in nine months aimed at providing tools for generative AI development.

Financial terms of the acquisition, completed Tuesday, were not disclosed.

Lilac AI is a 2023 startup based in Boston whose platform is designed to enable data scientists to search, cluster and analyze all types of text data sets. In particular, the vendor's tools are aimed at understanding and preparing unstructured data to feed and train generative AI models and applications.

Databricks, meanwhile, is a data platform vendor that helped pioneer the data lakehouse for data storage. Over the past year, Databricks has prioritized generative AI, acquiring companies and partnering with vendors such as Mistral AI to provide customers with tools to build and manage AI models.

Databricks' acquisition of Lilac AI comes seven weeks after Databricks bought Einblick. In 2023, Databricks acquired Arcion and MosaicML.

Each added new capabilities to help customers build and train generative AI models and applications, with the $1.3 billion purchase of MosaicML leading to the launch of Databricks Mosaic AI, which is now Databricks' end-to-end platform for AI development.

The subsequent acquisitions of Arcion, Einblick and now Lilac AI each added features that enhance the core capabilities that resulted from MosaicML.

As a result, the acquisitions have been strategically sound, according to David Menninger, an analyst at ISG's Ventana Research.

"Databricks recognized early on the affinity between data and AI," he said. "The acquisitions have filled in gaps or rounded out capabilities in the Databricks portfolio. Everyone is racing to embrace GenAI, so adding more of these capabilities via acquisition is a way to accelerate the process."

Kevin Petrie, an analyst at Eckerson Group, likewise noted that Databricks has been strategic with its acquisitions.

The vendor has raised more than $3.5 billion in funding, most recently adding $500 million in September 2023. That has enabled Databricks to selectively acquire companies that help customers develop AI models that operationalize their own data.

"Databricks has taken a pretty methodical approach to helping companies broaden the data sets they manage, and the types of analytics projects that data feeds," Petrie said. "The key to GenAI success is enabling data and AI teams to implement language models that consume their own domain-specific data."

Additive capabilities

While the acquisition of MosaicML provided the foundation for Databricks' AI development capabilities, the purchase of Arcion added data ingestion tools that help feed the pipelines used to train models and applications. Einblick's acquisition then resulted in new natural language processing capabilities that enable non-technical users to work with data.

The addition of Lilac AI aims to enable Databricks customers to access and operationalize unstructured data to inform their data products, including generative AI models and applications.

Analytics has historically relied on structured data such as transactions and financial records to inform decisions. However, it is estimated that only 20% of all data is structured.

Much of the rest is unstructured, such as text, video, images and audio files. Analytics tools were historically unable to access that unstructured data. Instead, it was loaded into a data lake and largely left unused, so enterprises based their decisions on only a small part of their whole operation.

Now, many enterprises are gaining access to their unstructured data through vectors, which are numerical representations that algorithms assign to unstructured data to make it searchable.
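To make the idea of vector search concrete, here is a minimal sketch in Python. It is not how Databricks or Lilac AI implement search. It assumes a toy "embedding" based on simple word counts, whereas production systems typically rely on learned embedding models and dedicated vector databases. The function names (`embed`, `cosine`, `search`) and the sample documents are hypothetical, invented for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding' (an assumption for illustration): lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def search(query: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k documents whose vectors are closest to the query's."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]


# Hypothetical sample documents, invented for illustration only.
DOCS = [
    "Customer email complaining about a late invoice payment",
    "Meeting notes on the quarterly marketing campaign",
    "Support ticket: password reset link does not work",
]

if __name__ == "__main__":
    # Ranks the unstructured text snippets against a free-text query.
    print(search("invoice payment problem", DOCS))
```

The point of the sketch is only that once text is mapped to numbers, "find similar documents" becomes a distance calculation. A word-count vector matches on shared vocabulary, while learned embeddings can also capture similarity in meaning.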

Unstructured data is particularly important for generative AI, which requires as much data as possible to deliver accurate responses. The less data used to train generative AI, the more likely it is to hallucinate and deliver responses that are not only inaccurate but also sometimes close enough to a correct output as to be misleading and result in bad decisions.

Lilac AI specializes in enabling users to explore and operationalize text, extracting information from documents, PDFs, emails and other text-based elements that can be combined with other data to provide a more complete view of a given subject.

Specifically, the platform provides scalable capabilities that enable data scientists to explore existing data clusters. In addition, it uses human feedback to develop new data categories that can feed the retrieval-augmented generation pipelines to train AI models and applications.

Given that Lilac AI aims to enable text mining, the latest Databricks acquisition is significant, according to Menninger.

"Much of GenAI is about text data," he said. "Working with text data has typically been more difficult than working with structured data. The idea behind Lilac is to enable exploration and feature engineering on unstructured data, just like we do with structured data."

Petrie likewise noted that it's difficult to operationalize unstructured data. Adding tools that help users access and transform that data to make it functional is therefore important.

"It's a tricky process to take raw unstructured data and refine it into usable, trustworthy inputs for a language model," he said. "Lilac AI helps bring the data together, search it [and] add structure."

In addition, Lilac AI can help identify governance risks such as personally identifiable information, he continued. 

Beyond adding new access to text data, Databricks' acquisition of Lilac AI adds talent.

Lilac AI was founded by Daniel Smilkov and Nikhil Thorat, who each spent more than a decade at Google before starting Lilac AI. Both men are now joining Databricks. And given that Lilac AI was founded so recently and had not yet raised venture capital funding, it's likely that acquiring talent was just as important to Databricks as acquiring technology, according to Petrie.

"Lilac AI is pretty early-stage, so it appears that Databricks is buying talent and IP more than a market-ready product," he said. "They are right to move fast and get smart people on board, although this also means they're taking on some inevitable startup risk."

Talent, in fact, is one of Databricks' key considerations when making acquisitions, according to Chris Hecht, senior vice president of corporate development and product partnerships at Databricks.

Even before the technology, Databricks looks at a company's corporate culture to make sure it aligns with its own, and at the talent that will be acquired along with the company's technology. Only after evaluating those two factors does Databricks more closely investigate the technology to see whether it augments the data platform vendor's existing capabilities.

"Alignment with our cultural principles … is the most important element of any acquisition, and obviously they need to meet Databricks' high talent bar," Hecht said. "We then evaluate whether the acquisition can either accelerate our product vision or allow us to expand into a high-priority market area."

In the case of Lilac AI, unstructured data evaluation for generative AI represented that expansion, he continued.

"For AI in particular, we focus on whether the acquisition would allow us to provide customers with more complete, end-to-end generative AI capabilities that harness the power of their own enterprise data," Hecht said.

Figure: The text mining process. The latest Databricks acquisition adds text mining capabilities that can be used to feed and train AI models and applications.

Looking ahead

While the main focus of Databricks' recent acquisition spree has been to add capabilities that enable users to build generative AI models and applications, the vendor needs to also maintain its focus on traditional AI, according to Menninger.

Machine learning and predictive analytics have long been a focus for the data platform vendor. And though generative AI has been the dominant trend in data management and analytics since OpenAI launched ChatGPT in November 2022, traditional AI remains a critical way to inform the decisions that lead to business outcomes.

"Hopefully, Databricks will continue to support traditional, predictive modeling as well as GenAI," Menninger said. "Our research shows that enterprises are devoting about half of their budgets to each."

Though both are forms of AI, they deliver different types of value, he continued. Generative AI -- which includes such capabilities as NLP, code generation and text-to-code translation -- targets productivity. Predictive analytics targets decision-making.

"Both are important, but right now GenAI is getting all the attention," Menninger said.

Eric Avidon is a senior news writer for TechTarget Editorial and a journalist with more than 25 years of experience. He covers analytics and data management.
