
Databricks $1B-plus Tabular acquisition adds Iceberg support

The lakehouse specialist's latest purchase adds support for Apache Iceberg to its existing support for Delta Lake and also directly challenges rival Snowflake.

Databricks continued its recent buying spree with the acquisition of Tabular, a move that adds support for Apache Iceberg storage to Databricks' existing support for Delta Lake storage.

Databricks did not specify what it paid for Tabular but confirmed that the cost was more than $1 billion.

The deal, revealed on June 4, is Databricks' fifth acquisition in the past year. It remains subject to due diligence but is expected to close during Databricks' fiscal second quarter, which ends on July 31.

Delta Lake and Apache Iceberg are the two main storage formats for data lakehouses, which are data storage repositories that combine the structured data storage capabilities of data warehouses with the unstructured data storage capabilities of data lakes.

Databricks was the developer of Delta Lake. The vendor made the storage format open source in 2019 and most recently updated it in June 2023. Apache Iceberg, meanwhile, is an open source storage format for large analytics tables and is often used in concert with Databricks rival Snowflake, which was holding its annual user conference earlier this week when Databricks unveiled the acquisition of Tabular.

Meanwhile, Databricks is set to hold its own conference, the Data+AI Summit, next week in San Francisco.

With Databricks providing support for both Delta Lake and Apache Iceberg once it completes its acquisition of Tabular, the acquisition is a competitive strike at Snowflake, according to Doug Henschen, an analyst at Constellation Research.

"The timing of this deal is obviously intended to grab some of the Snowflake Summit limelight but also to outdo its competitor on openness messaging with the suggestion that it will have huge influence over the future of the Iceberg standard," he said.

The technological intent, meanwhile, is to no longer lock enterprises in to either the Delta Lake or Apache Iceberg format by enabling the use of both on a single platform, according to Databricks.

"As one, we are going to lead the way with data compatibility so that you are no longer limited by which lakehouse format your data is in," the vendor said in a blog post about the acquisition co-authored by CEO Ali Ghodsi, Adam Conway, Arsalan Tavakoli-Shiraji and Reynold Xin.

Additive capabilities

Based in San Francisco, Databricks is a data platform vendor that was one of the pioneers of the lakehouse format. Over the past 18 months, Databricks has prioritized generative AI, launching two large language models and creating an environment for development.

Much of that environment for generative AI development resulted from acquisitions, starting with Databricks' $1.3 billion purchase of MosaicML in June 2023 and continuing with the subsequent acquisitions of Arcion, Einblick and Lilac AI.

Each added a strategic component: MosaicML became Mosaic AI, a platform for AI development; Arcion added data ingestion capabilities; Einblick added natural language processing technology; and Lilac AI added text analysis capabilities.

The acquisition of Tabular similarly adds capabilities Databricks did not previously support.

Based in San Jose, Calif., Tabular is a 2021 startup that had raised $37 million in funding before reaching an agreement to be acquired by Databricks.

The company was founded by Ryan Blue, Jason Reid and Daniel Weeks. Blue and Weeks were the originators of the Iceberg project while working at Netflix and donated it to the Apache Software Foundation.

Tabular is an open table store whose SaaS-based data platform enables customers to use a single storage layer -- Apache Iceberg -- across various compute engines and frameworks, such as Flink, Snowflake, Spark and Trino.

Once Databricks completes its acquisition of Tabular, its customers will no longer be locked into the Delta Lake format and can use Iceberg as well, should they choose.

In addition, with Iceberg's developers now joining Databricks, perhaps the most significant outcome of the acquisition will be that the vendor will have influence within the Iceberg community, according to Henschen.

"The deal brings Databricks depth and influence in the Iceberg community, whereas it previously only had depth and obvious influence over Delta Lake," he said.

Databricks also already enables compatibility between Delta Lake and other storage formats, such as Hudi, with a tool named UniForm, which was unveiled in June 2023 and made generally available on June 3. As a result, the acquisition didn't come as a surprise amid the ongoing battle between Databricks and Snowflake, Henschen continued.

"Databricks had already introduced three-way compatibility with Iceberg, Hudi and Delta Lake. So it wasn't a shock to see an in-your-face message acquisition in the ongoing … dueling between Snowflake and Databricks," he said.

Once the acquisition closes, Databricks will continue to enable compatibility between Delta Lake and Apache Iceberg through UniForm, according to Databricks. The vendor's long-term vision, however, is to combine the two in a single open data lakehouse.

But that long-range goal might not be particularly significant given the interoperability UniForm promises, according to Sanjeev Mohan, founder and principal at SanjMo.

He noted that Databricks' acquisition of Tabular has additive potential, particularly in terms of talent. But given that Tabular is not Apache Iceberg -- that Databricks didn't buy Apache Iceberg itself but instead a startup based on Apache Iceberg -- the acquisition might not be as much about combining capabilities to provide choice as it is about gaining influence with Apache Iceberg users such as Google, Cloudera, Confluent and Fivetran.

"Iceberg is already an open-source Apache project that [many enterprises] are already deploying, so it doesn't really matter who owns Tabular," Mohan said. "Now that Databricks will own both Delta Lake and Iceberg [technology], it can have an interesting strategy. But it doesn't upset anything."

Databricks has a longstanding relationship with Microsoft, he noted.

But because Apache Iceberg is a more widely used storage format than Delta Lake, Databricks' acquisition of Tabular could be seen as a means of trying to keep existing customers and appease potential new ones by not locking them in to just the Delta Lake format.

"Who uses Delta is a big question," Mohan said. "It's used by only two people: Databricks because they created it and Microsoft because the two are close. You don't want to lock yourself in to just one format … and now Databricks owns both table formats."

Next steps

Just as Snowflake held its user conference this week, Databricks will showcase new products and alliances at its conference next week.

In advance of the event, Databricks on Thursday unveiled a series of new and enhanced partnerships, including with Acxiom, Atlassian, Mastercard, S&P Global, Shutterstock and Tableau.

When Databricks unveils its new product development initiatives, the vendor would be wise to address simplicity and unification, according to Mohan. The vendor's recent acquisitions address different aspects of the AI development process, but demonstrating how they work together will be important.

"I'd like to see how Databricks is building a simplified and unified integrated stack to help customers build AI workloads faster," Mohan said. "Basically, there needs to be less stitching of moving parts and more of an integrated solution."

In addition, that integrated stack should include tools that encompass the full model development cycle from hardware through a development framework to the launch of models and applications.

Henschen, meanwhile, suggested that although Databricks serves the needs of data experts, it needs to do more to appeal to a broad audience, whereas rival Snowflake has made ease of use a priority.

"Databricks remains a favorite with data science teams. But it has yet to see mainstream adoption as an analytics and apps platform," he said. "Snowflake is doubling down on ease of use and productivity for the masses for data warehousing, app building and AI. It's broad Snowflake versus best-of-breed Databricks. Growth and scale win."

Eric Avidon is a senior news writer for TechTarget Editorial and a journalist with more than 25 years of experience. He covers analytics and data management.
