Starburst Enterprise brings Apache Iceberg to data lakehouse
A new Starburst Enterprise release adds capabilities to help organizations use the Trino SQL query engine to analyze data stored on premises and in the cloud.
The emerging concept of a data lakehouse is continuing to gain traction, with multiple vendors looking to help organizations effectively query and use cloud data lakes.
Among the vendors that are growing in the industry is Starburst, one of the leading contributors to the open source Trino SQL query engine technology.
With a data lakehouse, organizations need some way to structure the data stored in a data lake. That is an area where the Databricks-originated Delta Lake open source project is used and supported by multiple vendors, including Starburst.
On Nov. 16, Starburst, based in Boston, released the latest version of its Starburst Enterprise platform, adding support for the open source Apache Iceberg project, a competing effort to Delta Lake.
The new Starburst update also includes an integration with the open source DBT data transformation technology. Security is another area of improvement, with an optimized integration with the Apache Ranger data access security technology.
"The latest announcement is a continuation of the expansion of Starburst's value proposition, building on its expertise around Trino and distributed query processing," said Ventana Research analyst Matt Aslett.
"Much of the initial adoption of Trino and Starburst Enterprise was driven by data lake environments, so Starburst Enterprise is evolving to address emerging data lake capabilities including table formats, such as Apache Iceberg and Delta Lake, and data transformation workflow tools such as DBT," he added.
Enabling data lakehouse in the cloud and on premises
Aslett added that although many data lakes are now largely based on cloud object storage, Starburst has a role in enabling analysis of data in on-premises and multi-cloud environments. Starburst Enterprise now supports MinIO technology, which provides on-premises storage capabilities.
The growing capabilities of Starburst were also highlighted by Merv Adrian, an analyst at Gartner, who said he sees the new release as continuing the vendor's overall momentum.
"We first began looking at Starburst simply as a Trino-based example of what we call an analytics query accelerator," Adrian said. "That's all about improving performance for difficult data lake queries and the like."
In Adrian's view, Starburst's support of Apache Iceberg and partnership with MinIO are good additions to the company's capabilities. Going a step further, Adrian noted that Starburst is adding polish for enterprise users with better observability for tuning and operation purposes, as well as improved security with more support for Apache Ranger.
Why Iceberg is a key data lakehouse addition
Matt Fuller, co-founder of Starburst, acknowledged that many organizations use Databricks and Delta Lake, and said it is a strong technology. But some organizations prefer an alternative to Delta Lake, which is why it's important that Starburst provides an option, he said.
Fuller emphasized that Starburst now supports both Delta Lake and Iceberg, which provides users with the flexibility to choose what technology works better for them.
Delta Lake and Iceberg have some similarities. Both use the open source Apache Parquet file format for data. Fuller explained that Delta Lake and Iceberg are table formats that sit on top of files, providing a layer of abstraction that enables users to organize, update and modify data in a model that is like a traditional database.
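As a rough illustration of that abstraction, the following Python sketch models a table as a series of snapshots, each listing the data files that make up one version of the table. It is a toy under stated assumptions: the `ToyTable` and `Snapshot` names, the file names and the commit logic are hypothetical and do not reflect the actual metadata layout or API of Iceberg or Delta Lake.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Snapshot:
    """One immutable table version: the list of data files it consists of."""
    snapshot_id: int
    data_files: Tuple[str, ...]


@dataclass
class ToyTable:
    """Hypothetical table metadata layered over files in a data lake."""
    name: str
    snapshots: List[Snapshot] = field(default_factory=list)

    def current_files(self) -> Tuple[str, ...]:
        # Readers consult the latest snapshot instead of listing a directory.
        return self.snapshots[-1].data_files if self.snapshots else ()

    def commit(self, add: Sequence[str] = (), remove: Sequence[str] = ()) -> Snapshot:
        # An update does not edit files in place; it records a new snapshot
        # that drops the removed files and appends the added ones.
        removed = set(remove)
        kept = tuple(f for f in self.current_files() if f not in removed)
        snapshot = Snapshot(len(self.snapshots) + 1, kept + tuple(add))
        self.snapshots.append(snapshot)
        return snapshot


# Illustrative usage with made-up Parquet file names.
table = ToyTable("orders")
table.commit(add=["part-0001.parquet", "part-0002.parquet"])
table.commit(add=["part-0003.parquet"], remove=["part-0001.parquet"])
print(table.current_files())
```

In this sketch, the earlier snapshot still lists its original files after the second commit, which hints at how a metadata layer can present database-like updates over files that are themselves never modified.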
"The initial use cases in Starburst are for people who have already been using Iceberg as an independent table format," Fuller said. "When people come to us and they're looking to deploy a data lakehouse strategy, we can talk through the options, so it also could be a net new use case as well."
Expanding the data lakehouse in the future
The initial support for Apache Iceberg in Starburst Enterprise is read-only, which is all that Trino currently supports. Fuller said that a large initiative for Starburst and the Trino community is to add data manipulation language support to enable broader read/write capabilities for data.
Starburst is also set to invest in its data connector strategy, bringing more non-relational database sources into the platform. Those sources could include SaaS data that could be in a cloud platform such as Splunk or Zendesk, for example.
Fuller also hinted at a major feature update coming in early 2022 that will bring new data mesh-type capabilities to Starburst.