
StarTree adding Iceberg support to simplify, speed analysis
With open table data storage formats gaining popularity, the vendor's pending support for Apache Iceberg promotes flexibility while also making analysis more efficient.
StarTree on Wednesday unveiled native support for Apache Iceberg that will enable StarTree Cloud users to deliver insights directly from their data lakehouse to analytics and AI applications.
Data lakehouses are hybrids of data warehouses, which enable the storage of structured data such as financial records, and data lakes, which enable the storage of unstructured data such as text and images.
Apache Iceberg, meanwhile, is an open source table format for large analytics datasets and one of the formats on which data lakehouses are built. Apache Hudi and Delta Lake are other popular table formats for data lakehouses.
StarTree Cloud's support for Iceberg is now in private preview.
Once support for Iceberg is generally available, StarTree Cloud will connect directly to Iceberg-based lakehouses. Users will no longer need to duplicate or convert data from their Iceberg tables, or build reverse extract, transform and load (ETL) pipelines, to move it from data lakehouses to analytics and AI applications.
Given that support for Iceberg enables StarTree to turn open tables from a passive storage format into one that directly feeds real-time applications, it is a valuable addition for joint StarTree Cloud and Iceberg users, according to Stephen Catanzano, an analyst at Enterprise Strategy Group, now part of Omdia.
"Adding Iceberg support is extremely significant as it … eliminates the need for reverse ETL pipelines or data transformation into proprietary formats, removing latency, complexity and cost barriers that previously prevented companies from serving data directly from their lakehouse," he said.
Built on the open source Apache Pinot online analytical processing framework, StarTree is a database-as-a-service provider based in Mountain View, Calif. In April, the vendor added support for the Model Context Protocol (MCP) standard for agentic AI development.
New capabilities
Open tables are datasets that are compatible with numerous engines such as Apache Spark, Apache Flink, Presto/Trino and ClickHouse as well as any cloud data platform that provides support for open tables.
Because they work with various systems, open tables are gaining popularity both among enterprises that store data in multiple systems and among users wary of the vendor lock-in that comes with a proprietary table format that works only with one provider's data management platform.
Meanwhile, Iceberg has emerged as perhaps the most popular open table format.
Beyond being vendor agnostic, Iceberg provides lakehouse capabilities that enable users to combine structured and unstructured data, separates metadata from file storage to improve workload performance and has a dedicated community of contributors -- including AWS, Apple and Netflix, which created Iceberg -- that continues to advance its functionality.
"Apache Iceberg has emerged as the industry standard for managing historical data at scale, with adoption happening with all of the big vendors now and becoming the standard over Delta Lake," Catanzano said. "Its open table format … addresses critical needs in large-scale data management."
Because Iceberg enables users to combine disparate data types to fully inform applications, can connect to various systems and is capable of handling petabyte-scale workloads, it has become a popular means of helping to train generative and agentic AI models and applications.
Among the data management vendors now providing support for Iceberg are AWS, Cloudera, Databricks -- even though it was one of the founders of Delta Lake -- Dremio, Google Cloud, Oracle, Qlik and Snowflake.
Now StarTree plans to do the same by adding native support in StarTree Cloud. A combination of customer feedback and market observations provided the impetus, according to Kishore Gopalakrishna, StarTree's co-founder and CEO.
"On the customer side, we saw a clear trend of organizations fully committing to Apache Iceberg as the foundation of their open lakehouse, and they didn't want to move their data into proprietary formats just to make it usable," he said. "At the same time, we've been tracking the rapid rise of Iceberg as the de facto open table format for managing large-scale analytics data in cloud object stores."
Key capabilities of StarTree's support for Iceberg include the following:
- Direct querying of Iceberg tables from StarTree, eliminating costly and complex data egress.
- Real-time indexing and aggregation of data, including unstructured data formats such as text.
- Local caching so users can store frequently used data in the same memory space as applications to reduce the latency and improve the concurrency of queries.
- Intelligent prefetching that proactively loads relevant data to speed performance.
While each has value, eliminating the need to move data is perhaps the highlight of StarTree's support for Iceberg, according to Catanzano.
"It fundamentally changes how companies can utilize their lakehouse investments for real-time applications," he said.
In addition, real-time indexing and aggregation is a valuable feature, Catanzano continued.
"It enables the high-performance, low-latency access that makes StarTree uniquely positioned to deliver interactive insights directly from the data lakehouse," he said.
Next steps
Beyond moving its native integration with Iceberg from the preview stage to general availability, StarTree's roadmap includes providing more capabilities that enable customers to build agents and improving its observability capabilities, according to Gopalakrishna.
To better aid agentic AI development, the vendor plans to integrate with vector embedding models in addition to its support for MCP. Meanwhile, tools such as StarTree ThirdEye, an anomaly detection tool, provide observability.
"Our focus is on helping companies build intelligent, responsive and scalable data products that meet the needs of both human users and AI agents," Gopalakrishna said.
Improving ThirdEye, possibly by adding new machine learning capabilities, is wise, according to Catanzano.
In addition, he suggested that StarTree integrate with AI vendors to give users tools for developing sophisticated AI applications trained on the now easily accessible data in their data lakehouses, and that it develop tools to aid customers in specific verticals.
"They may [create] industry-specific solutions that address unique real-time analytics challenges in sectors like e-commerce, financial services and IoT where immediate insights drive business value," he said.
Eric Avidon is a senior news writer for Informa TechTarget and a journalist with more than 25 years of experience. He covers analytics and data management.