
Addition of new AI capabilities shows Starburst's growth

The open source data lakehouse vendor continues to evolve beyond being a data mesh specialist with the additions of agentic tools, a development environment and a data catalog.

Starburst on Monday unveiled a swath of new capabilities, including AI Workflows and AI Agent, which demonstrate the vendor's growth beyond being a data mesh specialist.

Built on the Trino query engine, Starburst's platform -- Galaxy for cloud-based customers and Enterprise for on-premises deployments -- is commonly used as the connective tissue for the data mesh method of managing data.

Over the past few years, Starburst has broadened its offerings to become a more full-featured data management vendor by including a data catalog to foster data discovery, a governance layer, data transformation and streaming data capabilities, and even self-service analytics tools.

Now, Starburst is making AI a focal point, introducing both a secure environment for customers to develop AI applications and its own AI capabilities aimed at simplifying and speeding data management and analysis.

AI development, which includes applying powerful models to proprietary data, is a high-risk/high-reward proposition, according to Kevin Petrie, an analyst at BARC U.S. Given that Starburst is addressing the risks with its new capabilities while enabling the reward, the vendor's new tools are significant for users.

"Starburst addresses the biggest risks, including data access, data quality, privacy, incompatible systems and complex architectures," Petrie said. "Their new product features, assuming they reach full availability soon, reflect a mature approach to the market."

Based in Boston, Starburst is a data management vendor whose lakehouse platform combines the structured data storage capabilities of data warehouses with the unstructured data storage capabilities of data lakes. Vendors providing similar capabilities include Databricks, Dremio and IBM, which in 2023 acquired Presto-based Ahana.

Branching out

Given the potential for generative AI (GenAI) to make workers better informed and more efficient, many enterprises have increased their investments in AI development since OpenAI's November 2022 launch of ChatGPT, which represented a significant improvement in GenAI technology.

Simultaneously, with data providing the intelligence aspect of AI, many data management vendors have expanded beyond their historical focus to build environments within their platforms that make it easier and more secure for customers to develop and deploy AI applications.

Rivals Databricks and Snowflake have been among the most aggressive developers of environments for building AI tools, while many others, including Alation and Informatica, have also made enabling AI development a priority.

Now, Starburst is joining the fray with the unveiling of AI Workflows, a set of tools in private preview designed to enable customers to develop, deploy and manage AI models and applications.

The suite includes AI Search to transform unstructured data to vector embeddings in Apache Iceberg, AI SQL Functions to train GenAI models using SQL, and AI Model Access Management to govern AI models and applications. In addition, given the native connectivity between AI Workflows and Starburst's lakehouse, the development environment does not require users to move data or build complex pipelines.

Beyond AI Workflows, Starburst is adding new AI capabilities of its own with AI Agent.

The vendor previously launched natural language processing capabilities that enable users to analyze data without writing code. Now, it is adding a prebuilt conversational interface -- currently in private preview -- that can be deployed by analysts or autonomous agents and aims to simplify product documentation and insight generation.

"These AI Workflows and Agents … build on Starburst's core strength of accessing data on premises or in cloud environments," Petrie said.

In particular, transforming unstructured data into vector embeddings to prepare it for retrieval-augmented generation is a valuable addition, he continued.

"That's a key requirement of GenAI models," Petrie said.

Customer feedback provided the impetus for developing both AI Workflows and AI Agent, according to Matt Fuller, co-founder of Starburst and the vendor's vice president of AI/ML products.

Beyond the new AI capabilities, Starburst introduced the following:

  • Starburst Data Catalog, a metadata-based hub for indexing and governing data that replaces the Hive Metastore in Starburst Enterprise.
  • Auto-Tagging, a feature that uses large language models to detect and classify sensitive information at the column level.
  • Fully managed Iceberg pipelines in Starburst Galaxy that include built-in maintenance features such as file compaction and options for streaming ingest and batch-style loading.
  • Automated Table Maintenance of Iceberg workloads to reduce storage costs and improve query performance.
  • Automatic Query Routing in Starburst Galaxy that routes queries to the correct cluster to improve query performance.
  • A services offering that provides users with blueprints for developing data infrastructures that enable AI-powered data management and analysis.

The services offering and Auto-Tagging are generally available, while the other new features are in various stages of preview.

Collectively, the new capabilities further Starburst's goal of providing customers with fast, governed access to distributed data, according to Fuller.

"These are not to chase checkboxes, but to streamline how enterprises activate their data across environments for analytics and AI," he said.

Starburst aims to compete with Databricks and Snowflake as it grows beyond being a lakehouse for data mesh deployments, Fuller continued, noting that Starburst provides an alternative to both with its focus on interoperability and open standards.

Meanwhile, perhaps the highlight beyond the AI capabilities is the services component, according to Petrie.

"Their new service offering addresses a critical pain for many organizations, [which is that] their data architectures are not ready to support AI models or applications," he said. "Starburst can help them modernize without costly migrations."

Next steps

In the long term, Starburst's roadmap will continue to focus on AI, according to Fuller. With AI-powered agents that can act autonomously now the latest trend, Starburst aims to ensure that customers have a data foundation that can support agentic AI applications.

"Our roadmap centers on breaking down data silos and delivering the infrastructure needed to power these agents with governed, contextual insights from ingestion to insight," Fuller said.

Petrie, meanwhile, suggested that as Starburst evolves into AI development, it should do more to integrate data operations, development operations and model operations.

"AI innovation centers on the integration of data, models and applications," he said. "To support this integration, Starburst would do well to partner with more AI/ML model platforms and application-development frameworks."

Eric Avidon is a senior news writer for Informa TechTarget and a journalist with more than 25 years of experience. He covers analytics and data management.
