Starburst update adds new GenAI, streaming data features

Galaxy, the cloud-based version of the vendor's platform, now includes text-to-code translation capabilities as well as automated governance and real-time ingestion tools.

Starburst on Tuesday unveiled new capabilities designed to enable organizations to develop and scale data products, including AI applications.

New features now in Galaxy, the cloud-based version of Starburst's platform, include self-service analytics powered by generative AI, near real-time streaming data ingestion and automated data governance.

Taken in sum, the update adds some needed but not necessarily innovative features, according to Donald Farmer, founder and principal of TreeHive Strategy.

"It's not bad," he said. "They have done some necessary engineering, which no doubt was time consuming."

Farmer noted, however, that while Starburst's features may not be fully differentiated from those from competitors such as Databricks and Snowflake, Starburst's foundation on Trino makes it stand out. Trino is an open source query engine.

"[Other] companies also offer data analytics solutions with varying focuses on real-time processing, AI integration and cloud-based services," Farmer said. "Compared to its competitors, Starburst's unique selling point is its foundation on open source Trino, which may appeal to organizations looking for more flexibility and less vendor lock-in."

Starburst revealed the update at AWS re:Invent 2023, a user conference in Las Vegas hosted by the tech giant.

Starburst is a data lake vendor whose offerings, which include Enterprise for its on-premises customers in addition to Galaxy for its cloud-based users, are frequently used to develop a data mesh architecture for data management.

Data mesh is a decentralized approach to data management that distributes ownership of data to different domains within organizations, rather than keeping it locked up by a centralized data team.

One intent is to take advantage of the domain expertise of those within a given department, such as finance or human resources. Another intent is to reduce bottlenecks that inevitably develop when one team has to do all the data preparation and analysis for an entire organization.

Meanwhile, to avoid data getting isolated within departments, data mesh architectures employ data catalogs and other tools to connect domains and enable cross-departmental sharing and collaboration.

In June, Starburst updated Galaxy to improve data governance and access across different clouds. Two months earlier, the vendor developed an integration with dbt Labs to better enable data transformation.

New capabilities

Although the update adds generative AI capabilities, and generative AI has been a primary focus for many technology vendors, the most significant new features in Galaxy are streaming data ingestion and automated data governance, according to Farmer.

The streaming data ingestion is powered by Apache Kafka and enables customers to ensure that their applications are informed by the most up-to-date data.
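To make the idea of near real-time ingestion concrete, the following is a minimal sketch of micro-batching: records are buffered and written out once a batch is large enough or old enough. It is not Starburst's implementation and uses no Kafka client. The class name `MicroBatcher`, the `sink` callback and the thresholds are all hypothetical, and in practice the records would come from a Kafka consumer rather than from direct calls to `add`.

```python
import time
from typing import Any, Callable, List, Optional


class MicroBatcher:
    """Buffer incoming records and flush them in small batches.

    Hypothetical illustration only: `sink` stands in for whatever writes a
    batch to the data lake, and the thresholds are arbitrary assumptions.
    """

    def __init__(
        self,
        sink: Callable[[List[Any]], None],
        max_records: int = 500,
        max_age_s: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.sink = sink
        self.max_records = max_records
        self.max_age_s = max_age_s
        self.clock = clock
        self._buffer: List[Any] = []
        self._opened_at: Optional[float] = None

    def add(self, record: Any) -> None:
        """Append one record, flushing if the batch is full or too old."""
        if not self._buffer:
            # The first record of a batch starts the age timer.
            self._opened_at = self.clock()
        self._buffer.append(record)
        too_big = len(self._buffer) >= self.max_records
        too_old = self.clock() - self._opened_at >= self.max_age_s
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        """Hand the current batch to the sink and start a new one."""
        if self._buffer:
            self.sink(self._buffer)
            self._buffer = []
            self._opened_at = None
```

The age threshold is what keeps data fresh when traffic is light: a half-empty batch is still written once it has waited long enough. Note that the age check in this sketch only runs when a new record arrives, so a real pipeline would also flush on a timer.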

The automated data governance is accomplished through machine learning models in Gravity -- a governance layer in Galaxy -- that apply access controls as soon as data is loaded into Starburst's data lake.

"Two standout features are streaming ingestion and automated data governance," Farmer said. "The ability to load a data lake in near real time using Kafka is a very common demand these days, even when not strictly demanded by use cases. Starburst needed to make this happen."

Regarding automated data governance, he added that benefits include speed and accuracy.

"Automated data governance is cool because identifying personally identifiable information when it lands is much better governance than waiting for an automatic scan or a human review," Farmer said. "Too much can go wrong otherwise."
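As a rough illustration of what Farmer describes, the sketch below classifies columns in a batch at load time and records a masking policy before the data becomes queryable. It swaps simple regular expressions in for the machine learning models the article attributes to Gravity, so it shows the timing of the check rather than the vendor's method. The pattern set, the function names and the `table_policies` dictionary are all hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical rule-based patterns. A production classifier would be far more
# robust than these two regular expressions.
PII_PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


def classify_columns(rows: List[Dict[str, str]], threshold: float = 0.8) -> Dict[str, str]:
    """Return {column: pii_type} for columns whose values mostly match a pattern."""
    tags: Dict[str, str] = {}
    if not rows:
        return tags
    for column in rows[0]:
        # Ignore missing and empty values when computing the match rate.
        values = [str(row[column]) for row in rows if row.get(column) not in (None, "")]
        if not values:
            continue
        for pii_type, pattern in PII_PATTERNS.items():
            hits = sum(1 for value in values if pattern.match(value))
            if hits / len(values) >= threshold:
                tags[column] = pii_type
                break
    return tags


def load_with_governance(
    rows: List[Dict[str, str]],
    table_policies: Dict[str, Dict[str, str]],
    table_name: str,
) -> Dict[str, str]:
    """Classify a batch as it lands and record a mask policy for tagged columns."""
    tags = classify_columns(rows)
    table_policies[table_name] = {column: "mask" for column in tags}
    return tags
```

Because classification runs inside the load step, there is no window in which a newly landed column holding email addresses or Social Security numbers is readable without a policy attached.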

Meanwhile, in the year since OpenAI's release of ChatGPT marked a significant improvement in generative AI capabilities, most data management and analytics vendors have made generative AI a significant part of their product development plans.

Generative AI enables text-to-code translation capabilities that let users interact with data using conversational language rather than code.

Using such natural language processing capabilities, employees who lack the training to query and analyze data are now able to do so. That, in turn, should help expand analytics use within organizations beyond where it's been -- around one quarter of the workforce -- for the past couple of decades.

Meanwhile, trained data experts no longer need to write reams of code to perform tasks ranging from developing data applications and pipelines to preparing and modeling data for analysis.

At the time Starburst updated Galaxy in June, Matt Fuller, the vendor's co-founder and vice president of product, said AI was a focal point of Starburst's roadmap.

Now, the vendor is adding those text-to-SQL processing capabilities that enable self-service users and make data teams more efficient.

Beyond generative AI-powered self-service analytics, streaming data ingestion and automated governance, Starburst's latest Galaxy update also includes the following features:

  • Automated data maintenance that takes over tasks such as data compaction to reduce the manual demands placed on engineers and other data experts as the amount and complexity of that data grows.
  • Universal data sharing with built-in observability through Gravity that enables customers to package data sets into data products that can be shared to inform collaborative decisions or monetized to sell to third parties.
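Data compaction, mentioned in the first item above, generally means rewriting many small files as fewer large ones so queries open fewer files. The sketch below shows one simple way such a job could be planned, using a first-fit grouping of file sizes. It is an assumption-laden illustration, not a description of how Galaxy performs maintenance. The function name and the 128 MB target are hypothetical.

```python
from typing import List


def plan_compaction(file_sizes_mb: List[int], target_mb: int = 128) -> List[List[int]]:
    """Group small files into batches that each fit within target_mb.

    Uses first-fit on sizes sorted largest first. Files already at or above
    the target are left alone.
    """
    small = sorted((size for size in file_sizes_mb if size < target_mb), reverse=True)
    groups: List[List[int]] = []
    for size in small:
        for group in groups:
            if sum(group) + size <= target_mb:
                group.append(size)
                break
        else:
            # No existing group has room, so start a new one.
            groups.append([size])
    # A group of one file gains nothing from being rewritten.
    return [group for group in groups if len(group) > 1]
```

For example, `plan_compaction([100, 20, 8, 200, 60, 60])` leaves the 200 MB file untouched and proposes two rewrites, `[100, 20, 8]` and `[60, 60]`, turning five small files into two.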

Together, the new features aim to help customers more easily develop data applications and scale them to meet the real-time needs of the organization, according to Alison Huselid, Starburst's senior vice president of product management.

"First and foremost, a data application requires a performant and scalable data foundation," she said. "This set of features together ensures that foundation remains solid by providing near real-time governed access to all their data with warehouse-like performance."

Toward that end, Huselid highlighted streaming data ingestion as perhaps the most significant of the new features.

Meanwhile, motivation for the development of new features came from a mix of customer requests and Starburst's own vision of where organizations' data management needs are headed, Huselid continued.

"Over the last year, as we have seen increased adoption in data lake-based architectures for building data applications, we have also seen an increasing need for data teams to have an easier, more streamlined way to build and manage their data lake," she said.

[Figure: Seven benefits of generative AI for the enterprise.]

Future plans

With Starburst's latest Galaxy update now available, the vendor's product roadmap will continue to focus on making it easier for users to build and manage data lakes for a variety of applications, according to Huselid.

In addition, generative AI will continue to be an emphasis, she said.

That emphasis on generative AI is appropriate, according to Farmer. In addition, more automation and data governance would be wise areas of focus, he continued.

"What's next [should be] more AI, more machine learning integration, more automation in administration and governance," Farmer said. "I think their work on automated data maintenance is a good start, and I would like to see more of that."

Eric Avidon is a senior news writer for TechTarget Editorial and a journalist with more than 25 years of experience. He covers analytics and data management.
