Getty Images/iStockphoto

Starburst targets data discoverability with new capabilities

The vendor unveiled three new tools during its virtual user conference, including an automated catalog and a tool that automatically indexes and caches data.

Starburst on Wednesday unveiled a trio of new features for its data management and analytics platform, including an automated data catalog for Starburst Galaxy that aims to enable users to more quickly and easily search and discover data.

Galaxy is the vendor's cloud-native offering, while Enterprise is for its on-premises customers.

In addition to the automated data catalog, the vendor revealed that users can now use Python to access both Galaxy and Enterprise. It also revealed a tool named Warp Speed that automates indexing and caching of data an accelerates queries up to seven times over manual indexing and caching, according to the vendor.

Starburst, founded in 2017 and based in Boston, is a data and analytics vendor whose platform is designed to let customers build a data mesh architecture.

Data mesh is a decentralized approach to data management and analytics that relies on the domain expertise of power users within departments to help oversee their organizations' data operations. Other vendors offering data mesh tools include Informatica and Talend.

Starburst unveiled its new capabilities during Datanova, a virtual user conference hosted by the vendor.

The automated data catalog and Warp Speed are now in private preview, while access via Python is generally available. Warp Speed is expected to be generally available to Enterprise users by the end of February and Galaxy users within the next three months.

New capabilities

Among the features unveiled during Datanova, the automated data catalog and Warp Speed have the potential to be of most benefit to customers, according to Doug Henschen, an analyst at Constellation Research.

Even before developing the automated data catalog, Starburst's query engine automatically collected metadata about user behavior related to a given dataset when that dataset was connected to Starburst. Now as soon as a dataset is connected to Starburst, the platform not only collects metadata but also adds the dataset to a catalog so it can be found and queried for relevant analytics use.

Warp Speed, meanwhile, represents Starburst repackaging of capabilities it inherited through its June 2022 acquisition of Varada.

The tool indexes and caches workloads, automatically assembling workloads into blocks based on patterns of use and other characteristics that make it reasonable to group workloads together.

Ultimately, Starburst built both tools to make data more discoverable and speed up the process of organizing data. The desired result is to make it faster and easier to find relevant data that leads to insights and actions.

"The features that will have the broadest appeal [will] be the indexing and caching feature, which will drive performance, and the automated data catalog, which will make it easier for any users to find data and assets that could be potentially helpful," Henschen said. "More sophisticated users might develop data products … but performance gains and improved data access benefit every user."

A sample Starburst dashboard
A sample screenshot from Starburst shows a dataset's metadata.

Similarly, Vishal Singh, head of data products at Starburst, cited the automated data catalog as having the most potential significance for users among the new capabilities unveiled during Datanova.

Singh noted that it enables users to know which datasets to query without having to first search for them on the different databases an organization might use. It also avoids potentially finding various versions of the same dataset in a few different places and having to explore those datasets to discover which is most relevant.

"A [user] wants to write a query and ask a question, and if you know what query to write, that's the easiest path," Singh said. "As the data ecosystem gets bigger, people lose context of their data. What we are trying to do is help users understand the context of the data before they write their queries."

The features that will have the broadest appeal [will] be the indexing and caching feature, which will drive performance, and the automated data catalog, which will make it easier for any users to find data and assets that could be potentially helpful.
Doug HenschenAnalyst, Constellation Research

Alison Huselid, Starburst's senior VP of product, added that the automated data catalog can potentially result in better decisions even before the query begins, which then leads to a more streamlined analytics process.

"It can help users decide what kind of compute they need to run for a query, what kind of cluster they need to run and what kind of scale they need," she said. "That can help you make smarter decisions as you go down the path of getting the actual insight you're trying to derive."

While the automated data catalog and Warp Speed are designed to help users derive insights more quickly and accurately, enabling users to access Galaxy and Enterprise using Python is aimed at letting data scientists access Starburst's infrastructure with familiar and favorite tools.

In particular, the addition is a response to requests from customers who want to migrate PySpark workloads to Starburst without rewriting code, according to the vendor.

Henschen noted that taken together, the new capabilities are part of Starburst's plan to broaden its capabilities to the point that it can be an organization's lone platform for data. Unlike more established data and analytics vendors, such as Alteryx and AtScale, that have had a decade or more to develop full-featured data management platforms, Starburst is still in its startup phase.

"All these moves are designed to broaden the capabilities of the Starburst platform such that it could be an organization's single platform for data," Henschen said.

Whether Starburst is quite ready for that remains to be seen. But the new capabilities at least make it more possible, he continued.

"Starburst is addressing the performance gap in a data fabric approach in which you 'play data where it lies' rather than going through the trouble and expense of putting data in a centralized store," Henschen said.

In the future

Data discoverability has been a recent focus for Starburst, even before adding Warp Speed and an automated data catalog.

In November, the vendor unveiled data discoverability and governance features at the AWS re:Invent conference. In September, Starburst added data sharing capabilities.

Next, Starburst plans to target data lineage, according to Singh.

Beyond data lineage, adding partnerships to expand capabilities through connectors and integrations is also part of Starburst's roadmap, according to Huselid. She noted that one example is an integration with DBT Cloud unveiled during Datanova.

Henschen, meanwhile, said he'd like to see Starburst continue adding administrative and governance capabilities to make data mesh a more secure approach to analytics.

"I wouldn't be surprised to see new stewardship and governance features and functions designed to support the combination of decentralized freedom of data usage with centralized guardrails and visibility," he said. "The new data catalog is a step in that direction."

Eric Avidon is a senior news writer for TechTarget Editorial and is a journalist with more than 25 years of experience. He covers analytics and data management.

Next Steps

Starburst update adds new genAI, streaming data features

Dig Deeper on Business intelligence technology

Data Management
Content Management