Data mesh specialist Starburst on Tuesday launched three new data discoverability and governance features for its Galaxy platform. It also revealed that its Enterprise platform now supports AWS Lake Formation.
Starburst, founded in 2017 and based in Boston, unveiled the new capabilities at AWS re:Invent 2022, a user conference hosted by cloud computing giant Amazon Web Services. The conference was held in person in Las Vegas with some content also available to remote attendees.
Starburst offers Galaxy for its SaaS analytics customers and Enterprise for its on-premises users. Both are designed to let customers build a data mesh architecture, which is a decentralized approach to data management and analytics.
The approach enables data teams within different domains -- for example, marketing or human resources -- to manage and analyze their own data rather than have a centralized data team do all the data management and analysis for their entire organization.
Data mesh uses the domain knowledge of an organization's workers, with the theory being that an expert in marketing or human resources data will be better at working with their domain's data than a data generalist. Its intent is to reduce the demands placed on a centralized data team that often result in bottlenecks.
But rather than isolate an organization's data within each domain responsible for its own data, data mesh connects an organization's data with data catalogs and other data integration tools that enable sharing and collaboration.
Like Starburst, Talend and Informatica are among the vendors offering data mesh tools.
New Galaxy features
The three data discoverability and governance capabilities now part of Galaxy are aimed at helping organizations develop analytics tools, such as charts and dashboards, that let users analyze their data and reach insights that lead to action, according to Starburst.
Before the new features, Starburst users could see their data. But they were given no insight into what data was valuable and what to do with that data.
With new data discovery capabilities, Starburst now enables analytics users to better search and understand their data by automatically populating the data with metadata that gives greater context, including how it is being used by others.
To simplify the extract, load and transform process for data engineers, new automated schema discovery features enable users to discover new datasets whether they've been transformed or not.
Finally, granular access control capabilities allow data administrators to clearly see and understand which users can access what data and both monitor and change permissions as needed.
Combined, the capabilities are a significant improvement for Starburst users, according to Doug Henschen, an analyst at Constellation Research.
"This is a notable step up in metadata management capabilities for the Starburst Galaxy service," he said. "This brings core discovery, schema and access-control capabilities right into the product and may satisfy the needs of a majority of customers."
Customers in highly regulated industries, however, may still require the more advanced data governance capabilities provided by specialists like Alation and Collibra, Henschen added.
Meanwhile, of the three new capabilities, Henschen noted that data discovery could prove most useful, calling it "the most fundamental to the broadest base of potential users."
Russell Christopher, Starburst's director of product strategy, highlighted the new data discovery capabilities.
"The discoverability features -- for me, as an administrator at times and a consumer at times -- allow me to go really fast from a very high level down to a very deep granular level of the data," he said.
Christopher noted an important aspect of the new discoverability and governance features is that they reduce the need for third-party tools.
"The value add, as a user, is that I don't have to bounce between five different places to carry out five different tasks," he said. "I'm able to do all of that in one place, regardless of whether I'm a data engineer or if I'm not a data wizard and I'm just a user who wants to know what's there."
Enterprise and AWS
Beyond the data discoverability and governance capabilities for Galaxy, Starburst unveiled an integration between Enterprise and AWS Lake Formation.
AWS Lake Formation is the tech giant's fully managed service used to build and manage data lakes, and the integration aims to better enable joint Starburst and AWS data lake users to develop a data mesh approach to analytics within their organization, according to Christopher.
The integration directly connects Starburst to data stored in AWS data lakes, and enables joint customers to securely work with their data within the data lake rather than requiring them to move their data back and forth between environments.
Data analysts can now query data directly in their AWS data lake while data administrators can put data governance measures on data without extracting it from its data lake.
"This is rounding out our capabilities to do data mesh wherever the data lives, which is challenging," Christopher said.
It's particularly expanding Starburst's security capabilities, he continued.
"We have our own baked-in capabilities to secure data and keep it from prying eyes. But if users and administrators have already taken the time to lock down that data inside AWS, we're able to respect that," Christopher said. "It's another layer of our ability to [secure data]."
Henschen expressed surprise that Starburst didn't already have a deep integration with AWS Lake Formation. But he likes the data mesh approach to data governance, the origin of which is generally credited to Zhamak Dehghani, a consultant at Thoughtworks.
He noted that only a small percentage of organizations have implemented a data mesh. By giving users a more direct connection to metadata, schema and access controls, the integration is designed to better enable data mesh.
"I personally like the mesh concept because it's in sync with the reality of decentralized management and use of data across far-flung enterprises," Henschen said. "Whether organizations broadly embrace the terminology and employ the unifying aspects of mesh for enterprise-level insight and oversight has yet to be seen."
While Starburst now supports AWS Lake Formation, it does not have the same level of support -- enabling in-database data management and governance -- for Azure Data Lake from Microsoft or Google Cloud's BigLake.
According to Christopher, AWS has more data lake customers than the other two cloud computing giants. Starburst will add deeper integrations with their tools as more customers use Azure Data Lake and BigLake for their analytics needs.
"To be successful at data mesh, you need to be able to connect to everything and you need to be able to make everything secure," he said. "Lake Formation is one of those things we know that customers who are really serious about being on data lakes want."
With its latest updates now generally available, Starburst -- which raised $250 million in funding in February 2022 -- will add more functionality to Galaxy, according to Christopher.
The vendor first unveiled the SaaS platform in preview in February 2021. Galaxy is not yet as complete as Enterprise, so adding new capabilities is a priority.
"We're building this thing. And Starburst Enterprise has been out there four or five years, and Galaxy has only been out there for a year and change," he said. "It's just about completing the vision."
In particular, with data discovery and governance capabilities in place, Starburst plans to focus on enabling users to easily turn certain datasets into analytics assets like charts and dashboards. Starburst has built a foundation for analytics and is now ready to add the analysis.
Henschen noted that the introduction of Galaxy in 2021 and now metadata management capabilities in late 2022 represent progress and should help Starburst attract new customers.
When Starburst offered only Enterprise, its platform was usable only by large enterprises with ample financial resources (Starburst does not publicize precise pricing details). Galaxy, however, made Starburst available to smaller companies with a consumption-based pricing model.
"It's still a comparatively small company," Henschen said. "The introduction of the Galaxy service was an important step to make Starburst more accessible and faster and easier to deploy for a broader base of organizations. The addition of metadata management capabilities will also amp up the value offered to a broader base of users within those organizations, so it's a good sign of progress."