
Collibra update targets data quality, lineage and discovery

The data management vendor's Data Intelligence Cloud now includes pushdowns that enable work within Snowflake and Databricks and prebuilt workflows focused on data visibility.

Collibra on Tuesday updated its Data Intelligence Cloud with prebuilt workflows aimed at making data more visible across multiple sources and new capabilities designed to improve data quality.

In addition, the data management vendor unveiled a series of new and improved integrations to increase connectivity with its technology partners.

Based in New York and Brussels, Collibra is a cloud-based vendor whose Data Intelligence Cloud enables customers to automate the data preparation process. Its features include data scoring, which measures data quality to show whether data can be trusted and used for analysis, and data governance capabilities that help organizations comply with regulations.

Competitors include other independent data management vendors, such as Alation and Informatica.

Recent Collibra updates include an integration with Snowflake and the launch of a new tool to measure an organization's data maturity against its peers.

In addition, the vendor revealed last month that its Data Intelligence Cloud became an endorsed application on the SAP Store as part of the vendors' partnership, which began when SAP launched Datasphere in March.

New capabilities

Improved data quality and observability are now areas of emphasis for Collibra, according to Laura Sellers, the vendor's chief product officer.

As a result, Collibra's latest Data Intelligence Cloud update includes new integrations designed to increase connectivity and enable customers to better manage their data.

Among them are new integrations with the following:

  • Databricks' Unity Catalog to provide visibility and understanding of data stored in Databricks.
  • Google Cloud Storage to better enable customers to map and ingest metadata using Collibra.
  • Azure Data Factory to improve understanding of data by automatically generating data lineage.
  • Azure Data Lake Storage to provide users insight into their containers, directories and files within Collibra.

In addition, the update includes new integrations with analytics platforms MicroStrategy, Power BI and Tableau.

Each of the integrations is significant because it gives Collibra users better visibility into their various data systems, which can lead to cost savings, according to Stephen Catanzano, an analyst at TechTarget's Enterprise Strategy Group.

"Intelligence for [users] is the visibility to improve efficiencies and reduce costs," he said. "These new capabilities expand what systems they can connect with. The integration with Google Cloud Storage gets them into the Google fabric, which is significant. The integrations with BI tools are also exciting since they are data and processing intensive and areas where cost savings can be large."

Sellers, meanwhile, noted that data consumers use tools from myriad vendors. To meet their needs, Collibra -- and other vendors -- develop partnerships to build ecosystems for data management and analytics.

"Partnerships are key to us," Sellers said. "There's not a single company that I have seen that has its data in just one cloud. They're dealing with hybrid sources, multiple clouds and on-premises software. So this release is about all things partnerships for us."

Beyond the new integrations, new tools aimed at improving data quality include the general availability of Data Quality Pushdown for Snowflake and the public beta of Data Quality Pushdown for Databricks.


With the tools, joint customers of Collibra and either Snowflake or Databricks can process data directly in their cloud data storage repositories, eliminating the need to move data from Snowflake or Databricks into Collibra for preparation and then back into Snowflake or Databricks.

That saves time, effort and data egress costs, and it increases security by reducing the movement of data. In addition, the Data Quality Pushdown versions include automated anomaly detection capabilities, which further improve efficiency.

Ultimately, however, the primary benefit of the Data Quality Pushdowns is faster time-to-value, according to Sellers.

"It's being able to … securely process data directly where it is. There's no data movement so no need for data egress," she said.

Sellers added that while Data Quality Pushdown for Snowflake is generally available and Data Quality Pushdown for Databricks is in public beta testing, Collibra plans to add similar Data Quality Pushdown tools for other cloud data storage repositories, including Amazon Redshift, Google BigQuery and Microsoft Azure.

"For those cloud players, we want to go as deep as we can so we can support anybody who's in those ecosystems with the full breadth of functionality that exists within our platform," she said.

Also of potential significance to Collibra users are new prebuilt workflows that address data lineage and data discovery.

The workflows are part of Workflow Designer, which is now in public beta. Users can deploy a workflow by clicking a "workflow deploy" button.

The data lineage configuration enables quick access to data lineage information and includes prebuilt integrations. The Collibra Data Marketplace, meanwhile, comes with a self-service interface that makes it easy for users to discover data, according to the vendor.

Next steps

While Collibra's latest update adds visibility across multiple sources and targets data quality, it does not add generative AI capabilities.

In the seven months since OpenAI launched ChatGPT, which marked a significant leap in the capabilities of large language models, numerous data management and analytics vendors have unveiled plans to infuse generative AI throughout their platforms.

The hope for many of the vendors is that generative AI can make data management and analytics tools usable by more than just data experts within organizations. It would accomplish this by eliminating the need to know code and reducing the level of data literacy currently required to work with data.

However, with concerns lingering about the security of generative AI platforms and the accuracy of their output, even the vendors that have unveiled plans to incorporate generative AI have not yet made any generative AI-driven capabilities generally available.

Collibra, meanwhile, does have plans to integrate generative AI once it can securely do so, according to Sellers.

"Generative AI is a huge disruptor and is really exciting," she said. "I really, truly believe it's going to change the user experiences in all software. But it's not an area we were ready to release anything with this update. We are definitely looking into how to leverage it in the product to drive a better experience and more intelligent automation."

Beyond generative AI and more Data Quality Pushdowns, Sellers added that Collibra's roadmap is essentially focused on simplifying data management for users.

Catanzano, meanwhile, said he'd like to see Collibra and other vendors help organizations monitor the cost of AI processing. Most AI processing is done in the cloud, and many organizations are struggling to keep cloud computing costs under control.

"I'd like more focus on AI cost reduction," Catanzano said. "AI processing is costing a fortune. If [Collibra] can monitor and manage costs there, it would be exciting."

Eric Avidon is a senior news writer for TechTarget Editorial and a journalist with more than 25 years of experience. He covers analytics and data management.
