Data analytics workloads that run on the Databricks Unified Data Analytics Platform can now benefit from automated data governance capabilities from Immuta.
Databricks is one of the leading contributors to Apache Spark, the open source data query engine that serves as the foundation of its data analytics platform. Immuta's automated data governance software provides security controls that help organizations manage personally identifiable information.
Cognoa's data science team uses Databricks as a distributed computing platform for all of its computationally intensive machine learning tasks, Chief AI Officer Halim Abbas said.
"As a digital behavioral health company, data privacy and security are at the core of what we do," Abbas said. "[But] our legacy practices were extremely time- and labor-intensive."
Cognoa needed to provide its data scientists with data to build models, while removing sensitive information. However, this involved numerous steps, including complex data engineering, manual policy enforcement and labor-intensive reporting.
Cognoa's previous, convoluted approach to data security created friction between compliance officers and machine learning engineers. The former needed to ensure the company protected end users' privacy in accordance with healthcare data laws, while the latter wanted data faster, Abbas added.
"We needed to expedite our data processing, while also finding a way to dynamically anonymize sensitive information for reporting," he said. "We therefore required a solution that could help us enforce data access roles, permissions and policies beyond the standard resource- or table-based control levels."
Immuta competes with fellow data governance startups such as Okera, as well as offerings from Informatica and other large vendors. With Immuta's platform, Cognoa is now able to apply the appropriate restrictions to data and enforce data access and policy restrictions in real time, based on the needs of its data scientists, according to Abbas.
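To make the idea of control below the table level concrete, the following is a minimal Python sketch of role-based column masking of the kind Abbas describes. It is not Immuta's or Databricks' API. The policy table, role names, column names and helper functions are all hypothetical, chosen only for illustration.

```python
import hashlib

# Hypothetical policy: column name -> roles allowed to read the raw value.
# Columns missing from the policy are masked for every role (deny by default).
POLICY = {
    "patient_name": {"compliance_officer"},
    "date_of_birth": {"compliance_officer"},
    "assessment_score": {"compliance_officer", "data_scientist"},
}


def mask(value):
    """Replace a value with a short, stable token derived from SHA-256."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:12]


def apply_policy(row, role):
    """Return a copy of the row in which columns the role may not read are masked."""
    return {
        col: val if role in POLICY.get(col, set()) else mask(val)
        for col, val in row.items()
    }


# Illustrative record; the values are made up.
row = {
    "patient_name": "Jane Doe",
    "date_of_birth": "2015-03-01",
    "assessment_score": 0.82,
}

scientist_view = apply_policy(row, "data_scientist")
officer_view = apply_policy(row, "compliance_officer")
```

In this sketch the data scientist still sees the assessment score needed for modeling, while the name and birth date arrive as opaque tokens. An unsalted hash is used only to keep the example short; it would not be adequate anonymization for real health data.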
Governance critical for data analytics
Organizations that run data management and analytics workloads with Apache Spark and Databricks face common challenges, such as managing fine-grained access controls at scale, said Steven Touw, co-founder and CTO of Immuta.
To comply with data protection laws such as CCPA, GDPR and HIPAA, they also need detailed audit logs of all data-level access that show who accessed what data, when and for what purpose.
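As a rough illustration of what such an audit record might capture, the Python sketch below logs the who, what, when and why of a single data access. The field names, the `record_access` helper and the sample values are assumptions made for this example and do not reflect Immuta's actual log format.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AccessEvent:
    """One data-level access: who read what, when, and for what purpose."""
    user: str
    dataset: str
    columns: tuple
    purpose: str
    accessed_at: str  # ISO 8601 timestamp in UTC


def record_access(log, user, dataset, columns, purpose):
    """Append a JSON-encoded access event to the log and return the event."""
    event = AccessEvent(
        user=user,
        dataset=dataset,
        columns=tuple(columns),
        purpose=purpose,
        accessed_at=datetime.now(timezone.utc).isoformat(),
    )
    log.append(json.dumps(asdict(event), sort_keys=True))
    return event


# Hypothetical usage: an in-memory list stands in for a real audit store.
audit_log = []
event = record_access(
    audit_log,
    user="data_scientist_01",
    dataset="assessments",
    columns=["assessment_score"],
    purpose="model training",
)
```

A production system would write these events to tamper-resistant storage rather than a Python list, but the record shape is the part that matters for answering a regulator's questions.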
Immuta's platform can enforce policies either through a proxy or directly in the database engine, and Immuta for Databricks takes the latter approach. As a result, users can clearly see what controls are being applied to a given data set inside Databricks.
The platform can also identify and catalog sensitive information in Databricks tables and provides a simplified policy builder for data platform engineers to help them create policies that are understandable by nontechnical users.
In addition, Immuta for Databricks includes the ability to create secure data collaboration zones, where users with different permissions can read and write data sets without risk of a data leak within a Databricks cluster.