Enterprises are pressed by the growing volume and variety of data on one side and the rise in data privacy regulation and hacking on the other. Traditional approaches to data privacy -- a blend of manual, human-centric processes and rules-based systems for identifying, managing and protecting sensitive data containing personally identifiable information (PII) -- often struggle to scale.
AI data privacy complements these efforts by automating the discovery and classification of PII with adaptable AI models that work across more data types. Gartner predicts that 40% of privacy compliance technology will rely on AI by 2023.
"Enterprises are using AI tools because of the sheer volume of data," said Mike Kiser, senior security strategist and evangelist in the office of the CTO at SailPoint, an identity management company. "Without [the] intelligent automation that AI can provide, attempting to protect data privacy would be like trying to stop the flood of a low-lying coastal town with a pickup truck full of sandbags."
Enterprises are also starting to use more types of unstructured data to deliver highly personalized experiences.
"Teams have to work with a now-broader data set of live private data earlier in the design process," said Jenai Marinkovic, virtual CISO at Tiro Security, a cybersecurity staffing firm.
A rich toolset of AI-based systems that can detect personal data hidden inside these different data types is critical.
Benefits of AI data privacy
"The main benefit of applying AI and machine learning to privacy problems is that an effective solution can be created to handle situations that are not very clear-cut and cannot otherwise be solved easily," said Manmeet Singh, CEO and co-founder of Dataguise, a data privacy compliance software provider.
This is particularly important for managing sensitive data, such as transferring healthcare records between providers without any HIPAA violations.
"Using AI for the managing and processing of sensitive data can eliminate human errors, like accidentally publishing information or sending it to the wrong location," said Darren Deslatte, vulnerability operations leader at Entrust Solutions, a systems integrator.
AI is also good at spotting trends and patterns that a human would fail to detect, such as a sophisticated and randomized exfiltration attempt, said Ani Chaudhuri, CEO and co-founder of Dasera, a data protection and governance platform. Identifying these kinds of problems manually would otherwise require painstaking effort.
Another key benefit of AI data privacy is reduced time to safe data, said Matthias Meier, product manager at Privitar, an enterprise data privacy vendor. This can be a key success factor for organizations that aspire to use and democratize their data safely and efficiently. AI can remove bottlenecks, automatically enforce standardized privacy practices at scale and reduce the risk of errors that can result from ad hoc human processes.
Challenges of AI data privacy
A key challenge lies in training AI data privacy tools to understand your industry, your company's business model and the regulations in your area.
"There may be AI models that are able to understand and meet the data privacy protection needs of famous regulations like GDPR or the California Consumer Privacy Act, but if you take the same models to the healthcare vertical and expect them to be effective out of the box on HIPAA, it's going to be a challenge," Chaudhuri said. You need time and sufficient data points to train every AI model, and the same is true for AI data privacy products.
Another concern is that setting AI models loose on sensitive data could also create another attack surface for hackers to breach your systems, Deslatte said. Attackers might find ways to exfiltrate data out of the AI algorithms or the way these algorithms interact with the data platform.
Here are some of the top use cases for AI data privacy.
Identifying and protecting PII
The biggest use case lies in using AI to identify fields in data sets and documents that include PII by sampling the data.
"When companies have hundreds of databases in the cloud and on premises, identifying what needs to be protected is not a simple task," said Eliano Marques, executive vice president of data and AI at Protegrity, a data security software provider. This is especially true for semi-structured data, documents and images, which are spread out across complex data ecosystems.
Enterprises should consider a real-time and batch mode strategy for data, Marques said. The real-time strategy can classify sensitive data and apply a type of protection -- such as tokenization, where sensitive data is replaced by a unique identifier. A batch mode strategy might be kicked off after an individual asks to be forgotten, a requirement specified under privacy regulations such as GDPR. In this scenario, AI could help find unstructured data relating to a person down to the level of a PDF contract document.
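The real-time protection flow can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the regex-based email detector stands in for an AI classifier, and `TokenVault` is a hypothetical in-memory store rather than a hardened token vault.

```python
import re
import secrets

class TokenVault:
    """Hypothetical in-memory vault mapping sensitive values to tokens."""

    def __init__(self):
        self._forward = {}   # original value -> token
        self._reverse = {}   # token -> original value

    def tokenize(self, value: str) -> str:
        # Reuse the same token for repeated values so joins still work.
        if value not in self._forward:
            token = "tok_" + secrets.token_hex(8)
            self._forward[value] = token
            self._reverse[token] = value
        return self._forward[value]

    def detokenize(self, token: str) -> str:
        return self._reverse[token]

# A simple regex stands in for the AI model that flags sensitive fields.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def protect_record(record: dict, vault: TokenVault) -> dict:
    """Replace values classified as sensitive with opaque tokens."""
    protected = {}
    for field, value in record.items():
        if isinstance(value, str) and EMAIL_RE.fullmatch(value):
            protected[field] = vault.tokenize(value)
        else:
            protected[field] = value
    return protected

vault = TokenVault()
rec = {"name": "A. Smith", "contact": "a.smith@example.com", "plan": "gold"}
safe = protect_record(rec, vault)
```

Downstream systems can work with the tokenized record, while only an authorized service holding the vault can recover the original value.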
Discovering personal data in ambiguous conditions
AI data privacy can also automate the discovery of personal data and identities under ambiguous conditions. This is important for finding personal data in unstructured documents where the meaning of a value is ambiguous, Singh said. An example is finding the word April in a sentence. Without additional context, it would not be possible to say whether April refers to a person's first name or the month. AI and machine learning can disambiguate based on the surrounding context in this case.
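The April example can be sketched with a toy context scorer. A real system would use a trained named-entity-recognition model; the cue-word lists here are invented purely for illustration.

```python
# Words near a token that hint it is a person's name vs. a month.
# These lists are illustrative stand-ins for learned context features.
NAME_CUES = {"ms", "mr", "mrs", "dr", "said", "met", "dear"}
MONTH_CUES = {"in", "during", "early", "late", "by", "until"}

def classify_april(sentence: str) -> str:
    """Decide from nearby words whether 'April' is a person or a month."""
    words = sentence.lower().replace(",", "").replace(".", "").split()
    if "april" not in words:
        return "absent"
    i = words.index("april")
    # Look at a small window of surrounding words.
    context = set(words[max(0, i - 2):i] + words[i + 1:i + 3])
    name_score = len(context & NAME_CUES)
    month_score = len(context & MONTH_CUES)
    if name_score > month_score:
        return "person"
    if month_score > name_score:
        return "month"
    return "ambiguous"
```

A trained model replaces the hand-built cue lists with features learned from labeled text, but the principle is the same: the classification comes from context, not the token itself.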
Training models with federated learning
New AI techniques like federated learning allow data scientists to build models on data from multiple parties without sharing the actual data, Marques said. For example, there may be two hospitals that want to build an algorithm for early detection of a disease but don't want their individual patient details to leave the premises. Both hospitals could benefit from more accurate models without violating HIPAA regulations using federated learning.
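The core of the federated idea can be sketched with federated averaging on a toy linear model. The hospital data sets below are made up; the point is that each party trains locally and only the learned weights -- never the patient records -- are exchanged and averaged.

```python
def train_local(data, epochs=200, lr=0.01):
    """Fit y = w*x + b on one party's local data via gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b

def federated_average(models):
    """Average model weights from all parties (federated averaging)."""
    ws, bs = zip(*models)
    return sum(ws) / len(ws), sum(bs) / len(bs)

# Illustrative local data; in practice this never leaves each hospital.
hospital_a = [(1.0, 2.1), (2.0, 4.0), (3.0, 6.2)]
hospital_b = [(1.5, 3.1), (2.5, 4.9), (3.5, 7.0)]

global_w, global_b = federated_average(
    [train_local(hospital_a), train_local(hospital_b)]
)
```

Production frameworks add secure aggregation and differential privacy on top of this loop, since raw weights can still leak information about the training data.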
Identifying anomalous behaviors
AI can improve the monitoring of access to sensitive data. This would involve training models to recognize normal behaviors and flag anomalies. Determining whether a user's behavior needs to raise an alert is dependent on several factors, including historical behavior, the behavior of the group the user belongs to and the types of data accessed. An AI or machine learning mechanism will be a lot more effective than any rules-based approach for raising alerts on these anomalies without generating a lot of false positives, Singh said.
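The baseline-and-anomaly approach can be sketched as a simple per-user statistical check. Real systems model many more signals (peer-group behavior, data types accessed); the daily access counts and threshold here are illustrative.

```python
import statistics

def build_baseline(history):
    """Learn a user's normal behavior as mean and spread of daily accesses."""
    return statistics.mean(history), statistics.stdev(history)

def is_anomalous(count, baseline, threshold=3.0):
    """Flag a day whose access count deviates too far from the baseline."""
    mean, stdev = baseline
    if stdev == 0:
        return count != mean
    return abs(count - mean) / stdev > threshold

# Illustrative week of sensitive-record accesses for one user.
history = [12, 9, 11, 10, 13, 8, 11]
baseline = build_baseline(history)
```

A sudden jump to hundreds of accesses trips the check, while ordinary day-to-day variation does not -- which is how the model avoids the false positives a coarse rules-based cap would generate.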
Automating privacy requests
AI could also automate and streamline interactions with users relating to privacy requests, such as asking what personal information an organization holds on them -- the right to know -- or requesting deletion of that information -- the right to be forgotten.
"The volume of these requests is only going to rise as the general public places a higher value on their privacy," Kiser said. In these cases, AI could be implemented into a conversational interface on the front end connected to various data inventorying capabilities on the back end.
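The back-end half of that architecture can be sketched as a request router. The intent labels and in-memory inventory below are hypothetical; in practice the conversational front end would classify the user's message into an intent, and the handlers would call real data inventory services.

```python
def handle_right_to_know(user_id, inventory):
    """Report which categories of data are held on a user."""
    return {"user": user_id, "data_held": sorted(inventory.get(user_id, []))}

def handle_right_to_be_forgotten(user_id, inventory):
    """Erase a user's data and report what was removed."""
    removed = inventory.pop(user_id, [])
    return {"user": user_id, "deleted_items": len(removed)}

HANDLERS = {
    "right_to_know": handle_right_to_know,
    "right_to_be_forgotten": handle_right_to_be_forgotten,
}

def route_request(intent, user_id, inventory):
    """Dispatch a classified privacy request to the matching handler."""
    handler = HANDLERS.get(intent)
    if handler is None:
        raise ValueError(f"unsupported request type: {intent}")
    return handler(user_id, inventory)

# Hypothetical data inventory keyed by user ID.
inventory = {"u42": ["email", "purchase_history", "shipping_address"]}
report = route_request("right_to_know", "u42", inventory)
erasure = route_request("right_to_be_forgotten", "u42", inventory)
```

Keeping the handlers behind a single dispatch table makes it straightforward to add new request types as regulations expand.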
AI could also help consumers understand what various data protection policies mean in practice, said Peter Cassat, a partner at Culhane Meadows, a corporate law firm. He said the regulatory community has focused on building in consumer choice and the right to opt out. This means describing what information is being collected, how it is being used and what choices the customer has.
As organizations find more creative ways to use data, they also need to find ways to break down and describe the algorithmic decision-making that is occurring, the data that is fueling it and what is done with the outputs.
"If done effectively, that enhances individual privacy by allowing intentional information flow," Cassat said.
Organizations are also starting to use AI data privacy tools to identify data breaches and notify consumers, said Jakub Kobeldys, lead developer at VAIOT, an intelligent contract tools provider. For example, Google now embeds password breach notifications into Chrome. These tools crawl the dark web for hacked credentials and warn users that their data might have been compromised on a website they have used.
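The credential-checking idea behind such warnings can be sketched as a hashed lookup, so the checker never handles or stores plaintext passwords. The breached set below is made up; real services compare against large corpora of leaked credential hashes.

```python
import hashlib

def sha1_hex(password: str) -> str:
    """Hash a password so comparisons never touch the plaintext."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()

# Illustrative stand-in for a corpus of hashes from known breaches.
BREACHED_HASHES = {sha1_hex("password123"), sha1_hex("letmein")}

def is_breached(password: str) -> bool:
    """Warn if the password's hash appears in the breached corpus."""
    return sha1_hex(password) in BREACHED_HASHES
```

Real implementations go further, using range queries over hash prefixes so the full hash of a user's password never leaves the device either.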