GDPR and AI: Data collection documentation essential to compliance

It's important to remember that artificial intelligence data and AI algorithms must hold up against GDPR regulations. Here's where GDPR and AI intersect and what CIOs can do to remain compliant.

With the European Union's new General Data Protection Regulation now in effect, it's easy to focus on customer-facing systems that hold and manage the bulk of personal data. But at the same time, CIOs must also consider how their artificial intelligence applications use various types of data and develop strategies to keep these systems compliant.

When it comes to AI compliance under GDPR, AI applications and algorithms can certainly hide potential regulatory problems.

"If any company does business in the EU or with a global reach that collects EU citizen personal data for any reason and uses AI capabilities, they must be respectful with the use of that data, protect it to the best of their ability and adhere to the regulation in full," said Richard Curran, security officer of datacenter group sales at Intel Corp.

Articles 13-15 in the new regulations repeatedly stipulate that individuals, or "data subjects," be able to access information about how their data is processed, including AI-generated data.

With penalties as high as 4% of a company's annual global revenue, CIOs might want to err on the side of caution until the dust settles around what EU regulators consider appropriate -- especially when it comes to GDPR and AI compliance.

A good practice may be to keep humans in the loop where possible: If an automated system driven by machine learning reaches a decision, people should be given an opportunity to intervene and review.
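As a rough illustration of what a human-in-the-loop gate could look like, the sketch below routes an automated decision to a reviewer when the data subject asks for a review or when the model's confidence is low. Everything here is hypothetical: the `Decision` fields, the `decide_with_review` function, the `reviewer` callback and the 0.9 threshold are assumptions for illustration, not anything prescribed by the regulation or by the experts quoted.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Decision:
    subject_id: str
    outcome: str                       # e.g. "approve" or "reject"
    confidence: float                  # model's own score in [0, 1]
    reviewed_by: Optional[str] = None  # stays None for fully automated decisions


def decide_with_review(
    subject_id: str,
    model_outcome: str,
    confidence: float,
    reviewer: Callable[[Decision], str],
    threshold: float = 0.9,            # assumed cutoff, tune per use case
    review_requested: bool = False,
) -> Decision:
    """Return the model's decision, deferring to a human when warranted."""
    decision = Decision(subject_id, model_outcome, confidence)
    # A person steps in if the data subject asked, or the model is unsure.
    if review_requested or confidence < threshold:
        decision.outcome = reviewer(decision)
        decision.reviewed_by = "human"
    return decision
```

In this sketch, the reviewer's answer replaces the model's outcome, and the `reviewed_by` field leaves a trace of which decisions a person actually looked at.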

"This may be necessary, even if at the expense of a slight loss of efficiency relative to a completely automated decision system," said Venkat Rangan, CTO and co-founder of Clari Inc., a developer of AI-driven sales management software.

It's also important to remember that the data used in AI algorithms must be acquired consensually. For a GDPR and AI marriage to be successful, it's important to provide easily understandable examples of what the AI algorithm is trying to determine and to present them before opt-in. The regulation states that the subject who is impacted by the automated decision needs to have enough information about what was collected -- and how it was used -- to be able to make an informed decision about whether or not to opt out, Rangan said.

The nonbinding language of Recital 71 provides more clarity about these safeguards by requiring systems to provide enough information for the data subject to intervene in the process, to get an explanation of how the data is used and to challenge these data processes if they want.

"It is important to explain how AI models are trained and built, what features they are leveraging and what their accuracy levels are, and present an option to opt out of such processing," Rangan said.

Complex documentation required

The requirements in the GDPR framework sound specific: Any organization processing an individual's personal data must be able to provide them with meaningful information about how their data is used.

But it's important to remember that GDPR is a legal framework, and, like all legal frameworks, is subject to interpretation.

"As anyone who's been in a legal situation and/or sued can tell you, it's better to over document and over communicate than it is to come up short," said Joel Vincent, chief marketing officer at Zededa, an IoT software service.

Providing documentation on direct matching of data based on name, email address or other identifiers unique to an individual in a traditional system is straightforward. However, in a system using AI and machine learning for probabilistic matching, identity resolution, customer profile generation and behavioral activity attribution, things become complex very quickly, said Derek Slager, CTO and co-founder of Amperity Inc., a customer data platform that uses machine learning for customer segmenting.

Slager said it is important to document the technical details of your AI data processing to ensure that the processing can continue. That documentation can include:

  • References to white papers by the chief data scientist;
  • An explanation of the complex nature of how clustering occurs;
  • Layman's explanations of how the data processing pipeline fits together; and
  • Customer service training on how to educate people enough to make informed decisions about how their data is processed.

Strong identity solutions should be in place with evidence showing how the user ID generating the data for analysis is tracked from generation of that data to its storage. Data signatures are important and there needs to be reasonable proof that, if asked, you can identify the user and prove that they have opted into the collection and usage of data.
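One possible reading of "data signatures" is a tamper-evident consent record: the user ID, the purpose and the time of opt-in are signed so that the record can later be checked for alteration. The sketch below uses an HMAC from Python's standard library. The field names, the `sign_consent` and `verify_consent` helpers and the hard-coded key are assumptions for illustration; a real deployment would keep the key in a secrets manager.

```python
import hashlib
import hmac
import json

# Assumption: in practice this key would come from a secrets manager.
SECRET_KEY = b"replace-with-a-managed-secret"


def sign_consent(user_id: str, purpose: str, granted_at: str,
                 key: bytes = SECRET_KEY) -> dict:
    """Build a consent record and attach an HMAC-SHA256 signature."""
    record = {"user_id": user_id, "purpose": purpose, "granted_at": granted_at}
    # sort_keys gives a stable byte representation to sign.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record["signature"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return record


def verify_consent(record: dict, key: bytes = SECRET_KEY) -> bool:
    """Check that the record has not been altered since it was signed."""
    unsigned = {k: v for k, v in record.items() if k != "signature"}
    payload = json.dumps(unsigned, sort_keys=True).encode("utf-8")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record.get("signature", ""))
```

A signature like this shows only that a stored record is unchanged. It does not by itself show that the person actually opted in, so it would sit alongside the identity tracking described above.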

Industry experts acknowledge it will be difficult for CIOs to create perfect systems for tracking data usage in AI. In the beginning, at least, it's more important to focus on providing evidence that you made reasonable efforts to implement technology to do this. If someone files a complaint and a company cannot prove they have the necessary permissions, any judgment against the enterprise will likely be based on whether the company implemented reasonable measures or was negligent and reckless with a person's data, said Zededa's Vincent.

Data storage, deletion strategies

Data scientists are constantly crafting better AI algorithms using data that did not previously generate value. The problem for CIOs lies in finding a balance between getting rid of old data that could be a liability and managing it in a way that allows the development of better AI algorithms.

One strategy to ensure GDPR and AI compliance might be to anonymize data that provides no immediate value.
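A minimal sketch of such a policy, assuming each record carries a `last_used` timestamp: records older than a retention window either lose their direct identifiers or are dropped entirely. The one-year window, the `PERSONAL_FIELDS` list and the `apply_retention` function are hypothetical. Note that stripping names and email addresses alone may not meet the GDPR's bar for anonymization if the remaining fields can still single out a person.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=365)       # assumed retention window
PERSONAL_FIELDS = ("name", "email")   # assumed direct identifiers


def apply_retention(records, now, retention=RETENTION, anonymize=True):
    """Keep fresh records; strip identifiers from stale ones, or drop them."""
    kept = []
    for rec in records:
        if now - rec["last_used"] <= retention:
            kept.append(rec)
        elif anonymize:
            # Stale record: keep non-identifying fields only.
            kept.append({k: v for k, v in rec.items()
                         if k not in PERSONAL_FIELDS})
        # Otherwise the stale record is purged by leaving it out.
    return kept


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [{"name": "A", "email": "a@example.com", "segment": "x",
               "last_used": now - timedelta(days=400)}]
    print(apply_retention(sample, now))
```

Passing `anonymize=False` switches the same routine from stripping identifiers to outright deletion.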

"If data is no longer relevant, it should be purged or anonymized after a certain time interval," said Sultan Saidov, co-founder and CPO at Beamery, a London-based company that built a CRM for recruiting with algorithms for segmenting and targeting.

Other experts suggest CIOs take a more aggressive stance in purging unused data.

"The company using AI can only provide the service using the minimum information for that specific purpose with an EU citizen's consent," Intel's Curran said. "Once the AI service has completed for any number of reasons, all data should be deleted, and proof should be provided upon request or audit."

CIOs will likely focus on the customer-facing side of data usage for traditional web apps and for data stored in CRM and ERP systems. It's also a good idea to keep an eye on other systems that may store personal data, such as human resources and IoT systems.

"If you are using algorithms to rank job candidates, this needs to be made clear when they apply," Saidov said.
