
Analytics: An uneasy balance between data collection and privacy

In the age of GDPR and other privacy regulations, companies must pay close attention to user privacy. Data management tools that employ AI as part of analytics can help them do that.

Advanced analytics, BI and AI are booming and can potentially offer great business benefits, but these technologies are extremely data hungry. Meanwhile, GDPR and other privacy regulations are forcing companies to re-evaluate how much data they collect and what they do with it.

One of those companies is Ruffalo Noel Levitz LLC, an enrollment and fundraising management firm based in Cedar Rapids, Iowa, that works with about 1,900 academic institutions and nonprofits. The company, which touches 240 million prospective students and donors each year, recently decided to invest in a formal data governance program to help ease the tension between data collection and privacy concerns.

"Compliance with data protection and privacy regulations is just one reason in a long list of reasons why," said Alison Burchett, Ruffalo Noel Levitz's associate vice president of product management and data governance.

In particular, the company was looking for a way to keep track of data as it moves through various systems. "We handle data across the student lifecycle, from college recruitment all the way through alumni giving," Burchett said. "Which means we are tracking a single person through multiple data systems over the course of many years."

The biggest regulations that companies like Ruffalo Noel Levitz face today are the European Union's GDPR and the California Consumer Privacy Act, but there are also regulations specific to particular industry verticals or business functions. In addition, many other jurisdictions are in the process of following in Europe's and California's footsteps or are seriously considering it.

But getting a handle on the balance between data collection and privacy isn't just good for compliance. It also has business benefits -- including ones for a company's customer data analytics projects.


Getting such benefits was one of the driving forces behind Ruffalo Noel Levitz's decision to launch a data governance initiative. "One of our biggest challenges is establishing a single 360-degree view of our constituent records," Burchett said.

Deploying data governance tools from Infogix Inc. helped provide Ruffalo Noel Levitz with that 360-degree view. And being able to track data accurately also means that the company can move toward more advanced technologies. "My hope is that we'll be able to leverage machine learning -- at least for data quality purposes -- in the near future," Burchett said.

Privacy and analytics can coexist

Ruffalo Noel Levitz isn't alone in its effort to balance privacy and data collection. Many companies are implementing systems and data governance processes that allow them to comply with new privacy regulations and that will also make collected data more readily available for AI, BI and other analytics platforms, said Paige Bartley, a senior analyst at 451 Research.

Take, for example, GDPR's requirement that companies be able to show customers all the data they've gathered about them. For most companies, that's an onerous task. Meanwhile, analytics tools looking for patterns in customer behavior also need access to all the information a company has on each individual.

"The common requirement in both use cases is to associate all the data in a company to the same customer identity," Bartley said. "They're two sides of the same coin."

It's true that, in the short run, privacy regulations mean companies will have less data to work with. But in the long run, they'll benefit from higher-quality data, according to Bartley.

"It's an opportunity to build trust with customers," she said. "They volunteer more accurate personal data, and there's less obfuscation behavior." For example, people may provide fewer junk email addresses if they can trust companies not to abuse that information.

Making sense of siloed data

Often, the reason it's so hard to get a good view of any particular customer -- and ensure the customer's privacy -- is that the collected data is scattered in data silos on different systems.

"This is the overarching challenge that businesses have," said Anthony Di Bello, vice president of strategic development at OpenText Corp., a Waterloo, Ont.-based vendor that offers the EnCase eDiscovery tool and other information management products. "It's a privacy problem. It's a security problem. It's a risk and legal problem."

Some of the data might be siloed in a customer management database and some in a transactional database. There might be emails to and from customers stored in individual employee mailboxes. There might be files containing customer information in cloud object storage systems. There might be data scraped from social media, product reviews the customer has written, images they've uploaded, conversations with chatbots or recordings of customer support calls. Any of that collected data could potentially hold personally identifiable information that falls under the purview of data privacy regulations.
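Bartley's requirement to "associate all the data in a company to the same customer identity" can be sketched in a few lines. This minimal Python example is not any vendor's implementation. It assumes each record pulled from a silo carries a set of identifiers (an email address, a customer number and so on) and treats two records as the same person whenever they share an identifier after trimming and lowercasing, using a union-find structure. The function name link_records and the record layout are hypothetical.

```python
from collections import defaultdict


def link_records(records):
    """Group records that share any identifier.

    Each record is a dict with a 'source' name and a set of identifier
    strings under 'ids' (hypothetical layout). Returns a sorted list of
    clusters, each a list of record indexes belonging to one person.
    """
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the cluster root, halving the path as we go
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_seen = {}  # normalized identifier -> first record index carrying it
    for idx, rec in enumerate(records):
        for ident in rec["ids"]:
            key = ident.strip().lower()
            if key in first_seen:
                union(first_seen[key], idx)
            else:
                first_seen[key] = idx

    clusters = defaultdict(list)
    for idx in range(len(records)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())


records = [
    {"source": "crm", "ids": {"jane@example.com", "CUST-1001"}},
    {"source": "billing", "ids": {"CUST-1001"}},
    {"source": "support", "ids": {"Jane@Example.com", "ticket-user-77"}},
    {"source": "crm", "ids": {"bob@example.com"}},
]
print(link_records(records))
```

Here the CRM, billing and support records for Jane end up in one cluster even though no single identifier appears in all three. Answering an access request then means gathering every record in a person's cluster, and an analytics job can aggregate over the same cluster. Real identity resolution also has to cope with shared or recycled identifiers, which this sketch ignores.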

Plus, there are the backups of all these systems. There could also be leftover files in applications that are no longer being used, or in random spreadsheets and text documents. And in many cases, the data is inconsistent. Customers move and change phone numbers or email addresses, or just provide information in different formats each time. Different systems use different keys, such as an email address here, a customer number there and a totally different user ID somewhere else. On top of that, the way customer details are entered varies from system to system, and typos creep in.
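A common first step with such inconsistent data is to normalize values before comparing them. The sketch below is illustrative only. It assumes records with name, email and phone fields, treats any bare 10-digit phone number as North American, and uses the standard library's difflib.SequenceMatcher with an arbitrary 0.85 similarity threshold for names. All function names and the matching rule itself are hypothetical.

```python
import re
from difflib import SequenceMatcher


def normalize_email(email):
    # Trim whitespace and lowercase; real systems may apply more rules
    return email.strip().lower()


def normalize_phone(phone, default_country="1"):
    # Keep digits only; ASSUMPTION: a bare 10-digit number is North American
    digits = re.sub(r"\D", "", phone)
    if len(digits) == 10:
        digits = default_country + digits
    return digits


def _clean(name):
    # Lowercase, drop punctuation such as apostrophes, collapse whitespace
    return " ".join(re.sub(r"[^\w\s]", "", name.lower()).split())


def name_similarity(a, b):
    # Ratio in [0, 1]; 1.0 means the cleaned names are identical
    return SequenceMatcher(None, _clean(a), _clean(b)).ratio()


def likely_same_person(rec_a, rec_b, name_threshold=0.85):
    # Hypothetical rule: same email, or same phone plus a similar name
    if normalize_email(rec_a["email"]) == normalize_email(rec_b["email"]):
        return True
    same_phone = normalize_phone(rec_a["phone"]) == normalize_phone(rec_b["phone"])
    return same_phone and name_similarity(rec_a["name"], rec_b["name"]) >= name_threshold
```

With these rules, "(319) 555-0142" and "+1 319-555-0142" normalize to the same digits, and a record for "Katherine O'Neil" matches one for "Katherine ONeil" with the same phone number even though the email addresses differ.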

Many enterprises don't even have a way to search all their different systems, databases, cloud services and employee desktops. And even if they could run, say, a keyword search, the results would be messy and unusable -- like web searching before Google came along. Relevant results may also be missed entirely because they don't contain that specific keyword.
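One way to keep a search from missing records that never mention the keyword is to expand a query for a person into every identifier already linked to them, such as a reference number. This hypothetical sketch assumes that lookup table already exists (for instance, from an identity-linking step) and searches a small in-memory set of documents with whole-word, case-insensitive regular expressions. The function names and data are illustrative.

```python
import re


def build_alias_map(identity_links):
    """identity_links maps a person's name to other identifiers known to
    belong to them (hypothetical: customer numbers, account IDs)."""
    return {name.lower(): {name} | set(aliases) for name, aliases in identity_links.items()}


def search_person(documents, name, alias_map):
    # Search for the name plus every linked identifier, as whole words
    terms = alias_map.get(name.lower(), {name})
    patterns = [re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE) for t in terms]
    hits = [doc_id for doc_id, text in documents.items()
            if any(p.search(text) for p in patterns)]
    return sorted(hits)


documents = {
    "email_17": "Hi Jane Doe, your order shipped.",
    "ticket_3": "Refund issued on account CUST-1001.",
    "invoice_9": "Invoice for customer A-77821",
    "memo_2": "Quarterly planning notes",
}
alias_map = build_alias_map({"Jane Doe": ["CUST-1001", "A-77821"]})
print(search_person(documents, "Jane Doe", alias_map))
```

A plain keyword search for "Jane Doe" would return only the email; expanding the query to her customer and account numbers also surfaces the support ticket and the invoice.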

"Say I'm searching for information about an individual," Di Bello said. "There may be information about that individual that's not identified by name, but by some reference number."

Data governance complications

Pulling all this data into a single data lake could help, but it isn't always an option.

"Some jurisdictions have rules about sending data outside the region," said Margaret Alston, director of consulting at TrustArc, a privacy compliance company based in San Francisco. "Even in a single region, like the EU, there are some country-specific differences."

Also, as part of their data governance policies, companies have to choose between two approaches: follow the most stringent privacy standards for all their collected data and potentially lose out on business opportunities, or create separate systems for the more lenient jurisdictions and deal with the added complexity. And that's just the tip of the iceberg.

Companies may need to get consent in certain locations but not others, or they may face limitations on how much data they can collect. In some regions or industries, there might be additional concerns about how data is used. For example, if a company uses AI to decide whether to give someone a loan, that might run up against financial regulations.
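The trade-off between one strict global standard and per-jurisdiction handling can be sketched as a policy table. The regions and rules below are invented placeholders, not a summary of GDPR, the California Consumer Privacy Act or any other law; the point is only how a "strictest rule wins" merge differs from applying each region's own policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    requires_consent: bool
    may_transfer_out_of_region: bool
    allowed_fields: frozenset


# HYPOTHETICAL placeholder rules, NOT a summary of any real law
POLICIES = {
    "region_a": Policy(True, False, frozenset({"email", "name"})),
    "region_b": Policy(False, True, frozenset({"email", "name", "phone", "purchase_history"})),
}


def strictest(policies):
    """Merge policies so the most restrictive setting wins everywhere."""
    return Policy(
        requires_consent=any(p.requires_consent for p in policies),
        may_transfer_out_of_region=all(p.may_transfer_out_of_region for p in policies),
        allowed_fields=frozenset.intersection(*(p.allowed_fields for p in policies)),
    )


def filter_record(record, policy, has_consent):
    # Drop the whole record without required consent, else keep allowed fields
    if policy.requires_consent and not has_consent:
        return {}
    return {k: v for k, v in record.items() if k in policy.allowed_fields}


global_policy = strictest(POLICIES.values())
record = {"email": "a@example.com", "name": "A. Person", "phone": "555-0100"}
print(filter_record(record, POLICIES["region_b"], has_consent=False))
print(filter_record(record, global_policy, has_consent=True))
```

Applying the merged policy everywhere is simpler but discards the phone field that the more lenient region would allow; keeping per-region policies preserves that data at the cost of maintaining separate rules and systems.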

Finally, if all the data is in one big basket, that makes it a very tempting target. "Think about the data breaches, the risks of what can happen when you put a lot of data in one place," Alston said.


AI may improve data privacy

To help enterprises solve this data collection and privacy problem, various data management vendors are adding AI technologies, such as machine learning algorithms, to their products. This includes Infogix, the vendor that provided data governance software to Ruffalo Noel Levitz.

"We use AI to dynamically detect where information is private and where governance is needed," said Emily Washington, senior vice president of product management at Infogix. "For example, we can identify whether the information is a U.S. Social Security number or an email address or a street address. And we do scoring to detect when information may be of a private nature."

Infogix, which is based in Naperville, Ill., has long had a team dedicated to staying on top of regulations, Washington added. Because its customers span many industry verticals, that means tracking a lot of very specific regulations.
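As a rough illustration of the kind of detection and scoring Washington describes, the sketch below uses hand-written regular expressions rather than machine learning, so it is not Infogix's method. The patterns, the type names and the weights used to compute a privacy score are all assumptions chosen for the example; real Social Security numbers and street addresses appear in more forms than these patterns catch.

```python
import re

# HYPOTHETICAL patterns and weights for illustration only
DETECTORS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "street_address": re.compile(
        r"\b\d{1,5}\s+(?:[A-Z][a-z]+\s+)+(?:St|Street|Ave|Avenue|Rd|Road|Blvd|Lane|Ln)\b"
    ),
}
WEIGHTS = {"us_ssn": 1.0, "email": 0.6, "street_address": 0.5}


def scan(text):
    """Return detected values per type and a privacy score in [0, 1]."""
    found = {name: pat.findall(text) for name, pat in DETECTORS.items()}
    found = {name: hits for name, hits in found.items() if hits}
    # Score is the highest weight among the detected types, 0.0 if none
    score = max((WEIGHTS[name] for name in found), default=0.0)
    return found, score


found, score = scan("Contact jane.doe@example.com, SSN 123-45-6789, at 42 Maple Street.")
print(found, score)
```

Taking the maximum weight means a single Social Security number marks the whole text as highly sensitive, whatever else it contains; a production system would more likely combine pattern matches with context and learned models.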

The company can also help enterprises looking to find data that's stored in different systems in different formats, according to Washington. "If you have a hundred applications and you're checking for something like an email address, being able to know how that email address lives across those hundred applications is difficult," she said. "Machine learning will help in a more dynamic way."
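Finding where a data type lives across many applications is often approached by profiling column contents rather than trusting column names. In this rule-based sketch, a simple stand-in for the machine learning Washington mentions, a column is flagged when at least 80% of its non-empty sample values look like email addresses. The input layout, the threshold and the function name are hypothetical.

```python
import re

EMAIL = re.compile(r"^[\w.+-]+@[\w-]+(?:\.[\w-]+)+$")


def find_email_columns(applications, min_fraction=0.8):
    """applications maps an app name to {column_name: [sample values]}.

    Flags a column when at least min_fraction of its non-empty samples look
    like email addresses, regardless of what the column is called.
    """
    matches = []
    for app, columns in applications.items():
        for column, values in columns.items():
            samples = [v.strip() for v in values if v and v.strip()]
            if not samples:
                continue
            hit_rate = sum(bool(EMAIL.match(v)) for v in samples) / len(samples)
            if hit_rate >= min_fraction:
                matches.append((app, column, round(hit_rate, 2)))
    return sorted(matches)


apps = {
    "crm": {"contact_email": ["a@x.com", "b@y.org", ""], "notes": ["call back", "vip"]},
    "billing": {"login": ["c@z.net", "D@Z.NET", "not-an-email", "e@w.io", "f@v.co"]},
    "support": {"user": ["u123", "u456"]},
}
print(find_email_columns(apps))
```

Here the billing system's "login" column is flagged even though nothing in its name says "email", which is exactly the case that name-based inventories miss.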

Some companies may decide to build AI-powered data management systems from scratch. Usually, though, going with a vendor has some advantages. First, vendors already have connectors built to pull data from the most common database platforms and cloud vendors. Second, vendors will have a set of pretrained models to identify common data types. But working with an outside vendor also creates its own problems, such as having your data in yet another place or risking the exposure of trade secrets.

At data privacy software vendor Integris Software Inc., for example, customers currently use its tools either on premises or in private clouds. "Most of our customers are Fortune 1000 companies and have their own infrastructure," said Raghu Gollamudi, the Seattle-based company's co-founder and CTO. "It's all behind their firewall."

Customers can also decide whether they will share any of the models created from their data with others. Some models, for example, are generic tools that help the system identify something like an email address. Transferring these models to new data sets gives the AI a big head start in learning how to manage the collected data and identify privacy risks.

"But anything that's very specific to a customer or is based on highly sensitive information -- those models we will never use for transfer learning," Gollamudi said.

Be mindful of AI's limitations

Not everyone is on board with applying AI to data collection and privacy efforts, however.

"AI is not yet to the point of hunting down personal data and neatly assigning it to an individual with a high enough degree of certainty to make it feasible," said Kon Leong, president and CEO of ZL Technologies, a data governance, privacy and compliance vendor in Milpitas, Calif. "Because even if it's right 90% of the time, the other 10% of the time it could come back to bite you."

Even when AI works properly as part of the data collection process, companies should be careful about how they use it.

"AI works very well within its limitations, but it is by no means the magic answer to all compliance problems," said Rob Perry, vice president of product marketing at ASG Technologies, an enterprise information management software vendor based in Naples, Fla.

In particular, organizations will need to keep a close eye on AI systems to catch any unintended bias, Perry said. "The human factor is critical to avoiding ethical and regulatory missteps."
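Leong's figures add up quickly at scale: across 1 million records, 90% accuracy still leaves roughly 100,000 misassignments. One common way to keep the human factor in the loop, sketched here with hypothetical names and an arbitrary 0.98 cutoff, is to apply only high-confidence matches automatically and queue the rest for human review.

```python
def expected_errors(n_records, accuracy):
    # Expected number of wrong assignments if every match were applied blindly
    return round(n_records * (1 - accuracy))


def route_matches(matches, auto_threshold=0.98):
    """Split (record_id, person_id, confidence) tuples into matches applied
    automatically and matches queued for human review (hypothetical cutoff)."""
    auto, review = [], []
    for record_id, person_id, confidence in matches:
        target = auto if confidence >= auto_threshold else review
        target.append((record_id, person_id, confidence))
    return auto, review


matches = [("r1", "p1", 0.99), ("r2", "p1", 0.75), ("r3", "p2", 0.98)]
auto, review = route_matches(matches)
print(expected_errors(1_000_000, 0.9), auto, review)
```

Raising the cutoff shrinks the number of automatic mistakes but grows the review queue, so the threshold is a staffing decision as much as a technical one.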
