Data-rich organizations turn focus to ethical data mining

As data analytics has increasingly become a core component of organizations' strategies, concerns have arisen around how data is mined. Experts offer tips.

In light of the data breach scandals that have engulfed Facebook, Equifax and others, more companies are starting to have conversations about ethical data mining. While much of that talk remains at a high level, organizations need to get aggressive about training users and formulating best practices if they want to keep their data-driven business operations on the right side of the ethical line.

As the number of cyber breaches has escalated in recent years, companies have significantly stepped up data security efforts in order to mitigate the financial and reputational risks that accompany such high-profile intrusions. At the same time, however, they've paid far less attention to the ethical issues surrounding those incidents, as well as the ongoing usage of big data, including unauthorized data sharing, reselling customer information and creating algorithms that exhibit some sort of bias.

"If you don't spend the time and money that might be necessary to get the ethics right, you could have a horrible public relations nightmare," said Lucy C. Erickson, an American Association for the Advancement of Science Science & Technology Policy Fellow hosted by the National Science Foundation.

The scrutiny around ethical data mining best practices has increased, given how central data and data analytics have become to a company's core business charter. According to a recent TDWI report, 82% of enterprises are prioritizing analytics and BI as part of their technology budgets. Research from NewVantage Partners finds that 91.6% of Fortune 1000 companies are boosting their investments in big data and AI in order to stay agile and competitive.

The ability to store nearly limitless amounts of data, coupled with the enterprise shift to treat data as an asset and practice algorithmic decision making, is driving the current concerns about ethical data mining. "Professions that have traditionally worried about ethics are medicine and law, but as computer science increasingly impacts people's lives in similar ways, we are seeing greater emphasis on ethics in this domain as well," Erickson said. "There is an increasing appreciation for the need for computer programmers to grapple with how they are intersecting with the lives of humans in new ways, specifically in areas like data privacy and algorithmic decision making."

Spirit of the law

Companies can follow the letter of the law and still cross an ethical line with their data mining efforts, cautioned David Thomas, CEO of Evident ID, which provides online identity verification services. There is no consistent legal definition of personal data in the United States, which leaves a great deal of latitude in how data is handled; by comparison, the European Union has closed gaps with the General Data Protection Regulation, Thomas said. Nevertheless, Thomas contends there is only so much clarity regulation can provide, which in turn shifts the burden to the enterprise.

"Companies' duty of care over their data is much broader than the idea of following every law to a perfect T," Thomas said. "There's a difference between following the spirit of the laws and regulations and following the letter of the law. There are so many ways data can be misused, companies have to start thinking about their duty to care about this asset they're entrusted with."

To start with, Thomas said, companies should focus not on collecting as much personal information as possible, but on developing a roadmap for why they're collecting data and how it aligns with their specific business agenda. In addition, governance practices -- such as who enforces data privacy policies and who makes decisions about how to leverage data -- should be codified and made transparent to the entire organization. That need for transparency extends to customers, who should be told exactly how their data will be used.

"Everyone is trying to offer a frictionless user experience; the last thing they want to do is present users with an incredibly detailed audit log of how their data is being used," he said. "Companies are summarizing their usage policies at a high level because it's off-putting to users, but then users have no clue what they are signing up for."

Figure: GDPR definition of personal data. While the United States currently has no consistent legal definition of personal data, the European Union has set parameters with the General Data Protection Regulation.

One way to jump-start the ethical data mining debate is to begin the conversation at the university level, before students enter professional life. Vandana Janeja, an associate professor in the Information Systems Department at the University of Maryland, Baltimore County, has started doing just that in her data science classes. Janeja integrates ethics-related instruction as students learn about data management practices tuned for each phase of the data lifecycle. "Any time students ask questions about data or how to manipulate data, we put an ethical lens on it and encourage ethical critical thinking," she said.

The frenetic pace of technology may end up being the greatest hurdle to sound ethical data mining practices. To do it right, companies need to consider data design and privacy from the beginning of their initiatives, including what subgroups should be considered, who might be affected by the data use and whether potential training data puts algorithms at risk of reflecting and amplifying bias.
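
As a rough illustration of the subgroup question raised above, a team might measure how well each subgroup is represented in a candidate training set before any model is built. The Python sketch below is a minimal example, not a method described by the experts quoted here; the function names, the `age_band` field, the sample records and the 10% threshold are all hypothetical choices made for illustration.

```python
from collections import Counter


def subgroup_shares(records, key):
    """Return each subgroup's share of the records, grouped by the value of `key`."""
    counts = Counter(record[key] for record in records)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def underrepresented(records, key, min_share=0.10):
    """List subgroups whose share of the data falls below `min_share` (assumed threshold)."""
    shares = subgroup_shares(records, key)
    return sorted(group for group, share in shares.items() if share < min_share)


# Hypothetical training records; the field names and values are invented.
training = (
    [{"age_band": "18-34", "approved": 1}] * 12
    + [{"age_band": "35-54", "approved": 1}] * 7
    + [{"age_band": "55+", "approved": 0}] * 1
)

# Subgroups that make up less than 10% of the data are flagged for review.
flagged = underrepresented(training, "age_band", min_share=0.10)
print(flagged)
```

A check like this only surfaces thin representation; deciding whether a flagged subgroup actually puts an algorithm at risk of bias still requires human judgment about who is affected by the data use.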

"All these approaches are meant to speed things up and move forward at this rapid speed," Erickson said. "Yet, oftentimes ethical issues require a slowing down of the process."
