
Data: Its value and its consequences

Red Hat

Data is in a funny place these days.

On the one hand, it’s at the core of many of the hottest trends.

IoT? That’s often about connecting information technology to physical operational technology through data flows.

Artificial intelligence? As practiced today, it’s mostly focused on machine learning, which depends on huge training sets to work its (apparent) magic.

On the other hand, the storing of personal data is under increasing scrutiny. Current attention is mostly focused on the European Union’s GDPR. But it’s reasonable to assume the retention and use of personally identifiable information will become subject to more and more rules over time.

This presents something of a conundrum.

At the MIT Sloan CIO Symposium in May, Elisabeth Reynolds, the executive director of the Work of the Future Task Force, observed that “the regulatory framework is often opposed to sharing data for the greater good.”

The anonymization challenge

For example, data about vehicle traffic, electricity usage or personal health metrics could potentially be aggregated and studied to reduce congestion, optimize power generation or better understand the effectiveness of medical treatments.

However, the more fine-grained and comprehensive the data set, the harder it is to truly anonymize. The difficulty of effective anonymization is well known. Either your search history or your car’s GPS tracks would probably leave little doubt about who you are or where you live.
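To make the granularity point concrete, here is a minimal sketch in Python. The records, the column choices and the `smallest_group` helper are all invented for illustration; nothing here comes from a real data set. It counts how many records share the same combination of attributes, and reports the size of the smallest such group (the "k" in what is often called k-anonymity). A group of one means someone stands alone.

```python
from collections import Counter

# Hypothetical records: (home ZIP code, birth year, commute start hour).
# Every value is made up for illustration.
RECORDS = [
    ("02139", 1975, 7),
    ("02139", 1975, 8),
    ("02139", 1982, 7),
    ("02139", 1982, 7),
    ("02140", 1975, 8),
    ("02140", 1975, 9),
    ("02140", 1975, 9),
    ("02140", 1975, 8),
]


def smallest_group(records, columns):
    """Return the size of the smallest group of records that share
    identical values in the given column positions."""
    groups = Counter(tuple(rec[i] for i in columns) for rec in records)
    return min(groups.values())


if __name__ == "__main__":
    # Each step releases one more attribute per record.
    for label, columns in [
        ("ZIP only", (0,)),
        ("ZIP + birth year", (0, 1)),
        ("ZIP + birth year + commute hour", (0, 1, 2)),
    ]:
        print(f"{label}: smallest group = {smallest_group(RECORDS, columns)}")
```

With these made-up rows, the smallest group shrinks from 4 to 2 to 1 as each attribute is added. The data set did not get any bigger; it only got more detailed, and that alone was enough to single someone out.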

The problem is only compounded as you start mixing together data from different sources. Gaining insights from multiple data streams, including public data sets, is one of the promises of both IoT and AI. Yet, the same unexpected discoveries made possible by swirling enough data together also make it hard to truly mask identities.
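A rough sketch of how mixing sources can unmask people, again in Python and again with entirely made-up data: one list stands in for a "de-identified" health data set with names removed, the other for some public roll that includes names. The field names and the `reidentify` helper are assumptions for the sake of the example, not a description of any real system.

```python
# Hypothetical "anonymized" records: names stripped, other attributes kept.
HEALTH = [
    {"zip": "02139", "birth_year": 1975, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1982, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "02140", "birth_year": 1975, "sex": "M", "diagnosis": "hypertension"},
    {"zip": "02140", "birth_year": 1975, "sex": "M", "diagnosis": "asthma"},
]

# Hypothetical public data set that carries names alongside the same attributes.
PUBLIC = [
    {"name": "Alice Example", "zip": "02139", "birth_year": 1975, "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "birth_year": 1982, "sex": "M"},
    {"name": "Carl Example", "zip": "02140", "birth_year": 1975, "sex": "M"},
]

# Attributes the two data sets happen to share (an assumption of this sketch).
KEYS = ("zip", "birth_year", "sex")


def reidentify(anonymized, public, keys=KEYS):
    """Attach a name to a record whenever the shared attributes
    match exactly one anonymized record."""
    by_key = {}
    for rec in anonymized:
        by_key.setdefault(tuple(rec[k] for k in keys), []).append(rec)

    matches = {}
    for person in public:
        candidates = by_key.get(tuple(person[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match gives the person away
            matches[person["name"]] = candidates[0]["diagnosis"]
    return matches


if __name__ == "__main__":
    for name, diagnosis in reidentify(HEALTH, PUBLIC).items():
        print(f"{name}: {diagnosis}")
```

In this toy case, two of the three named people are linked to a diagnosis even though the health records never contained a name. The third escapes only because two records happen to share his attributes, which is exactly the protection that disappears as more data streams are joined.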

Is the data useful?

Conversely, swirling together lots and lots of data doesn’t always produce insights that help you improve outcomes at the other end.

This also isn’t a novel result.

The last time the IT industry went through an infatuation with data was the 1990s, when data warehousing was the fad du jour. One common problem was that even reams of data often didn’t lead to novel or otherwise non-obvious observations. We sell more snow shovels during snowstorms? You don’t say.

Furthermore, even unexpected correlations often don’t lead to useful actions. A popular fable of the data warehousing era involved a drug store chain that discovered that men swinging by to pick up diapers on the way home on Friday would often grab a six-pack of beer at the same time. The story apparently has a basis in reality; it stemmed from a study NCR did for Osco. But it never led to shelves being rearranged to further encourage the observed behavior.

Reynolds wondered if we’re seeing a similar pattern of inaction with smart cities, the concept that cities which are instrumented in various ways can be optimized based on that information. “Smart city 1.0 was ‘we have all this data and it’s great.’ But to what end?” Reynolds asked. She raised the specific example of Toronto, where Google has wired up a couple of blocks with sensors that can tell us what’s happening in the area. “But does the city have resources to do good with that?”

Rhyming with the past

There’s a certain familiarity to both the opportunities and the challenges associated with using data today. The details differ from those of the past. And the scale of what’s possible arguably acts as a magnifier for both the good and the bad.

But, at some level, we’re still wrestling with many of the same basic issues. There’s a tradeoff between the utility and the anonymity of data at scale. And knowing things isn’t the same as doing something about them.

All IoT Agenda network contributors are responsible for the content and accuracy of their posts. Opinions are of the writers and do not necessarily convey the thoughts of IoT Agenda.