What to know about dark data management

With machine learning, dark data has many use cases. But before organizations can even think to use it, they must tackle several management tasks.

We all fill out online forms, dialog box after dialog box. We are so used to it that we no longer notice or pay attention to what happens to that data, assuming marketing will use it to target more ads in support of the company's products. For the most part that is true, but it's not the entire story.

Have you ever noticed that those forms often collect unrelated data, or seem to go on forever? Companies might only get one chance to get input from a customer, so they make that opportunity count. They ask more questions than they might need, all in hopes of using that data in the future. Dark data exists everywhere -- companies are flooded with it, but are unsure what to do with it or what its benefits and risks are.

Benefits of dark data

The initial benefit of dark data is that it can be used in the future. A few years ago, that was more of an idea than a regular occurrence. But as machine learning has grown in both scale and scope, it has become easier to take unstructured data and turn it into something usable.

This isn't just because of compute power, although that does help. Machine learning filters and pulls in unstructured data, works with it and turns it into a structured format with value to the IT organization. This can be done by scraping existing data values for key phrases and fitting the data into structured subsets on which machine learning can report.
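To make the key-phrase idea concrete, here is a minimal sketch in Python. It is a rule-based stand-in, not a machine learning model: it scans free-text comments for phrases and files each comment under the topics it mentions. The topic names, the phrases in KEY_PHRASES and the sample comments are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical topics and key phrases; an analyst would pick their own.
KEY_PHRASES = {
    "pricing": ["too expensive", "price", "discount"],
    "support": ["help desk", "support", "no response"],
    "shipping": ["late delivery", "shipping", "arrived"],
}


def tag_record(text):
    """Return the sorted list of topics whose key phrases appear in text."""
    lowered = text.lower()
    topics = []
    for topic, phrases in KEY_PHRASES.items():
        # Match whole words or phrases only, so "price" does not hit "priceless".
        if any(re.search(r"\b" + re.escape(p) + r"\b", lowered) for p in phrases):
            topics.append(topic)
    return sorted(topics)


def structure(comments):
    """Turn free-text comments into structured rows plus a count per topic."""
    rows = [
        {"id": i, "text": c, "topics": tag_record(c)}
        for i, c in enumerate(comments)
    ]
    counts = Counter(t for row in rows for t in row["topics"])
    return rows, counts


# Hypothetical unstructured input, such as a free-text form field.
comments = [
    "The price was fine but shipping took three weeks.",
    "Called support twice, no response.",
    "Great product!",
]
rows, counts = structure(comments)
for row in rows:
    print(row["id"], row["topics"])
```

A real pipeline would likely swap the fixed phrase list for a trained classifier or entity extractor, but the output shape is the same: structured rows that can be counted, filtered and reported on.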

The key is that no company has all the answers when it comes to operations, sales and marketing. New questions and trends arise, and companies can now go into existing data sources to answer key questions and gain an advantage.

While the machine learning aspect does cost money and effort, the data already exists. Ensure you're asking the right questions, because timing is important -- companies don't have the lead time to re-poll data sources to answer every question.


Data collected with an initial purpose isn't dark data. A lot of dark data is captured in some other process and then set aside. How long it has been set aside is critical.

Year-old data on the housing market or health care trends might still be valid data. But if you put the same time frame on holiday sales, the data will have little to no value, because the market or needs have changed too much. While the age of data has always been a struggle, it's more critical with dark data because it was not the original focus. This doesn't mean it has no value, but that it must be put into context to avoid making mistakes with it.
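One way to put age into context is to give each category of data its own shelf life and check records against it before use. The sketch below assumes hypothetical shelf lives; the category names and day counts are not from any standard and would need to be set by the people who know each data set.

```python
from datetime import date

# Hypothetical shelf lives in days, for illustration only.
SHELF_LIFE_DAYS = {
    "housing_market": 730,
    "health_care_trends": 730,
    "holiday_sales": 90,
}


def is_stale(category, collected_on, today):
    """Return True if the data is older than the shelf life of its category."""
    age_days = (today - collected_on).days
    return age_days > SHELF_LIFE_DAYS[category]


# The same one-year age gives different answers per category.
today = date(2021, 6, 1)
collected_on = date(2020, 6, 1)
for category in SHELF_LIFE_DAYS:
    print(category, "stale" if is_stale(category, collected_on, today) else "usable")
```

A check like this does not say the data is worthless, only that it needs a closer look before anyone draws conclusions from it.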

Dark data can reveal a lot about a customer base, depending on how you look at it and what key information is pulled. Real estate companies might see little value in knowing their customers' education levels, but this information reflects job positions and income and is something certain companies might want to know.

A different view on that data could be which people might default on loans because of student debt. You can view seemingly unrelated data in multiple ways, depending on what you're looking for and how you look. Context is critical, because the data set was not gathered with a specific purpose in mind. Exercise great caution here.

Storage and maintenance

A secondary aspect of dark data management is general storage and maintenance. Storage is more cost-effective than ever, but someone still has to pay the bill for data that might never be used.

Up to 90% of a company's data is considered stale or dark unstructured data. That can mean 90% of the storage budget goes to something you might use -- but might not.
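A rough calculation shows what that share can mean in dollars. The capacity and price below are hypothetical placeholders, and the sketch assumes cost scales evenly with volume; substitute figures from your own storage bill.

```python
def dark_storage_cost(total_tb, price_per_tb_month, dark_fraction):
    """Split a monthly storage bill into its dark and active portions."""
    total = total_tb * price_per_tb_month
    dark = total * dark_fraction
    return {"total": total, "dark": dark, "active": total - dark}


# Hypothetical inputs: 500 TB stored at $20 per TB per month, 90% of it dark.
bill = dark_storage_cost(total_tb=500, price_per_tb_month=20.0, dark_fraction=0.9)
print(f"Monthly bill: ${bill['total']:,.2f}")
print(f"Spent on dark data: ${bill['dark']:,.2f}")
print(f"Spent on active data: ${bill['active']:,.2f}")
```

Under these assumed numbers, roughly nine of every ten storage dollars go to data with no current use, which is the figure a dark data use case has to justify.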

Another part of dark data management is data security and backups. These services and efforts cost money in terms of monthly fees, licensing or personnel. The price tag can add up beyond the machine learning costs of the data, so ensure there is justification for these costs.

It might take one or two key use cases to show that your organization's dark data has real value, but you have an ace up your sleeve: No one wants to throw away data, as we always think there is a use right around the corner. And sometimes there might be -- if you're willing to pay the price.
