
Data hoarding and the role of storage

The promise of future insights drives companies to store data. But storing data carries potential drawbacks, and there are better management methods than keeping everything.

This year, data generation will exceed 100 zettabytes, and there's no end in sight. Enterprises are faced with balancing storage and management costs with the risk of potential data loss.

But companies may be tempted to keep more data than they need. Storage prices have fallen, and newer technologies like AI promise better and faster data insights.

Still, retaining every bit and byte can open up liabilities that outweigh potential value, according to Vincent Berk, chief revenue and strategy officer at cybersecurity firm Quantum Xchange and former computer science faculty member at Dartmouth College.

"There is an enormous amount of data being stored because of the unproven premise that one day value can be extracted from it," he said.

Berk warned that sweeping data retention policies can become costly. As companies generate and store more data, the risks associated with data loss, leaks and breaches go up. While determining what a company should keep and what should go involves departments outside of IT, storage admins should look beyond being inventory keepers to also being bookkeepers helping set standards.

Useful data and managing it

Companies shouldn't focus on storing too much or too little but storing what is valuable and what is useful, according to Marc Staimer, president of Dragon Slayer Consulting, an analyst firm in Beaverton, Ore. The problem for companies is determining the value of data -- especially its future value, he said.

"You don't know what might be valuable," Staimer said. "But if you have everything, you're covered."

Companies can take a few steps when determining value, according to Christophe Bertrand, an analyst at TechTarget's Enterprise Strategy Group. First, they need to consider the data and its use from a compliance and governance perspective. Then they should consider its value from a business perspective. After that, companies can weigh the cost of storing data and decide whether to keep it.
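As a rough illustration only, the three steps above could be sketched as a simple keep-or-delete check. The field names, the dollar figures and the rule that a compliance requirement always wins are assumptions made for this example; they are not a method Bertrand or any vendor prescribes.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical record describing one body of stored data."""
    name: str
    size_tb: float
    compliance_required: bool        # step 1: compliance and governance
    est_value_per_year: float        # step 2: estimated business value, in dollars


def should_keep(ds: Dataset, cost_per_tb_year: float) -> bool:
    """Apply the three steps in order: compliance, business value, storage cost."""
    # Step 1: data that must be retained for compliance is kept regardless of cost.
    if ds.compliance_required:
        return True
    # Steps 2 and 3: keep the data only if its estimated value exceeds
    # what it costs to store for the same period.
    storage_cost = ds.size_tb * cost_per_tb_year
    return ds.est_value_per_year > storage_cost


if __name__ == "__main__":
    # Illustrative numbers only.
    for ds in (
        Dataset("audit-logs", 50, True, 0),
        Dataset("old-telemetry", 100, False, 5_000),
        Dataset("orders", 10, False, 50_000),
    ):
        print(ds.name, "keep" if should_keep(ds, cost_per_tb_year=200) else "delete")
```

In practice the hard part is the estimate of future value, which is exactly the uncertainty the analysts describe; the sketch only shows how the three considerations might be ordered.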

"It is not a matter of storing less or storing more but storing smart," Bertrand said.

Storage vendors have created techniques to do just that through data compression and deduplication -- techniques that strive to use current storage hardware more efficiently. Data compression changes the structure of data to reduce the size of its storage footprint. Data deduplication removes redundant copies of data. These data reduction techniques have been around for a while but continue to see refinement, Bertrand said.

Vast Data, for example, added similarity-based data reduction to its Universal Storage platform in recent years to reduce similar data blocks.
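To make the compression and deduplication described above concrete, here is a minimal sketch that stores each unique block once, keyed by its hash, and compresses it. It assumes fixed blocks, SHA-256 fingerprints and zlib compression, and the function names are hypothetical. It shows exact-match deduplication only, not the similarity-based reduction vendors such as Vast Data offer, and it does not represent any vendor's implementation.

```python
import hashlib
import zlib


def dedupe_and_compress(blocks):
    """Return (store, recipe): unique compressed blocks plus the order to rebuild them."""
    store = {}    # digest -> compressed bytes, one entry per unique block
    recipe = []   # ordered list of digests describing the original sequence
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:
            # Deduplication: an identical block is stored only the first time it appears.
            # Compression: the stored copy is re-encoded to take less space.
            store[digest] = zlib.compress(block)
        recipe.append(digest)
    return store, recipe


def restore(store, recipe):
    """Rebuild the original sequence of blocks from the store and the recipe."""
    return [zlib.decompress(store[digest]) for digest in recipe]


if __name__ == "__main__":
    blocks = [b"a" * 4096, b"b" * 4096, b"a" * 4096]
    store, recipe = dedupe_and_compress(blocks)
    raw = sum(len(b) for b in blocks)
    stored = sum(len(v) for v in store.values())
    print(f"{len(blocks)} blocks, {len(store)} unique, {raw} bytes raw, {stored} bytes stored")
```

The savings depend entirely on the data: highly repetitive blocks like these shrink dramatically, while already-compressed or encrypted data may not shrink at all.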

The larger question for companies is still determining exactly what they are storing and why.

"How do you manage something you don't understand or can't measure?" Bertrand asked.

As data continues to grow, there will be an increased focus on data storage management and classification, Bertrand said.

This is where companies like Hammerspace, which was founded in 2018, or a product like Spectra Logic's Spectra Vail, which launched in 2021, come into play. Both products strive to unify data stored in distributed environments.

AI, the cloud and other issues

Hammerspace and Spectra Logic provide similar functionality. Hammerspace offers an abstraction layer across storage vendor products through its Global Data Environment. Spectra Vail provides software to create a single global namespace across on-premises and cloud-native storage offerings.

Data storage management vendors like Hammerspace and its competitor Komprise offer companies better visibility into what data they're storing. But they may not help companies make decisions about how to overcome data hoarding tendencies.

Given that storage prices have come down over the last decade, companies may want to hang onto as much data as possible for some perceived future worth, according to Jared Endicott, an analyst at Launch Consulting Group, an IT consulting firm headquartered in Bellevue, Wash.

"Roughly 80% of the data stored today is unstructured: text, emails, correspondences and so on," Endicott said. "That is the kind of data folks anticipate being useful for machine learning and AI purposes."

Even if that is the case, companies need to have a roadmap, be conscious of what is being stored, and create policies to save only data that is valuable, necessary or aligned with what the roadmap anticipates, he added.

David Feller, vice president of product management at Spectra Logic, said data hoarding is on the cusp of getting worse. By his accounting, companies are saving only about a fourth of the data they generate. But as AI advances and eventually eases data management and data governance problems, companies will keep more data than before.

"The value of AI is going to be in analyzing and putting metadata constructs on top of the data so that it becomes valuable," he said. Companies will eventually benefit from data hoarding tendencies, he said, "because it's really hard to recreate it."

When do you delete data?

If companies are going to store more data, they will also need policies for managing and deleting it when it is no longer useful, Berk said.

Storage administrators should play a part in building out those policies. While they are primarily concerned with ensuring data is available to the business, storage admins interact with the data. That means they could bear some responsibility for a data leak. Berk advised that admins be aware of issues such as the minimum requirements for judging the liability of the data they handle and set security standards from there.

Staimer said establishing policies around what not to retain isn't easy because the answer always depends on the circumstances. He reiterated that it comes down to the value and potential value of the data weighed against how much it costs to store the data and the liabilities associated with it.

"Data governance is, at the end of the day, what you need to have to determine [when to delete data]," he said.

Adam Armstrong is a TechTarget Editorial news writer covering file and block storage hardware and private clouds. He previously worked at StorageReview.com.
