Could AI be the killer app for cold data?

Enterprises are beginning to use cold data they must retain anyway to train AI models, gaining more value from data they thought had served its purpose.

The rapid evolution of AI is opening up a new world of opportunities around data, which itself is the lifeblood of AI success. Though the initial focus is understandably on active data, an opportunity is emerging to tap into the vast quantities of cold data many organizations retain to train models and extract value in new ways. It's early days, but recent innovations should embolden organizations to think expansively and creatively about how they can use their existing data, both for quick AI wins and for longer-term strategic success.

The question of how to treat cold data that has served its initial purpose but is still retained by the organization often presents a thorny challenge to IT departments, not to mention compliance and security teams. These teams are often conflicted about whether, and how, to retain it. Most often this data is retained out of necessity -- to meet a compliance or regulatory requirement, for example. Many organizations routinely delete older data to keep costs down, stay in compliance and avoid a "smoking gun" problem.

AI begins to drive storage uses

A key challenge in the past was that organizations often lacked a compelling business reason to retain old data. With the advent of AI and other innovations, however, this is beginning to change. Some organizations are now looking to "heat up" their cold data, as it could become a differentiator.

AI's growing influence on storage decisions is borne out by recent research from TechTarget's Enterprise Strategy Group, "Navigating the Cloud and AI Revolution: The State of Enterprise Storage and HCI." AI emerged as the top initiative driving new storage and hyperconverged infrastructure projects in 2024.

A number of recent developments caught my eye. The first was the release of AiR, a new service from cloud object storage specialist Wasabi. Based on Wasabi's recent acquisition of Curio, AiR is billed as an intelligent media storage service that uses AI to automatically add metadata tags to rich media content. This makes the content searchable by people, brands, logos and other keywords, so users can access relevant material quickly and directly.

Wasabi observes that object storage without metadata is like the internet without search. AiR is aimed squarely at media organizations, and Wasabi hopes it will help breathe new life into cold media content. The applications appear legion. Consider a marketer working for a sponsor who wants to find an image of a star football player with the sponsor's logo in the background, or a media company looking to find and replace content that might be objectionable in certain regions. These capabilities might not be new per se, but in addition to adding cutting-edge technology, Wasabi is also addressing cost concerns. Subscribers to AiR will be charged only for storage, regardless of how many times they access, query or move their data. It's a fascinating approach to encourage media users to engage with their archive content without being nickeled-and-dimed in the process.

Using backup data to train AI models

Another example is in the realm of data protection and concerns an emerging capability from SaaS backup specialist Own Company, formerly OwnBackup. Backup is the classic "insurance policy" use case. Backup companies have long talked about using this long tail of data for purposes other than recovering from a failure or, more recently, a ransomware attack, but progress has been slow. AI might be a game changer.

Own is entering the fray with a plan to do just that. It is positioning a new capability -- Own Discover -- to help customers activate their backup data by using it to train AI models. This might not be an obvious use case for backups, but Own notes several attributes that make backup data well suited to the task: It is in a time-series format, is organized and fully assembled, and is up to date and protected. What's more, Own notes that historical SaaS data holds a wealth of value that could drive business potential, such as sales and revenue forecasting and predicting customer churn; this is especially interesting given Own's strong ties to Salesforce. Own Discover is in a limited-availability release, and it will be interesting to see how customers respond to such capabilities as they continue to look for quick and easy AI wins.

Demand for long-term storage

Finally, consideration should also be given to the archive storage medium itself. Organizations have never had as much choice here in terms of large-scale, low-cost object storage -- both on premises and in the cloud. But with the major hyperscale cloud providers dropping egress fees for customers opting to leave their platforms, and with many organizations expressing a preference to retain their most valuable data on premises, the AI opportunity might prove a catalyst for alternatives to public cloud storage.

One option increasingly in play here is tape. Although it's easy to regard tape as a throwback technology, it continues to play a critical role in many data centers, including some of the world's largest computing environments. If we assume that the AI opportunity around cool or cold data will encourage more organizations to retain more of their data for longer, then, given mounting storage costs, power constraints and environmental considerations, it follows that demand for cost-effective, long-term storage at significant scale will also increase.

Accordingly, tape providers are responding as well, with increasingly integrated systems that front tape-based archives with disk-based object or file storage. Spectra Logic's Glacier product, for example, combines the tape specialist's Amazon S3 API-compatible BlackPearl file and object storage system with its range of tape libraries. Spectra recently overhauled its library management software and added a new Spectra Cube library. The company's aim is to cater more directly to customers that might lack existing tape library skill sets and that need support for modern, cloud-based and AI applications.

These are just a few recent examples that point to the huge potential in the long tail of cold data. No doubt hurdles will crop up along the way -- not least regulatory ones -- but the opportunity to unlock new value in retained data seems substantial.

Simon Robinson is a principal analyst at TechTarget's Enterprise Strategy Group who focuses on existing and emerging storage and hyperconverged infrastructure technologies, and on related data- and storage-management products and services used by enterprises and service providers.

Enterprise Strategy Group is a division of TechTarget. Its analysts have business relationships with technology vendors.
