Anyone with even a remote connection to IT knows that the amount of data we create and store each year is massive. But in enterprises, a lot of that data is copies of other data made at a point in time for a specific reason. It's often only marginally different from data that has already been copied, so enterprises end up storing multiple near-identical copies of the same data.
Those copies are made for different reasons. For example, snapshots are routinely taken during the software development lifecycle, where many interdependent virtual machines are developed in rapid cycles. It doesn't take long before dozens of virtual machine images that differ only incrementally are being stored.
Multiple snapshots of data can also be stored as part of a company's backup and recovery processes. By taking multiple snapshots throughout the day, enterprises attempt to significantly reduce the amount of data that could be potentially lost in the event of a system incident. So, to tighten their recovery point objective (the maximum amount of data they can afford to lose), they end up storing large volumes of data.
In fact, a recent IDC report found 45% to 60% of total storage capacity consists of what is considered “copy data,” with 82% of survey respondents saying they have at least 10 copies of each database.
The challenges around copy data management aren’t just about volume. Time is also a significant issue. Even if you could buy and deploy unlimited storage, how do you move that data in and out of a backup system in a reasonable time? And faced with such a massive amount of data, how do you find what’s important when you need it?
With virtualisation engines and other applications improving their ability to quickly take snapshots of data, many businesses have been taken by surprise by the rapid growth in their data storage and retrieval needs. And this isn't happening only in operations: end users, empowered by the availability of cloud-based services, are making more copies of data as well.
While the reasons for making so many separate copies of data may make sense in isolation, the uncontrolled and unmanaged creation of all those copies has created many challenges. The management and infrastructure costs are significant. Storage appliances and other data management systems are relatively unsophisticated in how they manage copy data, particularly when it comes to data inside applications, and can’t differentiate between information that is valuable to the business and other data.
Resolving the challenges around copy data management requires a multi-pronged solution. To start with, you need an understanding of the data you are going to manage within the copies. Content classification is critical so the right data is deduplicated, stored and managed. Getting that right can reduce the amount of data stored by as much as 97%.
That management needs to be platform independent and work whether the data is stored in a local SAN or with a cloud service provider.
Rather than taking a blanket approach to copying data, a more targeted method, where only important data is copied, is critical. Again, proper data classification will facilitate this and cover data that sits either inside an application or on a file system.
Copy data management is a challenge that has crept up on many enterprises. It is now very easy to take snapshots or make copies of data, which has led organisations to retain burgeoning volumes of data that are often not needed in the long term.
Understanding what data you have through a disciplined approach to data classification, followed up with appropriate controls and data management systems, will reduce the number of copies of data you retain. As well as the obvious cost savings achieved by reducing the amount of storage you require, there are operational benefits, such as improving backup management and making it easy for the business to find the right data quickly.