Cloud data management is the ability to migrate data to the cloud, move data between storage tiers within the cloud, move data between cloud providers and present that data to other applications or workloads. Most data protection vendors claim to provide some form of cloud-based data management.
The capabilities within each product vary, so it is essential to understand what aspects of cloud data management an organization requires. It's also important to understand what challenges the cloud providers themselves impose when an organization attempts to manage its cloud-based data.
Why manage cloud data?
Cloud providers, even more so than traditional data centers, have multiple tiers or classes of storage. Customers are charged for the capacity they consume and the number of I/O transactions they execute. Each tier within the cloud offers different performance profiles, and the cost for using that storage directly relates to the level of performance the customer needs.
These tiers are what spawned data management in the first place. Unlike with on-premises storage, an organization sees real and immediate cost savings in the cloud if it can identify data that belongs on a lower class of storage and move it there.
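The billing model described above, capacity plus I/O transactions, can be sketched as a small cost comparison. This is a minimal illustration rather than any provider's actual pricing: the tier names, the prices and the per-10,000-request billing unit are all hypothetical, as is the assumption that the cheaper tier charges more per request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageTier:
    name: str
    price_per_gb_month: float      # hypothetical capacity price
    price_per_10k_requests: float  # hypothetical I/O transaction price


def monthly_cost(tier: StorageTier, stored_gb: float, requests: int) -> float:
    """Capacity charge plus I/O transaction charge for one month."""
    capacity_charge = stored_gb * tier.price_per_gb_month
    io_charge = (requests / 10_000) * tier.price_per_10k_requests
    return capacity_charge + io_charge


# Hypothetical tiers: the cold tier is cheap to store on but costly to access.
HOT = StorageTier("hot", price_per_gb_month=0.020, price_per_10k_requests=0.004)
COLD = StorageTier("cold", price_per_gb_month=0.004, price_per_10k_requests=0.100)


def cheaper_tier(stored_gb: float, requests: int, tiers=(HOT, COLD)) -> StorageTier:
    """Return the tier with the lowest monthly bill for this access pattern."""
    return min(tiers, key=lambda tier: monthly_cost(tier, stored_gb, requests))


if __name__ == "__main__":
    for requests in (0, 10_000_000):
        best = cheaper_tier(1_000, requests)
        print(f"{requests:>10} requests/month -> {best.name}")
```

Under these assumed prices, rarely accessed data is cheaper on the lower tier, while heavily accessed data can cost less on the faster tier once transaction charges are counted.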
Challenge 1. Getting data to the cloud
If an organization wants its data protection vendor to provide cloud-based data management, IT professionals will often be forced to choose between two methods of moving data to the cloud: treating cloud storage as the primary storage area for protected copies of data or as an archive of on-premises data.
In the first method, data protection vendors may or may not keep a copy on premises, but in either case, 100% of the data ends up on cloud storage. In the archive method, older backup copies are typically moved to the cloud and the vendor only keeps the data most likely needed for recovery on premises. The advantage of the archive method is that it reduces the on-premises investment in cost and floor space for physical data protection storage.
To fully exploit the cloud, both in terms of storage and computing resources, an organization likely wants a full copy of data to reside on cloud storage, and it certainly wants the most recent changes to data reflected in the set that lives in the cloud. Most data protection products can rapidly update a full baseline copy of data, even if the software locates that copy in the cloud. As a result, any product that has a rapid update capability is suitable to overcome the challenge of getting data to the cloud.
One potential concern is the frequency of the protection event. Most data protection applications run only a few times per day, so the copy that cloud-based applications work on can be hours behind the production copy. In situations like testing, development, reporting and analytics, this lag time is often acceptable. In cases like cloud bursting or cloud failover, it may not be.
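The lag concern can be made concrete with a short check. This sketch assumes protection runs are evenly spaced across the day, and the tolerance values per use case are hypothetical placeholders that an organization would replace with its own.

```python
from datetime import timedelta

# Hypothetical lag tolerances per use case; real values depend on the business.
MAX_ACCEPTABLE_LAG = {
    "testing": timedelta(hours=24),
    "development": timedelta(hours=24),
    "reporting": timedelta(hours=12),
    "analytics": timedelta(hours=12),
    "cloud_bursting": timedelta(minutes=5),
    "cloud_failover": timedelta(minutes=1),
}


def worst_case_lag(runs_per_day: int) -> timedelta:
    """Longest the cloud copy can trail production, assuming evenly spaced runs."""
    if runs_per_day <= 0:
        raise ValueError("runs_per_day must be positive")
    return timedelta(days=1) / runs_per_day


def lag_is_acceptable(use_case: str, runs_per_day: int) -> bool:
    """True if the worst-case lag fits within the tolerance for this use case."""
    return worst_case_lag(runs_per_day) <= MAX_ACCEPTABLE_LAG[use_case]


if __name__ == "__main__":
    for use_case in MAX_ACCEPTABLE_LAG:
        verdict = "ok" if lag_is_acceptable(use_case, runs_per_day=4) else "too stale"
        print(f"{use_case:<15} {verdict}")
```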
Challenge 2. Moving data within the cloud
There is a significant cost difference among the various cloud tiers, so understanding where to place data first and when to move it to another tier is a crucial task. The temptation may be to initially place data on the least-expensive tier of storage possible. The problem is that not all applications can interface with every tier, and not every tier can deliver the performance the application requires to process the data.
For example, if an organization uses a copy of cloud data to test the next release of its application, the application may only run on the fastest, and most expensive, block tier of cloud storage and not the slower but more affordable cloud object tier. There is a cost associated with moving that data from an inexpensive tier, and it takes time for the copy to complete.
IT professionals must decide, for each use case, where data will be stored initially. They also need to determine how to move data to another tier when required. Not all cloud-based data management offerings can transfer data between tiers within the cloud, and even fewer can move data in an automated fashion. Moving data between tiers may also require a cloud data management product to "transform" the data to a format that is usable by cloud-based applications or services.
The decision on where to store data comes down to how often the cloud application needs the most recent copy of data and the size of the refreshed data set. The cloud data management processes may best serve the application by placing the data directly on that application's storage, even if the storage is more expensive from a cost-per-gigabyte perspective. The cost to move the data and the time the move requires may offset any savings when compared to directly storing the data where it is needed.
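The tradeoff between storing data directly on the application's tier and staging it on a cheaper tier can be sketched as follows. Every number here is a hypothetical default (prices, transfer throughput and the fraction of the month the working copy stays on the application tier), and the model is deliberately simplified to one data set over one month.

```python
def direct_cost(size_gb: float, app_tier_price: float) -> float:
    """Monthly cost of keeping the data on the application's own tier."""
    return size_gb * app_tier_price


def staged_cost(size_gb: float, cheap_tier_price: float, app_tier_price: float,
                move_price_per_gb: float, refreshes_per_month: int,
                active_fraction: float) -> float:
    """Monthly cost of resting on a cheap tier and copying up for each refresh."""
    resting = size_gb * cheap_tier_price
    moves = refreshes_per_month * size_gb * move_price_per_gb
    # The working copy still occupies the application tier while it is in use.
    working = size_gb * app_tier_price * active_fraction
    return resting + moves + working


def move_hours(size_gb: float, throughput_gb_per_hour: float) -> float:
    """Time needed to copy the data set between tiers."""
    return size_gb / throughput_gb_per_hour


def place_directly(size_gb: float, refreshes_per_month: int, max_wait_hours: float, *,
                   cheap_tier_price: float = 0.02, app_tier_price: float = 0.10,
                   move_price_per_gb: float = 0.01,
                   throughput_gb_per_hour: float = 500.0,
                   active_fraction: float = 0.25) -> bool:
    """True if storing on the application's tier beats staging on a cheap tier."""
    too_slow = move_hours(size_gb, throughput_gb_per_hour) > max_wait_hours
    too_costly = staged_cost(size_gb, cheap_tier_price, app_tier_price,
                             move_price_per_gb, refreshes_per_month,
                             active_fraction) >= direct_cost(size_gb, app_tier_price)
    return too_slow or too_costly


if __name__ == "__main__":
    print(place_directly(1_000, refreshes_per_month=1, max_wait_hours=4))
    print(place_directly(1_000, refreshes_per_month=10, max_wait_hours=4))
```

With these assumed figures, a data set refreshed once a month is cheaper to stage, while frequent refreshes or a tight wait limit tip the decision toward direct placement.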
Challenge 3. Moving data between clouds
The concerns with moving data between clouds are similar to those with moving data within one cloud, with the added worry of converting between providers. The I/O charges for moving data out of a cloud are significantly higher when the destination is another provider's cloud, and the time required to make the move is likely to be considerably longer.
No matter how often customers move or copy data between cloud providers, they need to understand the associated costs. If the move is a one-time operation, an organization may find it is cheaper to make a new copy in the new cloud from the on-site data set and then delete the copy of data on the original provider's cloud. If the cloud-based application always requires the latest copy of data, an organization may find it less expensive to mirror the data, sending it to both clouds at the start. If the movement between clouds is occasional or unpredictable, the ability to perform a cloud-to-cloud migration is a requirement and is a key feature to consider.
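The three movement patterns above map to three approaches, which can be written down as a simple lookup. The enum and function names are hypothetical, and the fallback to a cloud-to-cloud migration when no on-site copy exists is an added assumption rather than something the guidance above states.

```python
from enum import Enum


class MovePattern(Enum):
    ONE_TIME = "one-time move"
    ALWAYS_LATEST = "application always needs the latest copy"
    OCCASIONAL = "occasional or unpredictable moves"


class Strategy(Enum):
    RECOPY_FROM_ON_SITE = "copy from the on-site data set, then delete the old cloud copy"
    MIRROR_TO_BOTH = "send data to both clouds from the start"
    CLOUD_TO_CLOUD = "migrate directly between providers"


def choose_strategy(pattern: MovePattern, on_site_copy_available: bool = True) -> Strategy:
    """Suggest a likely cheaper approach for a given movement pattern."""
    if pattern is MovePattern.ONE_TIME:
        # Assumption: without an on-site data set, a direct migration is the only option.
        if on_site_copy_available:
            return Strategy.RECOPY_FROM_ON_SITE
        return Strategy.CLOUD_TO_CLOUD
    if pattern is MovePattern.ALWAYS_LATEST:
        return Strategy.MIRROR_TO_BOTH
    return Strategy.CLOUD_TO_CLOUD


if __name__ == "__main__":
    for pattern in MovePattern:
        print(f"{pattern.value}: {choose_strategy(pattern).value}")
```

The result is a starting point only; the actual choice still depends on comparing the charges each provider imposes.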
Organizations need their cloud-based data management product to send data to multiple cloud providers. While seemingly an obvious requirement, many offerings only support one cloud provider. A second requirement is the ability to send data -- preferably subsections of the data -- to two different clouds at the same time. Finally, the product should be able to move data directly between clouds. Again, while seemingly obvious, many data management products can't perform a direct copy between cloud providers.
There is always value in improving an organization's ability to manage data. In the cloud, the rewards are instant. Cloud-based data management presents a unique set of challenges that IT needs to address. The key is understanding how to get data to the cloud, how the organization will use it and how to move data within and across clouds.