A data protection system plus management creates value, risk
As new data protection products enable more frequent backups, they are also veering into production elements. Be careful, though, with how you use these management capabilities.
Imagine that you have a way to access all of your production data without any performance impact on your production applications. You could use that access for reporting and planning, or for testing upgrades and changes.
Modern data protection products also have data management functions that give businesses access to the backup data they store. Once a computer -- virtual machine or physical server -- is backed up, a copy of that backup can be made available to another computer for whatever purpose you need. This secondary access is independent of the production server, with no performance impact on the running application. The secondary copy can be used to generate reports or to test changes such as application or infrastructure upgrades.
Older products expect the data to be restored to another location before reporting and testing can occur. A more modern data protection system allows access directly from the backup, with no time-consuming restore and no additional expensive primary storage to hold the restored data.
What is different?
One of the significant changes in the data protection system is the integration of solid-state storage with hard disks for the backup store.
Traditionally, backup storage was built for cheap capacity, and the only performance that mattered was large block writes to make backups fast. On these disk-based systems, performance for small reads and writes is poor, and that includes the metadata operations involved in managing backups and restores. Adding solid-state storage means that small I/Os can be extremely fast, and metadata access can be fast without the metadata having to be small enough to fit in RAM.
Fast metadata allows quick access to deduplicated data and allows many restore points to be accessed, possibly concurrently. With fast, small I/Os, applications can be run directly from the backup store, with no need to restore to primary storage before using the data. High-performance backup storage allows virtual restores, where the data is not copied at restore; it is presented from the backup storage. Adding solid-state storage to a data protection system allows for greater transactional performance.
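To make the idea of a virtual restore concrete, here is a minimal sketch in Python. It assumes a copy-on-write design, which is one common way to present data without copying it; the article does not say how any particular product implements this, and the class names BackupImage and VirtualCopy are hypothetical.

```python
class BackupImage:
    """A read-only restore point: block number -> block contents.

    Hypothetical stand-in for data held in the backup store.
    """

    def __init__(self, blocks):
        self._blocks = dict(blocks)

    def read(self, block_no):
        return self._blocks[block_no]


class VirtualCopy:
    """A writable view of a restore point that copies nothing up front.

    Reads fall through to the shared backup image; writes land in a
    private overlay, so the backup itself is never modified.
    """

    def __init__(self, image):
        self._image = image
        self._overlay = {}

    def read(self, block_no):
        if block_no in self._overlay:
            return self._overlay[block_no]
        return self._image.read(block_no)

    def write(self, block_no, data):
        self._overlay[block_no] = data

    def private_blocks(self):
        """Number of blocks this copy has changed and stores itself."""
        return len(self._overlay)


image = BackupImage({0: b"boot", 1: b"orders"})
test_copy = VirtualCopy(image)
report_copy = VirtualCopy(image)
test_copy.write(1, b"orders-after-upgrade")
```

In this sketch, two systems share one restore point. The test copy sees its own change, the reporting copy still sees the original block, and the only extra storage consumed is the one block that was rewritten.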
At the same time, we have seen the rise of DevOps methodologies where infrastructure is controlled by automation, rather than people clicking in GUIs. Modern data protection platforms offer APIs to control every aspect of their operation.
Presenting a copy of production data to a test system takes a few lines of code and can be integrated into DevOps workflows for automated software testing. Using your data protection system for automated DevOps workflows requires high transactional performance, both to handle a large number of restores -- without data being copied again -- and to deliver adequate performance for the restored application.
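What those few lines of code might look like depends entirely on the vendor's API, so the following Python sketch uses an in-memory stand-in rather than any real product. Every name in it (FakeProtectionPlatform, latest_restore_point, present_copy, release_copy, presented_copy) is hypothetical and chosen only for illustration.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class FakeProtectionPlatform:
    """In-memory stand-in for a data protection API (hypothetical methods)."""

    # Workload name -> restore point IDs, oldest first.
    restore_points: dict
    mounted: set = field(default_factory=set)

    def latest_restore_point(self, workload):
        return self.restore_points[workload][-1]

    def present_copy(self, restore_point, target_host):
        # A real platform would expose a virtual copy to the host here.
        handle = (restore_point, target_host)
        self.mounted.add(handle)
        return handle

    def release_copy(self, handle):
        self.mounted.discard(handle)


@contextmanager
def presented_copy(platform, workload, target_host):
    """Present the newest restore point to a host and always clean up."""
    restore_point = platform.latest_restore_point(workload)
    handle = platform.present_copy(restore_point, target_host)
    try:
        yield handle
    finally:
        platform.release_copy(handle)


platform = FakeProtectionPlatform({"orders-db": ["rp-001", "rp-002"]})
with presented_copy(platform, "orders-db", "ci-runner-1") as handle:
    seen_during_test = handle in platform.mounted  # run the test suite here
```

Wrapping the presentation in a context manager is a design choice for pipelines: the copy is released even when a test run fails, so abandoned copies do not accumulate on the backup platform.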
The final difference is that some of the newer products have a scale-out architecture, where capacity and performance can be increased by adding hardware appliances in the same way as a hyper-converged infrastructure platform. The data protection and management platform can start small and grow over time as use increases or data retention requires more capacity.
A scale-out architecture also enables performance scaling as more performance comes along with more capacity. Scaling out reduces the chance of a performance bottleneck as the capacity of the platform increases. Scaling out can also defer expense by allowing you to buy just enough capacity for now and buy more capacity over time as the need arises.
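The arithmetic behind "start small and grow" can be sketched in a few lines of Python. The per-node figures below are illustrative assumptions, not vendor specifications, and the model assumes capacity and throughput scale linearly with node count, which real clusters only approximate.

```python
import math


def nodes_needed(required_tb, tb_per_node):
    """Smallest whole number of appliances that covers the required capacity."""
    return math.ceil(required_tb / tb_per_node)


def cluster_totals(nodes, tb_per_node, mbps_per_node):
    """Aggregate capacity and throughput under idealized linear scaling."""
    return nodes * tb_per_node, nodes * mbps_per_node


# Illustrative numbers only: 48 TB and 500 MB/s per appliance.
today = nodes_needed(100, 48)
later = nodes_needed(250, 48)
capacity_tb, throughput_mbps = cluster_totals(today, 48, 500)
```

With these assumed figures, 100 TB needs three appliances now and 250 TB needs six later, so the purchase of the additional three can be deferred until the data actually arrives.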
What do you get?
Your data is usually ingested (backed up) into the data protection system periodically rather than in real time. As a result, the data in the platform is always a little out of date. Newer data protection products tend to allow more frequent backups than older systems. The data in the platform may only be minutes or hours old, rather than days or weeks old as we saw in legacy backup architectures. Once data is in the protection platform, a virtual copy can be made available for non-production purposes at production-like performance.
With SSDs, data can be available in seconds, particularly when used in automated workflows like DevOps. A single backup operation can provide the basis for multiple copies of production; there is no need to copy from production for each reuse. Reuse can be as simple as populating a database server for reporting purposes, so that reports run against yesterday's data without any impact on the production system. Or the reuse might be more frequent, integrating live data into a continuous integration/continuous deployment pipeline for high-fidelity software testing. Some of these data platforms include data scrubbing features that prevent private information from being readable in the software test systems.
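Data scrubbing can take many forms, and the article does not describe how any particular platform does it. As one possible illustration, the Python sketch below replaces sensitive fields with a salted hash before a record reaches a test system. The field list, the salt and the function names are all assumptions made for this example, and a salted hash is pseudonymization, which may not be sufficient for every privacy requirement.

```python
import hashlib

# Assumed for illustration: which fields count as private.
SENSITIVE_FIELDS = frozenset({"name", "email"})


def mask_value(value, salt="test-env"):
    """Replace a value with a stable, unreadable token."""
    digest = hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
    return "masked-" + digest[:12]


def scrub_record(record, sensitive=SENSITIVE_FIELDS):
    """Return a copy of the record with sensitive fields masked."""
    return {
        key: mask_value(value) if key in sensitive else value
        for key, value in record.items()
    }


customer = {"id": 7, "name": "Ada Example", "email": "ada@example.com"}
scrubbed = scrub_record(customer)
```

Because the masking is deterministic, the same customer produces the same token in every table, so joins in the test database still work even though the original values are not readable.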
Is backup now production?
There is a significant risk in integrating your data protection system with any business function, such as DevOps-based software development or critical business reporting. The aim of having a separate backup platform is to isolate the failure domains so that a failure in the backup system doesn't affect production and a breakdown of the production system can be recovered using the backup system.
If we start using the backup system as a core part of our business, then it is effectively now a production system. How do we recover that software development capability if the backup platform fails? If the backup platform provides critical business reporting, when is the downtime window for maintenance? This is not an insurmountable problem; all these modern platforms offer replication to another location or cloud platform. It is an important consideration when you use your data protection platform for more than just backups.
Converging data protection and data management has been possible for some years. The advent of cost-effective solid-state storage and scale-out hardware architectures has enabled more useful products. Combining data protection and data management will be most valuable when it is driven by automation through APIs to allow uses like reporting and DevOps. Make sure that you consider the impact of using your data protection system for production activities. Plan for business continuity.