Readers of this column know I'm preoccupied with the idea of automated data management. Data management is where the proverbial rubber meets the road when it comes to the future of IT.
You can software-define to your heart's content, hyper-converge until it hurts, but the simple truth is the coming data deluge, combined with eagerness to divine nonintuitive truth from all that data and the government's desire to ensure the privacy of those who produce the data, spells trouble for the not-too-distant future. There will be too much data, too much storage infrastructure and too many storage services for human administrators to manage. Some sort of automated technology will be required. Time is running out, however: The zettabyte (ZB) apocalypse is on the horizon.
The coming zettabyte apocalypse
IDC projected more than 160 ZB -- a zettabyte being a one followed by 21 zeros, in bytes -- of new data by 2024. Microsoft Azure analysts reported a year or so back that the entire annualized manufacturing capacity of disk and non-volatile RAM storage vendors totaled close to 1.5 ZB. Simple math says we cannot hope to host the volume of data that's amassing with the equipment we have available.
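The "simple math" is worth making explicit. A back-of-the-envelope sketch using the two figures cited above -- both rough estimates, not precise measurements:

```python
# Back-of-the-envelope sketch of the capacity gap described above.
# The figures are the ones cited in the text; treat them as rough estimates.
projected_new_data_zb = 160   # IDC's projection of new data, in zettabytes
annual_capacity_zb = 1.5      # estimated annual storage manufacturing output, ZB/year

# Years of today's entire manufacturing output needed just to hold that data
years_of_output = projected_new_data_zb / annual_capacity_zb
print(f"~{years_of_output:.0f} years of current output")  # ~107 years
```

Even if both numbers are off by a wide margin, the gap is two orders of magnitude -- which is the point of the paragraph above.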
I'm delighted to hear from industry friends that the deficits will be made up for by new data reduction and data compression technologies. However, I can't help remembering that no one ever got the 70-to-1 reduction ratios deduplication vendors promised back in the first decade of the millennium. So I'm not as enthusiastic as others about the deduplication dike blunting the data tsunami.
Improvements in capacity utilization efficiency from innovations such as fractal storage, using fractal rather than binary algorithms to store data, are more promising. This could get us close to storing massively more content in the same footprint we have now. Even simpler would be using optimized storage mechanisms that eliminate all data copies and enable the use of all types of storage -- file, block and object -- concurrently. (StorOne is important here, given its core patents in this area.)
Ultimately, however, storage will go hyperscale, and there will simply be too much data to manage. Unsurprisingly, backup vendors have been among the first to get wise to this reality.
A possible starting point
I recently chatted with Dave Russell, formerly a Gartner analyst and now Veeam Software's vice president of enterprise strategy. Russell showed me a diagram of what he called the journey to intelligent or automated data management. Coming from Veeam, it wasn't surprising that the starting point of the diagram was backup.
Backup is a core data management function. The only way to protect data is to make a copy -- a backup -- that can be stored safely out of harm's way and restored if the original data is damaged or deleted. That's Data Protection 101.
To Russell, backup is one place to begin the journey to intelligent data management. I agree, though possibly for different reasons. To perform a backup effectively, you must define what data to back up -- based on an evaluation of the business processes the data serves -- and how often, based on how frequently that data is accessed and updated.
This kind of data classification exercise produces a data protection policy for certain data that admins can ultimately expand to become a policy for data lifecycle management. Russell says the initial focus on backing up data associated with a certain server-storage kit will expand over time to a more aggregated approach covering many servers and storage units or even multiple clouds. This evolution will require a better tool set for managing lots of data protection processes across diverse infrastructure in a way that affords greater administrative efficacy and visibility. At that point, automation can start to become more proactive by allowing the automated data management system to apply policies to data and perform necessary protection tasks.
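The policy-based stage described above can be sketched in a few lines of code. This is a minimal illustration, not any vendor's implementation; the data classes, intervals and retention periods are hypothetical:

```python
# A minimal sketch of policy-based data protection, assuming a simple
# admin-defined classification scheme. Class names, backup intervals and
# retention periods are illustrative assumptions, not real product defaults.
from dataclasses import dataclass

@dataclass
class ProtectionPolicy:
    data_class: str              # derived from the business process the data serves
    backup_interval_hours: int   # derived from access/update frequency
    retention_days: int          # how long copies are preserved

# Policies an admin might define, later expandable into lifecycle rules
POLICIES = {
    "transactional": ProtectionPolicy("transactional", 1, 90),
    "reference":     ProtectionPolicy("reference", 24, 365),
    "archive":       ProtectionPolicy("archive", 168, 2555),
}

def backup_due(data_class: str, hours_since_last_backup: int) -> bool:
    """Return True if the dataset's policy says a backup is overdue."""
    policy = POLICIES[data_class]
    return hours_since_last_backup >= policy.backup_interval_hours
```

The automation Russell describes amounts to a system, rather than an administrator, evaluating rules like `backup_due` across many servers, storage units or clouds and acting on the result.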
On Russell's diagram, an interesting transition occurs. This visibility stage marks a change from policy-based data management to behavior-based data management. The behavior of the data becomes the basis for classifying it and provisioning the right storage resources, along with the appropriate services for its protection, preservation (archive) and privacy (security), until the handling of data is completely automated -- the final stage in Russell's diagram.
Lost in translation?
I like the idea of changing from policy-based to behavior-based management, but it may be difficult to communicate to customers. One vendor that went to market with a cognitive data management conceptualization of automated data management at the beginning of this year found that the "cognitive" part impeded adoption. Folks aren't quite sure what to make of computers automatically evaluating data behavior and taking actions.
A friend at the cognitive data management company told me he tried to use the metaphor of a driverless car to get the concept across. That didn't work out well when tests of driverless vehicles produced multiple wrecks and injuries because of flawed programming and slow network updates.
He's since retooled his marketing message to say his product automates data migration, and he's gaining quite a bit of traction as a result, because data migration is the bane of most storage admins' existence -- a thankless task that consumes the bulk of their workday. Like Veeam, which is trying to get to data management nirvana from the starting point of backup, my friend's company is trying to get there from the starting point of improving the efficiency of data migration.
Both approaches to building real automated data management could get your data where it needs to go safely and securely. The question is whether the industry can get to automation in time to avoid catastrophic losses of data.