- George Crump, Storage Switzerland
It may seem odd that the European Union's GDPR and ransomware threats both require backup software to change the way data is recovered. Each requires backup applications to scan or analyze data as it is being recovered. This is particularly problematic for vendors that mostly back up data as images -- meaning their data protection systems don't have a granular understanding of the data they are protecting.
GDPR and backup
Scanning or analyzing data during recovery is critical for companies affected by GDPR and other data privacy regulations, including the California Consumer Privacy Act. Most of these privacy regulations -- and more are on the way -- grant users of a service the ability to have all of their data removed from an organization's storage systems. Furthermore, the organization has to prove the request was fulfilled. While it is relatively easy to remove data from production storage, it is difficult to remove data from archive storage and almost impossible to remove it from backup storage.
While there is no case law to verify this yet, it is reasonable to assume data stored in backups does not need to be part of a removal request because that data is not readily or immediately accessible. Data in backup storage, however, must not be allowed to make its way back onto production storage accidentally. Assuming it is acceptable to keep a list of individuals who have requested removal from an organization's servers, data protection systems need to compare what's been backed up against that list of forgotten users and skip their data during a restore operation. Essentially, the software should restore their data to a null device for disposal.
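The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `BackupEntry` type, its `owner_id` field and the `plan_restore` function are hypothetical names, and the sketch assumes the backup catalog records which individual each file belongs to, which image-based backups typically cannot do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupEntry:
    path: str
    owner_id: str  # assumption: the catalog knows which individual owns each file


def plan_restore(entries, forgotten_ids):
    """Split catalog entries into files to restore and files to discard."""
    restore, discard = [], []
    for entry in entries:
        if entry.owner_id in forgotten_ids:
            discard.append(entry)  # would be sent to a null device, not production
        else:
            restore.append(entry)
    return restore, discard


catalog = [
    BackupEntry("/home/alice/report.docx", "alice"),
    BackupEntry("/home/bob/notes.txt", "bob"),
]
to_restore, to_discard = plan_restore(catalog, forgotten_ids={"bob"})
print([e.path for e in to_restore])  # ['/home/alice/report.docx']
```

Keeping the discarded entries in a separate list, rather than silently dropping them, also gives the organization a record it could use to show the request was honored.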
Ransomware and backup
Ransomware is constantly evolving, so tools for recovering from ransomware must evolve as well. Increasingly, ransomware developers are allowing their software to lie dormant on production storage for a month or two prior to activating. This dormancy means backup software repeatedly backs up the trigger file and the malware makes its way to multiple backup storage targets. When the trigger file detonates and IT uses data protection systems to salvage data, they also restore the backed-up trigger file, which, once restored, starts encrypting files again.
Backup software needs to access a threat list of known malware files during a restore and compare data being restored with that list to avoid recovering known ransomware files. The problem with all of this scanning during recovery is its impact on recovery time. The backup software now needs to check every file it recovers against two different lists, the list of forgotten users and the list of known threats, to confirm it is acceptable to recover the data.
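One common way to compare files against a threat list is to match content hashes. The sketch below assumes the threat list is a set of SHA-256 digests; the function names and sample data are hypothetical. Hash matching only catches known, unmodified files, so it illustrates the idea rather than a complete defense.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a file's contents as a hex string."""
    return hashlib.sha256(data).hexdigest()


def filter_known_threats(files: dict, threat_hashes: set) -> list:
    """Return the names of files whose contents are NOT on the threat list."""
    return [name for name, data in files.items()
            if sha256_hex(data) not in threat_hashes]


# Assumption: the threat list is distributed as a set of SHA-256 digests.
known_bad = {sha256_hex(b"malicious trigger payload")}

files = {
    "invoice.pdf": b"quarterly invoice",
    "update.exe": b"malicious trigger payload",
}
clean = filter_known_threats(files, known_bad)
print(clean)  # ['invoice.pdf']
```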
To do this effectively, data protection systems will need to make much better use of available processors. The process also requires much stronger metadata management. Interestingly enough, if the backup application uses scale-out storage, it has access to plenty of compute power. If it can tap into those resources, then the backup software may be able to perform these scans with minimal performance impact.
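As a rough illustration of spreading scan work across processors, the sketch below hashes files concurrently with Python's standard `concurrent.futures` module. The function names and the worker count are assumptions chosen for the example. A thread pool is used because CPython's `hashlib` releases the global interpreter lock while hashing large buffers; a real scale-out product would distribute this work across nodes instead.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def is_threat(data: bytes, threat_hashes: set) -> bool:
    """Check one file's contents against the set of known-bad digests."""
    return hashlib.sha256(data).hexdigest() in threat_hashes


def scan_parallel(files: dict, threat_hashes: set, workers: int = 4) -> dict:
    """Scan files concurrently; returns a mapping of name -> True if flagged."""
    names = list(files)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        flags = list(pool.map(lambda n: is_threat(files[n], threat_hashes), names))
    return dict(zip(names, flags))


threats = {hashlib.sha256(b"trigger").hexdigest()}
sample = {"a.doc": b"hello", "b.bin": b"trigger", "c.txt": b"world"}
report = scan_parallel(sample, threats)
print(report)  # {'a.doc': False, 'b.bin': True, 'c.txt': False}
```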
Alternatives to scanning
There are alternatives to scanning every restore, however. The first is to limit backup retention time to just a few days and archive all data. The archive would be used for long-term restores and the backup for rapid restores of recently backed up systems. Because the archive software natively copies data file by file, it can scan the data for ransomware trigger files and immediately eliminate any right-to-be-forgotten data as well.
The challenge with this approach is that archive software isn't designed to continuously scan systems for data to copy. Archiving is more of a scheduled, once-per-night approach. Archive software vendors will need to optimize their products for more efficient scans.
Another alternative to comprehensive scanning is to integrate backup and archive. Because the primary target of both ransomware and right-to-be-forgotten requests is unstructured (file) data, data protection systems should move away from image-based backups and back to file-by-file backups. Because the quantity of files continues to increase, the software has to be intelligent in the way it performs file-by-file backups so backup windows don't increase. With a granular understanding of the files being protected, the integrated backup and archive product can easily remove specific files from the protected set.
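To make the file-level idea concrete, here is a minimal sketch of a catalog that tracks each file by a (size, modification time) signature. The catalog layout and both function names are hypothetical. Copying only files whose signature changed is one simple way to keep backup windows short, and a per-file catalog makes removing a specific file a direct lookup.

```python
def files_needing_backup(current: dict, catalog: dict) -> list:
    """Return paths that are new or whose (size, mtime) differs from the catalog."""
    return sorted(path for path, sig in current.items()
                  if catalog.get(path) != sig)


def purge(catalog: dict, paths: list) -> list:
    """Remove specific files from the protected set; return those actually removed."""
    return [path for path in paths if catalog.pop(path, None) is not None]


# Assumption: each entry maps a path to (size in bytes, mtime as a Unix timestamp).
catalog = {
    "/data/a.txt": (120, 1700000000),
    "/data/b.txt": (300, 1700000500),
}
current = {
    "/data/a.txt": (120, 1700000000),  # unchanged, skipped
    "/data/b.txt": (310, 1700009000),  # modified
    "/data/c.txt": (50, 1700009100),   # new
}
changed = files_needing_backup(current, catalog)
removed = purge(catalog, ["/data/a.txt", "/data/missing.txt"])
print(changed)  # ['/data/b.txt', '/data/c.txt']
print(removed)  # ['/data/a.txt']
```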
Ransomware and data privacy are fundamentally changing the way backup and recovery occur. Scanning and verifying data during recovery is critical, but data protection systems need to make sure that they don't increase recovery windows as a result. Alternatives that move to an archiving strategy over backup show promise, but archive products need to make sure they optimize their own processes to truly succeed.