StorageCraft DRaaS outage highlights layered protection need
As a result of the StorageCraft outage, some of the vendor's DRaaS users lost the ability to fail over. Having a local backup is one method of protection against cloud outages.
Updated April 14, 2022
A backup and disaster recovery vendor's cloud outage caused by human error has analysts stressing the necessity for multiple points of data protection.
StorageCraft, an Arcserve company, reported an issue with its disaster recovery as a service (DRaaS) on March 9, according to Arcserve's status page. The StorageCraft DRaaS degradation affected regions in the U.S., Ireland and Australia. As of April 6, StorageCraft Cloud Services in Ireland and Australia still reported a partial outage. On April 8, StorageCraft Cloud Services reported an end to that outage.
"We are acutely aware of the importance of following protocol in the management of data to eliminate any type of error, including human, and are undertaking a complete internal review and audit," Arcserve CEO Brannon Lacey said in an email to SearchDisasterRecovery.
External factors such as cyber attacks can cause outages, so it's incumbent on vendors to minimize internal errors, said Christophe Bertrand, practice director at Enterprise Strategy Group (ESG), a division of TechTarget.
"This shows not just for Arcserve or StorageCraft, but for the industry in general, if you're delivering a service, whether it's an application, a workflow or a backup -- especially backup and disaster recovery -- the standards are really high to ensure that the service doesn't get interrupted," Bertrand said.
As a general rule, to protect against outages, data protection customers shouldn't put all their eggs in one basket, said Johnny Yu, research manager at IDC.
"This doesn't necessarily mean you need multiple backup vendors or you need to be using both a DRaaS and some separate backup system on top of it," Yu said. "But you should have contingencies."
StorageCraft DRaaS outage includes metadata issue
Engineers identified the cause of the issue on March 12, according to the incident report on Arcserve's status page. An update on April 6 said the vendor was "continuing to work on a fix for this issue."
During planned maintenance, an array of servers containing critical metadata was decommissioned prematurely, according to Lacey.
"As a result, some metadata was compromised, and necessary links between the storage environment and StorageCraft DRaaS cloud were permanently lost," Lacey said. "Impacted partners cannot replicate to or fail over machines in some of the StorageCraft Cloud Services data centers."
The vendor did not say how many customers were affected by the StorageCraft outage or how much data was lost. It has notified the affected partners.
"We have identified several recovery scenarios dependent on the type of machine that partners are running. These scenarios have been communicated to partners," Lacey said. "We are in the process of re-seeding machines and are continuously communicating progress to all partners."
The vendor did not provide a specific timetable for full recovery.
"We have gathered all necessary data of impacted machines to address the situation fully in an acceptable timeframe," Lacey said.
StorageCraft and Arcserve merged in 2021, bringing together two data protection product portfolios, including Arcserve's Unified Data Protection backup and recovery software, and StorageCraft's ShadowProtect data protection software and OneXafe scale-out storage and backup appliances.
Arcserve traditionally sold its products through value-added resellers, while the StorageCraft merger provided a managed service provider (MSP) ecosystem and SMB customers.
Smaller organizations without big IT departments rely on MSPs and backup as a service (BaaS) and DRaaS vendors, according to Bertrand.
"So you would expect a very high standard of service and management," said Bertrand, who previously worked in product marketing at Arcserve. "Human errors happen, clearly. But I feel that this is probably going to be something that puts a dent in their reputation.
"When the service becomes unavailable because of something that could have been prevented, to me that's a negative for the provider, there's no way around it."
On the positive side, Bertrand said the communication about the StorageCraft DRaaS outage has been clear and consistent, which is what customers should expect from a major service provider.
"I do hope it's an opportunity for them to shine in their recovery and be better in the future," Bertrand said.
3-2-1 backup, recovery testing among recommendations
BaaS and DRaaS use is on the rise, according to ESG's 2021 "Evolution of Data Protection Cloud Strategies" report.
In 2021, 69% of 381 organizations were using BaaS, compared with 59% in 2019 and 39% in 2016, the report stated. Meanwhile, 60% were using DRaaS, compared with 53% in 2019 and 39% in 2016.
Still, cloud service outages happen, said Krista Macomber, senior analyst at Evaluator Group.
"Just look at the large outage that AWS had late last year," Macomber said.
Yu said customers should follow the 3-2-1 backup strategy, which calls for three copies of data, stored on two different types of media, with one copy sent off site. In this age of ransomware, however, offline backups add another level of protection.
"Ransomware can happen at any time," Yu said. "You really need that offline copy as something to restore from."
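As a rough illustration of the rule Yu describes, the sketch below checks a list of data copies against the 3-2-1 criteria plus the offline copy. The `BackupCopy` type, its field names and the sample plan are hypothetical and do not come from any vendor's product; the sketch also assumes the production copy counts as one of the three.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    """One copy of the data (hypothetical model for illustration)."""
    name: str
    media: str            # e.g. "disk", "tape", "cloud"
    offsite: bool         # stored away from the primary site
    offline: bool = False  # disconnected from the network


def check_321(copies):
    """Return which parts of the 3-2-1 rule (plus offline) a plan satisfies."""
    return {
        "three_copies": len(copies) >= 3,
        "two_media_types": len({c.media for c in copies}) >= 2,
        "one_offsite": any(c.offsite for c in copies),
        "one_offline": any(c.offline for c in copies),
    }


# Assumed sample plan: production data, a local backup and a cloud replica.
plan = [
    BackupCopy("production", "disk", offsite=False),
    BackupCopy("local-backup", "disk", offsite=False),
    BackupCopy("cloud-replica", "cloud", offsite=True),
]

for rule, ok in check_321(plan).items():
    print(f"{rule}: {'ok' if ok else 'MISSING'}")
```

The sample plan passes the three 3-2-1 checks but fails the offline one, the gap Yu warns about in a ransomware scenario.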
Both Yu and Bertrand said it's important to have a local backup in addition to a cloud-based one.
"Should the [cloud] backup source become unavailable, at least you have a local copy for operational recovery," Bertrand said. "It doesn't have to be a lot of storage locally -- it could be just a few days, the last couple of backups, whatever is [needed for] operational recoveries."
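One way to picture the small local footprint Bertrand describes is a retention rule that keeps only the last few days of backups on site, with a floor on the number of copies retained. The sketch below is a minimal illustration under that assumption; the function name and the `keep_days` and `keep_min` parameters are hypothetical, and real backup products expose their own retention settings.

```python
from datetime import datetime, timedelta


def select_local_backups(backup_times, now, keep_days=3, keep_min=2):
    """Pick which local backups to retain, newest first.

    Keeps every backup taken within the last `keep_days` days. If fewer
    than `keep_min` fall in that window, keeps the `keep_min` most recent
    backups instead, so a local copy always remains for operational recovery.
    """
    ordered = sorted(backup_times, reverse=True)
    cutoff = now - timedelta(days=keep_days)
    recent = [t for t in ordered if t >= cutoff]
    if len(recent) < keep_min:
        return ordered[:keep_min]
    return recent


# Assumed schedule: one backup at 01:00 each night from April 8 to April 13.
nightly = [datetime(2022, 4, day, 1, 0) for day in range(8, 14)]
kept = select_local_backups(nightly, now=datetime(2022, 4, 14, 9, 0))
for t in kept:
    print("keep", t.isoformat())
```

Anything not returned by the function would be a candidate for removal from local storage, while the cloud copy holds the longer history.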
Macomber recommended documentation and regular testing of the recovery process, including DR.
"This will help to uncover any potential issues in recovering or in meeting RPOs and RTOs ahead of time so that they can be addressed," Macomber said.
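A recovery test of the kind Macomber recommends can be scored against the recovery point objective (RPO, the maximum tolerable window of data loss) and the recovery time objective (RTO, the maximum tolerable time to restore service). The sketch below is a simplified illustration; the function name and the sample figures are hypothetical and not taken from any organization's actual targets.

```python
from datetime import timedelta


def evaluate_dr_test(data_loss_window, recovery_duration, rpo, rto):
    """Compare the results of a recovery test against RPO and RTO targets.

    data_loss_window: time between the last usable backup and the failure.
    recovery_duration: time taken to bring the workload back in the test.
    """
    return {
        "rpo_met": data_loss_window <= rpo,
        "rto_met": recovery_duration <= rto,
    }


# Assumed targets and test results, for illustration only.
result = evaluate_dr_test(
    data_loss_window=timedelta(hours=6),
    recovery_duration=timedelta(hours=3),
    rpo=timedelta(hours=4),
    rto=timedelta(hours=8),
)
print(result)
```

In this made-up test the restore finished within the RTO, but the newest usable backup was older than the RPO allows, which is the sort of issue regular testing is meant to surface before a real outage.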