The Role of Flash in Enterprise Data Protection
IDC research director Phil Goodwin delves into how customers are using flash to meet evolving data protection requirements, the business benefits this brings to the table and more.
0:00:19.9 Phil Goodwin: Hello everyone. Good afternoon and welcome to this session on the role of flash in data protection. And I welcome you also to the Flash Memory Summit. I know you've been enjoying a lot of different content from various people, including my colleagues here at IDC. My name is Phil Goodwin, I'm a research director at IDC, and I cover multi-cloud data management and protection. And, of course, for today that focus is really going to be on the data protection side of it.
Now, it's not immediately intuitive that flash has a role in data protection because the immediate thought is, "Well, it's pretty expensive. Do we really care about low latency with respect to data protection and so forth?" But I am here to tell you that there are some specific use cases where flash can play a very important role in data protection. Now, to build that case for you, I'm going to go through really four primary areas, and that includes the state of data protection. I realize we have a mix here in the audience of both IT practitioners as well as IT suppliers, and I want to keep it relevant for both. The state of data protection is a little bit of an inside baseball thing if you're an IT practitioner, but I hope you find it interesting. Anyway, it is kind of interesting.
0:01:38.2 PG: And then also, I'll talk about some of the unique user requirements that we're really seeing right now with respect to data protection, and then I'll get into the specific roles of flash, and I'm going to take a use case approach to how flash can assist in a data protection environment, and then I'll talk about the deployment models where we most commonly see flash used in the field. And, of course, no presentation would be complete without IDC's guidance at the end, so I'll have some key guidance for you at that point.
0:02:12.6 PG: IDC divides the data protection market really into three large buckets -- I kind of like to call this the universe of data protection -- and the first two buckets you see here on the screen, one is data replication and protection. Now this includes the traditional data protection software, backup and recovery, as well as replication software, such as clones, snapshots, mirrors, things like that, all gets built into this particular market. You can see here that it was $94 billion in 2019 and grew at a rather robust 4.6%. That's a pretty good growth rate for a market that's been around for a long, long time, often considered to be mature, but is surprisingly dynamic today.
The second part of that is what we call data protection as a service, and this is the DR as a service, archive as a service, backup-as-a-service-type of market, and it's an area which we see growing at about 14.4%. Currently, we expect it to be about a $7 billion market in 2020, but here's the interesting thing from a COVID-19 perspective -- the data replication and protection market seems to have been hit pretty hard. In fact, we're now forecasting that the growth rate for that market is going to drop to actually a negative 0.8% through 2024.
0:03:35.9 PG: But I think the beneficiary of that is the data protection as a service, where we believe that growth rate is going to increase to about 16% compound annual growth rate per year as organizations accelerate their move to cloud and utilize cloud for additional capabilities, and we're going to talk about some of those use cases here in just a slide or two.
0:03:58.1 PG: The third part that we look at within the data protection market -- the universe that I talked about -- is purpose-built backup appliances. Now, these are appliances, as the name implies, that are used most commonly as a backup target; originally, they were really a replacement for tape. They've certainly evolved far beyond that today, but you can kind of think of that as the place where the data replication and protection software actually moves the data. They come in two flavors: one is a pure target device, in other words, one that supports multiple third-party data protection and backup and recovery applications, and they're really a direct target for that software.
0:04:43.7 PG: And then the other is what we call an integrated appliance, and that's where the backup and recovery software is integrated directly with the appliances. It's bundled and sold as that bundle, and so it typically will support one vendor only, which is the vendor that supplies it, which is usually a data protection supplier of backup and recovery software. That market was about $4.4 billion last year. The interesting thing about it in a COVID-19 world is I had actually forecast that it was going to decline this year and in future years. It has remained remarkably steady; it's clearly continued to solve key problems, especially for on-premises backup and recovery, and so that market continues to be very strong.
0:05:30.0 PG: Now, I'd like to start talking about some of the key issues that IT organizations and IT practitioners are having to deal with. And the first one that you see here is data sprawl. Data sprawl is created by two key factors -- one is simply the growth of data. According to our research, data is growing at about 35% to 45% per year, depending upon the industry, the size of the organization and other factors, but what that really translates to is that organizations' data is doubling every two to three years. Now, petabyte-scale repositories are no longer all that uncommon, so if you imagine that kind of scale -- whether it's hundreds of terabytes or into the petabytes, doubling every two to three years -- just the sheer volume of data is a problem, but combine that with the fact that at IDC we forecast as many applications will be deployed within the next few years as have been deployed in the previous 40.
0:06:33.2 PG: About 80% of those are going to be at the cloud or at the edge. In other words, only about 20% in the traditional data center. So, when you have all those different applications generating data in a ROBO [remote office/branch office] environment or IoT applications, out in the cloud through cloud-native storage objects and so on and so forth, what you have is data from the organization, the enterprise, spread across the core, edge and cloud in different repositories, different geographies, different data types, different use cases and so forth. And just the sheer logistics of how you capture that data, how you store that data, how you protect that data, how you apply policies to that data becomes a serious problem, so data sprawl is one of the major things that we see organizations dealing with, and flash can actually play a role in assisting with that as we'll see in a few moments.
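The doubling figure quoted for data growth can be sanity-checked with compound-growth arithmetic. The sketch below is only an illustration, assuming a constant annual growth rate; the function name `doubling_time_years` is made up for this example and does not come from the talk.

```python
import math


def doubling_time_years(annual_growth_rate: float) -> float:
    """Years for a data set to double at a constant compound annual growth rate.

    Assumes growth compounds once per year at a fixed rate, which is a
    simplification of real-world data growth.
    """
    return math.log(2) / math.log(1 + annual_growth_rate)


# The 35% and 45% rates are the ones quoted in the talk.
for rate in (0.35, 0.45):
    years = doubling_time_years(rate)
    print(f"{rate:.0%} growth per year -> doubles in about {years:.1f} years")
```

Under these assumptions, 35% annual growth doubles the data in roughly 2.3 years and 45% in roughly 1.9 years, which is in line with the "every two to three years" rule of thumb.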
The second part that has been a dynamic, certainly as a result of the COVID-19 pandemic, is the work-from-home phenomenon, and we believe that the work from home has fundamentally transformed the workforce.
0:07:43.2 PG: Yes, people are going to be going back to the office, but more and more people are simply going to stay working from home. What that caused IT organizations to do is really to adapt their infrastructure to support large contingents of work-from-home workers. When we look at some of the spending trends among organizations at IDC, what we find is that organizations are continuing to invest in those kinds of systems that really support the work-from-home users, so networks, backup and recovery, devices such as laptops, desktops, printers -- you can go down the list -- and organizations are continuing to invest in that area. As a result, protecting those mobile devices has become particularly important.
0:08:30.5 PG: More and more data is being moved outside the walls of the organization, whether it's the data center or simply a physical office, and that data needs to be protected both from a security standpoint as well as from a backup and recovery standpoint, and that has put additional pressure on IT organizations. So, user self-service has emerged as a key capability for a lot of organizations when it comes to data backup and recovery. Organizations really want to move as much of that effort as they can either to the end user or to service providers, and in that regard, what we're finding is that the common backup models are through things like SharePoint, OneDrive, iCloud, so on and so forth, other file sync and share services that are out there, and there are many of them that do a very good job of providing this kind of capability. But in addition, what we find is organizations are increasingly moving to a backup-as-a-service model for their end-user devices. Many of those can be done very transparently to the IT organization by outsourcing that kind of effort to a provider. It allows organizations to free up their IT resources to pay attention to other . . . to take on other tasks while making sure the data is properly protected.
0:09:53.7 PG: The third area that IT organizations are particularly dealing with is ransomware and malware, and I believe that it is the highest risk element to data loss. Certainly, we have human error, we continue to have natural disasters, fires, hurricanes, all that kind of fun stuff. But it really is the ransomware and the malware that is the biggest threat to organizations. In fact, in surveys that we do, data security is repeatedly the No. 1 concern over and over in these surveys. Data protection is right behind it, often it's No. 2 or No. 3. So, what we're finding is those two things are going hand-in-hand with organizations where they are, in my opinion, conflating data protection with data security, really lumping them under one element. But I'd consider it really as guarding the front door, which is data security, and then guarding the back door, which is recovery in the event of some kind of malware intrusion.
0:10:56.0 PG: We also find that ransomware is adapting. Originally, you could detect ransomware because I/O activity suddenly shot up. Well, the bad guys, of course, figured that kind of stuff out. And, so, they've also learned that you attack the backup first. If you can attack the backup, then it makes recovery almost impossible, and therefore increases the chance that you're going to have to pay the ransom in order to get your data back.
We also find that some of these systems are now infecting incrementally, meaning very slowly over time, they will simply add extensions to file names, for example, maybe encrypt individual files or small numbers of files until they finally reach critical mass and then become triggered. Where flash can come into this role is really in being able to facilitate that detection and some of the processing that goes on with finding those kinds of malware capabilities.
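One simple way to picture the incremental pattern described above is to compare the file listings of two consecutive backup snapshots and count how many files reappeared with an extra extension. The sketch below is a toy illustration under that assumption, not how any particular product detects ransomware; the function names, the `.locked` suffix and the 5% threshold are all hypothetical.

```python
def appended_extension_ratio(previous: set, current: set) -> float:
    """Fraction of previously seen files that now exist only with an extra suffix.

    `previous` and `current` are sets of file paths taken from two snapshots.
    """
    if not previous:
        return 0.0
    missing = previous - current   # paths that disappeared
    added = current - previous     # paths that are new in this snapshot
    renamed = 0
    for old_path in missing:
        # Suspicious: the old path vanished and a new path appeared that is
        # the old path plus one more extension (report.docx -> report.docx.locked).
        if any(new_path.startswith(old_path + ".") for new_path in added):
            renamed += 1
    return renamed / len(previous)


def is_suspicious(previous: set, current: set, threshold: float = 0.05) -> bool:
    """Flag a snapshot pair when the renamed fraction reaches the (assumed) threshold."""
    return appended_extension_ratio(previous, current) >= threshold


yesterday = {"a.docx", "b.xlsx", "c.pdf", "d.txt"}
today = {"a.docx.locked", "b.xlsx", "c.pdf", "d.txt"}
print(appended_extension_ratio(yesterday, today))  # 0.25
print(is_suspicious(yesterday, today))             # True
```

A slow infection keeps each day's ratio small, so a real detector would more likely track the trend across many snapshots than rely on a single comparison.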
0:11:55.0 PG: So, AI, as you have certainly been hearing throughout this conference, I'm sure, plays an important role in this regard. Now, I want to take you through some of the industry trends that we're seeing that are driving data protection, and the first one is what I call the race to zero, which means if we talk about SLAs or RPO and RTO, what we're really talking about in the race to zero is zero data loss and zero downtime or zero RPO, zero RTO, and as we're getting closer and closer to that, I do believe that we will be able to achieve that and we're starting to see some of the technologies that are capable of doing just that.
0:12:35.0 PG: But even today, the best practice RPO, recovery point objective, is about 15 minutes, and the RTO, recovery time objective, is typically somewhere between an hour and four hours, depending on the size of the system -- varies a little bit if you're getting into disaster recovery, and of course, the nature and the size of the data loss and so forth, but more and more we're moving to systems that can do that kind of almost instant recovery.
0:13:03.1 PG: The second element is container data protection, and containers are different in that you not only have to protect persistent data, but you also now need to protect persistent containers. So, you recover the container around the data, but moreover Kubernetes is actually very dynamic, and so you may not be able to restore containers or the data to a specific point in time, several months previously, if you're not able to restore Kubernetes to that state as well, so we've moved from a time-based architecture to an event-based architecture, as well as moving closer and closer to the application. In many cases, now it's the DevOps guys who need to do this implementation, not the backup admin or somebody like that. Emerging to meet some of these needs are things like continuous data protection, and these are usually instantiated as journal-based backup and recovery applications, which actually give incredibly granular recovery points, often sub-minute, maybe even down to individual transactions and so forth.
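To make the RPO figure concrete: the data at risk in a failure is whatever was written after the last usable recovery point. The sketch below is a minimal illustration, assuming recovery points are simply a list of timestamps; the function names and the sample schedule are hypothetical.

```python
from datetime import datetime, timedelta


def data_loss_window(recovery_points, failure_time):
    """Time between the last recovery point at or before the failure and the failure.

    Returns None when no recovery point exists before the failure.
    """
    usable = [point for point in recovery_points if point <= failure_time]
    if not usable:
        return None
    return failure_time - max(usable)


def meets_rpo(recovery_points, failure_time, rpo=timedelta(minutes=15)):
    """True when the data loss window fits inside the RPO (15 minutes by default)."""
    window = data_loss_window(recovery_points, failure_time)
    return window is not None and window <= rpo


start = datetime(2020, 11, 10, 9, 0)
failure = datetime(2020, 11, 10, 9, 40)

every_15_min = [start + timedelta(minutes=15 * i) for i in range(4)]  # 09:00 to 09:45
hourly = [start + timedelta(hours=i) for i in range(2)]               # 09:00, 10:00

print(meets_rpo(every_15_min, failure))  # True: last point 09:30, 10 minutes lost
print(meets_rpo(hourly, failure))        # False: last point 09:00, 40 minutes lost
```

RTO is a separate measure: it covers how long the restore itself takes, which this sketch does not model.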
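The journal-based approach mentioned above can be sketched as a base copy plus an ordered log of writes, where a restore replays the log up to the chosen point. This is a simplified illustration and not any vendor's implementation; `JournalEntry`, `restore_to` and the sample keys are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class JournalEntry:
    sequence: int          # monotonically increasing write order
    key: str               # item that was written
    value: Optional[str]   # new value, or None to record a delete


def restore_to(base: dict, journal: list, target_sequence: int) -> dict:
    """Rebuild state by replaying journal entries up to and including target_sequence."""
    state = dict(base)  # never mutate the base copy
    for entry in sorted(journal, key=lambda e: e.sequence):
        if entry.sequence > target_sequence:
            break
        if entry.value is None:
            state.pop(entry.key, None)
        else:
            state[entry.key] = entry.value
    return state


base_copy = {"balance": "100"}
journal = [
    JournalEntry(1, "balance", "90"),
    JournalEntry(2, "order", "A"),
    JournalEntry(3, "order", None),  # the order is deleted
]

print(restore_to(base_copy, journal, 2))  # {'balance': '90', 'order': 'A'}
print(restore_to(base_copy, journal, 3))  # {'balance': '90'}
```

Because every write is a recovery point, the granularity is one journal entry, which is what allows recovery down to an individual transaction.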
0:14:15.0 PG: And then, of course, artificial intelligence is emerging as a way to facilitate not just the ransomware that I talked about earlier, but also how to more intelligently back up and detect and determine what to do with data and how to manage that data appropriately.
0:14:32.2 PG: So, let's talk about four key customer requirements. Requirement No. 1 is rapid reliable backup. Now, that may sound obvious, and perhaps it is, but in surveys that I do, I'm still surprised at the number of organizations that have a 25% failure rate on a daily basis of their backup and recovery jobs. IDC's best practice -- just to throw it out there -- is a 96% success rate. So, if you're at 75%, honestly, you're at significant risk for data loss, and those organizations that have that kind of risk simply are going to fall behind competitors who are able to give their organizations better data availability.
The second requirement is instant restores, and I say instant because that really means in a matter of minutes, anywhere from two minutes to 10 minutes or so, to be able to do a recovery of things like virtual machine images or virtual desktop images, sometimes individual files and backups. You generally can't get instant restore for very large volumes of data, but of course, that's not what you do most of the time. Most of the time, it is these relatively small recoveries that are required, and it is that instant restore that organizations value.
0:15:49.0 PG: The third is work-from-home data protection that we talked about, being able to automate for the end user their backups to be sure that data is backed up from nonprofessional or non-IT professional users who need to have that kind of capability without having to understand the technology behind it.
0:16:08.0 PG: Then the fourth that I talked about a little bit earlier is the user self-service, but in this case, there are actually other use cases as well. So, for example, DevOps teams being able to make an instant clone of a database or of a file system or something, so they can advance their DevOps and speed up the time to market that they have for their applications.
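To put the backup success rates from the first requirement in perspective, the sketch below compares a 75% success rate with the 96% best practice. It assumes each nightly job fails independently with the same probability, which is a simplifying assumption not stated in the talk; the function names and the 1,000-job figure are hypothetical.

```python
def expected_failed_jobs(jobs_per_day: int, success_rate: float) -> float:
    """Average number of backup jobs that fail on a given day."""
    return jobs_per_day * (1 - success_rate)


def prob_consecutive_failures(success_rate: float, days: int) -> float:
    """Chance that one system's backup fails on every one of `days` consecutive days.

    Assumes failures are independent from day to day (illustrative only).
    """
    return (1 - success_rate) ** days


for rate in (0.75, 0.96):
    failed = expected_failed_jobs(1000, rate)
    two_misses = prob_consecutive_failures(rate, 2)
    print(f"{rate:.0%} success: about {failed:.0f} of 1,000 jobs fail per day, "
          f"{two_misses:.2%} chance of two missed days in a row")
```

Under these assumptions, a 75% success rate leaves about 250 of 1,000 jobs unprotected each day, against about 40 at 96%, and a given system is far more likely to miss two days running.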
So, now let's talk about, specifically about some of the roles of flash in data protection, and this gets back to some of the requirements that I had in just a prior slide, and the No. 1 requirement here or capability is really to be able to pin an image within a flash . . . to your flash component within the array, so that you can supply virtual machine images and other individual file recoveries and so forth, or the database image really, in a matter of seconds or, worst case, a matter of minutes.
0:17:07.7 PG: The second is, as I've talked about the DevOps, and this is one where DevOps teams often want to be able to do recursive testing, so they're constantly refreshing their image of the database. And what we found in our surveys is that on hard disk-based systems, a typical refresh of a database is often a four- to eight-hour process, whereas by being able to go to flash, that can often be reduced down to minutes. That's a big deal for DataOps people who can do that kind of imaging rapidly for their testing. The other thing is, it often has very minimal impact on production systems, whereas the hard drive systems, because of the nature of the way you create clones, can have a significant impact.
0:17:57.3 PG: The third is artificial intelligence. Again, this is ransomware and malware, and I like to say tongue in cheek that if you really want to find a good AI developer, talk to the ransomware guys. It is a bit of an arms race between the bad guys and the good guys in terms of artificial intelligence and how to detect them while they're trying to figure out how to avoid being detected, and flash can play a role in there by being able to facilitate the real-time analysis behind data security to make those kind of judgments rapidly and to initiate action in the event of suspicious activity.
0:18:34.2 PG: Then the fourth is rapid data ingest. When you think about the data sprawl that I showed earlier, the data needs to be able to move from the edge to the core or the edge to the cloud, or from the core to the cloud, and so on and so forth, so that rapid data ingest that can be facilitated by flash can be incredibly valuable for certain applications that have that kind of requirement. Also, if you can do faster, more frequent backups, you significantly reduce the risk that you have of data loss, and candidly, you're probably going to also improve the reliability of your backups when you're using flash systems.
0:19:14.3 PG: There are several deployment models that organizations use, and the No. 1 deployment model is the purpose-built backup appliance. There's also a general-purpose disk that people use in order to facilitate their backups as a backup target, but for both of these, the most common instantiation is as a flash tier, so there'll be say roughly 5% or so of the total capacity of the appliance is dedicated to flash in order to give the benefits of rapid backup ingest, rapid restore -- those kind of things that I've talked about in the prior slide. We are seeing a trend, though, towards all-flash arrays being used for backup and recovery.
0:19:58.2 PG: Now, you might say you really don't need that for all the backups, but I can tell you when you need to do fast restores, all-flash arrays can make a huge difference. The other area where they make a big difference is that backup systems are no longer being used just for recovery when bad things happen, but actually to leverage that data for secondary uses such as DevOps or analytics or test/dev and many other functions as well. And that's where an all-flash platform can really serve multiple purposes at the same time.
So, let me wrap up here with some key guidance, and I have three things for you. The first is the world is moving to flash, even for data protection. Right now, it's mostly as a tier, but I think more and more you're going to see the all-flash arrays being deployed. Second thing to remember is, when it comes to backup and recovery and data protection, it's all about the restore.
0:20:56.2 PG: Yeah, backup is important, but it's axiomatic to say that it's the restore that is critical and more and more users are needing and demanding instant restore for critical systems.
And then the third is that secondary use cases have become an important part of many organizations and their decisions in deploying backup and data protection systems.
0:21:27.6 PG: And with that, I hope that this information has been helpful, I think . . . I hope it's given you some food for thought. I always love hearing feedback from people, and you can see my contact information here, so feel free to drop me a line. Let me know how you're doing or if you have any questions, I'd be happy to chat. Have a good day.