Panzura CEO Patrick Harr said he expects to see a significant shift in the way customers deploy his company's Freedom family of NAS, archive and collaboration products designed for cloud storage.
Harr estimated that more than 90% of Panzura storage customers have used the company's physical appliances. But he predicted about 40% would choose software options by the end of 2017. He said he expects most of that group to run the Freedom software-defined storage in VMware virtual machines on premises, and others to deploy the software directly in Amazon or Azure public clouds on commodity servers.
"It's really the use case of lift and shift into the cloud, where they don't want to rewrite the application," Harr said. "They still want to use CIFS and NFS, and so they're using us as the file system that's talking to the object store natively in Amazon. We use some advanced capabilities that Amazon doesn't do on its own."
Harr discussed industry and Panzura storage trends during a recent interview with SearchStorage shortly after he marked his one-year anniversary as Panzura CEO and the company released the seventh version of its distributed file system. He noted the company secured $32 million in funding this year to pursue its vision.
What are the biggest trends you're seeing with Panzura storage customers?
Patrick Harr: The customers we're working with right now are picking their cloud providers. They're literally saying, 'I'm not doing any more of that same NAS strategy. I'm just not doing it.'
The second trend is hybrid cloud is now reality. There was a lot of marketing discussion in the past, but we didn't see a tremendous amount of real deployment. That is a definite change happening in the industry.
The third key trend that's very important for us and in the industry in general is the significant growth of unstructured data. There's a mass amount of data. So, what do they do with that data? The unstructured data growth is placing significant stress on traditional storage systems.
We've been hearing about unstructured data growth for years. What's new here?
Harr: We have 4K video. We've got IoT [internet of things]. We've got just mass amounts of data being created. The question is: Do I keep doing what I was doing? Or, can I take advantage of something new? Security was always this big hindrance of moving into the cloud. I don't think that's the issue anymore. Now, it comes down to, can I still accommodate performance?
With flash coming down precipitously in price, that gives us an opportunity to put flash at the edge to deliver performance, and with that, take advantage of cloud from the scale side. I don't think that's been there before. As we've seen in the last three years, flash has been the focus on the structured side. Taking advantage of it for unstructured data is not something we've focused on because of the cost.
How has the flash story changed in the last year with Panzura storage?
Harr: Before, we used to have hybrid systems, meaning flash and disk. Now, we use all flash. The other thing that we have now taken advantage of is NVMe [nonvolatile memory express]. We can do high-performance acceleration for NFS, which has opened up or expanded our workflows that we can support. Now, we have high-performance CIFS delivery, as well as NFS delivery.
Do you support NVMe in your Dell appliances?
Harr: Yes, there is a sys log there that provides the acceleration ... And then our file system and the OS is what supports the NVMe. It's our software that drives that.
What sort of performance difference are you seeing with NVMe-based SSDs?
Harr: At a top level, we're seeing about a 10x performance gain there over using standard SAS SSDs.
Do many of your customers use NVMe-based SSDs?
Harr: There are vertical market spaces that are important for us -- life science, genomics, seismic data, even in video processing in the media and entertainment space. We're beginning to support this, so we can provide higher-performance support for those applications.
Are customers willing to spend the extra money on NVMe-based flash drives?
Harr: Absolutely, because our model is not to support 100% on flash. Based off this unstructured data set, 90% of that [data] becomes cold after six months. So, this means we are only caching the active data set, which typically is 5% to 10%, and keeping that on the flash and the NVMe. The rest is stored in object storage, which is very inexpensive. It's very scalable.
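To make the caching idea concrete, here is a minimal sketch of an age-based tiering rule of the kind Harr describes: data untouched for six months is treated as cold and belongs in object storage, while the rest stays in the flash cache. This is not Panzura's implementation. The `FileRecord` type, the function names and the 182-day threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple

# "Six months" from the interview, approximated as 182 days (an assumption).
COLD_AFTER = timedelta(days=182)


@dataclass
class FileRecord:
    """Hypothetical metadata record for one file in the namespace."""
    path: str
    size_bytes: int
    last_access: datetime


def split_by_temperature(
    files: List[FileRecord],
    now: datetime,
    cold_after: timedelta = COLD_AFTER,
) -> Tuple[List[FileRecord], List[FileRecord]]:
    """Return (hot, cold): hot stays on flash, cold goes to object storage."""
    hot: List[FileRecord] = []
    cold: List[FileRecord] = []
    for f in files:
        if now - f.last_access >= cold_after:
            cold.append(f)
        else:
            hot.append(f)
    return hot, cold


def hot_fraction(
    files: List[FileRecord],
    now: datetime,
    cold_after: timedelta = COLD_AFTER,
) -> float:
    """Share of total bytes that would need to live in the flash cache."""
    total = sum(f.size_bytes for f in files)
    if total == 0:
        return 0.0
    hot, _ = split_by_temperature(files, now, cold_after)
    return sum(f.size_bytes for f in hot) / total
```

Running `hot_fraction` over a file inventory gives a rough estimate of how much flash a deployment would need under this rule. Harr's figure for the active set is typically 5% to 10%.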
Which object storage vendors do you see most among your customer base?
Harr: IBM Cloud Object Storage, formerly Cleversafe, is No. 1. We have a very good relationship with them, and we're probably in eight of their top 10 customers because they do still require file services. We also partner with Dell EMC ECS [Elastic Cloud Storage]. That's the No. 2 provider that we see. And then you have smaller ones from Western Digital to HDS [Hitachi Data Systems] to Scality, etc.
Everyone kept saying that object storage is the Holy Grail, and they're going to immediately start moving directly into object storage because of the [low] cost, the economic scale and the durability side. But those workloads are still CIFS- [and] NFS-based, and we bridge that gap because we don't require a lift-and-shift or a rewrite.
Do most Panzura storage customers keep cold data on premises or in a public cloud?
Harr: We drive in excess of about 2.5 PB of new storage a month, and 75% of that data lands on a public cloud. We support the major public clouds. Amazon's the No. 1 data provider that our customers use. We also support Azure, Google, IBM SoftLayer, which is now called Bluemix, and Alibaba.
We've seen Azure pick up steam as Microsoft has continued to expand their footprint down in the channel. As a result, customers now have Azure credit, and they're beginning to utilize those credits and aggressively move toward that.
What data types tend to stay in private clouds? Video, for instance, would require a big pipe to get to the public cloud.
Harr: Video can be certainly one of those data types. There tends to be a threshold at very large data sets. I can move to above 10 [PB], 15 PB, and that's when they begin to look at the economics of that, because it still can be cheaper just to store that internally. It really depends on what the customer wants to do.
What trends are you seeing with hybrid cloud, when customers store some data on premises and some in the public cloud?
Harr: You're still going to have workloads that stay on prem. Economics still does play a factor, because if you're very large-scale and can amortize the cost of that, there's still something to be said about [having] both storage and compute on prem. The key thing is customers still want the flexibility of what they're seeing from an API standpoint in the cloud.
What trends do you expect to see two or three years down the road with private versus public cloud storage?
Harr: I break this into new workloads versus existing workloads. I think, more and more, you're going to see new workloads as cloud data workloads, and more and more will run those in the public cloud based on the flexibility it provides.
Second, on the traditional workloads, I think today 90% to 95% of those still are on prem. And we will see a lift and shift of those, because we're working with some of those customers. We just closed an opportunity, where they've lifted and shifted into Azure. So, I would predict that you're going to see instead of 90% to 95% [on premises], we're going to move into 60% to 65% on prem. You are going to see a pretty significant shift into the public side.
But I would also say in that same breath that we're seeing more hybrid from a deployment standpoint. With those partnerships we have either with VMware or with Red Hat, and what others are doing in hybrid cloud, it makes it much more of a viable model. That means 100% of those workloads are not going into the public cloud as others have predicted in the past. I think you're just going to have a balance there, and it will ultimately be probably in the 70-30 realm.
What are your thoughts on customers using multiple clouds and the potential for creating data silos?
Harr: We get asked from our customers right now for multicloud strategies. That's really been driven by more of a DR [disaster recovery] type of a focus, where I want to have data on one cloud and, if something happens, I can immediately start using additional clouds. That is one strategy that customers can entertain, but it does come down to economics and what a customer's willing to pay, because you are maintaining two copies of that data.
I do believe as we move forward, though, that multicloud will be extremely beneficial to customers not necessarily in that paradigm, storing data on two different clouds, but rather taking advantage of the compute and the data [storage] based off the SLA [service-level agreement] that Amazon provides versus the compute and data [storage] that Google provides.
For example, Google is better at machine learning, and it has very low latency that works. It can also be beneficial for analytics. Do I have the opportunity to have a copy of data over there, to take advantage of what they do well? And then in Azure's case, [can I] take advantage of the .NET capabilities of what they do well?
Over time, basically, you're going to use the right cloud based off the right workload and the right SLA that it provides. And as a preview, we've been working on a hyper-converged cloud platform that will virtualize data across those clouds and be able to send the data to any one of those clouds based off that workload requirement. So, we absolutely believe that will be more prevalent in the future.
Why would they need Panzura?
Harr: Because you get into effectively a 'Hotel California' problem. Let's think about it: Once you get into a particular cloud, it's very difficult to get out, because you have such mass amounts of data there. Based off physics, it's difficult to move.
And it can be expensive.
Harr: Absolutely. And, in turn, I want to have the ability to have a policy to move data from X to Y. We do this currently with customers where they started off on [EMC] Atmos, for example. Now, they want to move off. They move into the new version of EMC's platform, or they want to move from Amazon to Azure based off the credits that Azure is offering.
So, the customers want that flexibility, but if you're hardcoded to the API set within Amazon, it's very difficult to do that. We believe, if you have a virtualized platform that sits on top of those, you have the ability to move that data and that workload based off the SLA for the customer.
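The portability argument can be sketched in a few lines: if applications talk to a thin, provider-neutral interface rather than one cloud's API, moving data becomes a loop over that interface. The sketch below is illustrative only and does not reflect Panzura's platform. `ObjectStore`, `InMemoryStore` and `migrate` are hypothetical names, and the in-memory class stands in for real cloud back ends so that no vendor SDK calls are implied.

```python
from typing import Dict, Iterable, Protocol


class ObjectStore(Protocol):
    """Hypothetical provider-neutral interface; real back ends would adapt to it."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...
    def keys(self) -> Iterable[str]: ...
    def delete(self, key: str) -> None: ...


class InMemoryStore:
    """Stand-in for a cloud object store, used here only for illustration."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def keys(self) -> Iterable[str]:
        return list(self._objects)

    def delete(self, key: str) -> None:
        del self._objects[key]


def migrate(src: ObjectStore, dst: ObjectStore, delete_source: bool = False) -> int:
    """Copy every object from src to dst, verify each copy, and return the count."""
    moved = 0
    for key in list(src.keys()):
        data = src.get(key)
        dst.put(key, data)
        # Verify the copy before optionally removing the original.
        if dst.get(key) != data:
            raise IOError(f"verification failed for {key!r}")
        if delete_source:
            src.delete(key)
        moved += 1
    return moved
```

Because `migrate` depends only on the interface, the same policy code would work whichever provider sits behind `src` and `dst`. It does not remove the transfer time and cost Harr mentions for very large data sets.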