Guest Post

What Do Users Need to Know About Next-Generation Form Factors?

Learn about challenges organizations face in terms of storage performance and meeting application needs while juggling other responsibilities, including managing budgets and rack space.

00:11 Keith Parker: Thank you everyone for joining us on this panel, where we're going to go ahead and discuss how storage is really changing what can be done. My name is Keith Parker, and I'm the director of product marketing here at Pavilion, and I'm honored to be joined by some experts in the industry. We have Mr. Kevin Tubbs from Penguin Computing, Marc Staimer, independent analyst from Dragon Slayer Consulting, and of course Costa Hasapopoulos, the chief field technical officer from Pavilion.

And with that, we'll go ahead and kick this off. And I'd actually like to start with you, Marc, if I could. So, as an analyst you've talked to a lot of customers out there, and I know that one of the things you talk to them about is some of the challenges they face in terms of getting performance and meeting their application needs, and doing that while still balancing all their other problems, such as how much rack space they have, and balancing budgets and everything else that goes on. Could you talk a little bit about what some of those challenges are?

01:12 Marc Staimer: Sure. You kind of nailed it with your introduction right there. If you look at what they're trying to do, they're running into problems with a number of factors; one is latency. And when it comes to performance, people look at IOPS, they look at throughput, but at the end of the day, latency is a key factor, because latency affects IOPS, and there are two aspects of latency that come into play. There's first byte latency. And there's tail latency. Now, this is a big factor in high-performance computing and in a number of applications, like MPIO applications, where they have to synchronize the different applications at the end, especially in the messaging aspect of it. But, in reality, it affects every application.

01:54 MS: When you talk about databases and you talk about performance, latency is a key factor. The next key factor they run into from a performance perspective is throughput more than anything else, because, when you start looking at analytics today -- and I'm talking big data analytics -- I'm talking data warehouse analytics, I'm talking AI analytics, machine learning, deep machine learning. It's all about how much data can you ingest and analyze at one time? That's throughput. Few people look at that number except the ones who are doing the analytics, and most storage systems are not designed for that. And the last thing which you brought up, which is very important, is rack space in the data center. It is a precious commodity.

02:37 MS: In almost every data center anywhere, the way that they allocate the cost of the data center is by rack space, by rack units. And so, the more rack units you take up to solve your storage issue or whatever issue you have, the more cost is associated. But all that comes down to one very important factor -- it's called time. Everything we're talking about comes down to time: time to market, time to completion, time to move on to the next project. Time is the big factor, but when you look at time . . . Let's just take one: time to market. Time to market can generate more revenue if you get there early than if you get there on time. And if you're late, you can lose tremendous amounts of revenue and market share by not getting there on time. So, time is the key factor in all of this. But all the things I just mentioned are key problems in every data center today.

03:34 KP: Great. Thanks Marc. So, with that in mind, and talking about time and how people are dealing with these challenges, Costa could you talk a little bit about how the Pavilion HyperParallel Flash Array is addressing some of those, and will you just kind of tell us what it is?

03:49 Costa Hasapopoulos: Yeah, Keith, thanks, and Marc, that's a good lead-in, because that's really a great question. So, there are multiple reasons why we classify our solution as hyperparallel and as a platform. So, I think we'll start with hyperparallel hardware. And, remember when I talk about . . . We're talking about a four rack unit form factor, kind of what you've talked about, Marc, a very dense form factor, and in that hyperparallel hardware, we're talking 20 independent controllers, 72 NVMe drives delivering over two petabytes of capacity, 40 100-gig Ethernet or InfiniBand ports, and now we even have 200 gigabit InfiniBand available, right?

04:26 CH: All this is connected together with our internal 6.4 terabit PCIe switch-based architecture, all sharing these hardware components, connecting at nanosecond speeds inside our box. So, architecturally, with that hyperparallel architecture, we look more like a switch than a traditional storage array. Hardware-wise, we can start with a couple of controllers and maybe a couple of hundred gig . . . Four gig . . . A hundred gigabit Ethernet ports, 18 drives, and scale those independently all the way up to 20 controllers, 72 drives and 40 100-gig Ethernet or InfiniBand ports, right? So, the hyperparallel hardware architecture enables the hyperparallel platform.

05:11 CH: Because of the nature of our scale-out hardware architecture in our box, connected via our high-speed internal network, our PCIe network, and coupled with some unique software, we enable these many controllers, many drives and many ports to work together or work independently. In that platform we can provide multiple host protocols as well as multiple storage functions. The hardware can be configured through software, via GUI or many different API types, to support multiple protocols. In other words, the whole system could be block, or the whole system could be file, or any combination on a per-controller basis, right?

05:54 CH: And so, we can then run many of these block protocols, including NVMe over Fabric, RoCE, RDMA . . . We're really one of the earlier pioneers in that space, and these file protocols of NFSv3 or v4, and object, but we also enable parallel file systems like Spectrum Scale, Lustre or BeeGFS in any one of those combinations.

So, guys, effectively our unique and densely packaged hardware, coupled with our unique software platform, enables several unique advantages over the traditional dual-controller architectures of today. Most notably, performance density: consistent, predictable, scalable high performance with ultra-low latency. And that scales independently with each one of those controllers. Effectively, every time we add a controller in our box, you get a million IOPS and six gigabytes a second of throughput, right? You add two, you get 2X, you add three, you get 3X.
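The scaling claim in this passage is simple arithmetic, and a minimal sketch makes it concrete. It assumes perfectly linear scaling and reads the quoted per-controller figure as six gigabytes per second, which is consistent with the 120 gigabytes a second quoted later in the panel for 20 controllers. The function name and the 1-to-20 range check are illustrative assumptions, not part of any Pavilion tool.

```python
# Per-controller figures as quoted by the speaker; vendor claims, not measurements.
IOPS_PER_CONTROLLER = 1_000_000
READ_GBPS_PER_CONTROLLER = 6  # gigabytes per second (assumed unit)


def aggregate_performance(controllers: int) -> tuple[int, int]:
    """Return (total IOPS, total read GB/s), assuming perfectly linear scaling."""
    # Hypothetical bounds: the chassis described in the panel tops out at 20.
    if not 1 <= controllers <= 20:
        raise ValueError("expected between 1 and 20 controllers")
    return (controllers * IOPS_PER_CONTROLLER,
            controllers * READ_GBPS_PER_CONTROLLER)


for n in (2, 4, 20):
    iops, gbps = aggregate_performance(n)
    print(f"{n:>2} controllers: {iops:,} IOPS, {gbps} GB/s read")
```

At 20 controllers this gives 20 million IOPS and 120 GB/s, the block figures cited later for a fully populated four rack unit system.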

So, since we have these many controllers, we can start small and add performance as the customer's performance needs change. Since we have multiple controllers, we also have a unique HA advantage, where you can effectively buy an extra controller with no workload, have four active controllers and one standby, and in the case of a failure you have 100% controller performance, kind of RAID for controllers. So, as we net this out, the hyperparallel platform enables flexibility, performance, density and predictability at scale.
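The "RAID for controllers" idea can be illustrated with a short calculation. This is a sketch under simplifying assumptions: all controllers are identical, and a standby takes over completely for one failed active controller. The function and parameter names are hypothetical.

```python
def capacity_after_failure(active: int, standby: int, failed: int = 1) -> float:
    """Fraction of pre-failure controller performance still available.

    Assumes identical controllers and that each standby fully replaces
    one failed active controller (an idealized model for illustration).
    """
    if active <= 0 or standby < 0 or not 0 <= failed <= active:
        raise ValueError("invalid controller counts")
    replaced = min(failed, standby)          # standbys that step in
    remaining = active - failed + replaced   # controllers still serving I/O
    return remaining / active


print(capacity_after_failure(active=4, standby=1))  # 1.0
print(capacity_after_failure(active=4, standby=0))  # 0.75
print(capacity_after_failure(active=2, standby=0))  # 0.5
```

With four active controllers and one standby, a single failure leaves performance at 100%. Without the standby it drops to 75%, and a two-controller system with no spare drops to 50% under the same model.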

07:27 MS: Keith, let me add something to what Costa just said, because what you guys have done is solve two major problems on the storage side. One of them is the limit on what's going on with the CPUs. There's only so many cores they can get. We're running into Moore's law coming to an end, and the way that most storage vendors have tried to solve this is to scale out. Well, the problem with scale-out, adding more controllers that way, is you have the network issue between those controllers. And usually, if you're not running on your own dedicated network, which most people end up not doing, you're going to run into congestion and contention on that network, and that's going to affect latency, it's going to affect throughput, it's going to affect IOPS. Even if you have your own dedicated network, you're going to have those issues. You solved both those problems with your architecture.

08:17 KP: Yes, absolutely. That's a great point, Marc. And so, with that, so given the performance and the flexibility, maybe Kevin, could you talk a little bit about what customers can do with the Pavilion HyperParallel Flash Array?

08:30 Kevin Tubbs: Yeah, one of the things we're seeing for customers is a trend to be workload optimized. We spoke about it earlier with the data-driven workload, especially with AI, machine learning and traditional analytics. Those put varying challenges on workloads and workflows, especially at different stages. So, you may have one part of the workflow that's IOPS-driven, another that's throughput-driven, and your ability to deliver that performance and density at certain levels is very important.

Another key point that we look at, that was mentioned earlier, is the idea of software-defined. We're definitely seeing this trend with everything relating to the workflow, optimization of the workflow, where you move to a flexible requirement of a software-defined data center. That's absolutely necessary for delivering the right solution. So I think in those cases, I'm able to take a single platform, which Costa said was an important part of it, and take the performance, the right balance of performance, right balance of drive technology, right balance of network technology, and deliver it to the application.

09:49 KT: On the compute side, we're starting to have various scale-out and scale-up capabilities of the compute. You may have a single dual-socket CPU node, and it has a single 100-gig interface, so at maximum it can draw 11 or 12 gigabytes per second. And you may scale out and have hundreds or thousands of those nodes.

Then on the other hand, you may have different nodes, say like a GPU node that has eight different 200-gig HDR cards in it. So you have this different type of compute that's delivering that workload or software-defined workload to the customer, and having that flexibility to tune and put the right type of performance is very important, and then being able to deliver that across different protocols. And by doing that, I can size a completely tuned environment of compute, network and storage resources that will ultimately shorten the time to insight and discovery for the customers, as Marc talked about earlier. So that's what we're excited about, the ability to use this platform.
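The per-node bandwidth figures in this part of the discussion follow from converting link speed in gigabits to gigabytes. A minimal sketch, assuming 8 bits per byte and ignoring protocol and encoding overhead (the function name is illustrative):

```python
def line_rate_gbytes_per_s(link_gbits: float, links: int = 1) -> float:
    """Theoretical peak in gigabytes per second for a set of identical links.

    Uses 8 bits per byte and ignores protocol overhead, so real-world
    throughput will be somewhat lower.
    """
    return link_gbits * links / 8


print(line_rate_gbytes_per_s(100))      # 12.5  (one 100-gig interface)
print(line_rate_gbytes_per_s(200, 8))   # 200.0 (eight 200-gig HDR cards)
```

The single 100-gig node tops out at 12.5 GB/s on paper, which is in line with the 11 to 12 gigabytes per second quoted once overhead is allowed for. The eight-card GPU node reaches a theoretical 200 GB/s, roughly sixteen times the demand of the CPU node from a single host.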

11:03 KP: Great, thanks so much, Kevin. So, Costa, if I could go back to you for a moment. So, one of the questions I get a lot is, everybody talks about, we're the fastest, we're the greatest, we have the best, best platform. Could you maybe talk a little bit about how Pavilion actually does compare to the competition, and try and quantify that into something that's measurable, so that it's a fair comparison to the competition and what they're able to deliver versus what Pavilion does?


11:32 CH: Kevin spoke about flexibility. Companies like Penguin and others, and Penguin customers, are looking for that flexibility of a platform, and Marc mentioned that density. I think we combined a couple of those things together, right? One of the questions I get asked is, how does your performance compare? And I have to ask, well, to what? To the block competition, to the file competition, to the object competition? Because we do all three. People have also said, well, you're kind of the Swiss Army knife, maybe you're not very good at any one of those things. Well, if you use kind of an Olympic podium, you know, silver, gold and bronze gold, and I can't even talk.

12:06 MS: Silver and gold.

12:08 CH: Yeah, that's what Marc said. Bronze, silver, gold, right? If you do that, I think if you put us on three different podiums -- block, file and object -- we'd be silver or gold of each one of those platforms. To be honest with you, we'd probably be gold, right?

So let's start with block. Pavilion, in our hyperscale 20-controller architecture in four rack units, can do 120 gigabytes a second read, 90 gigabytes a second write, and over 20 million IOPS at 25 microsecond latency. When I look at the competition in block, you kind of have to say the traditional storage vendors and then the newer players. I'll talk a little bit about the traditional storage vendors. I spent 10 years as a CTO for the field for the Americas for Hitachi Data Systems. I ran IBM worldwide sales for storage. I've been in this space a lot. So, if you look at the likes of the Hitachis, the Dell EMCs, the Pures, the NetApps of the world.

13:03 CH: You look at those kind of guys, right? If you look at their websites, they range somewhere up to 10-15 million IOPS, maybe 100 gigabytes a second to 300, like, "Oh wow, that's 300. That's a lot more than you guys." But all those take one to two racks of equipment to do that, not rack units, one to two full racks of equipment. Of course, Pure never publishes their IOPS, so if you go get some answers, I know it's about 18 gigabytes a second of read/write, to our 120. Now they're talking about latency, a million, a microsecond . . . One millisecond to . . . I think, it's 150 microseconds, that's 10 times our latency on the competition, right?

13:52 CH: So you look at those players, then you look at the competitors and newer players, you know, the Exceleros of the world that are all block, that don't . . . New guy Fungible, they just came out actually, whenever this airs, in October, and you look at those, and I won't name names here, but you can say one of the newest ones says they're doing 15 million IOPS at 60 gigabytes a second in two rack units. Well, that's . . . In our four rack units, we're twice that performance, and that's one third of our capacity, so again, whether you're talking about new guys or the traditionals in block, we're really high up in the ranks.

We switch to NFS performance. We're 90 gigabytes a second read, 56 gigabytes a second write. Remember, that's four rack units, two petabytes. Pretty much everyone in that space, whether you talk VAST, the Wekas, the Isilons, the FlashBlades of the world, they're all somewhere in the 15-40 gigabytes a second range. Remember, we're 90.

Then you switch to object, we're 80 gigabytes a second read, 35 gigabytes a second write. People aren't seeing that sort of high-performance object performance. So again, on those podiums, probably gold in all three of the categories or right up there in the top.

15:14 CH: So in summary, I think that's a testament to our hyperparallel architecture, that we enable not just best-in-class performance, but best-in-class performance density, but also best-in-class flexibility. But I think, as Marc you mentioned, and Kevin, you alluded to this, what customers are waiting for and looking for is not just high performance, not just low latency, but consistent, predictable, scalable, high performance and low latency -- all of those together.

Now, when I interviewed for my job at this company back in March, the first thing that popped in my head was, "That's all great, 10 times, 10 times, but are you 10 times the price?" All this I talked about is at competitive market prices for a typical all-NVMe solution, Keith. So, you wanted some numbers, there's a few numbers and maybe a little long-winded answer, but I think I'd put us in a running race against anybody.

16:10 KP: That's great, thanks so much, Costa. That really, really shines a light on how we truly compare by breaking it down on a per rack basis, that's just so useful.

So what's on your mind, Marc? Maybe you could tell us a little bit about what are some of the use cases, what would somebody do with all that performance?

16:28 MS: Well, there are a lot of interesting applications that have been coming out, the traditional ones being databases, and there are literally different kinds of databases today. There's some that are transactional and there's some that are big data analytical, so you've got to look at what you're doing there, but ultimately, you fit both. So that's one thing, but where it's going is, as I mentioned earlier, AI, neural networks, machine learning, deep machine learning. In every one of these cases, you're building these very robust and ultimately expensive applications to analyze data quickly.

17:05 MS: The last thing you want to do is have them wait on the data that they're analyzing. That's a key factor, but the second issue is, and I keep coming back to this, that time to market is huge. Getting whatever you're trying to get done, the insights, the actions, the products, the services, the longer it takes you to do that the more revenue you're losing, the faster you do it, the more revenue you're gaining, and so all these applications . . . I'll give you a great example. In big pharma, we all know we're living in the age of COVID, what are they all doing right now? Working on therapeutics and working on vaccines, looking at what molecules can deactivate the virus inside of a human being, that all takes a huge amount of analytics. It's not done where they're mixing chemicals. It's all done virtually. That means you have to have great performance, and this is where a product like the Pavilion Hyperparallel Array comes into play.

18:06 KP: Yeah, that's exactly right, Marc. And I don't know about anybody else, but I, for one, can't wait for them to get through that, because as much fun as these virtual events are, I can't wait to get back in person.

18:17 MS: Me too.

18:20 KP: Alright, well, okay, great. And so, Kevin, I know Penguin does a lot with its high performance and flexible solutions for their customers, maybe you could talk a little bit about how GPU-based systems are changing the game, really. I know you deal a lot with GPU-based environments, so maybe you could talk about why a solution like this might be important for that type of environment?

18:46 KT: No, absolutely. Yeah, we're definitely seeing this trend of GPU and other accelerator technologies, and it kind of goes to what Costa was mentioning about, not only getting a performance density out of our storage, you're starting to get a performance density out of the compute as well, and that's definitely where GPUs play.

And look at some of the new systems out of Nvidia, the DGX A100. When you start to put together that dense of a compute platform, it's usually matched in balance with that dense of a networking platform.

So here you have a single node that's capable of PCIe gen 4 speeds and allowing over 100, toward the theoretical max of 200 gigabytes per second, into a single node. So that's a completely different performance profile for a single compute node. And if you look at some of the problems that are trying to be solved for these specific insights that Marc discussed, they're not deploying one 8-GPU server or one Nvidia DGX or some other type of accelerator. They're doing it at scale. They're going to scale that from one to hundreds, and you even have customers that scale out to thousands.

20:13 KT: As you do that, it has to be an intelligent balance of compute, network and storage performance and your ability to aggregate that together. And the key thing there, and I've seen it more recently with some of our customers, the workload dictates the technical performance requirements.

So, if you look at AI and deep learning, a lot of that, those analytics profiles are read intensive. And designing a system that's able to do that and have the read performance, you may look into the exact same organization that needs to flip that infrastructure to doing different types of simulations or other types of modeling, and in those specific algorithms, now there needs to be an equal write performance. And being able to design a solution that has balanced read and write and at the speeds and performance levels of something driven by a GPU, requires a very dense amount of performance. And then we talk about how to scale this to hundreds, thousands of nodes across multiple data centers, that density comes into play.

21:31 KT: And with some of those workloads, it may be a single node that's always hitting and consuming all 100 or 200 gigabytes per second of performance, or you may have a balance of scale-out where there's hundreds or thousands of nodes and it needs to scale out the MPI communication between nodes.

So, the ability to tune that as a single block device that you're attaching a compute to all the way up to a parallel file system on top of NVMe or flash storage, that's the gauntlet that sets forth for trying to solve this problem. And it's all software and workload driven, so your ability to design that inside of a modern enterprise right now is your ability to do that with the same technology investment. So that's why the flexibility and ability to change workloads and reconfigure and design for all of the workloads of the different groups in some organization comes into . . . It becomes an important factor.

22:46 KP: That's great. Thank you so much, Kevin. So, if we could just end on a particular note. So, Costa, could you bring us home and talk about, how is this really applicable to different types of workloads, whether it's something for a parallel file system or a scale-out NAS or DAS, how does this all tie together?

23:07 CH: Yeah, that's a good question. And to dovetail on what Kevin said, I'll make a few comments here before I get into that, but . . . That performance density and predictable format, those are the key advantages. And what Kevin's alluding to is, that's one of the keys to our hyperparallel architecture, that ability to be flexible. And so to address all these different workloads, you have to be able to do block, file and object, because there's no one right answer.

And by the way, I think we will see that these very high-performance workloads in the future will be very much on object. Historically, object has been for slow stuff, but when you're getting 80 gigabytes a second read and 35 for write for objects, all of a sudden you've got a real new protocol that's real for the next generation of cloud-like applications, right? I think Kevin mentioned another good point about people are not just looking for block or file, but they may need an external file system, like a Spectrum Scale or Lustre or BeeGFS, right?

24:09 CH: But I think another really, really key point Kevin made is this, the ability of, yes, in GPUs, read is very important, right? And you're going to see over the next few weeks some pretty impressive numbers coming from Pavilion on some Nvidia performance, whether that's block or whether that's file by the way, but that's more to come. But Kevin made a great point, it's not just about read, it's about write.

You have to think about it, most people's write performance, most of our competition, is materially less than their read performance. Remember those stats I gave you? Our read performance is 120 gigabytes a second, and our write is 90. For most people, a write that is 50% of read would be really good. It's probably 25-30% of their read performance. Now, one of the things that's very unique about that is that it's part of our hyperparallel architecture and our cacheless architecture. For example, we don't use storage class memory. You can sure put it in there, but we don't need that.

25:12 CH: We don't have a cache. We have a patent that writes what we write to our memory. We write to these drives. We write to our drives at nanosecond speed. So, think about what Kevin just said of this big scale. If you put a rack of our equipment . . . Remember I was saying, our competitors, a rack of EMC, Hitachi or whatever it happens to be, it'd be about maybe 300 gigabytes a second. By the way, that's somebody's two-rack performance. One rack of our equipment, 10 of our units stacked on top of each other, is 1.2 terabytes a second, not gigabytes, terabytes a second in one rack. And as Kevin said, impressive, almost a terabyte a second write, 900 gigabytes a second write. Now then you take that and put Spectrum Scale on top of this. And by the way, these are numbers from one of our largest customers. They're doing, in a rack of our equipment through Spectrum Scale, measured at the host, a terabyte a second read with 10 of our boxes and about 850 gigabytes a second write. So, you get that performance, right?
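The rack-level numbers above can be reproduced from the per-unit block figures quoted earlier in the panel (120 gigabytes a second read, 90 write, four rack units per system). The sketch below also compares the quoted Spectrum Scale results against that raw total. All inputs are the speaker's stated figures rather than independent measurements, and the variable names are illustrative.

```python
UNITS_PER_RACK = 10          # "10 of our units stacked on top of each other"
RACK_UNITS_PER_SYSTEM = 4    # four rack unit form factor
READ_GBPS_PER_UNIT = 120     # quoted block read, GB/s
WRITE_GBPS_PER_UNIT = 90     # quoted block write, GB/s

rack_read = UNITS_PER_RACK * READ_GBPS_PER_UNIT       # 1200 GB/s = 1.2 TB/s
rack_write = UNITS_PER_RACK * WRITE_GBPS_PER_UNIT     # 900 GB/s
rack_space = UNITS_PER_RACK * RACK_UNITS_PER_SYSTEM   # 40 rack units

# Customer results through Spectrum Scale, as quoted: about 1 TB/s read
# and about 850 GB/s write, measured at the host.
measured_read, measured_write = 1000, 850


def efficiency(measured: float, raw: float) -> float:
    """Share of the raw aggregate that the file system layer delivered."""
    return measured / raw


print(f"raw rack total: {rack_read} GB/s read, {rack_write} GB/s write in {rack_space}U")
print(f"read through Spectrum Scale:  {efficiency(measured_read, rack_read):.0%}")
print(f"write through Spectrum Scale: {efficiency(measured_write, rack_write):.0%}")
```

On these figures the parallel file system delivers roughly 83% of the raw read total and 94% of the raw write total, and the write-to-read ratio at the rack level stays at 75%.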

26:25 CH: And I know we've been doing a lot of work with Penguin on BeeGFS as well, and we're seeing BeeGFS getting up into that level of performance, while Lustre is a little bit less, but still pretty impressive numbers, right? So, when you couple our support for native parallel file systems inside the box, block, external file systems and that flexibility, I think that's where we shine.

And to summarize and as you said, take it home, I think what's as important to all this is the hyperparallel architecture: 20 controllers, 40 Ethernet or InfiniBand ports, all this capacity in a single box, right? So, in fact, we have a customer that's running NVMe over RoCE for Greenplum, NFS for a specific application, iSCSI for several VDI environments, and then another application running NVMe over Fabric RoCE, all on one single platform. Consistent, predictable, scalable performance. Again, with all that flexibility. So, I think that's what people are looking for. If you tie in what Kevin had mentioned and what Marc had alluded to, what customers are really looking for.

27:38 KP: Wow, that's amazing. Thank you so much. I'd like to thank everyone for attending this presentation. I'd like to thank our panel members, Mr. Kevin Tubbs, Marc Staimer, and Costa Hasapopoulos for joining us today. And thank you. And if you'd like some more information, please feel free to visit us at Thank you.
