Performance at Scale for Model Training
An AI leader explores how AI and machine learning applications differ from traditional file-based applications, how model training performs at scale, and more.
00:03 Shailesh Manjrekar: Hi folks, good morning. My name is Shailesh Manjrekar, and I head AI here at WekaIO. Today I'll be talking about performance at scale for model training.
To give you a sense of the flash market and AI/ML applications: AI/ML applications are today the fastest-growing applications in the data center. They tend to leverage advances like 3D XPoint, computational storage and QLC flash, along with several other new technologies, in addition to the rapid progress in the 3D NAND flash market, because this market is driven primarily by low-latency, high-throughput applications.
The flash and SSD market is estimated to grow to almost $90 billion by 2022, so it's a fairly big market. And the underlying infrastructure, whether a cloud data center or an enterprise data center, needs to be optimized for the workflows or pipelines that AI/ML applications use, whether for model training or for analyzing petabyte-scale data sets, while satisfying critical cost constraints.
01:27 SM: If you look at this market, most of the growth is driven by software-defined storage. On the right-hand side, you'll see a survey we conducted with ESG, the Enterprise Strategy Group: almost 55% of enterprises today are leveraging software-defined storage, and another 14% aspire to. That's primarily because of the flexibility and agility software-defined products bring to the table, and Weka, of course, is a big part of that.
This article is part of
Flash Memory Summit 2020 Sessions From Day Three
At a high level, Weka is a parallel file system designed from the ground up for NVMe flash. We provide the simplicity of a NAS system and expose a POSIX interface, so the look and feel is that of a local flash file system. We leverage technologies like NVMe over Fabrics to provide the highest performance at the lowest latency. Today we are the highest-performing vendor on SPEC SFS, on IO-500 and on STAC, which stands for Securities Technology Analysis Center. And we do this at the scale these new applications need: we can support billions of files in a single directory and trillions of files across the namespace, a very important capability for AI/ML applications.
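Because the file system exposes a POSIX interface, training code can read data with ordinary file calls rather than a special client library. The following is a minimal sketch of that idea, not Weka-specific code: the helper iter_samples and the mount path /mnt/weka/dataset mentioned in the comments are hypothetical, and the demo uses a temporary directory so it runs anywhere.

```python
import os
import tempfile
from typing import Iterator, Tuple


def iter_samples(root: str) -> Iterator[Tuple[str, bytes]]:
    """Yield (name, contents) for every regular file directly under root.

    Only standard calls (scandir/open/read) are used, so the same code works
    on a local disk or on a shared POSIX file system mount.
    """
    with os.scandir(root) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            if entry.is_file():
                with open(entry.path, "rb") as f:
                    yield entry.name, f.read()


if __name__ == "__main__":
    # In practice root would be a mount point such as the hypothetical
    # "/mnt/weka/dataset"; a temporary directory keeps this sketch runnable.
    with tempfile.TemporaryDirectory() as root:
        for i in range(3):
            with open(os.path.join(root, f"img_{i:05d}.bin"), "wb") as f:
                f.write(bytes([i]) * 4)
        for name, data in iter_samples(root):
            print(name, len(data))
```

Nothing in the loop knows or cares what kind of storage sits behind the path, which is the practical meaning of "look and feel like a local file system."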
03:09 SM: A little bit about the company itself: we are an AWS Advanced Technology Partner, we grew 600% last year, and we count seven of the Fortune 50 companies as customers to date. We have several strategic partnerships across vendors like Hitachi Vantara, Hewlett Packard, Lenovo and Nvidia, several of which are investors as well as strategic partners.
03:42 SM: I'll divide this presentation into three parts. First, we'll talk about how AI/ML applications are inherently different from traditional file-based applications. Next, we'll talk about how, as a result, the underlying architectures are changing to cater to those new requirements. And finally, we'll talk about how Weka provides solutions for these new applications. As data becomes your new source code, how can you leverage it to provide actionable intelligence, to operationalize data sets and to govern those data sets?
So, let's dive into the first part. Where do we see these kinds of applications? Traditional markets, whether high-performance computing, high-performance data analytics or AI/ML, are now being penetrated by what we call "accelerated computing," using GPUs, FPGAs or AI accelerators, and there is a heavy influx of it across all of these use cases.
05:00 SM: As far as Weka goes, we power the mobility-as-a-service stack. We have customers all the way from sensor manufacturers, to connected-car or in-car intelligence, up to services companies such as WeRide. These are some of our marquee customers: Sirius is an in-car intelligence company, TuSimple is an autonomous truck company, WeRide is a services company, and Inova is a lidar manufacturer.
When it comes to autonomous vehicles, safety is the primary, non-negotiable objective. You need tons of data to train these models, which need to be extremely accurate. And you need a mechanism to reproduce experiments, because for the most part deep neural networks are a black box. The only way to find out why an autonomous vehicle turned left when it was supposed to turn right is to keep track of the data that was used to train that particular model.
06:18 SM: This is an example of what we call a software-defined CI/CD pipeline for a software-defined car. You have several survey cars on the street collecting data, and this can add up to several petabytes: at least a couple of petabytes per survey car per year. All of this data gets ingested at the edge, in what we call "edge aggregation."
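To put "a couple of petabytes per survey car per year" in perspective, the sketch below converts a yearly volume into average ingest rates. It assumes decimal units (1 PB = 10^15 bytes), takes "a couple" as 2 PB, and uses a 100-car fleet purely as an illustrative size; real ingest is bursty, so peak rates will be higher than these averages.

```python
PB = 10**15  # decimal petabyte, in bytes (assumption)
SECONDS_PER_YEAR = 365 * 24 * 3600


def average_ingest(pb_per_car_per_year: float, cars: int = 1) -> dict:
    """Average (not peak) rates implied by a yearly per-car data volume."""
    total_bytes = pb_per_car_per_year * PB * cars
    return {
        "TB_per_day": total_bytes / 365 / 10**12,
        "MB_per_s": total_bytes / SECONDS_PER_YEAR / 10**6,
    }


if __name__ == "__main__":
    for cars in (1, 100):  # 100 cars is an illustrative fleet size
        r = average_ingest(2, cars)
        print(f"{cars:>3} car(s): {r['TB_per_day']:,.1f} TB/day, "
              f"{r['MB_per_s']:,.1f} MB/s")
```

For a single car this works out to roughly 5.5 TB per day, or about 63 MB/s sustained around the clock, before any edge aggregation or ETL reduces it.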
Then there can be some form of ETL processing before the data is uploaded to the core, which is where most training happens. You can also leverage the cloud for economies of scale. At the top, you'll see large compute environments with many GPUs, all orchestrated by Kubernetes: somewhere data is being ingested, somewhere it is being labeled, somewhere models are being trained, and somewhere simulation is running, which is also popularly known as hardware-in-the-loop or software-in-the-loop testing.
07:21 SM: So, this is what a typical AV data pipeline looks like. The other thing happening with these applications is that use cases are moving from computer vision toward conversational AI, or NLP and NLU (natural language processing and natural language understanding), and they are also becoming multimodal.
The same thing is happening with advances in deep learning: beyond conventional deep learning, you now have transfer learning, federated learning and active learning. The neural nets themselves are becoming fairly complex, with models like BERT and Megatron now very popular for NLP. These models comprise up to several billion parameters, which makes training extremely complex, time-consuming and expensive.
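To see why models of this size are expensive to train and to checkpoint, here is a back-of-the-envelope estimate of model-state size. The parameter counts are the published sizes of BERT-large (about 340 million) and the largest Megatron-LM model (about 8.3 billion). The 16 bytes per parameter for mixed-precision training with the Adam optimizer is a common rule of thumb (fp16 weights and gradients plus fp32 master weights and two Adam moments), and it ignores activations entirely.

```python
def model_state_gb(num_params: float, bytes_per_param: int) -> float:
    """Size in decimal GB of num_params values at bytes_per_param each."""
    return num_params * bytes_per_param / 1e9


MODELS = {"BERT-large": 340e6, "Megatron-LM 8.3B": 8.3e9}

# 4 bytes/param: fp32 weights only (roughly the size of a weights-only checkpoint).
# 16 bytes/param: rule-of-thumb training state for mixed-precision Adam
#   (2 fp16 weights + 2 fp16 grads + 4 fp32 master copy + 4 + 4 Adam moments).
if __name__ == "__main__":
    for name, n in MODELS.items():
        print(f"{name}: {model_state_gb(n, 4):.1f} GB fp32 weights, "
              f"~{model_state_gb(n, 16):.1f} GB training state")
```

Under these assumptions the 8.3-billion-parameter model needs about 33 GB just for fp32 weights and well over 100 GB of training state, which is why such models are split across many GPUs and why checkpoints are large writes to storage.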
08:27 SM: Labeling of these data sets also becomes very important, and a lot of these companies use a technique called "semantic segmentation" to label their data. We did a paper with our partner Hewlett Packard showing the advantages of Weka over an NFS-based solution. With NFS, semantic segmentation took 15.85 milliseconds, whereas with Weka we were able to cut that latency by almost 7.4x. That translated directly into savings of almost four and a half days for a fleet of 100 survey cars, which affects both the top line and the bottom line when you're doing semantic segmentation labeling for ground truth.
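Assuming the 7.4x figure is the ratio of the NFS latency to the Weka latency, the per-operation numbers work out as follows.

```latex
t_{\text{Weka}} \approx \frac{t_{\text{NFS}}}{7.4}
  = \frac{15.85\ \text{ms}}{7.4} \approx 2.14\ \text{ms},
\qquad
t_{\text{NFS}} - t_{\text{Weka}} \approx 15.85 - 2.14 = 13.71\ \text{ms per operation}.
```

Saving roughly 13.7 ms on each of the very large number of operations a labeling pipeline performs is what accumulates into days across a fleet.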
09:34 SM: That brings me to my second part, which is about how the underlying infrastructure is changing as a result of these new applications. With GPU computing, the compute layer has become extremely dense: with Nvidia's Tesla V100 GPU, for example, you are looking at more than 5,000 cores. That kind of parallelism needs a different kind of storage stack.
Traditional storage stacks simply cannot keep those GPU cores busy and well utilized, which is why you need a storage stack with that kind of parallelism built in. If you look at a typical AI/ML pipeline, somewhere you're doing massive ingest, somewhere labeling, somewhere neural net training, somewhere validation, somewhere inference, which is all about low latency, and somewhere lifecycle management.
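Keeping thousands of GPU cores busy depends on having many I/O requests in flight at once, which only pays off if the storage layer can service them in parallel. The sketch below shows the client side of that idea using Python's standard thread pool; read_batch_parallel and the worker count of 16 are illustrative choices, not tuned values or part of any Weka API.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import List


def read_file(path: str) -> bytes:
    with open(path, "rb") as f:
        return f.read()


def read_batch_parallel(paths: List[str], workers: int = 16) -> List[bytes]:
    """Read a batch of files with up to `workers` reads in flight.

    File reads release the GIL, so threads can overlap their I/O waits.
    pool.map returns results in input order, so they line up with paths.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(read_file, paths))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        paths = []
        for i in range(64):
            p = os.path.join(root, f"sample_{i:04d}.bin")
            with open(p, "wb") as f:
                f.write(i.to_bytes(2, "big"))
            paths.append(p)
        batch = read_batch_parallel(paths)
        print(len(batch), batch[0], batch[-1])
```

In a real training job a framework's data loader usually plays this role (for example, the num_workers setting of PyTorch's DataLoader); either way, raising client-side concurrency only helps if the storage behind it scales with the number of outstanding requests.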
10:26 SM: As you can see, each of these phases has different storage requirements, and with traditional storage stacks, more often than not you end up with storage silos. With its mixed-workload characteristics, its ability to deliver high performance and low latency very effectively, and its data management capabilities, Weka is in a unique position to cater to these kinds of pipelines, and that's why we are winning today.
That brings me to the new approach Weka is formulating and spearheading. Weka has put together an end-to-end solution blueprint, which we're calling Weka AI, and we did this primarily for two reasons.
11:23 SM: First and foremost, the personas we talk to are data scientists, chief data officers and so on, and they don't know a whole lot about infrastructure. End-to-end blueprints like this make it much easier for them to consume these holistic stacks.
Second, it makes it easier to articulate our value proposition in terms of AI/ML data pipelines rather than storage speeds and feeds. As you can see here, we are at the center of this as the custodians of the data set. We work with private and public cloud vendors to extend a single global namespace. We also work with GPU vendors, whether Nvidia, HP and so on, and with container and Kubernetes vendors like OpenShift or HP's container platforms. And we work with accelerated libraries and MLOps platforms to showcase an entire end-to-end solution; we have partnerships with Valohai, Run:AI and a number of other players.
12:38 SM: This enables us to showcase data ingest, feature engineering, hyperparameter optimization, inference and more on a single platform, in a single pane of glass. That's what Weka AI is all about.
This is what an actual Weka deployment in a Kubernetes environment looks like. We provide persistent volume claims through our CSI plug-in, so you can now provide statefulness to AI/ML applications and pipelines that need persistence across different stages of the pipeline.
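As an illustration of the persistent volume claims mentioned above, here is a minimal sketch that builds a standard Kubernetes PersistentVolumeClaim manifest as JSON (kubectl accepts JSON manifests as well as YAML). The claim name, the size and especially the storage class name weka-sc-example are hypothetical placeholders: the real class name depends on how the CSI plug-in's StorageClass is set up in your cluster. ReadWriteMany is chosen so that several pipeline stages can mount the same data set, assuming the driver supports that access mode.

```python
import json


def make_pvc(name: str, storage_class: str, size: str) -> dict:
    """Return a Kubernetes PersistentVolumeClaim manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # ReadWriteMany: pods in different pipeline stages share one volume.
            "accessModes": ["ReadWriteMany"],
            # Hypothetical class name; must match a StorageClass backed by the CSI driver.
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    pvc = make_pvc("training-data", "weka-sc-example", "10Ti")
    print(json.dumps(pvc, indent=2))
```

The printed manifest could be saved to a file and applied with kubectl apply -f; pods for ingest, training and inference would then reference the same claim by name under spec.volumes[].persistentVolumeClaim.claimName, which is what carries state across stages.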
We also have several end-to-end reference architectures. This one was done with HP using four HPE Apollo 6400 servers, and it covers both training and inference benchmarks: you can see we achieve our highest images-per-second numbers with four Apollo 6400 servers, for training as well as for inference, and there are white papers with HPE you can refer to. Similarly, we have white papers with Nvidia: we did a reference architecture with nine DGX systems, again for both training and inference, and you can see we can go all the way up to 160 images per second on both Ethernet and InfiniBand.
14:15 SM: These are the criteria Gartner laid out for the kind of storage system that is ideal for an AI/ML environment. You can see the competing approaches, whether converged appliances or parallel file systems, with the Gartner criteria on the left-hand side, and on the right-hand side you can see how we, as a massively scalable parallel file system, provide the best of these characteristics through a software-defined approach.
Another advantage we provide is reduced epoch time: for this particular customer, we reduced epoch times from two weeks to four hours, almost 80% savings for the customer, which translated directly into top-line and bottom-line benefits.
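Taking the two epoch times at face value, the speedup works out as below; the roughly 80% savings quoted above is presumably a separate measure (such as cost) that this calculation does not reproduce.

```latex
\begin{aligned}
t_{\text{before}} &= 2\ \text{weeks} = 14 \times 24\ \text{h} = 336\ \text{h},
  \qquad t_{\text{after}} = 4\ \text{h},\\
\frac{t_{\text{before}}}{t_{\text{after}}} &= \frac{336}{4} = 84,\\
1 - \frac{t_{\text{after}}}{t_{\text{before}}} &= 1 - \frac{4}{336} \approx 0.988 .
\end{aligned}
```

That is an 84x speedup, or roughly 98.8% less wall-clock time per epoch.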
15:19 SM: Another important aspect of AI pipelines is the ability to provide responsible AI, which we do very effectively through a capability we call snap-to-object. We can capture the entire file namespace and tie it to a particular experiment, so six months down the line, if a different data scientist wants to go back and reproduce that experiment, they can do so effectively, not only at the compute level but also at the data set level. These aspects are extremely important for the explainability of your experiments, pipelines or model training. We also make it easy to set up hybrid workflows, and we provide compliance and end-to-end security, whether for data in flight or at rest.
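Snap-to-object is Weka's own feature, and the sketch below neither uses nor imitates its interface. It only illustrates the general idea behind reproducibility at the data-set level: record exactly which data a run saw by storing a content hash of every file next to the experiment ID and the name of the snapshot taken for it. The functions dataset_manifest and experiment_record, and the snapshot label, are hypothetical.

```python
import hashlib
import json
import os
import tempfile
from typing import Dict


def dataset_manifest(root: str) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            path = os.path.join(dirpath, fname)
            h = hashlib.sha256()
            with open(path, "rb") as f:
                # Hash in 1 MiB chunks so large files need not fit in memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            manifest[rel] = h.hexdigest()
    return dict(sorted(manifest.items()))


def experiment_record(experiment_id: str, snapshot_name: str, root: str) -> str:
    """Serialize a record linking a training run to a data-set snapshot."""
    return json.dumps(
        {
            "experiment_id": experiment_id,
            "snapshot": snapshot_name,  # hypothetical: name of the file-system snapshot
            "files": dataset_manifest(root),
        },
        sort_keys=True,
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "frame_0001.bin"), "wb") as f:
            f.write(b"lidar sweep")
        print(experiment_record("exp-042", "snap-2020-11-12", root))
```

If a later rerun produces a different manifest for the same snapshot name, the data changed, which is exactly the question one needs to answer when asking why a model behaved differently.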
16:23 SM: Lots of small files: this is another hallmark of AI/ML applications. When you run a lot of experiments, you tend to create lots of small files, and our ability to support billions of files in a directory is another reason we are winning in this space.
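A quick way to feel the small-file problem is to time metadata operations rather than data transfer. The sketch below creates many small files in one directory and measures stat() calls per second; it is a toy micro-benchmark, the function names and sizes are arbitrary, and results on a local disk (often served from the OS cache) say little about any particular shared file system.

```python
import os
import tempfile
import time
from typing import List


def create_small_files(root: str, n_files: int, size: int = 4096) -> List[str]:
    """Write n_files files of `size` bytes each into a single directory."""
    payload = b"\0" * size
    paths = []
    for i in range(n_files):
        path = os.path.join(root, f"sample_{i:08d}.bin")
        with open(path, "wb") as f:
            f.write(payload)
        paths.append(path)
    return paths


def stat_ops_per_second(paths: List[str]) -> float:
    """Time one stat() per file; each call reads metadata only, no file data."""
    start = time.perf_counter()
    for path in paths:
        os.stat(path)
    elapsed = time.perf_counter() - start
    return len(paths) / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        paths = create_small_files(root, 10_000)
        total_mb = len(paths) * 4096 / 1e6
        print(f"{len(paths)} files ({total_mb:.0f} MB total): "
              f"{stat_ops_per_second(paths):,.0f} stat() calls/s")
```

With only about 41 MB of data spread over 10,000 files, the work is dominated by per-file lookups, which is why metadata performance and large single-directory support matter for these workloads.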
GPUDirect Storage is the next generation of I/O technology from Nvidia. It lets us bypass the CPU complex and CPU memory and deliver data directly into GPU memory.
The green arrows you see are the new approach, whereas the red arrows are the old approach, where data was first put into CPU memory and then copied to the GPU. Weka is the only software-defined storage company that is part of this program, and we were able to showcase almost 163 GB/s of throughput with GPUDirect Storage in the latest benchmarks. We are also faster than a locally attached NVMe drive. We have a reference architecture with Nvidia scaling all the way up to nine DGX servers, and we are also the highest-performing vendor today on the IO-500 benchmark, which focuses primarily on small metadata I/Os. We did this on AWS I3en instances, which are flash based.
17:55 SM: To wrap up, WekaFS is the only offering able to leverage both flash and hard disk drives very effectively, providing benefits in terms of both performance and capacity. As I said, we are also available in the cloud, and we blend on-premises and cloud deployments into hybrid deployments with unlimited elasticity and utility pricing. We were born in the cloud and can work across availability zones. We are elastic, both in growing and in shrinking the cluster. We provide enterprise capabilities, and we are extremely cost-effective, even cheaper than Amazon FSx for Lustre, a comparable product available in the AWS environment. So that concludes my session today. Feel free to reach out with any questions or comments. Thank you for attending.
Enterprise Strategy Group (ESG) is a division of TechTarget