
Flash Memory Summit 2020 Sessions From Day Three

In this FMS on-demand content library, delve into storage trends and updates -- from computational storage to artificial intelligence and machine learning and more.

Artificial intelligence, machine learning and computational storage were some areas of focus during the last day of the Flash Memory Summit 2020. The conference was held remotely due to COVID-19 concerns and ran for three days.

TechTarget's SearchStorage partnered with the Flash Memory Summit to bring you content from the event on demand.

You can review the presentations from the last day of the conference below, including insights from the SuperWomen in Flash Leadership award winner on key issues for women in the storage industry.

To learn what was unveiled during day one, please click here. Content from day two can be accessed by clicking here and keynote presentations can be found here. You will be able to download the presentations as well.

Computational Storage

Session A-9: Computational Storage Increases System Throughput and Scalability

Computational storage is a new way to approach large-scale problems by shifting some compute power to the storage. Some data processing can then be performed close to where the data resides, avoiding time-consuming transfers of large data sets and reducing the burden on central computing facilities. Use cases include database, big data, AI/ML and edge applications. The framework for computational storage is driven by SNIA and the NVM Express standards groups. The pressure of ever-increasing amounts of data and the need for application scalability should lead to a big future for computational storage.

Annual Update on Computational Storage

This presentation takes a look at the problems solved by computational storage, how it fits within the data-centric computing trend, computational storage architectures, and more. Link to Presentation

Using Computational Storage to Handle Big Data

Machine learning, deep learning and analytics all require enormous amounts of data. Moving all that data to servers for processing is becoming increasingly burdensome and time consuming. Sure, processing cores and GPUs are more powerful than ever, but network bandwidth and DRAM cost limit how big a data set can be analyzed. As such, offloading tasks to storage can reduce network traffic and optimize the work done by expensive processors. Computational storage allows data to reside close to processing power, thus allowing processing tasks to be inline with data accesses. NVMe SSDs and external storage appliances can act as many distributed processing engines that can perform compression, sorting, searching and profiling. Inference engines placed in the storage can even do pre-processing for machine learning. Ultimately, much of the application could be distributed to all the ASICs, FPGAs and small processing units in NVMe SSDs. Computational storage examples already in use illustrate ways to overcome these issues, and there are still other promising directions to explore for the future. Link to Presentation
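The offloading idea above can be made concrete with a small sketch. This is an illustrative toy model, not a real computational storage API: the `ComputationalSSD` class and its methods are hypothetical stand-ins for a device that can run a filter predicate internally, so only matching records cross the interconnect.

```python
# Toy model of computational storage offload: instead of transferring
# every record to the host and filtering there, the host pushes a
# predicate down to the device and receives only the matches.
# All names here are hypothetical, for illustration only.

class ComputationalSSD:
    """Illustrative stand-in for an NVMe computational storage drive."""

    def __init__(self, records):
        self.records = records  # data "stored" on the device

    def read_all(self):
        # Traditional path: every record crosses the interconnect.
        return list(self.records)

    def offload_filter(self, predicate):
        # Computational path: the predicate runs on the device, so
        # only matching records cross the interconnect.
        return [r for r in self.records if predicate(r)]


drive = ComputationalSSD(range(1_000_000))

# Host-side filtering moves all 1,000,000 records...
host_side = [r for r in drive.read_all() if r % 97 == 0]

# ...while the offloaded filter moves only the 10,310 matches.
offloaded = drive.offload_filter(lambda r: r % 97 == 0)

print(f"transferred: {len(drive.read_all())} vs {len(offloaded)} records")
```

The host gets an identical result either way; what changes is how much data moves, which is exactly the network and DRAM pressure the talk describes.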

Neil Werdmuller and Jason Molgaard

Flexible Computational Storage Solutions

Moving large amounts of data between storage and compute, as the traditional model requires, will not scale as storage capacities and volumes of data keep increasing. A shift to computational storage that brings compute closer to the stored data provides the solution. Data-driven applications that benefit from database searches, data manipulation and machine learning can perform better and be more scalable if developers add computation directly to storage. Flexibility is key in the hardware and software architecture of a computational storage device. Hardware flexibility minimizes development cost, controller cost and controller power. Software flexibility enables software ranging from purpose-built code to Linux with containers, including eBPF, to execute on the same hardware architecture. Implementing hardware and software flexibility in a computational storage drive requires forethought and deliberate consideration to achieve a successful solution. This presentation shows how to simplify and provide hardware and software flexibility in computational storage architectures. You'll get an understanding of how to maximize controller performance while minimizing power and complexity in the controller architecture. You will also learn about the breadth of software options, including Linux and the Linux ecosystem, to unlock the benefits of computational storage. Link to Presentation

Session A-10: Keys to Making Computational Storage Work in Your Applications

Computational storage applications require careful analysis. Obviously, storage with compute included costs more and may not include the latest enhancements. And there is the cost of any software that may have to be developed. Then again, less time is spent moving data around and the system will be more scalable. The trade-offs are generally increased cost vs. increased performance. Link to Presentation


Session B-9: Latest Trends in Storage for AI/ML

AI/ML applications have varied requirements at different stages. Model training requires access to huge numbers of small files containing training data. Model execution depends on heavy computational capabilities and low latency. Combining the two is difficult, particularly because models generally have to be retrained frequently to avoid significant errors.

Dave Eggleston

Best Storage Strategies for AI and ML

Here is a look at four artificial intelligence trends, including GPUDirect Storage and computational storage. This presentation also shares some examples of these trends. Link to Presentation

Shailesh Manjrekar

Performance at Scale for Model Training

Model training for complex deep neural networks is becoming a major issue as use cases transition from computer vision to multi-modal and conversational AI. Storage I/O creates a huge bottleneck as the compute layer becomes both more complex and more parallel. A low-latency, high-throughput parallel file system is an essential part of the solution. It both keeps computation times reasonable and offers scalability to handle ever-larger data sets. Link to Presentation

Session B-10: Storage for AI in 2025 and How We Got There

Dave Eggleston, J Metz, Dejan Kocic, Gary Grider and Scott Sinclair

AI applications are going to be everywhere, and storage systems will have to be designed specially to meet their needs. During training, storage systems must be capable of handling large numbers of small data files. During model execution, the key problem is maintaining a steady flow of data to expensive chips such as GPUs and AI coprocessors. Link to Presentation

Session B-11: Storage for Model Training and Execution

Storage plays a key role in the running of AI/ML applications. They depend heavily on steady streams of data from small data files during training, and into and out of GPUs and other high-powered chips during execution. Today's systems must be scalable to ever larger amounts of data and be able to work with ever more capable and data-hungry computational devices.

Analyzing the Effects of Storage on AI Workloads

The past decade has seen explosive growth in AI hardware, frameworks and algorithms. This has led to some unique challenges for architecting storage systems for AI workloads. Learn about some of these challenges and methods for overcoming them. Link to Presentation

Designing Powerful AI Systems with NVMe, PCIe and Logical Volumes

This topic explores important trade-offs and performance considerations when designing storage systems used for artificial intelligence and machine learning workloads. Technologies covered include PCIe switching and fabrics, Nvidia's GPUDirect Storage and logical storage volumes on redundant arrays of independent disks (RAID). Learn how to design with these technologies to ensure best-in-class transfer rates from NVMe to accelerator memory for optimal AI and ML performance. Link to Presentation

Using PM and Software-Defined Architectures to Optimize AI/ML Workloads

Data is being generated in larger volumes and faster rates, causing congestion, I/O bottlenecks, storage outages and cost overruns for high-performance workloads. As data-intensive workloads scale, it's critical to implement memory-centric architecture to meet the demands of large data sets. Persistent memory promises an immediate and tangible solution for machine-generated data. Persistent memory combined with DRAM and the right software defined architecture becomes a new memory tier that provides a larger and more persistent memory capacity that renders the storage tier obsolete. This session educates the flash and AI/ML communities on the properties of persistent memory and fosters a discussion on the adoption of persistent memory-based solutions for I/O intensive workloads such as AI, ML and analytics. Link to Presentation

Kiran Modukuri and CJ Newburn

Accelerating the Data Path to the GPU for AI and Beyond

As workflows shift away from the CPU in GPU-centric systems, the data path from storage to GPUs increasingly becomes the bottleneck. Nvidia and its partners are relieving that bottleneck with a new technology called GPUDirect Storage that includes a new set of interfaces. When partners are enabled with GPUDirect Storage, the direct memory access engine in the NIC or local storage is able to move data directly to and from GPU memory, rather than going through a bounce buffer in the CPU. This can improve bandwidth, reduce latency, reduce CPU-side memory management overheads and reduce interference with CPU utilization. GPUDirect Storage was revealed for the first time at the 2019 Flash Memory Summit. In this talk, we show what's happened since then. We illustrate the benefits of GPUDirect Storage with recent results from demos and proof points in AI, data analytics and visualization. We describe technical enhancements, including a compatibility mode that allows the same APIs to be used even when all software components and support are not in place.

Flash Technology

Session C-9: Flash Technology Advances Lead to New Storage Capabilities

Advances in non-volatile memory continue. New versions of flash promise higher capacities, shorter access times and longer lifetimes. Costs will continue to decrease with the emergence of new multilevel versions (such as QLC) and new, smaller processes. Flash should continue to be the major non-volatile memory technology for the foreseeable future, with advances in it overshadowing other approaches.

Annual Flash Update -- The Pandemic's Impact

How is the flash market faring in this crazy world? Will COVID-19 lead to a flash market collapse? Will the U.S. and China trade war lead to an oversupply or shortages? How could the U.S. presidential election impact future flash market dynamics? Where are the new opportunities, and which existing markets are headed for trouble? Will 3D XPoint memory undermine SSDs? Will SSDs finally displace the HDD? In his fourteenth annual update of the flash memory market, analyst Jim Handy shares the outlook for the market, the technology and the broader economy. Link to Presentation

Flash and Other Emerging Memory Technology Trends

All the 3D NAND manufacturers are continuing to increase the number of vertical 3D NAND gates with their own technology innovations such as HAR 1-stack VC, Bit-Cost Scalable (BiCS), CMOS under the array (CuA) and periphery under cell (PUC). Process integration, design architecture and cell operation offer many innovative changes and challenges. For emerging memory, including STT-MRAM, XPoint, PCRAM and ReRAM (CBRAM), many of the memory players and foundries are eager to develop emerging memories for higher speed, lower power and almost unlimited retention and endurance. Find out about recent emerging memory technology details and products from major players.

Using Software to Improve the Performance and Endurance of High-Capacity SSDs

NAND flash technologies with more levels per cell (such as QLC) produce larger SSDs at much lower per-bit cost. However, such SSDs also have much lower performance and endurance. New software, based on AI methods, can overcome these drawbacks. It uses part of the drive as a static SLC cache, thus greatly reducing the number of accesses to the multilevel part. It also places data intelligently on the drive according to customer usage patterns, thus reducing the number of writes and extending lifetimes. The overall result is higher performance and longer lifetime for large SSDs at no additional hardware cost and only a small reduction in capacity. Link to Presentation
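The write-reduction effect of an SLC cache can be sketched in a few lines. This is a deliberately simplified model, with made-up parameters, of the technique the talk describes: small host writes accumulate in a fast SLC region and are flushed to QLC in large batches, so far fewer QLC program operations occur.

```python
# Toy model of an SLC write cache in front of a QLC region: host
# writes land in the cache and are flushed to QLC in one batched
# program per full cache, instead of one QLC program per write.
# The cache size and write counts are illustrative only.

SLC_CACHE_PAGES = 64  # pages buffered in pseudo-SLC before a flush

class CachedDrive:
    def __init__(self):
        self.slc_cache = []
        self.qlc_programs = 0  # QLC program operations performed

    def write(self, page):
        self.slc_cache.append(page)
        if len(self.slc_cache) >= SLC_CACHE_PAGES:
            self.flush()

    def flush(self):
        # One batched program absorbs the whole cache contents.
        self.qlc_programs += 1
        self.slc_cache.clear()

drive = CachedDrive()
for page in range(6400):  # 6,400 small host writes
    drive.write(page)

# Without the cache: 6,400 QLC programs. With it: 6,400 / 64 = 100.
print(drive.qlc_programs)
```

Since QLC endurance is consumed by program/erase cycles, cutting QLC programs this way is what translates directly into the longer lifetime the presentation claims.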

Flash Memory Technologies and Costs Through 2025

We are currently migrating from 96L to 128L (112-144L) generations as 3D NAND continues its roadmap of bit increase and cost reduction. Next-generation layer counts are being pragmatically decided by each company and will be announced soon (160?, 196?). We have detailed cost models for each generation and each company, and model the impact of increased QLC adoption on costs and bit growth. Data shows cost reduction for the next five years and how technology and design changes are dramatically shifting who the cost leaders are. We also show detailed analysis of YMTC's Xtacking technology, fab capacity and impact on the NAND market. Link to Presentation

Session C-10: Next Great Breakthrough in Flash Memory

Leah Schoeb, Rory Bolt, Daniel Worledge and Bill Gervasi

Advances in non-volatile memory keep coming despite warnings about a slowdown or stoppage in technological innovation. Flash remains the dominant technology, and its run is likely to continue. Breakthroughs keep occurring, and advances raise density and speed and decrease costs. What will be next? Will it be multiple levels beyond QLC, a fourth dimension, smaller process dimensions or something else? Of course, flash technology has been around for a while, and the next great breakthrough could be the long-awaited emergence of a major contender such as MRAM, RRAM, memristors or carbon nanotubes. Link to Presentation

Persistent Memory

Session D-9: Hands-on Testing of Persistent Memory's Effects on Analytics Performance

Enterprises worldwide are gathering data at increasing rates. Shaving time off data analysis -- or analyzing more data at once -- can help them increase business agility by enabling them to make important decisions sooner. One way to increase analysis speed is to use persistent memory, a new technology that offers performance between memory and traditional storage. Test results showed that adding persistent memory enabled a standard server to process 50% more query streams while completing them 3.2% faster, and to cut single-stream processing time by 26%. By equipping servers with persistent memory, enterprises could thus accelerate the turning of analytics results into action that furthers business initiatives. Link to Presentation

Hyperscale Applications

Session D-10: SmartNICs: The Key to High-Speed Converged Networks

Rob Davis, Kevin Deierling, Manish Muthal, Eliot Rosen and Bob Doud

Cloud data centers have led the way to high-speed converged networks that are scalable, easy to administer, efficient and fault tolerant. Such networks handle storage functions as well as traditional data transfers. However, at the high speeds required to achieve the throughput large clouds demand, protocol processing becomes a huge burden, overstressing central resources. The SmartNIC moves much of that processing to the network interface card, improving both throughput and scalability. SmartNICs can also provide a variety of other functions, including cybersecurity, storage functions (such as deduplication and mirroring), pattern analysis, large memory management, software-defined networking, and network or storage virtualization. One could even add neuromorphic processors to aid in AI tasks. Such NICs are readily available today with a variety of built-in functions and in many different implementations. Network, storage and system designers need to learn how to take full advantage of the distributed intelligence they provide. Link to Presentation


Session A-11: Flash Controllers for Application Acceleration

This session provides details on improving the endurance, performance and reliability of 3D TLC and QLC NAND flash devices. Important architectures, signal processing and machine learning algorithms that flash and SSD controllers can employ are revealed. The session also presents novel implementations for reducing power usage and solving problems caused by write amplification. Learn about new technology developments and have time for questions and answers with top industry experts.

Open Source Processors for Next-Generation Storage Controllers

RISC-V is an open-source hardware instruction set architecture originally developed at UC Berkeley and now widely deployed commercially. It is designed for a wide range of applications and has subsets for everything from small embedded systems to supercomputers and rack-mounted parallel computers. It has features aimed at multicore applications and at scientific computing, as well as facilities for special or custom extensions. Because it is non-proprietary and royalty-free, it can be adopted worldwide without concern for costs or political issues. Its ecosystem is emerging rapidly to offer essential support such as IP, software tools, design software, test equipment, security and development environments. In particular, the CHIPS Alliance -- an open, collaborative organization designing open source RTL, SoCs and peripherals, as well as open source software development tools -- is enabling a high level of innovation for storage controllers and data center architectures. Link to Presentation

Achieving Latency and Reliability Targets With QLC in Enterprise Controllers

3D QLC NAND flash memory presents significant latency and reliability challenges, which have slowed its adoption in the enterprise storage market. We demonstrate that these challenges can be effectively mitigated through innovative controller design and novel flash management algorithms. Emphasis is given to read voltage calibration and data placement alternatives and their critical role in achieving low error rates and low read latency. We present experimental results that demonstrate the improvements from various read voltage calibration and data placement schemes and discuss the specific trade-offs in accuracy and controller complexity of such schemes. Our findings demonstrate that implementing these techniques in a QLC controller not only achieves TLC-like performance but can even outperform traditional TLC controller designs, which are widespread in enterprise storage systems today.

Machine Learning for Bad Page Prediction in Flash

Flash memory is prone to failures as the number of program-erase cycles increases, resulting in an increase in the bit error rate. Once the bit error count exceeds a certain threshold, error correction engines are either incapable of continuing to correct the errors efficiently or they may fail entirely. This leads to an interest in learning the behavior of the error count increase and obtaining an ability to make failure predictions. This talk tackles this problem using a machine learning approach, although standard ML techniques may not work well with the particular data at hand. The error counts are collected from actual flash memory, and one can expect to see far more pages with low error counts than with high ones. This characteristic of the data set leads to formulating the goal as a classification problem with significant class imbalance in the underlying data. The talk also covers various classification methods that address such class imbalance, including cost-sensitive boosting techniques, bagging procedures, ensemble support vector machines and cost-sensitive neural networks.
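One core idea behind the cost-sensitive methods mentioned above can be shown in a few lines. This is a minimal sketch under assumed, illustrative costs -- not the talk's actual models: if a false negative (a missed bad page) is K times costlier than a false positive (a needlessly retired page), the cost-minimizing decision threshold on the predicted failure probability drops well below 0.5.

```python
# Cost-sensitive thresholding for imbalanced bad-page prediction.
# Costs below are illustrative assumptions, not values from the talk.

C_FP = 1.0   # cost of retiring a page that was actually fine
C_FN = 20.0  # cost of missing a page about to fail (data loss risk)

# Predict "bad" when the expected cost of saying "good" exceeds the
# expected cost of saying "bad": p * C_FN > (1 - p) * C_FP, which
# rearranges to p > C_FP / (C_FP + C_FN).
threshold = C_FP / (C_FP + C_FN)  # ~0.048 rather than the naive 0.5

def classify(p_fail):
    """p_fail: a model's predicted failure probability for a page."""
    return "bad" if p_fail > threshold else "good"

# A page the model is only 10% sure will fail is still retired,
# because missing it is 20x more expensive than retiring it early.
print(classify(0.10))  # bad
print(classify(0.02))  # good
```

The same asymmetry is what cost-sensitive boosting and weighted neural networks bake into training rather than applying after the fact.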

Salman Rashid

SPI NAND Host-Side Error Correction

Memory-intensive applications such as high-end consumer products, networking and industrial systems are putting cost pressure on designers, driving engineers to find new ways to reduce system cost while improving performance. An essential technology for maintaining reliability and extending memory longevity in SPI NAND flash is the error correction code (ECC). With the goal of achieving greater efficiency in these applications for SPI NAND flash-based systems, developers use architectures in which ECC is implemented in the host MCU, instead of those with integrated ECC. This presentation addresses the differences between integrated and host-based ECC, and details each approach's impact on system performance, reliability and, ultimately, cost. Link to Presentation
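To make the ECC concept concrete, here is a sketch of a classic Hamming(7,4) code, which corrects any single bit flip in a 7-bit codeword. Real SPI NAND controllers use stronger codes (typically BCH or similar), so treat this as a minimal stand-in for the principle a host-side ECC engine implements, not as production flash ECC.

```python
# Hamming(7,4): 4 data bits protected by 3 parity bits, correcting
# any single-bit error. Codeword positions are numbered 1..7, with
# parity bits at positions 1, 2 and 4.

def encode(d):
    """d: list of 4 data bits [d1, d2, d3, d4] -> 7-bit codeword."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def decode(code):
    """Correct up to one flipped bit, then return the 4 data bits."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # recheck positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # recheck positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]   # recheck positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s3  # nonzero: index of flipped bit
    if syndrome:
        c[syndrome - 1] ^= 1         # flip it back
    return [c[2], c[4], c[5], c[6]]

data = [1, 0, 1, 1]
stored = encode(data)
stored[4] ^= 1               # a single bit flip in the "stored" page
print(decode(stored))        # [1, 0, 1, 1] -- error corrected
```

Whether this syndrome computation runs in logic on the NAND die or in firmware on the host MCU is exactly the architectural choice the presentation weighs.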

Industry Trends

Session C-11: SuperWomen in Flash

This session begins with a discussion among women working in the storage industry. Hear them share stories about how they got to where they are today, future career plans and how they feel the industry is treating them. Then, the SuperWomen in Flash Leadership Award winners zero in on the key issues for women in the storage industry.

Fahima Zahir, Purvaja Narayanaswamy, Shriya Paramkusam, Renee Yao and Ginger Gilsdorf

Young Superwomen in Flash: What Is Their Situation?

In this session, we turn to the question of young SuperWomen in flash: What is their situation? Young women continue to find the career paths in the storage industry to be difficult to follow. There are few role models for them, and few companies have made a big effort to encourage them. Our panel of young women technologists discuss why they decided on this industry, how they feel it is treating them, what their plans are for the future, and how they think their situations could be improved. Link to Presentation

Camberley Bates, Barbara Murphy and Deepti Reddy

SuperWomen in Flash Panel

Join the SuperWomen in Flash Leadership Award winners as they present their views on key issues for women in the storage industry. Link to Presentation

New Memory

Session D-11: Scaling of New Memory Technologies Used for Persistent Memory

Mahendra Pakala

New memory technologies such as phase-change memory (3D XPoint) and MRAM are now gaining wide interest as persistent memory in storage systems. A key issue in determining whether they will achieve large markets is whether manufacturers can scale them successfully at lower process dimensions to obtain usable combinations of density, power and performance. Such scaling depends on the emergence of new materials and supporting metallization, as well as advances in memory cell etching and encapsulation. Data from short loops and test chips are now available to evaluate the impact of new materials and processes on device performance. Major scaling challenges include lowering power requirements for phase-change memory and increasing densities for MRAM. Link to Presentation

Enterprise Systems

Session A-12: What Do Users Need to Know About Next-Generation Form Factors?

Keith Parker, Costa Hasapopoulos, Marc Staimer and Kevin Tubbs

This panel discussion dives into some challenges organizations face when it comes to getting performance and meeting their application needs while still balancing other problems such as managing rack space and budgets, and other key considerations. Link to Presentation

Session C-12: Getting Serious About Containers and Flash Memory

Jean S. Bozman, Leah Schoeb, Russ Fellows, Cody Hosterman, Robert Starmer and Rob Hirschfeld

The widespread use of containers to speed up distributed applications is changing data centers everywhere. What is the effect on storage? Unlike VMs, containers are ephemeral, lasting only a short amount of time. So, storage must be allocated to them rapidly and deallocated as soon as they are no longer around. Furthermore, standards for just how containers handle storage have been hard to come by. Both management tools and the flash memory used in systems must be able to handle rapid turnover in a flexible manner to build a solid foundation for public, private and hybrid clouds. Link to Presentation


Session D-12: Will QLC Flash Replace Hard Drives?

Randy Kerns, Thomas Isakovich, Roger Peene, Jeff Denworth, Shawn Rosemarin and Ken Steinhardt

QLC flash appears to be a winner in many mass storage and archiving applications because of its high access speed at relatively low cost. But will it replace hard drives everywhere? Are its capacities high enough and its cost low enough to make hard drives obsolete? Does its relatively short lifetime matter in applications where data is seldom accessed? What characterizes the applications where hard drives remain the right answer? Link to Presentation

Special Presentations

IT Brand Pulse Awards

Frank Berry

The IT Brand Pulse Awards honor 2020 Flash Storage Innovation Leaders, as voted on by IT professionals in the recent annual independent, non-sponsored brand leader survey. Link to Presentation

UC Santa Cruz School of Engineering

Frank Howley

The Baskin School of Engineering at UC Santa Cruz houses two of the world's leading storage research centers: The Center for Research in Storage Systems and the Center for Research in Open Source Software. Please join this short presentation to learn about them and the Corporate Sponsored Senior Projects Program and how your company can get involved. Link to Presentation
