Big data analytics requires powerful, high-performing computing systems that deliver high throughput with low latency while processing massive volumes of data. To meet these requirements, data scientists are turning to all-flash storage designed to target analytics workloads.
All-flash arrays (AFAs) are a logical choice for high-performance applications. AFA storage outperforms traditional HDDs, providing more reliable and efficient storage. And with the cost of flash technologies dropping, the shift to all-flash for analytics workloads was inevitable.
The challenge of big data analytics
Data volumes are growing at alarming rates, and organizations are struggling to keep up. The more data that is generated, the more pressure there is to mine the data, generate forecasts, build predictive models, incorporate machine learning and AI, and perform other analytics-related operations.
To do all that, data scientists require fast access to petabyte (PB)-scale data sets in a consistent and predictable manner, often to support real-time analytics and decision-making. These enormous workloads are read-intensive operations that require a high rate of IOPS in conjunction with ultralow latency.
Because they're read-intensive, these workloads can get by with lower endurance rates than more traditional workloads. Read operations put less strain on the arrays.
What these workloads do require, however, is scalability in the petabyte range without any performance degradation. The underlying platform must also support parallel I/O operations, while providing consistent responses to the random I/O patterns inherent in analytics workloads.
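To illustrate the parallel, random I/O pattern described above, here is a minimal Python sketch that issues random block reads from a pool of threads. The 4 KB block size, the worker count and the function names are assumptions chosen for the example, not part of any vendor's API, and `os.pread` is available only on Unix-like systems.

```python
import os
import random
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4096  # assumed I/O size in bytes (illustrative only)


def read_block(fd, block_index):
    """Read one block at a given offset without moving a shared file cursor."""
    return os.pread(fd, BLOCK_SIZE, block_index * BLOCK_SIZE)


def parallel_random_reads(path, num_reads, workers=8, seed=0):
    """Issue num_reads random block reads across a pool of worker threads."""
    num_blocks = os.path.getsize(path) // BLOCK_SIZE
    if num_blocks == 0:
        raise ValueError("file is smaller than one block")

    # Pick random block indices up front so the access pattern is reproducible.
    rng = random.Random(seed)
    indices = [rng.randrange(num_blocks) for _ in range(num_reads)]

    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # Each worker reads independently; pread is safe to call concurrently.
            return list(pool.map(lambda i: read_block(fd, i), indices))
    finally:
        os.close(fd)
```

On spinning disks, a pattern like this tends to be dominated by seek time. Flash has no seek penalty, which is one reason random reads scale better with the number of concurrent requests.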
The advantages of AFA storage
To meet their workload demands, data scientists used to add RAM to the servers. But most servers limit the amount of RAM that can be installed, so that meant standing up more machines, often a costly and unwieldy process.
More robust caching is another way to address performance issues. But analytics workloads typically don't benefit from caching because the entire data set is processed as a whole.
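The caching point can be illustrated with a small simulation. The sketch below assumes a least recently used (LRU) eviction policy, which is only one of several policies a real cache might use, and the function name is hypothetical. It counts cache hits when a data set is scanned from start to finish more than once.

```python
from collections import OrderedDict


def lru_hit_rate(accesses, cache_size):
    """Return the fraction of accesses served from an LRU cache of cache_size items."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for key in accesses:
        total += 1
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used item
    return hits / total if total else 0.0


# Two full passes over a data set of 5 blocks.
scan = list(range(5)) * 2
small_cache = lru_hit_rate(scan, cache_size=3)  # cache smaller than the data set
large_cache = lru_hit_rate(scan, cache_size=5)  # cache holds the whole data set
```

With the smaller cache, every block is evicted before the scan returns to it, so no access is a hit. The cache helps only once it is large enough to hold the whole data set, which is rarely practical at petabyte scale.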
AFAs are now being used to address the analytics workload challenge. They can be as much as 10 times faster than HDDs, with latency of less than a millisecond. With no moving parts, AFA storage is also more efficient and reliable than HDDs.
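The link between latency, IOPS and throughput can be made concrete with Little's law, which relates the number of requests in flight to their average latency. The following is a back-of-the-envelope sketch only. The queue depth of 32 and the 4 KB block size are illustrative assumptions, not measurements of any particular array.

```python
def iops(latency_us, queue_depth=1):
    """Little's law: sustained IOPS = outstanding requests / average latency."""
    return queue_depth * 1_000_000 / latency_us


def throughput_mb_per_s(iops_value, block_size_bytes=4096):
    """Throughput in MB/s (decimal megabytes) for a given IOPS rate and block size."""
    return iops_value * block_size_bytes / 1_000_000


# Assumed example: 1 ms (1,000 microsecond) average latency.
single = iops(latency_us=1000, queue_depth=1)
deep_queue = iops(latency_us=1000, queue_depth=32)
bandwidth = throughput_mb_per_s(deep_queue)
```

At 1 ms of latency, a single outstanding request tops out at 1,000 IOPS. Higher rates come from keeping many requests in flight at once, which is why parallel I/O support matters as much as raw latency.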
Analytics workloads tend to require fewer servers when using AFAs, because the arrays support higher capacities. This, along with their higher speed and lower latency, makes them well-suited to the data-intensive processing of analytics workloads.
Falling prices have also been instrumental in the move to all-flash for analytics. Not only has the cost per gigabyte plummeted in the last couple of years, but all-flash arrays also require less space and power, leading to further savings.
New flash technologies
The advantages that all-flash storage offers analytics come from recent advances in flash technologies.
The development of triple-level cell (TLC) storage has been critical in boosting flash capabilities. This solid-state NAND flash memory stores three bits of data per cell, making it less expensive than single-level cell and multi-level cell flash. 3D NAND flash stacks memory cell layers on top of one another to provide higher densities and, combined with TLC, has brought large increases in storage capacity and performance. TLC and 3D NAND have switched the focus away from endurance and toward capacity and performance -- a boon to analytics.
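The arithmetic behind bits per cell can be sketched in a few lines of Python. A cell that stores n bits must distinguish 2^n charge levels, and more bits per cell means fewer cells for the same capacity. The function names and the 1 GB example are assumptions made for illustration, and the calculation ignores spare area and error-correction overhead.

```python
BITS_PER_CELL = {"SLC": 1, "MLC": 2, "TLC": 3}


def charge_levels(bits_per_cell):
    """Number of distinct charge levels a cell must hold to store that many bits."""
    return 2 ** bits_per_cell


def cells_needed(capacity_bits, bits_per_cell):
    """Cells required for a given raw capacity (ceiling division, no overhead)."""
    return -(-capacity_bits // bits_per_cell)


one_gigabyte_bits = 8 * 10**9  # 1 GB expressed in bits (decimal gigabyte)
for name, bits in BITS_PER_CELL.items():
    levels = charge_levels(bits)
    cells = cells_needed(one_gigabyte_bits, bits)
    print(f"{name}: {levels} levels per cell, {cells:,} cells for 1 GB")
```

TLC needs about a third of the cells that single-level cell flash needs for the same capacity, which is where the cost advantage comes from. The tradeoff is that each cell must resolve eight charge levels rather than two, which is generally why TLC endurance is lower.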
AFA storage also benefits from the introduction of nonvolatile memory express (NVMe), a host controller interface and storage protocol that accelerates data transfers between servers and SSDs.
These technologies make it possible for AFAs to support sophisticated analytics in a way not possible before. However, the storage platform must be purpose-built for analytics. For example, the storage must be directly connected to the server, using NVMe or similar technologies.
Dennis Martin, president of analysis firm Demartek, talks about NVMe
The AFA storage market
AFA pioneer Pure Storage is among the vendors targeting analytics workloads. Pure's FlashBlade solid-state arrays are designed for everything from analytics for scientific research to AI. FlashBlade can scale to 1.6 PB of usable storage, using multiple 52 TB blades that support a massively parallel architecture for maximum performance and consistent I/O.
IBM, Kaminario, Micron and SanDisk are among other vendors in this growing market. And at the heart of this trend is the latest generation of all-flash arrays, which deliver the levels of high performance and low latency required to support big data analytics workloads.