On the surface, unstructured data and flash-based SSDs seem like a perfect match. But it takes careful planning and analysis to implement a cost-effective big data flash system that meets business needs.
Storage managers have begun considering flash for unstructured data storage because of the growing use of big data analytics. According to IDC, sales of big data and business analytics offerings are growing at a compound annual rate of 11.9%, and IDC projected the market will be worth $260 billion by 2022.
Organizations are storing far more data to feed those big data analytics systems, and much of it is unstructured: text documents, emails, images, videos and other files that don't fit into a traditional database.
As a result, enterprises are buying significantly more storage than in the past. IDC reported that organizations worldwide bought 111.8 exabytes of storage in the second quarter of 2018, 71% more capacity than they purchased in the second quarter of 2017.
But capacity alone isn't enough to meet big data needs. Performance is also a critical factor.
Flash seems like an obvious choice when it comes to performance, because it's dramatically faster than traditional HDDs. In addition, flash-based SSDs take up less space, consume less energy and produce less heat than HDDs, all of which make scaling easier.
So, what's not to love?
The challenges of using flash for unstructured data
Cost is the major downside to using flash for big data storage. In recent years, flash prices have dropped precipitously, sparking greater interest in both hybrid and all-flash arrays (AFAs). The latest IDC numbers showed the AFA market growing 54.7% year over year, while hybrid array sales climbed 23.8%.
However, flash-based SSDs still cost more than HDDs, and getting enough capacity for large data repositories can stretch budgets. "The biggest challenge faced by enterprise customers for storing unstructured data on flash is cost -- that is, trying to get as much SSD capacity for the least amount of money," Greg Schulz, analyst at consulting firm StorageIO, wrote in an email. "Another challenge is getting the most capacity per cost without compromising on reading as well as write performance of data, as well as metadata."
Schulz also noted that the information available from vendors can be confusing, and that it can often be hard to find a flash approach that meets a business's unstructured data storage needs.
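One way to frame Schulz's "most capacity for the least amount of money" question is to compare cost per effective terabyte rather than cost per raw terabyte, because usable capacity depends on protection overhead and on any data reduction the system applies. The sketch below is illustrative only: every price, ratio and name in it (MediaOption, raw_tb_needed, the usable fractions) is a hypothetical placeholder, and it assumes, purely for the example, that inline data reduction is enabled on the flash system but not on the HDD system.

```python
"""Compare cost per effective (post-reduction) TB for two media options.

All figures are hypothetical placeholders; substitute your own vendor quotes
and data reduction ratios measured on your own data.
"""
from dataclasses import dataclass


@dataclass
class MediaOption:
    name: str
    price_per_raw_tb: float  # hypothetical purchase price, dollars per raw TB
    reduction_ratio: float   # assumed compression + dedup ratio, e.g. 3.0 means 3:1
    usable_fraction: float   # share of raw capacity left after RAID/erasure-coding overhead

    def cost_per_effective_tb(self) -> float:
        # Effective capacity = raw * usable_fraction * reduction_ratio,
        # so cost per effective TB is the raw price divided by both factors.
        return self.price_per_raw_tb / (self.usable_fraction * self.reduction_ratio)


def raw_tb_needed(option: MediaOption, effective_tb: float) -> float:
    """Raw TB to buy so that `effective_tb` of logical data fits on `option`."""
    return effective_tb / (option.usable_fraction * option.reduction_ratio)


if __name__ == "__main__":
    options = [
        # Hypothetical: data reduction assumed on for flash, off for HDD.
        MediaOption("flash", price_per_raw_tb=100.0, reduction_ratio=3.0, usable_fraction=0.8),
        MediaOption("hdd", price_per_raw_tb=25.0, reduction_ratio=1.0, usable_fraction=0.8),
    ]
    for opt in options:
        print(f"{opt.name}: ${opt.cost_per_effective_tb():.2f} per effective TB, "
              f"{raw_tb_needed(opt, 500):.1f} raw TB for 500 TB of data")
```

With these made-up inputs, flash works out to about $41.67 per effective TB against $31.25 for HDD, so the gap narrows but does not close. The result is only as good as the reduction ratio you assume, which is why it is worth measuring that ratio on your own data.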
The following five tips can help make flash-based SSDs work with unstructured data.
- Analyze your needs. Before investigating the available options, you first must understand your business users' requirements. "Do your homework," Schulz advised. "Know your applications' performance, availability, capacity, economic characteristics, as well as how they will access unstructured data."
- Analyze your options. Armed with that information, assess the available products, including flash-based SSDs, as well as all-flash and hybrid arrays. Several vendors -- notably Dell EMC's Isilon and Pure Storage's FlashBlade -- offer flash storage designed for big data. However, general-purpose AFAs or hybrid arrays may also meet unstructured data storage needs. Schulz recommended considering whether a specific technology will work in your environment or whether you will have to adapt your environment to fit the product.
- Use capacity management techniques. Because unstructured data volumes are growing fast, look for approaches that have built-in capacity management capabilities, such as compression, deduplication and automated tiering. These features will help keep data volume under control over the long term.
- Consider hybrid and tiered systems. In many cases, the most cost-effective approach may be hardware that uses flash for hot data and HDDs for cold data. Alternatively, a system may combine tiers of faster flash, slower flash and cloud storage. According to Schulz, the best type of flash SSD storage can be a hybrid mix of higher-capacity, lower-cost, read-optimized storage for large amounts of data, combined with lower-capacity, higher-endurance, write-optimized storage for handling metadata and serving as a write buffer.
- Keep up with the trends. The flash market is changing rapidly, so what seems cost-prohibitive today may become affordable tomorrow. Some big data flash storage products that were early to market were withdrawn because of lack of interest. However, the big data flash market is expected to grow as flash costs fall, and new products will likely be entering the market.
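To get a rough feel for how much compression and deduplication might shrink your own unstructured data before evaluating a product's capacity management features, you can sample some files and estimate a reduction ratio. The function below is a minimal sketch, not how any particular array works: it assumes fixed-size 4 KiB chunks, SHA-256 fingerprints for deduplication and zlib for compression, whereas commercial systems typically use their own (often variable-size) chunking and compression schemes. The name estimate_reduction is hypothetical.

```python
import hashlib
import zlib
from typing import Iterable


def estimate_reduction(files: Iterable[bytes], chunk_size: int = 4096) -> float:
    """Return logical bytes / stored bytes after fixed-size dedup plus zlib compression.

    A ballpark estimate only: real arrays use different chunking and
    compression, so their actual ratios will differ.
    """
    seen: set[bytes] = set()  # SHA-256 digests of chunks already stored
    logical = 0               # bytes as the application sees them
    stored = 0                # bytes after dedup and compression
    for data in files:
        logical += len(data)
        for offset in range(0, len(data), chunk_size):
            chunk = data[offset:offset + chunk_size]
            digest = hashlib.sha256(chunk).digest()
            if digest in seen:
                continue  # duplicate chunk: kept once, only referenced again
            seen.add(digest)
            stored += len(zlib.compress(chunk))
    return logical / stored if stored else 1.0
```

For example, estimate_reduction(p.read_bytes() for p in sample_paths) would run over a list of pathlib.Path objects you choose. Feeding it two identical copies of a file exactly doubles the ratio you get for one copy, because every chunk of the second copy is a duplicate. Already-compressed media such as JPEG images or video will usually show little or no reduction.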
"In the future, there will be another class of large-capacity flash using QLC [quad-level cell] that will be more oriented toward read-mostly content," Randy Kerns, an Evaluator Group analyst, wrote in an email. "That hasn't come to the market yet but will at some point. That will create some new opportunities and issues around controlling data placement."
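The data placement decisions Kerns mentions, together with the hybrid mix Schulz describes, can be sketched as a simple policy that routes metadata and write-heavy data to write-optimized flash, read-mostly active data to high-capacity read-optimized flash (such as QLC), and cold data to HDD or cloud. This is an illustrative sketch, not any vendor's tiering engine: the tier names, the ObjectStats fields and both thresholds (COLD_AFTER_DAYS, WRITE_HEAVY_SHARE) are hypothetical and would need tuning against real access statistics.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    WRITE_OPTIMIZED_FLASH = "write-optimized flash (metadata, write buffer)"
    READ_OPTIMIZED_FLASH = "high-capacity, read-optimized flash"
    CAPACITY = "HDD or cloud"


@dataclass
class ObjectStats:
    is_metadata: bool
    reads_per_day: float
    writes_per_day: float
    days_since_last_access: float


# Hypothetical thresholds; derive real values from your own access logs.
COLD_AFTER_DAYS = 30
WRITE_HEAVY_SHARE = 0.3  # writes / (reads + writes) above this is treated as write-heavy


def choose_tier(s: ObjectStats) -> Tier:
    """Pick a tier for one file or object under the assumptions above."""
    if s.is_metadata:
        # Metadata is small and frequently updated, so keep it on high-endurance flash.
        return Tier.WRITE_OPTIMIZED_FLASH
    if s.days_since_last_access >= COLD_AFTER_DAYS:
        return Tier.CAPACITY
    total = s.reads_per_day + s.writes_per_day
    write_share = s.writes_per_day / total if total else 0.0
    if write_share > WRITE_HEAVY_SHARE:
        return Tier.WRITE_OPTIMIZED_FLASH
    # Active but read-mostly content fits lower-endurance, high-capacity flash.
    return Tier.READ_OPTIMIZED_FLASH
```

Checking metadata and age before the read/write mix keeps the rules easy to reason about. A production policy would also weigh object size, tier fill levels and the cost of moving data between tiers.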
Storage managers need to stay informed of these trends in order to make the best decisions about deploying flash-based SSDs and other flash-based options for unstructured data storage.