
Seven tips to improve data storage efficiency

Storage networks, server-side flash, network cache, software-defined storage and all-flash or hybrid arrays help storage performance and efficiency.

IT professionals constantly wrestle with storage performance to make sure applications receive the resources they require to run optimally. Back when hard disk technology was state of the art, this meant using techniques that were expensive and inefficient -- such as striping data across dozens, if not hundreds, of hard disks and then formatting those drives to make only half their capacity available to applications.

The arrival of more affordable flash storage promises to break storage's bottleneck on application performance for the foreseeable future. To get the most out of flash, however, you need to implement it in the right way and with the right complementary technologies. That way, you can extract maximum performance and greater efficiency from your solid-state storage deployments and storage networks overall.

For active data, for instance, flash delivers better performance with fewer moving parts than hard disk drives. The result is that flash is often less expensive to deploy than hard disks for primary data use cases, especially over the long haul. The problem with solid-state storage is that only about 5% to 10% of data center data is active at any given point in time. So you might as well save some cash and store the remaining 90% or more on much higher-capacity, less-expensive HDDs or, as an ever-increasing number of organizations are doing, in the cloud.
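As a rough illustration of that arithmetic, the sketch below splits a total capacity into a flash tier and a capacity tier based on an assumed active-data fraction. The function name and the 100 TB total are hypothetical; only the 5% to 10% range comes from the paragraph above.

```python
def split_capacity(total_tb, active_fraction):
    """Return (flash_tb, capacity_tier_tb) for a given active-data fraction."""
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    flash_tb = total_tb * active_fraction
    return flash_tb, total_tb - flash_tb


# Hypothetical 100 TB data center, sized at both ends of the 5%-10% range.
for fraction in (0.05, 0.10):
    flash, bulk = split_capacity(100, fraction)
    print(f"{fraction:.0%} active: {flash:.0f} TB flash, {bulk:.0f} TB HDD or cloud")
```

A real sizing exercise would also allow for growth and for bursts in the active set, which this sketch ignores.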

As this example illustrates, flash isn't necessarily going to improve data storage efficiency and performance on its own. You need to start with a solid foundation, which leads us to the first of our seven tips for achieving faster, more efficient storage.

Improve the storage network

While it is true that the latency of a hard disk-based system won't expose a network's weak points, a flash-based system will. Consequently, before upgrading to flash storage or adding SSDs to an existing system, you should first maximize the capability of your storage network. There are three components of that network to consider: the host bus adapters (HBAs) or network interface cards (NICs) in the servers and storage system, the network switch and the cabling infrastructure.

It's tempting to look only at the bandwidth capabilities of the first two components (NICs/HBAs and switches), which should be 10 GbE, 16 Gbps Fibre Channel (FC) or faster. While bandwidth is important, latency and quality of delivery are more so. Most data centers don't generate enough continuous transactions to flood a high-speed network. Instead, they generate millions of very small transactions. The efficiency of the network in moving these transactions from the servers to storage and back again is critical to extracting maximum performance from a flash investment.
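To see why latency can matter more than bandwidth for small transactions, consider a simplified model in which one I/O takes a fixed round-trip latency plus the time to move its bytes across the link. This is a sketch under stated assumptions: the 4 KiB I/O size, the 50-microsecond latency and the one-I/O-at-a-time model are all hypothetical, chosen only to show the shape of the trade-off.

```python
def io_time_us(size_bytes, bandwidth_gbps, latency_us):
    """Microseconds for one serialized I/O: fixed latency plus wire time."""
    transfer_us = (size_bytes * 8) / (bandwidth_gbps * 1e9) * 1e6
    return latency_us + transfer_us


# Hypothetical 4 KiB transaction on a 10 Gbps link with 50 us of latency.
baseline = io_time_us(4096, 10, 50)
double_bandwidth = io_time_us(4096, 20, 50)  # faster link, same latency
half_latency = io_time_us(4096, 10, 25)      # same link, lower latency

print(f"baseline:         {baseline:.2f} us")
print(f"double bandwidth: {double_bandwidth:.2f} us")
print(f"half latency:     {half_latency:.2f} us")
```

In this model the wire time for a 4 KiB transfer is only a few microseconds, so doubling the bandwidth barely moves the total, while halving the latency cuts it nearly in half.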

Data storage efficiency vs. performance

Efficiency and performance are opposing forces in the world of storage, as efficiency typically increases value at the expense of performance. Many of the techniques we use to increase data storage efficiency -- such as thin provisioning, deduplication and compression -- actually hurt storage system performance. Flash storage creates a middle ground between efficiency and performance, though. Yes, using these techniques on flash hurts performance, just as it does with hard disk drives. But because flash performance is so high, it usually delivers excess performance cycles. Consequently, running your usual data storage efficiency routines won't noticeably impact performance from a user perspective.
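The trade-off is easy to see in a toy version of block-level deduplication: every block must be hashed before it is stored, which costs processing time on each write in exchange for keeping only one copy of repeated data. The sketch below is a minimal illustration, not how any particular array implements it; the function name and the use of SHA-256 as the fingerprint are assumptions.

```python
import hashlib


def dedupe(blocks):
    """Store each unique block once; return (store, ordered references)."""
    store = {}  # fingerprint -> block contents
    refs = []   # one fingerprint per original block, in write order
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()  # extra work on every write
        store.setdefault(digest, block)
        refs.append(digest)
    return store, refs


# Three 4 KiB blocks, two of them identical.
blocks = [b"A" * 4096, b"B" * 4096, b"A" * 4096]
store, refs = dedupe(blocks)
print(f"{len(blocks)} blocks written, {len(store)} stored")
```

Reading the data back means following each reference into the store, which is another small cost that fast media can absorb more easily than hard disks.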

Cabling is also a critical, and often overlooked, factor in storage network performance and data storage efficiency. You should build the cabling infrastructure on fiber optics to support the high bandwidth and low latency capabilities of both current and next-generation networks, and structure it so that port assignments are easy to determine. You also need to understand "link loss budget," which is the total amount of signal loss a link can tolerate from end to end across its fiber, connectors and splices.
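A link's expected loss can be estimated by adding the attenuation of the fiber itself to the loss at each connector and splice, then comparing the total with the loss the link is allowed to have. The sketch below shows that arithmetic. Every number in it is a placeholder, not a value from a standard or datasheet; take real figures from the applicable cabling standard and the transceiver specifications.

```python
def link_loss_db(length_km, fiber_db_per_km, connectors, db_per_connector,
                 splices=0, db_per_splice=0.0):
    """Estimated end-to-end loss of one fiber link, in dB."""
    return (length_km * fiber_db_per_km
            + connectors * db_per_connector
            + splices * db_per_splice)


def margin_db(budget_db, loss_db):
    """Remaining headroom; a negative value means the link is over budget."""
    return budget_db - loss_db


# Placeholder example: 100 m run, two mated connector pairs, no splices.
loss = link_loss_db(length_km=0.1, fiber_db_per_km=3.0,
                    connectors=2, db_per_connector=0.5)
print(f"estimated loss: {loss:.2f} dB")
print(f"margin against a hypothetical 1.9 dB budget: {margin_db(1.9, loss):.2f} dB")
```

Note how the connectors, not the fiber length, dominate the loss on a short run, which is one reason every extra patch panel in the path matters.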

Once you've got your storage network fine-tuned, it's time to consider flash deployments.

Implement server-side flash

In a server-side flash design, the network and the storage attached to it remain the same, typically hard drive-based storage arrays. Because the flash sits inside the server, the speed and quality of the storage network are not as critical as they are when implementing a shared flash array. How you leverage server-side flash can vary, however.

The design with the least impact on the network is isolated server flash. Here, you install an SSD or flash PCIe card that is only responsible for the I/O for that server. The server itself becomes a single point of failure, so this use case is only suitable for read caching of data that is stored on a shared storage array.

By contrast, there are server-side flash techniques that aggregate internal flash storage from multiple servers to create a virtual flash pool. These server-side flash aggregation products build in redundancy and are suitable for read and write caching or even as a storage tier. They do introduce the network factor in terms of performance, however, since the aggregation requires a network to create the virtual storage pool.

Deploy a network cache

Unlike a storage system upgrade, which only increases the performance of a single system, a network cache improves the performance of every storage system on the network. These devices essentially sit inline between the storage system and the servers, caching the most active data. Many network caches are available in high-availability configurations, making them suitable to cache both read and write I/O. You can also size network caches to have flash storage areas large enough to store an organization's entire active data set, essentially turning existing arrays into archive and data protection storage systems.
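The read side of such a cache can be sketched in a few lines: serve a block from the cache when it is there, otherwise fetch it from the array and keep a copy, evicting the least recently used block when the cache is full. This is an illustration only. The class name is hypothetical, a plain dictionary stands in for the storage array, and least-recently-used eviction is an assumption rather than a description of how any given product decides what is "most active."

```python
from collections import OrderedDict


class ReadCache:
    """Read-through cache with least-recently-used eviction."""

    def __init__(self, backend, capacity_blocks):
        self.backend = backend           # dict standing in for the array
        self.capacity = capacity_blocks  # how many blocks the cache holds
        self.blocks = OrderedDict()      # oldest entry first
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
            self.hits += 1
            return self.blocks[block_id]
        self.misses += 1
        data = self.backend[block_id]          # slow path: go to the array
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)    # evict least recently used
        return data


array = {i: f"data-{i}" for i in range(10)}
cache = ReadCache(array, capacity_blocks=2)
for block_id in (1, 2, 1, 3, 2):
    cache.read(block_id)
print(f"hits={cache.hits} misses={cache.misses}")
```

With a cache this small, block 2 is evicted before it is read again. Sizing the cache to hold the whole active data set is what keeps most reads off the arrays behind it.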

An important benefit of network cache is the ability to improve storage performance without replacing current data protection policies and procedures. Those procedures remain unchanged because data would now be located on both the cache and the original storage system.

Note that it is important to look for a network cache that can programmatically flush the cache prior to a snapshot or backup job beginning. You should also consider the quality of your network infrastructure and its components before deployment.
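The reason the flush matters is that a cache handling writes may hold data the array has not yet received, so a snapshot taken on the array alone would be stale. The sketch below models that with a hypothetical write-back cache in front of a dictionary that stands in for the array; the class and function names are assumptions, not any vendor's API.

```python
class WriteBackCache:
    """Acknowledges writes from cache and copies them to the array on flush."""

    def __init__(self, backend):
        self.backend = backend  # dict standing in for the storage array
        self.dirty = {}         # blocks written but not yet on the array

    def write(self, block_id, data):
        self.dirty[block_id] = data

    def read(self, block_id):
        if block_id in self.dirty:
            return self.dirty[block_id]
        return self.backend.get(block_id)

    def flush(self):
        """Push all pending writes to the array; return how many were sent."""
        flushed = len(self.dirty)
        self.backend.update(self.dirty)
        self.dirty.clear()
        return flushed


def snapshot_with_flush(cache, backend):
    """Flush first so the point-in-time copy includes cached writes."""
    cache.flush()
    return dict(backend)


array = {1: "old"}
cache = WriteBackCache(array)
cache.write(1, "new")

stale_snapshot = dict(array)                     # taken without flushing
good_snapshot = snapshot_with_flush(cache, array)
print(stale_snapshot[1], good_snapshot[1])
```

A backup or snapshot job would call the equivalent of `flush()` as a pre-job step, which is why a programmatic interface to the cache is worth looking for.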

Consider a cloud-enabled network cache

This variant of the network cache option takes a hybrid cloud approach. Several vendors -- such as Avere, Microsoft Azure StorSimple, Nasuni and TwinStrata from EMC -- offer all-flash network caches that migrate inactive data to a cloud storage location like Amazon, Azure or Google instead of local storage. In fact, this is probably one of the most practical paths to an all-flash data center, as the data center can now truly be all-flash while old data is stored and protected in the cloud.

Implement SDS with a small flash array

Another option to improve storage performance and data storage efficiency is to use software-defined storage (SDS). These products run on either an appliance or within a hypervisor and provide a common set of storage software features across a variety of hardware arrays. Some SDS systems can leverage existing storage hardware as well as provide automatic migration of data between them. If you add a small flash array to existing infrastructure, you can use SDS to automatically move the most active data set to the array to improve performance and, as an added bonus, simplify management because all storage management then becomes unified.
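One simple way to picture the automatic placement is a greedy plan: rank volumes by how often they are accessed and fill the flash array from the top of the list until it runs out of room. The sketch below assumes exactly that policy; the function name, volume names, access counts and sizes are all hypothetical, and real SDS products may tier at a finer granularity and with more sophisticated heuristics.

```python
def plan_tiering(access_counts, sizes_gb, flash_capacity_gb):
    """Greedy placement: hottest volumes go to flash while they still fit."""
    flash, hdd = [], []
    remaining = flash_capacity_gb
    # Visit volumes from most to least frequently accessed.
    for name in sorted(access_counts, key=access_counts.get, reverse=True):
        if sizes_gb[name] <= remaining:
            flash.append(name)
            remaining -= sizes_gb[name]
        else:
            hdd.append(name)
    return flash, hdd


# Hypothetical volumes: accesses per day and size in GB.
access_counts = {"db": 9000, "mail": 1200, "archive": 15, "vdi": 4000}
sizes_gb = {"db": 400, "mail": 300, "archive": 5000, "vdi": 500}

flash, hdd = plan_tiering(access_counts, sizes_gb, flash_capacity_gb=1000)
print("flash:", flash)
print("hdd:  ", hdd)
```

Rerunning the plan on fresh access counts is what lets data drift back to the existing arrays as it cools.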

Optimize applications

Closely examine the applications you intend to run before implementing a new storage system or enhancing an existing one. Many storage professionals find this particularly daunting because they neither own the application nor understand the code that surrounds it. The good news is that there are programs available that can examine application code, provide an unbiased analysis of its quality and make specific recommendations on what to change and where.

While it's tempting to skip this step and just throw more hardware at the issue, don't. A code-related performance problem may be masked by high-performance storage, but it will not allow flash to live up to its full potential -- thereby forcing administrators to chase their tails looking for other potential performance detractors, such as the storage network. Fixing code before implementing flash may even circumvent the need for flash in the first place, or lower how much flash you need to purchase.

Buy a new all-flash or hybrid array

The options above are ideal for data centers that have existing hard disk-based systems that still have useful life and are still covered under the original warranty, so you can redeploy those older HDD systems and augment them with a new flash array. At some point, however, you will need to purchase a new storage system. Today, that means choosing between an all-flash and a hybrid array. The initial decision is relatively simple: if the organization can afford an all-flash array that will meet its capacity requirements (it's safe to assume that performance requirements will be met), then purchase one and don't look back.

Many organizations aren't going to find a flash array that will fit their budget, however. They can get much of the same benefit of an all-flash array without that level of investment by selecting a hybrid array, which combines flash and HDDs into the same system and then, through software, automatically moves data between them.

The primary concern over hybrid arrays, a cache miss, is largely a thing of the past. This was a worry when flash capacity was so expensive that the flash tier of a hybrid array comprised less than 5% of total storage capacity. Now, though, the flash tier is often 25% of capacity (if not more), significantly lowering the likelihood of a cache miss.
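The cost of a cache miss shows up in the average latency an application sees, which is a weighted blend of flash and hard disk response times. The sketch below works through that blend. The latencies and hit rates are hypothetical round numbers; how a given flash tier size translates into a hit rate depends entirely on the workload.

```python
def average_latency_ms(hit_rate, flash_ms, hdd_ms):
    """Expected latency when hit_rate of I/Os are served from flash."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * flash_ms + (1.0 - hit_rate) * hdd_ms


# Hypothetical response times: 0.2 ms from flash, 8 ms from hard disk.
for hit_rate in (0.90, 0.99):
    avg = average_latency_ms(hit_rate, flash_ms=0.2, hdd_ms=8.0)
    print(f"{hit_rate:.0%} hits: {avg:.3f} ms average")
```

Because a miss is so much slower than a hit, even a small drop in the miss rate has a large effect on the average, which is why a bigger flash tier changes the picture so much.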

In summary

The road to improving storage performance does not begin with an all-flash investment. It starts with a close examination of the storage network as a whole. Once that's done, there are many other storage performance and data storage efficiency enhancement options to consider, many of which include some type of flash storage deployment. Which products work best varies from data center to data center and -- thanks to a couple of the tips in this article -- some IT shops may not even need to upgrade their storage systems at all.

About the author:
George Crump is president of Storage Switzerland, an IT analyst firm focused on storage and virtualization.
