The data storage capacity growth survival guide
You can keep your storage investments in place, allow for capacity increases, and solve performance and application problems, while gathering data to plan for new growth.
- Logan G. Harbaugh, Independent consultant
Business pressures outside the data center could lead to hasty decisions within, such as too quickly zeroing in on the cause of a storage performance slowdown. Hastiness could lead you to upgrade your storage network to all-new, faster connections, only to discover later that you haven't solved anything. Why? Because, as it turned out, the real problem was latency caused by having too many servers banging on the system, rather than the speed of any individual connection.
Measuring application performance from end to end, preferably with a system that does historical tracking, is the only real way to discern exactly what's causing a slowdown. You collect data over days, weeks or months to see how it's moving over time, which can identify periodic spikes from regular events -- from users logging in at 8 a.m. to weekly backups at midnight -- and identify traffic consumption trends. That way, in addition to zeroing in on what's negatively impacting storage today, you can detect trends and potential future problems, and maybe even fix them before they become a real issue.
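As a rough illustration of what historical tracking makes possible, the sketch below groups latency samples by hour of day and flags the hours that stand out. The sample data, the function names and the 2x threshold are assumptions made for this example, not the output or method of any particular monitoring product.

```python
from collections import defaultdict
from statistics import mean


def hourly_profile(samples):
    """Average latency per hour of day from (hour, latency_ms) samples."""
    buckets = defaultdict(list)
    for hour, latency_ms in samples:
        buckets[hour].append(latency_ms)
    return {hour: mean(values) for hour, values in buckets.items()}


def spike_hours(samples, factor=2.0):
    """Hours whose average latency exceeds `factor` times the overall average."""
    overall = mean(latency_ms for _, latency_ms in samples)
    profile = hourly_profile(samples)
    return sorted(hour for hour, avg in profile.items() if avg > factor * overall)


# Hypothetical data: three quiet days of hourly samples, plus slow readings
# at 8 a.m. (user logins) and at midnight (backup window).
samples = [(hour, 5.0) for hour in range(24)] * 3
samples += [(8, 60.0), (8, 70.0), (0, 50.0)]

print(spike_hours(samples))
```

With this made-up data, the midnight and 8 a.m. hours are the ones flagged. A real deployment would read the samples from whatever the monitoring system exports and would likely need a more robust baseline than a simple average.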
The onset of storage performance troubles often goes hand-in-hand with initiatives to increase -- or make the most of -- data storage capacity. Fortunately, storage management products not only help solve and prevent pokey storage problems, they often include tools to help increase available capacity without negatively affecting performance.
Zeroing in on capacity
Storage management applications help manage data storage capacity by identifying files that don't need to be on a system (do your users really need to store a few terabytes of home movies in their work folder?), or by moving files that haven't been opened for a while to older, slower storage. Also, when running low on capacity, you can search, move or delete large files or certain file types, such as .mpeg; compress data; and migrate it off primary storage to secondary, tertiary or other storage.
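A manual version of that search can be sketched in a few lines of Python. The thresholds, the extension list and the function name are hypothetical, and the script relies on last-access times, which some file systems are configured not to update, so treat the results as candidates to review rather than a deletion list.

```python
import os
import time


def find_candidates(root, min_size_bytes=1 << 30, max_idle_days=180,
                    extensions=(".mpeg", ".mpg")):
    """Yield (path, size, idle_days, reasons) for files worth reviewing.

    A file qualifies if it is large, has not been accessed recently,
    or has one of the listed extensions. All thresholds are assumptions.
    """
    now = time.time()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            idle_days = (now - st.st_atime) / 86400
            reasons = []
            if st.st_size >= min_size_bytes:
                reasons.append("large")
            if idle_days >= max_idle_days:
                reasons.append("stale")
            if name.lower().endswith(extensions):
                reasons.append("file type")
            if reasons:
                yield path, st.st_size, idle_days, reasons
```

Calling `find_candidates("/srv/share")` (a placeholder path) returns a generator you can sort by size or idle time before deciding what to move, compress or delete.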
While you can manually search using tools in Windows, those built specifically for storage management from array vendors or a third party can do much more (see "Tools of the trade"). For example, they can build flexible queries that run during off hours and warn end users that files need moving before, say, files over a certain size are automatically deleted. And while compression tools are also available in Windows -- and may be included with a storage system or storage management application -- take care that compression overhead doesn't hurt storage performance through write delays. Because some compression schemes must decompress and recompress a whole file whenever even one byte changes, the impact on overall file system performance can be disproportionate.
Tiering tools can move files and leave pointers to the new location. Consequently, when a user clicks on a file in the old location, it is returned from the new one and opened. Alternatively, you can create a single namespace that concatenates several different storage systems, so the process of accessing files is completely transparent to users, no matter where they actually reside. Some tiering tools can even move files to the cloud, where costs are much lower, and return them as needed.
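The move-and-leave-a-pointer idea can be approximated with a symbolic link. This is only a sketch: commercial tiering tools use their own stub or reparse mechanisms, the `tier_down` name is hypothetical, and creating symlinks on Windows may require elevated privileges.

```python
import os
import shutil


def tier_down(path, secondary_root):
    """Move `path` to secondary storage and leave a symlink in its place.

    The symlink stands in for the pointer a tiering tool would leave;
    the flat destination layout is a simplification for this example.
    """
    if os.path.islink(path):
        raise ValueError(f"{path} is already a link")
    secondary_root = os.path.abspath(secondary_root)
    os.makedirs(secondary_root, exist_ok=True)
    target = os.path.join(secondary_root, os.path.basename(path))
    if os.path.exists(target):
        raise FileExistsError(target)  # never overwrite data on the lower tier
    shutil.move(path, target)
    os.symlink(target, path)  # opening the old path now reads the moved file
    return target
```

After `tier_down` runs, a user who opens the file at its old location gets the copy on the slower tier without needing to know it moved.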
Tiering and performance
Caching and tiering have something in common: If they're sized large enough to provide space for the 10% of data typically active at any given time, all the data stored on the next tier down will effectively have the same performance as the top tier. That's because all the data can be served from the fast 10%, with files being moved up from the next tier as needed.
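To make the sizing argument concrete, here is a minimal least-recently-used cache with hit and miss counters. The class, the loader function and the workload are assumptions for illustration; real caching appliances and tiering engines use more sophisticated policies.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny least-recently-used cache that counts hits and misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key, load):
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.items[key]
        self.misses += 1
        value = load(key)  # fetch from the slower tier
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used
        return value


# Hypothetical workload: 1,000 files exist, but only 100 are active,
# and the cache is sized to hold exactly that active 10%.
cache = LRUCache(capacity=100)
for _ in range(10):
    for file_id in range(100):
        cache.get(file_id, load=lambda k: f"data-{k}")

print(cache.hits, cache.misses)
```

In this idealized run only the first pass over the active files misses; every later request is served from the fast layer. If the active set outgrows the cache, the hit rate falls and the slower tier's performance starts to show through.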
This also works when there are more than two tiers. For example, a flash tier with a data storage capacity of 10 TB can front a fast hard drive tier with 100 TB, which can front a high-capacity tier with a petabyte of storage, which can then front a cloud tier with 10 PB. Applications then get the effective performance of the flash tier for the whole 11.1 PB of capacity in the storage infrastructure.
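The arithmetic behind that example is easy to check. The snippet below assumes decimal units (1 PB = 1,000 TB) and simply adds up the four tiers and confirms that each one is a tenth the size of the tier behind it.

```python
TB = 1
PB = 1000 * TB  # decimal units assumed

# Tier sizes taken from the example in the text, fastest first.
tiers = [
    ("flash", 10 * TB),
    ("fast hard drive", 100 * TB),
    ("high-capacity", 1 * PB),
    ("cloud", 10 * PB),
]

total_tb = sum(size for _, size in tiers)
print(f"total capacity: {total_tb / PB:.2f} PB")

# Each tier should be able to hold the active 10% of the tier it fronts.
for (upper, upper_size), (lower, lower_size) in zip(tiers, tiers[1:]):
    print(f"{upper} is {upper_size / lower_size:.0%} of {lower}")
```

The total comes to 11,110 TB, or about 11.1 PB, with every tier sized at 10% of the next one down.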
The key here is predictive software that can fetch a whole file when only a part is requested and monitor the overall system to ensure the data's migrated up the chain as needed. Big arrays integrate these tools into their storage controllers, or you can purchase them separately in products such as DataCore SANsymphony.
Because some users dump gigabytes or terabytes of music, movies and more on company storage, you should regularly collect trending data on the percentage of overall storage in use to identify such drains on resources and move the data elsewhere. How you do it matters less than doing it accurately and consistently, whether through the software included with your array, a separate storage management app such as Veritas InfoScale Operations Manager, or a spreadsheet fed by the drive properties function in your server OS.
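Once the percent-in-use figures are being recorded, even a simple straight-line fit gives a first estimate of when a volume will fill up. The function name and the weekly readings below are hypothetical, and a linear trend is an assumption that will not hold if growth is bursty or accelerating.

```python
def days_until_full(history, limit_pct=100.0):
    """Estimate days from the last sample until usage reaches `limit_pct`.

    `history` is a list of (day_number, percent_used) readings.
    Returns None if usage is flat or shrinking.
    """
    n = len(history)
    if n < 2:
        raise ValueError("need at least two readings")
    mean_x = sum(day for day, _ in history) / n
    mean_y = sum(pct for _, pct in history) / n
    sxx = sum((day - mean_x) ** 2 for day, _ in history)
    if sxx == 0:
        raise ValueError("readings must span more than one day")
    sxy = sum((day - mean_x) * (pct - mean_y) for day, pct in history)
    slope = sxy / sxx  # percentage points gained per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    full_day = (limit_pct - intercept) / slope
    return max(0.0, full_day - history[-1][0])


# Hypothetical weekly readings: usage grows one point per week from 60%.
history = [(0, 60.0), (7, 61.0), (14, 62.0), (21, 63.0)]
print(round(days_until_full(history)))
```

Setting `limit_pct` to something like 80 turns the same calculation into an early-warning date rather than a day-of-reckoning date.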
In addition, having an "official" data policy in place -- from encouraging users to keep music files off company servers to the archiving of old files -- is a good starting place. A formal data policy can keep your backside covered when a data cleanup deletes files users thought would be kept indefinitely.
Tools of the trade
Although Windows Server and your existing storage may include storage management tools that measure application performance to help identify the source of any problems and increase data storage capacity, you may want to consider buying a third-party tool. Even if some of your storage includes storage management software, it's likely proprietary and will only help manage your current vendor's storage arrays.
A third-party tool is capable of measuring and managing data across multiple vendors' hardware, and can even help add additional tiers of storage or migrate some storage to the cloud. Examples of third-party storage management include DataCore SANsymphony, Veritas InfoScale Operations Manager and SolarWinds Storage Resource Monitor. Most vendors offer a 30-day trial or limited version you can try to see if their product meets your needs before buying.
There are also appliances that provide data management capabilities, from storage monitoring to acting as cloud storage gateways. These are often available for testing as virtual machines that can be installed, previewed and then purchased as a separate hardware appliance if you like the functionality.
One simple way to avoid squabbles with users is to move old, unused, obsolete or inappropriate data from primary storage to secondary or tertiary storage, either manually or, preferably, with tiering software. If your SAN or NAS doesn't support this directly, you can buy auto-tiering software like SANsymphony separately. These programs let you use older storage as a secondary tier where you can move specific file types or files that haven't been accessed for a while, keeping your new and expensive high-performance storage free for appropriate tasks.
Storage, like other computer systems, tends to slow down as it gets older. A filer that was very fast when first installed, when it only had to support a few users and a few thousand files, may not perform as well with 12,000 users and 40,000,000 files -- even if there appears to be sufficient disk space available. And that archive of a thousand CD-ROM images may only take up a few terabytes of disk space, for example, but the number and size of the files could cause trouble by slowing file system performance and increasing the amount of system memory used by storage.
A monitoring tool that finds these kinds of issues and notifies administrators can quickly pay for itself, and will often postpone the purchase of new hardware by regaining lost efficiency. Monitoring tools are often included with storage systems; examples of stand-alone tools include SolarWinds Storage Resource Monitor and Cloudera Enterprise.
Beyond pruning files and reducing overhead, it's possible to help an existing storage system deliver data faster with quicker response times without buying a new array. For instance, you can place an all-flash caching appliance, such as Permabit's Enterprise Flash Caching Appliance or Cirrus' Data Caching Server, between servers and storage. By caching most-used data and serving it up from the cache instead of old storage, the appliance transparently accelerates existing storage for relatively little cost.
Caching holds data temporarily, moving files in and out as they're requested, while a tier 0 system adds a new permanent layer of faster storage. Tiering software can cause a tier 0 layer to act as a cache, but it isn't automatic like a cache.
Depending on the
Expanding existing arrays
Adding modules, additional shelves of drives or new nodes in a cluster is an option for those with substantial investments in one vendor's storage. However, while storage vendors often make discounts available to existing customers, and not having to retrain staff is enticing, your existing array's controller may not support the faster interconnect speed of the new drive shelf, at least not without an additional hardware upgrade that requires taking storage controllers offline for hours and buying a new level of support.
It may be simpler and cheaper in the long run to add a new storage system, using storage management software to integrate the two systems. This has the added benefit of allowing the purchase of a highly rated system every few years, rather than locking you into a vendor that may not have kept up with the latest technologies.
Relatedly, buying partially populated storage systems with the intention of adding drives as prices drop, capacities increase and performance improves may seem like an effective strategy for growing data storage capacity. It isn't, unless the times between buys are very short -- quarterly or even monthly. Over longer intervals, it may become impossible to find drives to fit, say, a 2-year-old chassis, let alone one that's 5 years old.
This is also true of systems that come fully loaded, with capacity beyond what you buy initially. Here, you simply purchase software keys from the array vendor to unlock more storage as needed. However, by the time you require more capacity, you're likely to need several times more, rather than twice what you already have. In a typical capacity-on-demand system, you might buy 50 TB with a system that has 100 TB. By the time you're ready to unlock the additional 50 TB, you may find you really need 250 TB, or 500 TB.
Clustered storage: Open-ended systems
One answer for allowing expansion without having to maintain hardware compatibility is to cluster storage. This could be a system with software distributed over a number of storage nodes; think HyperGrid and Isilon. When storage gets low, adding nodes not only increases data storage capacity, it also improves performance, because reads and writes are then distributed over all the nodes in the cluster. And since many clustered products are hardware-agnostic, you can purchase nodes as a complete bill of materials from one vendor or build them with hardware available through many different vendors.
Heterogeneous storage: Fighting vendor lock-in
If you buy a SAN or NAS from one of the main vendors, you get a system with lots of features and good performance for a reasonable cost. But two years later, you may go back to the same vendor, only to discover they've purchased a couple of competitors and are now pushing you toward a forklift upgrade to their newest system or, worse, no longer have the parts to expand your existing system -- despite salesman assurances that parts would be available for at least five years.
There are ways to avoid this scenario. Buy a software-defined storage system, storage management system or appliance that allows you to create a single access point for multiple arrays from multiple vendors. These systems enable you to add new storage systems from any vendor you like while getting maximum performance from new arrays, all while continuing to use the old ones.
It's possible to keep your existing investments in storage active, allow for capacity increases and find and solve problems with storage and application performance, all while gathering data to plan for new growth without surprises. All it takes is measurement, analysis and tracking software, which will let you stay on top of the performance of your storage system.
This doesn't have to be expensive, either in purchases or time spent. Having accurate data will not only point you toward possible inexpensive fixes, it can help justify the purchase of new storage when it's really needed.