You know you need storage, and you have a pretty good idea of how to size the SAN layer. But how do you go about determining the correct number and proper type of disks? How many drives in a RAID group will it take to give you the performance you need for a database or email or file server, or for that new VMware implementation? Can you have too many drives in a RAID group, or even in the actual storage RAID array?
We'll now cover these parameters for the RAID array and underlying disk technology. When should you use SAS, SATA, FC or SCSI? What are the other metrics for sizing storage arrays, such as optimal max spindles per controller? What about the other vendor performance specifications?
Vendor data sheets tell you just about all you need to know about storage arrays and disk drives: raw capacity, throughput (usually in MB/sec), IOPS, reliability (MTBF), and drive type (SAS, SATA, FC, etc). Most storage arrays are going to have Fibre Channel interfaces on the front end (some offer InfiniBand), and can be FC, SATA or SAS on the disk end. They support RAID unless the array is JBOD (Just a Bunch Of Disks). DAS (direct attached storage) is what your environment typically contains if you don't have a SAN, and can refer to internal drives in your server or a SCSI RAID array attached to one or two servers.
RAID is basically a group of disks, usually with one or both characteristics of parity and striping. Parity is redundant information calculated from your blocks of data and stored on the disks, so the data can be recovered after a drive failure; striping allows the individual drive speeds and feeds to add up, giving you more performance than a single disk could provide. Each RAID type has tradeoffs in reliability, performance and cost. (The level of redundancy you choose can cost you a lot of usable capacity.)
Some aspects of RAID can affect throughput or IOPS. (Remember, you'll want high IOPS for latency-sensitive applications such as databases or email.) Performance under RAID can take a hit during a drive failure because the RAID controller will be working hard to rebuild the RAID group onto your global hot spare.
Wait a second, Joel! I just bought four 1 TB drives. What do you mean I only have 2.7 TB of usable capacity?
Usable capacity vs. raw capacity. The capacity you have to play with won't be exactly what's stated on the spec sheet. That 750GB SATA drive is likely to be effectively about 690 GB due to disk geometry or "overhead." A 1TB drive might format out to about 900GB.
The type of RAID also affects your usable capacity. Most RAID arrays will allow you to have one (or more) global hot spares: a disk that waits to step in for a failed drive in any RAID group, so the array can begin rebuilding the lost data onto it right after a disk failure. When calculating usable capacity, don't forget to subtract the capacity of your hot spares.
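To make the arithmetic concrete, here is a rough Python sketch of the usable-capacity calculation. It assumes a single RAID group, and the function and table names are made up for illustration. One plausible reading of the four-drive question above is RAID 5 on drives that format out to about 900GB each, which is what the example call uses.

```python
# Data-bearing drives left in a RAID group once redundancy is accounted for.
DATA_DRIVES = {
    "RAID0": lambda n: n,        # striping only, no redundancy
    "RAID10": lambda n: n // 2,  # every drive is mirrored
    "RAID5": lambda n: n - 1,    # one drive's worth of parity
    "RAID6": lambda n: n - 2,    # two drives' worth of parity
}

def usable_capacity_gb(total_drives, formatted_gb, raid_level, hot_spares=0):
    """Estimate usable capacity in GB for one RAID group.

    total_drives counts every drive you bought, hot spares included.
    formatted_gb is the per-drive capacity after formatting overhead.
    """
    drives_in_group = total_drives - hot_spares  # spares hold no data
    data_drives = DATA_DRIVES[raid_level](drives_in_group)
    if data_drives <= 0:
        raise ValueError("not enough drives for " + raid_level)
    return data_drives * formatted_gb

# Four 1TB drives, about 900GB each once formatted, in one RAID 5 group.
print(usable_capacity_gb(4, 900, "RAID5"))  # 2700
```

Buying a fifth drive as a hot spare leaves the usable figure unchanged, which is the point: the spare costs money but adds no capacity.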
What do you mean we're only getting 200 MB/sec? The data sheet says that this is an 800 MB/sec RAID array.
The performance delivered by your SAN storage will vary significantly, depending upon how much of your data access can be defined as sustained sequential reads, random reads, sustained sequential writes, random writes or some combination of the above. Don't worry -- you won't need to go through benchmark reports on each LUN unless you want to target a specific application problem. Odds are, unless you're sizing a specific application, you'll have a mix of the above.
Keep one thing in mind. For a RAID array, the worst-case performance scenario is random write, and the best-case performance scenario is sustained sequential reads. Guess which of those two performance numbers is going to be prominently displayed in the vendor data sheet.
The maximum number of drives behind a pair of controllers is usually specified, and that number is more than likely going to be "too many." You need to find out how many drives it takes to reach maximum performance, and how many more you can add before performance is noticeably affected. (This is only an issue on modular storage; your big iron enterprise arrays will support a massive number of drives just fine.) When sizing your SAN disk requirements, I advise getting input from your DBA or email admin on their specific requirements.
How do you identify which kind of drives you need for your environment? By categorizing your capacity requirements in this fashion:
IOPS-sensitive applications will be your first category. Throughput-sensitive applications (video streaming, video editing, backup to D2D or VTL, etc) will be another. Your basic file servers, web servers, print, home drive space and archive space will go into an archive category.
For transactional applications, such as database or email applications, you'd lump those servers' capacity requirements into the IOPS-sensitive category. Recommendation: 15K SAS or FC disks, and lots of them. Say you've determined the RAID type, worked out your usable capacity, and found you need five 300GB 15K RPM SAS or FC drives to get there. You might be better off going with ten 146GB 15K SAS or FC drives. With twice as many spindles, you'd get more IOPS, and ten 15K 146GB drives would cost less than five 15K 300GB drives.
If you have a large storage environment and need a lot of drives, you'd be smarter to get the 300GB 15K drives, since you'll have enough spindles to gain IOPS benefit, and will still be able to use the capacity. If you have a target for the amount of IOPS you need, the performance consideration below will give you a rule of thumb on meeting this requirement. You might find that you need twenty 76GB 15K drives instead. Your RAID groups probably won't be that big, but you can make more than one RAID group, and the person integrating the storage will be able to optimize how the server uses it. (The more drives you have in a RAID group, the longer a rebuild takes during a failure.)
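The spindle-count tradeoff is easy to see with a few lines of Python. This is only a sketch: the figure of 180 IOPS per 15K drive is an assumed rule of thumb rather than a number from any data sheet, the function names are invented for illustration, and RAID overhead is ignored to keep the comparison simple.

```python
import math

# Assumption: a rough per-spindle figure for a 15K RPM drive.
ASSUMED_IOPS_PER_15K_DRIVE = 180

def describe_option(drive_count, drive_gb, iops_per_drive=ASSUMED_IOPS_PER_15K_DRIVE):
    """Raw capacity and aggregate raw IOPS for a set of identical drives."""
    return {
        "raw_gb": drive_count * drive_gb,
        "raw_iops": drive_count * iops_per_drive,
    }

def drives_needed(capacity_gb, target_iops, drive_gb,
                  iops_per_drive=ASSUMED_IOPS_PER_15K_DRIVE):
    """Drive count that satisfies both the capacity and the IOPS target."""
    for_capacity = math.ceil(capacity_gb / drive_gb)
    for_iops = math.ceil(target_iops / iops_per_drive)
    return max(for_capacity, for_iops)

five_big = describe_option(5, 300)     # 1500 GB raw, 900 raw IOPS
ten_small = describe_option(10, 146)   # 1460 GB raw, 1800 raw IOPS
print(five_big, ten_small)

# 1500GB and a 3600 IOPS target on 76GB drives works out to twenty drives.
print(drives_needed(1500, 3600, 76))
```

Nearly the same raw capacity, twice the raw IOPS: that is the whole argument for more, smaller spindles when the workload is IOPS-bound.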
Video editing, video streaming, backup and certain types of file servers go into the throughput-sensitive category. (A workload that might put 'simultaneous' demands on your storage resources might require 10K or 15K SAS or FC disks, but 7.2K RPM SATA disks with the right kind of controllers could also deliver enough performance.) I've seen 800MB/sec sustained sequential write performance on storage servers using newer SATA RAID controllers. Note: if your web, file and print servers do not have a demanding speed or IOPS requirement, you'd want to put them into your deep and cheap archive category. This would be your 750GB or 1TB 7.2K RPM SATA drives. I also recommend RAID 6 for SATA storage solutions, since SATA drives fail at about twice the rate of SAS or FC drives, according to MTBF stats from manufacturers.
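If it helps to keep the three categories straight while you inventory servers, the guidance above can be written down as a simple lookup. The workload labels and names below are hypothetical placeholders; the recommendations are paraphrased from the text, and a file or web server with demanding performance needs would have to be moved out of the archive category by hand.

```python
# Category -> drive recommendation, paraphrased from the guidance above.
DRIVE_RECOMMENDATION = {
    "iops": "15K RPM SAS or FC, and lots of spindles",
    "throughput": "10K/15K SAS or FC, or 7.2K SATA with the right controllers",
    "archive": "750GB or 1TB 7.2K RPM SATA, RAID 6",
}

# Hypothetical workload labels; adjust to match your own server inventory.
WORKLOAD_CATEGORY = {
    "database": "iops",
    "email": "iops",
    "video editing": "throughput",
    "video streaming": "throughput",
    "backup": "throughput",
    "file server": "archive",
    "web server": "archive",
    "print server": "archive",
}

def recommend_drives(workload):
    """Return (category, drive recommendation) for a known workload label."""
    category = WORKLOAD_CATEGORY[workload]
    return category, DRIVE_RECOMMENDATION[category]

print(recommend_drives("email"))
print(recommend_drives("backup"))
```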
If you want high availability in your storage solution, you'll need some level of redundancy or parity to protect your data in the event of one or more drive failures. You'll also probably need to leverage striping (RAID 0) to aggregate the individual disk drives' performance, so in reality you'll use some combination of redundancy and striping. This is where you end up with RAID 50 (5+0) or RAID 10 (1+0, also written 1/0). Let's compare the more common RAID offerings (RAID-1/0, RAID-5, RAID 50 and RAID-6 solutions) based on speed, space utilization and performance during rebuilds and failures.
Comparison of RAID types
RAID-1/0 is where data is striped (RAID-0) across mirrored (RAID-1) sets. (RAID-0-1 is not the same as RAID-1/0; I don't recommend RAID-0-1 for Microsoft Exchange data.) Transactional performance with RAID-1/0 is good because either disk in the mirror can respond to read requests. No parity information needs to be calculated so disk writes are handled efficiently. Each disk in the mirrored set must perform the same write.
If a disk fails in a RAID-1/0 array, write performance is not affected because the surviving member of the mirror can still accept writes. Reads are moderately affected because now only one physical disk can respond to read requests. When the failed disk is replaced, the mirror is re-established, and the data must be copied or rebuilt. The tradeoff with RAID-1/0 is that your disk capacity is cut in half, because you are creating one-for-one redundancy on the disks.
RAID-5 involves calculating parity that can be used with surviving member data to recreate the data on a failed disk. Writing to a RAID-5 array causes up to four disk I/Os for each host write, and the parity calculation can consume controller or server resources. Transactional performance with RAID-5 can still be good, particularly when using a storage controller to calculate the parity.
When a disk fails in a RAID-5 array, the array is in a degraded state: performance drops and latencies rise. Most arrays spread the parity information equally across all disks in the array, and that parity must be combined with surviving data blocks to reconstruct data in real time. Both reads and writes must therefore access multiple physical disks to reconstruct data on a lost disk, increasing latency and reducing performance on a RAID-5 array during a failure.
When the failed disk is replaced, the parity and surviving blocks are used to reconstruct the lost data, a lengthy process that can take days. If a second member of the RAID-5 array fails during the Interim Data Recovery Mode or rebuild, the array is lost. RAID-6 was created to address this vulnerability.
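For readers who like to see the mechanics, here is a toy Python sketch of how parity rebuilds a lost block. RAID 5 parity is conventionally a bytewise XOR of the data blocks in a stripe; the three "disks" and their contents below are made up for illustration, and a real controller works on much larger stripes with rotating parity placement.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe across three data disks, plus its parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Disk 1 fails: XOR the survivors with the parity to recreate its block.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])  # True

def update_parity(old_parity, old_data, new_data):
    """Small-write update: needs the old data and old parity read first."""
    return xor_blocks([old_parity, old_data, new_data])

# Overwriting disk 0 costs two reads (old data, old parity)
# and two writes (new data, new parity): the four I/Os per host write.
new_block = b"ZZZZ"
new_parity = update_parity(parity, data[0], new_block)
print(new_parity == xor_blocks([new_block, data[1], data[2]]))  # True
```

The rebuild step touches every surviving disk in the group, which is why degraded-mode reads and rebuilds are so expensive on large RAID-5 groups.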
RAID Levels 0+5 (05) and 5+0 (50) form large arrays by combining the block striping and distributed parity of RAID 5 with the straight block striping of RAID 0. RAID 05 is a RAID 5 array comprised of a number of striped RAID 0 arrays; it is less common than RAID 50, which is a RAID 0 array striped across RAID 5 elements. RAID 50 and 05 improve the performance of RAID 5 through the addition of RAID 0, particularly during writes. They also provide better fault tolerance than RAID 5 alone, especially if configured as RAID 50. Most characteristics of RAID 05 and 50 are similar to those of RAID 03 and 30. RAID 50 and 05 are preferable to 03 and 30 for transactional environments with smaller files. If you're doing video editing, I suggest investigating RAID 03 and 30.
RAID-6 adds another parity block and provides about double the data protection of RAID-5, but at a cost of even lower write performance. As physical disks grow larger, and consequently RAID rebuild times grow longer, RAID-6 may be necessary to prevent LUN failure if an uncorrectable error occurs during the rebuild, or if a second disk in the array group fails during rebuild. Because of these larger disk capacities, some vendors support RAID-6 instead of RAID-5.
To achieve the IOPS goal of the Exchange 2007 requirements for a given capacity, RAID 5 may actually require more spindles than RAID 10.
Ultimately, performance depends on the performance characteristics of the drives, the configuration of RAID groups and the type of RAID. When choosing RAID 5 (or RAID 6), it's important to consider that each host write has four or more disk operations associated with it, because it is a partial-stripe RAID 5 write or a RAID 6 double-parity write. Between them, reading the old data, reading the old parity, recomputing the parity, writing the new data and writing the new parity reduce the effective write I/O rate of the drives to about ¼ of what they could otherwise deliver.
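Here is a rough Python sketch of how that write penalty turns into spindle counts, and why RAID 5 can need more drives than RAID 10 for the same IOPS target. Several things are assumptions: the 180 IOPS per drive figure, the 2,000 IOPS workload with 50% writes, and the RAID 6 penalty of six operations per write (a commonly quoted figure). The RAID 10 penalty of two follows from each disk in the mirror performing the same write.

```python
import math

# Disk operations generated by one small host write, per RAID level.
WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID6": 6}

def backend_iops(host_iops, write_fraction, raid_level):
    """Disk-level IOPS the drives must absorb for a given host load."""
    reads = host_iops * (1 - write_fraction)
    writes = host_iops * write_fraction
    return reads + writes * WRITE_PENALTY[raid_level]

def spindles_for_iops(host_iops, write_fraction, raid_level, iops_per_drive):
    """Minimum drive count to serve the load, ignoring capacity and cache."""
    return math.ceil(backend_iops(host_iops, write_fraction, raid_level) / iops_per_drive)

# Assumed workload: 2,000 host IOPS, half of them writes, 180 IOPS per drive.
for level in ("RAID10", "RAID5", "RAID6"):
    print(level, spindles_for_iops(2000, 0.5, level, 180))
# RAID10 17
# RAID5 28
# RAID6 39
```

Controller cache and full-stripe writes soften these numbers in practice, so treat the output as a starting point for a conversation with your vendor, not a final bill of materials.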
Selecting a RAID type
To select a RAID type, you'll need to balance your requirements for capacity, throughput, transactional I/O and failure/rebuild performance. RAID-1/0 is the ideal configuration for databases and email, and it works well with large capacity disks. If writes make up a large percentage of total I/O in your environment, use RAID 1/0. RAID 1/0 will give you performance consistency even during a drive failure.
For RAID-5 and RAID-6, rebuild performance can have a significant effect on storage throughput, cutting it by as much as half, depending on the storage array and configuration. Scheduling rebuilds outside of production hours can offset this performance drop, but you'll sacrifice reliability, because the array runs with reduced redundancy in the meantime. In a cluster continuous replication (CCR) environment, you can prevent the throughput reduction affecting users by moving the Mailbox server to the passive node, thereby making it the active node. If neither option is available, additional I/O throughput should be designed into the architecture to accommodate RAID-5 or RAID-6 rebuild conditions during production hours. This additional I/O throughput can be up to twice the non-failed state I/O requirements.
If your backup solution (VTL or D2D pool) needs to sustain a certain amount of data throughput, you'll have to consider how many drive resources are needed to handle that level of 'sustained, sequential write' performance. Simply put, if your RAID array can do 350 MB/sec sustained sequential writes according to the specs, odds are those numbers are based on load balancing across all the controllers to the disk resources. You'd need to make sure there are enough drives to get you to maximum performance of the array. Usually you can do this with at least two or three 'trays' of disks. Plan on creating RAID groups for each 'channel' (data path going to the RAID controllers on the RAID array). So if you have a dual controller RAID array, you'd want your RAID groups divided evenly between the two controllers.
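Dividing RAID groups evenly is usually just a round-robin assignment. The Python sketch below shows the idea with hypothetical group and controller names; how ownership is actually set depends on your array's management tools.

```python
def assign_raid_groups(raid_groups, controllers):
    """Spread RAID groups across controllers in round-robin order."""
    assignment = {controller: [] for controller in controllers}
    for index, group in enumerate(raid_groups):
        owner = controllers[index % len(controllers)]
        assignment[owner].append(group)
    return assignment

# Hypothetical names: four RAID groups on a dual-controller array.
layout = assign_raid_groups(["RG0", "RG1", "RG2", "RG3"], ["A", "B"])
print(layout)  # {'A': ['RG0', 'RG2'], 'B': ['RG1', 'RG3']}
```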
About the author: Joel Lovell is senior storage consultant for Storage Engine Inc. His specialty is high-performance storage and storage consolidation. He is EMC-trained in business continuity solutions, enterprise storage infrastructure and enterprise storage management. He previously was a strategic storage specialist for the Americas for Silicon Graphics and a senior systems engineer for EMC.