A Redundant Array of Independent Drives (or Disks), also known as a Redundant Array of Inexpensive Drives (or Disks) (RAID), is a term for data storage schemes that divide and/or replicate data among multiple hard drives. RAID can be designed to provide increased data reliability or increased I/O performance, though one goal may compromise the other. There are 10 RAID levels. But which one is recommended for data safety and performance, considering that hard drives are commodity priced?
I did some research over the last few months, and based on my experience I started to use RAID 10 for both VMware / Xen virtualization and database servers. A few MS-Exchange and Oracle admins also recommended RAID 10 over RAID 5 for both safety and performance.
Quick RAID 10 overview (raid 10 explained)
RAID 10 = Combining features of RAID 0 + RAID 1. It provides optimization for fault tolerance.
RAID 0 helps to increase performance by striping volume data across multiple disk drives.
RAID 1 provides disk mirroring which duplicates your data.
In some cases, RAID 10 offers faster data reads and writes than RAID 5 because it does not need to manage parity.
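To make the striping plus mirroring idea concrete, here is a minimal sketch of where a logical block could land in a RAID 10 array. The layout (disks 2p and 2p+1 form mirror pair p, blocks striped round-robin across pairs) and the function name `raid10_locations` are assumptions for illustration only; real controllers may arrange data differently.

```python
def raid10_locations(block, num_pairs):
    """Return (disk index, stripe row) for each copy of a logical block.

    Assumed layout: disks 2*p and 2*p + 1 form mirror pair p, and
    logical blocks are striped round-robin across the pairs.
    """
    pair = block % num_pairs   # RAID 0 part: striping picks the pair
    row = block // num_pairs   # position of the block inside that pair
    # RAID 1 part: the same block is written to both disks of the pair
    return [(2 * pair, row), (2 * pair + 1, row)]


# Four disks = two mirror pairs; show where the first four blocks go.
for block in range(4):
    print(block, raid10_locations(block, num_pairs=2))
```

Consecutive blocks alternate between the two pairs (the RAID 0 part), and every block exists on two disks (the RAID 1 part).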
RAID 5 vs RAID 10
From Art S. Kagel's research findings:
If a drive costs $1000US (and most are far less expensive than that) then switching from a 4 pair RAID10 array to a 5 drive RAID5 array will save 3 drives or $3000US. What is the cost of overtime, wear and tear on the technicians, DBAs, managers, and customers of even a recovery scare? What is the cost of reduced performance and possibly reduced customer satisfaction? Finally what is the cost of lost business if data is unrecoverable? I maintain that the drives are FAR cheaper! Hence my mantra:
NO RAID5! NO RAID5! NO RAID5! NO RAID5! NO RAID5! NO RAID5! NO RAID5!
Is RAID 5 Really a Bargain?
Cary Millsap, manager of Hotsos LLC and the editor of Hotsos Journal, reports the following findings in "Is RAID 5 Really a Bargain?":
- RAID 5 costs more for write-intensive applications than RAID 1.
- RAID 5 is less outage resilient than RAID 1.
- RAID 5 suffers massive performance degradation during partial outage.
- RAID 5 is less architecturally flexible than RAID 1.
- Correcting RAID 5 performance problems can be very expensive.
My practical experience with RAID arrays configuration
To make the picture clear, here is a RAID 10 vs RAID 5 comparison for high-load database servers, VMware / Xen servers, mail servers, MS-Exchange mail servers, etc.:
| RAID Level | Total array capacity | Fault tolerance | Read speed | Write speed |
|---|---|---|---|---|
| RAID-10 500GB x 4 disks | 1000 GB | 1 disk | 4X | 2X |
| RAID-5 500GB x 3 disks | 1000 GB | 1 disk | 2X | Speed of a RAID 5 depends upon the controller implementation |
You can clearly see that RAID 10 outperforms RAID 5 in read and write operations, at the cost of only one additional disk in this example.
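The capacity column in the table follows from simple arithmetic: RAID 10 keeps half of the raw space, while RAID 5 gives up one disk's worth to parity. The sketch below reproduces the two 1000 GB figures; the function name `usable_capacity_gb` and the level labels are hypothetical, chosen only for this example.

```python
def usable_capacity_gb(level, disks, disk_gb):
    """Usable capacity for equal-sized disks (illustrative helper)."""
    if level == "raid10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, at least 4")
        return disks // 2 * disk_gb      # half the disks hold mirror copies
    if level == "raid5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_gb     # one disk's worth of space holds parity
    raise ValueError(f"unsupported level: {level}")


print(usable_capacity_gb("raid10", 4, 500))  # 1000
print(usable_capacity_gb("raid5", 3, 500))   # 1000
```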
A note about backup
No RAID level will protect you from every combination of multiple disk failures. While one disk is offline for any reason, your disk array is not fully redundant. Therefore, good old tape backups are always recommended.
Please add your thoughts and experience in the comments below.
Further readings:
- Battle Against Any Raid Five initiative - A website dedicated to RAID related issues.
- RAID article from the wikipedia - Provides tons of information about both standard and non standard RAID levels.
- Last Updated: Dec/28/2009




{ 24 comments… read them below or add one }
after years of experience, i avoid raid 5 at all costs.
the only level i dislike more is basic raid 0 configurations.
the cost per gig of drives is so cheap today, that i don’t see a reason to use less than raid 10 if you’re combining multiple disks. otherwise stick with raid 1.
if you’re going to spend the money for 3 drives, build a raid 1 with a hot spare. that way you’re covered in the event of a failure of one drive and you can replace it when convenient.
It looks like, with increasing HDD capacities, RAID 5 will not be able to provide data safety…
Very good article: Why RAID 5 stops working in 2009 at blogs.zdnet.com/storage/?p=162&tag=nl.e539
“With a 7 drive RAID 5 disk failure, you’ll have 6 remaining 2 TB drives. As the RAID controller is busily reading through those 6 disks to reconstruct the data from the failed drive, it is almost certain it will see an URE.
So the read fails. And when that happens, you are one unhappy camper. The message “we can’t read this RAID volume” travels up the chain of command until an error message is presented on the screen. 12 TB of your carefully protected – you thought! – data is gone. Oh, you didn’t back it up to tape? Bummer!
So now what?
The obvious answer, and the one that storage marketers have begun trumpeting, is RAID 6, which protects your data against 2 failures. Which is all well and good, until you consider this: as drives increase in size, any drive failure will always be accompanied by a read error. So RAID 6 will give you no more protection than RAID 5 does now, but you’ll pay more anyway for extra disk capacity and slower write performance.”
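The rebuild claim in the quoted passage can be checked with a rough calculation. The sketch below assumes an unrecoverable read error rate of one per 10^14 bits read (a figure commonly quoted for consumer SATA drives; it is not stated in the quote above), decimal terabytes, and independent errors. The function name is hypothetical.

```python
import math


def p_ure_during_rebuild(read_tb, bits_per_ure=1e14):
    """Probability of at least one URE while reading `read_tb` terabytes.

    Assumes independent errors at a rate of one per `bits_per_ure` bits,
    using the Poisson approximation 1 - exp(-expected_errors).
    """
    bits_read = read_tb * 1e12 * 8            # decimal TB -> bits
    expected_errors = bits_read / bits_per_ure
    return -math.expm1(-expected_errors)      # equals 1 - exp(-x)


# Six surviving 2 TB drives must be read in full: 12 TB.
print(f"{p_ure_during_rebuild(12):.0%}")      # 62%
```

Under these assumptions the chance of hitting a read error during the rebuild comes out at roughly 62%: high, though short of "almost certain". Drives rated at one error per 10^15 bits would give a much lower figure.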
@Tomas M,
Nice find.
@shawn,
Good advice, this can be useful for webservers.
The ZDnet article is iffy at best. The statistical math is wonky, and the numbers are a worst case scenario if and ONLY if you max out the size of the array.
Besides, if you’re using SATA in the Enterprise, you deserve the high failure rate.
As to the comments about hard drive space being cheap – please share with me where you’re getting cheap 500G SAS or SCSI hds.
“As to the comments about hard drive space being cheap – please share with me where you’re getting cheap 500G SAS or SCSI hds.”
I agree with you but if you purchase large in bulk you may get a good discount.
“Besides, if you’re using SATA in the Enterprise, you deserve the high failure rate.”
This post covers the SATA vs SCSI / SAS issue nicely.
VonSkippy,
See this paper and the authors’ conclusions:
I do use SCSI or SAS for all enterprise servers for speed with high-RPM spindles and cache.
However, I do have several terabytes of SATA storage for archives.
Hope this helps!
i agree that sas is a better option, albeit more expensive, but that said; i run multiple low-mid range dell servers on sata 500gig raid 1’s and they’re perfectly fine.
i’d also like to point out that the linked article about sata references 150’s and not 3.0’s/sata ii specs.
i’ve never had issue running sata ii’s with 16mb cache or higher (like 32’s now) in raid configurations; even for high end database systems.
is RAID-10 fault tolerance only 1 disk? shouldn’t it be 2 disk but not on the same raid 1 pair?
my comment is only based on the example. you can lose 1 disk on each raid 1 pair, but not both on the same pair.
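The point about which second failures a RAID 10 array survives can be enumerated directly. This sketch assumes the 4-disk example from the table, with disks 0 and 1 forming one mirror pair and disks 2 and 3 the other; the pairing and the function name are assumptions for illustration.

```python
from itertools import combinations


def raid10_survives(failed, num_pairs):
    """True if no mirror pair (disks 2*p and 2*p + 1) has lost both disks."""
    return all(not {2 * p, 2 * p + 1} <= failed for p in range(num_pairs))


num_pairs = 2
two_disk_failures = list(combinations(range(2 * num_pairs), 2))
survivable = [f for f in two_disk_failures
              if raid10_survives(set(f), num_pairs)]

print(len(survivable), "of", len(two_disk_failures))  # 4 of 6
```

So the array is only guaranteed to survive one failure, but it also survives 4 of the 6 possible two-disk failures.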
You have completely overlooked the use of hot spare drives. RAID 5 is fantastic with a good hardware controller, especially with multiple global hot spares. There is also RAID 50 to consider. Thought provoking article. Thanks.
You also have overlooked Sun’s ZFS filesystem, which is a quantum leap over existing filesystems, and does not require a hardware RAID controller. ZFS maintains a checksum for each block, and if necessary relocates data on a bad sector to a good sector. ZFS has a RAID-Z function which is supposedly better than RAID5. ZFS will also allow you to mirror with three or more drives, so if one fails there are at least two remaining. It also allows snapshots – you can roll back to a known good state – plus tons of other features. ZFS is available on Solaris, OpenSolaris, Nexenta, and FreeBSD operating systems. Unfortunately it is not on Linux, yet, because ZFS is released under the CDDL. Also the FreeBSD/Nexenta ZFS versions lag behind the Solaris version. You can use any hardware the operating system recognizes, IDE/SATA/SCSI/SAS, but with all the motherboards out there with 6 SATA ports, a cheap and effective file server can be put together using ZFS and SATA drives. You could have two servers with this configuration for the price of one server using a hardware RAID/SAS configuration. Back the data up from one server to the other – eliminates the tape backup problems.
Obviously every case is different. RAID 5 will always give you better price/performance/disk space than RAID 10 for 90-100% READ profiles.
In general, I’ve found that you can get the best performance for your money if you use RAID 10 on bigger slightly slower disks than you can using RAID 5 on smaller faster disks.
Everything is explained here:
Link
Hmm, you make it sound like RAID-5 is the worst thing in the world.
Sure it’s not as safe as RAID-10, but it’s MUCH more price efficient.
Also, I’ve personally run over 10 different RAID-5 systems with all different kind of OS’s and HW’s and never had a complete failure. This is both in Windows & Linux with SCSI, IDE and S-ATA drive configurations.
I should mention that my systems have not been subject to highly intensive database transactions with huge amounts of read/write, so speed hasn’t been my concern.
Reading this article i just find a “tiny hint” that you’re blowing everything out of proportion =)
Admittedly, I have had to change perhaps 5 drives over 8 years, and every array rebuilt itself successfully. Is that just pure luck? Certainly sounds like it reading this…
Have a nice day and happy “RAIDing” out there =)
//MARTiN
As someone who has suffered drive failures in both raid 5 & 10 i am surprised not to see any mention of a few other considerations.
Rebuilding a 4 disk Raid 10 is about 5 to 10 times faster than rebuilding a Raid 5.
Mainly due to the parity calculations, but it also depends on controller algorithms and the fact that the Raid 5 may have 50% more data on it anyway.
Building a 2tb raid is always going to be more expensive with Raid 10 (4 drives vs 3 drives), but with 1tb HDDs being relatively cheap (£70 in UK) you would have to value your data very low not to take the Raid 10 option. Then there’s the ‘cost’ of your time to be considered.
before anyone dismisses Esata raids, they do perform very well (mine are in 90s), and when the motherboard dies you dont have to worry about losing all your raids because the new motherboard uses a different raid chip!!
…they dont require you to install drivers during windows installation…
…or the BIOS Raid setting….
….you can add them at any time without having to solve the [ide][raid] motherboard setting problems for those who want to add a raid AFTER installing Windows….
..and of course you can move your esata raid to another machine without worry.
Thanks for listening.
What of RAID 6?
RAID 6 is harder to define. It’s a sort of marketing term launched by some RAID vendors, and thus there are differences in implementation from one RAID 6 implementation to another.
RAID 6 offers more redundancy than RAID 5 (which is absolutely essential, RAID 5 is a walking disaster) at the cost of multiple parity writes per data write. This means the performance will typically be worse (although it’s not theoretically much worse, since the parity operations are in parallel). Additionally, it is more complex, which nets a more complex and possibly more buggy implementation, and less flexibility with management.
I’m not sure of the theoretical differences of RAID 6 vs RAID 5 with hot standby.
In general, RAID 6 has the same performance signature as RAID 5 with improved reliability but a higher hardware cost. You can also get improved reliability along with higher performance with RAID 10, which is what I would recommend instead.
Hi guys,
Let’s agree that RAID-5 is not safe enough on paper; how does it fare in REAL life?
A few people have said here it’s good enough.
RAID-6 is absolutely misunderstood here. In a RAID-6 config you’d have to lose 3 disks BEFORE your system went down. That just does not happen these days. I have never seen it in my 6 years of Storage Admin across enterprise customers, ranging from banks to telecoms to govt. orgs.
The whole “write performance hit” because of parity is also not quite as presented by some here. These days parity is constructed in NVRAM before being written to disk; it’s written at the SAME time as any other I/O. Hence if your disks and CPU are not challenged you won’t suffer any perf. impact at all.
I am also baffled why some would regard IO to RAID as a bottleneck when typically other bottlenecks prevail, such as network, SANs, CPU, misconfigurations.
Reconstruction next: of course reconstruction will take longer with RAID-DP. Some vendors have got DEDICATED parity disks that will reconstruct whilst the other disks are serving data. Hence the impact will once again be MINIMAL. It will be on CPU but hey, if your storage system is designed properly you are not CPU challenged = no perf. hit.
On IBM, when a disk fails, all disks, including those that serve data, will reconstruct, and thus performance is known to drop 30-40%! How expensive is that? So EACH time a disk fails you will have a cost.
Regards,
Eric
Well, it does, if someone steals your computer or the building catches fire. There’s not much point in making yourself absolutely protected against one threat (individual drive failures) as long as other threats (loss of all drives) exist that will walk past that protection. If you can make the risk to your data due to individual drive failures small compared with other risks then that’s enough.
RAID 5 is a mathematically elegant compromise that strikes me as a pretty good solution to that problem for many applications. It’s obviously a great deal safer than the current most prevalent solution in many contexts, which is a single hard drive, even if it’s not as secure as a less sophisticated scheme like RAID 10.
At least in theory. The actual performance, though, depends on the controller, and I don’t know enough about the choices there.
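The mathematical elegance mentioned here comes from XOR parity: the parity block is the XOR of the data blocks in a stripe, so any single missing block is the XOR of the ones that remain. Below is a minimal sketch with three tiny data blocks; the block contents and the helper name `xor_blocks` are made up for illustration and say nothing about how a particular controller implements it.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a 4-disk RAID 5: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# The disk holding data[1] dies; rebuild its block from the survivors.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])  # True
```

The same property explains the rebuild cost: recovering one block requires reading every other block in the stripe.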
One advantage to Raid-10 is that if a drive does go down, you won’t notice a performance hit as you would with a Raid-5 while it has to rebuild the array.
Regards,
Rob
Raid-5 vs Raid-10
You can minimize the price difference and play it off, but the difference between RAID 5 and RAID 10 cost as storage space goes up gets pretty hairy.
C = Cost per disk
S = total storage area
s = size of individual disk
RAID 5 cost: C((S/s) + 1)
RAID 10 cost: C((S/s) * 2)
(it should be obvious that you would round up S/s)
If S is 2 TB, s is 146 GB, and C is $500, RAID 5 would cost $7,500. Equivalent RAID 10 would cost $14,000. That’s a pretty hefty price increase, especially when you’re on a budget that still has to pay for CPU’s/RAM/etc…
That said, I don’t run RAID 5 on production databases. I **do** run RAID 5 on reporting databases that handle massive read-only queries, but I can’t afford RAID 5 performance loss for a disk rebuild on a production machine, nor can I afford to take the chance at losing 2 disks holding critical data (whether I have backups or not).
As for safety, RAID 10 definitely has the edge. In the example 2 TB array above, if I chose RAID 5 and had one disk fail, after that every other disk failure gives 100% chance of array loss. If I chose RAID 10, it would have to be the mirror to the disk that already failed (only about a 4% chance, and you can further reduce the odds for RAID 10 failure by creating mirrored pairs with disks from different manufacturers). So play the odds: when a disk fails, you’re looking at 100% chance total loss if another fails, or 4% chance total loss if another fails? Take your pick.
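The cost formulas and the 4% figure above can be reproduced in a few lines. This sketch assumes 2 TB means 2000 GB (which is what yields the quoted dollar amounts) and that every surviving disk is equally likely to be the next to fail; the function names are hypothetical.

```python
import math


def raid5_disks(total_gb, disk_gb):
    return math.ceil(total_gb / disk_gb) + 1   # data disks + one for parity


def raid10_disks(total_gb, disk_gb):
    return math.ceil(total_gb / disk_gb) * 2   # every data disk is mirrored


S, s, C = 2000, 146, 500   # total GB, GB per disk, dollars per disk

r5, r10 = raid5_disks(S, s), raid10_disks(S, s)
print(r5, r5 * C)     # 15 7500
print(r10, r10 * C)   # 28 14000

# After one disk has failed, the chance that the NEXT failure kills the array:
p_raid5 = 1.0              # any second failure is fatal
p_raid10 = 1 / (r10 - 1)   # only the failed disk's mirror partner is fatal
print(f"{p_raid5:.0%} vs {p_raid10:.1%}")   # 100% vs 3.7%
```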
It seems to me that, in theory and for large arrays and assuming the hard drives themselves are the bottlenecks, RAID 6 is going to be better than RAID 10 in every way: faster (less duplication needed), safer (it can guarantee survival after two drives die whereas RAID 10 can’t; it can even correct small errors on the disk) and more space-efficient (less duplication needed).
Whether this is achievable with extant controllers I don’t know, but if it isn’t I’d take it as evidence, not that it’s impossible, but that the market for that level of security isn’t large enough to make it worth developing the hardware. The standard implementation of RAID 5 may not suit every need but the general approach of using parity vice duplication is in principle sound.
Comparisons that neglect the price of hard drives are probably unsound, because by adding another half dozen hard drives you can always make the system safer yet.
The different approaches, etc. all represent three-dimensional surfaces of compromise in a discrete four-dimensional space of safety, speed, space and budget. Except for quite small arrays, the hyperplanes of the parity-based solutions are going to look more attractive than those of more primitive schemes like RAID 10.
The possibility of a 2nd failure in the same mirror sets makes me reluctant to use RAID10.
Does the stress on a drive caused by rebuilding a mirror set make it more likely for a 2nd disk to fail? I wonder if anyone has any actual data on failure rates of RAID10.
RAID 6 seems like the best compromise to me, and I hope that modern hardware controllers with large cache will mitigate the performance issues.
RAID 10 is certainly worth it depending on the context and performance of your data. In fact it may save you money especially when you consider the performance degradation associated with RAID 5 and high random read / write IO databases such as Exchange. Another consideration is to look at host based striping at the same time. The key factor is that the storage system and the host should be sharing the load as much as possible. For more details, please feel free to comment and post on my article
When RAID 10 Is Worth The Economic Cost Link
“RAID 6 is going to be better than RAID 10 in every way: faster (less duplication needed)”
Parity calculations are what kill RAID 5 and RAID 6 for write performance, and unless your workload is a) read-only and/or b) highly sequential, RAID 10 will outperform RAID 5 or RAID 6. Particularly in a random read-write scenario, RAID 10 easily outperforms RAID 5 and RAID 6.
Link #1 (ibm.com) and Link #2 (dell.com)
You’ll notice that media streaming or database logs (highly sequential) is where RAID 5 and RAID 6 shine, being outperformed only by RAID 0 on those tests. In virtually every other test RAID 10 has better performance.
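A common rule of thumb puts numbers on this: each small random write costs about 2 disk I/Os on RAID 10 (one per mirror), 4 on RAID 5 (read data, read parity, write data, write parity) and 6 on RAID 6. The sketch below applies that rule; the penalties are the usual textbook values rather than measurements from the linked tests, and the 150 IOPS per disk is an arbitrary assumption. Caches and full-stripe sequential writes can change the picture considerably.

```python
# Rule-of-thumb disk I/Os per small random write (assumed, not measured).
WRITE_PENALTY = {"raid10": 2, "raid5": 4, "raid6": 6}


def effective_iops(level, disks, iops_per_disk, write_fraction):
    """Estimate host-visible random IOPS for a given read/write mix."""
    raw = disks * iops_per_disk
    cost_per_io = (1 - write_fraction) + write_fraction * WRITE_PENALTY[level]
    return raw / cost_per_io


# 8 disks at a hypothetical 150 IOPS each, 50% random writes.
for level in ("raid10", "raid5", "raid6"):
    print(level, f"{effective_iops(level, 8, 150, 0.5):.0f}")
# raid10 800
# raid5 480
# raid6 343
```

With `write_fraction=0` all three levels come out equal, which matches the point that the parity levels hold up on read-only workloads.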