
A Redundant Array of Independent Drives (or Disks), also known as a Redundant Array of Inexpensive Drives (or Disks) (RAID), is a term for data storage schemes that divide and/or replicate data among multiple hard drives. RAID can be designed to provide increased data reliability or increased I/O performance, though one goal may compromise the other. There are about ten RAID levels, but which one is recommended for data safety and performance, considering that hard drives are commodity priced?

I did some research over the last few months and, based upon my experience, I started to use RAID 10 for both VMware / Xen virtualization and database servers. A few MS-Exchange and Oracle admins also recommended RAID 10 over RAID 5 for both safety and performance.

Quick RAID 10 overview (raid 10 explained)

RAID 10 = combining features of RAID 0 + RAID 1. It stripes data across mirrored pairs, providing both performance and fault tolerance.

RAID 0 helps to increase performance by striping volume data across multiple disk drives.

RAID 1 provides disk mirroring which duplicates your data.

In some cases, RAID 10 offers faster data reads and writes than RAID 5 because it does not need to manage parity.
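
For example, with Linux software RAID, a minimal sketch of building a four-disk RAID 10 array with mdadm might look like this (the device names /dev/sdb1 through /dev/sde1 and the array name /dev/md0 are only examples; a hardware RAID controller has its own configuration utility):
# mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
# cat /proc/mdstat
Four 500GB disks configured this way give you 1000GB of usable space, since every block is mirrored.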

Fig.01: RAID 10 in action

RAID 5 vs RAID 10

From Art S. Kagel's research findings:

If a drive costs $1000US (and most are far less expensive than that) then switching from a 4 pair RAID10 array to a 5 drive RAID5 array will save 3 drives or $3000US. What is the cost of overtime, wear and tear on the technicians, DBAs, managers, and customers of even a recovery scare? What is the cost of reduced performance and possibly reduced customer satisfaction? Finally what is the cost of lost business if data is unrecoverable? I maintain that the drives are FAR cheaper! Hence my mantra:
NO RAID5! NO RAID5! NO RAID5! NO RAID5! NO RAID5! NO RAID5! NO RAID5!

Is RAID 5 Really a Bargain?

Cary Millsap, manager of Hotsos LLC and editor of the Hotsos Journal, found the following facts in "Is RAID 5 Really a Bargain?":

  • RAID 5 costs more for write-intensive applications than RAID 1.
  • RAID 5 is less outage resilient than RAID 1.
  • RAID 5 suffers massive performance degradation during partial outage.
  • RAID 5 is less architecturally flexible than RAID 1.
  • Correcting RAID 5 performance problems can be very expensive.

My practical experience with RAID array configuration

To make the picture clear, here is a RAID 10 vs RAID 5 comparison for high-load database servers, VMware / Xen servers, mail servers, MS-Exchange servers, and so on:

RAID Level                  Total array capacity   Fault tolerance   Read speed   Write speed
RAID-10 (500GB x 4 disks)   1000 GB                1 disk            4X           2X
RAID-5 (500GB x 3 disks)    1000 GB                1 disk            2X           Depends on the controller implementation

You can clearly see that RAID 10 outperforms RAID 5 for both read and write operations; the extra disk it needs is a small price compared to the performance you gain.
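
If you want to see the difference on your own hardware, a quick and admittedly rough read test looks like this (assuming Linux software RAID arrays /dev/md0 for RAID 10 and /dev/md1 for RAID 5; real numbers depend on the controller, cache, and workload):
# hdparm -t /dev/md0
# hdparm -t /dev/md1
For write performance, use a real benchmark such as bonnie++ or your database's own workload.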

A note about backup

No RAID level will protect you from multiple disk failures: while one disk is offline for any reason, your disk array is not fully redundant. Therefore, good old tape backups are always recommended.
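
With Linux software RAID you can at least get an early warning when a member disk fails. A minimal sketch, assuming an array named /dev/md0 and a reachable mail address (both are just examples):
# cat /proc/mdstat
# mdadm --detail /dev/md0
# mdadm --monitor --scan --daemonise --mail=admin@example.com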

Please add your thoughts and experience in the comments below.

Further reading: Multipath I/O

Multipath I/O is a fault-tolerance and performance enhancement technique whereby there is more than one physical path between the CPU in a computer system and its mass storage devices through the buses, controllers, switches, and bridge devices connecting them.

A simple example would be a SCSI disk connected to two SCSI controllers on the same computer, or a disk connected to two Fibre Channel ports. Should one controller, port, or switch fail, the operating system can route I/O through the remaining controller transparently, with no change visible to the applications other than perhaps incremental latency.

This is useful for:

  1. Dynamic load balancing
  2. Traffic shaping
  3. Automatic path management
  4. Dynamic reconfiguration
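
Once multipathing is configured (see the setup below), you can list the paths the system actually found for each device with:
# multipath -ll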

Linux device-mapper

In the Linux kernel, the device-mapper serves as a generic framework to map one block device onto another. It forms the foundation of LVM2 and EVMS, software RAIDs, dm-crypt disk encryption, and offers additional features such as file-system snapshots.

Device-mapper works by processing data passed in from a virtual block device that it itself provides, and then passing the resultant data on to another block device.
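
As a tiny illustration of this mapping idea (purely a demo; the source device /dev/sdb1 and the map name testmap are examples, and none of this is required for multipathing), you can create and remove a one-to-one linear mapping by hand:
# echo "0 $(blockdev --getsz /dev/sdb1) linear /dev/sdb1 0" | dmsetup create testmap
# dmsetup table testmap
# dmsetup remove testmap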

How do I set up device-mapper multipathing in CentOS / RHEL 4 update 2 or above?

Open the /etc/multipath.conf file, enter:
# vi /etc/multipath.conf
Make sure the default blacklist section, which blocks all devices, is commented out:

#devnode_blacklist {
#        devnode "*"
#}

Make sure the default_path_grouping_policy option in the defaults section is set to failover. Here is my sample config:

defaults {
        multipath_tool                  "/sbin/multipath -v0"
        udev_dir                        /dev
        polling_interval                10
        default_selector                "round-robin 0"
        default_path_grouping_policy    failover
        default_getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
        default_prio_callout            "/bin/true"
        default_features                "0"
        rr_min_io                       100
        failback                        immediate
}
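
Before activating anything, you can do a dry run to see what device maps multipath would create without actually creating them (-d is the dry-run option in multipath-tools):
# multipath -v2 -d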

Save and close the file. Type the following commands to load the drivers:
# modprobe dm-multipath
# modprobe dm-round-robin
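
To confirm that the modules loaded, enter:
# lsmod | grep dm_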

Start the service, enter:
# /etc/init.d/multipathd start
The multipath command detects multiple paths to devices for fail-over or performance reasons and coalesces them:
# multipath -v2
Turn on the service at boot:
# /sbin/chkconfig multipathd on
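To verify that it is enabled in the proper runlevels, run:
# /sbin/chkconfig --list multipathd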
Finally, create device maps from partition tables:
# kpartx -a /dev/mapper/mpath#
You need to use fdisk on the underlying disks such as /dev/sdc.
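
For example, assuming your multipath device showed up as /dev/mapper/mpath0 (the actual name on your system may differ), a rough end-to-end sketch is to partition the underlying disk, map the partitions, and then create and mount a file system; the partition map usually appears as /dev/mapper/mpath0p1 and the mount point /san is just an example:
# fdisk /dev/sdc
# kpartx -a /dev/mapper/mpath0
# mkfs.ext3 /dev/mapper/mpath0p1
# mkdir -p /san
# mount /dev/mapper/mpath0p1 /san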

References:

  • man pages: kpartx, multipath, udev, dmsetup and hotplug