5 Tips To Speed Up Linux Software Raid Rebuilding And Re-syncing

Posted in Categories Data recovery, Hardware, Linux, Storage · last updated May 1, 2017

It is no secret that I am a pretty big fan of the excellent Linux software RAID. Creating, assembling, and rebuilding a small array is fine. But things start to get nasty when you try to rebuild or re-sync a large array. You may get frustrated when you see that it is going to take 22 hours to rebuild it. You can always increase the speed of Linux software RAID 0/1/5/6 reconstruction using the following five tips.

Recently, I built a small NAS server running Linux for one of my clients, with 5 x 2TB disks in a RAID 6 configuration, as an all-in-one backup server for Linux, Mac OS X, and Windows XP/Vista/7/10 client computers. When I typed the command cat /proc/mdstat, it reported that md0 was active and recovery was in progress. The recovery speed was around 4000K/sec and would complete in approximately 22 hours. I wanted to finish earlier than that.

A note about lazy initialization and ext4 file system

When creating an ext4 file system, the Linux kernel uses lazy initialization. This feature allows faster creation of a file system: a process called “ext4lazyinit” runs in the background to initialize the rest of the inode tables. As a result, your RAID rebuild will operate at minimal speed. This only matters if you have just created an ext4 filesystem. There are options to enable or disable this feature when running the mkfs.ext4 command:

  1. lazy_itable_init[= <0 to disable, 1 to enable>] – If enabled and the uninit_bg feature is enabled, the inode table will not be fully initialized by mke2fs. This speeds up filesystem initialization noticeably, but it requires the kernel to finish initializing the filesystem in the background when the filesystem is first mounted. If the option value is omitted, it defaults to 1 to enable lazy inode table zeroing.
  2. lazy_journal_init[= <0 to disable, 1 to enable>] – If enabled, the journal inode will not be fully zeroed out by mke2fs. This speeds up filesystem initialization noticeably, but carries some small risk if the system crashes before the journal has been overwritten entirely one time. If the option value is omitted, it defaults to 1 to enable lazy journal inode zeroing.
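If you would rather pay the initialization cost up front, so that a later rebuild is not competing with ext4lazyinit for disk I/O, both lazy options can be disabled when the filesystem is created. A sketch (assuming your array is /dev/md0; this command creates a new filesystem and destroys existing data):

```shell
# Fully zero the inode tables and journal at mkfs time, so no
# background ext4lazyinit process runs after the first mount:
mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/md0
```

The trade-off is a noticeably slower mkfs run.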

Tip #1: /proc/sys/dev/raid/{speed_limit_max,speed_limit_min} kernel variables

The /proc/sys/dev/raid/speed_limit_min is a config file that reflects the current “goal” rebuild speed for times when non-rebuild activity is current on an array. The speed is in Kibibytes per second (1 kibibyte = 2^10 bytes = 1024 bytes), and is a per-device rate, not a per-array rate. The default is 1000.

The /proc/sys/dev/raid/speed_limit_max is a config file that reflects the current “goal” rebuild speed for times when no non-rebuild activity is current on an array. The default is 100,000.

To see current limits, enter:
# sysctl dev.raid.speed_limit_min
# sysctl dev.raid.speed_limit_max

Sample outputs:

dev.raid.speed_limit_min = 10000
dev.raid.speed_limit_max = 20000

NOTE: The following hacks are used for recovering a Linux software RAID and increasing the speed of RAID rebuilds. They are good for tweaking the rebuild process, but may increase overall system load as well as CPU and memory usage.

To increase the speed, use either of the following:
echo value > /proc/sys/dev/raid/speed_limit_min
sysctl -w dev.raid.speed_limit_min=value
For example, to set it to 50000 K/sec, enter:
# echo 50000 > /proc/sys/dev/raid/speed_limit_min
# sysctl -w dev.raid.speed_limit_min=50000
If you want to override the defaults, you could add these lines to /etc/sysctl.conf:

#################NOTE ################
##  You are limited by CPU and memory too #
dev.raid.speed_limit_min = 50000
## good for a 4-5 disk based array ##
dev.raid.speed_limit_max = 2000000
## for a large 6-12 disk based array, use this instead: ##
# dev.raid.speed_limit_max = 5000000
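After editing /etc/sysctl.conf, the new limits can be loaded without a reboot:

```shell
# Apply all settings from /etc/sysctl.conf immediately (run as root):
sysctl -p
# Confirm both limits took effect:
sysctl dev.raid.speed_limit_min dev.raid.speed_limit_max
```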

Tip #2: Set read-ahead option

Set the readahead (in 512-byte sectors) per RAID device. The syntax is:
# blockdev --setra 65536 /dev/mdX
## Set read-ahead to 32 MiB ##
# blockdev --setra 65536 /dev/md0
# blockdev --setra 65536 /dev/md1
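As a quick sanity check: --setra counts 512-byte sectors, so the value 65536 used above works out to 32 MiB (the current setting can be read back with blockdev --getra /dev/md0):

```shell
# 65536 sectors x 512 bytes per sector = 32 MiB of read-ahead:
ra_sectors=65536
echo "$(( ra_sectors * 512 / 1024 / 1024 )) MiB"   # prints "32 MiB"
```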

Tip #3: Set stripe_cache_size for RAID 5 or RAID 6

This is only available on RAID5 and RAID6, and it can boost sync performance by 3-6 times. It records the size (in pages per device) of the stripe cache, which is used for synchronising all write operations to the array and all read operations if the array is degraded. The default is 256. Valid values are 17 to 32768. Increasing this number can increase performance in some situations, at some cost in system memory. Note: setting this value too high can result in an “out of memory” condition for the system. Use the following formula:

memory_consumed = system_page_size * nr_disks * stripe_cache_size
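As a worked example, plugging in the numbers from the 5-disk array in this article (stripe_cache_size of 16384, and an assumed 4 KiB system page size) shows how quickly the memory cost adds up:

```shell
# memory_consumed = system_page_size * nr_disks * stripe_cache_size
page_size=4096            # verify yours with: getconf PAGESIZE
nr_disks=5
stripe_cache_size=16384
echo "$(( page_size * nr_disks * stripe_cache_size / 1024 / 1024 )) MiB"   # prints "320 MiB"
```

So a stripe cache of 16384 pages on this array costs roughly 320 MiB of RAM.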

To set stripe_cache_size to 16384 pages for /dev/md0, type:
# echo 16384 > /sys/block/md0/md/stripe_cache_size
To set stripe_cache_size to 32768 pages for /dev/md3, type:
# echo 32768 > /sys/block/md3/md/stripe_cache_size

Tip #4: Disable NCQ on all disks

The following will disable NCQ on /dev/sda, /dev/sdb, ..., /dev/sde using a bash for loop:

## sample for loop ##
for i in sda sdb sdc sdd sde
do
  echo 1 > /sys/block/$i/device/queue_depth
done

Tip #5: Bitmap Option

Bitmaps optimize rebuild time after a crash, or after removing and re-adding a device. Note that a bitmap cannot be added while the array is rebuilding or resyncing. Turn one on by typing the following command:
# mdadm --grow --bitmap=internal /dev/md0
Once the array is rebuilt or fully synced, disable the bitmap:
# mdadm --grow --bitmap=none /dev/md0
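Before growing a bitmap, you can check whether the array already has one (sketched against /dev/md0; substitute your own device):

```shell
# mdadm reports an internal bitmap on its "Intent Bitmap" line:
mdadm --detail /dev/md0 | grep -i bitmap
# An internal bitmap also appears as a "bitmap: ..." line in /proc/mdstat:
grep bitmap /proc/mdstat
```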


My rebuild speed went from 4000K/sec to 51204K/sec:
cat /proc/mdstat
Sample outputs:

Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath]
md5 : active raid1 sde2[2](S) sdd2[3](S) sdc2[4](S) sdb2[1] sda2[0]
      530048 blocks [2/2] [UU]

md0 : active raid6 sde3[4] sdd3[3] sdc3[2] sdb3[1] sda3[0]
      5855836800 blocks level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]
      [============>........]  resync = 61.7% (1205475036/1951945600) finish=242.9min speed=51204K/sec

Monitoring the RAID rebuild/recovery process like a pro

Use the cat command on the /proc/mdstat file. This read-only file contains information about the status of currently running arrays, including the rebuild speed:
# cat /proc/mdstat
Alternatively, use the watch command to display /proc/mdstat output on screen repeatedly:
# watch -n1 cat /proc/mdstat
Sample outputs:

Fig.01: Performance optimization for Linux raid6 for /dev/md2

The following command provides details about the /dev/md2 RAID array, including a status and health report:
# mdadm --detail /dev/md2
Sample outputs:
Fig.02: Finding out information about md2 raid array

Another option is to see what is actually happening by typing the following iostat command to watch disk utilization:
watch iostat -k 1 2
watch -n1 iostat -k 1 2

Sample outputs:
Fig.03: Find out CPU statistics and input/output statistics for devices and partitions

  • See the man pages: md(4), mdadm(8)
  • /etc/cron.d/mdadm and /usr/share/mdadm/checkarray on Debian/Ubuntu Linux

Posted by: Vivek Gite

The author is the creator of nixCraft and a seasoned sysadmin and a trainer for the Linux operating system/Unix shell scripting. He has worked with global clients and in various industries, including IT, education, defense and space research, and the nonprofit sector. Follow him on Twitter, Facebook, Google+.

60 comments

  1. i love your blog, but now i hate you … i added a new drive last week and watched mdadm reshape for ~80hrs … guess it’ll be easier next time. thanks for the tip 😛

  2. That’s an awesome tip. From my rough calculations I assume it finished in about 1hr 45min. May not be a good thing if you bill your clients per hour 🙂

  3. Hi there,

    this for sure will help a lot when I’ll install our budget backup SAN :-). Do you by any chance also now whether such limits exists for drbd-syncing?


    1. Yes, drbd also suffer from the same issue and can be fixed by editing /etc/drbd.conf. Look out for rate directive:

       rate 40M;

      Another note LVM really slows down performance with drbd; so make sure you avoid it. See this help page [drbd.org].

      1. I always faced this speed problem and I thought this is due to software issue. With this tip I reduced my build time from 9.2 hours to less than 1 hour.

  4. One of my machines has 5x 2TB in a RAID5. I got even more speed out of my volume after tweaking the drives and the md0 volume itself.


    blockdev --setra 16384 /dev/sd[abcdefg]
    echo 1024 > /sys/block/sda/queue/read_ahead_kb
    echo 1024 > /sys/block/sdb/queue/read_ahead_kb
    echo 1024 > /sys/block/sdc/queue/read_ahead_kb
    echo 1024 > /sys/block/sdd/queue/read_ahead_kb
    echo 1024 > /sys/block/sde/queue/read_ahead_kb
    echo 256 > /sys/block/sda/queue/nr_requests
    echo 256 > /sys/block/sdb/queue/nr_requests
    echo 256 > /sys/block/sdc/queue/nr_requests
    echo 256 > /sys/block/sdd/queue/nr_requests
    echo 256 > /sys/block/sde/queue/nr_requests
    # Set read-ahead.
    echo "Setting read-ahead to 64 MiB for /dev/md0"
    blockdev --setra 65536 /dev/md0
    # Set stripe-cache_size for RAID5.
    echo "Setting stripe_cache_size to 16 MiB for /dev/md0"
    echo 16384 > /sys/block/md0/md/stripe_cache_size
    echo 8192 > /sys/block/md0/md/stripe_cache_active
    # Disable NCQ on all disks.
    echo "Disabling NCQ on all disks..."
    echo 1 > /sys/block/sda/device/queue_depth
    echo 1 > /sys/block/sdb/device/queue_depth
    echo 1 > /sys/block/sdc/device/queue_depth
    echo 1 > /sys/block/sdd/device/queue_depth
    echo 1 > /sys/block/sde/device/queue_depth


      1. One more question on the software raid possibilities:
        with a hardware raid, when one disk dies, you can still boot the machine. I had achieved the same for RAID-1 with info found on the net, but how does one goes to achieve the same for a 6 disk RAID-5 setup? I can’t find that info…. Any suggestions or links?

        One possible solution I came up with (but I’m not too keen on using it) would be to install the system to an USB drive/stick (of which one can easily make backups, or even use 2 usb devices in RAID-1) and keep the 6 drives as pure data-disks (no OS) in RAID 5.

        Thanks in advance

        1. You could split the disks into two partitions, a small 100M partition, and a data partition which uses the rest of the disk. Set up two raid volumes. The small partitions would be raid1, and the rest raid5/6 whatever. install everything on the raid6 volume and mount the raid1 as /boot. Setup grub for each disk:

          root (hd0,0)
          setup (hd0)
          root (hd1,0)
          setup (hd1)

          In this way, any drive could be removed and you could still boot the machine.

    1. I suffer from renaming interfaces on reboot … sometimes system disk is sda sometimes sde …it’s a little bit weird … but it made me make your script a little bit more dynamic 🙂

      #Parses output of mdadm --detail for specified array (here md0) for devices used in raid
      disks=$(sudo mdadm --detail /dev/md0 | grep "/dev/sd" | awk '{print $7}' | cut -c6-8)
      for i in $disks
      do
        echo "Setting read-ahead to 16MiB for disk "$i" in RAID (probably)"
        blockdev --setra 16384 /dev/$i
        echo "Setting read_ahead_kb to 1024 for disk "$i" in RAID..."
        echo 1024 > /sys/block/$i/queue/read_ahead_kb
        echo "Setting nr_requests to 256 for disk "$i" in RAID..."
        echo 256 > /sys/block/$i/queue/nr_requests
        echo "Disabling NCQ for disk "$i" in RAID..."
        echo 1 > /sys/block/$i/device/queue_depth
      done
      # Set read-ahead.
      echo "Setting read-ahead to 64 MiB for /dev/md0"
      blockdev --setra 65536 /dev/md0
      # Set stripe-cache_size for RAID5.
      echo "Setting stripe_cache_size to 16 MiB for /dev/md0"
      echo 16384 > /sys/block/md0/md/stripe_cache_size
      #echo 8192 > /sys/block/md0/md/stripe_cache_active  << it's readonly

      It’s my first post in here, so I hope it will be nicely formated 🙂

    2. Hi Cr0t,

      on which hardware specs reliy your setra and read_ahead_kb values?
      i got WD20EARS with 64mb memory.
      so is it save/usefull to increase your values?

      Best practice values ? limits?
      thanx 😉

  5. The easiest solution I found was to combine raid 1 (for booting the system) & raid 5 (for data) on the same six disks. Agreed, I now have a 6 disk raid 1, but it is only 2GB. The rest is raid 5. Hopes this helps someone facing the same problems sometimes.

    Cheers, Jord

  6. A suggestion, boot from a pendrive.
    I just leave a pendrive in a usb port and boot from that as there isn’t any space for more drives.
    Had some problems installing Fedora 12 some months ago using the onboard fakeraid and locking up, so reinstalled as standalone drives and mdadm.
    I wanted to make a kickass colocated web/mysql server and I’m still soaking it, but I’m worried that resyncing may cause performance issues at critical times, as I have seen while developing.
    (it’s now part of a 3-node multiple master mysql cluster+slave experiment, with one node at the ISP linked to here at home over 30mbps / ssh tunnels) 🙂

    My setup:
    RAID10, persistent superblock (host i7/920 + 12Gb RAM)
    |=========md0========|====md1====| /dev/sda 1Tb
    |=========md0========|====md1====| /dev/sdb 1Tb
    |=========md0========|====md1====| /dev/sdc 1Tb
    |=========md0========|====md1====| /dev/sdd 1Tb
    |=========md0========| /dev/sde 750Gb
    |=========md0========| /dev/sdf 750Gb
    |======| /dev/sdg 2x250Gb (HW raid1)
    |=| /dev/sdh 1Gb pendrive /boot

  7. So can you add the intent bitmap during a RAID 5 reshape? That would really help me out…

    Definitely should have looked at this before. Great blog post, man.

  8. When I run sudo mdadm --grow --bitmap=internal /dev/md1

    I get:

    mdadm: failed to set internal bitmap.

    The RAID is rebuilding when I ran this. Do I need to do this before I rebuild? If so, how??

    md1 : active raid5 sde[4] sdf[2] sdg[3] sdd[1] sdb[0]
    5860543488 blocks super 0.91 level 5, 4k chunk, algorithm 2 [5/5] [UUUUU]
    [=>...................]  reshape = 8.2% (161204840/1953514496) finish=1231.1min speed=24262K/sec

    No change in speed.

  9. Setting the speed limits didn’t help in my case.
    I have raid6 with 6xwd20ears, with 0xfb partitions (4kb physical 4kb logical, aligned).
    Resync speed after setting up the array is around 20000kb/s, I assume it will get even slower towards the end of the process.

    Maybe it would be faster with AAM set to 255 (now set to 128), unfortunatly I cannot set it through my raid controller (only works if the drive is plugged in on the mainboard sata controller).

    I’m also not able to set the bitmap, but i think it would not affect the initial resync after having just built the array, am I right?

    I would also be interested in further information about setting read ahead etc.

  10. edit:
    Sometimes the speed is rising from 20000k to 100000k for a few seconds or minutes and then falling back to 20. I have no explanation for that behaviour.
    Except for the resync the system is idle.
    md0_raid6 is using about 5% while speed ist 20000k, 25% while speed ist 100000k
    md0_resync is using a few percent at 20000k, 20% at 100000k.
    Just in this moment the speed went up to 100000k for only 10 seconds… no clue.

  11. Sounds like the RAID setup can do 100 MB/sec, but you have changed the /proc/sys/dev/raid/speed_limit_max setting only, this is used if there is no other activity on the disks at all. SO the rise to 100MB/sec is wen no other I/O activity happens. Try changing /proc/sys/dev/raid/speed_limit_min

    echo "50000" > /proc/sys/dev/raid/speed_limit_min

    then see what happens.

  12. just for sharing : replacing a failed drive , 2TB caviar black – WD2002FAEX, 3 good drives + 1 new on a production server (still gets 100Mbs trafic out )

    [>....................]  recovery = 2.8% (56272128/1942250240) finish=888.0min speed=35394K/sec

    The drives are running at 92% busy:
    DSK | sda | busy 92%

    ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
    usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
      6   6  83   3   0   2| 107M   51M|   0     0 |5.6B   44B |8900   14k
      1  10  72  14   0   3| 363M  150M|8421k   23M|   0     0 |9330   11k
      7  12  63  14   1   4| 416M  150M|8009k   25M|   0     0 |9804   12k

    won’t push the speed limit to keep the server responsive.

    ps: careful running “magic” scripts with “magic” settings. got my server (colocated) frozen while changing some readhead values “on the fly”. Costed me 30$ extra intervention to the colo guys to reboot it (it was on weekend :)) .

    Can be fun tweaking but really there’s no silver bullet that comes without some risks. For oldies : there were times when computers came with a Turbo button. If you’d press it the cpu frequency would get higher. Of course 50% of the time Duke Nukem would freeze if you press the turbo button while playing:)

  13. Thanks to this article and the comments on tweaking the drive parameters from Cr0t, rebuild performance on RAID1 went from 32K to over 110K during system idle. The combination of increasing the dev.raid.speed_limit_min to 50000 and disabling NCQ on both drives gave a phenomenal increase in performance. After seeing how much disabling NCQ helped, I further increased dev.raid.speed_limit_min to 110000 to further speed up the rebuild. The rebuild went from an initial estimate of 6 hours, to only taking about 1.5 hours after these tweaks.

  14. Also, thanks to Cr0t!! sped up my raid build from about 28-30K/sec to 40-45K/sec, topping out over 50K occasionally. recognize what you are doing issuing those commands, which is tuning the hard drive parameters that deal with pretty low-level communication options of the device. options tied to the drivers, firmware and kernel. so, use them with care, and don’t be surprised if extreme circumstances cause a lock-up or other failure case causing you to have to start your raid rebuild over. point being, use with caution and in experimental environment as mentioned. still, this did not cause any issues with my setup even doing it during the build, and in most cases on properly working hardware it should not. but be warned!

    anywho, the original problem for me turned out to be one bad hard drive causing the array to be built at <1K/sec. sad thing is that nothing about the system or even smartctl really indicated errors. it was really diagnosed by noticing the smartctl 'remap' value skyrocketing on one device where it was not on any of the others, and physically looking at the box showed 10x more LED activity on the problematic disk as it tried every read and write vs the other 19 in the array. so, don't rely on your os to inform you of hard drive issues and you can't rely on SMART at face value either. i think any experienced sys-admin would validate that statement.

    happy tuning everyone =) thanks for the info.

  15. There is no reason to turn bitmaps off (unless you need to grow the array), in fact, the whole point to bitmaps is to leave them on so only the unsynced parts of the device (as indicated by the bitmap) need to be rebuilt when a crash or other unclean shutdown occurs. Turn them on when you first make the raid and leave them on.

    Also you can’t turn them on during a resync anyways.

  16. Thanks for the tips, and thanks to Cr0t for sharing his script.
    Speed went from 26M/sec to 140M/sec

  17. I did
    But Didn´t work :/

    The speed_min is 50000 and the speed on process continues 1000kb/s

  18. Further to Cr0t’s script…
    A little clean up + added lines to set speed max/min
    NOTE: only edit the lines at the beginning

    — begin script —

    # devices separated by spaces i.e. "a b c d..." between "(" and ")"
    RAID_DRIVES=(b c d e f g)
    # array name and rebuild speed limits
    RAID_NAME=md0
    SPEED_MIN=50000
    SPEED_MAX=200000
    # this should be changed to match the RAID_DRIVES line above
    blockdev --setra 16384 /dev/sd[bcdefg]
    echo $SPEED_MIN > /proc/sys/dev/raid/speed_limit_min
    echo $SPEED_MAX > /proc/sys/dev/raid/speed_limit_max
    # looping though drives that make up raid -> /dev/sda,/dev/sdb...
    for index in "${RAID_DRIVES[@]}"
    do
            echo 1024 > /sys/block/sd${index}/queue/read_ahead_kb
            echo 256 > /sys/block/sd${index}/queue/nr_requests
            # Disabling NCQ on all disks...
            echo 1 > /sys/block/sd${index}/device/queue_depth
    done
    # Set read-ahead.
    echo "Setting read-ahead to 64 MiB for /dev/${RAID_NAME}"
    blockdev --setra 65536 /dev/${RAID_NAME}
    # Set stripe-cache_size
    echo "Setting stripe_cache_size to 16384 pages for /dev/${RAID_NAME}"
    echo 16384 > /sys/block/${RAID_NAME}/md/stripe_cache_size
    # stripe_cache_active is read-only and can't be set:
    # echo 8192 > /sys/block/${RAID_NAME}/md/stripe_cache_active

    — end script —

  19. @fuzzy,

    Nice update. One suggestion, you can accept md0 (or md1) and SPEED_MIN and
    SPEED_MAX using command line args:

    # accept the array name and speed limits as command line args:
    RAID_NAME=${1:-md0}
    SPEED_MIN=${2:-50000}
    SPEED_MAX=${3:-200000}
    # devices separated by spaces i.e. "a b c d..." between "(" and ")"
    RAID_DRIVES=(b c d e f g)
    # the rest is the same as posted by fuzzy

    One can run it as:

    ./script md1 60000 100000
  20. I have raid5 with 5 wd20ears. Before this script:
    echo 50000 > /proc/sys/dev/raid/speed_limit_min
    I had rebuilding speeds of around 15000K/sec. After running that script I have 55000-70000K/sec, but somehow after the rebuilding was running couple of hours it slowed down again to the original speed of 15000K/sec…
    When I query the speed with this command: ” sysctl dev.raid.speed_limit_min “, it still shows ” dev.raid.speed_limit_min = 50000 “, but it just slowed down to an average speed of 15000K/sec.
    What happened? What can I do? Please help me in this issue.

  21. Just wanted to add, that if the sync is going when you change these values, it won’t (or at least didn’t for me) make a diff. So, you can bounce the machine OR do this which causes the resync to restart.

    echo "idle" > /sys/block/md0/md/sync_action

    1. Thanks for your suggestions, but it won’t work if I run the command echo “idle” > /sys/block/md0/md/sync_action, the md still be recovering instead of idle.
      how can we set the stripe_cache_size and make it work? However, we can set it in linux softraid kernel module, but I wouldn’t like to set it there.

  22. And changing this value really kicks the resync into overdrive…

    echo [size val] > /sys/block/md1/md/stripe_cache_size

  23. Let me add aswell, I’ve been running a raid6 for a few months, and done quite a few rebuilds, and often it fails during rebuild, but having it rebuild at slower speeds gives me more data security… just my 2 cents

  24. Great post – thanks
    I have 2 x 16GB USB thumb drives with CentOS software RAID level 1 operating 5 x 2TB software RAID level 5 with hot spare which offers 6TB iSCSI target to the network. My poor man’s SAN 🙂

  25. If your RAID rebuild is only running at the min speed, it could be because you have just created an ext4 filesystem and it is doing background initialisation (look for a process called “ext4lazyinit”). When that completes, as long as there’s no other disk I/O going on, the RAID rebuild should crank up to the max speed.

    1. This info needs to be in the article!!! This shit happened to me after extending a RAID-5 array and doing “resize2fs /dev/md0”. Is there a way to prevent that?

  26. Once the rebuild is complete do recommend changing any of these values back to something or leaving them as is? If you recommend changing them, what would recommend they be changed to?

    Examples of changed values:
    echo 1024 > /sys/block/sd${index}/queue/read_ahead_kb
    echo 256 > /sys/block/sd${index}/queue/nr_request
    echo 1 > /sys/block/sd${index}/device/queue_depth


  27. setting the queue_depth of all disks to 1 speeded up the resync time of a raid5 array of 5 disks (2TB and partitioned 3TB) with a factor of 1,5x

    thanks for that Cr0t

    (raising the min_speed limit also helped a bit 😉 )

  28. I’d like to ask a naive question: how do these changes affect the RAID array speed in ordinary use, i.e. when not rebuilding the array? Should they be put back to the previous values after the rebuilt is finished?

  29. These steps helped a great deal, improving resync performance of my RAID-5 array with 4x WD20EARX drives from 15MB/s to 45MB/s on a dual-core hyperthreaded atom-d525. Didn’t realize the NCQ settings had such a big impact, improving performance by 200%, while the stripe-cache-size improved performance by around 25%.

    I have made a script that saves current values so you can restore them later, before setting the high performance values. It will also detect the underlying disks to set the NCQ for you, along with the read-ahead and sync_speed values. I left out the setting of the bitmap as this can only be done while not syncing, so wasn’t useful to me.

    The script can be found at https://gist.github.com/jinnko/6209899.

  30. On a Synology Rackstation, I increased an ongoing RAID5 rebuild speed from 10 MB/s to 20-30 MB/s by increasing

    sysctl -w dev.raid.speed_limit_min=100000
    (default was 10,000)

    without significant CPU load increase.

    I wonder if I could set the stripe-cache_size during the rebuild or if I should/must wait till it’s over?

  31. Thanks.

    I’m not sure why these tips works, but I got a resync performance that went from 20MB/s to about 140MB/s. That’s an improvement.

  32. changing dev.raid.speed_limit_min and dev.raid.speed_limit_max alone gave me a more than 10fold increase in speed. instead of taking almost a week, i expect the rebuild to be done overnight. the other settings i was not able to change because the resync was already in progress, but
    echo "idle" > /sys/block/md0/md/sync_action
    was also useful, while not stopping the resync it allowed me to control which device was synced first. instead of the slow huge one i could get it to sync the smaller / and /var devices which (thanks to the speedup) were done in a few minutes. with those out of the way, now i can sleep better while the last huge data disk is syncing.

  33. Thanks a lot for the tips! I went with #1 and #2 and it sure work’s like a charm!

    Should the settings be reverted after (re)syncing the RAID???


  34. Thanks for this post. I tried all the settings in the main post and the scripts in comments, and all slowed down my rebuild by a small degree (RAID 1). Only increasing nr_requests made a really bad change (-25%) – the rest were barely noticable. I take this to mean that my distro’s (Fedora 20) defaults are as good as this hardware (SuperMicro/Hitachi) can get. So, for others:

    queue_depth: 31
    drive/array ra: 256
    read_ahead_kb: 128
    nr_requests: 128
    speed_limit_min/max: 200000

  35. Great walk-through and great team assembled here. Thanks NIX and all!
    I migrated a 5 drive RAID from ZFS with 2 extra drives for Swing Space.
    This required about 3 rounds of pool expansions and reshapes, and with these settings I managed to cut the reshape times to half.

    I am on the last reshape, and get about 33MBps on reshapes
    One of the disks is a bit on the slower side, and constantly maxes out its IOpS).

    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
    md0 : active raid5 sdd1[1] sde1[2] sdg1[4] sdf1[3] sdc1[0]
    7813770240 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
    [=>...................]  reshape = 6.7% (265179648/3906885120) finish=1798.8min speed=33740K/sec
    bitmap: 0/30 pages [0KB], 65536KB chunk

    I had to redo my /etc/mdadm/mdadm.conf by hand a few times as the pool did not activate upon restarts, because somehow my discs reordered (yes… I was bold enough to restart during a reshape… learned my lesson)

  36. Short summary – the bitmap stuff is wrong in the above.

    It will only help if you need to re-add a device that had been in the array and became disconnected due to a system crash (or similar). So an event that would normally trigger a full resync between the devices can be partially avoided – as the bitmap can be consulted to determine which parts need synchronisation.

    You cannot add a bitmap to an array that is syncing.

    If you remove the bitmap, you may get slightly better write performance (as writing to the disk will cause the bitmap to be updated, if one is present) – at the cost of losing a potential faster resync in the future after some sort of crash.

    See : https://raid.wiki.kernel.org/index.php/Write-intent_bitmap

  37. Hmm, my RAID5 re-build speed with bitmap (~ 105000 k/sec) is slower than without bitmap (~ 125000 k/sec)

  38. Big Thank You! Was rebuilding a 12TB array. Did steps 1-3 and the stimated recovery time went from 5 days + to 12 hours.

  39. Hi,

    What kind of values would you recommend for a Buffalo Linkstation Pro Quad 12 Tb (dual core arm processor, 256 Mb RAM, 4x3Tb disks in RAID5)?

  40. Thanks. And thanks again. This saves me so much time.

    One problem though: In my case, disabling the disks’ queues cut the throughput in half. This is what I would have expected anyway; why should I gain performance if I don’t take advantage of disk queues?

    The other problem is that the bitmaps can’t be set while the resync is going on, but judging from a few comments here, it’s not such a good idea to use this feature anyway.

    So with the highest possible readahead and stripe cache size I tripled my performance, or more. Wonderful.

  41. Tip #1 : /proc/sys/dev/raid/{speed_limit_max,speed_limit_min}
    had a major effect on my raid, especially setting the min parameter. Thank you.

    Tip #2 : I had already set an internal bitmap on my array, so I can not verify if this resulted in a performance increase.

    Tip #3-5 : Had no effect on my raid, as the defaults on Ubuntu 14.10 seem to be sufficient.

    Tip #6 : Umounting the raid array yielded another 10% increase for me. The reason being that my ext file system was initialized lazy and kept updating the file system in the background.

  42. The advice to disable NCQ seems no longer reasonable — on my Synology DS415+ w/ DSM 5.2-5644 Update 1, Linux kernel 3.10.35 and the following drives

    [ 15.534557] scsi 0:0:0:0: Direct-Access WDC WD40EFRX-68WT0N0 80.0 PQ: 0 ANSI: 5
    [ 37.588618] scsi 4:0:0:0: Direct-Access WDC WD30EURS-63R8UY0 80.0 PQ: 0 ANSI: 5
    [ 54.829964] scsi 5:0:0:0: Direct-Access ATA ST32000542AS CC35 PQ: 0 ANSI: 5
    [1191564.915438] scsi 1:0:0:0: Direct-Access WDC WD60EFRX-68L0BN1 82.0 PQ: 0 ANSI: 5

    disabling NCQ MASSIVELY reduces sync performance for me (by almost 50%). Reenabling it immediately restores performance again.

  43. I have bookmarked this page because it is so useful. Tip #3 (increase stripe cache size) was the one that did it for me. 98K/s from 57K/s. Yay!

  44. So actually the only useful thing here is the speed limit min. Your speed went from 4 to 50k because you set up 50k speed limit min. All other options are to fine tune the cpu usage during that min 50k rebuild. on a live linux system rarely will the speed go above the min setting if there is any activity on the computer since mdadm always puts the running tasks as a priority. or more the kernel itseld puts mdresync process on a low priority…for example on a web server that was running for 6 years, rebuild speed never jumped over the min speed limit. with other options you can squeeze some more in terms of lover cpu usage during rebuild or more speed if the system is pretty much idle and doing nothing except the rebuild…

  45. Just found this interesting thread.

    Right now my Openmediavault NAS is reshaping after growing my RAID-5 with 3x3TB with another 3TB hdd.
    cat /proc/mdstat shows 1000K/sec Speed resulting in 35-40 days for terminating !!!

    Can I interrupt this process without loosing my data and improving speed using your hacks?
    1000K/sec seems ultra-slow for me. What’s wrong here?

    Kind regards


Comments are closed.