The easiest and fastest solution is to use the mhddfs driver on Linux operating systems. It is a FUSE-based file system for unifying several mount points into one. From the man page:
The mhddfs (fuse) file system allows to unite a several mount points (or directories) to the single one. So a one big filesystem is simulated and this makes it possible to combine a several hard drives or network file systems. This system is like unionfs but it can choose a drive with the most of free space, and move the data between drives transparently for the applications. While writing files they are written to a 1st hdd until the hdd has the free space (see mlimit option), then they are written on a 2nd hdd, then to 3rd etc. df will show a total statistics of all filesystems like there is a big one hdd. If an overflow arises while writing to the hdd1 then a file content already written will be transferred to a hdd containing enough of free space for a file. The transferring is processed on-the-fly, fully transparent for the application that is writing. So this behaviour simulates a big file system.
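To make the selection rule from the man page excerpt concrete, here is a rough shell model of it. It is only a sketch, not mhddfs code: the function name pick_drive and the NAME:FREE_MB argument format are made up for illustration, and the fallback to the drive with the most free space is my reading of how the mlimit option is documented.

```shell
# Illustrative model only; this is not how mhddfs is implemented.
# pick_drive MLIMIT_MB NAME:FREE_MB [NAME:FREE_MB ...]
# Prints the first drive whose free space is at least MLIMIT_MB. If no drive
# qualifies, prints the drive with the most free space.
pick_drive() {
    mlimit=$1
    shift
    best_name=""
    best_free=-1
    for entry in "$@"; do
        name=${entry%%:*}   # text before the colon
        free=${entry##*:}   # text after the colon
        if [ "$free" -ge "$mlimit" ]; then
            echo "$name"
            return 0
        fi
        if [ "$free" -gt "$best_free" ]; then
            best_free=$free
            best_name=$name
        fi
    done
    echo "$best_name"
}

# Hypothetical pool: disk1 has 2 GB free, disk2 has 6 GB free, threshold 4 GB.
pick_drive 4096 disk1:2048 disk2:6144   # prints: disk2
```

In this model the order of the drives matters: a drive listed earlier keeps receiving new files for as long as its free space stays above the threshold.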
In this tutorial you will learn how to install and configure an mhddfs virtual storage pool on Linux operating systems.
Our sample setup
For demo purposes, I have three hard disk partitions, /dev/sdb1, /dev/sdc1, and /dev/sdd1, as follows:
# df
And files in my /disk{1,2,3}/ dirs are as follows:
# ls -l /disk{1,2,3}
Sample outputs:
/disk1:
total 28
drwxr-xr-x 2 root root  4096 Aug  8 14:25 app1
drwx------ 2 root root 16384 Aug  8 14:20 lost+found
-rw-r--r-- 1 root root  7545 Aug  8 14:26 resume.txt

/disk2:
total 28
drwxr-xr-x 2 root root  4096 Aug  8 14:25 app2
drwx------ 2 root root 16384 Aug  8 14:20 lost+found
-rw-r--r-- 1 root root  6303 Aug  8 14:26 party.jpg

/disk3:
total 40
drwxr-xr-x 2 root root  4096 Aug  8 14:25 app3
drwx------ 2 root root 16384 Aug  8 14:21 lost+found
-rw-r--r-- 1 root root 17080 Aug  8 14:26 output.log
Installation
Let us see how to install the mhddfs package on different Linux distros.
Install mhddfs package on a Debian/Ubuntu/Mint Linux & Co
Type the following apt-get command to install mhddfs:
# apt-get install mhddfs
Install mhddfs package on a Fedora/RHEL/CentOS Linux & Co
Turn on the EPEL repo and type the following command:
# yum install mhddfs
Users of Fedora Linux 22 and later should type the following dnf command instead:
# dnf install mhddfs
Configuration
First, create a new mount point directory called /virtual.data, enter:
# mkdir /virtual.data
To join all three drives (see fig.01) together, enter:
# mhddfs /disk1,/disk2,/disk3 /virtual.data -o allow_other
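The man page excerpt at the top mentions an mlimit option that controls when mhddfs moves on to the next drive. If you want to set it, it goes in the same -o list as allow_other. The helper below is only a dry-run sketch: build_mhddfs_cmd is a made-up name, the 500m threshold is an arbitrary example value, and the function prints the command instead of running it, so you can inspect it first.

```shell
# Dry-run helper (hypothetical name): prints an mhddfs command line.
# build_mhddfs_cmd MOUNTPOINT MLIMIT BRANCH [BRANCH ...]
build_mhddfs_cmd() {
    mnt=$1
    mlimit=$2
    shift 2
    branches=""
    for b in "$@"; do
        # Join the branch directories with commas, as mhddfs expects.
        branches="${branches:+$branches,}$b"
    done
    echo "mhddfs $branches $mnt -o allow_other,mlimit=$mlimit"
}

# Example threshold of 500m; pick a value that suits your drives.
build_mhddfs_cmd /virtual.data 500m /disk1 /disk2 /disk3
# prints: mhddfs /disk1,/disk2,/disk3 /virtual.data -o allow_other,mlimit=500m
```

Check the accepted size suffixes and the minimum value in the mhddfs man page on your system before using the printed command.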
That’s all. You can now verify that /virtual.data/ appears as a single big volume, i.e. several directories combined into one volume that can span several hard drives or remote file systems:
# df
Sample outputs:
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda1 39428520 6109320 31293268 17% /
udev 10240 0 10240 0% /dev
tmpfs 811792 9084 802708 2% /run
tmpfs 2029472 144 2029328 1% /dev/shm
tmpfs 5120 4 5116 1% /run/lock
tmpfs 2029472 0 2029472 0% /sys/fs/cgroup
tmpfs 405896 16 405880 1% /run/user/1000
/dev/sdb1 4061888 8196 3827644 1% /disk1
/dev/sdc1 4061888 8196 3827644 1% /disk2
/dev/sdd1 4061888 8208 3827632 1% /disk3
/disk1;/disk2;/disk3 12185664 24600 11482920 1% /virtual.data
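The last line of the df output is simply the sum of the three member file systems. A quick way to check that, using the figures copied from the output above (sum_members is a made-up helper name):

```shell
# Sum the 1K-blocks, Used and Available columns (fields 2, 3 and 4) of
# df-style lines read from standard input.
sum_members() {
    awk '{ size += $2; used += $3; avail += $4 }
         END { print size, used, avail }'
}

# The three member lines from the df output above.
sum_members <<'EOF'
/dev/sdb1 4061888 8196 3827644 1% /disk1
/dev/sdc1 4061888 8196 3827644 1% /disk2
/dev/sdd1 4061888 8208 3827632 1% /disk3
EOF
# prints: 12185664 24600 11482920
```

These totals match the 1K-blocks, Used and Available values reported for /virtual.data.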
Also, note the output of the ls -l command:
# ls -l /virtual.data/
total 64
drwxr-xr-x 2 root root  4096 Aug  8 14:25 app1
drwxr-xr-x 2 root root  4096 Aug  8 14:25 app2
drwxr-xr-x 2 root root  4096 Aug  8 14:25 app3
drwx------ 2 root root 16384 Aug  8 14:20 lost+found
-rw-r--r-- 1 root root 17080 Aug  8 14:26 output.log
-rw-r--r-- 1 root root  6303 Aug  8 14:26 party.jpg
-rw-r--r-- 1 root root  7545 Aug  8 14:26 resume.txt
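The merged listing does not tell you which drive a file physically lives on. Since the pool is only a view onto the member directories, you can look for the same relative path on each of them. A minimal sketch, where find_on_branches is a made-up helper name:

```shell
# find_on_branches RELATIVE_PATH BRANCH [BRANCH ...]
# Prints the full path on every branch directory that contains RELATIVE_PATH.
find_on_branches() {
    rel=$1
    shift
    for branch in "$@"; do
        if [ -e "$branch/$rel" ]; then
            echo "$branch/$rel"
        fi
    done
}

# With the sample layout shown earlier, this would print /disk2/party.jpg.
find_on_branches party.jpg /disk1 /disk2 /disk3
```

A name such as lost+found exists on all three drives, so the function can print more than one line for the same relative path.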
You can now copy files or create new directories as per your requirement:
# cd /virtual.data/
# mkdir Music
# rsync -avp /somewhere/* Music/
....
..
Update /etc/fstab
Update /etc/fstab file as follows:
mhddfs#/disk1,/disk2,/disk3 /virtual.data/ fuse defaults,allow_other 0 0
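A branch written without its leading slash is an easy mistake to make in the first field, and a relative path there depends on the working directory of whatever runs the mount. The sketch below flags such entries before you reboot; check_branches is a made-up helper name and it only inspects the string you pass in, it does not read /etc/fstab itself.

```shell
# check_branches 'mhddfs#/a,/b,/c'
# Prints every branch in the first fstab field that is not an absolute path
# and returns non-zero if at least one was found.
check_branches() {
    spec=${1#mhddfs#}          # drop the "mhddfs#" prefix
    rc=0
    old_ifs=$IFS
    IFS=,                      # split the branch list on commas
    for branch in $spec; do
        case $branch in
            /*) ;;             # absolute path, nothing to report
            *)  echo "not absolute: $branch"
                rc=1 ;;
        esac
    done
    IFS=$old_ifs
    return $rc
}

check_branches 'mhddfs#/disk1,/disk2,/disk3' && echo "branches look fine"
```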
How do I unmount mhddfs based fuse file systems?
Use the umount command to detach the /virtual.data/ file system:
# umount /virtual.data/
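To confirm that the pool is really gone, you can look for the mount point in the kernel's mount table. The helper below is a sketch that assumes a Linux system with /proc/mounts; is_mounted is a made-up name, and the optional second argument exists only so the function can be tried against a saved copy of the table.

```shell
# is_mounted MOUNTPOINT [MOUNTS_FILE]
# Succeeds if MOUNTPOINT is listed as a mount target in MOUNTS_FILE
# (default: /proc/mounts). A trailing slash on MOUNTPOINT is ignored.
is_mounted() {
    target=${1%/}
    [ -n "$target" ] || target=/
    awk -v t="$target" '$2 == t { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

if is_mounted /virtual.data/; then
    echo "/virtual.data is still mounted"
else
    echo "/virtual.data is not mounted"
fi
```

Because mhddfs is a FUSE file system, fusermount -u /virtual.data is the usual FUSE way to unmount it as well.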
For more info, see the mhddfs man page.
You can also use Logical Volumes to accomplish this. No RAID or splitting files.
How would this compare to the mentioned Logical Volumes ?
I think the main difference is that LVM needs to be created from scratch, so if the disks already contain data, that data will be lost (because the LVM volume needs to be formatted). mhddfs, on the other hand, joins existing partitions into one big virtual partition.
Hi,
Thanks a lot…
Yes, this can also be accomplished with LVM and GlusterFS. But the added advantage that I see using mhddfs is that your data is merged when you combine different standard partitions.
Very nice tool
The better (or at least more standard) method in my opinion would be to use LVM. I guess it’s just much cleaner.
Hi there,
it is interesting article.
Regarding LVM – I see a big trouble in case of disk/partition failure and recovering LVM – it is horrible….my personal experience.
Thanks for other inputs.
marek
Have you tried LVM’s mirroring features?
Using LVM+mirroring – or LVM+RAID or LVM plus a filesystem like gluster can protect you against hardware failures.
I’d usually prefer to use LVM over mhddfs unless I wanted to be able to take the segments apart again at some later date. Or if I simply could not afford the extra disk space needed to do the migration from independent filesystems to one big filesystem under LVM, mhddfs sounds like a good option.
Well, a disk/partition failure is always horrible. The reason LVM makes it worse is because you are combining points of failure into larger volumes. That is, when you have three separate drives and one goes, you only lose the data on the one. When you have three drives combined with LVM, then you lose your LVM partition, and access to the data on all three, so it’s like tripling the chance of drive failure.
There are ways to recover data from the drives that haven’t failed. However, this is not desirable. It is better to use mirroring, either through RAID or LVM, and backup if the data is important. Any time you go without backups, you are assuming the risk of data loss.
Of course, if you are scrounging up old hard drives and that’s all you’ve got, then you may not have the luxury of backups. As soon as it becomes possible, however, I’d recommend you start.
Just use LVM. Combining several small drives to give one big ‘disk’ is exactly what LVM is designed to do.
Hi
Interesting article. I am wondering if you delete a file from the virtual data file that is also on disk 1, 2 or 3. Will the file also be deleted from the actual drive?
Regards
Philip
Hi
thanks a lot
To those who “think” LVM is better/simpler: you miss the point that LVM requires the drives to be formatted, resulting in data loss. mhddfs, as mentioned here, combines drives which already contain data that is still needed.
Think before you respond ;)
This guide is wrong – there is no mhddfs package in EPEL for CentOS 6.
You can get the package from the linuxtech repo.
question
If I have a 2 GB drive and a 6 GB drive, how does it choose which drive to write to? Say I want to write a 500 MB file.
Hello, thanks for the elaborate article. After scouring the internet, I have come across several other ways of combining two directories into one big one, including Unionfs, Aufs, Overlayfs and now mhddfs. I really get confused when presented with different options that achieve the same thing. Your article doesn’t give context with regard to the other options. Could you kindly elaborate on how these choices compare with mhddfs? Thanks