
linux filesystem

Linux Convert ext3 to ext4 File system

Some time ago ext4 was released and became available in the Linux kernel. ext4 provides some additional benefits and better performance over the ext3 file system. You can easily convert an ext3 file system to ext4. The next release of Fedora, 11, will default to the ext4 file system unless serious regressions are seen. In this quick tutorial you will learn about converting an ext3 file system to ext4.

The following are a few situations where you may want to benchmark a filesystem.

=> Deploying a new application that is very read and write intensive.
=> Purchased a new storage system and would like to measure the performance.
=> Changing the RAID level and would like to measure the performance of the new RAID.
=> Changing the storage parameters and would like to know the performance impact of the change.

This article gives you a jumpstart on benchmarking a filesystem using iozone, a free filesystem benchmark utility.

1. Download and Install iozone software

Go to the iozone website and download the package for your platform. I downloaded the "Linux i386 RPM". Install iozone from the RPM as shown below. By default, this installs iozone under /opt/iozone/bin.
# cd /tmp
# wget http://www.iozone.org/src/current/iozone-3-303.i386.rpm
# rpm -ivh iozone-3-303.i386.rpm

Sample output:

Preparing...                ########################################### [100%]
   1:iozone                 ########################################### [100%]

Note: You can install iozone under any UNIX / Linux or Windows operating system.

2. Start the performance test

Execute the following command in the background to begin the performance test.

# /opt/iozone/bin/iozone -R -l 5 -u 5 -r 4k -s 100m -F /home/f1 /home/f2 /home/f3 /home/f4 /home/f5 | tee -a /tmp/iozone_results.txt &

Let us review the individual parameters passed to the iozone command.

  • -R : Instructs iozone to generate an Excel-compatible report.
  • -l : The lower limit on the number of processes/threads that iozone starts during execution. In this example, iozone starts 5 threads.
  • -u : The upper limit on the number of processes/threads that iozone starts during execution. In this example, iozone does not exceed 5 threads. If you set -l and -u to the same value, iozone runs exactly that many processes/threads. In this example, it runs exactly 5 threads.
  • -r : The record size. In this example, the record size for the benchmark is 4k. Set this parameter according to the purpose of your filesystem performance testing. For example, if you are benchmarking a filesystem that will host a database, set this value to the DB block size of the database.
  • -s : The size of the file to be tested. In this example, iozone runs the test on a 100 MB file.
  • -F : The temporary filenames that iozone uses during testing. The number of files specified here should match the value given for the -l and -u parameters.
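
Because the number of names passed to -F has to match the thread count, it can help to derive the file list from a single variable. The following is only a sketch that assembles and prints the command line used in this example; the variable names threads, files and cmd are illustrative and not part of iozone.

```shell
#!/bin/sh
# Build the iozone command line so that -l, -u and the -F file list stay in sync.
threads=5                        # used for both -l and -u
files=""
i=1
while [ "$i" -le "$threads" ]; do
    files="$files /home/f$i"     # one temporary file per thread
    i=$((i + 1))
done
cmd="/opt/iozone/bin/iozone -R -l $threads -u $threads -r 4k -s 100m -F$files"
echo "$cmd"                      # review the command before running it
```

Changing threads to another value keeps all three options consistent without editing the file list by hand.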

3. Analyze the iozone output

The first part of the output contains details about each filesystem performance metric that was tested, e.g. initial write, rewrite, etc., as shown below.

        Iozone: Performance Test of File I/O
                Version $Revision: 3.303 $
                Compiled for 32 bit mode.
                Build: linux
        Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
                     Al Slater, Scott Rhine, Mike Wisner, Ken Goss
                     Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
                     Randy Dunlap, Mark Montague, Dan Million,
                     Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy,
                     Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root.
        Run began: Thu Jun  22 00:08:51 2008
        Excel chart generation enabled
        Record Size 4 KB
        File size set to 102400 KB
        Command line used: /opt/iozone/bin/iozone -R -l 5 -u 5 -r  4k -s 100m -F /home/f1 /home/f2 /home/f3 /home/f4 /home/f5
        Output is in Kbytes/sec
        Time Resolution = 0.000001 seconds.
        Processor cache size set to 1024 Kbytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
        Min process = 5
        Max process = 5
        Throughput test with 5 processes
        Each process writes a 102400 Kbyte file in 4 Kbyte records
        Children see throughput for  2 initial writers  =   60172.28 KB/sec
        Parent sees throughput for  2 initial writers   =   45902.89 KB/sec
        Min throughput per process                        =   28564.52 KB/sec
        Max throughput per process                      =   31607.76 KB/sec
        Avg throughput per process                      =   30086.14 KB/sec
        Min xfer                                                  =   92540.00 KB
        Children see throughput for  2 rewriters        =   78658.92 KB/sec
        Parent sees throughput for  2 rewriters         =   34277.52 KB/sec
        Min throughput per process                      =   35743.92 KB/sec
        Max throughput per process                      =   42915.00 KB/sec
        Avg throughput per process                      =   39329.46 KB/sec
        Min xfer                                        =   85296.00 KB

Similar values are generated for readers, re-readers, reverse readers, stride readers, random readers, mixed workload, random writers, pwrite writers, and pread readers. The last part of the iozone output contains the throughput summary for the different metrics, as shown below.

Throughput report Y-axis is type of test X-axis is number of processes
Record size = 4 Kbytes
Output is in Kbytes/sec
Initial write         60172.28
Rewrite               78658.92
Read                2125613.88
Re-read             1754367.31
Reverse Read        1603521.50
Stride read         1633166.38
Random read         1583648.75
Mixed workload      1171437.78
Random write           5365.59
Pwrite                26847.44
Pread               2054149.00


(Fig.01: iozone in action)

Iozone benchmarks several kinds of file system performance metrics, e.g. read, write, and random read. Depending on the application that you plan to deploy on the filesystem, pay attention to the appropriate items. For example, if the filesystem hosts a read-intensive OLTP database, pay attention to Random read, Random write, and Mixed workload. If the application streams a lot of media content, pay attention to sequential Read. On a final note, you can generate graphs from the iozone output using Generate_Graphs and gengnuplot.sh, located under /opt/iozone/bin.
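
As a rough illustration of picking out the rows that matter for an OLTP workload, the sketch below filters the throughput summary with awk. The here-document repeats the sample figures shown above; in practice you would point awk at your own results file. The variable names summary and oltp are only illustrative.

```shell
#!/bin/sh
# Pull the OLTP-relevant rows out of an iozone throughput summary.
summary=$(mktemp)
cat > "$summary" <<'EOF'
Initial write       60172.28
Rewrite           78658.92
Read              2125613.88
Re-read          1754367.31
Reverse Read 1603521.50
Stride read      1633166.38
Random read   1583648.75
Mixed workload 1171437.78
Random write    5365.59
Pwrite               26847.44
Pread               2054149.00
EOF
# The value is the last field; everything before it is the test name.
oltp=$(awk '/^(Random read|Random write|Mixed workload) / {
    value = $NF
    sub(/[ \t]+[0-9.]+$/, "")            # strip the value, keep the name
    printf "%-15s %12.2f KB/sec\n", $0, value
}' "$summary")
echo "$oltp"
rm -f "$summary"
```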

References:

  • Iozone PDF documentation – Full documentation from iozone.org explaining all the iozone command line options and more.
  • Linux Iozone graph example - This is a sample *.xls file from iozone that shows the kind of excel output that can be generated from iozone.

If you try to mount an ext3 Linux filesystem on a SAN from multiple nodes at the same time, you will be in serious trouble.

SAN-based storage allows multiple nodes to connect to the same devices at the same time. ext2 and ext3 are not cluster-aware file systems, so mounting them this way can lead to a disaster such as a kernel panic, server hang, or data corruption.

You need a file system with the following properties:

  1. Useful in clusters for moderate scale-out and shared SAN volumes
  2. Symmetrical parallel cluster file system, journaled
  3. POSIX access controls

Both GFS (Red Hat Global File System) and Lustre (a scalable, secure, robust, highly available cluster file system) can be used with SAN-based storage that multiple nodes connect to at the same time.

Many newbies get confused because Linux offers a number of file systems. This paper (Linux File System Primer) discusses these file systems, why there are so many, and which ones are best for which workloads and data.

Each file is represented by a single inode number, which is unique only within its own file system. All hard links are based upon the inode number.

So linking across file systems would lead to confusing references for UNIX or Linux. For example, consider the following scenario:

* File system: /home
* Directory: /home/vivek
* Hard link: /home/vivek/file2
* Original file: /home/vivek/file1

Now you create a hard link as follows:
$ touch file1
$ ln file1 file2
$ ls -l

Output:

-rw-r--r--  2 vivek vivek    0 2006-01-30 13:28 file1
-rw-r--r--  2 vivek vivek    0 2006-01-30 13:28 file2

Now check the inode numbers of both file1 and file2:
$ ls -i file1
782263
$ ls -i file2
782263
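
The same check can be scripted. The sketch below repeats the steps in a throwaway directory created with mktemp (an assumption made so that it does not touch /home/vivek) and compares the two inode numbers and the link count reported by ls.

```shell
#!/bin/sh
# Show that a file and its hard link share one inode and a link count of 2.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch file1
ln file1 file2
inode1=$(ls -i file1 | awk '{print $1}')   # first column of ls -i is the inode
inode2=$(ls -i file2 | awk '{print $1}')
links=$(ls -l file1 | awk '{print $2}')    # second column of ls -l is the link count
echo "file1 inode: $inode1, file2 inode: $inode2, link count: $links"
cd / && rm -rf "$workdir"
```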

As you can see, the inode number is the same for file1 and its hard link file2 in the inode table of the /home file system. If you could create a hard link from the /tmp file system, the reference would be ambiguous: is inode 782263 in the /home or the /tmp file system? To avoid this problem, UNIX and Linux do not allow hard links across file system boundaries. Continue reading the rest of the Understanding Linux file system series (this is part VII):

  • Part I - Understanding Linux superblock
  • Part II - Understanding Linux superblock
  • Part III - An example of Surviving a Linux Filesystem Failures
  • Part IV - Understanding filesystem Inodes
  • Part V - Understanding filesystem directories
  • Part VI - Understanding UNIX/Linux symbolic (soft) and hard links
  • Part VII - Why isn't it possible to create hard links across file system boundaries?

Normally an inode is associated with exactly one directory entry. With hard links, however, it is possible to associate multiple directory entries with a single inode. To create a hard link, use the ln command as follows:
# ln /root/file1 /root/file2
# ls -l

The above commands create file2 as a hard link to file1. Symbolic links refer to:

A symbolic path indicating the abstract location of another file.

Hard links refer to:

The specific location of physical data.

Hard link vs. Soft link in Linux or UNIX

  • Hard links cannot link directories.
  • Hard links cannot cross file system boundaries.

Soft or symbolic links are similar to hard links in that they let you associate multiple filenames with a single file. However, symbolic links can:

  • Link to directories.
  • Cross file system boundaries.

These links behave differently when the source of the link is moved or removed.

  • Symbolic links are not updated; they keep pointing at the old path and become broken.
  • Hard links always refer to the same data, even if the source is moved or removed.
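
This difference is easy to see in a small experiment. The sketch below works in a temporary directory, and the file names source.txt, hard.txt and soft.txt are made up for the illustration.

```shell
#!/bin/sh
# Compare how a hard link and a symbolic link behave once the source is removed.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "hello" > source.txt
ln source.txt hard.txt         # hard link: a second name for the same inode
ln -s source.txt soft.txt      # symbolic link: stores the path "source.txt"
rm source.txt

hard_content=$(cat hard.txt)   # still readable through the hard link
if [ -L soft.txt ] && [ ! -e soft.txt ]; then
    soft_state="broken"        # the link exists but its target does not
else
    soft_state="ok"
fi
echo "hard link content: $hard_content"
echo "symbolic link state: $soft_state"
cd / && rm -rf "$workdir"
```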

How do I create symbolic link?

You can create a symbolic link with the ln command:
$ ln -s /path/to/file1.txt /path/to/file2.txt
$ ls -ali

The above command creates file2.txt as a symbolic link to file1.txt.

Task: Symbolic link creation and deletion

Let us create a directory called foo, enter:
$ mkdir foo
$ cd foo

Copy /etc/resolv.conf file, enter:
$ cp /etc/resolv.conf .
View inode number, enter:
$ ls -ali
Sample output:

total 152
1048600 drwxr-xr-x   2 vivek vivek   4096 2008-12-09 20:19 .
1015809 drwxrwxrwt 220 root  root  143360 2008-12-09 20:19 ..
1048601 -rwxr-xr-x   1 vivek vivek    129 2008-12-09 20:19 resolv.conf

Now create a soft link to resolv.conf, enter:
$ ln -s resolv.conf alink.conf
$ ls -ali

Sample output:

total 152
1048600 drwxr-xr-x   2 vivek vivek   4096 2008-12-09 20:24 .
1015809 drwxrwxrwt 220 root  root  143360 2008-12-09 20:19 ..
1048602 lrwxrwxrwx   1 vivek vivek     11 2008-12-09 20:24 alink.conf -> resolv.conf
1048601 -rwxr-xr-x   1 vivek vivek    129 2008-12-09 20:19 resolv.conf

The link count of the foo directory has not changed (it is still 2). Our symbolic (soft) link is stored in a different inode (1048602) than the text file (1048601). The information stored in resolv.conf is accessible through the alink.conf file. If we delete the text file resolv.conf, alink.conf becomes a broken link and our data is lost:
$ rm resolv.conf
$ ls -ali

If alink.conf were a hard link, our data would still be accessible through alink.conf. Also, if you delete the soft link itself, the original data is still there. Read the man page of ln for more information.
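
Broken links like this can be located with find. The sketch below recreates the situation in a temporary directory; the file content is a placeholder, and the approach relies on test -e following the link and failing when the target is missing.

```shell
#!/bin/sh
# List symbolic links whose target no longer exists.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "nameserver 192.0.2.1" > resolv.conf   # placeholder content for the demo
ln -s resolv.conf alink.conf
rm resolv.conf                              # alink.conf is now a broken link

# -type l selects symbolic links; "test -e" fails for a missing target.
broken=$(find . -type l ! -exec test -e {} \; -print)
echo "broken links: $broken"
cd / && rm -rf "$workdir"
```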

An inode identifies the file and its attributes such as file size, owner, and so on. A unique inode number within the file system identifies each inode. But why delete a file by its inode number? Sure, you can use the rm command to delete a file. Sometimes, however, you accidentally create a filename with control characters, characters that cannot be typed on a keyboard, or special characters such as ?, *, ^ etc. Removing files with such names can be a problem. Use the following method to delete a file with strange characters in its name.

Please note that the procedure outlined below works with Solaris, FreeBSD, Linux, or any other UNIX-like OS out there:

Find out file inode

First find out the file's inode number with either of the following commands:

stat {file-name}

OR

ls -il {file-name}

Use find command to remove file:

Use find command as follows to find and remove a file:

find . -inum [inode-number] -exec rm -i {} \;

When prompted for confirmation, press Y to confirm removal of the file.
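
For use in a script, where nobody can answer the prompt, a non-interactive variant may be more practical. The following sketch is one way to do it, under a few assumptions: it works in a temporary directory, uses rm -f in place of rm -i, and adds -xdev so that find stays on one file system, since inode numbers are only unique within a file system.

```shell
#!/bin/sh
# Remove a file with an awkward name by looking it up through its inode number.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
name='\+Xy \+\8'                   # single quotes keep the backslashes literal
touch -- "$name"
inode=$(ls -i -- "$name" | awk '{print $1}')

# -xdev: do not descend into other file systems, where the same number
# could belong to an unrelated file.
find . -xdev -inum "$inode" -exec rm -f -- {} \;

if [ -e "$name" ]; then status="present"; else status="removed"; fi
echo "inode $inode: $status"
cd / && rm -rf "$workdir"
```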

Delete or remove files with inode number

Let us try to delete file using inode number.

(a) Create a hard to delete file name:
$ cd /tmp
$ touch "\+Xy \+\8"
$ ls

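(b) Try to remove this file with the rm command. This fails because the shell consumes the unquoted backslashes, so rm looks for two files named +Xy and +8: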
$ rm \+Xy \+\8

(c) Remove file by an inode number, but first find out the file inode number:
$ ls -il
Output:

781956 drwx------  3 viv viv 4096 2006-01-27 15:05 gconfd-viv
781964 drwx------  2 viv viv 4096 2006-01-27 15:05 keyring-pKracm
782049 srwxr-xr-x  1 viv viv    0 2006-01-27 15:05 mapping-viv
781939 drwx------  2 viv viv 4096 2006-01-27 15:31 orbit-viv
781922 drwx------  2 viv viv 4096 2006-01-27 15:05 ssh-cnaOtj4013
781882 drwx------  2 viv viv 4096 2006-01-27 15:05 ssh-SsCkUW4013
782263 -rw-r--r--  1 viv viv    0 2006-01-27 15:49 \+Xy \+\8

Note: 782263 is inode number.

(d) Use find command to delete file by inode:
Find and remove file using find command, type the command as follows:
$ find . -inum 782263 -exec rm -i {} \;

Note: you can also quote the filename (or put a \ character before each special character) to remove it directly, so the command would be:
$ rm "\+Xy \+\8"

If you have a file with a name like "2005/12/31", no UNIX or Linux command can delete it by name. The only way to delete such a file is by its inode number. Linux and UNIX never allow creating a filename like 2005/12/31, but if you are using NFS from Mac OS or Windows, it is possible to create such a file.


The dd command is an all-in-one tool that copies a file, converting and formatting it according to the options. Since Linux (and other UNIX versions) treat everything as a file, dd works wonders. Please note that dd was not created specifically for backups, but it is a really handy tool. A few months back I was new to HP-UX and could not figure out the HP-UX tape devices, so I used dd to create backups. Later, once I knew the tape device names, I switched to the age-old tar and dump commands.

dd command syntax

The syntax of dd is as follows:

dd if=INPUT-FILE-NAME of=OUTPUT-FILE-NAME
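
To get a feel for this syntax without touching a real disk, you can try it on ordinary files first. The sketch below is only an illustration: the file names input.bin and output.bin are made up, and the bs=4k block size is an arbitrary choice.

```shell
#!/bin/sh
# Copy a regular file with dd and verify that the copy is byte-identical.
workdir=$(mktemp -d)
src="$workdir/input.bin"
dst="$workdir/output.bin"

dd if=/dev/urandom of="$src" bs=1024 count=64 2>/dev/null   # 64 KiB of sample data
dd if="$src" of="$dst" bs=4k 2>/dev/null                    # the actual copy

if cmp -s "$src" "$dst"; then result="identical"; else result="different"; fi
size=$(wc -c < "$dst")
echo "copy is $result, $size bytes"
rm -rf "$workdir"
```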

dd command examples

To back up /dev/hda3 under Linux, i.e. a Linux filesystem backup with dd, use the following command:
# dd if=/dev/hda3 of=/backup/myhostname-15-nov-05-hda3.bak.dd
However, if you plan to run dd in the background and may want to kill it or send a SIGUSR1 signal to the running dd process, start dd as follows so that its process ID is saved in $dpid (this is really useful stuff):
# dd if=/dev/hda3 of=/backup/myhostname-15-nov-05-hda3.bak.dd & dpid=$!
Now use the kill command as follows. With GNU dd, SIGUSR1 prints the I/O statistics so far; the second kill then terminates dd:
# kill -USR1 $dpid; sleep 5; kill $dpid
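
The signal handling can be rehearsed safely by copying /dev/zero to /dev/null instead of a real partition. This is a sketch that assumes GNU dd, which reports progress on SIGUSR1; other dd implementations may use a different signal or simply exit. The log file and the variable names are illustrative.

```shell
#!/bin/sh
# Start dd in the background, ask it for progress, then stop it.
log=$(mktemp)
dd if=/dev/zero of=/dev/null bs=64k 2>"$log" &
dpid=$!                      # PID of the background dd
sleep 1                      # give dd time to start and install its handler
kill -USR1 "$dpid"           # GNU dd writes its statistics to stderr
sleep 1
kill "$dpid"                 # terminate dd
wait "$dpid" 2>/dev/null

if kill -0 "$dpid" 2>/dev/null; then state="running"; else state="stopped"; fi
progress=$(grep -c 'records in' "$log")
cat "$log"
rm -f "$log"
```

Note that $! is only set when the command is started with &, which is why dd has to be put in the background before its PID is saved.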

dd command to backup boot loader / MBR

dd can be used to back up your boot loader too (if you install Windows after Linux, it will overwrite the grub/lilo boot loader):
# dd if=/dev/hdX of=/backup/mbr.bak bs=512 count=1
You can restore the MBR with the following dd command:
# dd if=/backup/mbr.bak of=/dev/hdX bs=512 count=1
Note: replace hdX with your actual device name. The first 512 bytes also hold the partition table, so restoring them rewrites it as well. However, I prefer to use grub-install.
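
Before running these commands against a real disk, the procedure can be rehearsed on an image file. In the sketch below, disk.img is a hypothetical stand-in for /dev/hdX, filled with random data rather than a real boot sector.

```shell
#!/bin/sh
# Rehearse the MBR backup and restore on an image file instead of a real disk.
workdir=$(mktemp -d)
disk="$workdir/disk.img"     # hypothetical stand-in for /dev/hdX
mbr="$workdir/mbr.bak"

dd if=/dev/urandom of="$disk" bs=512 count=8 2>/dev/null   # 8 fake sectors
before=$(cksum < "$disk")

dd if="$disk" of="$mbr" bs=512 count=1 2>/dev/null         # back up sector 0
mbr_size=$(wc -c < "$mbr")

# Simulate an overwritten boot sector, then restore it from the backup.
# conv=notrunc stops dd from truncating the image file; a block device
# cannot be truncated, so the commands above do not need it.
dd if=/dev/zero of="$disk" bs=512 count=1 conv=notrunc 2>/dev/null
dd if="$mbr" of="$disk" bs=512 count=1 conv=notrunc 2>/dev/null
after=$(cksum < "$disk")

echo "backup size: $mbr_size bytes"
[ "$before" = "$after" ] && echo "image restored intact"
rm -rf "$workdir"
```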

Please note that dd is also capable of reading tapes that were created on other UNIX systems or written in a non-UNIX format (for example by Windows 2000 Server).

Here is one more practical example for Solaris UNIX:

To copy all but the label from disk to tape, i.e. copy data in 512 KiB blocks between a disk and a tape without saving or restoring the 4 KiB label at the start of the disk (backup):
# (dd bs=4k skip=1 count=0 && dd bs=512k) </dev/rdsk/c0t1d0s2 >/dev/rmt/0
Copy from tape back to disk, but leave the disk label alone (restore):
# (dd bs=4k seek=1 count=0 && dd bs=512k) < /dev/rmt/0 >/dev/rdsk/c0t1d0s2

Backing up entire disk/partition with dd command

Backup /dev/hda to /dev/hdb:
# dd if=/dev/hda of=/dev/hdb conv=noerror,sync
Where,

  • /dev/hda: Source disk
  • /dev/hdb: Target disk
  • sync: Pad every input block with NULs up to the block size, so that data after a read error stays at the correct offset
  • noerror: Continue copy operation after read errors
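
The padding done by conv=sync is easy to observe on a small file. In this sketch the 700-byte input and the 512-byte block size are arbitrary choices for the demonstration: the second, partial input block is padded with NULs to a full 512 bytes, so the output ends up at 1024 bytes.

```shell
#!/bin/sh
# Observe how conv=sync pads a short input block up to the block size.
workdir=$(mktemp -d)
dd if=/dev/urandom of="$workdir/src" bs=700 count=1 2>/dev/null   # 700-byte input
dd if="$workdir/src" of="$workdir/dst" bs=512 conv=noerror,sync 2>/dev/null

src_size=$(wc -c < "$workdir/src")
dst_size=$(wc -c < "$workdir/dst")
echo "input: $src_size bytes, output: $dst_size bytes"
rm -rf "$workdir"
```

This is why an image taken with conv=noerror,sync can be slightly larger than its source when the source size is not a multiple of the block size.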

The above command will only work reliably if both disks are the same size and have the same C/H/S geometry. I strongly suggest using partition-level backups. dd is an easy-to-use command and a real life saver. Read the man page of dd for more information.
$ man dd