≡ Menu

unix filesystem

Following are few situations where you may be interested in performing a filesystem benchmarking.

=> Deploying a new application that is very read and write intensive.
=> Purchased a new storage system and would like to measure the performance.
=> Changing the RAID level and would like to measure the performance of the new RAID.
=> Changing the storage parameters and would like to know the performance impact of this change

This article gives you a jumpstart on performing benchmark on filesystem using iozone a free Filesystem Benchmark utility.

1. Download and Install iozone software

Go to iozone and download the iozone for your appropriate platform. I downloaded the "Linux i386 RPM". Install the iozone from the RPM as shown below. By default this installs the iozone under /opt/iozone/bin
# cd /tmp
# wget http://www.iozone.org/src/current/iozone-3-303.i386.rpm
# rpm -ivh iozone-3-303.i386.rpm

Sample output:

Preparing...                ########################################### [100%]
   1:iozone                 ########################################### [100%]

Note: You can install iozone under any UNIX / Linux or Windows operating system.

2. Start the performance test

Execute the following command in the background to begin the performance test.

# /opt/iozone/bin/iozone -R -l 5 -u 5 -r 4k -s 100m -F /home/f1 /home/f2 /home/f3 /home/f4 /home/f5 | tee -a /tmp/iozone_results.txt &

Let us review all the individual parameter passed to iozone command.

  • -R : Instructs the iozone to generate excel compatible text output.
  • -l : This is the lower limit on how many process/threads needs to be started by iozone during execution. In this example, iozone will start 5 threads.
  • -u : This is the upper limit on how many process/threads needs to be started by iozone during execution. In this example, iozone will not exceed maximum of 5 threads. If you set -l and -u to the same value, it will run exactly those many number of process/threads. In this example, this will execute exactly 5 threads.
  • -r : This specifies the record size. In this example, the record size for benchmark testing is 4k. This is an important parameter to be set appropriately depending on the purpose of your filesystem performance testing. For e.g. If you are performing benchmark on a filesystem that will host a database, it is appropriate to set this value to the DB block size of the database.
  • -s : This specifies the size of the file that needs to be tested. In this example, iozone will try to perform test on 100Mb file.
  • -F : Specify the temporary filename that should be used by the iozone during testing. The total number of files specified here should match the value specified in -l and -u parameter.

3. Analyze the output of iozone file.

The first part of the output will contain the details about every individual filesystem performance metrics that was tested. for e.g. Initial write, rewrite etc as shown below.

        Iozone: Performance Test of File I/O
                Version $Revision: 3.303 $
                Compiled for 32 bit mode.
                Build: linux
        Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
                     Al Slater, Scott Rhine, Mike Wisner, Ken Goss
                     Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
                     Randy Dunlap, Mark Montague, Dan Million,
                     Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy,
                     Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root.
        Run began: Thu Jun  22 00:08:51 2008
        Excel chart generation enabled
        Record Size 4 KB
        File size set to 102400 KB
        Command line used: /opt/iozone/bin/iozone -R -l 5 -u 5 -r  4k -s 100m -F /home/f1 /home/f2 /home/f3 /home/f4 /home/f5
        Output is in Kbytes/sec
        Time Resolution = 0.000001 seconds.
        Processor cache size set to 1024 Kbytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
        Min process = 5
        Max process = 5
        Throughput test with 5 processes
        Each process writes a 102400 Kbyte file in 4 Kbyte records
        Children see throughput for  2 initial writers  =   60172.28 KB/sec
        Parent sees throughput for  2 initial writers   =   45902.89 KB/sec
        Min throughput per process                        =   28564.52 KB/sec
        Max throughput per process                      =   31607.76 KB/sec
        Avg throughput per process                      =   30086.14 KB/sec
        Min xfer                                                  =   92540.00 KB
        Children see throughput for  2 rewriters        =   78658.92 KB/sec
        Parent sees throughput for  2 rewriters         =   34277.52 KB/sec
        Min throughput per process                      =   35743.92 KB/sec
        Max throughput per process                      =   42915.00 KB/sec
        Avg throughput per process                      =   39329.46 KB/sec
        Min xfer                                        =   85296.00 KB

Similar values like above will be generated for readers, re-readers, reverse readers, stride readers, random readers, mixed workload, random writers, pwrite writers, pread readers. The last part of the iozone output will contain the Throughput summary for different metrics as shown below.

Throughput report Y-axis is type of test X-axis is number of processes
Record size = 4 Kbytes
Output is in Kbytes/sec
Initial write       60172.28
Rewrite           78658.92
Read              2125613.88
Re-read          1754367.31
Reverse Read 1603521.50
Stride read      1633166.38
Random read   1583648.75
Mixed workload 1171437.78
Random write    5365.59
Pwrite               26847.44
Pread               2054149.00


(Fig.01: iozone in action)

Iozone does a benchmarking on different types of file system performance metrics. for e.g. Read, Write, Random read. Depending on the application that you are planning to deploy on that particular filesystem, pay attention to the appropriate items. for e.g. If the filesystem hosts an read intensive OLTP database, then pay attention to Random Read, Random write and Mixed workload. If the application does lot of streaming media content, pay attention to the Sequential Read. On a final note, you can generate graphs using the Generate_Graphs and gengnuplot.sh located under /opt/iozone/bin, based on the iozone output.

References:

  • Iozone PDF documentation – Full documentation from iozone.org explaining all the iozone command line options and more.
  • Linux Iozone graph example - This is a sample *.xls file from iozone that shows the kind of excel output that can be generated from iozone.

A single inode number use to represent file in each file system. All hard links based upon inode number.

So linking across file system will lead into confusing references for UNIX or Linux. For example, consider following scenario

* File system: /home
* Directory: /home/vivek
* Hard link: /home/vivek/file2
* Original file: /home/vivek/file1

Now you create a hard link as follows:
$ touch file1
$ ln file1 file2
$ ls -l

Output:

-rw-r--r--  2 vivek vivek    0 2006-01-30 13:28 file1
-rw-r--r--  2 vivek vivek    0 2006-01-30 13:28 file2

Now just see inode of both file1 and file2:
$ ls -i file1
782263
$ ls -i file2
782263

As you can see inode number is same for hard link file called file2 in inode table under /home file system. Now if you try to create a hard link for /tmp file system it will lead to confusing references for UNIX or Linux file system. Is that a link no. 782263 in the /home or /tmp file system? To avoid this problem UNIX or Linux does not allow creating hard links across file system boundaries. Continue reading rest of the Understanding Linux file system series (this is part VII):

  • Part I - Understanding Linux superblock
  • Part II - Understanding Linux superblock
  • Part III - An example of Surviving a Linux Filesystem Failures
  • Part IV - Understanding filesystem Inodes
  • Part V - Understanding filesystem directories
  • Part VI - Understanding UNIX/Linux symbolic (soft) and hard links
  • Part VII - Why isn't it possible to create hard links across file system boundaries?

Inodes are associated with precisely one directory entry at a time. However, with hard links it is possible to associate multiple directory entries with a single inode. To create a hard link use ln command as follows:
# ln /root/file1 /root/file2
# ls -l

Above commands create a link to file1. Symbolic links refer to:

A symbolic path indicating the abstract location of another file.

Hard links refer to:

The specific location of physical data.

Hard link vs. Soft link in Linux or UNIX

  • Hard links cannot link directories.
  • Cannot cross file system boundaries.

Soft or symbolic links are just like hard links. It allows to associate multiple filenames with a single file. However, symbolic links allows:

  • To create links between directories.
  • Can cross file system boundaries.

These links behave differently when the source of the link is moved or removed.

  • Symbolic links are not updated.
  • Hard links always refer to the source, even if moved or removed.

How do I create symbolic link?

You can create symbolic link with ln command:
$ ln -s /path/to/file1.txt /path/to/file2.txt
$ ls -ali

Above command will create a symbolic link to file1.txt.

Task: Symbolic link creation and deletion

Let us create a directory called foo, enter:
$ mkdir foo
$ cd foo

Copy /etc/resolv.conf file, enter:
$ cp /etc/resolv.conf .
View inode number, enter:
$ ls -ali
Sample output:

total 152
1048600 drwxr-xr-x   2 vivek vivek   4096 2008-12-09 20:19 .
1015809 drwxrwxrwt 220 root  root  143360 2008-12-09 20:19 ..
1048601 -rwxr-xr-x   1 vivek vivek    129 2008-12-09 20:19 resolv.conf

Now create soft link to resolv.conf, enter:
$ ln -s resolv.conf alink.conf
$ ls -ali

Sample output:

total 152
1048600 drwxr-xr-x   2 vivek vivek   4096 2008-12-09 20:24 .
1015809 drwxrwxrwt 220 root  root  143360 2008-12-09 20:19 ..
1048602 lrwxrwxrwx   1 vivek vivek     11 2008-12-09 20:24 alink.conf -> resolv.conf
1048601 -rwxr-xr-x   1 vivek vivek    129 2008-12-09 20:19 resolv.conf

The reference count of the directory has not changed (total 152). Our symbolic (soft) link is stored in a different inode than the text file (1048602). The information stored in resolv.conf is accessible through the alink.conf file. If we delete the text file resolv.conf, alink.conf becomes a broken link and our data is lost:
$ rm resolv.conf
$ ls -ali

If alink.conf was a hard link, our data would still be accessible through alink.conf. Also, if you delete the soft link itself, the data would still be there. Read man page of ln for more information.
Continue reading rest of the Understanding Linux file system series (this is part VI):

  • Part I - Understanding Linux superblock
  • Part II - Understanding Linux superblock
  • Part III - An example of Surviving a Linux Filesystem Failures
  • Part IV - Understanding filesystem Inodes
  • Part V - Understanding filesystem directories
  • Part VI - Understanding UNIX/Linux symbolic (soft) and hard links
  • Part VII - Why isn't it possible to create hard links across file system boundaries?

An inode identifies the file and its attributes such as file size, owner, and so on. A unique inode number within the file system identifies each inode. But, why to delete file by an inode number? Sure, you can use rm command to delete file. Sometime accidentally you creates filename with control characters or characters which are unable to be input on a keyboard or special character such as ?, * ^ etc. Removing such special character filenames can be problem. Use following method to delete a file with strange characters in its name:

Please note that the procedure outlined below works with Solaris, FreeBSD, Linux, or any other Unixish oses out there:

Find out file inode

First find out file inode number with any one of the following command:

stat {file-name}

OR

ls -il {file-name}

Use find command to remove file:

Use find command as follows to find and remove a file:

find . -inum [inode-number] -exec rm -i {} \;

When prompted for confirmation, press Y to confirm removal of the file.

Delete or remove files with inode number

Let us try to delete file using inode number.

(a) Create a hard to delete file name:
$ cd /tmp
$ touch "\+Xy \+\8"
$ ls

(b) Try to remove this file with rm command:
$ rm \+Xy \+\8

(c) Remove file by an inode number, but first find out the file inode number:
$ ls -ilOutput:

781956 drwx------  3 viv viv 4096 2006-01-27 15:05 gconfd-viv
781964 drwx------  2 viv viv 4096 2006-01-27 15:05 keyring-pKracm
782049 srwxr-xr-x  1 viv viv    0 2006-01-27 15:05 mapping-viv
781939 drwx------  2 viv viv 4096 2006-01-27 15:31 orbit-viv
781922 drwx------  2 viv viv 4096 2006-01-27 15:05 ssh-cnaOtj4013
781882 drwx------  2 viv viv 4096 2006-01-27 15:05 ssh-SsCkUW4013
782263 -rw-r--r--  1 viv viv    0 2006-01-27 15:49 \+Xy \+\8

Note: 782263 is inode number.

(d) Use find command to delete file by inode:
Find and remove file using find command, type the command as follows:
$ find . -inum 782263 -exec rm -i {} \;

Note you can also use add \ character before special character in filename to remove it directly so the command would be:
$ rm "\+Xy \+\8"

If you have file like name like name "2005/12/31" then no UNIX or Linux command can delete this file by name. Only method to delete such file is delete file by an inode number. Linux or UNIX never allows creating filename like 2005/12/31 but if you are using NFS from MAC OS or Windows then it is possible to create a such file.

See also:

Understanding UNIX / Linux filesystem directories

You use DNS (domain name system) to translate between domain names and IP addresses.

Similarly files are referred by file name, not by inode number. So what is the purpose of a directory? You can groups the files according to your usage. For example all configuration files are stored under /etc directory. So the purpose of a directory is to make a connection between file names and their associated inode number. Inside every directory you will find out two directories . (current directory) and .. (pointer to previous directory i.e. the directory immediately above the one I am in now). The .. appears in every directory except for the root directory.

Directory

A directory contained inside another directory is called a subdirectory. At the end the directories form a tree structure. Use tree command to see directory tree structure:
$ tree /etc | less
Again a directory has an inode just like a file. It is a specially formatted file containing records which associate each name with an inode number. Please note the following limitation of directories under ext2/3 file system:

  • There is an upper limit of 32768 subdirectories in a single directory.
  • There is a "soft" upper limit of about 10-15k files in a single directory

However according to official documentation of ext2/3 file system points that “Using a hashed directory index (which is under development) allows 100k-1M+ files in a single directory without performance problems'. Here are my two favorite alias commands related to directory :
$ alias ..='cd ..'
alias d='ls -l | grep -E "^d"'


Well I'm sure all of you know the basic commands related to directories and files managment. Click above (or here) to see summery of all basic commands related to directories and files managment. See interesting discussion about soft links and directories. This is 6th part of "Understanding UNIX/Linux file system, continue reading rest of the Understanding Linux file system series (this is part IV):

  • Part I - Understanding Linux superblock
  • Part II - Understanding Linux superblock
  • Part III - An example of Surviving a Linux Filesystem Failures
  • Part IV - Understanding filesystem Inodes
  • Part V - Understanding filesystem directories
  • Part VI - Understanding UNIX/Linux symbolic (soft) and hard links
  • Part VII - Why isn't it possible to create hard links across file system boundaries?

Understanding UNIX / Linux filesystem Inodes

The inode (index node) is a fundamental concept in the Linux and UNIX filesystem. Each object in the filesystem is represented by an inode. But what are the objects? Let us try to understand it in simple words. Each and every file under Linux (and UNIX) has following attributes:

=> File type (executable, block special etc)
=> Permissions (read, write etc)
=> Owner
=> Group
=> File Size
=> File access, change and modification time (remember UNIX or Linux never stores file creation time, this is favorite question asked in UNIX/Linux sys admin job interview)
=> File deletion time
=> Number of links (soft/hard)
=> Extended attribute such as append only or no one can delete file including root user (immutability)
=> Access Control List (ACLs)

All the above information stored in an inode. In short the inode identifies the file and its attributes (as above) . Each inode is identified by a unique inode number within the file system. Inode is also know as index number.

inode definition

An inode is a data structure on a traditional Unix-style file system such as UFS or ext3. An inode stores basic information about a regular file, directory, or other file system object.

How do I see file inode number?

You can use ls -i command to see inode number of file
$ ls -i /etc/passwd
Sample Output

32820 /etc/passwd

You can also use stat command to find out inode number and its attribute:
$ stat /etc/passwdOutput:

File: `/etc/passwd'
Size: 1988            Blocks: 8          IO Block: 4096   regular file
Device: 341h/833d       Inode: 32820       Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2005-11-10 01:26:01.000000000 +0530
Modify: 2005-10-27 13:26:56.000000000 +0530
Change: 2005-10-27 13:26:56.000000000 +0530

Inode application

Many commands used by system administrators in UNIX / Linux operating systems often give inode numbers to designate a file. Let us see he practical application of inode number. Type the following commands:
$ cd /tmp
$ touch \"la*
$ ls -l

Now try to remove file "la*

You can't, to remove files having created with control characters or characters which are unable to be input on a keyboard or special character such as ?, * ^ etc. You have to use inode number to remove file. This is fourth part of "Understanding UNIX/Linux file system, continue reading rest of the Understanding Linux file system series (this is part IV):

  • Part I - Understanding Linux superblock
  • Part II - Understanding Linux superblock
  • Part III - An example of Surviving a Linux Filesystem Failures
  • Part IV - Understanding filesystem Inodes
  • Part V - Understanding filesystem directories
  • Part VI - Understanding UNIX/Linux symbolic (soft) and hard links
  • Part VII - Why isn't it possible to create hard links across file system boundaries?

Understanding UNIX / Linux filesystem Superblock

This is second part of "Understanding UNIX/Linux file system", part I is here. Let us take an example of 20 GB hard disk. The entire disk space subdivided into multiple file system blocks. And blocks used for what?

Unix / Linux filesystem blocks

The blocks used for two different purpose:

  1. Most blocks stores user data aka files (user data).
  2. Some blocks in every file system store the file system's metadata. So what the hell is a metadata?

In simple words Metadata describes the structure of the file system. Most common metadata structure are superblock, inode and directories. Following paragraphs describes each of them.

Superblock

Each file system is different and they have type like ext2, ext3 etc. Further each file system has size like 5 GB, 10 GB and status such as mount status. In short each file system has a superblock, which contains information about file system such as:

  • File system type
  • Size
  • Status
  • Information about other metadata structures

If this information lost, you are in trouble (data loss) so Linux maintains multiple redundant copies of the superblock in every file system. This is very important in many emergency situation, for example you can use backup copies to restore damaged primary super block. Following command displays primary and backup superblock location on /dev/sda3:
# dumpe2fs /dev/hda3 | grep -i superblock
Output:

Primary superblock at 0, Group descriptors at 1-1
Backup superblock at 32768, Group descriptors at 32769-32769
Backup superblock at 98304, Group descriptors at 98305-98305
Backup superblock at 163840, Group descriptors at 163841-163841
Backup superblock at 229376, Group descriptors at 229377-229377
Backup superblock at 294912, Group descriptors at 294913-294913

Continue reading rest of the Understanding Linux file system series (this is part II):

  • Part I - Understanding Linux superblock
  • Part II - Understanding Linux superblock
  • Part III - An example of Surviving a Linux Filesystem Failures
  • Part IV - Understanding filesystem Inodes
  • Part V - Understanding filesystem directories
  • Part VI - Understanding UNIX/Linux symbolic (soft) and hard links
  • Part VII - Why isn't it possible to create hard links across file system boundaries?