≡ Menu


VMWare ESX4 and ESX3.5: SCSI timeout For Linux Guest

Recently, I noticed that the timeout values differ on CentOS v5.x and RHEL Linux 5.x guests on VMWare ESX4 and ESX3.5. I've notices that older ESX 3.5 set a 60 secs timeout and ESX4.x set to 180 secs. Luckly you can fix it easily:
Edit /etc/udev/rules.d/99-vmware-scsi-udev.rules,
# vi /etc/udev/rules.d/99-vmware-scsi-udev.rules
Sample config:

RUN+="/bin/sh -c 'echo 180 >/sys$DEVPATH/device/timeout'"

Find timeout value (180) and change it as per your requirements. Make sure you install the vmware-tools RPM.

Top 10 Linux Virtualization Software

Virtualization is the latest buzz word. You may wonder computers are getting cheaper every day, why should I care and why should I use virtualization? Virtualization is a broad term that refers to the abstraction of computer resources such as:

  1. Platform Virtualization
  2. Resource Virtualization
  3. Storage Virtualization
  4. Network Virtualization
  5. Desktop Virtualization

This article describes why you need virtualization and list commonly used FOSS and proprietary Linux virtualization software.
[click to continue…]

Linux: Boot a 2TB+ partition or Larger Array Using Grub

I've already written about creating a partition size larger than 2TB under Linux using GNU parted command with GPT. In this tutorial, I will provide instructions for booting to a flat 2TB or larger RAID array under Linux using the GRUB boot loader.
[click to continue…]

Linux: Should You Use Twice the Amount of Ram as Swap Space?

Linux and other Unix-like operating systems use the term "swap" to describe both the act of moving memory pages between RAM and disk, and the region of a disk the pages are stored on. It is common to use a whole partition of a hard disk for swapping. However, with the 2.6 Linux kernel, swap files are just as fast as swap partitions. Now, many admins (both Windows and Linux/UNIX) follow an old rule of thumb that your swap partition should be twice the size of your main system RAM. Let us say I've 32GB RAM, should I set swap space to 64 GB? Is 64 GB of swap space really required? How big should your Linux / UNIX swap space be?

Old dumb memory managers

I think the '2x swap space' rule came from Old Solaris and Windows admins. Also, earlier memory mangers were very badly designed. There were not very smart. Today, we have very smart and intelligent memory manager for both Linux and UNIX.

Nonsense rule: Twice the size of your main system RAM for Servers

According to OpenBSD FAQ:

Many people follow an old rule of thumb that your swap partition should be twice the size of your main system RAM. This rule is nonsense. On a modern system, that's a LOT of swap, most people prefer that their systems never swap. You don't want your system to ever run out of RAM+swap, but you usually would rather have enough RAM in the system so it doesn't need to swap.

Select right size for your setup

Here is my rule for normal server (Web / Mail etc):

  1. Swap space == Equal RAM size (if RAM < 2GB)
  2. Swap space == 2GB size (if RAM > 2GB)

My friend who is a true Oracle GURU recommends something as follows for heavy duty Oracle server with fast storage such as RAID 10:

  1. Swap space == Equal RAM size (if RAM < 8GB)
  2. Swap space == 0.50 times the size of RAM (if RAM > 8GB)

Red Hat Recommendation

Red hat recommends setting as follows for RHEL 5:

The reality is the amount of swap space a system needs is not really a function of the amount of RAM it has but rather the memory workload that is running on that system. A Red Hat Enterprise Linux 5 system will run just fine with no swap space at all as long as the sum of anonymous memory and system V shared memory is less than about 3/4 the amount of RAM. In this case the system will simply lock the anonymous and system V shared memory into RAM and use the remaining RAM for caching file system data so when memory is exhausted the kernel only reclaims pagecache memory.

Considering that 1) At installation time when configuring the swap space there is no easy way to predetermine the memory a workload will require, and 2) The more RAM a system has the less swap space it typically needs, a better swap space

  1. Systems with 4GB of ram or less require a minimum of 2GB of swap space
  2. Systems with 4GB to 16GB of ram require a minimum of 4GB of swap space
  3. Systems with 16GB to 64GB of ram require a minimum of 8GB of swap space
  4. Systems with 64GB to 256GB of ram require a minimum of 16GB of swap space

Swap will just keep running servers...

Swap space will just keep operation running for a while on heavy duty servers by swapping process. You can always find out swap space utilization using any one of the following command:
cat /proc/swaps
swapon -s
free -m

See how to find out disk I/O and related information under Linux. In the end, you need to add more RAM, adjust software (like controlling Apache workers or using lighttpd web server to save RAM) or use some sort of load balancing.

Also, refer Linux kernel documentation for /proc/sys/vm/swappiness. With this you can fine tune swap space.

A note about Desktop and Laptop

If you are going to suspend to disk, then you need swap space more than actual RAM. For example, my laptop has 1GB RAM and swap is setup to 2GB. This only applies to Laptop or desktop but not to servers.

Kernel hackers need more swap space

If you are a kernel hacker (debugging and fixing kernel issues) and generating core dumps, you need twice the RAM swap space.


If Linux kernel is going to use more than 2GiB swap space at a time, all users will feel the heat. Either, you get more RAM (recommend) and move to faster storage to improve disk I/O. There are no rules, each setup and configuration is unique. Adjust values as per your requirements. Select amount of swap that is right for you.

What do you think? Please add your thoughts about 'swap space' in the comments below.

mount forcedirectio: Disable Linux CIFS / NFS Client Caching

If your network is heavily loaded you may see some problem with Common Internet File System (CIFS) and NFS under Linux. By default Linux CIFS mount command will try to cache files open by the client. You can use mount option forcedirectio when mounting the CIFS filesystem to disable caching on the CIFS client. This is tested with NETAPP and other storage devices and Novell, CentOS, UNIX and Red Hat Linux systems. This is the only way to avoid data mis-compare and problems.

The default is to attempt to cache ie try to request oplock on files opened by the client (forcedirectio is off). Foredirectio also can indirectly alter the network read and write size, since i/o will now match what was requested by the application, as readahead and writebehind is not being performed by the page cache when forcedirectio is enabled for a mount

mount -t cifs //mystorage/data2 -o username=vivek,password=myPassword,rw,bg,vers=3,proto=tcp,hard,intr,rsize=32768,wsize=32768,forcedirectio,llock /data2

Refer mount.cifs man page, docs stored at Documentation/filesystems/cifs.txt and fs/cifs/README in the linux kernel source tree for additional options and information.

Parallel NFS: Read / Write Hundreds of Gigabytes Per Second

NFS is pretty old file sharing technology for UNIX based system and storage systems. However, it suffers from performance issues. NFSv4.1 address data access issues by adding a new feature called parallel NFS (pNFS) - a method of introducing Data Access Parallelism. The end result is ultra fast file sharing for clusters and high availability configurations.

The Network File System (NFS) is a stalwart component of most modern local area networks (LANs). But NFS is inadequate for the demanding input- and output-intensive applications commonly found in high-performance computing -- or, at least it was. The newest revision of the NFS standard includes Parallel NFS (pNFS), a parallelized implementation of file sharing that multiplies transfer rates by orders of magnitude.

In addition to pNFS, NFSv4.1 provides Sessions, Directory Delegation and Notifications, Multi-server Namespace, ACL/SACL/DACL, Retention Attributions, and SECINFO_NO_NAME.

Fig.01: The conceptual organization of pNFS - Image credit IBM

Fig.01: The conceptual organization of pNFS - Image credit IBM

According to wikipedia:

The NFSv4.1 protocol defines a method of separating the meta-data (names and attributes) of a filesystem from the location of the file data; it goes beyond the simple name/data separation of striping the data amongst a set of data servers. This is different from the traditional NFS server which holds the names of files and their data under the single umbrella of the server. There exists products which are multi-node NFS servers, but the participation of the client in separation of meta-data and data is limited. The NFSv4.1 client can be enabled to be a direct participant in the exact location of file data and avoid solitary interaction with the single NFS server when moving data.

The NFSv4.1 pNFS server is a collection of server resources or components; these are assumed to be controlled by the meta-data server.

The pNFS client still accesses a single meta-data server for traversal or interaction with the namespace; when the client moves data to and from the server it may be directly interacting with the set of data servers belonging to the pNFS server collection.

More information about pNFS

  1. Scale your file system with Parallel NFS
  2. Linux NFS Overview, FAQ and HOWTO Documents
  3. NFSv4 delivers seamless network access
  4. Nfsv4 Status Pages
  5. NFS article from the Wikipedia

Linux tgtadm: Setup iSCSI Target ( SAN )

Linux target framework (tgt) aims to simplify various SCSI target driver (iSCSI, Fibre Channel, SRP, etc) creation and maintenance. The key goals are the clean integration into the scsi-mid layer and implementing a great portion of tgt in user space.

The developer of IET is also helping to develop Linux SCSI target framework (stgt) which looks like it might lead to an iSCSI target implementation with an upstream kernel component. iSCSI Target can be useful:

a] To setup stateless server / client (used in diskless setups).
b] Share disks and tape drives with remote client over LAN, Wan or the Internet.
c] Setup SAN - Storage array.
d] To setup loadbalanced webcluser using cluster aware Linux file system etc.

In this tutorial you will learn how to have a fully functional Linux iSCSI SAN using tgt framework.
[click to continue…]