
Parallel NFS: Read / Write Hundreds of Gigabytes Per Second

NFS is a fairly old file-sharing technology for UNIX-based systems and storage systems, and it suffers from performance issues. NFSv4.1 addresses data access bottlenecks by adding a new feature called parallel NFS (pNFS) - a method of introducing data access parallelism. The end result is ultra-fast file sharing for clusters and high-availability configurations.

The Network File System (NFS) is a stalwart component of most modern local area networks (LANs). But NFS is inadequate for the demanding input- and output-intensive applications commonly found in high-performance computing -- or, at least it was. The newest revision of the NFS standard includes Parallel NFS (pNFS), a parallelized implementation of file sharing that multiplies transfer rates by orders of magnitude.

In addition to pNFS, NFSv4.1 provides Sessions, Directory Delegation and Notifications, Multi-server Namespace, ACL/SACL/DACL, Retention Attributes, and SECINFO_NO_NAME.

Fig.01: The conceptual organization of pNFS - Image credit IBM

According to Wikipedia:

The NFSv4.1 protocol defines a method of separating the meta-data (names and attributes) of a filesystem from the location of the file data; it goes beyond the simple name/data separation of striping the data amongst a set of data servers. This is different from the traditional NFS server which holds the names of files and their data under the single umbrella of the server. There exist products which are multi-node NFS servers, but the participation of the client in separation of meta-data and data is limited. The NFSv4.1 client can be enabled to be a direct participant in the exact location of file data and avoid solitary interaction with the single NFS server when moving data.

The NFSv4.1 pNFS server is a collection of server resources or components; these are assumed to be controlled by the meta-data server.

The pNFS client still accesses a single meta-data server for traversal or interaction with the namespace; when the client moves data to and from the server it may be directly interacting with the set of data servers belonging to the pNFS server collection.
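As a rough sketch of what this looks like from the client side (assuming a pNFS-capable Linux kernel and nfs-utils on the client, and using a placeholder server name and mount point), you would mount the export as NFSv4.1:
# mount -t nfs4 -o minorversion=1 nfs.example.com:/export /mnt/pnfs
On newer kernels the same thing can be requested with -o vers=4.1. Once mounted, the client obtains layouts from the meta-data server and then reads and writes file data directly against the data servers.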

More information about pNFS

  1. Scale your file system with Parallel NFS
  2. Linux NFS Overview, FAQ and HOWTO Documents
  3. NFSv4 delivers seamless network access
  4. Nfsv4 Status Pages
  5. NFS article from the Wikipedia

Linux tgtadm: Setup iSCSI Target ( SAN )

Linux target framework (tgt) aims to simplify the creation and maintenance of various SCSI target drivers (iSCSI, Fibre Channel, SRP, etc). The key goals are clean integration into the SCSI mid-layer and implementing a great portion of tgt in user space.

The developer of IET is also helping to develop the Linux SCSI target framework (stgt), which looks like it might lead to an iSCSI target implementation with an upstream kernel component. An iSCSI target can be useful:

a] To set up a stateless server / client (used in diskless setups).
b] To share disks and tape drives with a remote client over a LAN, WAN, or the Internet.
c] To set up a SAN (storage array).
d] To set up a load-balanced web cluster using a cluster-aware Linux file system, and so on.

In this tutorial you will learn how to have a fully functional Linux iSCSI SAN using the tgt framework.
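As a quick sketch of the steps involved (the target name, backing device /dev/sdb, and target ID 1 below are placeholder assumptions; the full walk-through is in the tutorial), a basic tgt target is created roughly like this:
# tgtd
# tgtadm --lld iscsi --op new --mode target --tid 1 -T iqn.2008-09.com.example:storage.disk1
# tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 -b /dev/sdb
# tgtadm --lld iscsi --op bind --mode target --tid 1 -I ALL
The first command starts the user-space target daemon, the next two define a target and attach a LUN backed by a block device, and the last one allows any initiator to connect.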

Seagate Barracuda: 1.5TB Hard Drive Launched

Wow, this is a large desktop hard disk for storing movies, TV shows, music / MP3s, and photos. You can also load multiple operating systems using VMware or other software for testing purposes. This hard disk comes with a 5-year warranty and a 300 MB/s (SATA 3 Gb/s) interface. From the article:

It's been more than 18 months since Hitachi reached the terabyte mark with the Deskstar 7K1000. In that time, all the major players in the hard drive industry have spun up terabytes of their own, and in some cases, offered multiple models targeting different markets. With so many options available and more than enough time for the milestone capacity's initial buzz to fade, it's no wonder that the current crop of 1TB drives is more affordable than we've ever seen from a flagship capacity. The terabyte, it seems, is old news.

Fig.01: Seagate's Barracuda 7200.11 1.5TB hard drive

The real question is about reliability. How reliable is the hard disk? So far my Seagate 500GB hard disk is working fine. I might get one to dump all my multimedia data / files :)

Linux Support For Intel Core i7 (Nehalem) Processors

The latest version of the Linux kernel supports the Intel Core i7 (Nehalem) processors. Nehalem is a microarchitecture developed by Intel Corporation as the successor to the Intel Core microarchitecture. It is the largest change in Intel's system architecture since the introduction of the Pentium Pro, and it is highly scalable, with different components for different tasks.

The Google Way: Saving Electricity For Data Center

Learn from Google how to save electricity while serving millions of requests across the globe. Google came up with a 5-step approach to building efficient data centers. From the page:

Google's mission is to organize the world's information and make it universally accessible and useful. Hundreds of millions of users access our services through the web, and supporting this traffic requires lots of computers. We strive to offer great internet services while taking our energy use very seriously. That's why, almost a decade ago, we started our efforts to make our computing infrastructure as sustainable as possible. Today we are operating what we believe to be the world's most efficient data centers.

As a result, the energy used per Google search is minimal. In the time it takes to do a Google search, your own personal computer will use more energy than we will use to answer your query.

=> Commitment to Sustainable Computing

CentOS / Red Hat Enterprise Linux 5.2 Poor NFS Performance and Solution

A few days ago I noticed that NFS performance between a web server node and the NFS server went down by 50%. NFS was already optimized, and the only change was the updated Red Hat Enterprise Linux 5.2 kernel. I also noticed the same trend on the CentOS 5.2 64-bit edition.

The NFS server crashed each and every time the web server node tried to store a large file (20-100 MB each). Read performance was fine, but write performance went to hell. Finally, I had to roll back the updates. Recently, while reading the Red Hat site, I came across the solution.

Updated kernel packages that fix various security issues and several bugs are now available for Red Hat Enterprise Linux 5:

* a 50-75% drop in NFS server rewrite performance, compared to Red Hat
Enterprise Linux 4.6, has been resolved.

After upgrading the kernel on both the server and the client, my issue was resolved:
# yum update
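If you would rather pull in just the fixed kernel instead of updating every package, something along these lines should also work (reboot into the new kernel and confirm the version afterwards):
# yum update kernel
# shutdown -r now
# uname -r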

Red Hat / CentOS Linux 4: Setup Device Mapper Multipathing

Multipath I/O is a fault-tolerance and performance enhancement technique whereby there is more than one physical path between the CPU in a computer system and its mass storage devices through the buses, controllers, switches, and bridge devices connecting them.

A simple example would be a SCSI disk connected to two SCSI controllers on the same computer or a disk connected to two Fibre Channel ports. Should one controller, port or switch fail, the operating system can route I/O through the remaining controller transparently to the application, with no changes visible to the applications, other than perhaps incremental latency.

This is useful for:

  1. Dynamic load balancing
  2. Traffic shaping
  3. Automatic path management
  4. Dynamic reconfiguration

Linux device-mapper

In the Linux kernel, the device-mapper serves as a generic framework to map one block device onto another. It forms the foundation of LVM2 and EVMS, software RAIDs, dm-crypt disk encryption, and offers additional features such as file-system snapshots.

Device-mapper works by processing data passed in from a virtual block device that it itself provides, and then passing the resultant data on to another block device.
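As a minimal illustration of that mapping (the map name "example" and the backing device /dev/sdb are assumptions for demonstration only), the following creates a virtual block device covering the first 204800 sectors (100 MB) of /dev/sdb, shows its mapping table, and removes it again:
# echo "0 204800 linear /dev/sdb 0" | dmsetup create example
# dmsetup table example
# dmsetup remove example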

How do I setup device-mapper multipathing in CentOS / RHEL 4 update 2 or above?

Open the /etc/multipath.conf file, enter:
# vi /etc/multipath.conf
Make sure the following block exists and is commented out:

devnode_blacklist {
        devnode "*"
}

Make sure the default_path_grouping_policy option in the defaults section is set to failover. Here is my sample config:

defaults {
       multipath_tool  "/sbin/multipath -v0"
       udev_dir        /dev
       polling_interval 10
       default_selector        "round-robin 0"
       default_path_grouping_policy    failover
       default_getuid_callout  "/sbin/scsi_id -g -u -s /block/%n"
       default_prio_callout    "/bin/true"
       default_features        "0"
       rr_min_io              100
       failback                immediate
}

Save and close the file. Type the following command to load drivers:
# modprobe dm-multipath
# modprobe dm-round-robin

Start the service, enter:
# /etc/init.d/multipathd start
multipath is used to detect multiple paths to devices for fail-over or performance reasons and coalesces them:
# multipath -v2
Turn on the service at boot:
# /sbin/chkconfig multipathd on
Finally, create device maps from partition tables:
# kpartx -a /dev/mapper/mpath#
You need to use fdisk on the underlying disks such as /dev/sdc.
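To confirm that the multipath maps actually came up, the following should list each map together with its underlying paths (device names such as mpath0 will vary per system):
# multipath -ll
# dmsetup ls --target multipath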


  • man pages: kpartx, multipath, udev, dmsetup and hotplug