High performance computing

The sar command collect, report, or save UNIX / Linux system activity information. It will save selected counters in the operating system to the /var/log/sa/sadd file. From the collected data, you get lots of information about your server:

  1. CPU utilization
  2. Memory paging and its utilization
  3. Network I/O, and transfer statistics
  4. Process creation activity
  5. All block devices activity
  6. Interrupts/sec etc.

sar output can be used for identifying server bottlenecks. However, analyzing information provided by sar can be difficult, so use kSar, which can take sar output and plot a nice easy to understand graph over period of time.

{ 35 comments }

Research shows that if your web pages take longer than 5 seconds to load, you lose 50% of your viewers and sales. As a UNIX admin often end users and web developers complain about website loading speed and timings. Usually, there is nothing wrong with my servers or server farm. Fancy java script and images / flash makes site pretty slow. These tools are useful to debug performance problems for sys admins, developers and end users. Here are six tools that can analyzes web pages and tells you why they are slow. Use the following tools to:

  • Make your site faster.
  • Debug site problem, especially client side and server side stuff.
  • Better user experience.
  • Improve the web.

{ 28 comments }

Unplanned downtime may be the result of a software bug, human error, equipment failure, power failure, and much more. Last week was a bad one. We faced three different downtime:

  • First, there was a fiber cut for one of our data center resulting into routing anomalies due BGP reroute. Traffic was rerouted but updating those BGP tables took some time to update.
  • Someone from networking team failed to follow proper maintenance procedures for network device resulted into 55 minutes downtime.
  • One of our SAN hardware failure – Many internal UNIX / Linux web applications use SAN to store data including file server, tracking apps, R&D apps, IT help desk, LAN and WAN servers failed. This one lasted for 12 hrs. It was stared around midnight. The vendor replaced entire SAN hardware. Now we have dual stacked SAN as a backup device for internal usage.

Note: There is a poll embedded within this post, please visit the site to participate in this post’s poll.

{ 6 comments }

Applications that perform a lot of memory accesses (several GBs) may obtain performance improvements by using large pages due to reduced Translation Lookaside Buffer (TLB) misses. HugeTLBfs is memory management feature offered in Linux kernel, which is valuable for applications that use a large virtual address space. It is especially useful for database applications such as MySQL, Oracle and others. Other server software(s) that uses the prefork or similar (e.g. Apache web server) model will also benefit.

The CPU’s Translation Lookaside Buffer (TLB) is a small cache used for storing virtual-to-physical mapping information. By using the TLB, a translation can be performed without referencing the in-memory page table entry that maps the virtual address. However, to keep translations as fast as possible, the TLB is usually small. It is not uncommon for large memory applications to exceed the mapping capacity of the TLB. Users can use the huge page support in Linux kernel by either using the mmap system call or standard SYSv shared memory system calls (shmget, shmat).

{ 11 comments }

I’ve three nameserver load-balanced (LB) in three geo locations. Each LB has a front end public IP address and two backend IP address (one for BIND and another for zone transfer) are assigned to actual bind 9 server running Linux. So when a zone transfer initiates from slave server, all I get errors. A connection cannot be established, it tries again with the servers main ip or LB2 / LB3 ip. This is a problem because my servers are geo located and load balanced. However, there is a small workaround for this problem.

{ 4 comments }

Get ready for a minute with 61 seconds. Scientists are delaying the start of 2009 by the first ‘leap second’ a timing tweak meant to make up for changes in the Earth’s rotation.

The aged Earth is slowing down in its daily rotation, at least in the current epoch. So a leap second is added (a one-second adjustment added) to our time. This year will be exactly one second longer.

Precise time measurements are needed for high-speed communications systems among other modern technologies such as clusters, GPS, networks. You need to make sure that you are running updated version of ntpd that support leap second for UNIX and Windows computers.

{ 0 comments }

I’ve already written about setting the MTU (Maximum Transmission Unit) under Linux including Jumbo frames (FreeBSD specific MTU information is here).

With this quick tip you can increase MTU size to get a better networking performance.

{ 0 comments }