
Hardware

Software Vs Hardware RAID

A redundant array of inexpensive disks (RAID) provides high levels of storage reliability. RAID is not a backup solution; it is used to improve the disk I/O performance and reliability of your server or workstation. RAID can be implemented in either software or hardware, so the real question is which of the two you should use.

In this post I will document my experience with both software and hardware RAID.
[click to continue…]

I have Windows Vista installed as a guest under Ubuntu Linux using VMware Workstation 6.0. I use it for testing and for browsing a few sites that only work with Internet Explorer. Since it is only for testing, I allocated 16GB for Vista and 5GB each for the CentOS and FreeBSD guest operating systems. However, after some time I realized I was running out of disk space under both CentOS and Vista. Adding a second hard drive under CentOS solved the problem there, as LVM was already in use. Under Windows Vista, however, I needed to double the disk to 32GB without creating a new D: drive. Here is a simple procedure to increase your virtual machine's disk capacity by resizing the VMware vmdk file.
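As a rough illustration of the resize step, the sketch below only builds the command line for VMware's `vmware-vdiskmanager` tool, whose `-x` option expands a virtual disk to a new size. The file name `Vista.vmdk` and the helper name `vmdk_expand_command` are made up for this example. It assumes the guest is powered off, and the partition inside Vista still has to be grown separately afterwards.

```python
def vmdk_expand_command(vmdk_path, new_size_gb, tool="vmware-vdiskmanager"):
    """Build the argument list that expands a vmdk file to new_size_gb.

    Hypothetical helper: it only assembles the command, so nothing is
    modified until the list is passed to subprocess.run().
    """
    if new_size_gb <= 0:
        raise ValueError("new_size_gb must be a positive number")
    # -x <size> asks vmware-vdiskmanager to expand the disk to that capacity.
    return [tool, "-x", f"{new_size_gb}GB", vmdk_path]


if __name__ == "__main__":
    # "Vista.vmdk" is an assumed file name, not taken from the post.
    print(" ".join(vmdk_expand_command("Vista.vmdk", 32)))
```

Printing the command first makes it easy to check the target size before touching the disk image. Take a copy of the vmdk file before running it for real.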
[click to continue…]

A typical question from my mailbag:

How do I find out if a given piece of PCI hardware is supported by the current CentOS / Debian / RHEL / Fedora Linux kernel?

You can easily find out whether a given piece of PCI hardware, such as a RAID, network, sound, or graphics card, is supported by the current Linux kernel using the following utilities under any Linux distribution.
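One quick way to see which PCI devices the running kernel has actually claimed is to look at sysfs. The sketch below is an illustration and not one of the utilities from the full post. It assumes the usual layout where each entry under `/sys/bus/pci/devices` has a `driver` symlink once a kernel driver is bound to it. The function name `pci_driver_bindings` is made up for this example.

```python
import os
from pathlib import Path


def pci_driver_bindings(sysfs_root="/sys/bus/pci/devices"):
    """Map each PCI address to the bound kernel driver name, or None."""
    root = Path(sysfs_root)
    bindings = {}
    if not root.is_dir():
        # Not a Linux system with sysfs mounted; nothing to report.
        return bindings
    for dev in sorted(root.iterdir()):
        driver_link = dev / "driver"
        if driver_link.exists():
            # The symlink target's last path component is the driver name.
            bindings[dev.name] = os.path.basename(os.path.realpath(driver_link))
        else:
            # No driver bound: possibly unsupported, or the module is not loaded.
            bindings[dev.name] = None
    return bindings


if __name__ == "__main__":
    for address, driver in pci_driver_bindings().items():
        print(f"{address}  {driver or '(no driver bound)'}")
```

A device without a bound driver is not necessarily unsupported. The matching module may simply not be loaded, so treat the output as a starting point.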
[click to continue…]

Linux x86_64: Detecting Hardware Errors

The Blue Screen of Death (BSoD) is displayed by Microsoft Windows after it encounters a critical system error. Linux / UNIX-like operating systems may instead get a kernel panic, which is the equivalent of a BSoD. Both can be triggered by a Machine Check Exception (MCE). MCE is a feature of AMD / Intel 64-bit systems that is used to detect unrecoverable hardware problems. MCE can detect:

  • Communication error between CPU and motherboard.
  • Memory error - ECC problems.
  • CPU cache errors and so on.
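As a small illustration, the sketch below filters kernel log text for lines that look like machine check reports. It assumes the kernel tags such lines with `[Hardware Error]` or mentions "Machine check", which is how recent kernels commonly log them. The sample lines are invented for the example, and the function name `find_hardware_errors` is hypothetical.

```python
def find_hardware_errors(log_lines):
    """Return the log lines that look like machine check / hardware error reports."""
    # Assumed markers; compare in lower case so the match is case-insensitive.
    markers = ("[hardware error]", "machine check")
    hits = []
    for line in log_lines:
        lowered = line.lower()
        if any(marker in lowered for marker in markers):
            hits.append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    # Invented sample lines; on a real system feed in the output of dmesg.
    sample = [
        "[   12.345678] usb 1-1: new high-speed USB device number 2\n",
        "[  987.654321] mce: [Hardware Error]: Machine check events logged\n",
    ]
    for hit in find_hardware_errors(sample):
        print(hit)
```

To use it on a live machine, pass the lines of the kernel log, for example `subprocess.run(["dmesg"], capture_output=True, text=True).stdout.splitlines()`.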

[click to continue…]

Unplanned downtime may be the result of a software bug, human error, equipment failure, power failure, and much more. Last week was a bad one. We faced three separate outages:

  • First, there was a fiber cut at one of our data centers, resulting in routing anomalies due to a BGP reroute. Traffic was rerouted, but updating the BGP tables took some time.
  • Second, someone from the networking team failed to follow proper maintenance procedures for a network device, which resulted in 55 minutes of downtime.
  • Third, one of our SANs suffered a hardware failure. Many internal UNIX / Linux web applications that store data on the SAN failed, including the file server, tracking apps, R&D apps, IT help desk, and LAN and WAN servers. This outage lasted 12 hours and started around midnight. The vendor replaced the entire SAN hardware. We now have a dual-stacked SAN as a backup device for internal usage.

[click to continue…]

The round-robin database tool (RRDtool) is designed to handle time-series data such as network bandwidth, temperatures, CPU load, and so on. The data is stored in a round-robin database so that the storage footprint remains constant over time. Lighttpd comes with mod_rrdtool to monitor the server load and other details. This is useful for debugging and tuning lighttpd / fastcgi server performance.
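To show why a round-robin store keeps a constant footprint, here is a minimal sketch of the idea using a fixed-size buffer. It is not how RRDtool itself is implemented (RRDtool also consolidates samples into archives on disk), and the class name `RoundRobinSeries` is made up for this example.

```python
from collections import deque


class RoundRobinSeries:
    """Keep only the newest `slots` samples; older ones are overwritten."""

    def __init__(self, slots):
        if slots <= 0:
            raise ValueError("slots must be positive")
        # A deque with maxlen drops the oldest entry when a new one arrives.
        self._samples = deque(maxlen=slots)

    def update(self, timestamp, value):
        self._samples.append((timestamp, value))

    def fetch(self):
        """Return the stored samples, oldest first."""
        return list(self._samples)

    def __len__(self):
        return len(self._samples)


if __name__ == "__main__":
    load = RoundRobinSeries(slots=3)
    for minute in range(5):
        load.update(minute, minute * 10)
    # Only the three newest samples survive, however many were written.
    print(load.fetch())
```

The number of stored samples never exceeds `slots`, which is the property that keeps the storage size fixed no matter how long the monitoring runs.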
[click to continue…]

Applications that perform a lot of memory accesses (several GBs) may obtain performance improvements by using large pages, due to reduced Translation Lookaside Buffer (TLB) misses. HugeTLBfs is a memory management feature offered by the Linux kernel, which is valuable for applications that use a large virtual address space. It is especially useful for database applications such as MySQL, Oracle, and others. Other server software that uses the prefork or a similar model (e.g. the Apache web server) will also benefit.
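A little arithmetic shows where the TLB savings come from. The sketch below counts how many pages are needed to map a working set with 4 KiB pages versus 2 MiB huge pages, and parses the huge page counters that Linux exposes in `/proc/meminfo`. The 4 GiB working set and the 2 MiB huge page size are assumptions for illustration (2 MiB is the common x86_64 default), and both function names are made up.

```python
GIB = 1024 ** 3


def pages_needed(working_set_bytes, page_size_bytes):
    """Number of pages required to map the working set (rounded up)."""
    return -(-working_set_bytes // page_size_bytes)


def parse_hugepage_info(meminfo_text):
    """Extract the HugePages_* counters and Hugepagesize (in kB) from meminfo text."""
    info = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key.startswith("HugePages_") or key == "Hugepagesize":
            info[key] = int(rest.split()[0])
    return info


if __name__ == "__main__":
    small = pages_needed(4 * GIB, 4 * 1024)          # 4 KiB pages
    huge = pages_needed(4 * GIB, 2 * 1024 * 1024)    # 2 MiB huge pages
    print(f"4 KiB pages: {small}, 2 MiB pages: {huge}")
    try:
        with open("/proc/meminfo") as fh:
            print(parse_hugepage_info(fh.read()))
    except FileNotFoundError:
        print("/proc/meminfo not available on this system")
```

Mapping 4 GiB takes 1,048,576 small pages but only 2,048 huge pages, so far fewer TLB entries are needed to cover the same memory.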

[click to continue…]