Linux x86_64: Detecting Hardware Errors

Posted in categories: CentOS, Debian Linux, fedora linux, Gentoo Linux, Hardware, Howto, kernel, Linux, Linux distribution, Networking, package management, RedHat/Fedora Linux, Shell scripting, Sys admin, Tips, Troubleshooting, Ubuntu Linux. Last updated June 2, 2009

The Blue Screen of Death (BSoD) is the error screen displayed by Microsoft Windows after it encounters a critical system error. Linux and UNIX-like operating systems may raise a kernel panic instead, which plays much the same role. Both a BSoD and a kernel panic can be triggered by a Machine Check Exception (MCE). An MCE is a feature of AMD and Intel 64-bit systems that is used to detect an unrecoverable hardware problem.

A program such as mcelog decodes machine check events (hardware errors) on x86-64 machines running a 64-bit Linux kernel. It should be run regularly as a cron job on any x86-64 Linux system. This is useful for spotting signs of server hardware failure before the server actually crashes.
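As a rough illustration of the cron job idea, here is a minimal sketch of a script that could be dropped into an hourly cron directory. The binary path /usr/sbin/mcelog and the log file /var/log/mcelog are assumptions, so adjust them for your distribution. The --ignorenodev option makes mcelog exit silently when the machine check device cannot be opened, and --filter filters out known broken events. Newer mcelog releases are usually run as a daemon instead, so check your distribution's documentation before relying on this.

```shell
#!/bin/sh
# Sketch of an hourly cron script for mcelog.
# Both paths below are assumptions; override them via the environment if needed.
MCELOG_BIN="${MCELOG_BIN:-/usr/sbin/mcelog}"
MCELOG_LOG="${MCELOG_LOG:-/var/log/mcelog}"

run_mcelog() {
    # Do nothing (successfully) if mcelog is not installed on this machine.
    if [ ! -x "$MCELOG_BIN" ]; then
        echo "mcelog not found at $MCELOG_BIN" >&2
        return 0
    fi
    # Decode pending machine check events and append them to the log file.
    "$MCELOG_BIN" --ignorenodev --filter >> "$MCELOG_LOG"
}

run_mcelog
```

An empty log file after each run is the healthy case; any decoded events appended to it are worth investigating.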

valgrind – Linux Tools For Debugging And Profiling Programs ( bug reporting tool )

Posted in categories: Hardware, Howto, Linux, Sys admin, Troubleshooting. Last updated April 15, 2008

A few days back I wrote about the strace tool for finding and reporting bugs in programs. Today I’m going to talk about another interesting tool called valgrind.

Valgrind is a flexible program for debugging and profiling Linux executables. It consists of a core, which provides a synthetic CPU in software, and a series of “tools”, each of which is a debugging or profiling tool. The architecture is modular, so that new tools can be created easily and without disturbing the existing structure. There are Valgrind tools that can automatically detect many memory management and threading bugs, and profile your programs in detail. You can also use Valgrind to build new tools.

The Valgrind distribution currently includes five production-quality tools:

  • a memory error detector (Memcheck)
  • a thread error detector (Helgrind)
  • a cache and branch-prediction profiler (Cachegrind)
  • a call-graph generating cache profiler (Callgrind)
  • a heap profiler (Massif)

It also includes two experimental tools:

  • a data race detector
  • an instant memory leak detector.

It runs on the following platforms:

  • X86/Linux
  • AMD64/Linux
  • PPC32/Linux
  • PPC64/Linux

How do I use valgrind?

Valgrind is typically run as follows:
$ valgrind command-name arg1 arg2 argN
$ valgrind program args
$ valgrind ./myapp -d /tmp -f 120

You can select a tool using the --tool=TOOLName option. For example, memcheck is a fine-grained memory checker. To generate a trace for a command called myapp, enter:
$ valgrind --tool=memcheck -v --log-file=myapp.dbg --num-callers=8 ./myapp -d /tmp -f 120
Where,

  • --tool=memcheck : Run the Valgrind tool called memcheck.
  • -v : Verbose output.
  • --log-file=myapp.dbg : Specifies that Valgrind should send all of its messages to the specified file.
  • --num-callers=8 : By default, Valgrind shows twelve levels of function call names to help you identify program locations. You can change that number with this option; here it is set to eight. A larger value can help in determining the program’s location in deeply-nested call chains.
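If you run memcheck often, the options above can be wrapped in a small shell function. This is only a sketch: the function name vg_run and the VALGRIND variable are hypothetical, and the log file is simply named after the program being checked.

```shell
#!/bin/sh
# Hypothetical helper: run a command under memcheck with the options shown above.
# VALGRIND can be overridden to point at a specific valgrind binary.
VALGRIND="${VALGRIND:-valgrind}"

vg_run() {
    # Name the log after the program, e.g. ./myapp -> myapp.dbg
    # (plain sh has no local variables, so "log" is global)
    log="$(basename "$1").dbg"
    "$VALGRIND" --tool=memcheck -v --log-file="$log" --num-callers=8 "$@"
}
```

With this in place, `vg_run ./myapp -d /tmp -f 120` should behave like the full command line above and leave its messages in myapp.dbg.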

The --leak-check option turns on the detailed memory leak detector:
$ valgrind --tool=memcheck -v --log-file=myapp.dbg --num-callers=8 --leak-check=yes ./myapp -d /tmp -f 120
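To see what the leak detector reports, it helps to try it on a program with a known leak. The following sketch writes a tiny C program that allocates a buffer and never frees it, compiles it with debug information, and runs it under memcheck. All file names (leak_demo.c, leak_demo, leak_demo.dbg) and the working directory are made up for this example, and the script assumes a C compiler is available as cc.

```shell
#!/bin/sh
# Sketch: build a tiny C program with a deliberate leak and run it under memcheck.
# The directory and file names below are hypothetical.
workdir="${TMPDIR:-/tmp}/valgrind-demo"
mkdir -p "$workdir"

cat > "$workdir/leak_demo.c" <<'EOF'
#include <stdlib.h>

/* Allocates a buffer and never frees it: a deliberate leak. */
static void leaky(void)
{
    int *buf = malloc(10 * sizeof(int));
    if (buf != NULL)
        buf[0] = 42;   /* valid write; the pointer is lost on return */
}

int main(void)
{
    leaky();
    return 0;
}
EOF

if ! command -v cc >/dev/null 2>&1 || ! command -v valgrind >/dev/null 2>&1; then
    echo "cc or valgrind not found; source left in $workdir/leak_demo.c" >&2
elif cc -g -O0 -o "$workdir/leak_demo" "$workdir/leak_demo.c"; then
    # -g keeps debug info so Valgrind can print file names and line numbers
    valgrind --tool=memcheck --leak-check=yes \
        --log-file="$workdir/leak_demo.dbg" "$workdir/leak_demo"
    echo "Valgrind report written to $workdir/leak_demo.dbg"
fi
```

The leak summary near the end of the report should list the unfreed block, along with a stack trace pointing at the malloc call in leaky().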

Further readings: