Linux x86_64: Detecting Hardware Errors

in Categories CentOS, Debian Linux, fedora linux, Gentoo Linux, Hardware, Howto, kernel, Linux, Linux distribution, Networking, package management, RedHat/Fedora Linux, Shell scripting, Sys admin, Tips, Troubleshooting, Ubuntu Linux last updated June 2, 2009

The Blue Screen of Death (BSoD) is the error screen displayed by Microsoft Windows after it encounters a critical system error. Linux and UNIX-like operating systems may instead report a kernel panic, which is the rough equivalent of a BSoD. Either one can be triggered by a Machine Check Exception (MCE). MCE is a feature of AMD and Intel 64-bit processors that detects and reports hardware problems, including unrecoverable ones.

A program such as mcelog decodes machine check events (hardware errors) on x86-64 machines running a 64-bit Linux kernel. It should be run regularly as a cron job on any x86-64 Linux system. This helps predict server hardware failure before the server actually crashes.
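As a rough illustration of the cron job idea, the script below could be saved as an hourly cron script. It is only a sketch: the file locations (/usr/sbin/mcelog, /var/log/mcelog), the variable names and the collect_mce function are assumptions made for this example, and it relies on the /dev/mcelog device, which kernels of this era provide but which may be absent on newer systems.

```shell
#!/bin/sh
# Sketch of an hourly cron script, e.g. saved as /etc/cron.hourly/mcelog.cron.
# All paths below are assumptions; adjust them for your distribution.
MCELOG="${MCELOG:-/usr/sbin/mcelog}"   # assumed install location of mcelog
MCE_DEV="${MCE_DEV:-/dev/mcelog}"      # kernel machine check device
MCE_LOG="${MCE_LOG:-/var/log/mcelog}"  # decoded events are appended here

# Hypothetical helper: decode any pending machine check events.
collect_mce() {
    # Do nothing, successfully, if the tool or the device is missing.
    [ -x "$MCELOG" ] || return 0
    [ -e "$MCE_DEV" ] || return 0
    # Decode pending events and append them to the log file.
    "$MCELOG" "$MCE_DEV" >> "$MCE_LOG"
}

collect_mce
```

An empty log file means no machine check events were reported. Any entries that do appear are worth investigating before the hardware fails outright.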

Reboot Linux box after a kernel panic

in Categories CentOS, Debian Linux, Gentoo Linux, Hardware, Howto, Linux, RedHat/Fedora Linux, Tips, Troubleshooting, Tuning last updated November 16, 2007

If you want the server to reboot automatically after a kernel panic, try setting kernel.panic = N in the /etc/sysctl.conf file, or passing panic=N to the kernel as a boot parameter.

This specifies the kernel's behavior on panic. By default, the kernel will not reboot after a panic, but this option causes the kernel to reboot after N seconds. For example, the following boot parameter forces Linux to reboot after 10 seconds.
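A minimal sketch of such a setting, assuming the 10 second delay mentioned above. The panic_sysctl_line helper is hypothetical and only prints the line for /etc/sysctl.conf. The commands that actually change the system are left as comments because they require root.

```shell
#!/bin/sh
# Hypothetical helper (not part of any standard tool): print the sysctl.conf
# line that makes the kernel reboot N seconds after a panic.
panic_sysctl_line() {
    case "$1" in
        # Reject empty or non-numeric input.
        ''|*[!0-9]*) echo "usage: panic_sysctl_line SECONDS" >&2; return 1 ;;
    esac
    printf 'kernel.panic = %s\n' "$1"
}

# Show the line for a 10 second delay.
panic_sysctl_line 10

# To make it persistent and load it (as root), one could run:
#   panic_sysctl_line 10 >> /etc/sysctl.conf && sysctl -p
# The equivalent boot parameter, appended to the kernel command line, is:
#   panic=10
# The value currently in effect can be read with:
#   cat /proc/sys/kernel/panic
```

A value of 0 keeps the default behavior, in which the kernel does not reboot after a panic.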
Continue reading “Reboot Linux box after a kernel panic”