Linux x86_64: Detecting Hardware Errors

The Blue Screen of Death (BSoD) is the error screen that Microsoft Windows displays after encountering a critical system error. Linux and UNIX-like operating systems may instead raise a kernel panic, which plays a similar role. Both a BSoD and a kernel panic can be triggered by a Machine Check Exception (MCE). MCE is a feature of AMD and Intel 64-bit processors that is used to detect hardware problems, including unrecoverable ones.

A program such as mcelog decodes machine check events (hardware errors) on x86-64 machines running a 64-bit Linux kernel. It should be run regularly as a cron job on any x86-64 Linux system. This helps predict server hardware failure before the server actually crashes.
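As a rough sketch of the cron-job idea, the Python helper below builds an hourly entry in /etc/cron.d format. The function names, the /usr/sbin/mcelog and /var/log/mcelog paths, and the hourly schedule are assumptions for illustration. The --ignorenodev and --filter options are taken from the mcelog manual page, so check them against your installed version; newer mcelog releases are usually run as a daemon instead of from cron.

```python
import shutil
from typing import Optional


def find_mcelog() -> Optional[str]:
    """Return the path of the mcelog binary if it is on PATH, else None."""
    return shutil.which("mcelog")


def mcelog_cron_entry(mcelog_path: str = "/usr/sbin/mcelog",
                      log_path: str = "/var/log/mcelog") -> str:
    """Build an hourly /etc/cron.d style line that appends decoded events to a log.

    The paths and the schedule are assumptions, not values from the article.
    """
    return f"0 * * * * root {mcelog_path} --ignorenodev --filter >> {log_path} 2>&1"


if __name__ == "__main__":
    # Prefer the binary found on PATH; fall back to the assumed default path.
    print(mcelog_cron_entry(find_mcelog() or "/usr/sbin/mcelog"))
```

The printed line could be placed in a file under /etc/cron.d after reviewing it; the script itself changes nothing on the system.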

Reboot Linux box after a kernel panic

If you want the server to reboot automatically after a kernel panic, try setting kernel.panic = N in the /etc/sysctl.conf file, or passing panic=N as a kernel boot parameter.

This setting specifies the kernel's behavior on panic. By default, the kernel does not reboot after a panic, but with this option it reboots after N seconds. For example, the following boot parameter forces Linux to reboot after 10 seconds.
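The two forms of this setting can be sketched with a few small Python helpers. The function names are hypothetical; /proc/sys/kernel/panic is the standard procfs view of the kernel.panic sysctl, where 0 means the kernel does not reboot automatically.

```python
from pathlib import Path
from typing import Optional

# Standard procfs location of the kernel.panic sysctl on Linux.
PANIC_PROC_PATH = Path("/proc/sys/kernel/panic")


def panic_sysctl_line(seconds: int) -> str:
    """Format the line to place in /etc/sysctl.conf."""
    return f"kernel.panic = {seconds}"


def panic_boot_param(seconds: int) -> str:
    """Format the equivalent kernel boot parameter."""
    return f"panic={seconds}"


def read_panic_timeout(path: Path = PANIC_PROC_PATH) -> Optional[int]:
    """Return the current panic timeout in seconds, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    print("sysctl.conf line:", panic_sysctl_line(10))
    print("boot parameter:  ", panic_boot_param(10))
    print("current timeout: ", read_panic_timeout())
```

After editing /etc/sysctl.conf, running sysctl -p as root loads the new value without a reboot; the boot parameter form takes effect on the next boot.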
