Linux x86_64: Detecting Hardware Errors

Categories: CentOS, Debian Linux, fedora linux, Gentoo Linux, Hardware, Howto, kernel, Linux, Linux distribution, Networking, package management, RedHat/Fedora Linux, Shell scripting, Sys admin, Tips, Troubleshooting, Ubuntu Linux. Last updated June 18, 2009

The Blue Screen of Death (BSoD) is the error screen displayed by Microsoft Windows after it encounters a critical system error. Linux and UNIX-like operating systems may instead get a kernel panic, which is the rough equivalent of a BSoD. Either can be triggered by a Machine Check Exception (MCE). MCE is a feature of AMD and Intel 64-bit systems that is used to detect unrecoverable hardware problems.

A program such as mcelog decodes machine check events (hardware errors) on x86-64 machines running a 64-bit Linux kernel. It should be run regularly as a cron job on any x86-64 Linux system. This is useful for predicting server hardware failure before an actual server crash.

Test If Linux Server SCSI / SATA Hard Disk Going Bad

Categories: CentOS, Debian Linux, File system, Gentoo Linux, Howto, Linux, Linux distribution, Sys admin, Tips, Troubleshooting, Ubuntu Linux. Last updated March 18, 2017

One of our regular readers sends us a question:

How can I test if my hard disk is going bad? I see a few errors in the /var/log/messages file.

I/O errors in /var/log/messages indicate that something is wrong with the hard disk and that it may be failing.
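As a rough way to see which disk the errors point at, the log can be scanned and the I/O error lines counted per device. The following is a minimal Python sketch, not a definitive tool. It assumes the kernel messages contain the fragment "I/O error, dev NAME, sector N"; the exact wording varies between kernel versions, so the pattern may need adjusting. The function name count_io_errors is made up for this illustration.

```python
import re
from collections import Counter

# Assumed message shape: "... I/O error, dev sda, sector 12345 ..."
# Kernel versions differ in the prefix, so only this fragment is matched.
IO_ERROR = re.compile(r"I/O error, dev ([\w-]+), sector (\d+)")


def count_io_errors(lines):
    """Return a Counter mapping device name to number of I/O error lines."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Reading /var/log/messages usually requires root privileges.
    with open("/var/log/messages", errors="replace") as log:
        for device, n in count_io_errors(log).most_common():
            print(f"{device}: {n} I/O error line(s)")
```

A device that shows up repeatedly here is a good candidate for a closer look with a SMART check.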

You can check for hard disk errors using the smartctl command, a control and monitoring utility for SMART disks under Linux and UNIX-like operating systems.
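The quickest check is the overall health verdict, which smartctl prints when run with the -H option. Below is a small Python sketch that runs the command and interprets the result. It assumes smartctl is installed, that the script runs as root, and that the output contains either the ATA-style line "SMART overall-health self-assessment test result: PASSED" or the SCSI-style line "SMART Health Status: OK". The helper names parse_health and check_disk and the device path /dev/sda are placeholders for illustration.

```python
import re
import subprocess


def parse_health(output):
    """Return True if healthy, False if failing, None if no verdict was found."""
    # ATA / SATA disks report a PASSED or FAILED self-assessment.
    ata = re.search(r"overall-health self-assessment test result:\s*(\S+)", output)
    if ata:
        return ata.group(1) == "PASSED"
    # SCSI / SAS disks report a health status line instead.
    scsi = re.search(r"SMART Health Status:\s*(\S+)", output)
    if scsi:
        return scsi.group(1) == "OK"
    return None


def check_disk(device):
    """Run 'smartctl -H' on the device and interpret its output."""
    # smartctl signals problems through its exit status, so a nonzero
    # status is not treated as an exception here.
    result = subprocess.run(
        ["smartctl", "-H", device], capture_output=True, text=True
    )
    return parse_health(result.stdout)


if __name__ == "__main__":
    print(check_disk("/dev/sda"))  # placeholder device path
```

A result of None means no verdict line was recognized, for example because SMART is disabled or unsupported on the device, and should not be read as a healthy disk.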