Microsoft Windows displays the Blue Screen of Death (BSoD) after encountering a critical system error. Linux / UNIX like operating systems may get a kernel panic, which is the equivalent of a BSoD. Both a BSoD and a kernel panic can be triggered by a Machine Check Exception (MCE). MCE is a feature of AMD / Intel 64 bit systems which is used to detect and report hardware problems, including unrecoverable ones. MCE can detect:
- Communication error between CPU and motherboard.
- Memory error - ECC problems.
- CPU cache errors and so on.
A program such as mcelog decodes machine check events (hardware errors) on x86-64 machines running a 64-bit Linux kernel. It should be run regularly as a cron job on any x86-64 Linux system. This is useful for predicting server hardware failure before an actual server crash.
Install mcelog
Type the following command under RHEL / CentOS / Fedora Linux, 64 bit kernel:
# yum install mcelog
Type the following command under Debian / Ubuntu Linux, 64 bit kernel:
# apt-get update && apt-get install mcelog
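Since mcelog needs a 64 bit kernel, it can help to check the machine type before installing. The following is only a minimal sketch: is_x86_64 is a hypothetical helper name, and it assumes that `uname -m` prints x86_64 on a 64 bit x86 kernel (a 32 bit kernel on the same hardware typically prints something like i686):

```shell
#!/bin/sh
# Sketch: check that the running kernel is x86-64 before installing mcelog.
# is_x86_64 is a hypothetical helper name, not part of mcelog.
is_x86_64() {
    # $1: machine string; defaults to the output of `uname -m`
    machine="${1:-$(uname -m)}"
    [ "$machine" = "x86_64" ]
}

if is_x86_64; then
    echo "x86_64 kernel detected: mcelog can be installed"
else
    echo "Not an x86_64 kernel ($(uname -m)): mcelog will not work here"
fi
```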
Default Cronjob
By default, Debian / Ubuntu Linux installs the following cron settings in /etc/cron.d/mcelog:
# /etc/cron.d/mcelog: crontab entry for the mcelog package
SHELL=/bin/sh
PATH=/sbin:/bin:/usr/sbin:/usr/bin
*/5 * * * * root test -x /usr/sbin/mcelog -a ! -e /etc/mcelog-disabled && /usr/sbin/mcelog --ignorenodev --filter >> /var/log/mcelog
CentOS / RHEL / Fedora Linux runs an hourly cron job via /etc/cron.hourly/mcelog.cron:
#!/bin/bash
/usr/sbin/mcelog --ignorenodev --filter >> /var/log/mcelog
How do I view error logs?
Use the tail or grep command:
# tail -f /var/log/mcelog
OR
# grep -i "hardware error" /var/log/mcelog
OR
# grep -ci "hardware error" /var/log/mcelog
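To see which CPU and bank the errors come from, the decoded records can be grouped and counted. This is a rough sketch, assuming each record in the log contains a "CPU <n> BANK <m>" field as in typical mcelog output; mce_by_cpu_bank is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: count decoded machine check records per CPU and bank.
# mce_by_cpu_bank is a hypothetical helper name, not part of mcelog.
mce_by_cpu_bank() {
    # $1: log file; defaults to /var/log/mcelog
    log="${1:-/var/log/mcelog}"
    # Extract the "CPU <n> BANK <m>" field from each record,
    # then count identical fields and list the most frequent first.
    grep -o 'CPU [0-9][0-9]* BANK [0-9][0-9]*' "$log" | sort | uniq -c | sort -rn
}
```

Run it as `mce_by_cpu_bank /var/log/mcelog`. A CPU / bank pair that shows up again and again is a good candidate to mention when you contact your hardware vendor.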
Alternatively, you can send an email alert when a hardware error is found on the system (write a shell script and call it via a cron job):
# [ $(grep -ci "hardware error" /var/log/mcelog) -gt 0 ] && echo "Hardware Error Found $(hostname) @ $(date)" | mail -s 'H/w Error' pager@example.com
With this tool I was able to pick up a couple of hardware problems before a kernel panic, i.e. a server crash.
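The one-liner above sends mail on every run once the log contains a single error. A slightly longer script can remember the previous count and alert only when it grows. This is just a sketch under a few assumptions: the state file path, the variable names and the function names are all made up for illustration, and a working mail command is expected on the server:

```shell
#!/bin/sh
# Sketch: mail an alert only when the hardware error count has grown
# since the last run. Paths, variable names and function names below
# are assumptions for illustration.
LOG="${LOG:-/var/log/mcelog}"
STATE="${STATE:-/var/tmp/mcelog-alert.count}"   # hypothetical state file
ALERT_TO="${ALERT_TO:-pager@example.com}"

# Print the number of lines in file $1 that mention a hardware error.
count_hw_errors() {
    if [ -r "$1" ]; then
        grep -ci "hardware error" "$1"
    else
        echo 0
    fi
}

# Succeed when the current count ($1) is greater than the previous count ($2).
has_new_errors() {
    [ "$1" -gt "$2" ]
}

# Compare the current count with the stored one, mail if it grew,
# then store the current count for the next run.
check_and_alert() {
    current=$(count_hw_errors "$LOG")
    previous=$(cat "$STATE" 2>/dev/null)
    previous="${previous:-0}"
    if has_new_errors "$current" "$previous"; then
        echo "Hardware Error Found $(hostname) @ $(date): $current record(s) in $LOG" |
            mail -s 'H/w Error' "$ALERT_TO"
    fi
    echo "$current" > "$STATE"
}
```

Save it as a script, add a final line that calls check_and_alert, and run it from cron after the mcelog job. If the log is rotated, the stored count simply drops to the new value on the next run.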
A Note About mcelog
- You need a 64 bit Linux kernel and operating system to run mcelog. Machine checks can indicate failing hardware, system overheating, bad DIMMs or other problems. Some MCEs are fatal and generally cannot be survived without a reboot and h/w replacement, but I was able to catch lots of bad h/w before a crash with this tool.
- mcat - A Windows command-line program from AMD to decode MCEs from AMD K8, Family 0x10 and 0x11 processors.
- mcelog project home page.
- mcedaemon - a daemon that can get MCE notifications as soon as the kernel finds them. It does not try to interpret the MCE data, it just alerts other apps.
- Linux Kernel panic source code.
- man mcelog
- Machine check exception support information for MS-Windows server 2003 and XP operating systems.
Comments
Hello,
Are there any similar tools for 32-bit operating systems? You mention mcelog only works with 64-bit operating systems.
Nope, AFAIK.
There are some other tools for other CPUs as well: Wikipedia
Does anyone know of a working solution for 32-bit operating systems on x86_64 hardware?
If I run your script I get this error:
/etc/cron.hourly/mcelog.cron
Usage:
mcelog [--k8|--p4|--generic] [--syslog] [mcelogdevice]
mcelog [--k8|--p4|--generic] --ascii
Decode machine check error records
Hi Vivek! I get a lot of information through your website, thanks very much. Please help me decode these mcelog errors. I forwarded this case to HP, but according to HP it is a firmware issue. What do you have to say?
Node : BL280c-G6
1)plcg298: MCE 0
plcg298: HARDWARE ERROR. This is *NOT* a software problem!
plcg298: Please contact your hardware vendor
plcg298: CPU 11 BANK 5 TSC 7d0a8fb75c06bd [at 2934 Mhz 138 days 20:43:18 uptime (unreliable)]
plcg298: MISC 1091 ADDR 61797b458
plcg298: MCG status:
plcg298: MCi status:
plcg298: MCi_MISC register valid
plcg298: MCi_ADDR register valid
plcg298: MCA: corrected filtering (some unreported errors in same region)
plcg298: Data CACHE Level-1 Data-Read Error
plcg298: STATUS 8c20004000101135 MCGSTATUS 0
plcg371:
2) plcg423: MCE 0
plcg423: HARDWARE ERROR. This is *NOT* a software problem!
plcg423: Please contact your hardware vendor
plcg423: CPU 6 BANK 8 TSC 7ca01c751f525e [at 2934 Mhz 138 days 9:38:40 uptime (unreliable)]
plcg423: MISC 1008040200081588 ADDR 3f2c58200
plcg423: MCG status:
plcg423: MCi status:
plcg423: MCi_MISC register valid
plcg423: MCi_ADDR register valid
plcg423: MCA: MEMORY CONTROLLER RD_CHANNELunspecified_ERR
plcg423: Transaction: Memory read error
plcg423: STATUS 8c0000400001009f MCGSTATUS 0
plcg423: MCE 1
plcg423: HARDWARE ERROR. This is *NOT* a software problem!
plcg423: Please contact your hardware vendor
plcg423: CPU 2 BANK 8 TSC 7ca01c751f5057 [at 2934 Mhz 138 days 9:38:40 uptime (unreliable)]
plcg423: MISC 1008040200081588 ADDR 3f2c58200
plcg423: MCG status:
plcg423: MCi status:
plcg423: MCi_MISC register valid
plcg423: MCi_ADDR register valid
plcg423: MCA: MEMORY CONTROLLER RD_CHANNELunspecified_ERR
plcg423: Transaction: Memory read error
plcg423: STATUS 8c0000400001009f MCGSTATUS 0
Hi,
I have a problem installing any OS on my laptop. I tested the DVD and USB, and I don't know how to install the OS.