Linux kernel: Uhhuh. NMI received for unknown reason 30

January 16, 2008 · 5 comments · Last updated January 16, 2009


Q. I've upgraded my CentOS / RHEL (Red Hat Enterprise Linux) server to 4.7 on an HP ProLiant DL580 G5, and it is showing unknown NMI errors in the logs:

Uhhuh. NMI received for unknown reason 30.
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?

Uhhuh. NMI received for unknown reason 20.
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?

How do I fix this error?
A. This happens when the system hangs under load. Add either one of the following options to the kernel line in your /boot/grub/grub.conf file:

  1. Disable the NMI watchdog by adding "nmi_watchdog=0"
  2. Disable the high precision event timer (HPET) by adding "nohpet"
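The same edit can also be scripted, which helps when grub.conf has several kernel entries. Below is a minimal sketch, assuming GNU sed and the GRUB Legacy layout used in this article; `add_kernel_param` is a hypothetical helper name, not part of any package. It appends one parameter to every `kernel` line that does not already contain it and keeps the original file as a `.bak` backup.

```shell
# Hypothetical helper: append one boot parameter (e.g. nohpet or
# nmi_watchdog=0) to every GRUB Legacy "kernel" line that does not already
# contain it. Assumes GNU sed: -i.bak edits in place and keeps <file>.bak.
# The parameter must not contain "/" (it is spliced into the sed script).
add_kernel_param() {
    conf=$1
    param=$2
    # Single quotes keep "!" and "$" away from the shell (no history expansion).
    sed -i.bak '/^[[:space:]]*kernel[[:space:]]/{/[[:space:]]'"$param"'/!s/$/ '"$param"'/;}' "$conf"
}

# Usage (as root, then reboot):
# add_kernel_param /boot/grub/grub.conf nohpet
```

Review the result before rebooting; a typo on the kernel line can leave the server unbootable.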

Open grub.conf, type:
vi /boot/grub/grub.conf
Modify the kernel line as follows:

title Red Hat Enterprise Linux AS (2.6.9-78.0.8.EL)
        root (hd0,0)
        kernel /vmlinuz-2.6.9-78.0.8.EL ro root=/dev/VolGroup00/LogVol00 nohpet
        initrd /initrd-2.6.9-78.0.8.EL.img

Save and close the file. Reboot the server:
# reboot
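After the reboot you can confirm that the running kernel actually received the new option, because /proc/cmdline holds the command line the kernel was booted with. A minimal sketch; `check_boot_params` is a hypothetical helper, and its optional file argument exists only so it can be pointed at a copy for testing.

```shell
# Hypothetical helper: report whether the parameters from this article are on
# the command line the running kernel was booted with.
check_boot_params() {
    cmdline_file=${1:-/proc/cmdline}
    for p in nohpet nmi_watchdog=0; do
        # -w: match whole words only; --: pattern may start with anything
        if grep -qw -- "$p" "$cmdline_file"; then
            echo "$p: set"
        else
            echo "$p: not set"
        fi
    done
}

# Usage after the reboot:
# check_boot_params
```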


5 comments

1 MiG December 1, 2011 at 2:35 pm

Also try the “acpi=off” switch; in some cases it works better than the two options proposed in this article.


2 bob dobbs January 6, 2012 at 10:34 pm

So you recommend turning off the nmi_watchdog, which listens for hardware throwing errors that may compromise your system? Probably not smart. Just out of curiosity, why would you additionally suggest changing the kernel timer as a way to avoid a hardware device throwing kill signals? BTW, this is not caused by the system hanging under load.

http://en.wikipedia.org/wiki/Non-maskable_interrupt

Even Wikipedia knows better.


3 Richard Parker March 21, 2012 at 9:03 pm

Well, for one, virtual machines cannot use the NMI watchdog, or HPET for that matter, so disabling them will stop the errors and prevent possible problems.


4 higkoo November 13, 2013 at 3:42 am

I saw:
kernel:Uhhuh. NMI received for unknown reason 31 on CPU 0.
kernel:Uhhuh. NMI received for unknown reason 21 on CPU 0.
kernel:Do you have a strange power saving mode enabled?
kernel:Dazed and confused, but trying to continue

I tried:
adding ‘nmi_watchdog=0 pcie_aspm=off nohpet’ to the kernel parameters
switching to an older kernel

Result: I switched to the older kernel 2.6.32-131.21.1 (the default is 2.6.32-358.23.2).
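On a GRUB Legacy system like the 2.6.32 one in this comment, booting an older kernel by default means pointing `default=` in /boot/grub/grub.conf at that kernel's menu entry; entries are numbered from 0 in the order their `title` lines appear. A minimal sketch, assuming GNU sed and awk; both function names are hypothetical.

```shell
# Hypothetical helper: list GRUB Legacy menu entries with the index that
# "default=" expects (0 = first "title" line in the file).
list_grub_titles() {
    conf=${1:-/boot/grub/grub.conf}
    awk '/^[ \t]*title[ \t]/ {
        sub(/^[ \t]*title[ \t]+/, "")
        printf "%d\t%s\n", n++, $0
    }' "$conf"
}

# Hypothetical helper: set "default=" to entry number $2 (keeps <file>.bak).
set_grub_default() {
    conf=$1
    index=$2
    sed -i.bak "s/^default=.*/default=$index/" "$conf"
}

# Usage (as root):
# list_grub_titles
# set_grub_default /boot/grub/grub.conf 1
```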


5 Enrique Zapata November 30, 2013 at 9:45 am

Hello,

I am thankful for this posting, which has been a useful resource for testing one of my servers.

My 2 cents: while this mechanism is a great tool for diagnosing possible problems, configuring it (turning it off) to get rid of the thousands of error messages in the log only extended the time between system hangs from a few hours to a couple of days; the hangs did not disappear.

Nevertheless, as bob dobbs says, you are only closing your eyes to a problem that needs to be addressed anyway. It is a temporary fix for your log; it does not fix the underlying problem, and the system hangs will keep occurring.

In our case this problem was triggered by a system upgrade, but we believe the upgrade only turned on the watchdog, which started logging the errors and increasing the failure rate.

So, basically, with the watchdog left on (it is the default on this specific Debian kernel), these messages are most likely a sign of a memory or other hardware problem that needs to be addressed soon, one that caused one of your CPUs to hang and stop responding to the NMI watchdog's 5-second checks. Kernel: 2.6.32-5-686-bigmem

My advice is NOT to turn it off, except temporarily while you find out what the real error is. Once you find the cause of the failure, you must turn it back on and see what happens.
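To see whether NMIs are actually being delivered while you investigate, you can watch the NMI row of /proc/interrupts: with the NMI watchdog active, the per-CPU counts normally keep increasing, while with it disabled they usually stay flat unless other NMIs arrive. A minimal sketch; the function names are hypothetical, and since the /proc/interrupts layout differs between kernel versions, the parsing (sum every numeric field on the `NMI:` row) is an assumption.

```shell
# Hypothetical helper: sum the per-CPU counts on the "NMI:" row of a
# /proc/interrupts-style file (prints 0 if there is no such row).
nmi_total() {
    awk '$1 == "NMI:" { for (i = 2; i <= NF; i++) if ($i ~ /^[0-9]+$/) s += $i }
         END { print s + 0 }' "${1:-/proc/interrupts}"
}

# Hypothetical helper: sample the total twice, $1 seconds apart (default 10).
nmi_delta() {
    interval=${1:-10}
    before=$(nmi_total)
    sleep "$interval"
    after=$(nmi_total)
    echo "NMIs in ${interval}s: $((after - before))"
}

# Usage:
# nmi_delta 10
```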

=======================================================
One more piece of advice. If you have a hard time “sudoing” and echoing 0 into /proc/sys/kernel/nmi_watchdog and get errors doing it, and even turning it off in /etc/sysctl.conf with the following does not help:

# Turn OFF NMI watchdog
kernel.nmi_watchdog = 0

The only way that worked for me was to set it on the kernel command line by editing /boot/grub/grub.cfg, so it is applied during the boot process.

linux /boot/vmlinuz-2.6.32-5-686-bigmem root=UUID=XXXXXXXX-XXXX-XXXX-XXXX-XXXXXX ro quiet nmi_watchdog=1

where the UUID will be different and specific to your system. (nmi_watchdog=0 disables the watchdog; nmi_watchdog=1 enables it.)
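A common reason the “sudo echo 0 > /proc/sys/kernel/nmi_watchdog” approach fails is that the > redirection is performed by your own unprivileged shell before sudo ever runs. Piping into `sudo tee` (or using `sysctl -w`) makes the privileged process open the file instead. A minimal sketch; `write_proc_value` and the `SUDO` variable are hypothetical, and writing the file can still fail if the running kernel does not allow the watchdog to be changed at runtime, as this comment reports.

```shell
# Hypothetical helper: write a value to a /proc/sys file through tee, so the
# file is opened by the privileged process rather than by your shell.
# SUDO defaults to "sudo"; set SUDO="" when already running as root.
write_proc_value() {
    printf '%s\n' "$2" | ${SUDO-sudo} tee "$1" > /dev/null
}

# Usage:
# write_proc_value /proc/sys/kernel/nmi_watchdog 0
# sudo sysctl -w kernel.nmi_watchdog=0    # same change via sysctl(8)
# sudo sysctl -p                          # re-apply /etc/sysctl.conf
```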



