Linux Nvidia: NVRM: GPU at 0000:01:00.0 Has Fallen Off The Bus Error and Solution

February 21, 2012 · 14 comments · Last updated July 7, 2012

I'm using the NVIDIA UNIX x86_64 kernel module (driver) version 280.13 on 64-bit Debian Linux with kernel 2.6.32-5-amd64. However, I keep getting the following errors in my /var/log/messages file:

Feb 13 05:53:39 wks01 kernel: [26652.425207] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
Feb 14 03:59:14 wks01 kernel: [39846.244283] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
Feb 17 04:47:32 wks01 kernel: [35237.485871] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
Feb 18 06:53:19 wks01 kernel: [49298.937949] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
Feb 19 06:14:01 wks01 kernel: [28508.567838] NVRM: GPU at 0000:01:00.0 has fallen off the bus.

This error occurs at random, and my laptop freezes hard when it does. A hard reboot is the only way to recover my Debian Linux based Dell M6500 laptop from the freeze. How do I fix this problem?
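Before trying any fix, it can help to know how often the GPU actually drops off the bus. The following is a minimal sketch, not part of any NVIDIA tool: count_bus_errors is a hypothetical helper name, and it assumes a syslog-style file such as /var/log/messages whose first two fields are the month and day, as in the log lines above.

```shell
#!/bin/sh
# count_bus_errors (hypothetical helper): print how many
# "has fallen off the bus" messages were logged on each day.
# Assumes syslog-style lines whose first two fields are month and day.
count_bus_errors() {
    awk '/NVRM: GPU at .* has fallen off the bus/ { n[$1 " " $2]++ }
         END { for (d in n) print d ": " n[d] }' "$1"
}

# Example (run as root if the log is not world-readable):
# count_bus_errors /var/log/messages
```

The days are printed in no particular order; pipe the output through sort if you want it grouped. Running it again after each change (new driver, persistence mode) gives a rough idea of whether the change made any difference.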

This issue has been widely reported, and the most commonly recommended solutions are as follows:

Install Latest Kernel Version and NVIDIA Driver

You need to update your kernel and install the latest NVIDIA Unix driver.

Put NVIDIA Driver In Persistence Mode

You need to put your GPU in persistence mode. From the nvidia-smi man page:

A flag that indicates whether persistence mode is enabled for the GPU. Value is either "Enabled" or "Disabled". When persistence mode is enabled the NVIDIA driver remains loaded even when no active clients, such as X11 or nvidia-smi, exist. This minimizes the driver load latency associated with running dependent apps, such as CUDA programs. For all CUDA-capable products. Linux only.

Edit the /etc/rc.local file and add the following line before the exit 0 statement:

 
/usr/bin/nvidia-smi -pm 1
 

Save and close the file. The above line ensures that the GPU is put in persistence mode every time the system boots.
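If you prefer a more defensive rc.local entry, here is a minimal sketch. On Debian the stock /etc/rc.local usually runs under sh -e, so a failing command can abort the rest of the file; the wrapper below avoids that. The function name enable_persistence and its optional path argument are my own assumptions for illustration; only the nvidia-smi -pm 1 call comes from the steps above.

```shell
#!/bin/sh
# enable_persistence (hypothetical helper): turn on persistence mode
# without ever failing, so that rc.local keeps running afterwards.
enable_persistence() {
    smi="${1:-/usr/bin/nvidia-smi}"   # path to nvidia-smi (overridable)
    if [ ! -x "$smi" ]; then
        echo "nvidia-smi not found at $smi, skipping persistence mode" >&2
        return 0
    fi
    # Report a failure, but do not propagate it to the caller.
    "$smi" -pm 1 || echo "warning: could not enable persistence mode" >&2
    return 0
}

# In /etc/rc.local, define the function and then call it
# before the final "exit 0" line:
# enable_persistence
```

The only design choice here is that the function always returns 0: a missing or failing nvidia-smi produces a warning on stderr instead of stopping the boot script.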

How Do I Set Persistence Mode From Command Line?

Type the following command as the root user:
# /usr/bin/nvidia-smi -pm 1

How Do I Verify That Persistence Mode Is Set From My Device?

Type the following command as the root user:
# /usr/bin/nvidia-smi -q | grep -i Persistence
Sample outputs:

    Persistence Mode            : Enabled
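The same check can be scripted, for example to log a warning from cron if the setting is lost. This is only a sketch: persistence_enabled is a hypothetical helper name, and it assumes the "Persistence Mode : Enabled" line format shown in the sample output above, which may differ between driver versions.

```shell
#!/bin/sh
# persistence_enabled (hypothetical helper): reads `nvidia-smi -q`
# output on stdin and succeeds only if at least one "Persistence Mode"
# line is present and every such line says "Enabled".
persistence_enabled() {
    awk -F: '/Persistence Mode/ {
                 seen = 1
                 gsub(/[ \t]/, "", $2)        # strip padding around the value
                 if ($2 != "Enabled") bad = 1
             }
             END { exit !(seen && !bad) }'
}

# Example (as root):
# /usr/bin/nvidia-smi -q | persistence_enabled || echo "persistence mode is OFF"
```

Reading from stdin keeps the function independent of where nvidia-smi is installed, and lets you try it on saved output first.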

How Do I View All Settings?

Type the following command to display GPU or unit info:
# nvidia-smi -q | less
Sample outputs:

==============NVSMI LOG==============
Timestamp                       : Tue Feb 21 07:20:20 2012
Driver Version                  : 280.13
Attached GPUs                   : 1
GPU 0000:01:00.0
    Product Name                : Quadro FX 2800M
    Display Mode                : N/A
    Persistence Mode            : Enabled
    Driver Model
        Current                 : N/A
        Pending                 : N/A
    Serial Number               : N/A
    GPU UUID                    : N/A
    Inforom Version
        OEM Object              : N/A
        ECC Object              : N/A
        Power Management Object : N/A
    PCI
        Bus                     : 1
        Device                  : 0
        Domain                  : 0
        Device Id               : 061D10DE
        Bus Id                  : 0000:01:00.0
    Fan Speed                   : N/A
    Memory Usage
        Total                   : 1023 Mb
        Used                    : 74 Mb
        Free                    : 949 Mb
    Compute Mode                : Default
    Utilization
        Gpu                     : N/A
        Memory                  : N/A
    Ecc Mode
        Current                 : N/A
        Pending                 : N/A
    ECC Errors
        Volatile
            Single Bit
                Device Memory   : N/A
                Register File   : N/A
                L1 Cache        : N/A
                L2 Cache        : N/A
                Total           : N/A
            Double Bit
                Device Memory   : N/A
                Register File   : N/A
                L1 Cache        : N/A
                L2 Cache        : N/A
                Total           : N/A
        Aggregate
            Single Bit
                Device Memory   : N/A
                Register File   : N/A
                L1 Cache        : N/A
                L2 Cache        : N/A
                Total           : N/A
            Double Bit
                Device Memory   : N/A
                Register File   : N/A
                L1 Cache        : N/A
                L2 Cache        : N/A
                Total           : N/A
    Temperature
        Gpu                     : 48 C
    Power Readings
        Power State             : N/A
        Power Management        : N/A
        Power Draw              : N/A
        Power Limit             : N/A
    Clocks
        Graphics                : N/A
        SM                      : N/A
        Memory                  : N/A

Update 7/July/2012: Nvidia v302.17 Driver

A few users have notified me that the NVIDIA v302.17 driver sorts out this problem on Linux kernel 3.xx.xx series. You also need to remove (delete or disable) Flash Player support from all browsers. This will get rid of the problem. NVIDIA tracks this issue internally as bug ID # 973068.

{ 14 comments }

1 ttt April 20, 2012 at 8:28 pm

It seems that nouveau is the only sure way to fix this error. I’m using kernel 3.2.xx. If I go back to 2.6.18.xx kernel nvidia works perfectly. This is a mess created by Nvidia.

2 charliemurder July 11, 2012 at 8:00 am

For the past 3 weeks, my pc has been hard freezing. When it happens, my mouse is totally unresponsive and the numlock and capslock lights on my keyboard aren’t even responsive. I need to manually turn off the pc to regain control.

It’s happening on Windows 7, Ubuntu 10.04 and Xubuntu 12.04. The pc is less than a year old and I’ve checked all hard drives using Seatools, RAM using memtest and I even monitored my graphics card using ATItool. I’ve also disabled Flash player in my browsers.

Going to try your suggestion. Will revert with feedback.

3 John Alexis Guerra Gómez August 16, 2012 at 3:34 pm

I have experienced the same problem with an Nvidia Quadro K2000M on a Lenovo ThinkPad W530 using Ubuntu 12.04. The machine freezes to a black screen at the end of the boot process, and the only way to get it out of there is to hold down the power button.

Enabling persistence mode helped, but putting the command in rc.local seems to run it too late to avoid the freeze. To fix it I created a script at /etc/init.d/nvidia_persistent with this code:

#!/bin/sh
### BEGIN INIT INFO
# Provides:          nvidia_persistent
# Required-Start:    $remote_fs $syslog $all
# Required-Stop:
# Default-Start:     1 2 3 4 5
# Default-Stop:
# Short-Description: Enables persistence mode on the NVIDIA card
### END INIT INFO
# Enable persistence mode, then print the resulting setting to confirm it
echo "Setting persistence mode"
/usr/bin/nvidia-smi -pm 1
/usr/bin/nvidia-smi -q | grep -i Persistence

and then added it to the boot process using:

sudo update-rc.d nvidia_persistent defaults 15

The 15 “guarantees” that the command is run early enough.

4 Stefan August 23, 2012 at 9:54 am

version 304.37 finally fixed this problem for me. (9800GT)

5 Tom August 23, 2012 at 3:02 pm

I’m using 304.37-1 and the issue is still present for me on Win7, Debian Squeeze and Arch Linux (XFX 9800GTX+).

6 Norman Ramsey August 29, 2012 at 10:40 pm

I’m still seeing this problem with driver 302.17 and kernel 3.2.0

7 nixCraft August 29, 2012 at 11:03 pm

Have you tried the latest version 304.43?

8 Eman September 1, 2012 at 12:12 am

Whole computer freezes when playing OpenGL game + Wine + Nvidia since nvidia 302.17! Nvidia sucks dude.

9 DoubleD October 9, 2012 at 6:46 pm

I tried it with nvidia ver 304.51 and 3.2.0-31-generic, still fallen off bus.

10 Herbert October 10, 2012 at 7:14 pm

Still happening with driver Version 304.43 on OpenSUSE 12.2…

11 himi November 13, 2012 at 5:36 am

I’ve still been seeing this with 304.60 on Ubuntu 12.04 – it was only coming up when running Unity, with it going away when I switched to wmaker. I’m trying the persistence mode setting now.

himi

12 himi November 14, 2012 at 12:29 am

Also saw this with 304.64 (the most recent update I’ve seen) on 12.04 with Unity. Admittedly, this was associated with waking up from suspend, but I’d seen no issues with suspend/wake cycles on wmaker so it’s a bit odd. Possibly the persistence mode didn’t stick across the suspend – I’m going to look into that and see what options there are for running stuff during the wakeup process.

himi

13 Mark December 14, 2012 at 4:36 am

I just started seeing this error after upgrading my machine from an old E8400 type of CPU to a Dell Precision T7400, I just moved the HD and NVidia 8800GT over.

Ubuntu precise, NVidia drivers.

What is the root cause of this error? I am really starting to hate it!

14 himi January 21, 2013 at 11:20 pm

Still seeing this error with all the driver versions I’ve tried, including the 310.* versions pushed by the Steam beta. In addition, I’m seeing this error regularly with Tesla S2050 cards in a large(ish) GPU compute cluster – in that case it seems to be tied to some kinds of CUDA code, but if there’s any commonality it’d be great to find it.

himi
