Stopping lockups with nvidia drivers and AGP

I use the nvidia drivers on my Linux desktop in order to get proper TV output, and in my experience they are unstable pieces of crap. For the longest time my machine would lock up, seemingly at random but more frequently when watching stuff with freevo/mplayer. I finally tracked the problem down to buggy AGP support in the nvidia driver on VIA chipsets. NVidia blames the chipset, of course, and seems unable or unwilling to fix the problem.

Switching off AGP support in the X server config by setting “NvAGP” to “0” seemed to fix the problem when I was running a 2.4 kernel. That’s not useful if you are using the driver for 3D acceleration, but all I want is TV out, so it doesn’t affect me.

Section "Device"
    Identifier "Geforce4MX-4000"
    Driver "nvidia"
    Option "IgnoreEDID" "1"
    Option "UseEdidFreqs" "0"
    Option "HWCursor" "0"
    Option "ConnectedMonitor" "TV"
    Option "NvAGP" "0"
EndSection

I recently switched to a 2.6 kernel, though, and a few days after the switch the random lockups returned. A quick check showed that, sure enough, the via_agp and agpgart modules were loaded, indicating the system was using AGP. No matter what I did, I couldn’t seem to get rid of them.
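For anyone wanting to repeat that quick check, here is a minimal sketch. The function name agp_modules is made up for illustration; it simply lists loaded modules with "agp" in their name, which is roughly what you get from piping lsmod through grep.

```shell
# List loaded kernel modules whose names mention "agp".
# Reads the module list from the file given as $1 (default: /proc/modules),
# where the module name is the first whitespace-separated field on each line.
# agp_modules is a hypothetical helper name, not part of any package.
agp_modules() {
    awk '$1 ~ /agp/ { print $1 }' "${1:-/proc/modules}"
}

# On a live system this prints one module name per line, or nothing at all
# if no AGP modules are loaded.
if [ -r /proc/modules ]; then
    agp_modules
fi
```

If via_agp and agpgart show up in the output, the kernel's AGP support is loaded.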

So I decided drastic measures were needed and dived into the nvidia module source. I use the Debian packages and build with module-assistant rather than using the binary direct from NVidia, so these instructions are specific to that setup, although I imagine you could adapt them.

I found the variable that controls the default setting for NvAGP (NVreg_NvAGP in os-registry.c), which is initially 3 (use whatever you can find), and reset it to 0 (don’t use AGP). I went to recompile the module directly and the compile failed, which struck me as very strange: I’d only changed one number, for fsck’s sake. It turned out that even with the value reset to what it was, the code wouldn’t compile. Compiling it via module-assistant worked fine though. I don’t know what weird shit it sets in the environment, but it must be doing something.
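The one-number change can be sketched with sed. This assumes the default is written as a plain assignment along the lines of "NVreg_NvAGP = 3;" in os-registry.c; the exact declaration may differ between driver versions, so check the file first. The function name set_nvagp_default_off is made up for illustration.

```shell
# Change the default NvAGP setting from 3 to 0 in the given source file.
# $1: path to os-registry.c
# ASSUMPTION: the default appears as "NVreg_NvAGP = 3;" (whitespace around
# "=" may vary). If the line looks different, nothing is changed.
set_nvagp_default_off() {
    sed 's/NVreg_NvAGP[[:space:]]*=[[:space:]]*3;/NVreg_NvAGP = 0;/' "$1" > "$1.new" &&
        mv "$1.new" "$1"
}
```

Running grep NVreg_NvAGP on the file afterwards is a cheap way to confirm the edit actually took.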

I made the change again, compiled a new module using module-assistant, installed it and rebooted. Normally rebooting isn’t needed for new modules, but in this case I needed to get rid of the AGP modules, and they can’t be removed once they have loaded. Everything came back up fine, but the AGP modules were still loaded. Damn :-( I went back to the source to see what else I could change and found the NvAGP setting was back to 3!

It turns out module-assistant automatically replaces the source with a fresh copy from the tarball whenever you compile, so I created a new tarball of the source with NvAGP set to 0, put it in place of the downloaded one and let module-assistant do its magic.
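The repacking step might look something like the sketch below. It is not the exact set of commands I ran. It assumes a gzipped tarball whose top-level entries are ordinary (non-hidden) files or directories, and that the default is written as "NVreg_NvAGP = 3;". The function name repack_nvagp_off is made up, and the tarball path in the usage comment is a guess at where the Debian package puts it.

```shell
# Patch the NvAGP default inside a gzipped source tarball, in place.
# $1: path to the tarball (it is overwritten, so keep a backup).
# The body runs in a subshell so the temporary variables stay contained.
repack_nvagp_off() (
    tarball=$1
    work=$(mktemp -d) || exit 1
    out=$(mktemp) || exit 1

    # Unpack the original source into a scratch directory.
    tar -xzf "$tarball" -C "$work" || exit 1

    # ASSUMPTION: the default appears as "NVreg_NvAGP = 3;" in os-registry.c.
    find "$work" -name os-registry.c | while IFS= read -r f; do
        sed 's/NVreg_NvAGP[[:space:]]*=[[:space:]]*3;/NVreg_NvAGP = 0;/' "$f" > "$f.new" &&
            mv "$f.new" "$f"
    done

    # Repack the top-level entries so the archive layout matches the original.
    ( cd "$work" && tar -czf "$out" * ) || exit 1

    # Overwrite the contents of the original tarball, keeping its permissions.
    cat "$out" > "$tarball" || exit 1
    rm -rf "$work" "$out"
)

# Hypothetical usage, as root (the path is an assumption, check your system):
#   cp /usr/src/nvidia-kernel-source.tar.gz /root/nvidia-kernel-source.tar.gz.orig
#   repack_nvagp_off /usr/src/nvidia-kernel-source.tar.gz
#   module-assistant build nvidia
```

Bear in mind that upgrading or reinstalling the source package will presumably put a stock tarball back, so the patch would need reapplying.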

Result! An nvidia module that doesn’t load any AGP stuff, and thus a rock solid stable system once more :-).