NVIDIA Driver Troubleshooting Guide

https://lemmy.world/post/2155783

Driver status

To check if you have a functioning driver, run nvidia-smi in a terminal. If the driver is functioning, it will report the GPU(s) it found on the system and the version of the driver loaded.

    $ nvidia-smi
    Tue Jul 25 22:14:24 2023
    +---------------------------------------------------------------------------------------+
    | NVIDIA-SMI 535.86.05 Driver Version: 535.86.05 CUDA Version: 12.2 |

How to reinstall

If this is not working, purge and reinstall the drivers on the system.

    sudo apt purge ~nnvidia
    sudo apt install nvidia-driver-535

Freezes and suspend/resume issues

These are most typically related to power management. You can try out different driver options by creating a file in /etc/modprobe.d/, such as a hypothetical /etc/modprobe.d/zz-nvidia.conf.

> Some of these are automatically generated by system76-power when switching between graphics modes. If you set them manually, be aware that they can conflict with the different modes, and that system76-power.conf will override your settings if your file’s name sorts alphabetically before it.

All systems should have at least this defined:

    options nvidia-drm modeset=1

For hybrid graphics, it may be necessary to define:

    blacklist i2c_nvidia_gpu
    alias i2c_nvidia_gpu off
    options nvidia NVreg_DynamicPowerManagement=0x02

If the hardware has issues with GC6, change DynamicPowerManagement to:

    options nvidia NVreg_DynamicPowerManagement=0x01

Some systems have issues resuming from S3 sleep, which needs:

    options nvidia NVreg_PreserveVideoMemoryAllocations=1

Worst case scenario, you can try disabling these services:

    sudo systemctl disable --now nvidia-hibernate.service nvidia-resume.service nvidia-suspend.service

I found a workaround

Do share which solutions worked for your hardware; the graphics card model you have; and, if it is a laptop, the DMI IDs that could be used to identify a system so that a known workaround can be applied automatically.

You can run this script in a terminal to print the DMI info:

    for dmi_file in /sys/devices/virtual/dmi/id/*_{name,version}; do
        echo "$dmi_file"
        echo -n ' '
        cat "$dmi_file"
    done
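
After editing a file in /etc/modprobe.d/ and rebooting, it can help to read the settings back from the running driver rather than assume they took effect. The following is only a rough sketch: the function name show_nvidia_params is made up for illustration, and the two default paths are assumptions about where the loaded driver exposes its parameters. They may need root to read, and may not exist on every driver version.

```shell
#!/usr/bin/env bash
# Read-only sketch: report whether the module options appear to be active.
# show_nvidia_params is a hypothetical helper name. Both default paths are
# assumptions; pass other paths as arguments if your system differs.

show_nvidia_params() {
    local modeset_file="${1:-/sys/module/nvidia_drm/parameters/modeset}"
    local params_file="${2:-/proc/driver/nvidia/params}"

    if [ -r "$modeset_file" ]; then
        # The kernel reports boolean module parameters as Y or N.
        echo "nvidia-drm modeset: $(cat "$modeset_file")"
    else
        echo "nvidia-drm modeset: unknown (cannot read $modeset_file)"
    fi

    if [ -r "$params_file" ]; then
        # Print only the two power-management settings discussed above.
        grep -E '^(DynamicPowerManagement|PreserveVideoMemoryAllocations):' "$params_file" \
            || echo "driver params: settings not listed in $params_file"
    else
        echo "driver params: unknown (cannot read $params_file)"
    fi
}

show_nvidia_params "$@"
```

If the output does not match what you put in your .conf file, another file in /etc/modprobe.d/ (for example the one generated by system76-power) may be overriding it.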

My laptop (HP Omen Intel i7 + Nvidia 2060) lost the discrete graphics after the PopOS updater installed Nvidia driver 535.

I was able to fix it by re-installing driver 470. However, I’m still having problems with the discrete graphics going dark after screen blank or suspend.

The above info is interesting, and I’d love to use it to fix my issues, but even as something of a UNIX/Linux power user, I have trouble parsing the jargon.

What does this do?

sudo apt purge ~nnvidia

I don’t understand the use of the tilde and double-n notation. I know that tilde is used as a home directory shortcut, but that’s not how it’s used here? I haven’t been able to Google anything on it either; none of the other apt purge examples I found use this notation.

I definitely have issues with the graphics failing to wake up after suspend. What do the hybrid graphics commands do? With respect to GC6 and Suspend S3, how would I know whether I need to do anything about those? I understand that they are some kind of power saving modes, but how would I know whether they are causing problems?

I’d love to be able to use the latest drivers & for suspend to work right, but I have to admit I’m out of my depth.

Obligatory hardware details:

root@pop-os:/home/rickr# for dmi_file in /sys/devices/virtual/dmi/id/*_{name,version}; do echo $dmi_file; echo -n ' '; cat $dmi_file; done
/sys/devices/virtual/dmi/id/board_name
 878A
/sys/devices/virtual/dmi/id/product_name
 OMEN Laptop 15-ek0xxx
/sys/devices/virtual/dmi/id/bios_version
 F.14
/sys/devices/virtual/dmi/id/board_version
 17.29
/sys/devices/virtual/dmi/id/chassis_version
 Chassis Version
/sys/devices/virtual/dmi/id/product_version