[Solved] nVidia driver nightmare continued...

bitrat
Posts: 85
Joined: 2023-07-20 09:41
Has thanked: 3 times


#1 Post by bitrat »

________________________________________ FINAL COMMENTS:


For anyone looking to solve nVidia issues, the takeaway in this thread is that nVidia's open source kernel modules (nvidia-kernel-open-dkms) depend on the GPU System Processor (GSP), which only newer cards have. Older cards without a GSP, like my Quadro K2200, can't use them. You need the proprietary ones.

If you're seeing this in dmesg:

Code: Select all

[   26.055648] NVRM: The NVIDIA GPU 0000:01:00.0 (PCI ID: 10de:13ba)
               NVRM: installed in this system is not supported by open
               NVRM: nvidia.ko because it does not include the required GPU
               NVRM: System Processor (GSP).
               NVRM: Please see the 'Open Linux Kernel Modules' and 'GSP
               NVRM: Firmware' sections in the driver README, available on
               NVRM: the Linux graphics driver download page at
               NVRM: www.nvidia.com.
Do this:

Code: Select all

$ sudo apt autoremove --purge '*nvidia*'
$ sudo apt install nvidia-driver nvidia-smi nvidia-settings
$ sudo reboot
For Quadro K2200 at least, I strongly recommend using the nVidia drivers. The output is much nicer and I suspect the nouveau driver was damaging my card.
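To see which flavour of kernel module is installed before and after the swap, something like the following may help. It is only a sketch: the function name `nvidia_kmod_flavor` is made up for illustration, and it assumes the Debian package names `nvidia-kernel-open-dkms` (the open modules, visible in the package list further down) and `nvidia-kernel-dkms` (the proprietary module).

```shell
#!/bin/sh
# Hypothetical helper: reads installed package names on stdin (one per line)
# and prints which NVIDIA kernel module flavour they imply.
nvidia_kmod_flavor() {
    pkgs=$(cat)
    if printf '%s\n' "$pkgs" | grep -qx 'nvidia-kernel-open-dkms'; then
        echo open
    elif printf '%s\n' "$pkgs" | grep -qx 'nvidia-kernel-dkms'; then
        echo proprietary
    else
        echo none
    fi
}

# Usage on a Debian system (only fully installed "ii" packages are passed in):
#   dpkg-query -W -f='${db:Status-Abbrev} ${Package}\n' \
#       | awk '$1 == "ii" { print $2 }' | nvidia_kmod_flavor
```

If this prints `open` on a card without a GSP, that would be consistent with the NVRM message shown above.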

I've been experiencing two separate (but possibly causally related) intermittent issues with X. Both can be corrected by restarting X, without rebooting.

Code: Select all

sudo service lightdm restart
The primary issue has been the driver freezing, corrupting and locking the display. This seems to have been fixed by switching from the nouveau to the nvidia driver.

The secondary issue, which I'm still seeing with the nvidia driver, is the driver going into some kind of wait loop. The display remains active, but is blank, with only a laggy mouse pointer.

I think this is related to waking after sleep, so I need to explore that angle. I'm fairly sure I saw this issue before the driver change, but I think it was probably being masked by the primary issue. I don't have any log of it with nouveau installed.
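To test the sleep/wake theory, the kernel log can be filtered for suspend and resume markers and the timestamps compared with when the display went blank. This is a rough sketch: `suspend_events` is a made-up name, and the pattern assumes the kernel logs `PM: suspend entry` and `PM: suspend exit` lines, which may differ between kernel versions.

```shell
#!/bin/sh
# Hypothetical helper: keeps only suspend/resume markers from kernel log
# text supplied on stdin.
suspend_events() {
    grep -E 'PM: suspend (entry|exit)'
}

# Usage (kernel messages from the current boot):
#   sudo journalctl -k -b --no-pager | suspend_events
```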

From /var/log/Xorg.1.log

Code: Select all

[ 23492.639] (--) NVIDIA(GPU-0): 
[ 23493.118] (--) NVIDIA(GPU-0): AOC 2450W (DFP-1): connected
[ 23493.118] (--) NVIDIA(GPU-0): AOC 2450W (DFP-1): Internal TMDS
[ 23493.118] (--) NVIDIA(GPU-0): AOC 2450W (DFP-1): 165.0 MHz maximum pixel clock
[ 23493.118] (--) NVIDIA(GPU-0): 
.
.
. 
Repeats 14865 times...
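One way to get a repeat count like this is to count the "connected" lines for the output in the log. A minimal sketch, assuming the log format shown above; `count_probes` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: counts "connected" probe lines for one output.
# $1 = Xorg log file, $2 = output name as it appears in the log (e.g. DFP-1).
count_probes() {
    grep -cF "($2): connected" "$1"
}

# Usage:
#   count_probes /var/log/Xorg.1.log DFP-1
```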

It's possible it's a hardware problem with my card. However, I ran my system for years on Ubuntu with no graphics issues; they only appeared after I switched to Debian and Xfce.

I'm assuming (but haven't checked) that the Ubuntu system was using nVidia drivers. I like Debian/Xfce and don't intend to go back to Ubuntu, but it would be good if somebody could do an audit of the differences in the low-level graphics software between the two systems.






________________________________________ ORIGINAL POST:


Hi,

apologies for the new thread but the existing ones are too cluttered with overlapping issues..

Nouveau was periodically crashing on my machine and maybe killing my GPU, or maybe it's actually a developing hardware issue. The desktop could be restored without rebooting using sudo service lightdm restart.

Anyway, I used these instructions to install the nvidia driver...

But it won't load...

Code: Select all

sudo dmesg
.
.
[   26.054765] nvidia-nvlink: Nvlink Core is being initialized, major device number 242

[   26.055587] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=none,decodes=none:owns=io+mem
[   26.055648] NVRM: The NVIDIA GPU 0000:01:00.0 (PCI ID: 10de:13ba)
               NVRM: installed in this system is not supported by open
               NVRM: nvidia.ko because it does not include the required GPU
               NVRM: System Processor (GSP).
               NVRM: Please see the 'Open Linux Kernel Modules' and 'GSP
               NVRM: Firmware' sections in the driver README, available on
               NVRM: the Linux graphics driver download page at
               NVRM: www.nvidia.com.
[   26.055672] nvidia: probe of 0000:01:00.0 failed with error -1
[   26.055684] NVRM: The NVIDIA probe routine failed for 1 device(s).
[   26.055685] NVRM: None of the NVIDIA devices were initialized.
[   26.055891] nvidia-nvlink: Unregistered Nvlink Core, major device number 242
.
.
Can anyone pinpoint the problem and suggest a fix?

I'd be grateful for a link to the relevant Debian wiki instructions for:
  1. nVidia driver installation.
  2. changing kernel version, including manual editing grub (I have other boot drives).
Failing a resolution I'm going to pull my GPU card and use the on board graphics, hoping my install has drivers for that.

Also, can I revert to nouveau easily (without uninstalling nvidia) while I explore the options?

My system currently works, but with low res graphics.

Ps: I wasn't familiar with timeshift, but I think I'll start using it! :D Can I use it to copy an installed system onto another drive (ie, able to run as OS from grub)?

____________________________
____________________________
____________________________

Code: Select all

$ inxi -CGSxxz
System:
  Kernel: 6.1.0-20-amd64 arch: x86_64 bits: 64 compiler: gcc v: 12.2.0
    Desktop: Xfce v: 4.18.1 tk: Gtk v: 3.24.36 wm: xfwm dm: LightDM
    Distro: Debian GNU/Linux 12 (bookworm)
CPU:
  Info: quad core model: Intel Core i7-4790 bits: 64 type: MT MCP
    arch: Haswell rev: 3 cache: L1: 256 KiB L2: 1024 KiB L3: 8 MiB
  Speed (MHz): avg: 1062 high: 2900 min/max: 800/4000 cores: 1: 800 2: 800
    3: 800 4: 800 5: 800 6: 800 7: 2900 8: 801 bogomips: 57600
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
Graphics:
  Device-1: NVIDIA GM107GL [Quadro K2200] driver: N/A arch: Maxwell pcie:
    speed: 5 GT/s lanes: 16 bus-ID: 01:00.0 chip-ID: 10de:13ba
  Device-2: Logic3 / SpectraVideo plc LG Optical Mouse 3D-310 type: USB
    driver: hid-generic,usbhid bus-ID: 2-3.1.3:9 chip-ID: 1267:0210
  Display: x11 server: X.Org v: 1.21.1.7 compositor: xfwm v: 4.18.0 driver:
    X: loaded: nouveau,vesa unloaded: fbdev,modesetting alternate: nv
    dri: swrast gpu: N/A display-ID: :0.0 screens: 1
  Screen-1: 0 s-res: 1024x768 s-dpi: 96
  Monitor-1: default res: 1024x768 size: N/A
  API: OpenGL v: 4.5 Mesa 22.3.6 renderer: llvmpipe (LLVM 15.0.6 256 bits)
    direct-render: Yes

Code: Select all

$ nvidia-detect
Detected NVIDIA GPUs:
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM107GL [Quadro K2200] [10de:13ba] (rev a2)

Checking card:  NVIDIA Corporation GM107GL [Quadro K2200] (rev a2)
Your card is supported by all driver versions.
Your card is also supported by the Tesla drivers series.
Your card is also supported by the Tesla 470 drivers series.
It is recommended to install the
    nvidia-driver
package.

Code: Select all

$ dpkg -l "*nvidia"
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                      Version      Architecture Description
+++-=========================-============-============-===========================================>
un  bumblebee-nvidia          <none>       <none>       (no description available)
ii  glx-alternative-nvidia    1.2.2        amd64        allows the selection of NVIDIA as GLX provi>
un  libegl1-glvnd-nvidia      <none>       <none>       (no description available)
un  libegl1-nvidia            <none>       <none>       (no description available)
un  libgldispatch0-nvidia     <none>       <none>       (no description available)
un  libgles1-glvnd-nvidia     <none>       <none>       (no description available)
un  libgles2-glvnd-nvidia     <none>       <none>       (no description available)
un  libglvnd0-nvidia          <none>       <none>       (no description available)
un  libglx0-glvnd-nvidia      <none>       <none>       (no description available)
un  libopengl0-glvnd-nvidia   <none>       <none>       (no description available)
ii  xserver-xorg-video-nvidia 550.54.15-1  amd64        NVIDIA binary Xorg driver

Code: Select all

$ apt list --installed | grep nvidia

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

firmware-nvidia-gsp/unknown,now 550.54.15-1 amd64 [installed,automatic]
firmware-nvidia-tesla-gsp/stable-updates,now 525.147.05-7~deb12u1 amd64 [installed]
glx-alternative-nvidia/stable,now 1.2.2 amd64 [installed,automatic]
libegl-nvidia0/unknown,now 550.54.15-1 amd64 [installed,automatic]
libgl1-nvidia-glvnd-glx/unknown,now 550.54.15-1 amd64 [installed,automatic]
libgles-nvidia1/unknown,now 550.54.15-1 amd64 [installed,automatic]
libgles-nvidia2/unknown,now 550.54.15-1 amd64 [installed,automatic]
libglx-nvidia0/unknown,now 550.54.15-1 amd64 [installed,automatic]
libnvidia-allocator1/unknown,now 550.54.15-1 amd64 [installed,automatic]
libnvidia-cfg1/unknown,now 550.54.15-1 amd64 [installed,automatic]
libnvidia-egl-gbm1/stable,now 1.1.0-2 amd64 [installed,automatic]
libnvidia-eglcore/unknown,now 550.54.15-1 amd64 [installed,automatic]
libnvidia-encode1/unknown,now 550.54.15-1 amd64 [installed,automatic]
libnvidia-glcore/unknown,now 550.54.15-1 amd64 [installed,automatic]
libnvidia-glvkspirv/unknown,now 550.54.15-1 amd64 [installed,automatic]
libnvidia-gpucomp1/unknown,now 550.54.15-1 amd64 [installed,automatic]
libnvidia-ml1/unknown,now 550.54.15-1 amd64 [installed,automatic]
libnvidia-ptxjitcompiler1/unknown,now 550.54.15-1 amd64 [installed,automatic]
libnvidia-rtcore/unknown,now 550.54.15-1 amd64 [installed,automatic]
nvidia-alternative/unknown,now 550.54.15-1 amd64 [installed,automatic]
nvidia-detect/unknown,now 550.54.15-1 amd64 [installed]
nvidia-driver-bin/unknown,now 550.54.15-1 amd64 [installed,automatic]
nvidia-driver-libs/unknown,now 550.54.15-1 amd64 [installed,automatic]
nvidia-driver/unknown,now 550.54.15-1 amd64 [installed]
nvidia-egl-common/unknown,now 550.54.15-1 amd64 [installed,automatic]
nvidia-egl-icd/unknown,now 550.54.15-1 amd64 [installed,automatic]
nvidia-installer-cleanup/stable,now 20220217+3~deb12u1 amd64 [installed,automatic]
nvidia-kernel-common/stable,now 20220217+3~deb12u1 amd64 [installed,automatic]
nvidia-kernel-open-dkms/unknown,now 550.54.15-1 amd64 [installed]
nvidia-kernel-support/unknown,now 550.54.15-1 amd64 [installed,automatic]
nvidia-legacy-check/unknown,now 550.54.15-1 amd64 [installed,automatic]
nvidia-modprobe/unknown,now 550.54.15-1 amd64 [installed,automatic]
nvidia-persistenced/unknown,now 550.54.15-1 amd64 [installed,automatic]
nvidia-settings/unknown,now 550.54.15-1 amd64 [installed]
nvidia-smi/unknown,now 550.54.15-1 amd64 [installed]
nvidia-support/stable,now 20220217+3~deb12u1 amd64 [installed,automatic]
nvidia-vdpau-driver/unknown,now 550.54.15-1 amd64 [installed,automatic]
nvidia-vulkan-common/unknown,now 550.54.15-1 amd64 [installed,automatic]
nvidia-vulkan-icd/unknown,now 550.54.15-1 amd64 [installed,automatic]
xserver-xorg-video-nvidia/unknown,now 550.54.15-1 amd64 [installed,automatic]

Code: Select all

$ apt list --installed | grep firmware

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

atmel-firmware/stable,now 1.3-7 all [installed]
bluez-firmware/stable,now 1.2-9 all [installed]
dahdi-firmware-nonfree/stable,now 2.11.1.0.20170917-2 all [installed]
firmware-amd-graphics/stable,now 20230210-5 all [installed]
firmware-ast/stable,now 20140808-7 all [installed]
firmware-ath9k-htc/stable,now 1.4.0-108-gd856466+dfsg1-1.3+deb12u1 all [installed]
firmware-atheros/stable,now 20230210-5 all [installed]
firmware-bnx2/stable,now 20230210-5 all [installed]
firmware-bnx2x/stable,now 20230210-5 all [installed]
firmware-brcm80211/stable,now 20230210-5 all [installed]
firmware-cavium/stable,now 20230210-5 all [installed]
firmware-intel-sound/stable,now 20230210-5 all [installed]
firmware-ipw2x00/stable,now 20230210-5 all [installed]
firmware-ivtv/stable,now 20230210-5 all [installed]
firmware-iwlwifi/stable,now 20230210-5 all [installed]
firmware-libertas/stable,now 20230210-5 all [installed]
firmware-linux-free/stable,now 20200122-1 all [installed]
firmware-linux-nonfree/stable,now 20230210-5 all [installed,automatic]
firmware-misc-nonfree/stable,now 20230210-5 all [installed]
firmware-myricom/stable,now 20230210-5 all [installed]
firmware-netronome/stable,now 20230210-5 all [installed]
firmware-netxen/stable,now 20230210-5 all [installed]
firmware-nvidia-gsp/unknown,now 550.54.15-1 amd64 [installed,automatic]
firmware-nvidia-tesla-gsp/stable-updates,now 525.147.05-7~deb12u1 amd64 [installed]
firmware-qcom-soc/stable,now 20230210-5 all [installed]
firmware-qlogic/stable,now 20230210-5 all [installed]
firmware-realtek-rtl8723cs-bt/stable,now 20181104-2 all [installed]
firmware-realtek/stable,now 20230210-5 all [installed]
firmware-samsung/stable,now 20230210-5 all [installed]
firmware-siano/stable,now 20230210-5 all [installed]
firmware-sof-signed/stable,now 2.2.4-1 all [installed]
firmware-ti-connectivity/stable,now 20230210-5 all [installed]
firmware-zd1211/stable,now 1:1.5-10 all [installed]
hdmi2usb-fx2-firmware/stable,now 0.0.0~git20151225-3 all [installed]

Code: Select all

$ lspci -knn
00:00.0 Host bridge [0600]: Intel Corporation 4th Gen Core Processor DRAM Controller [8086:0c00] (rev 06)
	Subsystem: Gigabyte Technology Co., Ltd 4th Gen Core Processor DRAM Controller [1458:5000]
	Kernel driver in use: hsw_uncore
00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller [8086:0c01] (rev 06)
	Subsystem: Gigabyte Technology Co., Ltd Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller [1458:5000]
	Kernel driver in use: pcieport
00:14.0 USB controller [0c03]: Intel Corporation 9 Series Chipset Family USB xHCI Controller [8086:8cb1]
	Subsystem: Gigabyte Technology Co., Ltd 9 Series Chipset Family USB xHCI Controller [1458:5007]
	Kernel driver in use: xhci_hcd
	Kernel modules: xhci_pci
00:16.0 Communication controller [0780]: Intel Corporation 9 Series Chipset Family ME Interface #1 [8086:8cba]
	Subsystem: Gigabyte Technology Co., Ltd 9 Series Chipset Family ME Interface [1458:1c3a]
	Kernel driver in use: mei_me
	Kernel modules: mei_me
00:1a.0 USB controller [0c03]: Intel Corporation 9 Series Chipset Family USB EHCI Controller #2 [8086:8cad]
	Subsystem: Gigabyte Technology Co., Ltd 9 Series Chipset Family USB EHCI Controller [1458:5006]
	Kernel driver in use: ehci-pci
	Kernel modules: ehci_pci
00:1b.0 Audio device [0403]: Intel Corporation 9 Series Chipset Family HD Audio Controller [8086:8ca0]
	Subsystem: Gigabyte Technology Co., Ltd 9 Series Chipset Family HD Audio Controller [1458:a182]
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel
00:1c.0 PCI bridge [0604]: Intel Corporation 9 Series Chipset Family PCI Express Root Port 1 [8086:8c90] (rev d0)
	Subsystem: Gigabyte Technology Co., Ltd 9 Series Chipset Family PCI Express Root Port 1 [1458:5001]
	Kernel driver in use: pcieport
00:1c.2 PCI bridge [0604]: Intel Corporation 9 Series Chipset Family PCI Express Root Port 3 [8086:8c94] (rev d0)
	Subsystem: Gigabyte Technology Co., Ltd 9 Series Chipset Family PCI Express Root Port 3 [1458:5001]
	Kernel driver in use: pcieport
00:1c.3 PCI bridge [0604]: Intel Corporation 9 Series Chipset Family PCI Express Root Port 4 [8086:8c96] (rev d0)
	Subsystem: Gigabyte Technology Co., Ltd 9 Series Chipset Family PCI Express Root Port 4 [1458:5001]
	Kernel driver in use: pcieport
00:1d.0 USB controller [0c03]: Intel Corporation 9 Series Chipset Family USB EHCI Controller #1 [8086:8ca6]
	Subsystem: Gigabyte Technology Co., Ltd 9 Series Chipset Family USB EHCI Controller [1458:5006]
	Kernel driver in use: ehci-pci
	Kernel modules: ehci_pci
00:1f.0 ISA bridge [0601]: Intel Corporation Z97 Chipset LPC Controller [8086:8cc4]
	Subsystem: Gigabyte Technology Co., Ltd Z97 Chipset LPC Controller [1458:5001]
	Kernel driver in use: lpc_ich
	Kernel modules: lpc_ich
00:1f.2 SATA controller [0106]: Intel Corporation 9 Series Chipset Family SATA Controller [AHCI Mode] [8086:8c82]
	Subsystem: Gigabyte Technology Co., Ltd 9 Series Chipset Family SATA Controller [AHCI Mode] [1458:b005]
	Kernel driver in use: ahci
	Kernel modules: ahci
00:1f.3 SMBus [0c05]: Intel Corporation 9 Series Chipset Family SMBus Controller [8086:8ca2]
	Subsystem: Gigabyte Technology Co., Ltd 9 Series Chipset Family SMBus Controller [1458:5001]
	Kernel driver in use: i801_smbus
	Kernel modules: i2c_i801
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM107GL [Quadro K2200] [10de:13ba] (rev a2)
	Subsystem: NVIDIA Corporation GM107GL [Quadro K2200] [10de:1097]
	Kernel modules: nvidia
01:00.1 Audio device [0403]: NVIDIA Corporation GM107 High Definition Audio Controller [GeForce 940MX] [10de:0fbc] (rev a1)
	Subsystem: NVIDIA Corporation GM107 High Definition Audio Controller [GeForce 940MX] [10de:1097]
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel
03:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 06)
	Subsystem: Gigabyte Technology Co., Ltd Onboard Ethernet [1458:e000]
	Kernel driver in use: r8169
	Kernel modules: r8169
04:00.0 PCI bridge [0604]: Intel Corporation 82801 PCI Bridge [8086:244e] (rev 41)
	Subsystem: Gigabyte Technology Co., Ltd 82801 PCI Bridge [1458:8892]
05:00.0 Ethernet controller [0200]: Qualcomm Atheros AR2417 Wireless Network Adapter [AR5007G 802.11bg] [168c:001d] (rev 01)
	Subsystem: Qualcomm Atheros AR2417 Wireless Network Adapter [AR5007G 802.11bg] [168c:2055]
	Kernel driver in use: ath5k
	Kernel modules: ath5k
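In the listing above, 01:00.0 has a `Kernel modules: nvidia` line but no `Kernel driver in use` line, which fits the failed probe in dmesg. The check can be scripted roughly as below. This is a sketch only, and `driver_in_use` is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: reads `lspci -k` style text on stdin and prints the
# "Kernel driver in use" value for the device at bus address $1, or "none".
driver_in_use() {
    awk -v slot="$1" '
        # Device header lines start with the bus address; detail lines are indented.
        /^[0-9a-f]/ { current = $1 }
        current == slot && /Kernel driver in use:/ {
            sub(/.*Kernel driver in use: */, "")
            found = $0
        }
        END { print (found == "" ? "none" : found) }
    '
}

# Usage:
#   lspci -knn | driver_in_use 01:00.0
```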

Code: Select all

$ lsmod | sort
acpi_pad              184320  0
aesni_intel           393216  0
ahci                   49152  4
async_memcpy           20480  2 raid456,async_raid6_recov
async_pq               20480  2 raid456,async_raid6_recov
async_raid6_recov      24576  1 raid456
async_tx               20480  5 async_pq,async_memcpy,async_xor,raid456,async_raid6_recov
async_xor              20480  3 async_pq,raid456,async_raid6_recov
at24                   28672  0
ath                    36864  1 ath5k
ath5k                 167936  0
autofs4                53248  2
binfmt_misc            24576  1
blake2b_generic        20480  0
btrfs                1789952  0
button                 24576  0
cdrom                  81920  1 sr_mod
cfg80211             1142784  3 ath,ath5k,mac80211
chacha_x86_64          28672  1 libchacha20poly1305
configfs               57344  1
coretemp               20480  0
crc16                  16384  1 ext4
crc32c_generic         16384  0
crc32c_intel           24576  7
crc32_pclmul           16384  0
crc64                  20480  1 crc64_rocksoft
crc64_rocksoft         20480  1 t10_pi
crc_t10dif             20480  1 t10_pi
crct10dif_common       16384  3 crct10dif_generic,crc_t10dif,crct10dif_pclmul
crct10dif_generic      16384  0
crct10dif_pclmul       16384  1
cryptd                 28672  2 crypto_simd,ghash_clmulni_intel
crypto_simd            16384  1 aesni_intel
curve25519_x86_64      36864  1 wireguard
dm_mod                184320  0
drm                   614400  1 drm_kms_helper
drm_kms_helper        208896  0
efi_pstore             16384  0
ehci_hcd              102400  1 ehci_pci
ehci_pci               20480  0
enclosure              20480  1 ses
evdev                  28672  9
ext4                  983040  3
fan                    20480  0
fuse                  176128  5
ghash_clmulni_intel    16384  0
hid                   159744  2 usbhid,hid_generic
hid_generic            16384  0
i2c_i801               36864  0
i2c_smbus              20480  1 i2c_i801
intel_cstate           20480  0
intel_pmc_bxt          16384  1 iTCO_wdt
intel_powerclamp       20480  0
intel_rapl_common      32768  1 intel_rapl_msr
intel_rapl_msr         20480  0
intel_uncore          217088  0
ip6_udp_tunnel         16384  1 wireguard
ip_tables              36864  0
irqbypass              16384  1 kvm
iTCO_vendor_support    16384  1 iTCO_wdt
iTCO_wdt               16384  0
jbd2                  167936  1 ext4
kvm                  1146880  1 kvm_intel
kvm_intel             380928  0
ledtrig_audio          16384  1 snd_hda_codec_generic
libahci                49152  1 ahci
libarc4                16384  1 mac80211
libata                401408  2 libahci,ahci
libchacha              16384  1 chacha_x86_64
libchacha20poly1305    16384  1 wireguard
libcrc32c              16384  4 nf_conntrack,btrfs,nf_tables,raid456
libcurve25519_generic    49152  2 curve25519_x86_64,wireguard
libphy                180224  3 r8169,mdio_devres,realtek
linear                 20480  0
loop                   32768  0
lp                     20480  0
lpc_ich                28672  0
mac80211             1175552  1 ath5k
mbcache                16384  1 ext4
mdio_devres            16384  1 r8169
md_mod                192512  6 raid1,raid10,raid0,linear,raid456,multipath
mei                   159744  3 mei_hdcp,mei_me
mei_hdcp               24576  0
mei_me                 53248  1
Module                  Size  Used by
multipath              20480  0
nf_conntrack          188416  1 xt_connmark
nf_defrag_ipv4         16384  1 nf_conntrack
nf_defrag_ipv6         24576  1 nf_conntrack
nfnetlink              20480  2 nft_compat,nf_tables
nf_tables             303104  661 nft_compat
nft_compat             20480  116
overlay               163840  0
parport                73728  3 parport_pc,lp,ppdev
parport_pc             40960  1
pcspkr                 16384  0
poly1305_x86_64        28672  1 libchacha20poly1305
ppdev                  24576  0
qrtr                   49152  4
r8169                  94208  0
raid0                  24576  0
raid10                 65536  0
raid1                  53248  0
raid456               180224  0
raid6_pq              122880  4 async_pq,btrfs,raid456,async_raid6_recov
rapl                   20480  0
realtek                36864  1
rfkill                 36864  2 cfg80211
scsi_common            16384  6 scsi_mod,usb_storage,uas,libata,sg,sr_mod
scsi_mod              286720  8 ses,scsi_transport_sas,sd_mod,usb_storage,uas,libata,sg,sr_mod
scsi_transport_sas     49152  1 ses
sd_mod                 65536  5
ses                    20480  0
sg                     40960  0
sha1_ssse3             32768  0
sha256_ssse3           32768  0
sha512_generic         16384  1 sha512_ssse3
sha512_ssse3           49152  0
snd                   126976  19 snd_hda_codec_generic,snd_hda_codec_hdmi,snd_hwdep,snd_hda_intel,snd_hda_codec,snd_hda_codec_realtek,snd_timer,snd_compress,snd_soc_core,snd_pcm
snd_compress           28672  1 snd_soc_core
snd_hda_codec         184320  4 snd_hda_codec_generic,snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec_realtek
snd_hda_codec_generic    98304  1 snd_hda_codec_realtek
snd_hda_codec_hdmi     81920  1
snd_hda_codec_realtek   172032  1
snd_hda_core          122880  5 snd_hda_codec_generic,snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec,snd_hda_codec_realtek
snd_hda_intel          57344  5
snd_hwdep              16384  1 snd_hda_codec
snd_intel_dspcfg       36864  1 snd_hda_intel
snd_intel_sdw_acpi     20480  1 snd_intel_dspcfg
snd_pcm               159744  8 snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec,snd_soc_rt5640,snd_compress,snd_soc_core,snd_hda_core
snd_soc_core          352256  1 snd_soc_rt5640
snd_soc_rl6231         20480  1 snd_soc_rt5640
snd_soc_rt5640        147456  0
snd_timer              49152  1 snd_pcm
soundcore              16384  1 snd
sr_mod                 28672  0
t10_pi                 16384  1 sd_mod
uas                    32768  0
udp_tunnel             24576  1 wireguard
usb_common             16384  3 xhci_hcd,usbcore,ehci_hcd
usbcore               348160  7 xhci_hcd,ehci_pci,usbhid,usb_storage,ehci_hcd,xhci_pci,uas
usbhid                 65536  0
usb_storage            81920  1 uas
vboxdrv               602112  2 vboxnetadp,vboxnetflt
vboxnetadp             28672  0
vboxnetflt             32768  0
video                  65536  0
watchdog               45056  1 iTCO_wdt
wireguard              98304  0
wmi                    36864  1 video
x86_pkg_temp_thermal    20480  0
xhci_hcd              315392  1 xhci_pci
xhci_pci               24576  0
xor                    24576  2 async_xor,btrfs
x_tables               61440  6 nft_compat,xt_tcpudp,xt_comment,xt_connmark,ip_tables,xt_mark
xt_comment             16384  96
xt_connmark            16384  12
xt_mark                16384  8
xt_tcpudp              20480  0
zstd_compress         294912  1 btrfs

Code: Select all

$ apt list "linux-headers*"
Listing... Done
linux-headers-6.1.0-11-amd64/stable-security 6.1.38-4 amd64
linux-headers-6.1.0-11-cloud-amd64/stable-security 6.1.38-4 amd64
linux-headers-6.1.0-11-common-rt/stable-security 6.1.38-4 all
linux-headers-6.1.0-11-common/stable-security 6.1.38-4 all
linux-headers-6.1.0-11-rt-amd64/stable-security 6.1.38-4 amd64
linux-headers-6.1.0-12-amd64/stable-security,now 6.1.52-1 amd64 [installed,automatic]
linux-headers-6.1.0-12-cloud-amd64/stable-security 6.1.52-1 amd64
linux-headers-6.1.0-12-common-rt/stable-security 6.1.52-1 all
linux-headers-6.1.0-12-common/stable-security,now 6.1.52-1 all [installed,automatic]
linux-headers-6.1.0-12-rt-amd64/stable-security 6.1.52-1 amd64
linux-headers-6.1.0-15-amd64/stable 6.1.66-1 amd64
linux-headers-6.1.0-15-cloud-amd64/stable 6.1.66-1 amd64
linux-headers-6.1.0-15-common-rt/stable 6.1.66-1 all
linux-headers-6.1.0-15-common/stable 6.1.66-1 all
linux-headers-6.1.0-15-rt-amd64/stable 6.1.66-1 amd64
linux-headers-6.1.0-16-amd64/stable-updates 6.1.67-1 amd64
linux-headers-6.1.0-16-cloud-amd64/stable-updates 6.1.67-1 amd64
linux-headers-6.1.0-16-common-rt/stable-updates 6.1.67-1 all
linux-headers-6.1.0-16-common/stable-updates 6.1.67-1 all
linux-headers-6.1.0-16-rt-amd64/stable-updates 6.1.67-1 amd64
linux-headers-6.1.0-17-amd64/stable-security 6.1.69-1 amd64
linux-headers-6.1.0-17-cloud-amd64/stable-security 6.1.69-1 amd64
linux-headers-6.1.0-17-common-rt/stable-security 6.1.69-1 all
linux-headers-6.1.0-17-common/stable-security 6.1.69-1 all
linux-headers-6.1.0-17-rt-amd64/stable-security 6.1.69-1 amd64
linux-headers-6.1.0-18-amd64/stable 6.1.76-1 amd64
linux-headers-6.1.0-18-cloud-amd64/stable 6.1.76-1 amd64
linux-headers-6.1.0-18-common-rt/stable 6.1.76-1 all
linux-headers-6.1.0-18-common/stable 6.1.76-1 all
linux-headers-6.1.0-18-rt-amd64/stable 6.1.76-1 amd64
linux-headers-6.1.0-20-amd64/stable-security,now 6.1.85-1 amd64 [installed,automatic]
linux-headers-6.1.0-20-cloud-amd64/stable-security 6.1.85-1 amd64
linux-headers-6.1.0-20-common-rt/stable-security 6.1.85-1 all
linux-headers-6.1.0-20-common/stable-security,now 6.1.85-1 all [installed,automatic]
linux-headers-6.1.0-20-rt-amd64/stable-security 6.1.85-1 amd64
linux-headers-amd64/stable-security,now 6.1.85-1 amd64 [installed]
linux-headers-cloud-amd64/stable-security 6.1.85-1 amd64
linux-headers-rt-amd64/stable-security 6.1.85-1 amd64

Code: Select all

$ sudo grep non-free /etc/apt/sources.list
# deb cdrom:[Debian GNU/Linux 12 _Bookworm_ - Official Snapshot amd64 LIVE/INSTALL Binary 20230610-08:51]/ bookworm main non-free-firmware
deb http://deb.debian.org/debian/ bookworm contrib main non-free-firmware
deb-src http://deb.debian.org/debian/ bookworm contrib main non-free-firmware
deb http://security.debian.org/debian-security bookworm-security contrib main non-free-firmware
deb-src http://security.debian.org/debian-security bookworm-security contrib main non-free-firmware
deb http://deb.debian.org/debian/ bookworm-updates contrib main non-free-firmware
deb-src http://deb.debian.org/debian/ bookworm-updates contrib main non-free-firmware

Code: Select all

$ dpkg -l linux-image*
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                Version      Architecture Description
+++-===================================-============-============-=================================>
un  linux-image                         <none>       <none>       (no description available)
rc  linux-image-6.1.0-11-amd64          6.1.38-4     amd64        Linux 6.1 for 64-bit PCs (signed)
un  linux-image-6.1.0-11-amd64-unsigned <none>       <none>       (no description available)
ii  linux-image-6.1.0-12-amd64          6.1.52-1     amd64        Linux 6.1 for 64-bit PCs (signed)
un  linux-image-6.1.0-12-amd64-unsigned <none>       <none>       (no description available)
ii  linux-image-6.1.0-20-amd64          6.1.85-1     amd64        Linux 6.1 for 64-bit PCs (signed)
un  linux-image-6.1.0-20-amd64-unsigned <none>       <none>       (no description available)
rc  linux-image-6.1.0-9-amd64           6.1.27-1     amd64        Linux 6.1 for 64-bit PCs (signed)
un  linux-image-6.1.0-9-amd64-unsigned  <none>       <none>       (no description available)
ii  linux-image-amd64                   6.1.85-1     amd64        Linux for 64-bit PCs (meta-packag>
un  linux-image-generic                 <none>       <none>       (no description available)
Last edited by bitrat on 2024-04-16 22:50, edited 7 times in total.

@ttila
Posts: 157
Joined: 2017-12-13 16:57
Has thanked: 2 times
Been thanked: 15 times

Re: nVidia driver nightmare continued...

#2 Post by @ttila »

[   26.055648] NVRM: The NVIDIA GPU 0000:01:00.0 (PCI ID: 10de:13ba)
NVRM: installed in this system is not supported by open
NVRM: nvidia.ko because it does not include the required GPU
NVRM: System Processor (GSP).
NVRM: Please see the 'Open Linux Kernel Modules' and 'GSP
NVRM: Firmware' sections in the driver README, available on
NVRM: the Linux graphics driver download page at
NVRM: www.nvidia.com.
The open DKMS kernel module only supports RTX 2000 series cards and newer. Install the proprietary nvidia kernel DKMS module instead.


Re: nVidia driver nightmare continued...

#3 Post by bitrat »

@ttila wrote: 2024-04-14 06:54
[   26.055648] NVRM: The NVIDIA GPU 0000:01:00.0 (PCI ID: 10de:13ba)
NVRM: installed in this system is not supported by open
NVRM: nvidia.ko because it does not include the required GPU
NVRM: System Processor (GSP).
NVRM: Please see the 'Open Linux Kernel Modules' and 'GSP
NVRM: Firmware' sections in the driver README, available on
NVRM: the Linux graphics driver download page at
NVRM: www.nvidia.com.
The open DKMS kernel module only supports RTX 2000 series cards and newer. Install the proprietary nvidia kernel DKMS module instead.
Thanks! How do I do that?

Code: Select all

$ sudo apt autoremove --purge '*nvidia*'
$ sudo apt install nvidia-driver nvidia-smi nvidia-settings
$ sudo reboot
All done! Looks great. Thanks again!

I was too busy and distracted to give this my full time and attention, which made it more difficult than necessary. I now recall reading that newer nVidia hardware has an on-card processor (the GSP) whose firmware keeps critical code out of the open source driver, hence the open kernel modules not working on my ancient Quadro K2200, which has no GSP.

==============================================

Unfortunately, the card has crashed again...
  • firefox was using excessive cpu and desktop blank with laggy mouse pointer...
  • so I killed firefox..
  • ..mouse now responding normally but desktop still blank
dmesg...

Code: Select all

[19037.712989] perf: interrupt took too long (2576 > 2500), lowering kernel.perf_event_max_sample_rate to 77500
I think this is the point where I did sudo service lightdm restart ...

Code: Select all

[29443.370208] xfce4-terminal[1987]: segfault at 90 ip 00007f65bc181b48 sp 00007ffe23bcc040 error 4 in libgdk-3.so.0.2406.32[7f65bc122000+86000] likely on CPU 4 (core 0, socket 0)
[29443.370226] Code: 24 60 49 89 6a 38 f2 0f 5e c8 f2 41 0f 11 4a 48 e8 7d 1f fb ff 48 89 ef e8 b5 b3 fa ff 48 8b 7c 24 10 48 89 c6 e8 18 2c fb ff <48> 8b b3 90 00 00 00 48 8b 7c 24 10 e8 67 2c fb ff 49 8d 94 24 b8
Looks like my initial suspicion, that the nouveau driver has toasted my GPU, may be correct. :(
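One thing that might help separate a hardware fault from a software one: the nVidia driver reports GPU errors in the kernel log as `NVRM: Xid` lines, and neither of the dmesg excerpts above is one. A small sketch for pulling them out, if there are any; `xid_events` is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: prints any NVIDIA Xid error reports found in kernel
# log text on stdin. Prints nothing (and still succeeds) if there are none.
xid_events() {
    grep -F 'NVRM: Xid' || true
}

# Usage:
#   sudo dmesg | xid_events
```

No Xid lines does not prove the card is healthy, but their presence would be worth looking up in nVidia's Xid documentation.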


Re: nVidia driver nightmare continued...

#4 Post by @ttila »

Can you test with a live distro??? Just to exclude a HW issue


Re: nVidia driver nightmare continued...

#5 Post by bitrat »

@ttila wrote: 2024-04-16 00:43 Can you test with a live distro??? Just to exclude a HW issue
The thing is it's very intermittent. It can happen twice in an hour, then not for a week.

I gave the box a dust and I'll see what happens next. I don't really have time to play with it and it does look very nice (way nicer than nouveau).

Actually, the latest crash was different to the others.. including a specific firefox error...

Code: Select all

(Firefox-esr:2488): GLib-GObject-CRITICAL **: 12:36:25.216: g_object_ref: assertion 'G_IS_OBJECT (object)' failed
[Parent 2488, Main Thread] WARNING: g_object_ref: assertion 'G_IS_OBJECT (object)' failed: 'glib warning', file ./toolkit/xre/nsSigHandlers.cpp:167
I also remembered (for some reason) I'd set firefox back to use hardware acceleration, and I suspect that was at the root of the most recent crash, which may have just been firefox sitting on the system like a giant 300 ton elephant. I'm ~fairly~ sure there's a lot less fan activity with the nVidia driver, but it wasn't reporting any overheating with nouveau...

Also, I was going to try onboard graphics, but the BIOS is set to use the nVidia PCI slot as first display. It doesn't seem to have an 'auto' setting and when I pulled the nVidia card, there were no graphics from the board, which has DVI and VGA out. (I don't currently have a second screen to test both at once.)

I was going to switch over to onboard graphics, then saw something online saying that I'd need to enable 'compatibility mode' (CSM) for that. It was for a different Gigabyte BIOS and seems odd, but I don't want to be stuck with no graphics at all. Is there a way to reset the BIOS to defaults, btw?

Anyway, if you know anything about these Gigabyte UEFIs I'll update here with some details.
