[Solved] How do you install the Nvidia Tesla drivers?

kerryhall
Posts: 275
Joined: 2008-08-19 11:06
Has thanked: 3 times

[Solved] How do you install the Nvidia Tesla drivers?

#1 Post by kerryhall »

Debian 10.

nvidia-detect shows both my GeForce and my Tesla and recommends nvidia-driver. According to this:
https://wiki.debian.org/NvidiaGraphicsDrivers

the Tesla drivers are supposedly included in Debian 10's nvidia-driver package.

I install it and reboot: I'm on driver 418. nvidia-smi -L doesn't show the Tesla, although nvidia-detect does.
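A quick way to confirm this kind of mismatch is to count the NVIDIA display controllers on the PCI bus and compare that with the number of GPUs the driver reports. This is a minimal POSIX shell sketch; the function names are hypothetical, and it assumes lspci -nn prints the class code as [0300] (VGA) or [0302] (3D controller) and the NVIDIA vendor ID as [10de:...]:

```shell
# Count NVIDIA GPUs in `lspci -nn` output read from stdin:
# class [0300] = VGA controller, [0302] = 3D controller, vendor 10de = NVIDIA.
# The HDMI audio function of a GeForce card (class [0403]) is not counted.
count_lspci_gpus() {
  grep -cE '\[030[02]\]: .*\[10de:'
}

# Count the GPUs the loaded driver exposes; nvidia-smi -L prints one "GPU n: ..." line per GPU.
count_smi_gpus() {
  grep -c '^GPU [0-9]'
}

# Usage (needs the NVIDIA driver loaded):
#   echo "PCI: $(lspci -nn | count_lspci_gpus)  driver: $(nvidia-smi -L | count_smi_gpus)"
```

If the PCI count is higher than the driver count, the missing card is seen by the hardware but not initialized by the driver, which points at dmesg (NVRM messages) rather than at package selection.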

So I try to install nvidia-tesla-418-driver from backports. Doing this causes the "regular" 470 driver to get installed from backports. What? Why? I reboot, and nvidia-smi -L still doesn't show the Tesla.

OK, maybe the Tesla driver version has to match the "regular" driver version?

Code: Select all

sudo apt install -t buster-backports nvidia-tesla-470-driver

Reading package lists... Done
Building dependency tree       
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 nvidia-tesla-470-driver : Depends: nvidia-tesla-470-driver-libs (= 470.141.03-1~deb11u1~bpo10+1) but it is not going to be installed
                           Depends: nvidia-tesla-470-driver-bin (= 470.141.03-1~deb11u1~bpo10+1) but it is not going to be installed
                           Depends: xserver-xorg-video-nvidia-tesla-470 (= 470.141.03-1~deb11u1~bpo10+1) but it is not going to be installed
                           Depends: nvidia-tesla-470-kernel-dkms (= 470.141.03-1~deb11u1~bpo10+1) but it is not going to be installed or
                                    nvidia-tesla-470-kernel-470.141.03
E: Unable to correct problems, you have held broken packages.

It looks like the Tesla xorg package conflicts with the "regular" xorg package. Why? Do you need both, or just one or the other? If you have a Tesla and a GeForce, which one do you need?
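When apt only says "but it is not going to be installed", the actual conflict is often one level further down the dependency chain. One way to dig, sketched here with a hypothetical helper named unmet_deps, is to pull the package names out of that error and simulate installing each one on its own with apt-get -s, which reports what would happen without changing the system:

```shell
# Extract the package names from apt's
#   "Depends: <pkg> (= <version>) but it is not going to be installed"
# lines (apt error text on stdin), one name per line.
unmet_deps() {
  sed -n 's/.*Depends: \([^ ]*\) .*but it is not going to be installed.*/\1/p'
}

# Usage sketch: simulate (-s) each dependency separately so apt names the real conflict.
#   apt-get -s -t buster-backports install nvidia-tesla-470-driver 2>&1 | unmet_deps |
#     while read -r pkg; do apt-get -s -t buster-backports install "$pkg"; done
```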

I go through dependency hell, purge every nvidia package, and install *only* the Tesla drivers, i.e. nvidia-tesla-470-driver etc. Reboot. Now Xorg won't start.

I go back and forth five more times, trying every permutation, but it seems I can't get nvidia-driver and nvidia-tesla-470-driver installed at the same time because their packages conflict.

Does the GeForce driver include the Tesla driver? Does the Tesla driver include the GeForce driver? Do I need both even though they conflict?

How do I get this working? What combination of packages is needed here?

I want Xorg to work as usual with my GeForce, and I want to use the Tesla for CUDA, and occasionally the GeForce for CUDA as well. Using the 470 driver for both would be nice.
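For this split (Xorg on the GeForce, CUDA mostly on the Tesla), one common approach once both cards show up in nvidia-smi is to restrict a CUDA program to a single GPU with the CUDA_VISIBLE_DEVICES environment variable, which accepts GPU UUIDs as well as indices. Using the UUID sidesteps the difference between nvidia-smi's numbering (PCI bus order) and CUDA's default fastest-first numbering. A sketch; gpu_uuid and ./my_cuda_app are hypothetical names:

```shell
# Print the UUID of the first GPU whose nvidia-smi -L line matches a
# case-insensitive pattern. Expected input lines look like:
#   GPU 1: Tesla P40 (UUID: GPU-xxxxxxxx-...)
gpu_uuid() {
  grep -i -- "$1" | sed -n 's/.*(UUID: \(GPU-[^)]*\)).*/\1/p' | head -n 1
}

# Usage sketch (./my_cuda_app stands for any CUDA program):
#   CUDA_VISIBLE_DEVICES="$(nvidia-smi -L | gpu_uuid 'Tesla P40')" ./my_cuda_app
```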
Last edited by kerryhall on 2023-06-03 08:30, edited 1 time in total.

Re: How do you install the Nvidia Tesla drivers?

#2 Post by kerryhall »

Small update here. Installed Debian 11. Same issue.

Tried nvidia-driver and nvidia-tesla-driver. Same issue.

Upgraded to bookworm. Same issue.

Tried nvidia-driver and nvidia-tesla-driver. Same issue.

Is there some low-level way I can try to communicate with the Tesla here?
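One low-level check that needs no NVIDIA driver at all is the kernel's view of the card's PCI BARs (its memory windows). sysfs exposes these in the device's resource file as one line per resource: start address, end address and flags, all in hex. This minimal POSIX shell sketch (bar_report is a hypothetical name) summarizes the six standard BARs. A BAR whose start address is 0 either isn't implemented (for example the upper half of a 64-bit BAR) or was never given an address, so cross-check it against the Region lines from lspci -vv:

```shell
# Summarize BAR0..BAR5 from a sysfs "resource" file read on stdin.
# Each line is "start end flags" in hex; a start address of 0 means no address.
bar_report() {
  idx=0
  while [ "$idx" -lt 6 ] && read -r start end flags; do
    if [ "$(($start))" -eq 0 ]; then
      printf 'BAR%d: no address (unimplemented or unassigned)\n' "$idx"
    else
      printf 'BAR%d: 0x%x (%dM)\n' "$idx" "$(($start))" "$(( ($end - $start + 1) / 1048576 ))"
    fi
    idx=$((idx + 1))
  done
}

# Usage (slot address from lspci, with PCI domain 0000 prepended):
#   bar_report < /sys/bus/pci/devices/0000:02:00.0/resource
```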

Re: How do you install the Nvidia Tesla drivers?

#3 Post by kerryhall »

Back to my Debian 10 install.

lspci -vv as root for the geforce:

Code: Select all

01:00.0 VGA compatible controller: NVIDIA Corporation GP106 [GeForce GTX 1060 6GB] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: PNY GP106 [GeForce GTX 1060 6GB]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 39
	Region 0: Memory at f6000000 (32-bit, non-prefetchable) [size=16M]
	Region 1: Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Region 3: Memory at f0000000 (64-bit, prefetchable) [size=32M]
	Region 5: I/O ports at e000 [size=128]
	[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: [60] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000fee04004  Data: 0028
	Capabilities: [78] Express (v2) Legacy Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 5GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <512ns, L1 <16us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR+, OBFF Via message
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [100 v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=01
			Status:	NegoPending- InProgress-
	Capabilities: [250 v1] Latency Tolerance Reporting
		Max snoop latency: 0ns
		Max no snoop latency: 0ns
	Capabilities: [128 v1] Power Budgeting <?>
	Capabilities: [420 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
	Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Kernel driver in use: nvidia
	Kernel modules: nvidia

Re: How do you install the Nvidia Tesla drivers?

#4 Post by kerryhall »

lspci -vv as root for the tesla:

Code: Select all

02:00.0 3D controller: NVIDIA Corporation GP102GL [Tesla P40] (rev a1)
	Subsystem: NVIDIA Corporation GP102GL [Tesla P40]
	Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 17
	Region 0: Memory at f3000000 (32-bit, non-prefetchable) [size=16M]
	Region 1: Memory at <unassigned> (64-bit, prefetchable)
	Region 3: Memory at <unassigned> (64-bit, prefetchable)
	Capabilities: [60] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [78] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 75.000W
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
		LnkCap:	Port #1, Speed 8GT/s, Width x16, ASPM not supported, Exit Latency L0s <512ns, L1 <4us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR+, OBFF Via message
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [100 v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=01
			Status:	NegoPending- InProgress-
	Capabilities: [250 v1] Latency Tolerance Reporting
		Max snoop latency: 0ns
		Max no snoop latency: 0ns
	Capabilities: [128 v1] Power Budgeting <?>
	Capabilities: [420 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
	Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Capabilities: [900 v1] #19
	Kernel modules: nvidia

Re: How do you install the Nvidia Tesla drivers?

#5 Post by kerryhall »

Found this in dmesg:

Code: Select all

NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:02:00.0)

and

Code: Select all

NVRM: The system BIOS may have misconfigured your GPU.

My motherboard manual shows 2 video cards being used simultaneously, so I thought this would work, but clearly there is something else going on here.
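The same symptom is visible in the lspci output posted above for the Tesla: Region 1 and Region 3 are listed as "<unassigned>", which lines up with the "BAR1 is 0M @ 0x0" message. A small sketch (unassigned_regions is a hypothetical helper) for spotting this quickly on a given slot:

```shell
# Print the numbers of memory regions (BARs) that lspci reports as unassigned.
# Reads `lspci -vv` text on stdin; region lines are indented with whitespace.
unassigned_regions() {
  sed -n 's/^[[:space:]]*Region \([0-9]\): Memory at <unassigned>.*/\1/p'
}

# Usage:
#   lspci -vv -s 02:00.0 | unassigned_regions
```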

Re: How do you install the Nvidia Tesla drivers?

#6 Post by kerryhall »

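In case anyone runs into this issue: it appears to be a feature missing from my motherboard and/or BIOS, i.e. "Above 4GB decoding" (the option that lets the firmware map large 64-bit PCI BARs above the 4 GB boundary; without it the Tesla's Region 1 and Region 3 were left unassigned, as the lspci and dmesg output above show). Sadly the only solution here was to buy a new motherboard + CPU. The good news is that I didn't have to install any software: I simply swapped the mobo + CPU, enabled "Above 4GB decoding", booted, and the Tesla works now.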

Aki
Global Moderator
Posts: 2816
Joined: 2014-07-20 18:12
Location: Europe
Has thanked: 68 times
Been thanked: 382 times

Re: [Solved] How do you install the Nvidia Tesla drivers?

#7 Post by Aki »

Hello,
Thanks for reporting.
Happy Debian and happy hacking with your two NVIDIA graphics cards! :-)
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Debian - The universal operating system
⢿⡄⠘⠷⠚⠋⠀ https://www.debian.org
⠈⠳⣄⠀
