Hello. I am new to Debian and working with a new desktop with an RTX 3090 that I am using for generative AI workflows with Stable Diffusion, using tools like ComfyUI. Here is my "About this System" output:
Operating System: Debian GNU/Linux 12
KDE Plasma Version: 5.27.5
KDE Frameworks Version: 5.103.0
Qt Version: 5.15.8
Kernel Version: 6.1.0-17-amd64 (64-bit)
Graphics Platform: Wayland
Processors: 24 × AMD Ryzen 9 7900X 12-Core Processor
Memory: 30.5 GiB of RAM
Graphics Processor: AMD Radeon Graphics
Manufacturer: Gigabyte Technology Co., Ltd.
Product Name: B650 AORUS ELITE AX
The issue I am having is that the GPU often becomes unavailable after a period of time. Restarting the computer seems to get it working again.
When it becomes unavailable, `nvidia-smi` returns `Unable to determine the device handle for GPU0000:01:00.0: Unknown Error`, and `nvidia-debugdump --dumpall` returns
```
ERROR: internal_dumpNvLogComponent() failed, return code: 0x3e7
ERROR: internal_dumpGpuComponent() failed, return code: 0x3e7
ERROR: internal_dumpGpuComponent() failed, return code: 0x3e7
ERROR: internal_dumpNvLogComponent() failed, return code: 0x3e7
```
Possibly related: https://wiki.debian.org/NvidiaGraphicsDrivers mentions that if `lspci | grep -E "VGA|3D"` returns two lines, you have an Optimus card. For me the command returns:
```
01:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3090] (rev a1)
10:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Raphael (rev c2)
```
So I take that to mean it is an Optimus card, but when I look at the steps for that I don't really understand what I am supposed to do. It looks like the optimal solution is NVIDIA PRIME render offload. But that doesn't look like a configuration; it reads to me as something I prepend to commands. Does that mean I should prepend it to any command that will be using the GPU? In the case of ComfyUI, would I run `__NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia python main.py` after activating the conda environment that I set up for ComfyUI? In that case I still get the same error:
```
torch._C._cuda_init()
RuntimeError: No CUDA GPUs are available
```
How can I solve this so the gpu is ready whenever I need it?
Thanks in advance.
[Software] Nvidia gpu stops being useable after a time
- FreewheelinFrank
Not an Nvidia user, but I'll give you a bump. Maybe some of our Nvidia users will notice the topic.
Could it be the power supply crapping out under high load? Is the Nvidia gpu actually working when it becomes unavailable? The application sounds quite intense.
Is there anything in the journal when it happens?
Run `# journalctl -b -1` if you have rebooted since, or `# journalctl -b` for the current boot.
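The full journal for a boot can be long, so it may help to narrow it down. Messages from the NVIDIA kernel driver are prefixed with `NVRM:`, and GPU fault reports are logged as `Xid` events. Below is a minimal Python sketch that pulls only those lines out of the kernel log. The function names `nvidia_lines` and `read_kernel_journal` are made up for this example, and it assumes `journalctl` is on the PATH and that you run it as root or as a user allowed to read the system journal.

```python
import subprocess


def nvidia_lines(journal_text):
    """Return the lines that mention the NVIDIA kernel driver (NVRM) or an Xid event."""
    keywords = ("NVRM", "Xid")
    return [
        line
        for line in journal_text.splitlines()
        if any(keyword in line for keyword in keywords)
    ]


def read_kernel_journal(boot="-1"):
    """Read kernel messages for one boot: "-1" is the previous boot, "0" the current one."""
    try:
        result = subprocess.run(
            ["journalctl", "-k", "-b", boot, "--no-pager"],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        # journalctl is not installed or not on the PATH (assumption: systemd system)
        return ""
    return result.stdout


if __name__ == "__main__":
    # Print only the NVIDIA-related kernel messages from the previous boot.
    for line in nvidia_lines(read_kernel_journal("-1")):
        print(line)
```

If the GPU dropped out during the current boot and you have not rebooted yet, pass `"0"` instead of `"-1"`. Posting whatever lines this prints, if any, should make it easier for NVIDIA users here to say what is going on.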