[Solved] VGA Passthrough problem


Postby kaldtismann.deb » 2020-06-05 10:40

Hello and thank you for your help

I have tried to set up a virtual machine in qemu to install Windows 10

I managed to launch the virtual machine, install Windows 10 and connect to it from a remote SPICE client

But it only works without VGA passthrough!

As soon as I give the virtual machine a graphics card, I can no longer reach it with the SPICE client and the qemu process hangs completely!
I can't even kill it with kill -9 [PID of the qemu process]

The hardware is new and I have tried to follow a lot of guides, but none gave me the solution

---------------------------------------------------------------------------------------

--- Here is my hardware setup : ---

Gigabyte TRX40 Designare
AMD Threadripper 3960x
AMD Radeon RX480

This computer is attached to the network and I don't plan to use it as a desktop with a screen
I have another one from which I connect to the virtual machine with a SPICE client. (The SPICE client is really fast!)

---------------------------------------------------------------------------------------

What I have done:

I installed Debian Buster with an SSH server and nothing more

I installed qemu and KVM

I managed to install Windows 10 through a remote SPICE client, and I can start and stop the virtual machine without any problem when I use this command

Code: Select all
qemu-system-x86_64 \
  -name windows10_vm \
  -uuid xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx \
  -k fr-ch \
  -machine type=pc-i440fx-3.1,accel=kvm \
  -cpu host \
  -smp 8,sockets=1,cores=8,threads=1 \
  -m 16G \
  -rtc clock=host,base=localtime \
  -serial none \
  -parallel none \
  -soundhw hda \
  -usb -device usb-tablet \
  -boot order=dc \
  -drive id=disk0,if=virtio,cache=none,format=raw,file=/dev/disk \
  -drive file=/directory/ISO/Win10x64.iso,index=1,media=cdrom \
  -drive file=/directory/virtio-win-0.1.171.iso,index=2,media=cdrom \
  -spice port="5900",addr="0.0.0.0",disable-ticketing \
  -vga qxl


---------------------------------------------------------------------------------------

After that I decided to try to enable VGA Passthrough

I have done the following :

-> enabled IOMMU in the BIOS

-> then looked up the ID of the graphics card

Code: Select all
root@nightmare # lspci -v
...
21:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480] (rev c7) (prog-if 00 [VGA controller])

21:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590]
...


There is nothing else in this IOMMU group, so as far as I understand that should be fine, if I'm not wrong?
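
The group membership can be read directly from sysfs. The following is a minimal sketch, assuming the standard /sys/kernel/iommu_groups layout; the function name and its optional directory argument are made up for illustration (the argument only exists so the function can be tried on a test tree).

```shell
#!/bin/sh
# List every IOMMU group and the PCI addresses it contains.
# $1: directory to scan (optional, defaults to the kernel's sysfs location)
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue          # glob matched nothing
        group="${dev%/devices/*}"          # strip "/devices/<address>"
        group="${group##*/}"               # keep only the group number
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done
}

# Sort numerically on the group number ("24:" is read as 24).
list_iommu_groups | sort -k2,2n
```

If the output is empty, the IOMMU is most likely not enabled. For passthrough, the group holding 21:00.0 should contain only the devices that are handed to the guest.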

Then I created a file to enable the vfio module for the graphics card

Code: Select all
root@nightmare # cat /etc/modprobe.d/vfio.conf

options vfio-pci ids=1002:67df,1002:aaf0
options vfio-pci disable_vga=1
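
The values given to ids= are the vendor:device pairs that lspci -nn prints in square brackets. A minimal sketch for extracting them, assuming lspci's usual output format; the helper name pci_ids is made up for illustration.

```shell
#!/bin/sh
# Print the vendor:device ID of every "lspci -nn" line read from stdin.
# The ID is the last bracketed pair of four hex digits on the line; the
# class code (e.g. [0300]) has no colon and is therefore not matched.
pci_ids() {
    sed -n 's/.*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)\].*/\1/p'
}

# Both functions of the card in slot 21:00 (address taken from the post).
if command -v lspci >/dev/null 2>&1; then
    lspci -nn -s 21:00 2>/dev/null | pci_ids
fi
```

Both functions of the card (VGA and HDMI audio) need to appear in the ids= list.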


Then I created a file to blacklist amdgpu

Code: Select all
root@nightmare # cat /etc/modprobe.d/blacklist.conf

blacklist amdgpu
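
After a reboot it is worth checking which driver actually holds the card; for passthrough both functions should show vfio-pci. A small sketch, assuming the standard sysfs layout; the function name bound_driver and its optional second argument are made up for illustration.

```shell
#!/bin/sh
# Print the kernel driver currently bound to a PCI device, or "none".
# $1: full PCI address, e.g. 0000:21:00.0
# $2: sysfs PCI device directory (optional, defaults to the real one)
bound_driver() {
    link="${2:-/sys/bus/pci/devices}/$1/driver"
    if [ -L "$link" ]; then
        # The link points at .../drivers/<name>; keep only <name>.
        basename "$(readlink "$link")"
    else
        echo none
    fi
}

for addr in 0000:21:00.0 0000:21:00.1; do
    printf '%s -> %s\n' "$addr" "$(bound_driver "$addr")"
done
```

The same information is shown on the "Kernel driver in use" line of lspci -nnk -s 21:00.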


After that I added parameters to /etc/default/grub
The first parameter enables the IOMMU and the second disables the EFI/VESA framebuffer

Code: Select all
root@nightmare # cat /etc/default/grub

GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on video=vesafb:off,efifb:off"
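
On Debian, edits to /etc/default/grub only take effect after running update-grub and rebooting. Whether the running kernel really received the parameters can be checked against /proc/cmdline. A minimal sketch; the function name has_cmdline_param and its optional file argument are made up for illustration.

```shell
#!/bin/sh
# Succeed if the given parameter appears as a whole word on the kernel
# command line.
# $1: parameter to look for, e.g. amd_iommu=on
# $2: file to read (optional, defaults to /proc/cmdline)
has_cmdline_param() {
    f="${2:-/proc/cmdline}"
    [ -r "$f" ] || return 1
    # One word per line, then an exact fixed-string match on a full line.
    tr -s ' ' '\n' < "$f" | grep -qxF -- "$1"
}

for p in amd_iommu=on video=vesafb:off,efifb:off; do
    if has_cmdline_param "$p"; then
        echo "$p: active"
    else
        echo "$p: missing"
    fi
done
```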


---------------------------------------------------------------------------------------

So I thought everything was ready to start the virtual machine and that VGA passthrough would work, but no...

I tried to start the virtual machine with this new command line:

Code: Select all
qemu-system-x86_64 \
  -name windows10_vm \
  -uuid xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx \
  -k fr-ch \
  -machine type=pc-i440fx-3.1,accel=kvm \
  -cpu host \
  -smp 8,sockets=1,cores=8,threads=1 \
  -m 16G \
  -rtc clock=host,base=localtime \
  -serial none \
  -parallel none \
  -soundhw hda \
  -usb -device usb-tablet \
  -boot order=dc \
  -drive id=disk0,if=virtio,cache=none,format=raw,file=/dev/disk \
  -drive file=/directory/ISO/Win10_2004_French_x64.iso,index=1,media=cdrom \
  -drive file=/directory/virtio-win-0.1.171.iso,index=2,media=cdrom \
  -spice port="5900",addr="0.0.0.0",disable-ticketing \
  -vga qxl \
  -device vfio-pci,host=21:00.0,multifunction=on \
  -device vfio-pci,host=21:00.1


And I get this error back from qemu

Code: Select all
qemu-system-x86_64: -device vfio-pci,host=21:00.0,multifunction=on: vfio 0000:21:00.0: failed to setup container for group 24: failed to set iommu for container: Operation not permitted


I searched for this error and found that it comes from a problem with "unsafe_interrupts", which have to be allowed for VGA passthrough to work here

Code: Select all
root@nightmare # dmesg | grep 'remapping'
AMD-Vi: Disabling interrupt remapping
vfio_iommu_type1_attach_group: No interrupt remapping support.  Use the module param "allow_unsafe_interrupts" to enable VFIO IOMMU support on this platform


So I should add another parameter to /etc/default/grub

Code: Select all
root@nightmare # cat /etc/default/grub

GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on video=vesafb:off,efifb:off vfio_iommu_type1.allow_unsafe_interrupts=1"
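
Whether the module picked the option up can be read back from sysfs after a reboot. As far as I know, this option only tells VFIO to accept a platform without interrupt remapping; it is not expected to make the "AMD-Vi: Disabling interrupt remapping" line disappear, so the thing to watch is whether the vfio error goes away. A small sketch; the function name is made up, the path is the usual sysfs location for module parameters.

```shell
#!/bin/sh
# Report whether vfio_iommu_type1 currently allows unsafe interrupts.
# $1: parameter file (optional, defaults to the real sysfs location)
unsafe_interrupts_state() {
    f="${1:-/sys/module/vfio_iommu_type1/parameters/allow_unsafe_interrupts}"
    if [ ! -r "$f" ]; then
        echo "module not loaded"
        return 0
    fi
    # Boolean module parameters read back as Y or N.
    case "$(cat "$f")" in
        Y|1) echo enabled ;;
        *)   echo disabled ;;
    esac
}

unsafe_interrupts_state
```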


But after that I still get this message from dmesg | grep remapping

Code: Select all
AMD-Vi: Disabling interrupt remapping




Why does this not work?

I have followed a lot of guides, but none of them says anything more about this problem... it should work, but it doesn't.

What should I do now?
Last edited by kaldtismann.deb on 2020-06-13 12:40, edited 1 time in total.
kaldtismann.deb
 
Posts: 17
Joined: 2019-03-23 09:17

Re: VGA Passthrough problem

Postby kaldtismann.deb » 2020-06-05 10:41

Here is the solution that fixed my problem

In the end I just had to add the option

Code: Select all
options pcie_aspm=off


to the vfio.conf file I had created in the /etc/modprobe.d/ directory

And nothing more! Then I could start my virtual machine and get plenty of performance for my tasks!
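
pcie_aspm=off is documented as a kernel boot parameter, so it can be useful to look at the ASPM state the PCIe links actually ended up in after a reboot. The sketch below reads it from lspci -vv (run as root so the capability lines are visible). The helper name aspm_states is made up, and the parsing assumes the usual "LnkCtl: ASPM ..." line format.

```shell
#!/bin/sh
# Read "lspci -vv" output on stdin and print "<address> <ASPM state>"
# for every device that has a link control (LnkCtl) line.
aspm_states() {
    awk '
        /^[0-9a-f]/ { dev = $1 }                    # device header line
        /LnkCtl:/ {
            state = $0
            sub(/.*LnkCtl:[ \t]*ASPM /, "", state)  # drop everything up to "ASPM "
            sub(/;.*/, "", state)                   # drop the rest of the line
            print dev, state
        }'
}

if command -v lspci >/dev/null 2>&1; then
    lspci -vv 2>/dev/null | aspm_states
fi
```

The lines of interest here are the ones for 21:00.0 and for the bridge 20:03.1 that reported the DPC errors.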
Last edited by kaldtismann.deb on 2020-06-13 12:39, edited 2 times in total.

Re: VGA Passthrough problem

Postby CwF » 2020-06-05 14:04

kaldtismann.deb wrote: As soon as I give the virtual machine a graphics card, I can no longer reach it with the SPICE client and the qemu process hangs completely!

Why not? Should I take this to mean the VM does boot up on the passed GPU and does show a display? If so, a virt-viewer window should work. On the host the virt-viewer window should be black with no display, but when you click in it you should be transferring KBM (keyboard and mouse) to the VM. The virt-viewer window can be small and it will scale its area with the VM. Using the tablet driver will change the way this works; with it the window won't trap the KBM.

This should be manageable entirely with virtual machine manager, VMM.

Also, this is why I use Intel! I have hardly any of the options you list. My interrupts option is in vfio.conf.
Code: Select all
#/etc/modprobe.d/vfio.conf
#options vfio-pci ids=
options vfio_iommu_type1 allow_unsafe_interrupts=1
# /etc/initramfs-tools/modules list modules, options here only
# list for boot vfio only static pieces, use dynamic binding


On one computer, unsafe = 0, I rarely needed it, I bet you don't.

I never use 'disable vga' and I always use a QXL VGA device in both Windows and Linux guests, so the passed VGA is always a secondary GPU. You then configure within the guest who gets the SPICE KBM by disabling the QXL display, not by removing it. There are many ways to control this: use display=1 for the passed GPU, display=0 for the QXL... Having the QXL gives you a fallback display, and if a video driver blows up you still have the QXL to work with, just like real hardware with onboard video for example.
Typically I pass a NIC so my host has no window open, and I use x2vnc or x2x for seamless integration with the host display(s).

It can be tricky depending on the exact hardware. I have found AMD GPUs in particular to work fine only on specific hardware; in one case a card worked in one slot and not in another, on the same machine with the same config. Another one I could not pass while it coexisted with the host GPU, but I could pass it if it was the host GPU! Yes, the VM takes over and you need to shut down the guest to get back to the host. I have a handful of those examples that I simply don't pursue. As hinted at above, I am moving towards no static config on the host at all other than having the IOMMU on and modules available, and configuring after boot. I ultimately expect this new method to be easier. Not specifically helpful, but I hope something helps...
CwF
 
Posts: 812
Joined: 2018-06-20 15:16

Re: VGA Passthrough problem

Postby kaldtismann.deb » 2020-06-06 08:28

Hello CwF and thank you for your answer

So no, the virtual machine doesn't boot up at all

I don't have virt-viewer, only a SPICE client from another PC

About Intel: if you want to pay for a complete setup for me, feel free... I have an AMD setup, so what do you want me to answer?

About the

Code: Select all
vfio_iommu_type1 allow_unsafe_interrupts=1


I have found that it can be written in /etc/modprobe.d/vfio.conf, but it can also be written in /etc/default/grub

I read that it's better to initialize with grub than with modprobe... I don't know what the truth is

But I have tried both, inside /etc/modprobe.d/vfio.conf and inside /etc/default/grub, and the result is the same

About the QXL and screen parameters I can't say much, because the qemu documentation doesn't explain a lot... I don't know if I am doing it wrong, but the virtual machine worked with this option, so I don't know what to say. If you have a good explanation of what I should do and use, I am open to advice.

Then a small detail: when I run dmesg after starting the virtual machine with the VGA passthrough command, it gives me this strange result

Code: Select all
kernel: pcieport 0000:20:03.1: DPC: containment event, status:0x1f01 source:0x0000
kernel: pcieport 0000:20:03.1: DPC: unmasked uncorrectable error detected


this device is

Code: Select all
20:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge (prog-if 00 [Normal decode])


So there is something strange here! Why does this PCIe device react when it isn't in the IOMMU group of the graphics card?
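
An IOMMU group describes which devices can be isolated from each other, not the physical path to a device. A GPU on bus 21 is still reached through an upstream bridge, and it is plausible (an assumption, not confirmed in this thread) that 20:03.1 is exactly that bridge, which would explain why it logs errors when the GPU is touched. The resolved sysfs path of the device shows the chain. A sketch with a made-up helper name, pci_chain:

```shell
#!/bin/sh
# Print the PCI addresses found in a resolved sysfs device path, one per
# line, from the root port down to the device itself.
# $1: path such as the output of "readlink -f /sys/bus/pci/devices/<address>"
pci_chain() {
    printf '%s\n' "$1" | tr '/' '\n' |
        grep -E '^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$'
}

dev=/sys/bus/pci/devices/0000:21:00.0
if [ -e "$dev" ]; then
    pci_chain "$(readlink -f "$dev")"
fi
```

If 0000:20:03.1 is printed on the line above 0000:21:00.0, the bridge is the GPU's upstream port. lspci -tv shows the same tree in one view.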

I have searched and there is really nothing written about this problem, but I found this Level1Techs thread showing the same error as mine

https://forum.level1techs.com/t/trx40-related-ovmf-qemu-failing-to-boot-pauses-black-screen-when-starting-vm/153477/28

"maximlevitsky" says that he/she resolved the problem with a self-compiled kernel (without explanation), but also says that

These DPC errors probably are just bogus, and are leftover from some enterprise EPYC features.


Then "vljio" posted his/her grub parameters and the only string I don't have is

Code: Select all
pcie_aspm=off


I don't know whether this is what can stop the DPC error

---------------------------------------------------------------------------------------

I am wondering whether I should use the ACS IOMMU override

It's really strange that there is an error with this Starship/Matisse GPP Bridge

I don't understand it, because the bridge is not in the same IOMMU group

If you have any ideas, help would be appreciated

Have a nice day

Re: VGA Passthrough problem

Postby CwF » 2020-06-06 14:53

kaldtismann.deb wrote: I don't have virt-viewer, only a SPICE client from another PC

Both virt-viewer and VMM work from another PC and are more developed.

From earlier: you never need to blacklist anything. Instead, put this in a file under /etc/modprobe.d/ (softdep is a modprobe.d directive):
Code: Select all
softdep amdgpu pre: vfio vfio_pci

kaldtismann.deb wrote: I read that it's better to initialize with grub than with modprobe... I don't know what the truth is

grub just passes the parameter, doesn't really matter. If the config was more xen and booting into a vm at boot then pass it early. Determine if that gpu is sharing an IRQ, if it's not, forget the unsafe setting.

kaldtismann.deb wrote: About the QXL

I recommend it simply because it provides a fallback fully graphical capable interface if some secondary GPU is not working. With whatever utility you use to access the VM, the QXL is display 0 and the gpu is display 1. So everything will default to the QXL which in many cases will be blank, and respond to tty switching, it's there you do any cli magic.
-So, with it, you'd boot the windows vm and it finds a working solution with the QXL for which it can have a real driver. Then in that display environment you check the status of the secondary GPU, yah or nah. You eventually get that cleared up, then in display properties move the desktop to the secondary display and disable the QXL - and restart. Windows then comes up on the secondary vfio display and is now display 0.
The important point here is the superior diagnostic. Windows will boot successfully with QXL if the host is correct and the issue is windows driver config. If it still doesn't boot, it's the host.

kaldtismann.deb wrote: So there is something strange here! Why does this PCIe device react when it isn't in the IOMMU group of the graphics card?

This is why I made the general non-specific comment that it works in a slot, doesn't in another = having been building such systems for a few years now I stay away from solutions that require specific tweaks since the configuration is not easily transferable. It's a long trace why this or that bridge or switch doesn't work right, but a clean iommu group does not mean the path to it is direct-able, it means it should be...

kaldtismann.deb wrote: About Intel

My perspective is straight forward and not intended to offend. One is either bent on getting a piece of hardware working, or is determined in getting a VM solution working. It can't be both. I've discarded plenty of hardware, so if you want...My qualification is simple; the host itself and all the vm's need to migrate without issue, including vfio hardware assistance.

With all that said, it is up to enthusiast like you to find these things that concern specific hardware. It seems upstream patches for ill-behaved hardware implementations happen almost as fast as vendors retail new ill-behaved implementations!

Best of luck!

Re: VGA Passthrough problem

Postby kaldtismann.deb » 2020-06-07 09:32

Hello and thank you for your answer

CwF wrote: Both virt-viewer and VMM work from another PC and are more developed.


OK, so here is what I have installed on the remote computer, and the problem is still there

Code: Select all
libcacard0 libgovirt-common libgovirt2 libgtk-vnc-2.0-0 libgvnc-1.0-0  libphodav-2.0-0 libphodav-2.0-common libspice-client-glib-2.0-8  libspice-client-gtk-3.0-5 libusbredirhost1 libusbredirparser1 libvirt-glib-1.0-0 libvirt0 spice-client-glib-usb-acl-helper virt-viewer gir1.2-atk-1.0 gir1.2-freedesktop gir1.2-gdkpixbuf-2.0 gir1.2-gtk-3.0 gir1.2-pango-1.0 gir1.2-spiceclientglib-2.0 gir1.2-spiceclientgtk-3.0 libpangoxft-1.0-0


It's a minimal install for virt-viewer... and about VMM I think I don't need it because I feel comfortable with the qemu command line... I prefer to understand the CLI before using VMM and the XML files.
VMM is probably a better choice in the eyes of IT people, and I am probably being "masochistic", but I would like to understand things from the beginning before using graphical programs

That said, is it better to use VMM than the qemu command line? I don't know; you probably know the answer better than me!
I have read that VMM uses libvirt to "translate" an XML file into a qemu command line... so I think in the end it should be the same, if I am not wrong!

I am completely open to learning, but just because everyone uses VMM doesn't mean I will use VMM... If there is a clear advantage of VMM over the qemu command line, I will look at it... If it's "just a frontend", I would prefer learning qemu without it first. If there is a problem with VMM and I know nothing about the qemu command line, how can I manage without VMM?

To give an example, I use LibreOffice Writer for the files I use every day. There are pictures, tables and formatting in these files.
But I also know how to use LaTeX and plain old text files. I am faster with LibreOffice for the everyday files I need to print and share

But when it's only for myself I use plain old text files

You can say I am making the wrong choices, but for the moment I don't know other programs with which I can do the same with better results (speaking about office work)!

CwF wrote: grub just passes the parameter, doesn't really matter. If the config was more xen and booting into a vm at boot then pass it early. Determine if that gpu is sharing an IRQ, if it's not, forget the unsafe setting.


OK, so what I read was wrong... I will use modprobe.d then!

About the IRQ (interrupt request): how and where can I find out more about the IRQs on my hardware (RX480 and Gigabyte TRX40 Designare)?
Is there a bash command that gives this result?
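
For the IRQ question: each PCI device exposes its assigned interrupt number in /sys/bus/pci/devices/<address>/irq, and /proc/interrupts lists which drivers are attached to each line. Below is a sketch that prints the devices reporting the same number as the GPU. The function name irq_sharers and its optional directory argument are made up for illustration, and the result should be treated as a hint only.

```shell
#!/bin/sh
# Print the PCI devices that report the same IRQ number as the given
# device. An IRQ of 0 means none is assigned, so nothing is printed.
# $1: full PCI address, e.g. 0000:21:00.0
# $2: sysfs PCI device directory (optional, defaults to the real one)
irq_sharers() {
    root="${2:-/sys/bus/pci/devices}"
    [ -r "$root/$1/irq" ] || return 0
    irq="$(cat "$root/$1/irq")"
    [ "$irq" != 0 ] || return 0
    for f in "$root"/*/irq; do
        addr="${f%/irq}"                   # strip the file name
        addr="${addr##*/}"                 # keep only the PCI address
        [ "$addr" = "$1" ] && continue     # skip the device itself
        [ "$(cat "$f")" = "$irq" ] && echo "$addr"
    done
    return 0
}

irq_sharers 0000:21:00.0
```

No output means no other device reports that number. lspci -v -s 21:00.0 also shows the IRQ on its "Flags:" line.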

CwF wrote: I recommend it simply because it provides a fallback fully graphical capable interface if some secondary GPU is not working. With whatever utility you use to access the VM, the QXL is display 0 and the gpu is display 1. So everything will default to the QXL which in many cases will be blank, and respond to tty switching, it's there you do any cli magic.
-So, with it, you'd boot the windows vm and it finds a working solution with the QXL for which it can have a real driver. Then in that display environment you check the status of the secondary GPU, yah or nah. You eventually get that cleared up, then in display properties move the desktop to the secondary display and disable the QXL - and restart. Windows then comes up on the secondary vfio display and is now display 0.
The important point here is the superior diagnostic. Windows will boot successfully with QXL if the host is correct and the issue is windows driver config. If it still doesn't boot, it's the host.


I understand what you mean. It's like having two monitors: if one goes down you still have the second to diagnose problems

The problem is not there... Because I use the qemu command line, I looked at qemu's documentation. And in qemu's documentation it's written

Code: Select all
-vga type
    Select type of VGA card to emulate. Valid values for type are
...
qxl
    QXL paravirtual graphic card. It is VGA compatible. Works best with qxl guest drivers installed though. Recommended choice when using the spice protocol.
...


Excuse me, but this only mentions the SPICE protocol... Even if I pass through a GPU, do I still need to use the SPICE protocol to see my "remote" screen?
I don't want to connect a screen to the GPU on my qemu server, so do you have a better option?
I decided to use the SPICE protocol because it is faster than VNC when you need responsiveness... Again, if I am wrong, correct me!

CwF wrote: This is why I made the general non-specific comment that it works in a slot, doesn't in another = having been building such systems for a few years now I stay away from solutions that require specific tweaks since the configuration is not easily transferable. It's a long trace why this or that bridge or switch doesn't work right, but a clean iommu group does not mean the path to it is direct-able, it means it should be...


To be clear and direct with you, I am not an IT engineer... I am a mechanic.
So for me it's illogical that something outside an IOMMU group reacts when you call this IOMMU group
Beyond that, I don't know how the AMD TRX40 chipset works or how the PCIe protocol works

You say in this sentence that you have done a lot of "builds" and that you have seen a lot of garbage when specific tweaks are needed...
What can I answer? The GPU passthrough guides I read only said there could be problems with the Nvidia code 43 or with the AMD reset bug... I found just one where the author had to use the ACS override because he had bought an X470 motherboard...
Now that I have bought this motherboard and get this specific error message, I find more about it on the internet, but before that I never found any of these problems in the guides!

CwF wrote: My perspective is straight forward and not intended to offend. One is either bent on getting a piece of hardware working, or is determined in getting a VM solution working. It can't be both. I've discarded plenty of hardware, so if you want...My qualification is simple; the host itself and all the vm's need to migrate without issue, including vfio hardware assistance.


I am not offended, but how should I have reacted then?

Code: Select all
example with my job... if you call a mechanic because you have a problem with your car and he answers you : just take the train...
what will you answer him ? even if you take the train it doesn't solve your car's problem...
you can learn more about what you or the manufacturer have done wrong with the car but the problem is still here


As said before, people were saying the AMD TRX40 would be a good choice for VGA passthrough because there are a lot of PCIe lanes and a lot of IOMMU groups... I am not a specialist, so I believed them.
Now what do you propose I do?
Where can I find a solution to my problem?
I have read people saying they have VGA passthrough working with almost the same hardware as mine... so I think there should be some possibilities before buying a new motherboard and an Intel CPU...

CwF wrote: With all that said, it is up to enthusiast like you to find these things that concern specific hardware. It seems upstream patches for ill-behaved hardware implementations happen almost as fast as vendors retail new ill-behaved implementations!


Me, an enthusiast? I don't think so... TRX40 is built on the X399 chipset and it's the same with the architecture for Threadripper...
I didn't think it would still have problems with VGA passthrough "today", after roughly 3 years on the market... For the umpteenth time... I am probably wrong once again...

Is it the fault of AMD, of Gigabyte, of VFIO, of the Linux kernel, of me or of someone else... I don't know. In the end I just don't understand what the real root of my problem is...

Thanks for all and have a nice day

Re: VGA Passthrough problem

Postby CwF » 2020-06-07 16:07

kaldtismann.deb wrote: ... and about VMM I think I don't need it because I feel comfortable with the qemu command line..

Even with VMM, or without, look into using virsh.

kaldtismann.deb wrote: I don't want to connect a screen to the GPU on my qemu server, so do you have a better option?

Not sure what you're after here. SPICE is faster and will benefit from the GPU, but it is still going over the network, so it is not that fast. I've done one build that technically worked and that I wouldn't recommend: Quadro drivers under Windows can extend the QXL drivers with 3D in a Windows guest. Your SPICE connection to screen 0 will then be 3D 'accelerated' QXL, and it works, but still sucks. Some higher-end AMD cards (MxGPU) can accelerate and compress the network stream, but those are solutions outside our realm here.

kaldtismann.deb wrote: what the real root of my problem is..

The PCI-e slots are listed as 2x16 or 3x8 and are version 4. A double red flag: version 4 means newer than new and not a Linux favorite, but more importantly all those slots are behind a bifurcation switch. That's bad, and is something that may need the ACS magic. Sure it might work, someday, I don't know. I work backwards from the job at hand to the hardware, i.e. you spec the job, I spec the hardware. The hardware I'm typing on has 3x16 free and clear with 80 lanes total and 3 additional slots with bifurcation switches; it does make a difference. I do think 'they' will get it figured out, I'll pass. Many Intel boards have similar issues, I wouldn't spec them either.

kaldtismann.deb wrote: I am a mechanic.

Me too! Except maybe a different kind. Ever seen the fee schedule that says;
To fix it, $100/hr.
If you tried to fix it, $200/hr.
If another mechanic tried to fix it, $500/hr.
I got two categories, brand new custom fitments, and last resort recovery. You could say I am Q's mechanic and exactly the type to say "maybe you should take the train."

Just keep doing what you're doing! Maybe someone else here has something more useful than I do. Your quest is out of my scope. Good luck!

