Restarts & Shutdowns problem [SOLVED]

New to Debian (Or Linux in general)? Ask your questions here!

Restarts & Shutdowns problem [SOLVED]

Postby GeckoS » 2017-05-14 23:14

Hi!

I've got problem with my machine as when I want to shutdown or restart it - it closes X, the screen goes black but machine is still running. Problem started when my root partition became full because of problems with GPU pass-through to virtual machine running on libvirt - logs filled the root partition. I solved the GPU error, cleaned logs in /var/log with this command:
Code: Select all
rm -v /var/log/syslo* && rm -v /var/log/kern.lo*

but the restarts/shutdowns problem showed up. Below are some details of my setup.
There are three main software-RAID0 partitions:

Code: Select all
w@sphinx:~$ lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
nvme1n1     259:0    0   477G  0 disk 
├─nvme1n1p1 259:2    0   571M  0 part 
├─nvme1n1p2 259:3    0    14G  0 part 
│ └─md0       9:0    0  27,9G  0 raid0 /
├─nvme1n1p3 259:4    0   2,8G  0 part 
│ └─md1       9:1    0   5,6G  0 raid0 [SWAP]
└─nvme1n1p4 259:5    0   413G  0 part 
  └─md2       9:2    0 825,9G  0 raid0 /home
nvme0n1     259:1    0   477G  0 disk 
├─nvme0n1p1 259:6    0   571M  0 part  /boot/efi
├─nvme0n1p2 259:7    0    14G  0 part 
│ └─md0       9:0    0  27,9G  0 raid0 /
├─nvme0n1p3 259:8    0   2,8G  0 part 
│ └─md1       9:1    0   5,6G  0 raid0 [SWAP]
└─nvme0n1p4 259:9    0   413G  0 part 
  └─md2       9:2    0 825,9G  0 raid0 /home


System is Stretch Testing with updated kernel:
Code: Select all
w@sphinx:~$ uname -a
Linux sphinx 4.10.13-kernel-name #2 SMP Sun Apr 30 03:35:57 CEST 2017 x86_64 GNU/Linux


Below is part of dmesg where at the end there are repetitive errors:

Code: Select all
[    2.935470] AVX2 version of gcm_enc/dec engaged.
[    2.935470] AES CTR mode by8 optimization enabled
[    2.937031] Error: Driver 'pcspkr' is already registered, aborting...
[    2.937624] i915 0000:00:02.0: enabling device (0004 -> 0007)
[    2.938123] [drm] Memory usable by graphics device = 4096M
[    2.938123] [drm] VT-d active for gfx access
[    2.938132] checking generic (c0000000 300000) vs hw (90000000 10000000)
[    2.938139] [drm] Replacing VGA console driver
[    2.948569] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    2.948569] [drm] Driver supports precise vblank timestamp query.
[    2.964246] i915 0000:00:02.0: Direct firmware load for i915/kbl_dmc_ver1_01.bin failed with error -2
[    2.964248] i915 0000:00:02.0: Failed to load DMC firmware [https://01.org/linuxgraphics/intel-linux-graphics-firmwares], disabling runtime power management.
[    2.964777] Adding 5855228k swap on /dev/md1.  Priority:-1 extents:1 across:5855228k SSFS
[    3.004402] pstore: using zlib compression
[    3.005815] alg: No test for pcbc(aes) (pcbc-aes-aesni)
[    3.005836] asus_wmi: ASUS WMI generic driver loaded
[    3.007227] asus_wmi: Initialization: 0x0
[    3.007245] asus_wmi: BIOS WMI version: 0.9
[    3.007271] asus_wmi: SFUN value: 0x0
[    3.008428] input: Eee PC WMI hotkeys as /devices/platform/eeepc-wmi/input/input12
[    3.008711] asus_wmi: Number of fans: 1
[    3.017106] pstore: Registered efi as persistent store backend
[    3.019128] intel_rapl: Found RAPL domain package
[    3.019128] intel_rapl: Found RAPL domain core
[    3.019129] intel_rapl: Found RAPL domain uncore
[    3.019129] intel_rapl: Found RAPL domain dram
[    3.133838] EXT4-fs (md2): mounted filesystem with ordered data mode. Opts: (null)
[    3.263777] random: crng init done
[    3.833620] ip6_tables: (C) 2000-2006 Netfilter Core Team
[    3.840111] Ebtables v2.0 registered
[    3.951154] IPv6: ADDRCONF(NETDEV_UP): enp0s31f6: link is not ready
[    4.033817] nvidia-modeset: Allocated GPU:0 (GPU-acf0f085-c568-9442-ad62-ecb2429dec4a) @ PCI:0000:02:00.0
[    4.168303] IPv6: ADDRCONF(NETDEV_UP): enp0s31f6: link is not ready
[    5.727961] e1000e: enp0s31f6 NIC Link is Up 100 Mbps Full Duplex, Flow Control: Rx/Tx
[    5.728021] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s31f6: link becomes ready
[   11.038694] Console: switching to colour dummy device 80x25
[   11.048130] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[   11.049024] tun: Universal TUN/TAP device driver, 1.6
[   11.049024] tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
[   11.140235] virbr0: port 1(virbr0-nic) entered blocking state
[   11.140236] virbr0: port 1(virbr0-nic) entered disabled state
[   11.140270] device virbr0-nic entered promiscuous mode
[   11.266457] nf_conntrack version 0.5.0 (65536 buckets, 262144 max)
[   11.275597] virbr0: port 1(virbr0-nic) entered blocking state
[   11.275598] virbr0: port 1(virbr0-nic) entered listening state
[   11.309861] virbr0: port 1(virbr0-nic) entered disabled state
[  243.031718] INFO: task systemd-udevd:391 blocked for more than 120 seconds.
[  243.031725]       Tainted: P           OE   4.10.13-kernel-name #2
[  243.031728] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.031731] systemd-udevd   D    0   391    353 0x00000104
[  243.031737] Call Trace:
[  243.031747]  ? __schedule+0x233/0x6f0
[  243.031754]  ? schedule+0x32/0x80
[  243.031759]  ? vga_get+0x110/0x1b0
[  243.031765]  ? wake_up_q+0x70/0x70
[  243.031850]  ? i915_disable_vga+0x45/0xa0 [i915]
[  243.031927]  ? intel_modeset_init+0x4ca/0x1780 [i915]
[  243.032002]  ? intel_i2c_reset+0x3e/0x40 [i915]
[  243.032069]  ? intel_setup_gmbus+0xf1/0x3b0 [i915]
[  243.032129]  ? i915_driver_load+0x9a9/0x1460 [i915]
[  243.032133]  ? acpi_dev_found+0x5f/0x70
[  243.032136]  ? local_pci_probe+0x3f/0x90
[  243.032139]  ? pci_device_probe+0x10c/0x160
[  243.032145]  ? driver_probe_device+0x2b2/0x450
[  243.032149]  ? __driver_attach+0xda/0xe0
[  243.032153]  ? driver_probe_device+0x450/0x450
[  243.032157]  ? bus_for_each_dev+0x67/0xb0
[  243.032162]  ? bus_add_driver+0x16a/0x260
[  243.032166]  ? driver_register+0x57/0xc0
[  243.032169]  ? 0xffffffffc190f000
[  243.032174]  ? do_one_initcall+0x4e/0x1a0
[  243.032177]  ? __vunmap+0x6d/0xc0
[  243.032181]  ? do_init_module+0x5b/0x1f6
[  243.032186]  ? load_module+0x2486/0x2980
[  243.032188]  ? __symbol_put+0x60/0x60
[  243.032194]  ? SYSC_finit_module+0xc6/0xf0
[  243.032199]  ? do_syscall_64+0x57/0xc0
[  243.032204]  ? entry_SYSCALL64_slow_path+0x25/0x25
[  243.032211] INFO: task systemd-udevd:403 blocked for more than 120 seconds.
[  243.032215]       Tainted: P           OE   4.10.13-kernel-name #2
[  243.032218] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.032220] systemd-udevd   D    0   403    353 0x00000104
[  243.032224] Call Trace:
[  243.032229]  ? __schedule+0x233/0x6f0
[  243.032233]  ? kernfs_next_descendant_post+0x73/0x90
[  243.032236]  ? kernfs_link_sibling+0x80/0xb0
[  243.032240]  ? schedule+0x32/0x80
[  243.032244]  ? schedule_preempt_disabled+0xa/0x10
[  243.032248]  ? __mutex_lock_slowpath+0x1af/0x2c0
[  243.032253]  ? driver_probe_device+0x450/0x450
[  243.032258]  ? __driver_attach+0x56/0xe0
[  243.032263]  ? driver_probe_device+0x450/0x450
[  243.032266]  ? bus_for_each_dev+0x67/0xb0
[  243.032271]  ? bus_add_driver+0x16a/0x260
[  243.032275]  ? driver_register+0x57/0xc0
[  243.032286]  ? trace_event_define_fields_hda_pm+0x28/0x28 [snd_hda_intel]
[  243.032289]  ? do_one_initcall+0x4e/0x1a0
[  243.032293]  ? __vunmap+0x6d/0xc0
[  243.032296]  ? do_init_module+0x5b/0x1f6
[  243.032300]  ? load_module+0x2486/0x2980
[  243.032303]  ? __symbol_put+0x60/0x60
[  243.032308]  ? SYSC_finit_module+0xc6/0xf0
[  243.032312]  ? do_syscall_64+0x57/0xc0
[  243.032318]  ? entry_SYSCALL64_slow_path+0x25/0x25
[  363.860774] INFO: task systemd-udevd:391 blocked for more than 120 seconds.
[  363.860782]       Tainted: P           OE   4.10.13-kernel-name #2
[  363.860785] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  363.860788] systemd-udevd   D    0   391    353 0x00000104
[  363.860793] Call Trace:
[  363.860803]  ? __schedule+0x233/0x6f0
[  363.860810]  ? schedule+0x32/0x80
[  363.860816]  ? vga_get+0x110/0x1b0
[  363.860821]  ? wake_up_q+0x70/0x70
[  363.860907]  ? i915_disable_vga+0x45/0xa0 [i915]
[  363.860987]  ? intel_modeset_init+0x4ca/0x1780 [i915]
[  363.861067]  ? intel_i2c_reset+0x3e/0x40 [i915]
[  363.861133]  ? intel_setup_gmbus+0xf1/0x3b0 [i915]
[  363.861193]  ? i915_driver_load+0x9a9/0x1460 [i915]
[  363.861197]  ? acpi_dev_found+0x5f/0x70
[  363.861200]  ? local_pci_probe+0x3f/0x90
[  363.861203]  ? pci_device_probe+0x10c/0x160
[  363.861208]  ? driver_probe_device+0x2b2/0x450
[  363.861213]  ? __driver_attach+0xda/0xe0
[  363.861218]  ? driver_probe_device+0x450/0x450
[  363.861221]  ? bus_for_each_dev+0x67/0xb0
[  363.861226]  ? bus_add_driver+0x16a/0x260
[  363.861230]  ? driver_register+0x57/0xc0
[  363.861234]  ? 0xffffffffc190f000
[  363.861238]  ? do_one_initcall+0x4e/0x1a0
[  363.861242]  ? __vunmap+0x6d/0xc0
[  363.861246]  ? do_init_module+0x5b/0x1f6
[  363.861250]  ? load_module+0x2486/0x2980
[  363.861253]  ? __symbol_put+0x60/0x60
[  363.861259]  ? SYSC_finit_module+0xc6/0xf0
[  363.861263]  ? do_syscall_64+0x57/0xc0
[  363.861269]  ? entry_SYSCALL64_slow_path+0x25/0x25
[  363.861276] INFO: task systemd-udevd:403 blocked for more than 120 seconds.
[  363.861280]       Tainted: P           OE   4.10.13-kernel-name #2
[  363.861282] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  363.861285] systemd-udevd   D    0   403    353 0x00000104
[  363.861289] Call Trace:
[  363.861294]  ? __schedule+0x233/0x6f0
[  363.861298]  ? kernfs_next_descendant_post+0x73/0x90
[  363.861301]  ? kernfs_link_sibling+0x80/0xb0
[  363.861305]  ? schedule+0x32/0x80
[  363.861309]  ? schedule_preempt_disabled+0xa/0x10
[  363.861314]  ? __mutex_lock_slowpath+0x1af/0x2c0
[  363.861318]  ? driver_probe_device+0x450/0x450
[  363.861323]  ? __driver_attach+0x56/0xe0
[  363.861328]  ? driver_probe_device+0x450/0x450
[  363.861331]  ? bus_for_each_dev+0x67/0xb0
[  363.861335]  ? bus_add_driver+0x16a/0x260
[  363.861340]  ? driver_register+0x57/0xc0
[  363.861350]  ? trace_event_define_fields_hda_pm+0x28/0x28 [snd_hda_intel]
[  363.861354]  ? do_one_initcall+0x4e/0x1a0
[  363.861357]  ? __vunmap+0x6d/0xc0
[  363.861360]  ? do_init_module+0x5b/0x1f6
[  363.861364]  ? load_module+0x2486/0x2980
[  363.861367]  ? __symbol_put+0x60/0x60
[  363.861372]  ? SYSC_finit_module+0xc6/0xf0
[  363.861377]  ? do_syscall_64+0x57/0xc0
[  363.861383]  ? entry_SYSCALL64_slow_path+0x25/0x25
[  484.689943] INFO: task systemd-udevd:391 blocked for more than 120 seconds.
[  484.689950]       Tainted: P           OE   4.10.13-kernel-name #2
[  484.689953] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  484.689957] systemd-udevd   D    0   391    353 0x00000104
[  484.689962] Call Trace:
[  484.689973]  ? __schedule+0x233/0x6f0
[  484.689980]  ? schedule+0x32/0x80
[  484.689985]  ? vga_get+0x110/0x1b0
[  484.689992]  ? wake_up_q+0x70/0x70
[  484.690087]  ? i915_disable_vga+0x45/0xa0 [i915]
[  484.690167]  ? intel_modeset_init+0x4ca/0x1780 [i915]
[  484.690246]  ? intel_i2c_reset+0x3e/0x40 [i915]
[  484.690315]  ? intel_setup_gmbus+0xf1/0x3b0 [i915]
[  484.690378]  ? i915_driver_load+0x9a9/0x1460 [i915]
[  484.690382]  ? acpi_dev_found+0x5f/0x70
[  484.690385]  ? local_pci_probe+0x3f/0x90
[  484.690388]  ? pci_device_probe+0x10c/0x160
[  484.690394]  ? driver_probe_device+0x2b2/0x450
[  484.690399]  ? __driver_attach+0xda/0xe0
[  484.690403]  ? driver_probe_device+0x450/0x450
[  484.690407]  ? bus_for_each_dev+0x67/0xb0
[  484.690411]  ? bus_add_driver+0x16a/0x260
[  484.690416]  ? driver_register+0x57/0xc0
[  484.690420]  ? 0xffffffffc190f000
[  484.690424]  ? do_one_initcall+0x4e/0x1a0
[  484.690428]  ? __vunmap+0x6d/0xc0
[  484.690431]  ? do_init_module+0x5b/0x1f6
[  484.690435]  ? load_module+0x2486/0x2980
[  484.690438]  ? __symbol_put+0x60/0x60
[  484.690444]  ? SYSC_finit_module+0xc6/0xf0
[  484.690448]  ? do_syscall_64+0x57/0xc0
[  484.690455]  ? entry_SYSCALL64_slow_path+0x25/0x25
[  484.690461] INFO: task systemd-udevd:403 blocked for more than 120 seconds.
[  484.690465]       Tainted: P           OE   4.10.13-kernel-name #2
[  484.690468] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  484.690470] systemd-udevd   D    0   403    353 0x00000104
[  484.690474] Call Trace:
[  484.690479]  ? __schedule+0x233/0x6f0
[  484.690483]  ? kernfs_next_descendant_po...


As I remember I ran:
Code: Select all
fsck -fv /dev/md0

while booting from DVD and picking "rescue mode".

I don't know how to handle this. Temporary I'm shutting down my machine with Alt+SysRq+B and then when UEFI loads I push the Power button. Please help...
Thank you for your time.
Last edited by GeckoS on 2017-05-24 21:35, edited 3 times in total.
GeckoS
 
Posts: 10
Joined: 2017-04-20 15:50
Location: Poland

Re: Restarts & Shutdowns problem

Postby dasein » 2017-05-14 23:31

GeckoS wrote:I'm shutting down my machine with Alt+SysRq+B...

Using only 1/6th of the REISUB sequence is pointless. It does not provide a "clean" shutdown.
User avatar
dasein
 
Posts: 7775
Joined: 2011-03-04 01:06
Location: Terra Incantationum

Re: Restarts & Shutdowns problem

Postby pylkko » 2017-05-15 09:33

Please describe in detail what you did when you "solved the GPU error"? Is it hanging for a specific amount of cycles or does it appear to do so indefinitely? HAve you tried zeroing the hang time as suggested by the log? Does it shutdown then or just loop forever?
User avatar
pylkko
 
Posts: 1045
Joined: 2014-11-06 19:02

Re: Restarts & Shutdowns problem

Postby GeckoS » 2017-05-15 21:44

Thanks for replies, I couldn't get here earlier.
About REISUB - you're right, I did not read information about it thoroughly. However I just tried to do REISUO which was suppose to do clean shutdown but it didn't - my machine was still working.

About my GPU error - I have to GPUs in my setup - GTX1050Ti for Linux and GTX1070 for Windows VM. I don't remember now what I changed but once when I ran my VM - it responded slowly to my instructions. After 1-2 minutes my general logs were full of this information:
Code: Select all
May 10 16:04:47 sphinx kernel: [  199.173005] vfio-pci 0000:01:00.0: BAR 1: can't reserve [mem 0xc0000000-0xcfffffff 64bit pref]

which was repeated and repeated.
I found solution for this error here:https://www.redhat.com/archives/vfio-users/2016-March/msg00088.html
which is:
Code: Select all
echo 0 > /sys/class/vtconsole/vtcon0/bind
echo 0 > /sys/class/vtconsole/vtcon1/bind
echo efi-framebuffer.0 > /sys/bus/platform/drivers/efi-framebuffer/unbind


I created rc.local (which I didn't find in /etc - that surprised me a little) and put inside the script that executes these commands:
Code: Select all
root@sphinx:/home/w# cat /etc/rc.local
#!/bin/sh
#
# This script will be executed *after* all the other init scripts.
# You can put your own initialization stuff in here if you don't
# want to do the full Sys V style init stuff.

bash /home/w/Pulpit/KVM/scripts/framebuffer-unbind.sh
bash /home/w/Pulpit/KVM/scripts/ksm.sh

exit 0


Code: Select all
root@sphinx:/home/w/Pulpit/KVM/scripts# cat framebuffer-unbind.sh
echo 0 > /sys/class/vtconsole/vtcon0/bind
echo 0 > /sys/class/vtconsole/vtcon1/bind
echo efi-framebuffer.0 > /sys/bus/platform/drivers/efi-framebuffer/unbind
virsh net-start default


Code: Select all
root@sphinx:/home/w/Pulpit/KVM/scripts# cat ksm.sh
echo 1 > /sys/kernel/mm/ksm/run


Info about second script I found here: https://dnaeon.github.io/enable-ksm-during-boot-time-on-linux/

The main error appears constantly.

EDIT:
I think I zeroed the hang time as the log suggested but I did it again and now there is no errors in dmesg (for 10 minutes at least) and now I can restart or shutdown my machine NORMALLY :).
What I want to add is - what I also did about two days ago was removing nvidia-drivers, xorg and virtualization tools. When there was no X - I could shutdown or restart my machine normally with "shutdown now" or "reboot". When I installed those things again - problem came back.
Problem solved. Thank you Pylkko for suggesting me zeroing the hang time.
Last edited by GeckoS on 2017-05-17 20:15, edited 1 time in total.
GeckoS
 
Posts: 10
Joined: 2017-04-20 15:50
Location: Poland

Re: Restarts & Shutdowns problem

Postby GeckoS » 2017-05-17 20:12

Unfortunately my problem came back. Now even this:
Code: Select all
echo 0 > /proc/sys/kernel/hung_task_timeout_secs

doesn't help.
GeckoS
 
Posts: 10
Joined: 2017-04-20 15:50
Location: Poland

Re: Restarts & Shutdowns problem

Postby GeckoS » 2017-05-24 21:35

I found workaround that is good for me.
So... Except this error:
Code: Select all
[  243.032211] INFO: task systemd-udevd:403 blocked for more than 120 seconds.

I found in logs another that wasn't showing very often:
Code: Select all
[ 1088.613369] INFO: task modprobe:429 blocked for more than 120 seconds.

So with below command I found this:
Code: Select all
root@sphinx:/home/w# ps ax | grep 429
429 ?        D      0:00 /sbin/modprobe -q -- i915

Also when I tried to
Code: Select all
modprobe i915

the process was hunging.

As far as I'm not using iGPU I added it to pci_stub module in GRUB so it can take that over before i915 module.
Code: Select all
root@sphinx:/home/w# cat /etc/default/grub | grep pci_stub
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on pcie_acs_override=downstream pci_stub.ids=10de:1b81,10de:10f0,8086:5912 CONFIG_KSM"

Code: Select all
update-grub


I also had to add pci_stub to /etc/initramfs-tools/modules and execute: update-initramfs -u .

Code: Select all
root@sphinx:/home/w# lspci -nnk
00:02.0 VGA compatible controller [0300]: Intel Corporation Device [8086:5912] (rev 04)
   Subsystem: ASUSTeK Computer Inc. Device [1043:872f]
   Kernel driver in use: pci-stub
   Kernel modules: i915


Now my machine restarts/shutdowns normally.
GeckoS
 
Posts: 10
Joined: 2017-04-20 15:50
Location: Poland


Return to Beginners Questions

Who is online

Users browsing this forum: No registered users and 2 guests

fashionable