Scheduled Maintenance: We are aware of an issue with Google, AOL, and Yahoo services as email providers which are blocking new registrations. We are trying to fix the issue and we have several internal and external support tickets in process to resolve the issue. Please see: viewtopic.php?t=158230

 

 

 

Restarts & Shutdowns problem [SOLVED]

New to Debian (Or Linux in general)? Ask your questions here!
Post Reply
Message
Author
GeckoS
Posts: 11
Joined: 2017-04-20 15:50

Restarts & Shutdowns problem [SOLVED]

#1 Post by GeckoS »

Hi!

I've got problem with my machine as when I want to shutdown or restart it - it closes X, the screen goes black but machine is still running. Problem started when my root partition became full because of problems with GPU pass-through to virtual machine running on libvirt - logs filled the root partition. I solved the GPU error, cleaned logs in /var/log with this command:

Code: Select all

rm -v /var/log/syslo* && rm -v /var/log/kern.lo*
but the restarts/shutdowns problem showed up. Below are some details of my setup.
There are three main software-RAID0 partitions:

Code: Select all

w@sphinx:~$ lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
nvme1n1     259:0    0   477G  0 disk  
├─nvme1n1p1 259:2    0   571M  0 part  
├─nvme1n1p2 259:3    0    14G  0 part  
│ └─md0       9:0    0  27,9G  0 raid0 /
├─nvme1n1p3 259:4    0   2,8G  0 part  
│ └─md1       9:1    0   5,6G  0 raid0 [SWAP]
└─nvme1n1p4 259:5    0   413G  0 part  
  └─md2       9:2    0 825,9G  0 raid0 /home
nvme0n1     259:1    0   477G  0 disk  
├─nvme0n1p1 259:6    0   571M  0 part  /boot/efi
├─nvme0n1p2 259:7    0    14G  0 part  
│ └─md0       9:0    0  27,9G  0 raid0 /
├─nvme0n1p3 259:8    0   2,8G  0 part  
│ └─md1       9:1    0   5,6G  0 raid0 [SWAP]
└─nvme0n1p4 259:9    0   413G  0 part  
  └─md2       9:2    0 825,9G  0 raid0 /home
System is Stretch Testing with updated kernel:

Code: Select all

w@sphinx:~$ uname -a
Linux sphinx 4.10.13-kernel-name #2 SMP Sun Apr 30 03:35:57 CEST 2017 x86_64 GNU/Linux
Below is part of dmesg where at the end there are repetitive errors:

Code: Select all

[    2.935470] AVX2 version of gcm_enc/dec engaged.
[    2.935470] AES CTR mode by8 optimization enabled
[    2.937031] Error: Driver 'pcspkr' is already registered, aborting...
[    2.937624] i915 0000:00:02.0: enabling device (0004 -> 0007)
[    2.938123] [drm] Memory usable by graphics device = 4096M
[    2.938123] [drm] VT-d active for gfx access
[    2.938132] checking generic (c0000000 300000) vs hw (90000000 10000000)
[    2.938139] [drm] Replacing VGA console driver
[    2.948569] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    2.948569] [drm] Driver supports precise vblank timestamp query.
[    2.964246] i915 0000:00:02.0: Direct firmware load for i915/kbl_dmc_ver1_01.bin failed with error -2
[    2.964248] i915 0000:00:02.0: Failed to load DMC firmware [https://01.org/linuxgraphics/intel-linux-graphics-firmwares], disabling runtime power management.
[    2.964777] Adding 5855228k swap on /dev/md1.  Priority:-1 extents:1 across:5855228k SSFS
[    3.004402] pstore: using zlib compression
[    3.005815] alg: No test for pcbc(aes) (pcbc-aes-aesni)
[    3.005836] asus_wmi: ASUS WMI generic driver loaded
[    3.007227] asus_wmi: Initialization: 0x0
[    3.007245] asus_wmi: BIOS WMI version: 0.9
[    3.007271] asus_wmi: SFUN value: 0x0
[    3.008428] input: Eee PC WMI hotkeys as /devices/platform/eeepc-wmi/input/input12
[    3.008711] asus_wmi: Number of fans: 1
[    3.017106] pstore: Registered efi as persistent store backend
[    3.019128] intel_rapl: Found RAPL domain package
[    3.019128] intel_rapl: Found RAPL domain core
[    3.019129] intel_rapl: Found RAPL domain uncore
[    3.019129] intel_rapl: Found RAPL domain dram
[    3.133838] EXT4-fs (md2): mounted filesystem with ordered data mode. Opts: (null)
[    3.263777] random: crng init done
[    3.833620] ip6_tables: (C) 2000-2006 Netfilter Core Team
[    3.840111] Ebtables v2.0 registered
[    3.951154] IPv6: ADDRCONF(NETDEV_UP): enp0s31f6: link is not ready
[    4.033817] nvidia-modeset: Allocated GPU:0 (GPU-acf0f085-c568-9442-ad62-ecb2429dec4a) @ PCI:0000:02:00.0
[    4.168303] IPv6: ADDRCONF(NETDEV_UP): enp0s31f6: link is not ready
[    5.727961] e1000e: enp0s31f6 NIC Link is Up 100 Mbps Full Duplex, Flow Control: Rx/Tx
[    5.728021] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s31f6: link becomes ready
[   11.038694] Console: switching to colour dummy device 80x25
[   11.048130] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[   11.049024] tun: Universal TUN/TAP device driver, 1.6
[   11.049024] tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
[   11.140235] virbr0: port 1(virbr0-nic) entered blocking state
[   11.140236] virbr0: port 1(virbr0-nic) entered disabled state
[   11.140270] device virbr0-nic entered promiscuous mode
[   11.266457] nf_conntrack version 0.5.0 (65536 buckets, 262144 max)
[   11.275597] virbr0: port 1(virbr0-nic) entered blocking state
[   11.275598] virbr0: port 1(virbr0-nic) entered listening state
[   11.309861] virbr0: port 1(virbr0-nic) entered disabled state
[  243.031718] INFO: task systemd-udevd:391 blocked for more than 120 seconds.
[  243.031725]       Tainted: P           OE   4.10.13-kernel-name #2
[  243.031728] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.031731] systemd-udevd   D    0   391    353 0x00000104
[  243.031737] Call Trace:
[  243.031747]  ? __schedule+0x233/0x6f0
[  243.031754]  ? schedule+0x32/0x80
[  243.031759]  ? vga_get+0x110/0x1b0
[  243.031765]  ? wake_up_q+0x70/0x70
[  243.031850]  ? i915_disable_vga+0x45/0xa0 [i915]
[  243.031927]  ? intel_modeset_init+0x4ca/0x1780 [i915]
[  243.032002]  ? intel_i2c_reset+0x3e/0x40 [i915]
[  243.032069]  ? intel_setup_gmbus+0xf1/0x3b0 [i915]
[  243.032129]  ? i915_driver_load+0x9a9/0x1460 [i915]
[  243.032133]  ? acpi_dev_found+0x5f/0x70
[  243.032136]  ? local_pci_probe+0x3f/0x90
[  243.032139]  ? pci_device_probe+0x10c/0x160
[  243.032145]  ? driver_probe_device+0x2b2/0x450
[  243.032149]  ? __driver_attach+0xda/0xe0
[  243.032153]  ? driver_probe_device+0x450/0x450
[  243.032157]  ? bus_for_each_dev+0x67/0xb0
[  243.032162]  ? bus_add_driver+0x16a/0x260
[  243.032166]  ? driver_register+0x57/0xc0
[  243.032169]  ? 0xffffffffc190f000
[  243.032174]  ? do_one_initcall+0x4e/0x1a0
[  243.032177]  ? __vunmap+0x6d/0xc0
[  243.032181]  ? do_init_module+0x5b/0x1f6
[  243.032186]  ? load_module+0x2486/0x2980
[  243.032188]  ? __symbol_put+0x60/0x60
[  243.032194]  ? SYSC_finit_module+0xc6/0xf0
[  243.032199]  ? do_syscall_64+0x57/0xc0
[  243.032204]  ? entry_SYSCALL64_slow_path+0x25/0x25
[  243.032211] INFO: task systemd-udevd:403 blocked for more than 120 seconds.
[  243.032215]       Tainted: P           OE   4.10.13-kernel-name #2
[  243.032218] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.032220] systemd-udevd   D    0   403    353 0x00000104
[  243.032224] Call Trace:
[  243.032229]  ? __schedule+0x233/0x6f0
[  243.032233]  ? kernfs_next_descendant_post+0x73/0x90
[  243.032236]  ? kernfs_link_sibling+0x80/0xb0
[  243.032240]  ? schedule+0x32/0x80
[  243.032244]  ? schedule_preempt_disabled+0xa/0x10
[  243.032248]  ? __mutex_lock_slowpath+0x1af/0x2c0
[  243.032253]  ? driver_probe_device+0x450/0x450
[  243.032258]  ? __driver_attach+0x56/0xe0
[  243.032263]  ? driver_probe_device+0x450/0x450
[  243.032266]  ? bus_for_each_dev+0x67/0xb0
[  243.032271]  ? bus_add_driver+0x16a/0x260
[  243.032275]  ? driver_register+0x57/0xc0
[  243.032286]  ? trace_event_define_fields_hda_pm+0x28/0x28 [snd_hda_intel]
[  243.032289]  ? do_one_initcall+0x4e/0x1a0
[  243.032293]  ? __vunmap+0x6d/0xc0
[  243.032296]  ? do_init_module+0x5b/0x1f6
[  243.032300]  ? load_module+0x2486/0x2980
[  243.032303]  ? __symbol_put+0x60/0x60
[  243.032308]  ? SYSC_finit_module+0xc6/0xf0
[  243.032312]  ? do_syscall_64+0x57/0xc0
[  243.032318]  ? entry_SYSCALL64_slow_path+0x25/0x25
[  363.860774] INFO: task systemd-udevd:391 blocked for more than 120 seconds.
[  363.860782]       Tainted: P           OE   4.10.13-kernel-name #2
[  363.860785] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  363.860788] systemd-udevd   D    0   391    353 0x00000104
[  363.860793] Call Trace:
[  363.860803]  ? __schedule+0x233/0x6f0
[  363.860810]  ? schedule+0x32/0x80
[  363.860816]  ? vga_get+0x110/0x1b0
[  363.860821]  ? wake_up_q+0x70/0x70
[  363.860907]  ? i915_disable_vga+0x45/0xa0 [i915]
[  363.860987]  ? intel_modeset_init+0x4ca/0x1780 [i915]
[  363.861067]  ? intel_i2c_reset+0x3e/0x40 [i915]
[  363.861133]  ? intel_setup_gmbus+0xf1/0x3b0 [i915]
[  363.861193]  ? i915_driver_load+0x9a9/0x1460 [i915]
[  363.861197]  ? acpi_dev_found+0x5f/0x70
[  363.861200]  ? local_pci_probe+0x3f/0x90
[  363.861203]  ? pci_device_probe+0x10c/0x160
[  363.861208]  ? driver_probe_device+0x2b2/0x450
[  363.861213]  ? __driver_attach+0xda/0xe0
[  363.861218]  ? driver_probe_device+0x450/0x450
[  363.861221]  ? bus_for_each_dev+0x67/0xb0
[  363.861226]  ? bus_add_driver+0x16a/0x260
[  363.861230]  ? driver_register+0x57/0xc0
[  363.861234]  ? 0xffffffffc190f000
[  363.861238]  ? do_one_initcall+0x4e/0x1a0
[  363.861242]  ? __vunmap+0x6d/0xc0
[  363.861246]  ? do_init_module+0x5b/0x1f6
[  363.861250]  ? load_module+0x2486/0x2980
[  363.861253]  ? __symbol_put+0x60/0x60
[  363.861259]  ? SYSC_finit_module+0xc6/0xf0
[  363.861263]  ? do_syscall_64+0x57/0xc0
[  363.861269]  ? entry_SYSCALL64_slow_path+0x25/0x25
[  363.861276] INFO: task systemd-udevd:403 blocked for more than 120 seconds.
[  363.861280]       Tainted: P           OE   4.10.13-kernel-name #2
[  363.861282] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  363.861285] systemd-udevd   D    0   403    353 0x00000104
[  363.861289] Call Trace:
[  363.861294]  ? __schedule+0x233/0x6f0
[  363.861298]  ? kernfs_next_descendant_post+0x73/0x90
[  363.861301]  ? kernfs_link_sibling+0x80/0xb0
[  363.861305]  ? schedule+0x32/0x80
[  363.861309]  ? schedule_preempt_disabled+0xa/0x10
[  363.861314]  ? __mutex_lock_slowpath+0x1af/0x2c0
[  363.861318]  ? driver_probe_device+0x450/0x450
[  363.861323]  ? __driver_attach+0x56/0xe0
[  363.861328]  ? driver_probe_device+0x450/0x450
[  363.861331]  ? bus_for_each_dev+0x67/0xb0
[  363.861335]  ? bus_add_driver+0x16a/0x260
[  363.861340]  ? driver_register+0x57/0xc0
[  363.861350]  ? trace_event_define_fields_hda_pm+0x28/0x28 [snd_hda_intel]
[  363.861354]  ? do_one_initcall+0x4e/0x1a0
[  363.861357]  ? __vunmap+0x6d/0xc0
[  363.861360]  ? do_init_module+0x5b/0x1f6
[  363.861364]  ? load_module+0x2486/0x2980
[  363.861367]  ? __symbol_put+0x60/0x60
[  363.861372]  ? SYSC_finit_module+0xc6/0xf0
[  363.861377]  ? do_syscall_64+0x57/0xc0
[  363.861383]  ? entry_SYSCALL64_slow_path+0x25/0x25
[  484.689943] INFO: task systemd-udevd:391 blocked for more than 120 seconds.
[  484.689950]       Tainted: P           OE   4.10.13-kernel-name #2
[  484.689953] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  484.689957] systemd-udevd   D    0   391    353 0x00000104
[  484.689962] Call Trace:
[  484.689973]  ? __schedule+0x233/0x6f0
[  484.689980]  ? schedule+0x32/0x80
[  484.689985]  ? vga_get+0x110/0x1b0
[  484.689992]  ? wake_up_q+0x70/0x70
[  484.690087]  ? i915_disable_vga+0x45/0xa0 [i915]
[  484.690167]  ? intel_modeset_init+0x4ca/0x1780 [i915]
[  484.690246]  ? intel_i2c_reset+0x3e/0x40 [i915]
[  484.690315]  ? intel_setup_gmbus+0xf1/0x3b0 [i915]
[  484.690378]  ? i915_driver_load+0x9a9/0x1460 [i915]
[  484.690382]  ? acpi_dev_found+0x5f/0x70
[  484.690385]  ? local_pci_probe+0x3f/0x90
[  484.690388]  ? pci_device_probe+0x10c/0x160
[  484.690394]  ? driver_probe_device+0x2b2/0x450
[  484.690399]  ? __driver_attach+0xda/0xe0
[  484.690403]  ? driver_probe_device+0x450/0x450
[  484.690407]  ? bus_for_each_dev+0x67/0xb0
[  484.690411]  ? bus_add_driver+0x16a/0x260
[  484.690416]  ? driver_register+0x57/0xc0
[  484.690420]  ? 0xffffffffc190f000
[  484.690424]  ? do_one_initcall+0x4e/0x1a0
[  484.690428]  ? __vunmap+0x6d/0xc0
[  484.690431]  ? do_init_module+0x5b/0x1f6
[  484.690435]  ? load_module+0x2486/0x2980
[  484.690438]  ? __symbol_put+0x60/0x60
[  484.690444]  ? SYSC_finit_module+0xc6/0xf0
[  484.690448]  ? do_syscall_64+0x57/0xc0
[  484.690455]  ? entry_SYSCALL64_slow_path+0x25/0x25
[  484.690461] INFO: task systemd-udevd:403 blocked for more than 120 seconds.
[  484.690465]       Tainted: P           OE   4.10.13-kernel-name #2
[  484.690468] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  484.690470] systemd-udevd   D    0   403    353 0x00000104
[  484.690474] Call Trace:
[  484.690479]  ? __schedule+0x233/0x6f0
[  484.690483]  ? kernfs_next_descendant_po...
As I remember I ran:

Code: Select all

fsck -fv /dev/md0
while booting from DVD and picking "rescue mode".

I don't know how to handle this. Temporary I'm shutting down my machine with Alt+SysRq+B and then when UEFI loads I push the Power button. Please help...
Thank you for your time.
Last edited by GeckoS on 2017-05-24 21:35, edited 3 times in total.

User avatar
dasein
Posts: 7680
Joined: 2011-03-04 01:06
Location: Terra Incantationum

Re: Restarts & Shutdowns problem

#2 Post by dasein »

GeckoS wrote:I'm shutting down my machine with Alt+SysRq+B...
Using only 1/6th of the REISUB sequence is pointless. It does not provide a "clean" shutdown.

User avatar
pylkko
Posts: 1802
Joined: 2014-11-06 19:02

Re: Restarts & Shutdowns problem

#3 Post by pylkko »

Please describe in detail what you did when you "solved the GPU error"? Is it hanging for a specific amount of cycles or does it appear to do so indefinitely? HAve you tried zeroing the hang time as suggested by the log? Does it shutdown then or just loop forever?

GeckoS
Posts: 11
Joined: 2017-04-20 15:50

Re: Restarts & Shutdowns problem

#4 Post by GeckoS »

Thanks for replies, I couldn't get here earlier.
About REISUB - you're right, I did not read information about it thoroughly. However I just tried to do REISUO which was suppose to do clean shutdown but it didn't - my machine was still working.

About my GPU error - I have to GPUs in my setup - GTX1050Ti for Linux and GTX1070 for Windows VM. I don't remember now what I changed but once when I ran my VM - it responded slowly to my instructions. After 1-2 minutes my general logs were full of this information:

Code: Select all

May 10 16:04:47 sphinx kernel: [  199.173005] vfio-pci 0000:01:00.0: BAR 1: can't reserve [mem 0xc0000000-0xcfffffff 64bit pref]
which was repeated and repeated.
I found solution for this error here:https://www.redhat.com/archives/vfio-us ... 00088.html
which is:

Code: Select all

echo 0 > /sys/class/vtconsole/vtcon0/bind
echo 0 > /sys/class/vtconsole/vtcon1/bind
echo efi-framebuffer.0 > /sys/bus/platform/drivers/efi-framebuffer/unbind
I created rc.local (which I didn't find in /etc - that surprised me a little) and put inside the script that executes these commands:

Code: Select all

root@sphinx:/home/w# cat /etc/rc.local 
#!/bin/sh
#
# This script will be executed *after* all the other init scripts.
# You can put your own initialization stuff in here if you don't
# want to do the full Sys V style init stuff.

bash /home/w/Pulpit/KVM/scripts/framebuffer-unbind.sh
bash /home/w/Pulpit/KVM/scripts/ksm.sh

exit 0

Code: Select all

root@sphinx:/home/w/Pulpit/KVM/scripts# cat framebuffer-unbind.sh 
echo 0 > /sys/class/vtconsole/vtcon0/bind
echo 0 > /sys/class/vtconsole/vtcon1/bind
echo efi-framebuffer.0 > /sys/bus/platform/drivers/efi-framebuffer/unbind
virsh net-start default

Code: Select all

root@sphinx:/home/w/Pulpit/KVM/scripts# cat ksm.sh 
echo 1 > /sys/kernel/mm/ksm/run
Info about second script I found here: https://dnaeon.github.io/enable-ksm-dur ... -on-linux/

The main error appears constantly.

EDIT:
I think I zeroed the hang time as the log suggested but I did it again and now there is no errors in dmesg (for 10 minutes at least) and now I can restart or shutdown my machine NORMALLY :).
What I want to add is - what I also did about two days ago was removing nvidia-drivers, xorg and virtualization tools. When there was no X - I could shutdown or restart my machine normally with "shutdown now" or "reboot". When I installed those things again - problem came back.
Problem solved. Thank you Pylkko for suggesting me zeroing the hang time.
Last edited by GeckoS on 2017-05-17 20:15, edited 1 time in total.

GeckoS
Posts: 11
Joined: 2017-04-20 15:50

Re: Restarts & Shutdowns problem

#5 Post by GeckoS »

Unfortunately my problem came back. Now even this:

Code: Select all

echo 0 > /proc/sys/kernel/hung_task_timeout_secs
doesn't help.

GeckoS
Posts: 11
Joined: 2017-04-20 15:50

Re: Restarts & Shutdowns problem

#6 Post by GeckoS »

I found workaround that is good for me.
So... Except this error:

Code: Select all

[  243.032211] INFO: task systemd-udevd:403 blocked for more than 120 seconds.
I found in logs another that wasn't showing very often:

Code: Select all

[ 1088.613369] INFO: task modprobe:429 blocked for more than 120 seconds.
So with below command I found this:

Code: Select all

root@sphinx:/home/w# ps ax | grep 429
429 ?        D      0:00 /sbin/modprobe -q -- i915
Also when I tried to

Code: Select all

modprobe i915
the process was hunging.

As far as I'm not using iGPU I added it to pci_stub module in GRUB so it can take that over before i915 module.

Code: Select all

root@sphinx:/home/w# cat /etc/default/grub | grep pci_stub
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on pcie_acs_override=downstream pci_stub.ids=10de:1b81,10de:10f0,8086:5912 CONFIG_KSM"

Code: Select all

update-grub
I also had to add pci_stub to /etc/initramfs-tools/modules and execute: update-initramfs -u .

Code: Select all

root@sphinx:/home/w# lspci -nnk
00:02.0 VGA compatible controller [0300]: Intel Corporation Device [8086:5912] (rev 04)
	Subsystem: ASUSTeK Computer Inc. Device [1043:872f]
	Kernel driver in use: pci-stub
	Kernel modules: i915
Now my machine restarts/shutdowns normally.

Post Reply