Boot failure with kernels > linux-image-6.4.0-2-amd64

cloudstrife9999
Posts: 9
Joined: 2015-03-11 14:17

#1 Post by cloudstrife9999 »

Hello,

I'm experiencing an error (see attached image) when trying to boot my Debian OS with any kernel more recent than linux-image-6.4.0-2-amd64.

For reference, my partitions are as follows:

Disk model: PC401 NVMe SK hynix 512GB
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 936619CA-4D20-4BD2-A255-4B742A86985A

Device             Start        End   Sectors   Size Type
/dev/nvme0n1p1      2048    1026047   1024000   500M EFI System
/dev/nvme0n1p2   1026048  657817599 656791552 313.2G Linux filesystem
/dev/nvme0n1p3 983447552 1000214527  16766976     8G Linux swap
/dev/nvme0n1p4 657817600  983447551 325629952 155.3G Microsoft basic data


I am still able to boot Debian with linux-image-6.4.0-2-amd64.
Is there anything I can do to fix the issue with newer kernels?
Attachments
Picture of the boot error

Aki
Global Moderator
Posts: 1876
Joined: 2014-07-20 18:12
Location: Europe
Has thanked: 38 times
Been thanked: 248 times

#2 Post by Aki »

Hello,
What’s your Debian version?
Which exact kernel version (Debian package) is involved?
Can you collect the full dmesg log using busybox and attach it to a follow-up message?
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Debian - The universal operating system
⢿⡄⠘⠷⠚⠋⠀ https://www.debian.org
⠈⠳⣄⠀

cloudstrife9999
Posts: 9
Joined: 2015-03-11 14:17

#3 Post by cloudstrife9999 »

Hi,

I am using Debian Sid.

All kernel versions after linux-image-6.4.0-2-amd64 seem to cause the bug. At the very least:
- linux-image-6.4.0-3-amd64
- linux-image-6.4.0-4-amd64
- linux-image-6.5.0-1-amd64

A dmesg dump can be found here: https://pastebin.com/bKnhMsYR

Aki
Global Moderator
Posts: 1876
Joined: 2014-07-20 18:12
Location: Europe
Has thanked: 38 times
Been thanked: 248 times

#4 Post by Aki »

Hello,
The dump is from 6.4.0-2-amd64.

Can you send the output of the dmesg command from the busybox shell, with the full kernel log of the error?

cloudstrife9999
Posts: 9
Joined: 2015-03-11 14:17

#5 Post by cloudstrife9999 »

I am unable to store the dmesg output from the busybox to a file that survives rebooting the OS.

I can upload a couple of pictures I took.

The first one is a picture of the tail of the output.
The second one is a picture showing the boot arguments among other things.
Attachments
dmesg_tail.jpg
dmesg_kernel.jpg

Aki
Global Moderator
Posts: 1876
Joined: 2014-07-20 18:12
Location: Europe
Has thanked: 38 times
Been thanked: 248 times

#6 Post by Aki »

cloudstrife9999 wrote: 2023-09-16 11:35 [..]
For reference, my partitions are as follows:

Code: Select all

Disk model: PC401 NVMe SK hynix 512GB               
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 936619CA-4D20-4BD2-A255-4B742A86985A

Device             Start        End   Sectors   Size Type
/dev/nvme0n1p1      2048    1026047   1024000   500M EFI System
/dev/nvme0n1p2   1026048  657817599 656791552 313.2G Linux filesystem
/dev/nvme0n1p3 983447552 1000214527  16766976     8G Linux swap
/dev/nvme0n1p4 657817600  983447551 325629952 155.3G Microsoft basic data
What is the file system in /dev/nvme0n1p2 ?
cloudstrife9999 wrote: 2023-09-16 15:41 I am unable to store the dmesg output from the busybox to a file that survives rebooting the OS.
You should be able to do so using a USB disk with an ext2/3/4-formatted partition. From the busybox console (where sdXn is the name of the partition on the USB device, found with the command "cat /proc/partitions" or the command "blkid"):

Code: Select all

mkdir /mnt
mount /dev/sdXn /mnt
# write the log onto the mounted USB partition, not into the initramfs
dmesg > /mnt/dmesg_dump
umount /mnt
From the "tail photo" you sent, there is an error for the nvme controller:
nvme_controller_error.png
The image says:

Code: Select all

nvme nvme0: I/O 12 QID 0 timeout, disable controller
nvme nvme0: Identify Controller failed (-4)
nvme: probe of 0000:04:00.0 failed with error -5
It seems that newer kernel versions fail to identify the nvme controller.
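
As a rough sketch of how these messages could be isolated from a saved log instead of a photo (assuming the log was written to a file such as dmesg_dump; the function names nvme_lines and nvme_failures are made up for illustration):

```shell
#!/bin/sh
# Print every NVMe-related line from a saved kernel log.
# $1 is the path to the saved log (for example the dmesg_dump file).
nvme_lines() {
    grep -i 'nvme' "$1"
}

# Narrow that down to lines that look like failures.
# The keywords are an assumption based on the messages quoted above.
nvme_failures() {
    nvme_lines "$1" | grep -iE 'timeout|failed'
}

# Usage (on the working system, after copying the file from the USB disk):
#   nvme_failures /path/to/dmesg_dump
```

Comparing the nvme_lines output from the working and the failing kernel should show at which point the controller probe diverges.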

The second error message is generated in this kernel function:

Some questions (the controller should be embedded in the NVMe disk):
  • What is your NVMe disk (manufacturer, model)?
  • Can you send the output of the following command:

    Code: Select all

    inxi -Fxxxz
    
Please, use code tags to include commands and logs in the follow-up messages.

cloudstrife9999
Posts: 9
Joined: 2015-03-11 14:17

#7 Post by cloudstrife9999 »

- The filesystem in /dev/nvme0n1p2 is ext4.

- The output from dmesg from the busybox can be found here (it is too large to post here, even with code tags) : https://pastebin.com/T9Fxhvda

- The output from

Code: Select all

inxi -Fxxxz
(which should contain the answer to the manufacturer/model questions) is as follows:

Code: Select all

System:
  Kernel: 6.4.0-2-amd64 arch: x86_64 bits: 64 compiler: gcc v: 13.2.0
    clocksource: tsc Desktop: KDE Plasma v: 5.27.8 tk: Qt v: 5.15.10
    info: cairo-dock wm: kwin_x11 vt: 2 dm: SDDM Distro: Debian GNU/Linux
    trixie/sid
Machine:
  Type: Laptop System: Dell product: XPS 15 9560 v: N/A
    serial: <superuser required> Chassis: type: 10 serial: <superuser required>
  Mobo: Dell model: 05FFDN v: A00 serial: <superuser required> UEFI: Dell
    v: 1.24.0 date: 08/10/2021
Battery:
  ID-1: BAT0 charge: 61.2 Wh (100.0%) condition: 61.2/97.0 Wh (63.1%)
    volts: 12.5 min: 11.4 model: SMP DELL GPM0365 type: Li-ion serial: <filter>
    status: full
CPU:
  Info: quad core model: Intel Core i7-7700HQ bits: 64 type: MT MCP
    smt: enabled arch: Kaby Lake rev: 9 cache: L1: 256 KiB L2: 1024 KiB
    L3: 6 MiB
  Speed (MHz): avg: 2050 high: 2800 min/max: 800/3800 cores: 1: 2800 2: 800
    3: 805 4: 2800 5: 800 6: 2800 7: 2800 8: 2800 bogomips: 44798
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
Graphics:
  Device-1: Intel HD Graphics 630 vendor: Dell driver: i915 v: kernel
    arch: Gen-9.5 ports: active: eDP-1 empty: DP-1, DP-2, HDMI-A-1, HDMI-A-2
    bus-ID: 00:02.0 chip-ID: 8086:591b class-ID: 0300
  Device-2: NVIDIA GP107M [GeForce GTX 1050 Mobile] vendor: Dell
    driver: nouveau v: kernel arch: Pascal pcie: speed: 2.5 GT/s lanes: 16
    bus-ID: 01:00.0 chip-ID: 10de:1c8d class-ID: 0302 temp: 40.0 C
  Device-3: Sunplus Innovation Integrated_Webcam_HD driver: uvcvideo
    type: USB rev: 2.0 speed: 480 Mb/s lanes: 1 bus-ID: 1-12:5
    chip-ID: 1bcf:2b95 class-ID: 0e02
  Display: x11 server: X.Org v: 1.21.1.8 with: Xwayland v: 23.2.0
    compositor: kwin_x11 driver: X: loaded: modesetting unloaded: fbdev,vesa
    dri: iris gpu: i915 display-ID: :0 screens: 1
  Screen-1: 0 s-res: 1920x1080 s-dpi: 96 s-size: 507x285mm (19.96x11.22")
    s-diag: 582mm (22.9")
  Monitor-1: eDP-1 model: Sharp 0x1476 res: 1920x1080 hz: 60 dpi: 141
    size: 346x194mm (13.62x7.64") diag: 397mm (15.6") modes: 3840x2160
  API: OpenGL v: 4.6 Mesa 23.1.7-1 renderer: Mesa Intel HD Graphics 630
    (KBL GT2) direct-render: Yes
Audio:
  Device-1: Intel CM238 HD Audio vendor: Dell driver: snd_hda_intel v: kernel
    bus-ID: 00:1f.3 chip-ID: 8086:a171 class-ID: 0403
  API: ALSA v: k6.4.0-2-amd64 status: kernel-api
  Server-1: JACK v: 1.9.21 status: off
  Server-2: PipeWire v: 0.3.80 status: active with: 1: pipewire-pulse
    status: active 2: wireplumber status: active
  Server-3: PulseAudio v: 16.1 status: off (using pipewire-pulse)
Network:
  Device-1: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter
    vendor: Rivet Networks Killer Wireless-n/a/ac 1535 driver: ath10k_pci
    v: kernel pcie: speed: 2.5 GT/s lanes: 1 bus-ID: 02:00.0
    chip-ID: 168c:003e class-ID: 0280 temp: 61.0 C
  IF: wlp2s0 state: up mac: <filter>
Bluetooth:
  Device-1: Qualcomm Atheros QCA61x4 Bluetooth 4.0 driver: btusb v: 0.8
    type: USB rev: 2.0 speed: 12 Mb/s lanes: 1 bus-ID: 1-4:3 chip-ID: 0cf3:e300
    class-ID: e001
  Report: hciconfig ID: hci0 rfk-id: 0 state: up address: <filter> bt-v: 4.2
    lmp-v: 8 sub-v: 25a hci-v: 8 class-ID: 7c010c
Drives:
  Local Storage: total: 484.42 GiB used: 225.16 GiB (46.5%)
  ID-1: /dev/nvme0n1 vendor: SK Hynix model: PC401 NVMe 512GB
    size: 476.94 GiB speed: 31.6 Gb/s lanes: 4 tech: SSD serial: <filter>
    fw-rev: 80002E00 temp: 29.9 C scheme: GPT
  ID-2: /dev/sda vendor: Transcend model: JetFlash 8GB size: 7.48 GiB
    type: USB rev: 2.0 spd: 480 Mb/s lanes: 1 tech: SSD serial: <filter>
    fw-rev: 8.07 scheme: MBR
Partition:
  ID-1: / size: 307.2 GiB used: 225.08 GiB (73.3%) fs: ext4
    dev: /dev/nvme0n1p2
  ID-2: /boot/efi size: 496 MiB used: 75.1 MiB (15.2%) fs: vfat
    dev: /dev/nvme0n1p1
Swap:
  ID-1: swap-1 type: partition size: 8 GiB used: 0 KiB (0.0%) priority: -2
    dev: /dev/nvme0n1p3
Sensors:
  System Temperatures: cpu: 43.0 C pch: 43.0 C mobo: 39.0 C gpu: nouveau
    temp: 40.0 C
  Fan Speeds (rpm): cpu: 2504 fan-2: 2510
Info:
  Processes: 281 Uptime: 2m wakeups: 7672 Memory: total: 16 GiB
  available: 15.47 GiB used: 2.68 GiB (17.4%) Init: systemd v: 254
  target: graphical (5) default: graphical Compilers: gcc: 13.2.0
  alt: 10/11/12/13/9 Packages: pm: dpkg pkgs: 4444 Shell: Zsh v: 5.9
  running-in: konsole inxi: 3.3.29

sunrat
Administrator
Posts: 5903
Joined: 2006-08-29 09:12
Location: Melbourne, Australia
Has thanked: 102 times
Been thanked: 353 times

#8 Post by sunrat »

Moved to Testing/Unstable subforum
“ computer users can be divided into 2 categories:
Those who have lost data
...and those who have not lost data YET ”
Remember to BACKUP!

Aki
Global Moderator
Posts: 1876
Joined: 2014-07-20 18:12
Location: Europe
Has thanked: 38 times
Been thanked: 248 times

#9 Post by Aki »

@cloudstrife9999:
This is your ssd according to the previous message:

Code: Select all

Drives:
  Local Storage: total: 484.42 GiB used: 225.16 GiB (46.5%)
  ID-1: /dev/nvme0n1 vendor: SK Hynix model: PC401 NVMe 512GB
    size: 476.94 GiB speed: 31.6 Gb/s lanes: 4 tech: SSD serial: <filter>
    fw-rev: 80002E00 temp: 29.9 C scheme: GPT
I suppose it is a kernel regression affecting the controller of your NVMe device.

A quick search led me to this link, and particularly to this reply:
emmanuelrosa September 5, 2023, 6:17pm 17

I had a similar issue a long time ago. I got around it with this:

kernelParams = [
"nvme_core.default_ps_max_latency_us=5500"
];

cloudstrife9999
Posts: 9
Joined: 2015-03-11 14:17

#10 Post by cloudstrife9999 »

@Aki:
Thank you for your suggestion.

However, after adding

Code: Select all

nvme_core.default_ps_max_latency_us=5500
to the boot parameters in /etc/default/grub, reloading the grub configuration, and confirming (via grub interface) that the edit was successful, I am still presented with the very same issue.
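
For reference, a minimal sketch of the steps described above, assuming the stock Debian layout (/etc/default/grub plus update-grub) and "quiet" as the existing default value. has_param is a hypothetical helper, not an existing tool; it checks whether a parameter appears on a kernel command line and can be pointed at /proc/cmdline:

```shell
#!/bin/sh
# 1. In /etc/default/grub, append the parameter to the existing line, e.g.:
#      GRUB_CMDLINE_LINUX_DEFAULT="quiet nvme_core.default_ps_max_latency_us=5500"
#    ("quiet" is assumed to be the previous value)
# 2. Regenerate the GRUB configuration:
#      sudo update-grub

# has_param FILE PARAM: succeed if PARAM is one of the space-separated
# words in FILE (hypothetical helper for illustration).
has_param() {
    tr ' ' '\n' < "$1" | grep -qxF -- "$2"
}

# Usage after booting the kernel in question:
#   has_param /proc/cmdline nvme_core.default_ps_max_latency_us=5500 && echo "parameter is active"
```

Checking /proc/cmdline confirms what the running kernel actually received, independently of what the GRUB menu shows.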

Aki
Global Moderator
Posts: 1876
Joined: 2014-07-20 18:12
Location: Europe
Has thanked: 38 times
Been thanked: 248 times

#11 Post by Aki »

Can you report the complete output of the following commands (to get the PCI IDs and other identification data for the NVMe device; the nvme-cli package must be installed)?

Code: Select all

lspci -vnn
sudo nvme id-ctrl /dev/nvme0
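
If the full lspci output is too long to post, a sketch like the following could keep only the NVMe entry. It assumes that lspci -v separates devices with blank lines and that NVMe controllers are listed with PCI class [0108]; nvme_pci_records is a made-up name:

```shell
#!/bin/sh
# Read `lspci -vnn` output on stdin and print only the blank-line-separated
# device records whose text contains the NVMe class code [0108].
nvme_pci_records() {
    awk -v RS= -v ORS='\n\n' '/\[0108\]/'
}

# Usage:
#   lspci -vnn | nvme_pci_records
```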
Can you also report the output of the following command under kernels linux-image-6.4.0-2-amd64 and linux-image-6.4.0-3-amd64 (to get the upstream version of each kernel)?

Code: Select all

uname -a
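
A small sketch for pulling the package version out of that output, assuming the usual Debian build string of the form "#1 SMP ... Debian <version> (<date>)"; debian_kernel_version is a hypothetical helper name:

```shell
#!/bin/sh
# Read a `uname -v` (or `uname -a`) line on stdin and print the word that
# follows "Debian", which on Debian kernels is the package version
# carrying the upstream release number.
debian_kernel_version() {
    sed -n 's/.*Debian \([^ ]*\).*/\1/p'
}

# Usage:
#   uname -v | debian_kernel_version
```

On a kernel that only reaches the busybox shell, uname should still be available there, so the same information can be read off the screen.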
