So the issue is this: seemingly at random, once every few hours, the system becomes barely responsive, load average shoots into the 50-100 range, wa (iowait) sits at 90+%, and syslog has the following messages:
Oct 2 12:08:45 nikita kernel: [11480.032693] INFO: task md1_raid10:222 blocked for more than 120 seconds.
Oct 2 12:08:45 nikita kernel: [11480.032698] Tainted: P O 4.18.0-0.bpo.1-amd64 #1 Debian 4.18.6-1~bpo9+1
Oct 2 12:08:45 nikita kernel: [11480.032699] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 2 12:08:45 nikita kernel: [11480.032721] md1_raid10 D 0 222 2 0x80000000
Oct 2 12:08:45 nikita kernel: [11480.032725] Call Trace:
Oct 2 12:08:45 nikita kernel: [11480.032733] ? __schedule+0x3f5/0x880
Oct 2 12:08:45 nikita kernel: [11480.032737] schedule+0x32/0x80
Oct 2 12:08:45 nikita kernel: [11480.032745] md_super_wait+0x6e/0xa0 [md_mod]
Oct 2 12:08:45 nikita kernel: [11480.032751] ? remove_wait_queue+0x60/0x60
Oct 2 12:08:45 nikita kernel: [11480.032757] md_update_sb.part.61+0x4af/0x910 [md_mod]
Oct 2 12:08:45 nikita kernel: [11480.032764] md_check_recovery+0x312/0x530 [md_mod]
Oct 2 12:08:45 nikita kernel: [11480.032769] raid10d+0x64/0x1550 [raid10]
Oct 2 12:08:45 nikita kernel: [11480.032773] ? __schedule+0x3fd/0x880
Oct 2 12:08:45 nikita kernel: [11480.032776] ? schedule+0x32/0x80
Oct 2 12:08:45 nikita kernel: [11480.032779] ? schedule_timeout+0x1e5/0x350
Oct 2 12:08:45 nikita kernel: [11480.032785] ? md_thread+0x125/0x170 [md_mod]
Oct 2 12:08:45 nikita kernel: [11480.032790] md_thread+0x125/0x170 [md_mod]
Oct 2 12:08:45 nikita kernel: [11480.032794] ? remove_wait_queue+0x60/0x60
Oct 2 12:08:45 nikita kernel: [11480.032797] kthread+0xf8/0x130
Oct 2 12:08:45 nikita kernel: [11480.032803] ? md_rdev_init+0xc0/0xc0 [md_mod]
Oct 2 12:08:45 nikita kernel: [11480.032805] ? kthread_create_worker_on_cpu+0x70/0x70
Oct 2 12:08:45 nikita kernel: [11480.032809] ret_from_fork+0x35/0x40
Oct 2 12:08:45 nikita kernel: [11480.032813] INFO: task md2_raid10:225 blocked for more than 120 seconds.
Oct 2 12:08:45 nikita kernel: [11480.032815] Tainted: P O 4.18.0-0.bpo.1-amd64 #1 Debian 4.18.6-1~bpo9+1
Oct 2 12:08:45 nikita kernel: [11480.032816] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 2 12:08:45 nikita kernel: [11480.032818] md2_raid10 D 0 225 2 0x80000000
Oct 2 12:08:45 nikita kernel: [11480.032821] Call Trace:
Oct 2 12:08:45 nikita kernel: [11480.032824] ? __schedule+0x3f5/0x880
Oct 2 12:08:45 nikita kernel: [11480.032827] schedule+0x32/0x80
Oct 2 12:08:45 nikita kernel: [11480.032833] md_super_wait+0x6e/0xa0 [md_mod]
Oct 2 12:08:45 nikita kernel: [11480.032837] ? remove_wait_queue+0x60/0x60
Oct 2 12:08:45 nikita kernel: [11480.032842] write_page+0x177/0x330 [md_mod]
Oct 2 12:08:45 nikita kernel: [11480.032845] ? __schedule+0x3fd/0x880
Oct 2 12:08:45 nikita kernel: [11480.032849] ? __percpu_ref_switch_mode+0xd0/0x170
Oct 2 12:08:45 nikita kernel: [11480.032854] md_update_sb.part.61+0x408/0x910 [md_mod]
Oct 2 12:08:45 nikita kernel: [11480.032860] md_check_recovery+0x312/0x530 [md_mod]
Oct 2 12:08:45 nikita kernel: [11480.032864] raid10d+0x64/0x1550 [raid10]
Oct 2 12:08:45 nikita kernel: [11480.032868] ? lock_timer_base+0x74/0x90
Oct 2 12:08:45 nikita kernel: [11480.032871] ? try_to_del_timer_sync+0x4d/0x80
Oct 2 12:08:45 nikita kernel: [11480.032874] ? del_timer_sync+0x35/0x40
Oct 2 12:08:45 nikita kernel: [11480.032877] ? schedule_timeout+0x177/0x350
Oct 2 12:08:45 nikita kernel: [11480.032883] ? md_thread+0x125/0x170 [md_mod]
Oct 2 12:08:45 nikita kernel: [11480.032888] md_thread+0x125/0x170 [md_mod]
Oct 2 12:08:45 nikita kernel: [11480.032891] ? remove_wait_queue+0x60/0x60
Oct 2 12:08:45 nikita kernel: [11480.032894] kthread+0xf8/0x130
Oct 2 12:08:45 nikita kernel: [11480.032899] ? md_rdev_init+0xc0/0xc0 [md_mod]
Oct 2 12:08:45 nikita kernel: [11480.032902] ? kthread_create_worker_on_cpu+0x70/0x70
Oct 2 12:08:45 nikita kernel: [11480.032905] ret_from_fork+0x35/0x40
Does anybody have any idea what this is and how it can be fixed, or at least debugged? My google-fu failed to produce anything of value. All similar problems point to hardware, usually USB, which is not the case here. It's not that I need 4.18 right now, 4.9 works, but I would like to find the root of the problem before something forces me onto a newer kernel...
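
In case it helps with narrowing this down: below is a rough Python sketch for pulling the hung-task reports out of a syslog file, so the stalls can be counted per md thread and lined up against uptime. It only assumes the message format shown in the log above; the function names `parse_hung_tasks` and `summarize` are made up for this post and are not part of any tool.

```python
import re
from collections import Counter

# Matches the kernel hung-task header line, e.g.
# "[11480.032693] INFO: task md1_raid10:222 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+INFO: task (?P<task>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def parse_hung_tasks(lines):
    """Return one dict per hung-task report found in the given log lines."""
    reports = []
    for line in lines:
        match = HUNG_RE.search(line)
        if match:
            reports.append({
                "uptime": float(match.group("uptime")),  # seconds since boot
                "task": match.group("task"),             # e.g. md1_raid10
                "pid": int(match.group("pid")),
                "secs": int(match.group("secs")),        # hung_task timeout
            })
    return reports


def summarize(reports):
    """Count how many hung-task reports each task name produced."""
    return Counter(report["task"] for report in reports)
```

To use it, open the log (for example `/var/log/syslog`, the path is an assumption about the setup), pass the file object to `parse_hung_tasks`, and print `summarize(...)`. It does not diagnose anything by itself; it just makes it easier to see whether the stalls always hit the same arrays and how far apart they are.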