Hi,
After what? Do you mean that the OS crashes when running from the backup HDD, or is it crashing after making a backup (but running from the original LVM setup)? Is that backup HDD still connected? Did you change the kernel in the meantime (backports, the wonderful Liquorix kernel, etc.)?
The OS crashes after the system has been running for a few days. There are no backup HDDs involved and no backup is being made, as the 2 × 4 TB drives are mirrored to the 8 TB drive via LVM. I am using the default Debian Buster kernel, at the moment version 4.19.0-11:
Code:
# dpkg -l | grep -i linux-image-`uname -r`
ii linux-image-4.19.0-11-amd64 4.19.146-1 amd64 Linux 4.19 for 64-bit PCs (signed)
# uname -a
Linux elite8300sff 4.19.0-11-amd64 #1 SMP Debian 4.19.146-1 (2020-09-17) x86_64 GNU/Linux
What is your PSU wattage? Can it handle increased load?
The PSU is 240 W and, as I mentioned in the first post, the machine ran an 8+ hour stress test without problems and could sustain the load.
Btw, how did you check the RAM? Debian's memtest does not work with UEFI systems. What BIOS version are you using?
RAM was tested with MemTest86 V8.4 (a correction to my post #1; https://www.memtest86.com/). The BIOS version is K01 v03.08.
What I noticed is that the crash occurs when I have a KVM instance of Kali Linux running (1 core, 2 GB RAM). Since the initial post I have disabled the virtual machine, but today I checked the output of dmesg and saw that there has been another crash, and this time I hadn't run any CPU- or resource-intensive software that could have caused it.
Code:
[Tue Sep 29 11:28:06 2020] rcu: INFO: rcu_sched self-detected stall on CPU
[Tue Sep 29 11:28:06 2020] rcu: 4-....: (5250 ticks this GP) idle=25e/1/0x4000000000000002 softirq=1072544/1072544 fqs=2625
[Tue Sep 29 11:28:06 2020] rcu: (t=5251 jiffies g=9116249 q=61515)
[Tue Sep 29 11:28:06 2020] NMI backtrace for cpu 4
[Tue Sep 29 11:28:06 2020] CPU: 4 PID: 65 Comm: kswapd0 Kdump: loaded Not tainted 4.19.0-11-amd64 #1 Debian 4.19.146-1
[Tue Sep 29 11:28:06 2020] Hardware name: Hewlett-Packard HP Compaq Elite 8300 SFF/3397, BIOS K01 v03.08 04/10/2019
[Tue Sep 29 11:28:06 2020] Call Trace:
[Tue Sep 29 11:28:06 2020] <IRQ>
[Tue Sep 29 11:28:06 2020] dump_stack+0x66/0x90
[Tue Sep 29 11:28:06 2020] nmi_cpu_backtrace.cold.4+0x13/0x50
[Tue Sep 29 11:28:06 2020] ? lapic_can_unplug_cpu.cold.33+0x37/0x37
[Tue Sep 29 11:28:06 2020] nmi_trigger_cpumask_backtrace+0xf9/0xfb
[Tue Sep 29 11:28:06 2020] rcu_dump_cpu_stacks+0x9b/0xcb
[Tue Sep 29 11:28:06 2020] rcu_check_callbacks.cold.81+0x1db/0x335
[Tue Sep 29 11:28:06 2020] ? tick_sched_do_timer+0x60/0x60
[Tue Sep 29 11:28:06 2020] update_process_times+0x28/0x60
[Tue Sep 29 11:28:06 2020] tick_sched_handle+0x22/0x60
[Tue Sep 29 11:28:06 2020] tick_sched_timer+0x37/0x70
[Tue Sep 29 11:28:06 2020] __hrtimer_run_queues+0x100/0x280
[Tue Sep 29 11:28:06 2020] hrtimer_interrupt+0x100/0x220
[Tue Sep 29 11:28:06 2020] smp_apic_timer_interrupt+0x6a/0x140
[Tue Sep 29 11:28:06 2020] apic_timer_interrupt+0xf/0x20
[Tue Sep 29 11:28:06 2020] </IRQ>
[Tue Sep 29 11:28:06 2020] RIP: 0010:scan_swap_map_try_ssd_cluster+0x129/0x140
[Tue Sep 29 11:28:06 2020] Code: ff ff 41 8b 46 70 48 8b 0c 24 48 89 01 49 89 45 00 e9 08 ff ff ff 4d 85 e4 74 af ba 01 00 00 00 eb 9a 48 83 c4 08 31 c0 5b 5d <41> 5c 41 5d 41 5e 41 5f c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
[Tue Sep 29 11:28:06 2020] RSP: 0018:ffffac21437eb980 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[Tue Sep 29 11:28:06 2020] RAX: 0000000000000000 RBX: ffff906ab9ba8ad4 RCX: ffffac21437eba88
[Tue Sep 29 11:28:06 2020] RDX: ffffac21437eb9c8 RSI: ffffac21437eb9c0 RDI: ffff906ab9ba8a00
[Tue Sep 29 11:28:06 2020] RBP: ffff906ab2f1ae00 R08: ffff906aba504770 R09: ffffffffbc92de20
[Tue Sep 29 11:28:06 2020] R10: ffffffffbc92dde0 R11: 000000000000000e R12: ffff906ab9ba8a00
[Tue Sep 29 11:28:06 2020] R13: ffffac21437eb9c8 R14: ffff906ab9ba8a00 R15: ffffcc213fd1ef38
[Tue Sep 29 11:28:06 2020] scan_swap_map_slots+0x6d/0x670
[Tue Sep 29 11:28:06 2020] get_swap_pages+0x20a/0x360
[Tue Sep 29 11:28:06 2020] get_swap_page+0x141/0x210
[Tue Sep 29 11:28:06 2020] shmem_writepage+0x80/0x320
[Tue Sep 29 11:28:06 2020] ? radix_tree_delete_item+0x69/0xc0
[Tue Sep 29 11:28:06 2020] pageout.isra.49+0x117/0x340
[Tue Sep 29 11:28:06 2020] shrink_page_list+0xa47/0xc70
[Tue Sep 29 11:28:06 2020] shrink_inactive_list+0x207/0x590
[Tue Sep 29 11:28:06 2020] shrink_node_memcg+0x20c/0x7b0
[Tue Sep 29 11:28:06 2020] shrink_node+0xcf/0x450
[Tue Sep 29 11:28:06 2020] kswapd+0x3cd/0x6e0
[Tue Sep 29 11:28:06 2020] ? mem_cgroup_shrink_node+0x170/0x170
[Tue Sep 29 11:28:06 2020] kthread+0x112/0x130
[Tue Sep 29 11:28:06 2020] ? kthread_bind+0x30/0x30
[Tue Sep 29 11:28:06 2020] ret_from_fork+0x35/0x40
[Tue Sep 29 11:38:42 2020] perf: interrupt took too long (3958 > 3947), lowering kernel.perf_event_max_sample_rate to 50500
[Tue Sep 29 12:48:59 2020] rcu: INFO: rcu_sched self-detected stall on CPU
[Tue Sep 29 12:48:59 2020] rcu: 0-....: (5249 ticks this GP) idle=0de/1/0x4000000000000002 softirq=2053707/2053707 fqs=2625
[Tue Sep 29 12:48:59 2020] rcu: (t=5250 jiffies g=10420061 q=39909)
[Tue Sep 29 12:48:59 2020] NMI backtrace for cpu 0
[Tue Sep 29 12:48:59 2020] CPU: 0 PID: 65 Comm: kswapd0 Kdump: loaded Not tainted 4.19.0-11-amd64 #1 Debian 4.19.146-1
[Tue Sep 29 12:48:59 2020] Hardware name: Hewlett-Packard HP Compaq Elite 8300 SFF/3397, BIOS K01 v03.08 04/10/2019
[Tue Sep 29 12:48:59 2020] Call Trace:
[Tue Sep 29 12:48:59 2020] <IRQ>
[Tue Sep 29 12:48:59 2020] dump_stack+0x66/0x90
[Tue Sep 29 12:48:59 2020] nmi_cpu_backtrace.cold.4+0x13/0x50
[Tue Sep 29 12:48:59 2020] ? lapic_can_unplug_cpu.cold.33+0x37/0x37
[Tue Sep 29 12:48:59 2020] nmi_trigger_cpumask_backtrace+0xf9/0xfb
[Tue Sep 29 12:48:59 2020] rcu_dump_cpu_stacks+0x9b/0xcb
[Tue Sep 29 12:48:59 2020] rcu_check_callbacks.cold.81+0x1db/0x335
[Tue Sep 29 12:48:59 2020] ? tick_sched_do_timer+0x60/0x60
[Tue Sep 29 12:48:59 2020] update_process_times+0x28/0x60
[Tue Sep 29 12:48:59 2020] tick_sched_handle+0x22/0x60
[Tue Sep 29 12:48:59 2020] tick_sched_timer+0x37/0x70
[Tue Sep 29 12:48:59 2020] __hrtimer_run_queues+0x100/0x280
[Tue Sep 29 12:48:59 2020] hrtimer_interrupt+0x100/0x220
[Tue Sep 29 12:48:59 2020] smp_apic_timer_interrupt+0x6a/0x140
[Tue Sep 29 12:48:59 2020] apic_timer_interrupt+0xf/0x20
[Tue Sep 29 12:48:59 2020] </IRQ>
[Tue Sep 29 12:48:59 2020] RIP: 0010:scan_swap_map_slots+0x2f9/0x670
[Tue Sep 29 12:48:59 2020] Code: 01 0f 88 1d 01 00 00 48 8b 54 24 18 48 8d 42 01 48 89 44 24 18 48 3b 44 24 20 0f 83 13 02 00 00 49 8b 44 24 40 0f b6 54 10 01 <84> d2 74 18 48 8b 05 ec 8e 40 01 48 01 c0 48 3b 05 da 8e 40 01 7d
[Tue Sep 29 12:48:59 2020] RSP: 0018:ffffac21437eb9a8 EFLAGS: 00000283 ORIG_RAX: ffffffffffffff13
[Tue Sep 29 12:48:59 2020] RAX: ffffac214391d000 RBX: 00000000000000fb RCX: ffffac21437eba88
[Tue Sep 29 12:48:59 2020] RDX: 00000000000000bf RSI: 0000000000001bb4 RDI: 000000000000fff8
[Tue Sep 29 12:48:59 2020] RBP: 0000000000000000 R08: ffff906aba504770 R09: ffffffffbc92de20
[Tue Sep 29 12:48:59 2020] R10: ffffffffbc92dde0 R11: 0000000000000001 R12: ffff906ab2f1a200
[Tue Sep 29 12:48:59 2020] R13: ffff906ab2f1a2d4 R14: ffff906ab2f1a328 R15: ffffac21437eba88
[Tue Sep 29 12:48:59 2020] ? scan_swap_map_slots+0x6d/0x670
[Tue Sep 29 12:48:59 2020] get_swap_pages+0x20a/0x360
[Tue Sep 29 12:48:59 2020] get_swap_page+0x141/0x210
[Tue Sep 29 12:48:59 2020] shmem_writepage+0x80/0x320
[Tue Sep 29 12:48:59 2020] ? count_shadow_nodes+0xa0/0xa0
[Tue Sep 29 12:48:59 2020] pageout.isra.49+0x117/0x340
[Tue Sep 29 12:48:59 2020] shrink_page_list+0xa47/0xc70
[Tue Sep 29 12:48:59 2020] shrink_inactive_list+0x207/0x590
[Tue Sep 29 12:48:59 2020] shrink_node_memcg+0x20c/0x7b0
[Tue Sep 29 12:48:59 2020] shrink_node+0xcf/0x450
[Tue Sep 29 12:48:59 2020] kswapd+0x3cd/0x6e0
[Tue Sep 29 12:48:59 2020] ? mem_cgroup_shrink_node+0x170/0x170
[Tue Sep 29 12:48:59 2020] kthread+0x112/0x130
[Tue Sep 29 12:48:59 2020] ? kthread_bind+0x30/0x30
[Tue Sep 29 12:48:59 2020] ret_from_fork+0x35/0x40
[Tue Sep 29 12:51:21 2020] rcu: INFO: rcu_sched self-detected stall on CPU
[Tue Sep 29 12:51:21 2020] rcu: 4-....: (5250 ticks this GP) idle=4f6/1/0x4000000000000002 softirq=1151424/1151424 fqs=2616
[Tue Sep 29 12:51:21 2020] rcu: (t=5251 jiffies g=10444585 q=36046)
[Tue Sep 29 12:51:21 2020] NMI backtrace for cpu 4
[Tue Sep 29 12:51:21 2020] CPU: 4 PID: 65 Comm: kswapd0 Kdump: loaded Not tainted 4.19.0-11-amd64 #1 Debian 4.19.146-1
[Tue Sep 29 12:51:21 2020] Hardware name: Hewlett-Packard HP Compaq Elite 8300 SFF/3397, BIOS K01 v03.08 04/10/2019
[Tue Sep 29 12:51:21 2020] Call Trace:
[Tue Sep 29 12:51:21 2020] <IRQ>
[Tue Sep 29 12:51:21 2020] dump_stack+0x66/0x90
[Tue Sep 29 12:51:21 2020] nmi_cpu_backtrace.cold.4+0x13/0x50
[Tue Sep 29 12:51:21 2020] ? lapic_can_unplug_cpu.cold.33+0x37/0x37
[Tue Sep 29 12:51:21 2020] nmi_trigger_cpumask_backtrace+0xf9/0xfb
[Tue Sep 29 12:51:21 2020] rcu_dump_cpu_stacks+0x9b/0xcb
[Tue Sep 29 12:51:21 2020] rcu_check_callbacks.cold.81+0x1db/0x335
[Tue Sep 29 12:51:21 2020] ? tick_sched_do_timer+0x60/0x60
[Tue Sep 29 12:51:21 2020] update_process_times+0x28/0x60
[Tue Sep 29 12:51:21 2020] tick_sched_handle+0x22/0x60
[Tue Sep 29 12:51:21 2020] tick_sched_timer+0x37/0x70
[Tue Sep 29 12:51:21 2020] __hrtimer_run_queues+0x100/0x280
[Tue Sep 29 12:51:21 2020] hrtimer_interrupt+0x100/0x220
[Tue Sep 29 12:51:21 2020] smp_apic_timer_interrupt+0x6a/0x140
[Tue Sep 29 12:51:21 2020] apic_timer_interrupt+0xf/0x20
[Tue Sep 29 12:51:21 2020] </IRQ>
[Tue Sep 29 12:51:21 2020] RIP: 0010:get_swap_pages+0x18f/0x360
[Tue Sep 29 12:51:21 2020] Code: 28 01 00 00 48 89 c5 4c 39 f6 0f 84 be 00 00 00 4f 8d 24 2f 4c 89 e7 e8 5f 1c 4f 00 48 c7 c7 a0 b3 e2 bc c6 07 00 0f 1f 40 00 <49> 8d 9f d4 00 00 00 48 89 df e8 92 e9 50 00 41 8b 57 64 85 d2 74
[Tue Sep 29 12:51:21 2020] RSP: 0018:ffffac21437eba10 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
[Tue Sep 29 12:51:21 2020] RAX: 0000000000000001 RBX: 0000000000000000 RCX: dead000000000200
[Tue Sep 29 12:51:21 2020] RDX: ffff906aba504770 RSI: ffff906ab2f1af28 RDI: ffffffffbce2b3a0
[Tue Sep 29 12:51:21 2020] RBP: ffff906ab2f1ae00 R08: ffff906aba504770 R09: ffffffffbc92de20
[Tue Sep 29 12:51:21 2020] R10: ffffffffbc92dde0 R11: 0000000000000001 R12: ffff906ab2f1a310
[Tue Sep 29 12:51:21 2020] R13: 0000000000000110 R14: ffff906ab2f1a328 R15: ffff906ab2f1a200
[Tue Sep 29 12:51:21 2020] get_swap_page+0x141/0x210
[Tue Sep 29 12:51:21 2020] shmem_writepage+0x80/0x320
[Tue Sep 29 12:51:21 2020] ? count_shadow_nodes+0xa0/0xa0
[Tue Sep 29 12:51:21 2020] pageout.isra.49+0x117/0x340
[Tue Sep 29 12:51:21 2020] shrink_page_list+0xa47/0xc70
[Tue Sep 29 12:51:21 2020] shrink_inactive_list+0x207/0x590
[Tue Sep 29 12:51:21 2020] shrink_node_memcg+0x20c/0x7b0
[Tue Sep 29 12:51:21 2020] shrink_node+0xcf/0x450
[Tue Sep 29 12:51:21 2020] kswapd+0x3cd/0x6e0
[Tue Sep 29 12:51:21 2020] ? mem_cgroup_shrink_node+0x170/0x170
[Tue Sep 29 12:51:21 2020] kthread+0x112/0x130
[Tue Sep 29 12:51:21 2020] ? kthread_bind+0x30/0x30
[Tue Sep 29 12:51:21 2020] ret_from_fork+0x35/0x40
[Tue Sep 29 12:51:42 2020] rcu: INFO: rcu_sched self-detected stall on CPU
[Tue Sep 29 12:51:42 2020] rcu: 4-....: (5250 ticks this GP) idle=4f6/1/0x4000000000000002 softirq=1151428/1151428 fqs=2624
[Tue Sep 29 12:51:42 2020] rcu: (t=5251 jiffies g=10444593 q=85372)
[Tue Sep 29 12:51:42 2020] NMI backtrace for cpu 4
[Tue Sep 29 12:51:42 2020] CPU: 4 PID: 65 Comm: kswapd0 Kdump: loaded Not tainted 4.19.0-11-amd64 #1 Debian 4.19.146-1
[Tue Sep 29 12:51:42 2020] Hardware name: Hewlett-Packard HP Compaq Elite 8300 SFF/3397, BIOS K01 v03.08 04/10/2019
[Tue Sep 29 12:51:42 2020] Call Trace:
[Tue Sep 29 12:51:42 2020] <IRQ>
[Tue Sep 29 12:51:42 2020] dump_stack+0x66/0x90
[Tue Sep 29 12:51:42 2020] nmi_cpu_backtrace.cold.4+0x13/0x50
[Tue Sep 29 12:51:42 2020] ? lapic_can_unplug_cpu.cold.33+0x37/0x37
[Tue Sep 29 12:51:42 2020] nmi_trigger_cpumask_backtrace+0xf9/0xfb
[Tue Sep 29 12:51:42 2020] rcu_dump_cpu_stacks+0x9b/0xcb
[Tue Sep 29 12:51:42 2020] rcu_check_callbacks.cold.81+0x1db/0x335
[Tue Sep 29 12:51:42 2020] ? tick_sched_do_timer+0x60/0x60
[Tue Sep 29 12:51:42 2020] update_process_times+0x28/0x60
[Tue Sep 29 12:51:42 2020] tick_sched_handle+0x22/0x60
[Tue Sep 29 12:51:42 2020] tick_sched_timer+0x37/0x70
[Tue Sep 29 12:51:42 2020] __hrtimer_run_queues+0x100/0x280
[Tue Sep 29 12:51:42 2020] hrtimer_interrupt+0x100/0x220
[Tue Sep 29 12:51:42 2020] smp_apic_timer_interrupt+0x6a/0x140
[Tue Sep 29 12:51:42 2020] apic_timer_interrupt+0xf/0x20
[Tue Sep 29 12:51:42 2020] </IRQ>
[Tue Sep 29 12:51:42 2020] RIP: 0010:scan_swap_map_slots+0x2d6/0x670
[Tue Sep 29 12:51:42 2020] Code: 86 51 02 00 00 49 8b 54 24 40 0f b6 14 02 84 d2 74 57 48 8b 05 2b 8f 40 01 48 01 c0 48 39 c7 7f 43 83 eb 01 0f 88 1d 01 00 00 <48> 8b 54 24 18 48 8d 42 01 48 89 44 24 18 48 3b 44 24 20 0f 83 13
[Tue Sep 29 12:51:42 2020] RSP: 0018:ffffac21437eb9a8 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
[Tue Sep 29 12:51:42 2020] RAX: 0000000000000002 RBX: 00000000000000f8 RCX: ffffac21437eba88
[Tue Sep 29 12:51:42 2020] RDX: 00000000000000bf RSI: 0000000000001bb4 RDI: 000000000000fff8
[Tue Sep 29 12:51:42 2020] RBP: 0000000000000000 R08: ffff906aba504770 R09: ffffffffbc92de20
[Tue Sep 29 12:51:42 2020] R10: ffffffffbc92dde0 R11: 000000000000003b R12: ffff906ab2f1a200
[Tue Sep 29 12:51:42 2020] R13: ffff906ab2f1a2d4 R14: ffff906ab2f1a328 R15: ffffac21437eba88
[Tue Sep 29 12:51:43 2020] ? scan_swap_map_slots+0x6d/0x670
[Tue Sep 29 12:51:43 2020] get_swap_pages+0x20a/0x360
[Tue Sep 29 12:51:43 2020] get_swap_page+0x141/0x210
[Tue Sep 29 12:51:43 2020] shmem_writepage+0x80/0x320
[Tue Sep 29 12:51:43 2020] ? radix_tree_delete_item+0x69/0xc0
[Tue Sep 29 12:51:43 2020] pageout.isra.49+0x117/0x340
[Tue Sep 29 12:51:43 2020] shrink_page_list+0xa47/0xc70
[Tue Sep 29 12:51:43 2020] shrink_inactive_list+0x207/0x590
[Tue Sep 29 12:51:43 2020] shrink_node_memcg+0x20c/0x7b0
[Tue Sep 29 12:51:43 2020] shrink_node+0xcf/0x450
[Tue Sep 29 12:51:43 2020] kswapd+0x3cd/0x6e0
[Tue Sep 29 12:51:43 2020] ? mem_cgroup_shrink_node+0x170/0x170
[Tue Sep 29 12:51:43 2020] kthread+0x112/0x130
[Tue Sep 29 12:51:43 2020] ? kthread_bind+0x30/0x30
[Tue Sep 29 12:51:43 2020] ret_from_fork+0x35/0x40
[Tue Sep 29 15:58:51 2020] perf: interrupt took too long (4956 > 4947), lowering kernel.perf_event_max_sample_rate to 40250
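Since the traces are long, here is a rough sketch of how they could be condensed to one line per stall (CPU, task name, and the function named in RIP). The function name `summarize_stalls` is made up for illustration, and it assumes the log has the same layout as the output above: a "self-detected stall" line followed by the "NMI backtrace", "Comm:" and "RIP:" lines.

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): reads dmesg-style text on
# stdin and prints one summary line per RCU stall report.
# Assumes the line layout seen in the log above.
summarize_stalls() {
    awk '
        # A new stall report starts here; reset the fields we collect.
        /self-detected stall on CPU/ { in_stall = 1; cpu = "?"; comm = "?"; next }

        # "NMI backtrace for cpu 4" -> the last field is the CPU number.
        in_stall && /NMI backtrace for cpu/ { cpu = $NF; next }

        # "CPU: 4 PID: 65 Comm: kswapd0 ..." -> the word after "Comm:".
        in_stall && / Comm: / {
            for (i = 1; i < NF; i++) if ($i == "Comm:") comm = $(i + 1)
            next
        }

        # "RIP: 0010:scan_swap_map_slots+0x2f9/0x670" -> bare function name.
        in_stall && / RIP: / {
            sym = $NF
            sub(/^[0-9a-fA-F]+:/, "", sym)   # drop the segment prefix
            sub(/[+].*/, "", sym)            # drop the offset/size suffix
            printf "cpu=%s comm=%s rip=%s\n", cpu, comm, sym
            in_stall = 0
        }
    '
}
```

After sourcing the function, it could be used as `dmesg -T | summarize_stalls | sort | uniq -c`. For the log above, every stall is in kswapd0 with the RIP inside the swap slot allocation functions (scan_swap_map_try_ssd_cluster, scan_swap_map_slots, get_swap_pages).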