[Software] Serious deadlock issue jbd2

Linux Kernel, Network, and Services configuration.
Message
Author
rontarplan
Posts: 17
Joined: 2023-05-11 07:46
Been thanked: 1 time

Re: [Software] Serious deadlock issue jbd2

#21 Post by rontarplan »

Aki wrote: 2023-05-23 15:55 Hello,
rontarplan wrote: 2023-05-23 13:16 Haha. It just happened again shortly after I had written my message saying I was optimistic:
[..]
Load turned out to be funny too:

Code: Select all

load average: 7918.04, 7920.83, 7920.76
What is the memory usage under such high loads ?
That one I did not take from top, so I don't have it for today.
But a previous one:

Code: Select all

top - 16:15:46 up 2 days, 17:08,  3 users,  load average: 11930.00, 11930.02, 11930.54
Tasks: 218 total,   1 running, 205 sleeping,   0 stopped,  12 zombie
%Cpu(s):  0.0 us,  1.0 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 257643.0 total,  54273.7 free,  47562.8 used, 155806.4 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used. 207842.4 avail Mem
The server usually is still quite responsive when it shows such loads. From my observation, these loads seem to be the result of not being able to write to the disk rather than its cause.

kent_dorfman766
Posts: 529
Joined: 2022-12-16 06:34
Location: socialist states of america
Has thanked: 56 times
Been thanked: 69 times

Re: [Software] Serious deadlock issue jbd2

#22 Post by kent_dorfman766 »

don't be too quick to dismiss the high load averages as not being problematic. disk IO timeout "can be a thing" in linux under too heavy a load with poorly designed software. That's one of my gripes with the linux kernel. It tries to weigh everything "fairly", which isn't always the best solution for scheduling.

I'd be interested in sar output over several days, taken at about 10-minute intervals. when set up properly, sar saves a whole bunch of useful diag data under /var/log/sa/sa?? files.

Another thing, if you haven't already done it: DO NOT load production data back on the questionable array. stress test it in layers on an otherwise quiescent system using test data / IO.

rontarplan
Posts: 17
Joined: 2023-05-11 07:46
Been thanked: 1 time

Re: [Software] Serious deadlock issue jbd2

#23 Post by rontarplan »

And it just happened again:

Code: Select all

top - 00:45:45 up 11:42,  1 user,  load average: 10261.46, 10229.62, 10071.91
Tasks: 299 total,   1 running, 294 sleeping,   0 stopped,   4 zombie
%Cpu(s):  0.3 us,  0.2 sy,  0.0 ni, 99.0 id,  0.4 wa,  0.0 hi,  0.1 si,  0.0 st
MiB Mem : 257631.0 total, 231503.9 free,  21246.2 used,   4880.9 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used. 234141.6 avail Mem

Code: Select all

perf: interrupt took too long (2514 > 2500), lowering kernel.perf_event_max_sample_rate to 79500
INFO: task jbd2/sda5-8:6968 blocked for more than 120 seconds.
      Not tainted 5.10.0-22-amd64 #1 Debian 5.10.178-3
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:jbd2/sda5-8     state:D stack:    0 pid: 6968 ppid:     2 flags:0x00004000
Call Trace:
 __schedule+0x282/0x870
 schedule+0x46/0xb0
 jbd2_journal_commit_transaction+0x163/0x1ad0 [jbd2]
 ? update_load_avg+0x7a/0x5d0
 ? newidle_balance+0x282/0x3d0
 ? __switch_to_asm+0x3a/0x60
 ? __switch_to+0x114/0x440
 ? add_wait_queue_exclusive+0x70/0x70
 ? lock_timer_base+0x61/0x80
 kjournald2+0xab/0x270 [jbd2]
 ? add_wait_queue_exclusive+0x70/0x70
 ? load_superblock.part.0+0xb0/0xb0 [jbd2]
 kthread+0x11b/0x140
 ? __kthread_bind_mask+0x60/0x60
 ret_from_fork+0x22/0x30
That was with the default settings for read_ahead and dirty page handling, and it seems they don't work well on this machine.
kent_dorfman766 wrote: 2023-05-23 20:08 don't be too quick to dismiss the high load averages as not being problematic. disk IO timeout "can be a thing" in linux under too heavy a load with poorly designed software. That's one of my gripes with the linux kernel. It tries to weigh everything "fairly", which isn't always the best solution for scheduling.

I'd be interested in sar output over several days, taken at about 10-minute intervals. when set up properly, sar saves a whole bunch of useful diag data under /var/log/sa/sa?? files.

Another thing, if you haven't already done it: DO NOT load production data back on the questionable array. stress test it in layers on an otherwise quiescent system using test data / IO.
No, I am trying not to dismiss anything yet. I am just sharing my observations. The thing is, with top displaying such loads, I would expect the server to stall and to be laggy and slow. This is not the case. It is responsive and reacts to shell commands without noticeable delay. So I don't know how accurate that number is to be taken as real load, or if it is only a calculated metric. It seems that the 'real' load at the moment is much, much lower, but top is still displaying these ridiculously high values.
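Worth noting: on Linux the load average counts not only runnable tasks but also tasks in uninterruptible sleep (state D), which is exactly the state blocked disk writers sit in. So a huge load with an otherwise responsive box usually means many stuck sleepers, not CPU pressure. A minimal check using only standard ps and awk:

```shell
# Count tasks in uninterruptible sleep ("D" state). These use no CPU
# but still inflate the load average; thousands of writers stuck on a
# dead array produce exactly the five-digit loads shown above.
ps -eo state= | awk '$1 ~ /^D/ { n++ } END { print n+0 }'
```

If that number roughly tracks the load average while the CPUs sit idle, the load figure is an artifact of blocked I/O rather than real work.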

A current iostat (repeating mode) shows this:

Code: Select all

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.04    0.00    0.00   99.96
Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
Basically that shows that all the processes have been killed due to inaccessibility of the array. But the load averages do not decline, which makes me wonder. Obviously, what is happening is that when disk access is blocked for more than 120 seconds, the I/Os go to RAM, which after a while leads to the processes being reaped for memory exhaustion. These are Docker containers, and I think for the next try I will allocate more memory to them. But this will probably only delay the issue rather than prevent it, as at some point the RAM has to be written out. Maybe it will help to get over I/O spikes, though. Processes getting reaped would be fine, but it seems they don't restart reliably after that, maybe again due to high I/O. (However, I cannot even restart them manually now, even though the machine is responsive.)
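Whether writes really are piling up in RAM can be checked directly from the standard /proc interfaces; a small sketch:

```shell
# Dirty: pages modified in memory but not yet written to disk.
# Writeback: pages currently being flushed. If Dirty keeps growing
# while the array is stalled, I/O is indeed accumulating in RAM.
grep -E '^(Dirty|Writeback):' /proc/meminfo

# Current writeback thresholds, as a percentage of reclaimable
# memory (these read 0 if the *_bytes variants are set instead).
cat /proc/sys/vm/dirty_background_ratio /proc/sys/vm/dirty_ratio
```

Watching the Dirty counter in a loop (e.g. under `watch`) during a stall shows whether the kernel is buffering writes it cannot complete.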

I would be totally fine if it turns out that the server is too weak for this. But as I said in my initial post, I am not yet convinced and am trying to figure out what is going on, as this is the only one of several running servers that has this problem. That's what puzzles me most. The other servers are similar hardware with similar load, some with identical HW-RAID, others with MD-SOFTRAID, and a similar if not identical setup, and every other server is running fine without showing the issues that this one has. So I still believe there is something about this specific machine that makes it not keep up.

Here is the single average iostat for this machine:

Code: Select all

iostat -dxm
Linux 5.10.0-22-amd64 (xxx)  05/24/2023      _x86_64_        (12 CPU)

Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda            195.96      7.59   486.90  71.30   14.87    39.65   59.27      2.12   185.92  75.83   10.89    36.56    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    3.56  43.99
And for comparison another machine with identical hardware that works well:

Code: Select all

iostat -dxm
Linux 5.10.0-21-amd64 (yyy)  05/24/2023      _x86_64_        (12 CPU)

Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda            101.97      2.84   250.67  71.08    1.11    28.53  210.18     14.97   432.85  67.31    1.50    72.94    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.01   8.03
I have never used sar; I have to check how to set it up. The thing is that I need the data on this machine in production; I would have to offload it to a different machine if we cannot use it for longer periods.
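(For anyone following along: on Debian, sar is provided by the sysstat package and collection is disabled by default. A setup sketch, assuming the stock Debian packaging; note that Debian writes its data files under /var/log/sysstat/ rather than the Red Hat-style /var/log/sa/ path mentioned earlier.)

```shell
# Setup sketch for periodic sar collection on Debian (run as root);
# paths are as shipped by the sysstat package.
apt install sysstat
sed -i 's/^ENABLED="false"/ENABLED="true"/' /etc/default/sysstat
systemctl enable --now sysstat

# Later, review the day's per-device disk statistics:
sar -d -p
```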

rontarplan
Posts: 17
Joined: 2023-05-11 07:46
Been thanked: 1 time

Re: [Software] Serious deadlock issue jbd2

#24 Post by rontarplan »

The server is fully back up running. Current top looks like this:

Code: Select all

top - 18:06:27 up 13:24,  2 users,  load average: 1.06, 1.06, 1.03
Tasks: 304 total,   2 running, 302 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.3 us,  0.8 sy,  0.0 ni, 94.9 id,  2.5 wa,  0.0 hi,  0.4 si,  0.0 st
MiB Mem : 257631.0 total, 222493.0 free,   3266.9 used,  31871.0 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used. 252125.7 avail Mem
And I managed to set up sar:

Code: Select all

01:50:00 PM       DEV       tps     rkB/s     wkB/s     dkB/s   areq-sz    aqu-sz     await     %util
02:00:00 PM       sda    136.76    992.48   4131.16      0.00     37.46      0.74      5.39     39.56
02:10:00 PM       sda    162.60    875.53   4630.75      0.00     33.86      0.85      5.24     41.93
02:20:01 PM       sda    157.51    929.38   4968.65      0.00     37.45      0.81      5.11     40.68
02:30:00 PM       sda    183.08   1728.08   5776.48      0.00     40.99      1.00      5.45     46.02
02:40:00 PM       sda    177.76   1580.22   5769.30      0.00     41.34      1.06      5.96     45.80
02:50:00 PM       sda    172.63   1073.03   5290.92      0.00     36.87      0.95      5.52     43.74
03:00:00 PM       sda    174.61   1231.54   5434.07      0.00     38.18      0.95      5.42     44.43
03:10:00 PM       sda    241.74   1405.74   5857.17      0.00     30.04      1.60      6.64     55.91
03:20:01 PM       sda    193.54   1205.23   5422.76      0.00     34.25      1.12      5.77     46.26
03:30:00 PM       sda    224.35   1294.21   5748.65      0.00     31.39      1.47      6.55     49.09
03:40:00 PM       sda    363.48   1875.49   6829.61      0.00     23.95      4.11     11.30     64.12
03:50:00 PM       sda    253.76   1119.48   5707.69      0.00     26.90      1.78      7.01     53.65
04:00:00 PM       sda    206.93   1114.55   5112.67      0.00     30.09      1.15      5.57     46.98
04:10:00 PM       sda    201.99   1011.87   5291.37      0.00     31.21      1.16      5.76     45.63
04:20:00 PM       sda    185.01   1084.43   4885.28      0.00     32.27      1.09      5.87     44.66
04:30:01 PM       sda    172.42   1158.46   4629.44      0.00     33.57      0.96      5.56     43.34
04:40:00 PM       sda    176.56   1184.76   5414.49      0.00     37.38      0.96      5.42     43.48
04:50:00 PM       sda    149.22   1171.96   5295.60      0.00     43.34      0.79      5.33     39.30
05:00:00 PM       sda    175.82   1222.74   5286.95      0.00     37.03      0.95      5.40     42.62
05:10:00 PM       sda    160.37   1243.80   4897.32      0.00     38.29      0.86      5.38     42.04
05:20:01 PM       sda    145.26   1319.95   4519.11      0.00     40.20      0.78      5.34     38.87
05:30:00 PM       sda    147.01   1338.80   4580.38      0.00     40.26      0.80      5.43     39.70
05:40:00 PM       sda    148.96   1446.04   4588.74      0.00     40.51      0.92      6.21     42.98
05:50:00 PM       sda    155.75   1522.46   4595.88      0.00     39.28      0.99      6.35     44.72
06:00:01 PM       sda    154.11   1420.50   4079.49      0.00     35.69      0.96      6.22     43.45
Average:          sda    237.71   3691.94   3705.23      0.00     31.12      2.09      8.79     56.98
Currently everything looks fine. But for how long?

kent_dorfman766
Posts: 529
Joined: 2022-12-16 06:34
Location: socialist states of america
Has thanked: 56 times
Been thanked: 69 times

Re: [Software] Serious deadlock issue jbd2

#25 Post by kent_dorfman766 »

good. you need to determine what processes are going rogue and hammering the system after it's been running for a while. lots of logging, and maybe piping immediate-mode "tail" of stuff like dmesg and /var/log/syslog or journalctl to terminals so you have realtime telemetry when something bad happens.
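A minimal version of that live telemetry, assuming systemd's journalctl is available (this is effectively a following dmesg, filtered to the hung-task warnings):

```shell
# Stream kernel messages and surface hung-task reports the moment
# they fire; leave this running in a spare terminal or under tmux.
journalctl -kf | grep --line-buffered 'blocked for more than'
```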

FWIW, those 10-minute disk utilization figures seem high, like you've got a constricted disk IO channel in play.

Why are you writing so much? The preferred strategy is to use RAM caches to avoid that.

rontarplan
Posts: 17
Joined: 2023-05-11 07:46
Been thanked: 1 time

Re: [Software] Serious deadlock issue jbd2

#26 Post by rontarplan »

kent_dorfman766 wrote: 2023-05-24 20:50 good. you need to determine what processes are going rogue and hammering the system after it's been running for a while. lots of logging, and maybe piping immediate-mode "tail" of stuff like dmesg and /var/log/syslog or journalctl to terminals so you have realtime telemetry when something bad happens.

FWIW, those 10-minute disk utilization figures seem high, like you've got a constricted disk IO channel in play.

Why are you writing so much? The preferred strategy is to use RAM caches to avoid that.
First of all, thank you for all your support and effort so far. I really appreciate it, as I am running out of ideas myself.
Sure enough, it has happened again, but at least I have some stats to share. And they look crazy:

Code: Select all

08:50:00 AM       DEV       tps     rkB/s     wkB/s     dkB/s   areq-sz    aqu-sz     await     %util
09:00:00 AM       sda    285.35   2528.23   6653.75      0.00     32.18      2.83      9.91     67.80
09:10:00 AM       sda    307.98   2493.06   7397.51      0.00     32.11      2.95      9.58     69.20
09:20:00 AM       sda    392.58   5251.61   7376.78      0.00     32.17      4.51     11.49     80.39
09:30:00 AM       sda    558.88   7018.46   8289.23      0.00     27.39      9.19     16.44     92.95
09:40:00 AM       sda    505.78   7892.24   7170.58      0.00     29.78      7.17     14.18     92.87
09:50:00 AM       sda    291.77   2438.00   6452.09      0.00     30.47      2.77      9.49     65.31
10:00:00 AM       sda    259.32   2041.77   6280.14      0.00     32.09      2.31      8.92     62.69
10:10:00 AM       sda    269.14   2042.95   6541.84      0.00     31.90      2.37      8.80     63.56
10:20:00 AM       sda    276.09   2018.29   6477.84      0.00     30.77      2.40      8.69     63.41
10:30:00 AM       sda    519.54   7810.86   7297.82      0.00     29.08      7.45     14.34     92.45
10:40:00 AM       sda    567.60   8350.32   7073.22      0.00     27.17      9.10     16.04     93.70
10:50:00 AM       sda    648.92  10001.42   7034.04      0.00     26.25     11.31     17.42     95.10
11:00:01 AM       sda    507.10   8066.23   6530.22      0.00     28.78      6.73     13.27     90.90
11:10:00 AM       sda    266.80   2316.53   6234.01      0.00     32.05      2.37      8.89     64.55
11:20:00 AM       sda    468.81   7805.63   6857.57      0.00     31.28      5.67     12.09     88.74
11:30:00 AM       sda    576.62   8805.48   7589.14      0.00     28.43      9.14     15.85     93.79
11:40:00 AM       sda    471.16   7358.60   5772.44      0.00     27.87      7.42     15.76     90.67
11:50:00 AM       sda     62.58   1221.49   1070.68      0.00     36.63      0.74     11.75     38.73
12:00:00 PM       sda     25.05    573.13    872.77      0.00     57.72      0.23      9.26     15.31
12:10:00 PM       sda     25.31    587.42    868.25      0.00     57.52      0.23      9.01     14.84
12:20:00 PM       sda     60.99   2707.25    603.17      0.00     54.27      0.23      3.81     15.11
12:30:00 PM       sda     16.86    371.83     62.03      0.00     25.74      0.18     10.93     11.63
12:40:00 PM       sda     16.87    331.97     32.06      0.00     21.58      0.18     10.94     11.65
12:50:00 PM       sda    133.86   8727.48     23.04      0.00     65.37      0.17      1.24     11.69
01:00:00 PM       sda     12.52    232.82     21.78      0.00     20.34      0.13     10.40      8.18
01:10:00 PM       sda    250.47  19354.66     25.68      0.00     77.37      0.15      0.61     12.68
01:20:00 PM       sda    249.87  18022.16     23.05      0.00     72.22      0.13      0.51     14.48
01:30:00 PM       sda    624.78  41620.57     23.03      0.00     66.65      0.12      0.19     21.80
01:40:00 PM       sda    291.91  22611.29     15.35      0.00     77.51      0.08      0.29     16.98
01:50:00 PM       sda   2217.61 178394.71     17.28      0.00     80.45      0.19      0.08     87.66
02:00:00 PM       sda   1442.27 117149.49      9.85      0.00     81.23      0.12      0.08     87.27
02:10:00 PM       sda   2870.87 241037.23      9.76      0.00     83.96      0.20      0.07     93.81
02:20:00 PM       sda   1526.36 121876.51     17.33      0.00     79.86      0.11      0.07     92.50
02:30:00 PM       sda   2438.88 202585.83     11.85      0.00     83.07      0.15      0.06     96.34
02:40:00 PM       sda   1673.79 121001.31     21.35      0.00     72.30      0.10      0.06     95.75
02:50:00 PM       sda   1523.02 103439.83     17.07      0.00     67.93      0.09      0.06     95.44
03:00:00 PM       sda   1437.55  94253.87     17.23      0.00     65.58      0.08      0.06     95.03
03:10:00 PM       sda   1563.04  99113.52     16.41      0.00     63.42      0.09      0.06     95.76
03:20:00 PM       sda   1850.12 112372.70     15.33      0.00     60.75      0.10      0.06     97.39
03:30:00 PM       sda   2379.89 146420.62     15.17      0.00     61.53      0.13      0.05     96.70
03:40:03 PM       sda    238.75  15143.67     12.37      0.00     63.48      0.01      0.06     10.33

03:40:03 PM       DEV       tps     rkB/s     wkB/s     dkB/s   areq-sz    aqu-sz     await     %util
03:50:28 PM       sda      7.61      1.46     52.13      0.00      7.04      0.00      0.13      0.18
04:00:03 PM       sda      1.41      0.00      6.95      0.00      4.93      0.00      0.11      0.05
04:10:00 PM       sda      5.40      0.01     26.51      0.00      4.91      0.00      0.12      0.12
04:20:00 PM       sda      1.82      0.00      9.22      0.00      5.05      0.00      0.12      0.06
04:30:00 PM       sda      3.43      0.00     16.91      0.00      4.94      0.00      0.12      0.08
04:40:00 PM       sda      4.35      0.00     21.44      0.00      4.92      0.00      0.11      0.11
04:50:00 PM       sda      1.25      0.00      6.35      0.00      5.09      0.00      0.11      0.06
Average:          sda    446.52  20631.96   4465.34      0.00     56.21      1.88      4.21     54.45

05:18:15 PM  LINUX RESTART      (12 CPU)
At 11:39, dmesg logged the usual messages that writes were blocked for more than 120 seconds; the server started to reap processes and stopped functioning normally, so I had to restart it later.

For comparison, here is a log from the twin server (same hardware, same configuration, same duty), which is running fine:

Code: Select all

08:50:00 AM       sda    258.58   1816.44   7998.89      0.00     37.96      5.33     20.63     43.68
09:00:00 AM       sda    244.88   1619.24   8299.48      0.00     40.50      5.18     21.16     43.66
09:10:00 AM       sda    253.39   1583.74   9226.60      0.00     42.66      5.54     21.85     42.26
09:20:00 AM       sda    226.38   1545.93   8076.81      0.00     42.51      4.97     21.94     40.96
09:30:00 AM       sda    251.62   1455.59   8809.26      0.00     40.79      5.41     21.49     40.53
09:40:00 AM       sda    253.32   1455.39   8414.71      0.00     38.96      5.35     21.13     41.46
09:50:00 AM       sda    261.02   1573.65   8499.85      0.00     38.59      5.31     20.33     42.51
10:00:01 AM       sda    240.22   1516.15   7958.60      0.00     39.44      5.08     21.16     40.61
10:10:00 AM       sda    237.97   1397.86   7982.46      0.00     39.42      5.16     21.68     39.98
10:20:00 AM       sda    244.65   1465.13   7992.11      0.00     38.66      5.09     20.82     39.53
10:30:00 AM       sda    246.73   1719.70   7914.72      0.00     39.05      5.19     21.03     41.84
10:40:00 AM       sda    236.95   1905.43   7285.75      0.00     38.79      5.14     21.69     42.52
10:50:00 AM       sda    247.48   1915.01   7315.82      0.00     37.30      5.14     20.77     43.13
11:00:00 AM       sda    240.28   1766.43   7364.03      0.00     38.00      4.88     20.31     41.26
11:10:00 AM       sda    241.09   1881.85   7884.93      0.00     40.51      5.26     21.82     42.02
11:20:00 AM       sda    236.87   1789.64   7856.43      0.00     40.72      5.07     21.42     39.94
11:30:00 AM       sda    235.60   1805.97   7769.98      0.00     40.64      5.35     22.71     41.44
11:40:00 AM       sda    249.06   1888.53   8225.22      0.00     40.61      5.30     21.29     42.49
11:50:00 AM       sda    248.19   1722.36   7846.07      0.00     38.55      5.15     20.75     41.92
12:00:00 PM       sda    246.56   1484.34   7564.92      0.00     36.70      5.50     22.30     41.62

12:00:00 PM       DEV       tps     rkB/s     wkB/s     dkB/s   areq-sz    aqu-sz     await     %util
12:10:01 PM       sda    220.16   1448.06   7350.32      0.00     39.96      4.17     18.94     37.50
12:20:00 PM       sda    197.94   1427.23   7483.55      0.00     45.02      2.88     14.54     34.35
12:30:00 PM       sda    183.17   1572.92   6928.04      0.00     46.41      2.95     16.11     35.05
12:40:00 PM       sda    174.52   1463.03   6774.25      0.00     47.20      2.53     14.47     35.08
12:50:00 PM       sda    161.34   1623.71   6406.14      0.00     49.77      1.90     11.79     34.64
01:00:00 PM       sda    275.05   2316.83   7288.59      0.00     34.92      5.46     19.86     51.21
01:10:00 PM       sda    321.34   2572.91   7559.43      0.00     31.53      6.77     21.05     55.66
01:20:00 PM       sda    153.44   1733.26   5664.99      0.00     48.22      1.61     10.49     34.28
01:30:00 PM       sda    187.41   1461.55   6812.36      0.00     44.15      2.42     12.91     34.01
01:40:00 PM       sda    234.57   1529.85   7878.85      0.00     40.11      3.65     15.57     35.23
01:50:00 PM       sda    456.32   1811.20  10088.55      0.00     26.08      9.58     20.98     56.69
02:00:00 PM       sda    165.52   1369.52   6175.34      0.00     45.58      1.66     10.05     30.70
02:10:00 PM       sda    172.97   1516.13   5885.51      0.00     42.79      2.26     13.05     32.28
02:20:00 PM       sda    167.68   1433.92   5378.73      0.00     40.63      2.45     14.58     31.86
02:30:00 PM       sda    135.17   1439.63   5213.05      0.00     49.22      1.23      9.08     29.94
02:40:00 PM       sda    150.30   1464.62   5881.88      0.00     48.88      1.79     11.90     30.51
02:50:00 PM       sda    148.29   1736.03   6120.03      0.00     52.98      1.49     10.02     32.54
03:00:00 PM       sda    148.91   1564.29   5672.66      0.00     48.60      1.43      9.62     31.89
03:10:00 PM       sda    342.91   1898.71   8051.81      0.00     29.02      7.72     22.52     52.12
03:20:00 PM       sda    309.84   2001.25   7489.74      0.00     30.63      7.57     24.44     49.95
03:30:00 PM       sda    161.89   1857.30   5966.11      0.00     48.32      1.93     11.92     33.82
03:40:00 PM       sda    345.08   1896.39   9222.21      0.00     32.22      7.43     21.53     47.94
03:50:00 PM       sda    479.74   1902.53  11317.15      0.00     27.56     12.99     27.08     59.90
04:00:00 PM       sda    373.97   1829.15   9635.27      0.00     30.66      9.66     25.83     53.68
04:10:00 PM       sda    186.13   1653.07   7718.60      0.00     50.35      2.44     13.11     33.72
04:20:00 PM       sda    185.53   1658.73   7366.37      0.00     48.65      2.41     13.00     33.57
04:30:01 PM       sda    354.99   1888.79   9429.82      0.00     31.88      8.95     25.20     51.87
04:40:00 PM       sda    177.06   1754.27   7095.80      0.00     49.98      2.33     13.13     34.86
04:50:00 PM       sda    180.82   1908.12   7304.80      0.00     50.95      2.22     12.28     37.02
05:00:00 PM       sda    177.46   1617.38   7175.81      0.00     49.55      2.15     12.13     33.91
05:10:00 PM       sda    181.77   1638.96   7385.29      0.00     49.65      2.12     11.66     34.56
05:20:01 PM       sda    169.11   1843.99   7219.73      0.00     53.60      1.87     11.04     35.83
These servers serve file uploads and downloads for users, over which we have very limited control. I have thought about caches, and they would help to some extent, but I am still confused why it works flawlessly on one server and not on the other.

Here is a top of the twin server:

Code: Select all

top - 10:54:39 up 119 days, 20:31,  2 users,  load average: 3.21, 2.94, 2.94
Tasks: 321 total,   1 running, 319 sleeping,   0 stopped,   1 zombie
%Cpu(s):  2.6 us,  5.6 sy,  0.0 ni, 88.1 id,  2.9 wa,  0.0 hi,  0.9 si,  0.0 st
MiB Mem : 257643.0 total,  12974.7 free,   5962.9 used, 238705.4 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used. 249412.6 avail Mem
It has been running for 119 days without a reboot, while the other one can't stay up for a day. I have no explanation yet.

kent_dorfman766
Posts: 529
Joined: 2022-12-16 06:34
Location: socialist states of america
Has thanked: 56 times
Been thanked: 69 times

Re: [Software] Serious deadlock issue jbd2

#27 Post by kent_dorfman766 »

sure looks like you've got a task/process on that machine hammering the array. You've never said "what" is on the array. If it were me I'd run the thing in single user mode to watch the IO metrics, then I'd incrementally start services and watch for disk utilization.

Aki
Global Moderator
Posts: 2816
Joined: 2014-07-20 18:12
Location: Europe
Has thanked: 68 times
Been thanked: 382 times

Re: [Software] Serious deadlock issue jbd2

#28 Post by Aki »

kent_dorfman766 wrote: 2023-05-26 11:11 sure looks like you've got a task/process on that machine hammering the array. You've never said "what" is on the array. If it were me I'd run the thing in single user mode to watch the IO metrics, then I'd incrementally start services and watch for disk utilization.
@rontarplan : have you identified the culprit ?
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Debian - The universal operating system
⢿⡄⠘⠷⠚⠋⠀ https://www.debian.org
⠈⠳⣄⠀

Penaut Butter
Posts: 13
Joined: 2018-05-31 21:59
Has thanked: 1 time

Re: [Software] Serious deadlock issue jbd2

#29 Post by Penaut Butter »

Hey everyone. I am also trying to work my way through these locks and to understand what is happening with my system.
This https://www.suse.com/support/kb/doc/?id=000017857 and this https://wiki.archlinux.org/title/Sysctl#Virtual_memory are worth reading to try to understand the cause of this frustrating and nasty issue.

In my case, in a simpler environment: I've been transferring hundreds of gigabytes of data from one HDD to another (via a USB SATA dock, sometimes using borgbackup, sometimes plain cp -a). A few years ago this used to take about 3 to 6 hours to complete, but lately it takes over ~14 hours! I believe it got a lot worse once I installed more RAM in my machine (it used to have 4 GB, then 12 GB, then 16 GB).
I am aware HDD transfers can sometimes drop from 60 MB/s to, let's say, 40 or even 20 MB/s after a while, but this wasn't the case. I noticed that my dmesg had lots of: task jbd2/sdX bla bla blocked for more than 120 seconds. Sometimes it was jbd2, sometimes it was cp, or borg; all tasks related to the transfer. In iotop-c I also noticed the transfer was fluctuating from 0 to 2 MB/s; if I suspended the task, waited for the filesystem to sync, and then resumed, it would spike to 50/40 MB/s for some time, and then return to locks and slow write performance.

I am just a debian/linux enthusiast and ignorant on the matter, so I am still not sure how to solve this. I just know that setting my system to use vm.dirty_background_bytes = 4194304 AND vm.dirty_bytes = 4194304 got rid of the deadlocks. I also tried setting vm.dirty_ratio and vm.dirty_background_ratio to lower values, and that reduced the amount of deadlocks, but it didn't get rid of them completely, compared to the 4194304 setting.

The 4M (4194304) value helped to get rid of the deadlocks, but I am not sure if this value is optimal for other processes/daily use, or even for my system/case overall. I also don't know if there is a way to tell the most optimal value for the X GB of RAM on my system, as I am not 100% sure if the transfers are still at their most optimal speed (still to be tested).
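For a rough sense of scale: with the default ratios (vm.dirty_background_ratio=10, vm.dirty_ratio=20), a 16 GiB machine can accumulate gigabytes of dirty pages before writers are throttled, all of which a slow USB HDD must then flush at once. Back-of-envelope math, assuming the full 16 GiB counts toward the baseline (the kernel actually computes against available memory, so the real limits are somewhat lower):

```shell
# Approximate dirty-page thresholds for the default ratios on a
# machine with ~16 GiB of RAM (10% background, 20% hard limit).
ram_kb=16777216                       # 16 GiB expressed in kB
echo "background: $((ram_kb / 10)) kB  hard limit: $((ram_kb / 5)) kB"
```

Next to those multi-gigabyte defaults, a 4 MiB vm.dirty_bytes setting means writeback starts almost immediately instead of in huge bursts, which is consistent with the deadlocks disappearing.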

Anyway, I hope we can get more info on the matter.
(sorry for any grammar mistake)

Aki
Global Moderator
Posts: 2816
Joined: 2014-07-20 18:12
Location: Europe
Has thanked: 68 times
Been thanked: 382 times

Re: [Software] Serious deadlock issue jbd2

#30 Post by Aki »

Penaut Butter wrote: 2023-07-13 04:03 Hey everyone. I am also trying to work my way through these locks and to understand what is happening with my system.
This https://www.suse.com/support/kb/doc/?id=000017857 and this https://wiki.archlinux.org/title/Sysctl#Virtual_memory are worth reading to try to understand the cause of this frustrating and nasty issue. [..]
@Penaut Butter:
Thank you so much for your contribution to the discussion.

If it is possible, it could be interesting to replicate it. Could it be possible for you to share a proof of concept (as terminal commands) that replicates the issue on your computer ?
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Debian - The universal operating system
⢿⡄⠘⠷⠚⠋⠀ https://www.debian.org
⠈⠳⣄⠀

Penaut Butter
Posts: 13
Joined: 2018-05-31 21:59
Has thanked: 1 time

Re: [Software] Serious deadlock issue jbd2

#31 Post by Penaut Butter »

Aki wrote: 2023-07-15 09:23 If it is possible, it could be interesting to replicate it. Could it be possible for you to share a proof of concept (as terminal commands) that replicates the issue on your computer ?
Oops, my account was locked for an unknown reason, but after some weeks the admin thankfully unlocked it (thanks!).
Anyway.
My advice, if you want to replicate my 'not so complex scenario', would be to edit/create the /etc/sysctl.d/40-dirty.conf file and try any of these settings AND the defaults. On each boot, transfer at least 16 GiB of data from one HDD to another HDD (if possible, one of them external). It is optional to divide the data into 500 MB files to mimic borg's 'segment files', but maybe that is not necessary; it was just the task I was doing when I noticed this limitation (except it was about 600+ GiB). Depending on the setting, after about 4.1 GiB of transferred data the system will have a hard time keeping up, and it will get worse and worse the more data you try to transfer, especially with the defaults.
Settings:
vm.dirty_background_ratio = 1
vm.dirty_ratio = 2
or
vm.dirty_background_bytes = 4194304
vm.dirty_bytes = 4194304
this on a system with 16384 MB RAM installed
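To spell out the mechanics, here is a sketch of creating that drop-in and applying it without a reboot (run as root). Note that setting the *_bytes keys automatically zeroes the corresponding *_ratio keys, and vice versa, so only one pair is in effect at a time:

```shell
# Write the 4 MiB dirty limits to a sysctl drop-in and load it.
printf '%s\n' \
  'vm.dirty_background_bytes = 4194304' \
  'vm.dirty_bytes = 4194304' \
  > /etc/sysctl.d/40-dirty.conf

# Re-read all sysctl configuration, including the new drop-in.
sysctl --system
```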


A longer explanation of the test I did:
To keep things simple (I hope), I am going to call sda the HDD receiving/writing the data, and sdb the drive transferring/reading the data.
The first setting (ratio) seems to be the most balanced one: I don't get any lockups! And the speed seems to be mostly stable, at least checking with iotop-c. I mean, it has a few pauses, but I presume it is flushing cache to sda, because the activity on both sdb's dock LED and iotop-c's 'disk read' output ceases (0.00B) while sda keeps writing; after it finishes writing about 1 GiB or 500 MB, everything stops for a few seconds, and then sdb resumes activity and the whole process repeats itself. It is not the best performance I've had, but it gets the job done.

The second setting (bytes), if I recall correctly, gets only a few lockups and is mostly 'stable'. The write speed is slow in my case, but the reads and writes are mostly frequent: sdb is pretty much always active, as is sda, but the write speed is just very low, like 2 MB/s or 10 MB/s at best.

And there is the original option, Debian's defaults (no 40-dirty.conf file; the ratios are 20 and 10, IIRC). This is the worst configuration for my case: it has TONS, and I mean TONS, of deadlocks. The system will hang for SEVERAL minutes depending on how much data is being transferred; it hangs so badly that even sdb's dock LED enters some kind of stand-by mode (the LED powers off, only to blink once after a few seconds, meaning there is no activity whatsoever). Once it resumes, sdb will read maybe 100 or 300 MB at best, then sda will write a very tiny amount of data (VERY slowly) only to hit another deadlock... it's just sad and frustrating to see.

rontarplan
Posts: 17
Joined: 2023-05-11 07:46
Been thanked: 1 time

Re: [Software] Serious deadlock issue jbd2

#32 Post by rontarplan »

Aki wrote: 2023-06-12 11:15
kent_dorfman766 wrote: 2023-05-26 11:11 sure looks like you've got a task/process on that machine hammering the array. You've never said "what" is on the array. If it were me I'd run the thing in single user mode to watch the IO metrics, then I'd incrementally start services and watch for disk utilization.
@rontarplan : have you identified the culprit ?
Back to the discussion. Nice to see that I am not alone with this issue.
I am more and more coming to the conclusion that this is a bug.
To recap: we are running several machines, from single-disk to multi-disk with different RAID setups, all of them under heavy IO at times.
We see no locks on the single-disk machines, no locks on some machines with hw-raid, no locks on the machines with sw-raid (mdadm), and locks on some machines with hw-raid.
Since my last post, we completely exchanged the hw-raid machines that were locking up, but the replacement machines immediately showed the same problem again.
The nasty thing in my case is that the locks eventually stall the machine to the point where only a reboot helps, and that comes with the requirement of an fsck afterwards, which takes a very long time.
What finally helped - and this is why I am thinking this is a bug now - was to mount the array with option

Code: Select all

data=journal
in fstab.
I have no idea why this helps. All I can tell is that since that change, the two machines that had been stalling frequently have not stalled a single time. They have been running for months now.
Maybe this information helps to identify the underlying cause. With that setting, data is written into the journal first, while with the default data=ordered mode it is not; it is written to the file system first. Maybe the stalling means it is waiting for the journal to complete and for some reason the completion information is never sent. I don't know.

But this is my solution for now as it seems to have solved it.
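
For readers wanting to try the same thing, here is a sketch of what such an fstab entry could look like. The device (/dev/md0) and mount point (/srv/data) are placeholders, not taken from this thread, and the line is appended to a preview file rather than the live /etc/fstab:

```shell
# Hypothetical fstab entry with full data journaling (placeholder device
# and mount point). Appended to a preview file instead of /etc/fstab:
printf '%s\n' '/dev/md0  /srv/data  ext4  defaults,data=journal  0  2' \
    >> /tmp/fstab.preview
# Note: ext4 refuses to change the data= mode on a live remount, so an
# unmount/mount cycle (or a reboot) is needed for it to take effect.
```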

Aki
Global Moderator
Posts: 2816
Joined: 2014-07-20 18:12
Location: Europe
Has thanked: 68 times
Been thanked: 382 times

Re: [Software] Serious deadlock issue jbd2

#33 Post by Aki »

Hello,
rontarplan wrote: 2023-11-10 09:36 [..]
What finally helped - and this is why I am thinking this is a bug now - was to mount the array with option

Code: Select all

data=journal
in fstab.
I have no idea why this helps. All I can tell is that since that change, the two machines that had been stalling frequently have not stalled a single time. They have been running for months now.
Maybe this information helps to identify the underlying cause. With that setting, data is written into the journal first, while with the default data=ordered mode it is not; it is written to the file system first. Maybe the stalling means it is waiting for the journal to complete and for some reason the completion information is never sent. I don't know.

But this is my solution for now as it seems to have solved it.
Thank you for sharing the solution to the reported issue.

Your feedback sheds light on a niche topic, but it is still very useful for other Debian users and forum readers with the same issue.

If you agree, please mark the discussion as "solved" by adding the tag "[Solved]" at the beginning of the subject of the first message (after the other tags), for example:
[Software] [Solved] Serious deadlock issue jbd2 with disks in RAID configuration
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Debian - The universal operating system
⢿⡄⠘⠷⠚⠋⠀ https://www.debian.org
⠈⠳⣄⠀

User avatar
Penaut Butter
Posts: 13
Joined: 2018-05-31 21:59
Has thanked: 1 time

Re: [Software] Serious deadlock issue jbd2

#34 Post by Penaut Butter »

rontarplan wrote: 2023-11-10 09:36
Aki wrote: 2023-06-12 11:15
kent_dorfman766 wrote: 2023-05-26 11:11 sure looks like you've got a task/process on that machine hammering the array. You've never said "what" is on the array. If it were me I'd run the thing in single user mode to watch the IO metrics, then I'd incrementally start services and watch for disk utilization.
@rontarplan : have you identified the culprit ?
Back to the discussion. Nice to see that I am not alone with this issue.
I am more and more coming to the conclusion that this is a bug.
To recap: we are running several machines, from single-disk to multi-disk with different RAID setups, all of them under heavy IO at times.
We see no locks on the single-disk machines, no locks on some machines with hw-raid, no locks on the machines with sw-raid (mdadm), and locks on some machines with hw-raid.
Since my last post, we completely exchanged the hw-raid machines that were locking up, but the replacement machines immediately showed the same problem again.
The nasty thing in my case is that the locks eventually stall the machine to the point where only a reboot helps, and that comes with the requirement of an fsck afterwards, which takes a very long time.
What finally helped - and this is why I am thinking this is a bug now - was to mount the array with option

Code: Select all

data=journal
in fstab.
I have no idea why this helps. All I can tell is that since that change, the two machines that had been stalling frequently have not stalled a single time. They have been running for months now.
Maybe this information helps to identify the underlying cause. With that setting, data is written into the journal first, while with the default data=ordered mode it is not; it is written to the file system first. Maybe the stalling means it is waiting for the journal to complete and for some reason the completion information is never sent. I don't know.

But this is my solution for now as it seems to have solved it.
Hmmm... I wonder if the bug is related to journaling filesystems (edit: that, or the delayed allocation feature), because today I noticed this couldn't be reproduced on a btrfs filesystem. I will try testing more later, since I don't know the equivalent of data=journal for xfs (redundant).

rontarplan
Posts: 17
Joined: 2023-05-11 07:46
Been thanked: 1 time

Re: [Software] Serious deadlock issue jbd2

#35 Post by rontarplan »

Aki wrote: 2023-11-10 14:58 If you agree, please mark the discussion as "solved" adding the tag "[Solved]" at the beginning of the subject of the first message (after other tags), for example:
[Software] [Solved] Serious deadlock issue jbd2 with disks in RAID configuration
I could do that. However, I am still not sure if this is a solution or a workaround.
Penaut Butter wrote: 2023-11-11 08:54 Hmmm... I wonder if the bug is related to journaling filesystems (edit: that, or the delayed allocation feature), because today I noticed this couldn't be reproduced on a btrfs filesystem. I will try testing more later, since I don't know the equivalent of data=journal for xfs (redundant).
I have just double-checked and noted that we have also added

Code: Select all

journal_checksum
and

Code: Select all

journal_async_commit
as mount options.
So the complete addition is

Code: Select all

journal_checksum,journal_async_commit,data=journal
With the journal_async_commit option, the data=ordered mode is not possible on ext4; only journal and writeback appear to be allowed.
So it could be that

Code: Select all

journal_async_commit
is the real game changer in my case. We have only tried it together with data=journal, so maybe even data=writeback would work. As full data journaling puts additional load on the IO system, that mode would even be preferable. But we have not tested whether it would work or whether the machines would start to fail again.
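
Putting the three options together, a sketch of the resulting fstab line (the device and mount point are placeholders, not from this thread, and the line is written to a preview file rather than the live /etc/fstab):

```shell
# Hypothetical fstab entry with the complete option set quoted above
# (placeholder device and mount point); appended to a preview file here:
printf '%s\n' \
    '/dev/md0  /srv/data  ext4  defaults,journal_checksum,journal_async_commit,data=journal  0  2' \
    >> /tmp/fstab.preview
# After mounting, the options actually in effect can be checked with e.g.:
#   grep '/srv/data' /proc/mounts
```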

User avatar
Penaut Butter
Posts: 13
Joined: 2018-05-31 21:59
Has thanked: 1 time

Re: [Software] Serious deadlock issue jbd2

#36 Post by Penaut Butter »

Well I just had this issue while transferring data from an xfs drive to a btrfs drive on my archlinux (linux 6.7.1) system, so this has to be a kernel issue I guess.

Aki
Global Moderator
Posts: 2816
Joined: 2014-07-20 18:12
Location: Europe
Has thanked: 68 times
Been thanked: 382 times

Re: [Software] Serious deadlock issue jbd2

#37 Post by Aki »

Hello,
Penaut Butter wrote: 2024-01-29 04:06 Well I just had this issue while transferring data from an xfs drive to a btrfs drive on my archlinux (linux 6.7.1) system, so this has to be a kernel issue I guess.
Thanks for your feedback.

The jbd2 (quoted in the topic title) is a kernel module used by the ext4 filesystem, but you are now reporting about the xfs and btrfs filesystems.

Does the solution reported in the previous post still apply to the problem detected for xfs and btrfs filesystems?
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Debian - The universal operating system
⢿⡄⠘⠷⠚⠋⠀ https://www.debian.org
⠈⠳⣄⠀

User avatar
Penaut Butter
Posts: 13
Joined: 2018-05-31 21:59
Has thanked: 1 time

Re: [Software] Serious deadlock issue jbd2

#38 Post by Penaut Butter »

Aki wrote: 2024-01-31 08:03 Hello,
Penaut Butter wrote: 2024-01-29 04:06 Well I just had this issue while transferring data from an xfs drive to a btrfs drive on my archlinux (linux 6.7.1) system, so this has to be a kernel issue I guess.
Thanks for your feedback.

The jbd2 (quoted in the topic) is a kernel module for ext4 filesystem, but you are now reporting about xfs and btrfs filesystems.

Does the solution reported in the previous post still apply to the problem detected for xfs and btrfs filesystems?
No, it didn't. My issue has to do with several tasks locking up randomly when heavy I/O is happening. Examples:
INFO: task xfsaild/dm-1:1346 blocked for more than 120 seconds.
INFO: task kworker/u32:17:6058 blocked for more than 120 seconds.
INFO: task cp:5940 blocked for more than 120 seconds.
INFO: task wireplumber:2406 blocked for more than 120 seconds.
INFO: task panel-1-weather:2927 blocked for more than 120 seconds.
INFO: task pool-Thunar:3509 blocked for more than 120 seconds.
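
As an aside, the 120-second threshold in these messages comes from the kernel's hung-task watchdog (the `kernel.hung_task_timeout_secs` sysctl, default 120). A small helper, assuming GNU grep/sed, to pull the affected task names out of a dmesg capture when preparing a bug report:

```shell
# Extract unique task names from hung-task watchdog lines.
# Usage on a live system: dmesg | blocked_tasks
blocked_tasks() {
    grep -o 'task [^ ]*' | sed 's/^task //; s/:[0-9]*$//' | sort -u
}

# Example with two of the lines quoted above:
printf '%s\n' \
    'INFO: task xfsaild/dm-1:1346 blocked for more than 120 seconds.' \
    'INFO: task cp:5940 blocked for more than 120 seconds.' \
    | blocked_tasks
# -> cp
#    xfsaild/dm-1
```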

Except for wireplumber and a weather applet (which probably got blocked because I tried to click them while heavy I/O activity was running), what these tasks have in common is that cp and Thunar are often used by me to write data, so the actual task doing the I/O often gets blocked. It could be any command/program: mv, cp, rsync, borgbackup, Thunar, etc. Which leads me to think, first, that all of them are God-tier software and I doubt they have a bug in common that nukes my system every time; and second, that some underlying cause around 'fsync' or the like is nuking my system, because the blocked task is very frequently kworker or xfsaild (or jbd2 in the case of ext4).
I will have to try my best and write a detailed bug report to kernel.org, since it happened with Arch Linux as well (and with very different filesystems). BUT, then again, was this somehow already reported and ignored? https://bugzilla.kernel.org/show_bug.cgi?id=204253 details the same behavior I see, just with different software and versions. There are several bug reports with "X has been blocked for more than 120 seconds" that could be related.

I am not sure if it has something to do with delayed allocation features triggering something, because now that I think about it, my btrfs + Arch install has autodefrag enabled. Reading the docs, it says: "When enabled, small random writes into files (in a range of tens of kilobytes, currently it's 64KiB) are detected and queued up for the defragmentation process." So they might be similar? I might be comparing apples to oranges. Anyway, I will have to read bug reports, do some tests, and then report this on my own.

Aki
Global Moderator
Posts: 2816
Joined: 2014-07-20 18:12
Location: Europe
Has thanked: 68 times
Been thanked: 382 times

Re: [Software] Serious deadlock issue jbd2

#39 Post by Aki »

Penaut Butter wrote: 2024-02-10 07:03 [..]
I am not sure if it has something to do with delayed allocation features triggering something, because now that I think about it, my btrfs + Arch install has autodefrag enabled. Reading the docs, it says: "When enabled, small random writes into files (in a range of tens of kilobytes, currently it's 64KiB) are detected and queued up for the defragmentation process." So they might be similar? I might be comparing apples to oranges. Anyway, I will have to read bug reports, do some tests, and then report this on my own.
Thank you for your detailed feedback. Please feel free to open a new discussion, as it is likely a different topic (I mean, probably caused by different issues) even if the error message is the same.
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Debian - The universal operating system
⢿⡄⠘⠷⠚⠋⠀ https://www.debian.org
⠈⠳⣄⠀

Aki
Global Moderator
Posts: 2816
Joined: 2014-07-20 18:12
Location: Europe
Has thanked: 68 times
Been thanked: 382 times

Re: [Software] Serious deadlock issue jbd2

#40 Post by Aki »

Hello @rontarplan,
rontarplan wrote: 2023-11-14 03:54
Aki wrote: 2023-11-10 14:58 If you agree, please mark the discussion as "solved" adding the tag "[Solved]" at the beginning of the subject of the first message (after other tags), for example:
[Software] [Solved] Serious deadlock issue jbd2 with disks in RAID configuration
I could do that. However, I am still not sure if this is a solution or a workaround.
Penaut Butter wrote: 2023-11-11 08:54 Hmmm... I wonder if the bug is related to journaling filesystems (edit: that, or the delayed allocation feature), because today I noticed this couldn't be reproduced on a btrfs filesystem. I will try testing more later, since I don't know the equivalent of data=journal for xfs (redundant).
I have just double-checked and noted that we have also added

Code: Select all

journal_checksum
and

Code: Select all

journal_async_commit
as mount options.
So the complete addition is

Code: Select all

journal_checksum,journal_async_commit,data=journal
With the journal_async_commit option, the data=ordered mode is not possible on ext4; only journal and writeback appear to be allowed.
So it could be that

Code: Select all

journal_async_commit
is the real game changer in my case. We have only tried it together with data=journal, so maybe even data=writeback would work. As full data journaling puts additional load on the IO system, that mode would even be preferable. But we have not tested whether it would work or whether the machines would start to fail again.
Do you have any further updates on this topic?
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Debian - The universal operating system
⢿⡄⠘⠷⠚⠋⠀ https://www.debian.org
⠈⠳⣄⠀

Post Reply