Debian 10.4 isn't mounting my mdadm array and 10.3 did

Post by road hazard » 2020-05-11 11:49

I did a fresh install of Debian 10.3 several days ago and had no problems. This morning I upgraded to 10.4 (which I guess includes kernel 4.19.0-9) and noticed my mdadm RAID 6 array wasn't mounted at boot. I restarted the PC, went back to 4.19.0-8, and my array is being mounted again.

Being a Linux newbie: when Debian releases a new kernel update, is there anything I need to do on my side to get mdadm going again? I was using Mint for a while and don't recall this problem with kernel updates, so maybe it's a Debian-specific thing?

I was reading the notes for 10.4 and don't remember anything in the package list about mdadm changing. Any ideas?

Thanks

Post by pendrachken » 2020-05-11 19:45

Just booting back to an older kernel will not make you go back to an earlier release of Debian. So you are running 10.4, just with a known working kernel from 10.3. The release is still 10.4 because all of the OS other than that kernel is running the versions of software in 10.4.

Going back to a previous kernel and having it work immediately means there was likely a regression in the latest kernel. Or you could have had a one-in-a-million failure when booting the newer kernel.
Your best bet would be to try booting a few times with the newest kernel and see whether your array fails to mount every time or only part of the time.

If it fails every time, try mounting the array manually and see if that works. If all else fails, file a bug report and state what testing you did (the booting, the manual mounting, etc.) and what the outcomes were.
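
For the manual check suggested above, here is a minimal sketch (not part of the original post). It assumes the array is expected at /dev/md0 with the mount point /mnt/md0, as elsewhere in this thread. The helper name `active_md_devices` is made up for illustration; it lists whichever md devices the kernel currently reports as active in /proc/mdstat, so you can see what name the array actually got before trying to mount it by hand.

```shell
# List md devices that /proc/mdstat reports as active.
# The optional argument (a path to an mdstat-style file) exists only so the
# helper can be tried on saved output; by default it reads /proc/mdstat.
active_md_devices() {
    awk '$2 == ":" && $3 == "active" { print $1 }' "${1:-/proc/mdstat}"
}

# Typical manual check (needs root and a real array, so left commented out):
#   active_md_devices                # which md names exist right now?
#   sudo mount /dev/md0 /mnt/md0     # does mounting the expected name work?
```

If the helper prints a name other than the one fstab refers to, the mount failure is probably a naming problem rather than a broken array.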

Post by road hazard » 2020-05-12 15:34

Thanks for the feedback! I was poking around in syslog and maybe found a clue:

First, booting with the 4.19.0-9 kernel (raid isn't being mounted), I see this error:

Code:
May 12 07:37:13 server-pc systemd[1]: dev-md0.device: Job dev-md0.device/start timed out.
May 12 07:37:13 server-pc systemd[1]: Timed out waiting for device /dev/md0.
May 12 07:37:13 server-pc systemd[1]: Dependency failed for /mnt/md0.
May 12 07:37:13 server-pc systemd[1]: mnt-md0.mount: Job mnt-md0.mount/start failed with result 'dependency'.
May 12 07:37:13 server-pc systemd[1]: Startup finished in 11.398s (kernel) + 1min 30.148s (userspace) = 1min 41.546s.
May 12 07:37:13 server-pc systemd[1]: dev-md0.device: Job dev-md0.device/start failed with result 'timeout'.


Later on, the array is being assembled as md127 I guess?

Code:
May 12 07:40:19 server-pc kernel: [   11.049712] md/raid:md127: device sdk operational as raid disk 0
May 12 07:40:19 server-pc kernel: [   11.049712] md/raid:md127: device sdg operational as raid disk 1
May 12 07:40:19 server-pc kernel: [   11.049713] md/raid:md127: device sdi operational as raid disk 7
May 12 07:40:19 server-pc kernel: [   11.049713] md/raid:md127: device sda operational as raid disk 11
May 12 07:40:19 server-pc kernel: [   11.049713] md/raid:md127: device sdo operational as raid disk 12
May 12 07:40:19 server-pc kernel: [   11.049714] md/raid:md127: device sdn operational as raid disk 13
May 12 07:40:19 server-pc kernel: [   11.049714] md/raid:md127: device sdj operational as raid disk 8
May 12 07:40:19 server-pc kernel: [   11.049714] md/raid:md127: device sdh operational as raid disk 3
May 12 07:40:19 server-pc kernel: [   11.049715] md/raid:md127: device sdl operational as raid disk 4
May 12 07:40:19 server-pc kernel: [   11.049715] md/raid:md127: device sdf operational as raid disk 9
May 12 07:40:19 server-pc kernel: [   11.049715] md/raid:md127: device sdb operational as raid disk 2
May 12 07:40:19 server-pc kernel: [   11.049716] md/raid:md127: device sdd operational as raid disk 6
May 12 07:40:19 server-pc kernel: [   11.049716] md/raid:md127: device sde operational as raid disk 10
May 12 07:40:19 server-pc kernel: [   11.049716] md/raid:md127: device sdc operational as raid disk 5
May 12 07:40:19 server-pc kernel: [   11.050101] md/raid:md127: raid level 6 active with 14 out of 14 devices, algorithm 2
May 12 07:40:19 server-pc kernel: [   11.078716] md127: detected capacity change from 0 to 48007829520384


Now, booting with the 4.19.0-8 kernel (RAID IS mounted), I don't get the error about dev-md0.device timing out, and notice how the array is now called md0:

Code:
May 12 07:42:08 server-pc kernel: [   10.874912] md/raid:md0: device sdh operational as raid disk 3
May 12 07:42:08 server-pc kernel: [   10.874912] md/raid:md0: device sdj operational as raid disk 8
May 12 07:42:08 server-pc kernel: [   10.874912] md/raid:md0: device sdl operational as raid disk 4
May 12 07:42:08 server-pc kernel: [   10.874913] md/raid:md0: device sdb operational as raid disk 2
May 12 07:42:08 server-pc kernel: [   10.874913] md/raid:md0: device sdf operational as raid disk 9
May 12 07:42:08 server-pc kernel: [   10.874913] md/raid:md0: device sdc operational as raid disk 5
May 12 07:42:08 server-pc kernel: [   10.874914] md/raid:md0: device sde operational as raid disk 10
May 12 07:42:08 server-pc kernel: [   10.874914] md/raid:md0: device sdd operational as raid disk 6
May 12 07:42:08 server-pc kernel: [   10.874914] md/raid:md0: device sdg operational as raid disk 1
May 12 07:42:08 server-pc kernel: [   10.874915] md/raid:md0: device sdk operational as raid disk 0
May 12 07:42:08 server-pc kernel: [   10.874915] md/raid:md0: device sdi operational as raid disk 7
May 12 07:42:08 server-pc kernel: [   10.874915] md/raid:md0: device sda operational as raid disk 11
May 12 07:42:08 server-pc kernel: [   10.874915] md/raid:md0: device sdn operational as raid disk 13
May 12 07:42:08 server-pc kernel: [   10.874916] md/raid:md0: device sdo operational as raid disk 12
May 12 07:42:08 server-pc kernel: [   10.875303] md/raid:md0: raid level 6 active with 14 out of 14 devices, algorithm 2
May 12 07:42:08 server-pc kernel: [   10.907815] md0: detected capacity change from 0 to 48007829520384


I'm fairly certain my array has always been md0. I guess that name change (from md0 to md127) is responsible for the failure?

Why is the -9 kernel trying to assemble my array as md127 while -8 uses the correct name (md0)?
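
One way to see the mismatch at a glance is to pull the md device names out of the kernel messages and compare them with the name fstab expects. This is a rough sketch that assumes the `md/raid:mdN:` message format shown in the logs above. `md_names_from_log` is a hypothetical helper name, and the log path in the usage comment is an assumption about where syslog lives on your system.

```shell
# Print the distinct md device names mentioned in "md/raid:mdN:" kernel
# messages read from standard input.
md_names_from_log() {
    grep -o 'md/raid:md[0-9]*' | sed 's|^md/raid:||' | sort -u
}

# Example use (the path is an assumption; adjust for your system):
#   md_names_from_log < /var/log/syslog
```

Seeing both md0 and md127 in the output would mean the array has come up under different names on different boots.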

Post by road hazard » 2020-05-14 12:47

Solved!

My original fstab was mounting the array by device name:

Code:
/dev/md0 /mnt/md0 xfs defaults,nofail,discard 0 0


I changed it to mount by the UUID of the filesystem (as reported by lsblk):

Code:
UUID=ab816d7f-8aa0-43c2-ad8d-73bc675b6a34 /mnt/md0 xfs defaults,nofail,discard 0 0
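
To check whether any other fstab entries still depend on a kernel-assigned md name, something like the following could help. It is only a sketch: `fstab_md_by_name` is a hypothetical helper that lists entries whose first field looks like /dev/mdN. The UUID to use instead can be read with `lsblk -f` or `blkid`.

```shell
# Print device and mount point for fstab entries that refer to an md array
# by kernel name (/dev/mdN). Commented-out entries do not match.
# Optional argument: path to an fstab-style file (defaults to /etc/fstab).
fstab_md_by_name() {
    awk '$1 ~ /^\/dev\/md[0-9]+$/ { print $1, $2 }' "${1:-/etc/fstab}"
}

# Example use:
#   fstab_md_by_name    # no output means nothing mounts by /dev/mdN
```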


And now the array is mounting perfectly* with kernel -9!!!

*For some weird reason, though, the array is still being assembled as md127, but at least I can access it normally from /mnt/md0, like I always could. The md127 name is still weird to me. Here is my current mdadm.conf file:

Code:
ARRAY /dev/md/0 metadata=1.2 spares=1 name=server-pc:0 UUID=ab816d7f-8aa0-43c2-ad8d-73bc675b6a34
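
A hedged aside on that ARRAY line: the UUID it shows is the same value as the filesystem UUID used in fstab above, and an md array's UUID is normally a different identifier from the UUID of the filesystem stored on it. That may or may not be related to the md127 name, but it is cheap to check by comparing mdadm.conf with what `mdadm --detail --scan` reports for the running array. The sketch below assumes Debian's default /etc/mdadm/mdadm.conf path. `array_uuids` is a made-up helper that strips punctuation so the two sources can be compared whether they use colons or dashes.

```shell
# Print the UUID of every ARRAY line in an mdadm.conf-style file, lowercased
# and with ':' and '-' removed so differently punctuated UUIDs compare equal.
# Optional argument: file to read (defaults to Debian's /etc/mdadm/mdadm.conf).
array_uuids() {
    grep -i '^ARRAY' "${1:-/etc/mdadm/mdadm.conf}" |
        grep -io 'uuid=[0-9a-f:-]*' |
        sed -e 's/^[^=]*=//' -e 's/[-:]//g' |
        tr 'A-F' 'a-f'
}

# Possible use on a live system (needs root, so left commented out):
#   sudo mdadm --detail --scan > /tmp/scan.conf
#   [ "$(array_uuids /tmp/scan.conf)" = "$(array_uuids)" ] && echo "UUIDs match"
#   sudo update-initramfs -u    # refresh the initramfs copy after editing mdadm.conf
```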


I ran update-initramfs -u after everything was up and running under -9. Should I go down the path of renaming the array?

Code:
sudo mdadm --assemble /dev/md/0 --update=name --name=0 /dev/blah /dev/blah /dev/blah etc etc etc


...or, if everything is working fine (which it looks like it is), should I just leave well enough alone? I don't really care about having to use md127 when checking the array via mdadm commands, whereas before all my commands referenced md0. It mounts under /mnt/md0, it's working normally, and my Plex server can see everything on the array. Just leave it alone?

Post by Usoop » 2020-05-14 13:53

Thanks mate