[Solved] Raid1 says clean but won't assemble

Need help with peripherals or devices?
seahorse41
Posts: 40
Joined: 2015-09-01 16:09
Has thanked: 1 time

[Solved] Raid1 says clean but won't assemble

#1 Post by seahorse41 »

I had a working mdadm raid1.
With the intent of making a backup of my primary OS while it was unmounted, and after researching the safe commands for reassembling the RAID under a different Linux OS, I rebooted into System-Rescue-10, copied the mdadm.conf to /etc over the existing template, and attempted to mount /dev/md0.
I got only one drive up. Cue the mild panic; avoiding this is exactly why I did the research first.
I aborted and rebooted back into Debian 11 to see if anything was harmed. Sadly, yes.
It says the drives are clean, but it won't reassemble, and I am uncertain which path to take to get it back together.
If it were marked degraded, I would remove the bad drive and re-add it.

How can I determine why it is getting stuck?

Code:

$ sudo mdadm --examine /dev/sd[de]1
/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 1e3f7f7e:23a5b75f:6f76abf5:88f5e704
           Name : roxy10-debian11-x64:0  (local to host roxy10-debian11-x64)
  Creation Time : Sat Jan 27 12:07:27 2024
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 11720777728 (5588.90 GiB 6001.04 GB)
     Array Size : 5860388864 (5588.90 GiB 6001.04 GB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=0 sectors
          State : clean
    Device UUID : b65dd512:8928c097:47debae7:9c944a3e

Internal Bitmap : 8 sectors from superblock
    Update Time : Sat Mar  2 18:50:02 2024
  Bad Block Log : 512 entries available at offset 32 sectors
       Checksum : 39e567c9 - correct
         Events : 21691

   Device Role : Active device 0
   Array State : AA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sde1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 1e3f7f7e:23a5b75f:6f76abf5:88f5e704
           Name : roxy10-debian11-x64:0  (local to host roxy10-debian11-x64)
  Creation Time : Sat Jan 27 12:07:27 2024
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 11720777728 (5588.90 GiB 6001.04 GB)
     Array Size : 5860388864 (5588.90 GiB 6001.04 GB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=0 sectors
          State : clean
    Device UUID : c7b9578b:0eef6ae2:6c33a25b:386cf478

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Mar  5 07:38:27 2024
  Bad Block Log : 512 entries available at offset 32 sectors
       Checksum : aba80e46 - correct
         Events : 21704

   Device Role : Active device 1
   Array State : .A ('A' == active, '.' == missing, 'R' == replacing)

$ sudo mdadm --assemble --verbose /dev/md0 /dev/sdd1 /dev/sde1
mdadm: looking for devices for /dev/md0
mdadm: /dev/sdd1 is busy - skipping
mdadm: Merging with already-assembled /dev/md0
mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 0.
mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdd1 is already in /dev/md0 as 0
mdadm: added /dev/sde1 to /dev/md0 as 1
mdadm: /dev/md0 has been started with 1 drive (out of 2).

$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md0 : active (auto-read-only) raid1 sde1[1]
      5860388864 blocks super 1.2 [2/1] [_U]
      bitmap: 1/44 pages [4KB], 65536KB chunk

unused devices: <none>

$ lsblk
sdd       8:48   0   5.5T  0 disk  
└─sdd1    8:49   0   5.5T  0 part  
sde       8:64   0   5.5T  0 disk  
└─sde1    8:65   0   5.5T  0 part  
  └─md0   9:0    0   5.5T  0 raid1 /mnt/Ugreen_RAID1_6Tb

$ sudo dmsetup table
No devices found

$ sudo mdadm -E /dev/sdd
/dev/sdd:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)

$ sudo mdadm -E /dev/sde
/dev/sde:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)

Among my attempts I also ran mdadm --stop /dev/md0 to clear the busy message, so I could reassemble from a clean starting point with md0 no longer showing in lsblk.

Also, the ~21,000 events accumulated with the previous enclosure back in January, so they are not relevant now.

I want to understand what happened before trying --force or --zero-superblock, which appeared as solutions in my subsequent searches.
I can't tell for sure whether the superblock is the problem, but since I don't see any error about it, I am looking for other ideas.
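The --examine output above already hints at the cause: the two members' Events counters differ (21691 vs 21704), and mdadm will not merge the member whose counter is behind. A minimal sketch for pulling the counters out side by side; the sample data below is trimmed from the output in this thread, and on a live system you would pipe `sudo mdadm --examine /dev/sd[de]1` instead of reading a saved file:

```shell
# Save a trimmed sample of the --examine output (normally you would
# capture the real command's output here instead).
cat > examine.txt <<'EOF'
/dev/sdd1:
          Events : 21691
/dev/sde1:
          Events : 21704
EOF

# Print each member with its event counter; the member with the lower
# count is the stale one that mdadm leaves out of the assembled array.
awk '/^\/dev\//{dev=$1} /Events/{print dev, $NF}' examine.txt
```

With the numbers above this prints /dev/sdd1 as the member that is behind, which matches the "possibly out of date" message seen later in the thread.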

Code:

$ cat /etc/fstab | grep dev/md
/dev/md0  /mnt/Ugreen_RAID1_6Tb   ext3  defaults,noatime,rw,nofail,x-systemd.device-timeout=4  0  0

$ sudo fdisk -l /dev/sdd
Disk /dev/sdd: 5.46 TiB, 6001175126016 bytes, 11721045168 sectors
Disk model: 726T6TALE604    
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 89AEDBB8-E01E-47FA-859B-A415D7DDEE35

Device     Start         End     Sectors  Size Type
/dev/sdd1   2048 11721043967 11721041920  5.5T Linux filesystem

$ sudo fdisk -l /dev/sde
Disk /dev/sde: 5.46 TiB, 6001175126016 bytes, 11721045168 sectors
Disk model: 726T6TALE604    
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: FF7400C3-E30B-43C6-98F4-9783F92981D0

Device     Start         End     Sectors  Size Type
/dev/sde1   2048 11721043967 11721041920  5.5T Linux filesystem

$ cat /etc/mdadm/mdadm.conf
ARRAY /dev/md0 metadata=1.2 name=roxy10-debian11-x64:0 UUID=1e3f7f7e:23a5b75f:6f76abf5:88f5e704

$ cat /proc/version 
Linux version 5.10.0-26-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.197-1 (2023-09-29)

$ cat /etc/debian_version 
11.8

Thank you
Last edited by seahorse41 on 2024-03-08 00:47, edited 1 time in total.

Aki
Global Moderator
Posts: 2979
Joined: 2014-07-20 18:12
Location: Europe
Has thanked: 75 times
Been thanked: 407 times

Re: Raid1 says clean but won't assemble

#2 Post by Aki »

Hello,

As you reported, mdadm cannot access disk /dev/sdd1:

Code:

$ sudo mdadm --assemble --verbose /dev/md0 /dev/sdd1 /dev/sde1
mdadm: looking for devices for /dev/md0
--> mdadm: /dev/sdd1 is busy - skipping <--
mdadm: Merging with already-assembled /dev/md0
mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 0.
mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdd1 is already in /dev/md0 as 0
mdadm: added /dev/sde1 to /dev/md0 as 1
mdadm: /dev/md0 has been started with 1 drive (out of 2).

It may be useful to check the system journal/logs for errors about the busy disk (/dev/sdd1), and/or to see whether other processes are accessing it.
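A hedged sketch of those checks (device names are taken from this thread; the privileged commands are left as comments because they need root and the real hardware):

```shell
# 1. Kernel messages about the disk or the array (systemd journal):
#      sudo journalctl -k -b | grep -E 'sdd|md0'
# 2. Processes holding the partition open:
#      sudo lsof /dev/sdd1
# 3. An md array that already claimed the member at boot:
#      cat /proc/mdstat

# A related pattern picks the UAS error-handler events out of a dmesg
# line, e.g. on a sample line like the ones later in this thread:
dmesg_line='[49919.216448] scsi host7: uas_eh_device_reset_handler start'
echo "$dmesg_line" | grep -oE 'uas[a-z_]*'
```

If /proc/mdstat already shows md0 partially assembled, that alone explains the "busy" message: the kernel auto-assembled one member at boot before the manual assemble ran.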
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Debian - The universal operating system
⢿⡄⠘⠷⠚⠋⠀ https://www.debian.org
⠈⠳⣄⠀

seahorse41
Posts: 40
Joined: 2015-09-01 16:09
Has thanked: 1 time

Re: Raid1 says clean but won't assemble

#3 Post by seahorse41 »

I can find some errors:

Code:

[49887.632810] sd 7:0:0:0: [sdc] tag#26 data cmplt err -75 uas-tag 1 inflight: CMD 
[49887.632819] sd 7:0:0:0: [sdc] tag#26 CDB: Read(16) 88 00 00 00 00 02 ba a0 f0 f8 00 00 00 08 00 00
[49919.188430] sd 7:0:0:0: [sdc] tag#26 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD 
[49919.188444] sd 7:0:0:0: [sdc] tag#26 CDB: Read(16) 88 00 00 00 00 02 ba a0 f0 f8 00 00 00 08 00 00
[49919.216448] scsi host7: uas_eh_device_reset_handler start
[49924.296228] usb 7-2: Disable of device-initiated U1 failed.
[49929.415838] usb 7-2: Disable of device-initiated U2 failed.

[49938.242312] xhci_hcd 0000:01:00.0: WARN: Slot ID 5, ep index 2 has streams, but URB has no stream ID.
[49938.242324] xhci_hcd 0000:01:00.0: WARN can't find new dequeue state for invalid stream ID 0.
[49938.242328] xhci_hcd 0000:01:00.0: WARN Cannot submit Set TR Deq Ptr
[49938.242333] xhci_hcd 0000:01:00.0: WARN deq seg = 000000002d07b215, deq pt = 00000000ce5fe7ae
[49968.336429] sd 7:0:0:0: [sdc] tag#16 uas_eh_abort_handler 0 uas-tag 2 inflight: CMD IN 
[49968.336445] sd 7:0:0:0: [sdc] tag#16 CDB: Read(16) 88 00 00 00 00 02 ba a0 ef 80 00 00 00 08 00 00
[49968.356436] scsi host7: uas_eh_device_reset_handler start
[49968.356772] sd 7:0:0:0: [sdc] tag#20 uas_zap_pending 0 uas-tag 1 inflight: CMD 
[49968.356780] sd 7:0:0:0: [sdc] tag#20 CDB: Read(16) 88 00 00 00 00 02 ba a0 f0 f8 00 00 00 08 00 00
[49968.485091] usb 7-2: reset SuperSpeed Gen 1 USB device number 3 using xhci_hcd
[49968.507921] scsi host7: uas_eh_device_reset_handler success
[49968.598077] sd 7:0:0:0: [sdc] tag#20 data cmplt err -75 uas-tag 1 inflight: CMD 
[49968.598093] sd 7:0:0:0: [sdc] tag#20 CDB: Read(16) 88 00 00 00 00 00 00 00 00 18 00 00 00 08 00 00
[49999.057933] sd 7:0:0:0: [sdc] tag#20 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD 
[49999.057948] sd 7:0:0:0: [sdc] tag#20 CDB: Read(16) 88 00 00 00 00 00 00 00 00 18 00 00 00 08 00 00
[49999.081934] scsi host7: uas_eh_device_reset_handler start
[50004.161772] usb 7-2: Disable of device-initiated U1 failed.
[50009.281340] usb 7-2: Disable of device-initiated U2 failed.

[50044.250744] sd 7:0:0:0: [sdc] tag#22 uas_zap_pending 0 uas-tag 3 inflight: CMD 
[50044.250752] sd 7:0:0:0: [sdc] tag#22 CDB: Read(16) 88 00 00 00 00 00 00 00 01 08 00 00 00 f8 00 00
[50044.250819] sd 7:0:0:0: [sdc] tag#22 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=34s
[50044.250825] sd 7:0:0:0: [sdc] tag#22 CDB: Read(16) 88 00 00 00 00 00 00 00 01 08 00 00 00 f8 00 00
[50044.250832] blk_update_request: I/O error, dev sdc, sector 264 op 0x0:(READ) flags 0x80700 phys_seg 31 prio class 0
[50044.250907] blk_update_request: I/O error, dev sdc, sector 264 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[50044.250915] Buffer I/O error on dev sdc, logical block 33, async page read
[50044.251487] blk_update_request: I/O error, dev sdc, sector 264 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[50044.251494] Buffer I/O error on dev sdc, logical block 33, async page read

but as of the last time I plugged them in, there were no errors.

Code:

$ sudo mdadm --stop /dev/md0
mdadm: stopped /dev/md0

$ sudo mdadm --assemble --verbose /dev/md0 /dev/sdd1 /dev/sdc1
mdadm: looking for devices for /dev/md0
mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 0.
mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 1.
mdadm: added /dev/sdd1 to /dev/md0 as 0 (possibly out of date)
mdadm: added /dev/sdc1 to /dev/md0 as 1
mdadm: /dev/md0 has been started with 1 drive (out of 2).

It shows busy when lsblk shows one drive attached to md0, so I stopped md0 and assembled as shown above.
I'm guessing there's nothing to discover now, and whatever happened has been blurred. So the plan is to get them back in sync; I assume that would be the --force command.
I'll do that tomorrow morning, unless somebody suggests some other test I should run first.

seahorse41
Posts: 40
Joined: 2015-09-01 16:09
Has thanked: 1 time

Re: Raid1 says clean but won't assemble

#4 Post by seahorse41 »

Perhaps I'm asking the wrong question. I am confused why both drives show clean, and why the word Degraded is not shown.
What would --force do in this case?

Maybe I should treat it as degraded.
Is it better to --remove and --add the one that is not mounting?
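For what it's worth: --assemble --force tells mdadm to accept the member whose event counter is behind and bring it back in sync (with a write-intent bitmap, usually only the changed blocks are copied), whereas removing and re-adding treats it as a rebuild, and a plain --add can risk a full resync where --re-add can use the bitmap. A tiny sketch (the helper function is hypothetical; the counts come from the --examine output in post #1) of which member a forced assembly would overwrite:

```shell
# Hypothetical helper: given two members and their Events counts, name
# the stale one -- the copy whose data a forced assembly would overwrite.
stale_member() {  # usage: stale_member DEV1 EVENTS1 DEV2 EVENTS2
  if [ "$2" -lt "$4" ]; then echo "$1"; else echo "$3"; fi
}

stale_member /dev/sdd1 21691 /dev/sde1 21704
```

With this thread's numbers it names /dev/sdd1, the member whose counter stopped at 21691.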

Aki
Global Moderator
Posts: 2979
Joined: 2014-07-20 18:12
Location: Europe
Has thanked: 75 times
Been thanked: 407 times

Re: Raid1 says clean but won't assemble

#5 Post by Aki »

Hello,
seahorse41 wrote: 2024-03-07 06:12 I'm guessing there's nothing to discover now, and whatever happened has been blurred. So the plan is to get them back in sync; I assume that would be the --force command.
The behavior of one disk of the array looks erratic. This behavior may lead to "time out" issues with the RAID. It may also be useful to document (for future reference) the manufacturer and the exact model of the disks that are part of the RAID1.

seahorse41
Posts: 40
Joined: 2015-09-01 16:09
Has thanked: 1 time

Re: Raid1 says clean but won't assemble

#6 Post by seahorse41 »

Aki wrote: 2024-03-07 16:50 The behavior of one disk of the array looks erratic. This behavior may lead to "time out" issues with the RAID. It may also be useful to document (for future reference) the manufacturer and the exact model of the disks that are part of the RAID1.
Your URL link is a 404, but I found what you pointed me to.
Uh oh, how often do these timing errors occur for SMR?
My drives are:
https://www.ebay.com/itm/296011038987

The data sheet for HC310 says it is CMR, and I remember researching this before purchasing them.

I'm leaning toward it being a USB cable or other intermittent connection, as I saw it in dmesg earlier. I have since tried plugging various USB hard drives into different ports to see which are more reliable.
(For example, a USB 3 port that won't mount a Touro HD, but does mount a Seagate.)

So my --force or --remove question still stands.

seahorse41
Posts: 40
Joined: 2015-09-01 16:09
Has thanked: 1 time

Re: Raid1 says clean but won't assemble

#7 Post by seahorse41 »

Never mind, the correct answer is:

Code:

mdadm --re-add /dev/md0 /dev/sdc1
The raid1 is back together.
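The --re-add presumably worked here because the superblocks still matched and the write-intent bitmap recorded what had changed since the split, so only the stale blocks needed copying. A small sketch of what to watch for in /proc/mdstat afterwards: a healthy two-way mirror reads [2/2] [UU], while the degraded line quoted in post #1 reads [2/1] [_U]. The classifier below runs on that quoted sample line; on a live system you would feed it the real `cat /proc/mdstat` output:

```shell
# Classify an mdstat status line: [2/2] means both mirror members active.
line='5860388864 blocks super 1.2 [2/1] [_U]'
case "$line" in
  *'[2/2]'*) echo "healthy" ;;
  *)         echo "degraded" ;;
esac
```

Once the bitmap-driven recovery finishes, the same check on the live line flips from degraded to healthy.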
