Then it happened to me again, this time on a VM so I was able to play with it and revert to snapshots to reproduce it, and eventually find a fix for the problem. And amazingly, after the days I spent working on this the first time, the solution is incredibly simple - I'm astounded that I was unable to find anything like it posted anywhere. Here's the scoop...
Install a new system (Debian 8 in my case) and configure software-based RAID. In my case, I had 4 drives with 2 arrays: RAID 1 across 4 drives holding just the /boot partition, and RAID 5 across 3 drives with 1 spare, holding /. Let's say I gave the machine the hostname HOSTA during the installation. The initramfs is then built with that name in the /etc/mdadm/mdadm.conf file, and that name is also recorded somewhere else in the RAID configuration (maybe in the superblock for each array?).
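If you want to see which hostname your own mdadm.conf carries, here is a rough sketch. It assumes the ARRAY lines contain a name=HOST:N field, as they did on my install; the function name conf_array_hosts is just something made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the distinct host parts of the name= fields
# found on ARRAY lines of an mdadm.conf file.
# Usage: conf_array_hosts [path-to-mdadm.conf]
conf_array_hosts() {
    conf="${1:-/etc/mdadm/mdadm.conf}"
    # Keep only ARRAY lines, pull out name=HOST:N, and reduce it to HOST.
    grep '^ARRAY' "$conf" \
        | sed -n 's/.*[[:space:]]name=\([^:[:space:]]*\):.*/\1/p' \
        | sort -u
}
```

Running conf_array_hosts with no argument reads /etc/mdadm/mdadm.conf. If it prints a name other than the one the arrays were created under, that is the mismatch described in this post.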
OK, the system is running fine, and along the way I decide that HOSTB is a more appropriate name for the machine. No problem: update /etc/hosts, /etc/hostname, and maybe a few other files (even /etc/mdadm/mdadm.conf!). Reboot and all is still OK... for a while. Then one day it's time to update the OS software: apt-get update, followed by apt-get upgrade. Everything is still OK until the next reboot. When the initramfs loads and tries to mount the root file system, I'm greeted with:
Code:
Gave up waiting for root device. Common problems:
 - Boot args (cat /proc/cmdline)
   - Check rootdelay= (did the system wait long enough?)
   - Check root= (did the system wait for the right device?)
 - Missing modules (cat /proc/modules; ls /dev)
ALERT! /dev/disk/by-uuid/054a6dcc-d280-4aff-9977 does not exist.
Dropping to a shell!
BusyBox v.1.22.1 (Ubuntu 1:1.22.0-9deb8u1) built-in shell (ash)
Enter 'help' for list of built-in commands.
/bin/sh: can't access tty: job control turned off
(initramfs)
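To get the system booted again, put the original hostname back into the initramfs copy of mdadm.conf and assemble the arrays by hand:
Code: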
(initramfs) vi /etc/mdadm/mdadm.conf
<while in vi> :%s/HOSTB/HOSTA/g # replaces all occurrences of HOSTB with HOSTA
<while in vi> :wq # saves the file and quits vi
(initramfs) mdadm --assemble --scan
mdadm: /dev/md/0 has been started with 4 drives.
mdadm: /dev/md/1 has been started with 3 drives and 1 spare.
(initramfs) exit # the system resumes the boot process and completes normally
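Once the system is up, make the same change in the real /etc/mdadm/mdadm.conf and rebuild the initramfs so the fix survives the next reboot:
Code: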
root@HOSTB:~# vi /etc/mdadm/mdadm.conf
<while in vi> :%s/HOSTB/HOSTA/g # replaces all occurrences of HOSTB with HOSTA
<while in vi> :wq # saves the file and quits vi
root@HOSTB:~# update-initramfs -u -k all
Now I can safely reboot the system with its new hostname and all updates installed.
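For anyone who would rather not do the edit by hand in vi, the same change can be sketched as a small shell function. This is only a sketch under a few assumptions: the function name set_conf_array_host is made up, it only rewrites name=HOST: fields, and it assumes the hostnames contain no characters that are special to sed.

```shell
#!/bin/sh
# Hypothetical helper: replace the host part of every name=HOST:N field
# in an mdadm.conf file, keeping a backup next to it.
# Usage: set_conf_array_host OLD NEW [path-to-mdadm.conf]
set_conf_array_host() {
    old="$1"
    new="$2"
    conf="${3:-/etc/mdadm/mdadm.conf}"
    # Keep a copy of the file as it was before the change.
    cp -p "$conf" "$conf.bak" || return 1
    # Only touch name= fields, so other uses of the string are left alone.
    sed "s/\([[:space:]]name=\)$old:/\1$new:/g" "$conf.bak" > "$conf"
}
```

In my situation that would be set_conf_array_host HOSTB HOSTA as root, followed by the same update-initramfs -u -k all as above.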
(**) One way to see that the old hostname is still ingrained in the configuration is with the 'mdadm -D /dev/mdX' command, replacing X with the appropriate array number on your system. Here's a sample:
Code:
root@HOSTB:~# mdadm -D /dev/md1
/dev/md1:
Version : 1.2
Creation Time : Thu Dec 29 09:02:43 2016
Raid Level : raid5
Array Size : 2130618368 (2031.92 GiB 2181.75 GB)
Used Dev Size : 1065309184 (1015.96 GiB 1090.88 GB)
Raid Devices : 3
Total Devices : 4
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Thu Feb 9 17:25:16 2017
State : clean
Active Devices : 3
Working Devices : 4
Failed Devices : 0
Spare Devices : 1
Layout : left-symmetric
Chunk Size : 512K
Name : HOSTA:1 # <--- here is the original hostname given to the machine during installation
UUID : ae1ab583:e5e30fb9:e2c807a2:b5e95f42
Events : 4831
Number   Major   Minor   RaidDevice   State
0        8       3       0            active sync   /dev/sda3
1        8       19      1            active sync   /dev/sdb3
2        8       35      2            active sync   /dev/sdc3
3        8       51      -            spare         /dev/sdd3
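To pull just the hostname out of that output, something like the following might do. It is a sketch that assumes the Name line looks like the one in the sample above (HOST:N after the colon); md_detail_host is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: read `mdadm -D` output on stdin and print the
# host part of the "Name : HOST:N" line.
md_detail_host() {
    sed -n 's/^[[:space:]]*Name[[:space:]]*:[[:space:]]*\([^:[:space:]]*\):.*/\1/p'
}
```

For example, mdadm -D /dev/md1 | md_detail_host prints the name stored with the array, which you can compare against the output of hostname and against the name= fields in /etc/mdadm/mdadm.conf.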
And finally... why are we having to deal with this at all? Is it a bug in apt-get, updating parts of initramfs that it should not? Or is the problem with mdadm, embedding a volatile hostname parameter as a permanent and irrevocable part of its configuration? Both of these questions are beyond my depth, but I post them here hoping someone whose depth is more appropriate can and will pass them along to relevant parties for consideration.
Update: I've since learned that the file update is done by a post-install script, not by apt. So it is the post-install script that needs to avoid replacing the hostname value in mdadm.conf.