I have a RAID5 array configured with MDADM in Ubuntu 24.04 LTS. The array consists of five 8TB drives, and there have been no changes to it recently.
Today I noticed that the array is not accessible.
Running `mdadm --detail /dev/md0` shows that the array is inactive:

```
$ mdadm --detail /dev/md0
/dev/md0:
           Version : 1.2
        Raid Level : raid5
     Total Devices : 5
       Persistence : Superblock is persistent
             State : inactive
   Working Devices : 5
              Name : Europa:0  (local to host Europa)
              UUID : 62595935:e04505fc:3e79426a:40326185
            Events : 76498

    Number   Major   Minor   RaidDevice

       -       8        1        -        /dev/sda1
       -       8       81        -        /dev/sdf1
       -       8       65        -        /dev/sde1
       -       8       49        -        /dev/sdd1
       -       8       33        -        /dev/sdc1
```
Using `mdadm --examine` on each drive, I find that they all report the state as `clean`, and four of them show the array state as `AAAAA` (all drives active), but one of them (`sda1`) shows the array state as `....A`.
Using `cat /proc/mdstat`, I find that ONLY device `sda1` appears!

```
$ sudo cat /proc/mdstat
Personalities : [raid1] [raid0] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sdg1[1] sdh1[0]
      2930132992 blocks super 1.2 [2/2] [UU]
      bitmap: 0/22 pages [0KB], 65536KB chunk

md0 : inactive sda1[4]
      7813893632 blocks super 1.2
```
Looking at the event count and update time in `mdadm --examine`, I can see that `sda1` has slightly more events and a more recent update time than the other drives:

```
$ mdadm --examine /dev/sda1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 | egrep 'Event|/dev/sd'
/dev/sda1:
         Events : 76498
/dev/sdc1:
         Events : 76490
/dev/sdd1:
         Events : 76490
/dev/sde1:
         Events : 76490
/dev/sdf1:
         Events : 76490
```
and

```
$ mdadm --examine /dev/sda1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 | egrep 'Update Time|/dev/sd'
/dev/sda1:
    Update Time : Mon Jan 13 14:51:59 2025
/dev/sdc1:
    Update Time : Mon Jan 13 05:03:20 2025
/dev/sdd1:
    Update Time : Mon Jan 13 05:03:20 2025
/dev/sde1:
    Update Time : Mon Jan 13 05:03:20 2025
/dev/sdf1:
    Update Time : Mon Jan 13 05:03:20 2025
```
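The comparison above can be automated so the divergent member stands out at a glance. This is a minimal sketch that groups devices by event count; it parses a here-doc containing the sample values from above rather than querying real devices, so it runs without root or mdadm installed:

```shell
#!/bin/sh
# Group array members by their superblock event count; any device whose
# count differs from the rest diverged from the array at some point.
# The here-doc stands in for real `mdadm --examine ... | egrep` output.
awk '
  /^\/dev\// { dev = $1; sub(":", "", dev) }      # remember current device
  /Events/   { counts[$NF] = counts[$NF] " " dev } # bucket by event count
  END        { for (c in counts) print "events " c ":" counts[c] }
' <<'EOF'
/dev/sda1:
         Events : 76498
/dev/sdc1:
         Events : 76490
/dev/sdd1:
         Events : 76490
/dev/sde1:
         Events : 76490
/dev/sdf1:
         Events : 76490
EOF
```

Against the sample data this prints one bucket with `sda1` alone at 76498 and one with the other four drives at 76490, which is exactly the asymmetry described above.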
So how should I interpret this?
When searching online, I mostly found people reporting the inverse: one drive with fewer events and an earlier Update Time than the others. I can't imagine that four drives failed at exactly the same moment, especially since there was no power failure or anything else that would explain a widespread hardware problem.
So what does this mean, and how do I recover the array?
I'm aware of `mdadm --assemble --force`, but depending on how this situation arose, data corruption seems possible either way.
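For reference, the forced-reassembly route mentioned above would look roughly like the sketch below. It only *prints* each command (a dry run via a small `run` wrapper, which is my own addition, not part of mdadm); the device list is copied from the `--examine` output earlier, so verify it against your own system before removing the echo:

```shell
#!/bin/sh
# Dry-run sketch of forced reassembly after an event-count mismatch.
# run() echoes commands instead of executing them; drop the echo only
# after double-checking the member list with `mdadm --examine`.
run() { echo "would run: $*"; }

# 1. Stop the half-assembled array so its members are released.
run mdadm --stop /dev/md0

# 2. Force-assemble from all five members; mdadm keeps the freshest
#    superblock and overrides the stale event counts on the others.
run mdadm --assemble --force /dev/md0 \
    /dev/sda1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1

# 3. Inspect the result before mounting anything.
run mdadm --detail /dev/md0
run cat /proc/mdstat
```

Because `--force` rewinds the divergent superblocks, any writes recorded only on the drive with the higher event count can be lost, which is the corruption risk noted above.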