Wednesday, May 9, 2012

Disk failure: Linux vs UNIX


I woke up this morning to a dozen emails from OSSEC, all saying the same thing:
smartd[6835]: Device: /dev/sdg, SMART Failure: FAILURE PREDICTION THRESHOLD EXCEEDED
This wasn't entirely unanticipated; /dev/sdg is one of six members of a software RAID5 I have running out of a pair of old Sun S1 SCSI disk arrays, and the disks in there are ten years old at this point.  However, I never bothered to learn how to "fix" a broken mdraid in Linux, figuring that it can't be too hard since Linux (especially the CentOS I am running on this machine) is allegedly an enterprise OS now.  It's almost trivial to do these sorts of disk swaps and software RAID fixes in Solaris, after all.

Of course, this is Linux we're talking about, so nothing is ever as rosy as one would hope. To replace my disk and rebuild my array, I had to do something like this:

1. Figure out to what SCSI ID /dev/sdg corresponds

There is no easy way to do this in Linux since it eschews the sensible cXtYdZ device naming scheme in favor of the very arbitrary sd[abcdefg].  Somewhat reminiscent of C:\, isn't it?

I wound up having to divine the information like this:
# udevinfo -q all -n /dev/sdg
P: /block/sdg
N: sdg
S: disk/by-id/scsi-SSEAGATE_ST336706LC_3FD11881000022422267
S: disk/by-path/pci-0000:06:01.0-scsi-0:0:3:0
One of udev's alleged benefits is that it provides persistent drive naming, but I'm not really seeing how that helps me here.
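
For what it's worth, the same address can apparently be read straight out of sysfs, since each disk's "device" link resolves to a directory named after its host:channel:target:lun. A minimal sketch, assuming that layout holds on your kernel; the function name scsi_hctl and its optional sysfs-root argument are my own inventions for illustration, not a standard tool:

```shell
# Print the SCSI host:channel:target:lun address of a block device
# by resolving its sysfs "device" symlink.
# $1 = device name without /dev/ (e.g. sdg)
# $2 = sysfs root (hypothetical override for trying this on a fake tree;
#      defaults to /sys on a real system)
scsi_hctl() {
    local dev="$1"
    local root="${2:-/sys}"
    local link="$root/block/$dev/device"

    # Bail out if the kernel does not know about this disk.
    [ -e "$link" ] || { echo "no such device: $dev" >&2; return 1; }

    # The last path component of the resolved link is the H:C:T:L address.
    basename "$(readlink -f "$link")"
}

# Usage: scsi_hctl sdg
```

The third field of whatever it prints is the target number, which is the thing you actually need when staring at the front of the array.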

2. Drop the disk from the software RAID

This step was remarkably painless.  It's just a matter of using mdadm to mark the failed disk as failed, then mdadm again to remove the disk from the array.
# mdadm --fail /dev/md0 /dev/sdg
# mdadm --remove /dev/md0 /dev/sdg
I don't know if there's a way to quiesce the bus and allow for a safe disk removal.  I couldn't find a way to do it, so I guess it's a matter of just ripping the spinning disk out.

In Solaris+ZFS, removing a disk from an array is similarly simple.  Solaris also provides a hardware method for safely removing hot-swap disks. The commands would be something like
# zpool offline md0 c4t3d0
# cfgadm -x remove_device c4::dsk/c4t3d0

3. Manually replace the disk in the array

Since I'm using a Sun JBOD, this was easy.  A nice pull-out card tells me where target 3 is located.


Maybe other disk arrays are similarly suitable for brain-dead system administrators like me, but I don't feel like figuring out target offsets when my disks are failing.

4. Tell Linux to rescan the port for the new replacement disk

There is no command to tell Linux to re-scan for new disks.  You can either reboot (hello Windows), or recite this bizarre incantation to trigger a port rescan:
# echo "" > /sys/bus/scsi/devices/4\:0\:3\:0/rescan
Really?  This is how an alleged enterprise OS has you deal with hot-swap disks, which is a standard feature on every server manufactured in the last 10-15 years?  In Solaris, it's literally eight characters.
# devfsadm
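
If you don't know which port the new disk landed on, the same sysfs trick can be widened to every SCSI host adapter by writing the wildcard triple "- - -" to each host's scan file. A rough sketch, assuming the /sys/class/scsi_host layout is present and that you are root; the function name and the optional sysfs-root argument are hypothetical conveniences of mine:

```shell
# Ask every SCSI host adapter to rescan all channels, targets, and LUNs.
# "- - -" is the wildcard (channel, target, lun) triple accepted by the
# scsi_host "scan" file.
# $1 = sysfs root (hypothetical override for trying this on a fake tree;
#      defaults to /sys on a real system)
rescan_scsi_hosts() {
    local root="${1:-/sys}"
    local scan

    for scan in "$root"/class/scsi_host/host*/scan; do
        # Skip the unexpanded glob if there are no SCSI hosts at all.
        [ -e "$scan" ] || continue
        echo "- - -" > "$scan"
    done
}

# Usage (as root): rescan_scsi_hosts
```

It is still an incantation rather than a command, but at least it doesn't require knowing the bus address up front.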

5. Reattach and rebuild the array

Since we're back to dealing with mdraid, this step isn't so bad.  Just
# mdadm --add /dev/md0 /dev/sdg
and it starts rebuilding.  In Solaris+ZFS, the command would be something like
# zpool replace md0 c4t3d0
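
On the Linux side, the rebuild progress shows up in /proc/mdstat. A small sketch for pulling out just the percentage, assuming the usual "recovery = NN.N%" wording in that file; the function name and the optional file argument are made up for illustration:

```shell
# Print the rebuild progress (e.g. "recovery = 12.6%") from an
# mdstat-format file.
# $1 = file to read (hypothetical override for testing;
#      defaults to /proc/mdstat)
md_rebuild_progress() {
    local mdstat="${1:-/proc/mdstat}"

    # Match "recovery" or "resync", then "=", then a percentage,
    # tolerating the variable padding mdstat uses around the "=".
    grep -E -o '(recovery|resync) *= *[0-9.]+%' "$mdstat"
}

# Usage: md_rebuild_progress
```

It prints nothing and returns non-zero when no array is rebuilding, so it can also serve as a crude "is it done yet" check in a loop.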

Moral of the Story

This minor episode is symptomatic of the problems I have with Linux as a general OS and enterprise environment.  mdraid has very simple, intuitive, and powerful controls, but it depends on "the rest of Linux," which does not live up to the same standard as mdraid.  udev is pathological in the way it presents device files to the OS, but that pathosis is a legacy of the Linux device naming conventions.

mdraid runs fine until udev decides to rename all your disk drives; while I understand there are ways of getting udev to statically bind device files to certain device UUIDs, the facts remain that (1) this is not the default behavior in any allegedly enterprise-ready Linux distribution I've used, and (2) not only do I not know how to do this, but figuring it out is not trivial.

At any rate, this sort of patchwork "home-grown" feeling of Linux is what continues to make me hate it.  The quality of its various components varies so widely, it's invariably frustrating to use.  Maybe I am speaking from ignorance, but I would much rather use an OS that was "developed" or "engineered" rather than "grown."  I suppose Red Hat is making strides in providing a sensible, unified interface; the problem there is that its tools all seem to take a Windows-like GUI form.

Coincidentally, I read that mdadm was developed by SUSE.  udev was developed by "some guy."
