
Migrating a Raid Array

A home computing guide to migrating from one large-ish array to another, with hardware checks and extra precaution. The particulars of one array vs another, whether you're using LVM or even a ZFS pool, aren't really the issue. This guide is more concerned with:

  1. identifying your devices
  2. performing backups
  3. testing new disk hardware

Let's do this.

Identifying Disks

Disks show up as /dev/sda1, where a is the first disk and 1 is the first partition on that disk. Partitions get numbers, but if you're like me, it's likely you set up your old raid directly on the bare disks (superblock and all) and never added any partitions, so your old raid drives and your new ones won't have any numbers.
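
A quick way to see which devices have partitions and which are bare raid members is lsblk; the TYPE column distinguishes whole disks, partitions, and md arrays, and raid member disks will show their md device nested directly under them with no partition in between (a sketch):

~$ lsblk -o NAME,SIZE,TYPE,MOUNTPOINT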

You can get potentially identifying disk info via hdparm -I /dev/sdx. If you are replacing, e.g., Toshiba disks with WD ones, that should let you know which is which.

/dev/sdg:

ATA device, with non-removable media
    Model Number:       SAMSUNG HD204UI                         
    Serial Number:      S2H7J9FB200574      
    Firmware Revision:  1AQ10001
    Transport:          Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6
...
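
If you have a whole bay of disks to sort out, a quick shell loop over hdparm, grepping for the model and serial lines shown above, saves some squinting (a sketch; adjust the device glob to match your system):

~$ for d in /dev/sd[a-h]; do echo "== $d"; sudo hdparm -I $d | grep -E 'Model Number|Serial Number'; done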

You can also positively identify the current raid disks in /proc/mdstat, which will look something like:

md1 : active raid5 sdb[1] sdd[3] sdc[0]
      5860270080 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
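
mdadm can also confirm membership directly: --detail lists an array's member devices and their state, and --examine reads the raid superblock off an individual disk (assuming the array and member shown above):

~$ sudo mdadm --detail /dev/md1
~$ sudo mdadm --examine /dev/sdb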

Checking the Drives

The best way to check your new drives is with the badblocks program, which will test every block and report errors. If you have newer drives that are very large, you might need to specify the block size. Here's some sample output with a blocksize of 4k, verbose mode and a progress indicator:

~$ sudo badblocks -b 4096 -vs /dev/sdf
Checking blocks 0 to 1465130645
Checking for bad blocks (read-only test):   0.17% done, 2:37 elapsed. (0/0/0 errors)

This is essentially memtest86 for your disks. If they come back with zero errors, you're almost certainly in the clear; a brand-new drive that already reports bad blocks is a good candidate for an exchange. badblocks can take an age and a half to complete, depending on disk size, number of passes, and the type of test. If you stop the program prematurely with Ctrl-C, it will print out the block it stopped at. You can then resume by passing it <last-block> <first-block>:

~$ sudo badblocks -b 4096 -vs /dev/sdh
Checking blocks 0 to 1465130645
Checking for bad blocks (read-only test): 75.00% done, 24:39:29 elapsed. (0/0/0 errors)
^C
Interrupted at block 1098916032
~$ sudo badblocks -b 4096 -vs /dev/sdh 1465130645 1098916032
Checking blocks 1098916032 to 1465130645
Checking for bad blocks (read-only test): 75.01% done, 0:05 elapsed. (0/0/0 errors)
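
If you'd rather keep a record than watch a counter, badblocks can also log any bad blocks it finds to a file with -o (a sketch; the filename is arbitrary). The list is only directly reusable by mkfs.ext4 -l if you ran badblocks against the same device, with the same block size, that you later format:

~$ sudo badblocks -b 4096 -vs -o sdh-badblocks.txt /dev/sdh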

You might, however, still want to check their vitals with smartmontools. The Ubuntu wiki page on smartmontools is actually very helpful.

~$ sudo smartctl -s on /dev/sdf
~$ sudo smartctl -t long /dev/sdf
~$ sudo smartctl -l selftest /dev/sda
smartctl 6.4 2014-10-07 r4002 [x86_64-linux-4.2.0-23-generic] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%         2         -
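
The self-test log only tells part of the story; the raw SMART attributes are worth a glance too, particularly the reallocated and pending sector counts, which should be zero on a healthy drive (a sketch):

~$ sudo smartctl -A /dev/sdf | grep -iE 'realloc|pending|uncorrect'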

Formatting & Backing Up

For my local backup system, I use a Thermaltake USB external toaster-style HDD "enclosure" and a rotating cast of previous raid disk cast-offs. I format these with ext4 because compatibility with non-linux systems is not a priority. If it's a virgin disk, the cfdisk line makes it easy to set up a partition table. -E lazy_itable_init lets part of the format finish in the background after first mount; if we're running a big backup right off the bat, it'll let us start it potentially 15-20 minutes sooner, especially on large filesystems.

~$ sudo cfdisk /dev/sdf
~$ sudo mkfs.ext4 -E lazy_itable_init /dev/sdf1

For backing up, I generally use rsync. I have a script that plucks only directories that I want backed up and keeps a log on the backup destination, but the crux of it, and the part I used to have to read the rsync man page over and over for, is:

~$ rsync -a -h --progress --size-only --delete-after $source $dest

The trailing slash on $source matters: with it, rsync copies the directory's contents straight into $dest; without it, rsync copies the directory itself, so it ends up at $dest/$source. A trailing slash on $dest makes no difference. If you want the source directory to appear inside the destination, leave it off.
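
A concrete illustration, with hypothetical paths:

~$ rsync -a /data/photos  /mnt/backup/    # ends up as /mnt/backup/photos/...
~$ rsync -a /data/photos/ /mnt/backup/    # contents of photos/ land directly in /mnt/backup/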

Creating a Raid Array

Creating the array isn't strictly the focus of this document, but it is the point of the exercise. My general practice:

I generally do a partial backup of important data (e.g. financial documents, cryptographic keys, photos, writings) with the above rsync command before continuing. Fat-fingering the wrong drive, or swapping src with dest in an rsync with a --delete option, can toast some or all of your old data, so creating a backup and taking it offline first is a good idea.
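
Given how unforgiving --delete-after is about swapped arguments, a dry run first is cheap insurance; -n / --dry-run just prints what would be transferred or deleted without touching anything:

~$ rsync -a -h --progress --size-only --delete-after --dry-run $source $dest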

From here, the array:

~$ sudo mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sde /dev/sdf /dev/sdh
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.

~$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10] 
md0 : active raid5 sdh[3] sdf[1] sde[0]
      11720782848 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
      [>....................]  recovery =  0.0% (1064572/5860391424) finish=2201.5min speed=44357K/sec
      bitmap: 0/44 pages [0KB], 65536KB chunk

md1 : active raid5 sdb[1] sdd[3] sdc[0]
      5860270080 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]

At this point the array is in degraded mode while it builds parity onto the third disk, but you can still use it! Create a filesystem on it the same way you did on the backup drive and start copying everything over with rsync -a ... like you did previously. It will probably take a while.
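
To round things out, a sketch of that last step, assuming the new array is /dev/md0 and a mount point of /mnt/raid (both names are mine, adjust to taste):

~$ sudo mkfs.ext4 -E lazy_itable_init /dev/md0
~$ sudo mkdir -p /mnt/raid
~$ sudo mount /dev/md0 /mnt/raid
~$ sudo rsync -a -h --progress --size-only $source /mnt/raid/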