# MigratingRaid
A home computing guide to migrating from one large-ish array to another, with hardware checks and extra precaution. The particulars of one array versus another, using LVM or even a ZFS pool, are not really the issue. This guide is more concerned with:

1. identifying your devices
2. performing backups
3. testing new disk hardware

Let's **do this**.

### Identifying Disks

Disks show up as `/dev/sdX` (`/dev/sda` for the first disk), with partitions numbered from 1 (`/dev/sda1`). Partitions get numbers, but if you're like me it's likely you built your old raid on the bare disks (the superblock written straight to the device) and never added partitions, so neither your old raid drives nor your new ones will have any numbers.

_You can get potentially identifying disk info_ via `hdparm -I /dev/sdX`. If you are replacing, e.g., Toshiba disks with WD ones, that should let you know which is which.

```sh
/dev/sdg:

ATA device, with non-removable media
	Model Number:       SAMSUNG HD204UI
	Serial Number:      S2H7J9FB200574
	Firmware Revision:  1AQ10001
Transport:              Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6
...
```

You can also positively identify the _current_ raid disks in `/proc/mdstat`, which will look something like:

```sh
md1 : active raid5 sdb[1] sdd[3] sdc[0]
      5860270080 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
```

### Checking the Drives

The best way to check your new drives is with the `badblocks` program, which will test every block and report errors. If you have newer drives that are very large, you might need to specify the block size. Here's some sample output with a block size of 4k, verbose mode, and a progress indicator:

```sh
~$ sudo badblocks -b 4096 -vs /dev/sdf
Checking blocks 0 to 1465130645
Checking for bad blocks (read-only test):   0.17% done, 2:37 elapsed. (0/0/0 errors)
```

This is essentially memtest86 for your disks. If they come back without too many errors, then you're almost certainly in the clear.

`badblocks` can take an age and a half to complete, depending on disk size, number of passes, and the type of test. If you stop the program prematurely with `ctrl-C`, it will print the block it stopped at. You can then resume by passing it the device's last block and the block to restart from, i.e. `<last-block> <first-block>`:

```sh
~$ sudo badblocks -b 4096 -vs /dev/sdh
Checking blocks 0 to 1465130645
Checking for bad blocks (read-only test):  75.00% done, 24:39:29 elapsed. (0/0/0 errors)
^C
Interrupted at block 1098916032

~$ sudo badblocks -b 4096 -vs /dev/sdh 1465130645 1098916032
Checking blocks 1098916032 to 1465130645
Checking for bad blocks (read-only test):  75.01% done, 0:05 elapsed. (0/0/0 errors)
```

You might, however, still want to check the drives' vitals with `smartmontools`. The [Ubuntu wiki page](https://help.ubuntu.com/community/Smartmontools) is actually very helpful.

```sh
~$ sudo smartctl -s on /dev/sdf
~$ sudo smartctl -t long /dev/sdf
~$ sudo smartctl -l selftest /dev/sda
smartctl 6.4 2014-10-07 r4002 [x86_64-linux-4.2.0-23-generic] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%         2         -
```
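If you're burning in several new disks at once, it can help to script the checks. Here's a minimal sketch, assuming the new (empty, unmounted) disks are `/dev/sde`, `/dev/sdf`, and `/dev/sdh` and that a read-only `badblocks` pass is enough for you:

```sh
#!/bin/sh
# Sketch: read-only badblocks pass plus a long SMART self-test on each new disk.
# The device names are placeholders -- verify them against `hdparm -I` output
# before running anything.
set -e

for disk in /dev/sde /dev/sdf /dev/sdh; do
    echo "=== $disk ==="
    # Log any bad blocks to a per-disk file; -b 4096 matches the examples above.
    sudo badblocks -b 4096 -vs -o "badblocks-$(basename "$disk").log" "$disk"
    # Enable SMART and kick off a long self-test; it runs in the drive firmware,
    # so check back later with `smartctl -l selftest`.
    sudo smartctl -s on "$disk"
    sudo smartctl -t long "$disk"
done
```

Since each pass is bound by a single disk's I/O, you could just as well run one copy per disk in separate terminals and let them grind away in parallel overnight.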
### Formatting & Backing Up

For my _local_ backup system, I use a [Thermaltake USB external toaster-mount HDD](http://www.amazon.com/Thermaltake-St0005u-Docking-Station-Compatible/dp/B001A4HAFS) "enclosure" and a rotating cast of previous raid disk cast-offs.

I format these with ext4 because compatibility with non-Linux systems is not a priority. If it's a virgin disk, the `cfdisk` line will make it easy to set up a partition table. `-E lazy_itable_init` allows part of the format to be done in the background after first mount; if we're running a big backup right off the bat, it'll let us start 15-20 minutes sooner, especially on large filesystems.

```sh
~$ sudo cfdisk /dev/sdf
~$ sudo mkfs.ext4 -E lazy_itable_init /dev/sdf1
```

For backing up, I generally use rsync. I have a script that plucks only the directories I want backed up and keeps a log on the backup destination, but the crux of it, and the part I used to have to read the rsync man page over and over for, is:

```sh
~$ rsync -a -h --progress --size-only --delete-after $source $dest
```

A trailing slash on `$source` matters: without it, rsync copies the source directory itself into the destination; with it, rsync copies only the directory's contents. If you want your destination to have the same layout as your source, leave it off.

* `-a` - sets 'archive' mode, which implies recursion and copying over timestamps, permissions, and ownership
* `-h` - human-readable sizes
* `--progress` - show progress both for individual files and the overall transfer
* `--size-only` - use only the file size to decide whether to update a file, ignoring timestamps. This is way faster than a checksum comparison and it works 99.99% of the time.
* `--delete-after` - delete extraneous files in `$dest` after everything's been copied; this helps you maintain the proper directory layout even if you've moved or renamed files

### Creating a Raid Array

Not the focus of this document, but it is the focus of [this document](https://raid.wiki.kernel.org/index.php/RAID_setup). My general practice is:

* use `mdadm --create` to create a raid array on partitionless drives
* assemble 3-drive raid-5 arrays once disks appear on the market that are 2x the size of my old array disks (roughly every three years)

I generally do a partial backup of important data (e.g. financial documents, cryptographic keys, photos, writings) with the above rsync command before continuing. Fat-fingering the wrong drive or swapping src with dest in rsync with a `--delete` option can toast some or all of your old data, so creating a backup and taking it offline is a good idea.

From here, the array:

```sh
~$ sudo mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sde /dev/sdf /dev/sdh
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.

~$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
md0 : active raid5 sdh[3] sdf[1] sde[0]
      11720782848 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
      [>....................]  recovery =  0.0% (1064572/5860391424) finish=2201.5min speed=44357K/sec
      bitmap: 0/44 pages [0KB], 65536KB chunk

md1 : active raid5 sdb[1] sdd[3] sdc[0]
      5860270080 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
```

At this point, the array is in degraded mode while it builds, but you can still use it! Create a filesystem the same way you did on the backup drive and start copying everything over with `rsync -a ...` like you did previously. It will probably take a while.
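To make that last step concrete, here's a minimal sketch. The names involved (`/dev/md0`, `/mnt/backup`, `/mnt/raid`) are assumptions; substitute your own device and mount points.

```sh
# Assumed names: /dev/md0 is the new array, /mnt/backup is the mounted backup
# disk, /mnt/raid is where the new array will live. Adjust to taste.
~$ sudo mkfs.ext4 -E lazy_itable_init /dev/md0
~$ sudo mkdir -p /mnt/raid
~$ sudo mount /dev/md0 /mnt/raid
# Trailing slash on the source here: copy the *contents* of the backup into
# the array's root, rather than a nested backup/ directory.
~$ sudo rsync -a -h --progress --size-only /mnt/backup/ /mnt/raid/
```

This will happily run while the array is still doing its initial sync; it'll just be slower until the recovery in `/proc/mdstat` finishes.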