Upgrading storage
increasing the size of a RAID5 array using mdadm, LUKS and LVM


Posted on 2015-03-25

A while ago I realized one of my RAID arrays was running out of space. Since I didn't have the space required to take a backup of everything, I needed to perform the upgrade in place. In this post I replace all drives in my RAID5 array, then resize the mdadm array, the LUKS container, the LVM physical volume and logical volume, and lastly the file system.

Upgrading the hardware

I have no space left in my case and no free SATA connections on my motherboard, so I bought a SATA-to-USB adapter, which I used while replacing drives.

In this section we replace one drive at a time. We assume the old drive is called /dev/sda and the new drive is called /dev/sdb. The mdadm array we operate on is /dev/md126.

Partitioning the new drives

I gave the new drives partitions of the same size as those on the old drives. I don't think that's required (anything bigger works); I just wanted to postpone deciding on the partition size.

# Check the size of the partition of the previous drive
$ sudo parted /dev/sda unit s print
Model: ATA ST4000DM000-1F21 (scsi)
Disk /dev/sda: 7814037168s
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:

Number  Start  End          Size         File system  Name     Flags
 1      2048s  2930277160s  2930275113s               primary

# Partition the new drive in the same manner.
$ sudo parted /dev/sdb
(parted) mktable gpt
(parted) mkpart primary 2048s 2930277160s

# Check that it looks alright
$ sudo parted /dev/sdb unit s print
Model: ATA WDC WD40EFRX-68W (scsi)
Disk /dev/sdb: 7814037168s
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:

Number  Start  End          Size         File system  Name     Flags
 1      2048s  2930277160s  2930275113s               primary
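parted reports the start and end of a partition as inclusive sector numbers, so the size is end - start + 1. Below is a minimal sketch of that check in shell arithmetic, using the numbers from the output above. The helper name part_size_sectors is made up for this example and is not part of parted.

```shell
# Size of a partition in sectors, given parted's inclusive start and end.
# part_size_sectors is a hypothetical helper, not part of parted.
part_size_sectors() {
    echo $(( $2 - $1 + 1 ))
}

old=$(part_size_sectors 2048 2930277160)   # start/end printed for /dev/sda
new=$(part_size_sectors 2048 2930277160)   # start/end printed for /dev/sdb
echo "old=${old}s new=${new}s"

# The new partition must be at least as large as the one it replaces.
[ "$new" -ge "$old" ] && echo "new partition is large enough"
```

Both values come out as 2930275113 sectors, which matches the Size column parted prints.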

Replacing the drives

It's easy to replace drives with mdadm, especially since version 3.3.

$ sudo mdadm --manage /dev/md126 --add-spare /dev/sdb1
mdadm: added /dev/sdb1
$ sudo mdadm /dev/md126 --replace /dev/sda1
mdadm: Marked /dev/sda1 (device 4 in /dev/md126) for replacement

--replace will replace the drive as soon as a spare is available. When the replacement is complete, the old drive is marked as faulty. Compared to simply swapping the drives, --replace has the advantage of never putting the array in a degraded state, as the old drive continues to be used until it has been replaced.
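If you would rather wait in a terminal for the copy to finish, one option is to poll the array's sync_action file in sysfs until it reads "idle". This is only a sketch. The path /sys/block/md126/md/sync_action is an assumption based on the array name used in this post, and the function names are made up for the example.

```shell
# Return success when the given sync_action file reports "idle".
# The default path is an assumption based on the array name /dev/md126.
md_is_idle() {
    [ "$(cat "${1:-/sys/block/md126/md/sync_action}")" = "idle" ]
}

# Poll until the array is idle. The second argument is the polling
# interval in seconds (default 60).
wait_for_md() {
    until md_is_idle "$1"; do
        sleep "${2:-60}"
    done
}
```

Calling wait_for_md without arguments blocks until the array is idle. mdadm also has a --wait option in its misc mode, which should serve the same purpose.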

Once the drive is replaced, an email is generated with a Fail event. We then need to remove the old drive from the array:

$ sudo mdadm --manage /dev/md126 --remove failed
mdadm: hot removed 8:1 from /dev/md126

Increasing available space

In this section the mdadm device is still called /dev/md126, and the LVM physical volume (the opened LUKS container), the volume group and the logical volume are all called frej.

Partition resizing

NOTE: I suggest you simply create the partitions as big as you want them from the beginning, it makes the upgrade simpler.

Stop

I do not know why it would be beneficial to not use the partitions when resizing them. However, since there seems to be an even split between people saying you should stop using them and people saying it doesn't matter, my goal was to take the safe path and not use the partitions while resizing. I did, however, forget to stop the RAID when resizing 4 out of my 5 partitions and never noticed any issues.

$ sudo umount /mnt/frej
$ sudo lvchange -an /dev/frej/frej
$ sudo vgchange -an frej
  0 logical volume(s) in volume group "frej" now active
$ sudo cryptsetup close /dev/mapper/frej
$ sudo mdadm --stop /dev/md126
mdadm: stopped /dev/md126

Resize

When resizing with parted you provide the end of the partition, not its size. I started by specifying the end as 4TB, but since my partition starts at 1048576B, it only became 3999998951936B big. Not having 4TB would have annoyed me, so I moved the end to the first multiple of 1 MiB that gives the partition at least 4TB. Moreover, we want the end of the last sector, not the beginning of the next one, so we subtract one:

⌈(4*10¹² + 1024²)/1024²⌉*1024² - 1 = 4000001818623.

$ sudo parted /dev/sdb resizepart 1 4000001818623B

$ sudo parted /dev/sdb unit b print
Model: ATA WDC WD40EFRX-68W (scsi)
Disk /dev/sdb: 4000787030016B
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:

Number  Start     End             Size            File system  Name     Flags
 1      1048576B  4000001818623B  4000000770048B               primary
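The end offset used above can be reproduced with shell arithmetic. This is a small sketch of the same formula, where the ceiling is written as an integer division. The variable names are my own.

```shell
# Compute the partition end (in bytes) so that the partition holds at least
# `target` bytes and ends on the last byte before a 1 MiB boundary.
mib=$((1024 * 1024))
start=$mib                   # partition start in bytes, from parted's output
target=4000000000000         # 4 TB

# Integer ceiling division: ceil(a / b) == (a + b - 1) / b
end=$(( (target + start + mib - 1) / mib * mib - 1 ))
size=$(( end - start + 1 ))
echo "end=${end}B size=${size}B"
```

This prints end=4000001818623B size=4000000770048B, which agrees with the End and Size columns parted shows after the resize.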

Restart

After resizing all partitions, let's see if it worked:

$ sudo mdadm --assemble --scan
mdadm: /dev/md/frej has been started with 5 drives.
$ sudo vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "frej" using metadata type lvm2
$ sudo vgchange -ay frej
  1 logical volume(s) in volume group "frej" now active
$ sudo mount -a

It seems I didn't have to reopen the LUKS container. I'm guessing this is because I have the device in crypttab and something causes it to be reopened automatically.

Growing the RAID

$ sudo mdadm --grow /dev/md126 --size max
mdadm: component size of /dev/md126 has been set to 3906249728K
unfreeze

This will take a while, since it needs to resync the unused space.

$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid1]
md126 : active raid5 sdb1[5] sdf1[6] sde1[7] sdd1[9] sdc1[8]
      15624998912 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
      [=======>.............]  resync = 37.5% (1468675572/3906249728) finish=767.9min speed=52901K/sec

For me it took about 13 hours to complete the resync, thus mdadm's estimate was quite accurate.
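The finish estimate in /proc/mdstat can be roughly reproduced from the other numbers on the same line: remaining blocks divided by the current speed. The following is a sketch of that arithmetic and not necessarily how the kernel computes it. The helper name md_eta_minutes is made up for this example.

```shell
# Remaining resync time in minutes (one decimal, truncated), computed from
# the /proc/mdstat numbers: blocks done, blocks total and speed in K/sec.
# md_eta_minutes is a hypothetical helper name.
md_eta_minutes() {
    tenths=$(( ($2 - $1) * 10 / ($3 * 60) ))
    echo "$((tenths / 10)).$((tenths % 10))"
}

md_eta_minutes 1468675572 3906249728 52901
```

With the values from the output above this prints 767.9, the same figure mdstat shows as finish=767.9min. The estimate changes whenever the resync speed changes.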

Resizing the LUKS container and the LVM physical volume

$ sudo cryptsetup resize /dev/mapper/frej
$ sudo pvresize /dev/mapper/frej
  Physical volume "/dev/mapper/frej" changed
  1 physical volume(s) resized / 0 physical volume(s) not resized

Filling the empty space with random data

Since we have newly allocated space, we should fill it with random data to make sure no information is leaked through our LUKS container. I have chosen to do this by simply creating a new logical volume from the remaining free space, creating an encrypted device on top of it and filling that device with zeros. Since the zeros are encrypted with a random key, the final data should be random.

$ sudo lvcreate --extents 100%FREE --name filltemp frej
$ sudo cryptsetup --key-file=/dev/urandom create filltempcrypt /dev/frej/filltemp
$ sudo dd if=/dev/zero of=/dev/mapper/filltempcrypt bs=1M # This takes a _long_ time
$ sudo cryptsetup close /dev/mapper/filltempcrypt
$ sudo lvchange -an /dev/frej/filltemp
$ sudo lvremove /dev/frej/filltemp
  Logical volume "filltemp" successfully removed

You can check the progress of dd by sending it the SIGUSR1 signal:

$ sudo pkill -USR1 -f '^dd if=/dev/zero'

The output from dd will look something like this:

220340+0 records in
220340+0 records out
112814080 bytes (113 MB) copied, 0.521177 s, 116 MB/s

NOTE: I ran into the same performance issue I had years ago when setting up the array. Since I had reinitialized the array after boot, my fixes did not get applied (they are run from rc.local).

Resize the LVM logical volume and file system

Now we need to extend the size of our logical volume and file system. I chose to add 3 TiB, as that's what I needed to migrate the data I had on a different volume.

$ sudo lvextend --size +3T /dev/frej/frej
  Size of logical volume frej/frej changed from 5.46 TiB (1430796 extents) to 8.46 TiB (2217228 extents).
  Logical volume frej successfully resized

$ sudo resize2fs /dev/mapper/frej-frej
resize2fs 1.42.12 (29-Aug-2014)
Filesystem at /dev/mapper/frej-frej is mounted on /mnt/frej; on-line resizing required
old_desc_blocks = 350, new_desc_blocks = 542
The filesystem on /dev/mapper/frej-frej is now 2270441472 (4k) blocks long.
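The numbers in the two outputs above can be cross-checked with a little arithmetic. The sketch below derives the extent size from the lvextend output, assuming all extents are the same size, and from that the number of 4 KiB blocks the file system should end up with. The variable names are my own.

```shell
# Numbers taken from the lvextend and resize2fs output above.
old_extents=1430796
new_extents=2217228
added_bytes=$((3 * 1024 * 1024 * 1024 * 1024))   # +3T means 3 TiB

# Extent size implied by the output (assumption: all extents are equal).
extent_bytes=$(( added_bytes / (new_extents - old_extents) ))
echo "extent size: $((extent_bytes / 1024 / 1024)) MiB"

# Expected file system size in 4 KiB blocks once it fills the volume.
fs_blocks=$(( new_extents * extent_bytes / 4096 ))
echo "file system blocks: $fs_blocks"
```

This gives an extent size of 4 MiB and 2270441472 blocks, the same block count resize2fs reports, so the file system fills the whole logical volume.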

Conclusion

Using modern technologies such as mdadm, LVM and LUKS, it is really easy to increase the storage capacity of a server. Most of the steps can also be performed online. My chassis does not make it easy to replace drives, which means I risk damaging them if I try to replace a drive physically while the system is still online.

Had I had a better chassis, I could have performed this entire procedure online, completely without downtime.