A while ago I realized one of my RAID arrays was running out of space. Since I didn’t have the space required to take a backup of everything, I needed to perform the upgrade in place. In this post I replace all the drives in my RAID5 array, then resize the mdadm array, the LVM physical and logical volumes, the LUKS container and, lastly, the file system.
Upgrading the hardware
I have no space left in my case, nor do I have any SATA connections left on my motherboard, so I bought a SATA-to-USB adapter which I used while replacing drives.
In this section we replace one drive at a time. We assume the old drive is called /dev/sda and the new drive is called /dev/sdb. The mdadm array we operate on is /dev/md126.
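Your device names will most likely differ; one quick way to confirm which name the newly attached USB drive received is to list the block devices before and after plugging it in:
# List block devices; the new drive should show up with TRAN=usb and no partitions yet
$ lsblk -o NAME,SIZE,MODEL,TRAN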
Partitioning the new drives
I partitioned the new drives with the same sized partitions as the old drives. I don’t think that’s required (anything bigger works); I just wanted to postpone the decision on the partition size.
# Check the size of the partition of the previous drive
$ sudo parted /dev/sda unit s print
Model: ATA ST4000DM000-1F21 (scsi)
Disk /dev/sda: 7814037168s
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:
Number Start End Size File system Name Flags
1 2048s 2930277160s 2930275113s primary
# Partition the new drive in the same manner.
$ sudo parted /dev/sdb
(parted) mktable gpt
(parted) mkpart primary 2048s 2930277160s
# Check that it looks alright
$ sudo parted /dev/sdb unit s print
Model: ATA WDC WD40EFRX-68W (scsi)
Disk /dev/sdb: 7814037168s
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:
Number Start End Size File system Name Flags
1 2048s 2930277160s 2930275113s primary
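If you are unsure about the start sector, parted can also check the alignment for you (optional; the 2048s start used above is already 1 MiB aligned):
# Verify that partition 1 is aligned to the drive's optimal I/O boundaries
$ sudo parted /dev/sdb align-check optimal 1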
Replacing the drives
It’s easy to replace drives with mdadm, especially since version 3.3.
$ sudo mdadm --manage /dev/md126 --add-spare /dev/sdb1
mdadm: added /dev/sdb1
$ sudo mdadm /dev/md126 --replace /dev/sda1
mdadm: Marked /dev/sda1 (device 4 in /dev/md126) for replacement
--replace will replace the drive as soon as a replacement is available. When the drive has been replaced, the old drive is marked as faulty. Compared to simply swapping the drives, using --replace has the advantage of never putting the array in a degraded state, as the old drive continues to be used until it has been fully replaced.
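You can follow the progress of the rebuild onto the new drive while the array stays fully redundant, for example:
# Watch the recovery progress and the per-member state
$ watch -n 60 cat /proc/mdstat
$ sudo mdadm --detail /dev/md126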
Once the drive has been replaced, an email is generated with a Fail event; we then need to remove the old drive from the array:
$ sudo mdadm --manage /dev/md126 --remove failed
mdadm: hot removed 8:1 from /dev/md126
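After the removal, the detail output should report zero failed devices again; a quick check:
$ sudo mdadm --detail /dev/md126 | grep -i 'failed devices'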
Increasing available space
In this section the mdadm device is still called /dev/md126, and both the LVM logical volume and the physical volume are called frej.
Partition resizing
NOTE: I suggest you simply create the partitions as big as you want them from the beginning; it makes the upgrade simpler.
Stop
I do not know why it would be beneficial to not use the partitions while resizing them. However, since there seems to be an even divide between people saying you should stop everything and people saying it doesn’t matter, my goal was to take the safe path and not use the partitions while resizing. I did, however, forget to stop the RAID when resizing 4 out of my 5 partitions and never noticed any issues.
$ sudo umount /mnt/frej
$ sudo lvchange -an /dev/frej/frej
$ sudo vgchange -an frej
0 logical volume(s) in volume group "frej" now active
$ sudo cryptsetup close /dev/mapper/frej
$ sudo mdadm --stop /dev/md126
mdadm: stopped /dev/md126
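Before touching the partitions it is worth double-checking that nothing is holding them any more, for example:
# md126 should no longer be listed, and no crypt/LVM mappings should be stacked on the member partitions
$ cat /proc/mdstat
$ lsblk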
Resize
When resizing with parted you provide the end of the partition, not its size. I started by specifying this as 4TB but then, as the start of my partition is at 1048576B, my partition only became 3999998951936B big. Not having a full 4TB would have annoyed me, so I resized it to the first multiple of 1 MiB above 4TB. Moreover, we want the last byte of the partition, not the first byte of the next one, so we subtract one:
⌈(4*10¹² + 1024²)/1024²⌉*1024² - 1 = 4000001818623.
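The same calculation can be done in the shell if you do not trust your mental ceiling function (round 4 TB up to a whole MiB, add the 1 MiB start offset, subtract one byte):
$ start=1048576; mib=1048576; four_tb=4000000000000
$ size=$(( (four_tb + mib - 1) / mib * mib ))  # first multiple of 1 MiB above 4 TB
$ echo $(( start + size - 1 ))
4000001818623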
$ sudo parted /dev/sdb resizepart 1 4000001818623B
$ sudo parted /dev/sdb unit b print
Model: ATA WDC WD40EFRX-68W (scsi)
Disk /dev/sdb: 4000787030016B
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:
Number Start End Size File system Name Flags
1 1048576B 4000001818623B 4000000770048B primary
Restart
After resizing all the partitions, let’s see if it worked:
$ sudo mdadm --assemble --scan
mdadm: /dev/md/frej has been started with 5 drives.
$ sudo vgscan
Reading all physical volumes. This may take a while...
Found volume group "frej" using metadata type lvm2
$ sudo vgchange -ay frej
1 logical volume(s) in volume group "frej" now active
$ sudo mount -a
It seems I didn’t have to reopen the LUKS container; I’m guessing this is because I have the device in crypttab and something caused it to be reopened automatically.
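You can verify that the LUKS mapping really is active, and see its current size, with:
# Shows the backing device and the mapping's size in 512-byte sectors
$ sudo cryptsetup status frej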
Growing the RAID
$ sudo mdadm --grow /dev/md126 --size max
mdadm: component size of /dev/md126 has been set to 3906249728K
unfreeze
This will take a while, since the newly added space needs to be resynced.
$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid1]
md126 : active raid5 sdb1[5] sdf1[6] sde1[7] sdd1[9] sdc1[8]
15624998912 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
[=======>.............] resync = 37.5% (1468675572/3906249728) finish=767.9min speed=52901K/sec
For me it took about 13 hours to complete the resync, thus mdadm’s estimate was quite accurate.
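If the resync is slower than you would like, the kernel’s rebuild speed limits can be raised while it runs (the values are in KiB/s per device; adjust to taste):
$ sudo sysctl dev.raid.speed_limit_min=100000
$ sudo sysctl dev.raid.speed_limit_max=500000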
Resizing the LUKS container and the LVM physical volume
$ sudo cryptsetup resize /dev/mapper/frej
$ sudo pvresize /dev/mapper/frej
Physical volume "/dev/mapper/frej" changed
1 physical volume(s) resized / 0 physical volume(s) not resized
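pvs and vgs will show whether the extra space actually arrived as free extents in the volume group:
$ sudo pvs /dev/mapper/frej
# Check the VFree column
$ sudo vgs frej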
Filling the empty space with random data
Since we have newly allocated space, we should fill it with random data to make sure no information is leaked through our LUKS container. I have chosen to do this by simply creating a new logical volume spanning the remaining free space, creating an encrypted device on top of it and filling that device with zeros. Since the zeros are encrypted with a random key, the resulting data should be random.
$ sudo lvcreate --extents 100%FREE --name filltemp frej
$ sudo cryptsetup --key-file=/dev/urandom create filltempcrypt /dev/frej/filltemp
$ sudo dd if=/dev/zero of=/dev/mapper/filltempcrypt bs=1M # This takes a _long_ time
$ sudo cryptsetup close /dev/mapper/filltempcrypt
$ sudo lvchange -an /dev/frej/filltemp
$ sudo lvremove /dev/frej/filltemp
Logical volume "filltemp" successfully removed
You can check the progress of dd by sending it the SIGUSR1 signal:
$ sudo pkill -USR1 -f '^dd if=/dev/zero'
The output from dd will look something like this:
220340+0 records in
220340+0 records out
112814080 bytes (113 MB) copied, 0.521177 s, 116 MB/s
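On reasonably recent coreutils (8.24 and later, if I remember correctly) dd can also report progress by itself, which saves you the signalling:
$ sudo dd if=/dev/zero of=/dev/mapper/filltempcrypt bs=1M status=progress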
NOTE: I ran into the same performance issue I did years ago when setting up the array; since I had reinitialized the array after boot, my fixes did not get applied (they are run from rc.local).
Resize the LVM logical volume and file system
Now we need to extend our logical volume and file system. I chose to add 3 TiB, as that’s what I needed to migrate the data I had on a different volume.
$ sudo lvextend --size +3T /dev/frej/frej
Size of logical volume frej/frej changed from 5.46 TiB (1430796 extents) to 8.46 TiB (2217228 extents).
Logical volume frej successfully resized
$ sudo resize2fs /dev/mapper/frej-frej
resize2fs 1.42.12 (29-Aug-2014)
Filesystem at /dev/mapper/frej-frej is mounted on /mnt/frej; on-line resizing required
old_desc_blocks = 350, new_desc_blocks = 542
The filesystem on /dev/mapper/frej-frej is now 2270441472 (4k) blocks long.
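As a final sanity check, df should now report the extra space on the mount point:
$ df -h /mnt/frej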
Conclusion
Using modern technologies such as mdadm, LVM and LUKS, it is really easy to increase the storage capacity of a server, and most of the steps can be performed online. My chassis does not allow drives to be replaced easily, which means I risk damaging them if I try to swap a drive physically while the system is still running.
Had I had a better chassis, I could have performed this entire procedure online, completely without downtime.