
Expanding a ZFS Mirror With 4K Disks

Finally filled my 1TB NAS file server this year, to the point where the 30GB of images I brought back from New Zealand (oh-by-the-way, did-I-mention-we-went-to-NZ?) could not be dumped to disk, let alone processed. So it was time to exchange the two Samsung hard drives composing the mirrored data storage for new 2TB drives. I’m using ZFS on OpenIndiana - this should be easy, right?

Alas not, because technology waits for no man. When I built the file server originally, SATA hard drives (which had only just replaced ye olde IDE) had 512 byte blocks. The ones you buy today have a 4KB block size. And, inevitably, you can’t mix 512 byte and 4K drives in the same ZFS virtual device (vdev). So whereas previously, expanding the mirror would have been a relatively simple matter of detaching one drive at a time, swapping in a larger one and resyncing the data, attempting to attach a 4K drive to a vdev originally built on 512 byte drives just returns an error. (How this process should work, incidentally, if a neophyte user were ever to be capable of doing it, is that one simply pulls a drive and swaps it; the server immediately starts rebuilding the mirror and sends a notification email/tweet/text when done, and then the user swaps the other drive. But then humanity wouldn’t need sysadmins or IT-literate relatives.)

So the strategy now would be to create an entirely new pool in ZFS with the 4K drives and copy the data over manually from the existing pool. As my NAS doesn’t have any free slots, I had to do this one drive at a time. As a further complication, my mirrored pool also has a separate intent log device (SLOG), which is also mirrored for resilience. Here’s the layout:

NAME          STATE     READ WRITE CKSUM
datapool      ONLINE       0     0     0
  mirror-0    ONLINE       0     0     0
    c2t1d0    ONLINE       0     0     0
    c3t1d0    ONLINE       0     0     0
logs
  mirror-1    ONLINE       0     0     0
    c2t0d0s3  ONLINE       0     0     0
    c3t0d0s3  ONLINE       0     0     0

And here’s the play (you can tell I rewatched The Avengers recently, right?):

  1. Back up your data. Also back up your drive list, zpool status output and ZFS filesystem properties (zfs get all ...) for each filesystem in the pool - and don’t back this up to a folder in the pool. Make sure you can identify each physical drive & slot with its associated device name.
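
    For example, something like this captures the lot (the /root paths are just placeholders; pick anywhere that isn’t on the data pool):

    zpool status datapool > /root/datapool-status.txt
    zfs get -r all datapool > /root/datapool-properties.txt
    format < /dev/null > /root/drive-list.txt
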
  2. Ensure the pool has been recently scrubbed.
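
    If the last scrub was a while ago, kick one off and watch its status:

    zpool scrub datapool
    zpool status datapool
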
  3. Stop all activity on the pool. If you’re using an account with a home directory on that pool, log out and create/use an account with a home directory on the root pool or elsewhere. Unmount any filesystems in use on remote clients.
  4. To be sure, temporarily disable any services that use storage on the pool. E.g.:

    svcadm disable -t nfs/server  
    

    In my case, this meant nfs/server, smb/server, a Squid instance and a Serviio UPnP service. (If you have other pools that are also remotely shared and in use, you’ll need to unshare the affected filesystems instead.)
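
    For instance, unsharing a single NFS-shared filesystem would look something like this (the filesystem name here is made up):

    zfs unshare datapool/media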

  5. Take a recursive snapshot of the pool:

    zfs snapshot -r datapool@snap1
    

    (I actually used the datestamp rather than ‘snap1’ for the snapshot name.) Any data on the pool that changes after this snapshot will not be migrated unless you take further incremental snapshots.

  6. Detach one drive from the pool:

    zpool detach datapool c3t1d0
    
  7. Shut down and power off the system, remove the detached drive and insert one of the new ones.
  8. Reboot and use the format command to confirm that the new drive is visible and identified correctly.
  9. Create a new ZFS pool from the new drive:

    zpool create datapool2 c3t1d0
    

    You may want to confirm the block size chosen for the new pool, although for most recent drives and Illumos releases, it should be correct. The command is:

    zdb | egrep 'name|ashift'
    

    For 4K blocks, look for an ashift value of 12 listed under the new pool’s name (9 indicates 512 byte blocks).

  10. Send the snapshot of the existing pool to the new pool:

    zfs send -RLce datapool@snap1 | zfs recv -Fd datapool2
    

    (You need the -R; -Lce are optional parameters for Illumos/OpenIndiana that will improve space efficiency. On the receiving side, -d is necessary to strip the existing pool name as a prefix from the filesystems.) If you have it installed or can compile it, mbuffer is useful in the middle of this pipeline to speed things up by buffering the I/O.

    Best to run this within a screen session or similar, as it could take several hours (over eight in my case, without mbuffer).
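
    If you do use mbuffer, the pipeline would look something like this (the block and buffer sizes are just reasonable starting points, not tuned values):

    zfs send -RLce datapool@snap1 | mbuffer -s 128k -m 1G | zfs recv -Fd datapool2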

  11. Confirm that all the filesystems are now present in the new pool and have the same properties and approximate sizes as the original ones.
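
    A quick way to eyeball this against the listing and properties you saved in step 1:

     zfs list -r -o name,used,mountpoint datapool2
     zfs get -r -s local all datapool2
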
  12. At this point, I removed the log device from the original pool:

     zpool remove datapool mirror-1
    
  13. Assuming you have successfully stopped all activity on the original pool, you should be able to export both it and the new pool:

     zpool export datapool
     zpool export datapool2
    
  14. Re-import the new pool with the correct (original) pool name:

     zpool import datapool2 datapool
    
  15. Check that your filesystems are mounted correctly again:

     zfs list
    
  16. Reboot the system and confirm that the correct pool is imported and the filesystems are still present. You may also want to test some of the affected services.
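
    The -t used in step 4 only disables those services until the next reboot, so they should come back by themselves here; you can confirm with svcs, e.g.:

     svcs nfs/server smb/server
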
  17. Shut down and power off the system again. Remove the remaining original drive and insert the other new (larger) drive in its place.
  18. Boot up and attach the new drive to the pool, giving the existing device first and the new device second (in my layout, the new drive takes over the old c2t1d0 device name):

     zpool attach datapool c3t1d0 c2t1d0
    
  19. Wait for the pool to resilver the mirror (again, likely some hours). The data is accessible while this happens, but you’re better off waiting for the resilver to complete to confirm the new drives are operating correctly.
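
    You can check progress and the estimated time remaining with:

     zpool status datapool
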
  20. At this point, I readded the mirrored log device:

     zpool add -f datapool log mirror c2t0d0s3 c3t0d0s3
    

    Do not, and I mean DO NOT, omit the keyword “log” from this command, as I did on my first attempt. It won’t do what you mean; instead it will add the slices as a second top-level data vdev, striped alongside the existing mirror, and you will be unable to undo this mistake because you can’t remove a top-level data vdev from a pool. The only solution will be to break the mirror again and repeat the entire process with the released drive forming another new pool. Yes, that’s what happened here and there was much wailing and gnashing of teeth to accompany it.
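
    Once the command has run, it’s worth confirming that the slices really did land under a separate “logs” section (as in the layout at the top of this post) rather than as a new data vdev:

     zpool status datapool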