At home I have a Solaris box which, among other things, stores my digital music library.
$ zfs list space/export/flac space/export/mp3
NAME                USED  AVAIL  REFER  MOUNTPOINT
space/export/flac   543G   980G   540G  /export/flac
space/export/mp3    125G   980G   122G  /export/mp3
The FLACs are mostly ripped from CDs, so technically, I don’t need to back them up because I still have the original media. But, said media is packed away and hard to get to, and ripping it just the once was plenty. I don’t want to do it again. I’d rather have it all backed up in S3, where recovery would be a long, but hands-free process. There’s also a bunch of stuff from other sources, which I can’t easily get again.
Everything in the flac dataset exists, transcoded, in mp3, so I can cram more stuff onto my phone, car USB stick or whatever. I have a script which ensures consistency between the two. So, I'm wasting my money backing up the MP3s that I've generated from FLACs.
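A minimal sketch of that kind of consistency check might look like the following. It assumes one MP3 per FLAC at the mirrored path; the paths match the layout above, but the script itself is an illustration, not the actual tool.

```shell
#!/bin/sh
# Sketch: report FLACs which have no transcoded MP3 counterpart.
# Assumes each .flac maps to one .mp3 at the mirrored path.

FLAC_ROOT=/export/flac
MP3_ROOT=/export/mp3

find "$FLAC_ROOT" -type f -name '*.flac' | while read -r f
do
    # /export/flac/artist/album/01.flac -> /export/mp3/artist/album/01.mp3
    m=$(echo "$f" | sed -e "s|^$FLAC_ROOT|$MP3_ROOT|" -e 's|\.flac$|.mp3|')
    [ -f "$m" ] || echo "missing: $m"
done
```

A real version would also want to flag orphaned MP3s whose FLAC source has gone, which is the same loop run in the other direction.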
I have a simple script which syncs the mp3 ZFS datasets with S3 buckets. (Including, of course, lots of safety checks to make sure we don't accidentally sync an empty directory because a mount failed somewhere!)
I don't want to write something that checks the bucket copies conditionally: Amazon have done that for me with aws s3 sync. It would be far simpler for me to sync an mp3 dataset which only contains the stuff that can't be re-generated from the original FLACs.
I'm going to time the operations so you can see how quickly ZFS does these things. I habitually (and dogmatically) use timex, which displays all its times in seconds.
# timex zfs snapshot space/export/mp3@for_aws

real        0.13
user        0.00
sys         0.00

# timex zfs clone -o mountpoint=/export/mp3_sync space/export/mp3@for_aws \
    space/export/mp3_sync

real        0.50
user        0.00
sys         0.01
Now I have an exact copy of my original mp3 dataset, mounted at /export/mp3_sync. Although they are clearly linked to one another, changes made in either dataset will not be reflected in the other.
The flac and mp3 filesystems follow the same hierarchy, so I can use a list of the flac directories to remove their counterparts from the clone.
$ timex find /export/flac -type d | sed 's|/flac/|/mp3_sync/|' | while read d
> do
> test -d $d && [[ -n $(print $d | cut -d/ -f6) ]] && rm -fr $d
> done

real        6.63
user        0.01
sys         0.11
$ du -sh /export/mp3*
 122G   /export/mp3
  58G   /export/mp3_sync
Well, it's not much, but it's still 64GB/month I don't need to pay for.
(The cut test is necessary because I only want to remove directories which are albums: they're all at a certain depth.)
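To see why field 6 picks out that depth: cut numbers /-separated fields from 1, and the leading slash makes field 1 empty, so field 6 is only non-empty three levels below the mountpoint. The path components here are invented for illustration:

```shell
# Fields: 1:'' 2:export 3:mp3_sync 4:genre 5:artist 6:album
echo '/export/mp3_sync/rock/artist/album' | cut -d/ -f6   # -> album
echo '/export/mp3_sync/rock/artist' | cut -d/ -f6         # -> empty string
```

Anything shallower produces an empty field 6, fails the `-n` test, and survives the `rm`.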
When the sync is done, it’s quick and easy to clean up:
$ timex zfs destroy space/export/mp3_sync

real        2.17
user        0.00
sys         0.12

$ timex zfs destroy space/export/mp3@for_aws

real        0.18
user        0.00
sys         0.00
This is a quick and fairly trivial project, which took all of five seconds to roll into my AWS backup script, but I think it’s interesting because it shows a more creative use of ZFS features than simply rolling back to yesterday’s code, or stamping out zones from a cloned dataset.