Saturday, August 02, 2008

Amanda: simple ZFS backup or S3

When I first started researching ZFS, I found it somewhat troubling that no native backup solution existed. Of course there were the ZFS send/recv commands, but those didn't necessarily work well with existing backup technologies. At the same time, the venerable open source backup solution amanda had found a way past its old limitation, where a single backup run could be no larger than one tape. Over time, we have found ways to marry these two solutions.

In my multi-tier use of ZFS for backup, I always need an n-tier component that allows permanent archiving to tape every 6 months or every year, as deemed fit for the data being backed up. These are full backups only, and because of the large amount of data in the second-tier pool, a backup to tape may span dozens of tapes and run for multiple days. I had to tweak amanda's typical configuration to allow for very long estimate times, because the correct approach to backing up a ZFS filesystem today involves tar, and amanda runs a full tar estimate of each backup before the real backup is attempted. Beyond that, all you need is a sufficiently large tape library and a working amanda client configuration on your ZFS-enabled system.
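One way to accommodate those long tar estimates is the etimeout parameter in amanda.conf, which caps how long the planner waits for a client's estimates. As I read the amanda.conf(5) man page, a positive etimeout is seconds per estimate per disklist entry (amanda may run up to three estimates per entry), while a negative value is a total wait for the whole client. The Python sketch below, with hypothetical helper names, computes the planner's worst-case wait under that reading so you can pick a value that covers a multi-hour estimate; check the semantics against your own amanda version.

```python
def planner_wait_seconds(etimeout: int, num_dles: int, estimates_per_dle: int) -> int:
    """Worst-case seconds the planner waits for one client's estimates.

    Assumed semantics (amanda.conf(5) as I understand it): a positive
    etimeout applies per estimate per disklist entry (DLE); a negative
    etimeout is the total wait for the whole client.
    """
    if not 1 <= estimates_per_dle <= 3:
        raise ValueError("amanda runs between 1 and 3 estimates per DLE")
    if etimeout < 0:
        return -etimeout  # total budget for the client, regardless of DLE count
    return etimeout * num_dles * estimates_per_dle


def etimeout_for(hours_per_estimate: float) -> int:
    """Positive per-estimate etimeout (seconds) that covers a slow tar estimate."""
    return int(hours_per_estimate * 3600)


print(planner_wait_seconds(300, 4, 2))
print(planner_wait_seconds(etimeout_for(6), 2, 1))
```

With a 300-second etimeout and four entries estimating two levels each, the planner may wait 2400 seconds (40 minutes); a tar estimate needing six hours per entry would call for an etimeout of 21600 in amanda.conf.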

For those following along, I'm an avid user of NexentaStor for my second-tier storage solution. Setting up an amanda client on that software appliance is actually quite easy:

setup network service amanda-client edit-settings
setup network service amanda-client conf-check
setup network service amanda-client enable

That's all one needs to do. The first command above opens the amanda client configuration, which contains a sample line for you to adjust. The line I used is similar to this:

amandasrv.stanford.edu amanda amdump
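The sample line appears to follow amanda's .amandahosts format: the backup server's hostname, the user the server runs as, and the services that server may request. To my knowledge, amdump in that position is a shortcut for the services a backup run needs (noop, selfcheck, sendsize and sendbackup). Below is a minimal Python sketch that parses such a line and expands the shortcut; the function and table names are hypothetical, and the expansion table is an assumption to verify against your amanda version's documentation.

```python
from typing import List, NamedTuple, Optional, Tuple

# Assumption: "amdump" expands to these services, per amanda's
# .amandahosts documentation as I understand it; verify for your version.
SERVICE_SHORTCUTS = {
    "amdump": ("noop", "selfcheck", "sendsize", "sendbackup"),
}


class AmandaHostsEntry(NamedTuple):
    host: str                   # backup server allowed to connect
    user: str                   # user the server runs as ("" if omitted)
    services: Tuple[str, ...]   # services it may request, shortcuts expanded


def parse_amandahosts_line(line: str) -> Optional[AmandaHostsEntry]:
    """Parse 'host [user [service ...]]'; return None for a blank line."""
    fields = line.split()
    if not fields:
        return None
    user = fields[1] if len(fields) > 1 else ""
    services: List[str] = []
    for name in fields[2:]:
        # Expand known shortcuts; pass any other service name through as-is.
        services.extend(SERVICE_SHORTCUTS.get(name, (name,)))
    return AmandaHostsEntry(fields[0], user, tuple(services))


print(parse_amandahosts_line("amandasrv.stanford.edu amanda amdump"))
```

The second field is where the server build's user name goes, which is why the choice of user discussed next matters.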

Depending on your build of amanda server, you'll find that things run as either the legacy user name "amanda", the Zmanda default "amanda_backup", or the Red Hat default "backup". I guess there had to be a user naming conflict with "amanda" at some point.

The hardest part of the configuration is finding where your long-term snapshots live. Since a backup run can take days to weeks, you'll likely want to back up volumes relative to a monthly snapshot. In your amanda /etc/amanda/CONFDIR/disklist configuration, a sample entry for a ZFS-based client named nexenta-nas with volumes tier2/dir* might be:

nexenta-nas /volumes/tier2/dir1/.zfs/snapshot/snap-monthly-1-latest user-tar-span
nexenta-nas /volumes/tier2/dir2/.zfs/snapshot/snap-monthly-1-latest user-tar-span
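With many dir* volumes, writing these entries by hand gets tedious. Here is a small Python sketch that generates one disklist line per dataset, each pointing at the same monthly snapshot under that dataset's .zfs/snapshot directory; the helper name and its defaults are illustrative and simply mirror the sample above.

```python
from typing import Iterable, List


def disklist_entries(client: str, pool: str, datasets: Iterable[str],
                     snapshot: str = "snap-monthly-1-latest",
                     dumptype: str = "user-tar-span",
                     mount_root: str = "/volumes") -> List[str]:
    """Return one disklist line per dataset, each pointing at a snapshot.

    Reading from .zfs/snapshot/<name> gives tar a frozen view, so a run
    that lasts days still captures a single consistent point in time.
    """
    lines = []
    for ds in datasets:
        path = f"{mount_root}/{pool}/{ds}/.zfs/snapshot/{snapshot}"
        lines.append(f"{client} {path} {dumptype}")
    return lines


for line in disklist_entries("nexenta-nas", "tier2", ["dir1", "dir2"]):
    print(line)
```

On the NAS itself, `zfs list -H -o name -r tier2` should print the dataset names to feed in, once the leading tier2/ is stripped.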


Note well the use of user-tar-span in the two lines above. This dumptype allows large volumes to be backed up across multiple tapes. amanda removed its old tape-spanning limitation in a novel way: it breaks each backup stream into "chunksizes" of a set size, so that when a write fails at the end of one tape, amanda starts that chunk afresh at the beginning of the following tape.

This feature also allows amanda to back up to Amazon's S3 service. Yes, instead of writing to tape, you can configure an amanda tape server to write to S3. S3 limits each file written to a maximum of 2GB, and amanda's virtual tape support, combined with that chunking of backups, works wonderfully to mate ZFS-based storage to S3 as an n-tier solution. Please consult Zmanda's howto to configure your server correctly. Once the server is set up, there really is nothing left to configure to get ZFS data to S3.
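To make the chunking idea concrete, here is a toy Python model (not amanda's actual code) of a stream split into fixed-size chunks spanning tapes: when a chunk does not fit in the space left on the current tape, the partial write is abandoned and the whole chunk is rewritten at the start of the next tape. Sizes are in arbitrary units, and the function name and restart policy as modeled here are my assumptions.

```python
from math import ceil
from typing import List, Tuple


def span_tapes(stream_size: int, chunk_size: int,
               tape_capacity: int) -> Tuple[List[List[Tuple[int, int]]], int]:
    """Toy model: place fixed-size chunks of one backup stream onto tapes.

    A chunk that hits end-of-tape is abandoned and restarted from its
    beginning on the next tape, so only that one chunk is rewritten.
    Returns (tapes, wasted): per-tape lists of (chunk_index, size), and the
    total tape space consumed by abandoned partial chunks.
    """
    if not 0 < chunk_size <= tape_capacity:
        raise ValueError("chunk_size must be positive and fit on one tape")
    tapes: List[List[Tuple[int, int]]] = [[]]
    free = tape_capacity
    wasted = 0
    for i in range(ceil(stream_size / chunk_size)):
        size = min(chunk_size, stream_size - i * chunk_size)  # last chunk may be short
        if size > free:
            # End of tape mid-chunk: the partial write is lost space.
            wasted += free
            tapes.append([])
            free = tape_capacity
        tapes[-1].append((i, size))
        free -= size
    return tapes, wasted


print(span_tapes(10, 3, 7))
```

Since every chunk is written as its own piece, choosing a chunk size at or below the 2GB per-file limit means each piece fits in a single S3 file, and a failed write costs at most one chunk of rework.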
