Saturday, August 02, 2008

Amanda: simple ZFS backup or S3

When I first started researching ZFS, I found it somewhat troubling that no native backup solution existed. Of course there were the ZFS send/recv commands, but those didn't necessarily work well with existing backup technologies. At the same time, the venerable open source backup solution, amanda, had found a way past its old limitation, where the maximum tape size restricted the size of a backup run. Over time, we have found ways to marry these two solutions.

In my multi-tier use of ZFS for backup, I always need an n-tier component that will allow for permanent archiving to tape every 6 months or year, as deemed fit for the data being backed up. These are full backups only, and due to the large amounts of data in the second tier pool, a backup to tape may span dozens of tapes and run multiple days. I found I had to tweak amanda's typical configuration to allow for very long estimate times, as the correct approach to backing up a ZFS filesystem today involves tar, and amanda does a full tar estimate of a backup before the real backup is attempted. Otherwise, a sufficiently large tape library and a working amanda client configuration on your ZFS-enabled system are all you need.
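To get a feel for "dozens of tapes" and "multiple days", here is a rough back-of-the-envelope sketch in Python. The figures in it (30 TB of data, 800 GB per tape, 80 MB/s sustained) are illustrative assumptions and not numbers from my setup, and it ignores the estimate phase, tape changes, and compression.

```python
import math

def tape_run_estimate(data_bytes, tape_capacity_bytes, throughput_bytes_per_s):
    """Return (tapes needed, days of pure writing) for one full backup."""
    tapes = math.ceil(data_bytes / tape_capacity_bytes)
    days = data_bytes / throughput_bytes_per_s / 86400  # 86400 seconds per day
    return tapes, days

TB, GB, MB = 10**12, 10**9, 10**6

# Hypothetical inputs: 30 TB pool, 800 GB tapes, 80 MB/s sustained to tape.
tapes, days = tape_run_estimate(30 * TB, 800 * GB, 80 * MB)
print(f"{tapes} tapes, about {days:.1f} days of writing")
```

With those assumed inputs the run needs 38 tapes and a bit over four days of continuous writing, which is why a stable snapshot to read from matters.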

For those following along, I'm an avid user of NexentaStor for my second tier storage solution. Setup of an amanda client on that software appliance is actually quite easy.

setup network service amanda-client edit-settings
setup network service amanda-client conf-check
setup network service amanda-client enable

That's all that one needs to do. There is a sample line in the amanda configuration that you adjust in the first command above. The line I used is similar to this: amanda amdump

Depending on your build of the amanda server, the user that things run as will be either the legacy user name of "amanda", the Zmanda default of "amanda_backup", or the Red Hat default of "backup". I guess there had to be a user naming conflict at some point with "amanda".

The hardest part of the configuration is finding where you have your long term snapshots. Since a backup run can take days to weeks, you'll likely wish to back up volumes relative to a monthly snapshot. In your amanda /etc/amanda/CONFDIR/disklist configuration, a sample you may have for a ZFS-based client named nexenta-nas with volume tier2/dir* is:

nexenta-nas /volumes/tier2/dir1/.zfs/snapshot/snap-monthly-1-latest user-tar-span
nexenta-nas /volumes/tier2/dir2/.zfs/snapshot/snap-monthly-1-latest user-tar-span

Note well the use of user-tar-span in the two lines above. This allows for backing up large volumes over multiple tapes in amanda. That one limitation of tape spanning in amanda was solved in a novel way. Backup streams are broken up into chunks of a set size, so that a write failure at the end of one tape lets the write begin fresh at the start of that chunk on the following tape. This feature also allows amanda to be used to back up to Amazon's S3 service. Yes, instead of going to tape, you can configure a tape server to write to an S3 service. S3 limits writes to a maximum of 2GB a file, and amanda's virtual tape solution combined with that chunk sizing of backups works wonderfully to mate ZFS-based storage solutions to S3 for an n-tier solution. Please consult Zmanda's howto for configuring your server correctly. There really is nothing left to configure to get ZFS data to S3.
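The chunk-and-retry idea can be sketched as a small simulation. This is a toy model in Python, not amanda's actual code: the function name span_chunks and the unit-less sizes are made up for illustration. A chunk that does not fit in the space left on a tape is abandoned there and rewritten from its start on the next tape.

```python
def span_chunks(dump_size, chunk_size, tape_capacities):
    """Simulate writing one dump as fixed-size chunks across several tapes.

    Returns (placements, wasted): placements is a list of
    (tape_index, chunk_index) pairs, wasted is the total space left
    unused at the ends of tapes where a chunk did not fit.
    """
    n_chunks = -(-dump_size // chunk_size)  # ceiling division
    placements = []
    wasted = 0
    tape = 0
    free = tape_capacities[0]
    for chunk in range(n_chunks):
        # The final chunk may be shorter than chunk_size.
        size = min(chunk_size, dump_size - chunk * chunk_size)
        while size > free:
            # Chunk hit end of tape: discard the partial write and
            # restart the same chunk at the beginning of the next tape.
            wasted += free
            tape += 1
            if tape >= len(tape_capacities):
                raise RuntimeError("ran out of tapes")
            free = tape_capacities[tape]
        placements.append((tape, chunk))
        free -= size
    return placements, wasted

# Hypothetical sizes: a 25-unit dump, 3-unit chunks, three 10-unit tapes.
placements, wasted = span_chunks(25, 3, [10, 10, 10])
print(placements)
print("wasted at tape ends:", wasted)
```

In this toy run the nine chunks land three per tape, and one unit goes unused at the end of each of the first two tapes. Smaller chunks waste less tape at each boundary but mean more pieces to track, which is the trade-off to keep in mind when picking a chunk size.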


James said...

Jim, I've not found anything that explains labeling for spanning tapes. What did you do for this? Thanks, Jim

(Joe Little) said...

First, I'm Joe :)

In amanda you can pre-label the tapes using "amlabel", and it will span your series, like "CIS-01" to "CIS-40". You define the tape sets and reuse attributes in the amanda.conf file. Alternatively, you can just fill up a library with empty tapes and tell newer Amanda versions to auto-label. The key thing to know is that these are amanda file-mark labels at the beginning of the tape. There is separate support for reading and handling the various tape-library bar code labels. I don't bother to try to match that up with Amanda labels other than the names I give them.

James said...

Sorry about Joe! Thanks for clearing that up. I wasn't sure that it would just go through the tapelist. It sounded like it would take a second tape and label it like DailySet1-10-2 if the first one was DailySet1-10. This makes a difference on the number of tapes to use. Again, thanks!

Mostly Nothing said...

So do you run a script before amanda kicks off or have amanda manipulate the snapshot?

Is the snapshot handling the incremental or is amanda? I'm guessing Amanda.

This is pretty cool. I never liked the send/recv stuff that I'm doing in my homespun script.

(Joe Little) said...

The snapshots are generated by NexentaStor for you. There are other "services" that can be implemented on standard OpenSolaris, but I use Nexenta to manage the snapshots, retention policies, etc.

As for who is handling incrementals, it's all in the snapshots. As stated in the article, I only use amanda to go to tape (or S3) for a final archive, so it is always just a full, non-incremental backup step. The point of using monthly snapshots is to allow the backup a very long window without worrying about data changing. My backups to tape now regularly run weeks at a time for dozens of terabytes.

Chuff said...

This seems to be my problem. Backing up multi terabytes is taking 4-6 days with a DDT solution just on the DD side. What I really need is a snapshot that only shows the differential (or cumulative) from the last snapshot. Still looking for something more robust.

(Joe Little) said...

auto-tier is what I'm using for D2D backups, but it does need to traverse both filesystems to see differences. What you want is auto-sync, a service facility of NexentaStor that sends just those deltas using zfs send/recv or rsync protocols, but based on snapshots. It requires both the primary disk and the 2nd tier to be NexentaStor-based solutions, and it will mirror the data of the first tier to the second tier using the same snapshots, but the second tier can then have a larger snapshot retention pool.

I used auto-tier in my case as our primary storage is still mostly NetApp based. We use NexentaStor for primary storage, but since first tier is hybrid between the two, we need to use auto-tier.

Dustin J. Mitchell said...

Thanks for the post. Regarding the spanning, there's work afoot in 3.2 to make this even better-behaved, so that you don't waste space at the end of each tape. I wrote it up here, if you're interested.