Automated Backups with backup and Rsync.net
For the past 18 months, I’ve been meaning to review the way I go about doing backups. Whilst I’m quite settled on backing up workstations, until about two weeks ago I had a half-configured, non-recoverable solution which was only set up on one out of several servers. That’s not very useful.
As well as the obvious requirement for something automated, I also wanted something which would allow me to reuse much of the backup definitions across multiple servers, while avoiding writing bash scripts. I’d come across the backup Ruby gem some time ago and, after some quick testing, it seemed to work quite well.
Then, I needed some sort of remote host to back up to. I had heard of Rsync.net through the prgmr mailing list. They provide a reasonably cheap remote filesystem which you can access over ssh (and so scp and rsync work too). This is great on the machine being backed up, as I don’t need any special tools to upload backups, and the same applies for restoring.
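For instance, pulling an archive back down is just a plain scp copy (the hostname and path here are placeholders, not real account details):

scp backup@usw-s001.rsync.net:server_name/base/2013.07.29.21.54.01/base.tar .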
Backup Strategy
The backup gem uses the concept of “models” to define a “backup”. A model consists of any directories or databases you may wish to back up, along with any processing you may wish to do (compressing, encrypting, etc.) and where to store the result.
My approach to the models is to create different ones for the different “roles” a system performs. By default, this includes a “base” role which archives /etc, /home and /var/log (on a multi-user system, I’d probably split /home out into its own model). Then, additional models are defined for more specific roles. So, to back up a WordPress site, a “site” model would first back up the associated database and then collect the data in /var/www/site/ into the one archive.
Each model is set to be compressed using gzip, uploaded to Rsync.net and to notify me if any errors occur. It also keeps the last 5 versions of the backup.
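As a rough sketch, the WordPress “site” model mentioned above might look like this (the database name and credentials are made up, and it relies on the defaults covered later):

Backup::Model.new(:site, 'WordPress Site Model') do
  # dump the site's database first
  database MySQL do |db|
    db.name     = "site_db"   # hypothetical database name
    db.username = "site_user" # hypothetical credentials
    db.password = "secret"
  end

  # then collect the site's files into the same run
  archive :site_data do |archive|
    archive.add '/var/www/site/'
  end

  compress_with Gzip
  store_with SCP
  notify_by Mail
end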
backup handles organising the resulting archives on the remote quite well. For each server, I define a directory to collect them in, and backup then sorts each run into a directory named after the timestamp, with each model’s archive collected inside. Much like this:
server_name/
  base/
    2013.07.29.21.54.01/
      base.tar
    2013.07.30.00.30.07/
      base.tar
Finally, a cron job runs it daily.
rsync.net
Rsync.net is refreshingly simple (a bit like prgmr is, I suppose). After signing up, you’ll be sent the details needed to access the relevant box, where you can ssh in and do the basics (change the password, check your quota usage, move files, create directories, etc.).
I opted for a geographically distributed plan (it’s slightly more expensive, but this is the primary backup method I’m using) on the lowest tier, as the amount of data is tiny: it’s mostly text files.
And that’s essentially it. I paid for a year, so I’ll be reminded to renew it around this time next year.
backup
The backup Ruby gem gives you a command line tool which helps generate the models and run them. You should just install it as a gem, rather than using Bundler or anything else.
I did all of this under a specific backup user, which is configured to allow it to use tar through sudo without asking for a password, and not much else. The gem expects the backup models and configuration files to exist in ~/Backup/, so this seemed the best approach.
gem install backup
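For the passwordless tar mentioned above, a sudoers entry along these lines should do (a sketch; the path to tar may differ on your system):

backup ALL=(root) NOPASSWD: /bin/tar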
The documentation suggests using the model generator to get started and that’s pretty much what I did:
backup generate:model -t base --archives --storages='scp' --compressors='gzip' --notifiers='mail'
This will give a rather detailed and well-commented example. I started with this and stripped it down to the bare essentials. If you don’t have one already, it will also create a template config.rb, which contains a similar set of examples. config.rb can be used to set defaults for the models, so I opted to fill it with as much as possible.
But some Real World™ examples are much more useful:
base.rb
##
# Base: Basic Linux backup model.
# Archives and compresses: /etc, /var/log, /home, /var/mail.
# Uploads to rsync.net.
#
# $ backup perform -t base
##
Backup::Model.new(:base, 'Basic Linux Model') do
  # archives
  archive :etc do |archive|
    archive.use_sudo
    archive.add "/etc/"
  end

  archive :logs do |archive|
    archive.use_sudo
    archive.tar_options '--warning=no-file-changed'
    archive.add '/var/log'
  end

  archive :home do |archive|
    archive.use_sudo
    archive.add '/home'
    # don't back up the backup data.
    archive.exclude '/home/backup/Backup/.tmp/'
  end

  archive :mail do |archive|
    archive.use_sudo
    archive.tar_options '--warning=no-file-changed'
    archive.add '/var/mail'
  end

  # compressor
  compress_with Gzip

  # storage
  store_with SCP

  # notifier
  notify_by Mail
end
The archive blocks are just an abstraction over tar, so you can pass options through. In this case, I’ve ignored file change warnings for areas which are likely to change harmlessly whilst the backup is running.
Both the storage and notifier lines assume the configuration has already been set up as defaults. If you didn’t have these in config.rb, it wouldn’t work and you’d need to expand each line into a block.
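For example, without the defaults in place, the storage line expands into a block like this (mirroring the values shown below):

store_with SCP do |server|
  server.username = ""
  server.ip       = ""
  server.port     = 22
  server.path     = "~/server_name/"
  server.keep     = 5
end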
config.rb
##
# SCP Storage Type Defaults
##
Backup::Storage::SCP.defaults do |server|
  server.username = ""
  server.ip       = ""
  server.port     = 22
  server.path     = "~/server_name/"
  server.keep     = 5
end

##
# Notifier Defaults
##
Backup::Notifier::Mail.defaults do |mail|
  mail.from           = ""
  mail.to             = ""
  mail.address        = "smtp.gmail.com"
  mail.port           = 587
  mail.domain         = ""
  mail.user_name      = ""
  mail.password       = ""
  mail.authentication = "plain"
  mail.encryption     = :starttls
end

# * * * * * * * * * * * * * * * * * * * *
# Do Not Edit Below Here.
# All Configuration Should Be Made Above.

##
# Load all models from the models directory.
Dir[File.join(File.dirname(Config.config_file), "models", "*.rb")].each do |model|
  instance_eval(File.read(model))
end
The SCP block contains the details for Rsync.net. The mail defaults are currently set to the values for Gmail’s SMTP server (you’ll need to fill in all of the other relevant bits). By default, this will notify about all events (success, completion with warnings, or failure). I kept it like this for about two weeks to confirm it was running correctly.
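Once satisfied, the success mails can be switched off via the notifier’s flags, either per model or in the defaults (a sketch):

notify_by Mail do |mail|
  mail.on_success = false # stay quiet when everything worked
  mail.on_warning = true
  mail.on_failure = true
end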
Aside: Criticism
- The configuration files would be more appropriately stored in /etc, given it’s designed for Unix-like systems.
- The timestamps are annoying. Unix timestamps or ISO 8601 would be far more appropriate than defining another bloody date format.
Automation
To automate it, I just have a crontab entry under the backup user. This runs at 0130 every morning and emails me if necessary.
30 01 * * * backup perform -t base,www >/dev/null
(By default, it writes a lot to stdout, and I’d rather not fill up an unmonitored inbox with successes…)
I configured this about two weeks before writing up this blog post. It’s been working well since then, and I’ve deployed a similar configuration across at least two other boxes.
I now need to investigate the Chef cookbook and modify it to work similarly to this…