Automated Backups with backup and Rsync.net
For the past 18 months, I’ve been meaning to review the way I go about doing backups. Whilst I’m quite settled on backing up workstations, until about two weeks ago I had a half-configured, non-recoverable solution which was only set up on one out of several servers. That’s not very useful.
As well as the obvious requirement for something automated, I also wanted something
which would let me reuse most of the backup definitions across multiple servers
without having to write bash scripts. I’d come across the backup
Ruby gem some time ago and, after some quick testing, it seemed to work
quite well.
Then, I needed some sort of remote host to back up to. I had heard of Rsync.net
through the prgmr mailing list. They provide a reasonably cheap remote filesystem
which you can access over ssh (and so scp and rsync work). This is great, as the
machine being backed up doesn’t need any special tools to upload backups,
and the same applies when restoring.
Backup Strategy
The backup gem uses the concept of “models” to define a backup. A model consists
of any directories or databases you wish to back up, along with any processing
you wish to apply (compressing, encrypting, etc.) and where to store the result.
My approach to the models is to create different ones for the different “roles” a system
performs. By default, this includes a “base” role which archives /etc, /home
and /var/log (in a multi-user system, I’d probably split out /home to its own
model). Then, additional models are defined for more specific roles. So, to back up
a WordPress site, a “site” model would first back up the associated database and
then collect up the data in /var/www/site/ into the one archive.
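As a rough sketch of what such a “site” model might look like, using the same DSL as the base.rb example further down: the :site trigger, the database name and the empty credentials are placeholders of mine, not values from a real setup.

```ruby
# Hypothetical "site" model; names and credentials are placeholders.
Backup::Model.new(:site, 'WordPress site') do
  # Dump the database first...
  database MySQL do |db|
    db.name     = "wordpress"
    db.username = ""
    db.password = ""
    db.host     = "localhost"
  end

  # ...then collect the site's files into the same backup package.
  archive :www do |archive|
    archive.add '/var/www/site/'
  end

  compress_with Gzip
  store_with SCP
  notify_by Mail
end
```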
Each model is set to be compressed using gzip, uploaded to Rsync.net and then to
notify me if any errors occur. It also keeps the last 5 versions of the backup.
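To make “keeps the last 5” concrete, here is a minimal sketch of that kind of rotation in plain Ruby. It is not the gem’s own code: runs_to_delete is a hypothetical helper, and it assumes each run is identified by a zero-padded timestamp string that sorts chronologically.

```ruby
# Hypothetical helper, not part of the backup gem: given the names of
# existing runs, return the ones that fall outside the newest `keep`.
def runs_to_delete(runs, keep)
  sorted = runs.sort            # zero-padded timestamps sort chronologically
  excess = sorted.length - keep
  excess > 0 ? sorted.first(excess) : []
end

runs = %w[
  2013.07.25.01.30.02 2013.07.26.01.30.01 2013.07.27.01.30.04
  2013.07.28.01.30.02 2013.07.29.21.54.01 2013.07.30.00.30.07
]
p runs_to_delete(runs, 5)  # the single oldest run is the only one dropped
```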
backup handles organising the archives on the remote quite well. For
each server, I define a directory to collect them in; backup then creates a
subdirectory per model and, inside it, one directory per run named after the timestamp.
Much like this:
server_name/
    base/
        2013.07.29.21.54.01/
            base.tar
        2013.07.30.00.30.07/
            base.tar
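When scripting a restore, it can help to build these paths programmatically. The following is only a sketch inferred from the tree above, not the gem’s own code: remote_path is a hypothetical helper, and the timestamp format is assumed from the directory names shown.

```ruby
# Hypothetical helper: builds the remote location of one backup run,
# following the server_name/model/timestamp/model.tar layout shown above.
def remote_path(server_dir, model, time)
  stamp = time.strftime('%Y.%m.%d.%H.%M.%S')
  File.join(server_dir, model, stamp, "#{model}.tar")
end

puts remote_path('server_name', 'base', Time.new(2013, 7, 29, 21, 54, 1))
```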
Finally, a cron job is used to automate it to run daily.
rsync.net
Rsync.net is refreshingly simple (a bit like prgmr is, I suppose). After
signing up, you’ll get sent the details needed to access the relevant box, where you
can ssh in, and do the basics (change the password, check your quota usage, move
files, create directories, etc.).
I opted for a geographically distributed plan (it’s slightly more expensive, but this is the primary backup method I’m using) and the lowest plan, since the amount of data is tiny: it’s mostly text files.
And that’s essentially it. I paid for a year so I’ll be reminded about it next year to go about renewing it.
backup
The backup Ruby gem gives you a command line tool which will help generate 
the models and run them. But you should just install it as a gem, rather than using
Bundler or anything else.
I did all of this under a dedicated backup user, which is allowed to
run tar through sudo without being asked for a password and not much else. backup expects
the models and configuration files to exist in ~/Backup/, so a separate user seemed
the best approach.
gem install backup
The documentation suggests using the model generator to get started and that’s pretty much what I did:
backup generate:model -t base --archives --storages='scp' --compressors='gzip' --notifiers='mail'
This will give a rather detailed and well-commented example. I started with this
and stripped it down to the bare essentials. If you don’t have one already, it will
also create a template config.rb, which will contain a similar set of examples.
config.rb can be used to set defaults for the models, so I opted to fill this with
as much as possible.
But, some Real World™ examples are much more useful:
base.rb
##
# Base: Basic Linux backup model.
# Archives and compresses: /etc, /var/log, /home, /var/mail.
# Uploads to rsync.net.
#
# $ backup perform -t base
##
Backup::Model.new(:base, 'Basic Linux Model') do
  # archive
  archive :etc do |archive|
    archive.use_sudo
    archive.add "/etc/"
  end
  archive :logs do |archive|
    archive.use_sudo
    archive.tar_options '--warning=no-file-changed'
    archive.add '/var/log'
  end
  archive :home do |archive|
    archive.use_sudo
    archive.add '/home'
    # don't back up the backup data.
    archive.exclude '/home/backup/Backup/.tmp/'
  end
  archive :mail do |archive|
    archive.use_sudo
    archive.tar_options '--warning=no-file-changed'
    archive.add '/var/mail'
  end
  
  # compressor
  compress_with Gzip
  # storage
  store_with SCP
  # notifier
  notify_by Mail
end
The archive blocks are just an abstraction over tar, so you can pass through
options. In this case, I’ve ignored file change warnings for areas where files are
likely to change, harmlessly, whilst the backup is running.
Both the storage and notifier lines assume the configuration has already been made.
If you didn’t have these in config.rb, it wouldn’t work and you’d need to expand
the line into a block.
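For illustration, the expanded form of the storage line might look something like this. It is a sketch that simply mirrors the attributes from the config.rb defaults shown in this post; the values are left blank as placeholders.

```ruby
# Inside the Backup::Model.new block, replacing the bare `store_with SCP` line.
# Values are placeholders; fill in the details for your own remote host.
store_with SCP do |server|
  server.username = ""
  server.ip       = ""
  server.port     = 22
  server.path     = "~/server_name/"
  server.keep     = 5
end
```

The notifier line can be expanded in the same way, with a `notify_by Mail do |mail| ... end` block.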
config.rb
##
# SCP Storage Type Defaults
##
Backup::Storage::SCP.defaults do |server|
  server.username   = ""
  server.ip         = ""
  server.port       = 22
  server.path       = "~/server_name/"
  server.keep       = 5
end
##
# Notifier Defaults
##
Backup::Notifier::Mail.defaults do |mail|
  mail.from                 = ""
  mail.to                   = ""
  mail.address              = "smtp.gmail.com"
  mail.port                 = 587
  mail.domain               = ""
  mail.user_name            = ""
  mail.password             = ""
  mail.authentication       = "plain"
  mail.encryption           = :starttls
end
# * * * * * * * * * * * * * * * * * * * *
#        Do Not Edit Below Here.
# All Configuration Should Be Made Above.
##
# Load all models from the models directory.
Dir[File.join(File.dirname(Config.config_file), "models", "*.rb")].each do |model|
  instance_eval(File.read(model))
end
The SCP block contains the details for Rsync.net. The mail defaults are
currently set to the values for Gmail’s SMTP server (you’ll need to fill in all of
the other relevant bits). By default, this will notify about all events (success,
warnings or failure). I left it this way for about two weeks to confirm
it was running correctly.
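Once you trust the setup, the success emails can be switched off. A minimal sketch, assuming the notifier’s on_success, on_warning and on_failure switches as they appear in the generated template; these lines would go inside the existing Mail defaults block in config.rb:

```ruby
Backup::Notifier::Mail.defaults do |mail|
  # ... existing SMTP settings from config.rb stay as they are ...
  mail.on_success = false  # stay quiet when everything worked
  mail.on_warning = true
  mail.on_failure = true
end
```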
Aside: Criticism
- The configuration files would be more appropriately stored in /etc, given it’s designed for Unix-like systems.
- The timestamps are annoying. Unix timestamps or ISO 8601 would be far more appropriate than defining another bloody date format.
Automation
To automate it, I just have a crontab entry under the backup user. This
runs at 01:30 every morning and emails me if necessary.
30 01 * * * backup perform -t base,www >/dev/null
(By default, it writes a lot to stdout, and I’d rather not fill up an unmonitored
inbox with successes…)
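One detail of the redirect worth spelling out: >/dev/null only discards stdout, so anything a command writes to stderr still reaches cron. The Ruby sketch below has nothing to do with the backup gem itself; visible_output and the sample command are made up for illustration, and it assumes a Unix-like system with sh available.

```ruby
require 'open3'

# Hypothetical helper: runs a shell command line and returns whatever output
# is still visible (stdout and stderr merged) plus the exit status.
def visible_output(command_line)
  output, status = Open3.capture2e(command_line)
  [output, status.exitstatus]
end

# A stand-in command that writes one line to each stream.
noisy = %q(sh -c 'echo "progress"; echo "failure" >&2')

p visible_output(noisy)                   # both lines are visible
p visible_output("#{noisy} >/dev/null")   # only the stderr line survives
```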
I configured this about two weeks before writing up this blog post. It’s been working well since then, and I’ve deployed a similar configuration across at least two other boxes.
I now need to investigate the Chef cookbook and modify it to work similarly to this…