Test Environments with Vagrant and Chef

I recently completed a project for an assignment which had a mess of dependencies. It depends upon libgit2 and pygit2, both of which are a complete pain to get working. After it blew up in my face for the third time, I realised I was being silly and should go back to using Vagrant, and along with it, the configuration management tool, Chef.

Why Chef, over it’s main competitor, Puppet? I like it more. But I can’t really describe why. I’m like that sometimes. I wish it were because one had much nicer, more accessable documentation, but neither seems to be the case. Fortunately, Vagrant has lovely documentation and so I’m not going to replicate it. I am going to go into detail about how to use both together and provide an introduction to Chef, though.

Vagrant allows you to start, stop and destroy virtual machines in a nice abstraction which also provides hooks to automate the provisioning of them. I want to be able to quickly spin up a test environment per project and when it’s started I want everything configured for me, dotfiles, libraries, applications, shared folders and so on.

Chef handles the second bit. It’s known as a configuration management tool. It describes what we want the system to look like after it’s run, but we don’t care about how we get there. Chef overuses the cooking analogy a bit, but: A recipe describes the way a specific tool is configured, along with any dependencies. A cookbook contains the recipes we wish to run on a system. I’m mostly going to talk about using “Chef Solo”, that is, Chef without a server as that works best for environments which are quickly created, used and thrown away.

Before You Begin

Firstly, you should install:

Then you should at least skim read the Vagrant documentation. I’ll only describe it lightly. By the end, I’ll have a “test-environment” repository which I’ll clone whenever I need a new to spin something up.

Virtual Machines with Vagrant

Vagrant is the glue that holds all of this together and makes interacting with virtual machines as pleasurable as possible. It uses a simple Ruby DSL¹ to describe the virtual machines that should be managed. All of this is stored in a Vagrantfile.

Vagrant uses the notion of “boxes” to describe preconfigured base VMs for which we can work from. I’m using Ubuntu here. You should also look at Gareth Rushgrove’s Vagrant Boxes site. A simple Vagrant friendly install with a copy of Chef is all you need.

Firstly, create a Vagrantfile and, if you don’t have one already, grab a box:

vagrant init
vagrant box add precise32 \ http://files.vagrantup.com/precise32.box

Then start the Vagrantfile (it already has an example configuration, but I prefer it a little tidier):

Vagrant.configure("2") do |config|
  config.vm.box = "precise32"
end

Now, you can start the virtual machine and ssh into it:

vagrant up
vagrant ssh

And, you can throw the box away by running:

vagrant destroy

This might be the bit where you wish you had an SSD, if you don’t already have one. You’ll do that a lot. This base box configures Ruby, Chef and a few other things for us. And notably it’ll share the working directory in which the Vagrantfile is contained. You should now be good to start thinking about provisioning.

Provisioning with Chef

There are a few versions of Chef. These can be broadly split into “client”, known as “Solo” and “client-server”, in the case of “Chef Server”, “Chef Hosted” or “Chef Private”. These latter three provide a central area in which to manage nodes, cookbooks, roles and so on.

The node is a server (actually, it’s not necessarily a server, because you could use Chef to manage your workstation too) that is managed by Chef. The routine to complete a task is known as a recipe. This is used to install, configure and start services, from the appropriate package management system, or from source and using the correct dependencies. These recipes are grouped together in a “cookbook”.

A set of cookbooks may be configured together so that a node can have a “role” applied to it. This is something like “web” or “database”, causing Chef to configure the appropriate cookbooks for it.

“Data Bags” are collections of data (which may be encrypted) which are used by Chef to aid the install. The idea here is that the data can be kept seperate from the configuration, like usernames, passwords or keys. Below, I’m using this to put a special set of ssh keys in place so I can push code up to GitHub inside the test environment.

One of the few key points to Chef, and similar configuration management tools is that of “idempotence” — every time Chef is run, the same state will be had at the end. So, for example if you run Chef manually and everything is configured, nothing will happen. If you destroy this virtual machine and create another, once Chef has done it’s thing, it’ll be exactly the same as the old one was.

knife is Chef’s main client side tool for interacting with all of this. Whilst it’s designed to be used to interact with the Chef Server, it can also be used locally to assemble cookbooks. Once you hunt around, you’ll also see that knife has plugins available to interact with EC2, OpenStack and many others.

I’m missing a bunch of stuff out here because I’m only interested in using Chef with Vagrant. Look at the “Further Reading” heading at the bottom.

Building Cookbooks

Cookbooks look a little bit like this (with each indent being a subdirectory):

cookbooks
    cookbook-name
        recipes
            default.rb

The root cookbooks represents the default collection of cookbooks that Chef — and especially when used with Vagrant — looks for. Inside here are the cookbooks themselves. A single cookbook can contain multiple recipes, but the one that is called by default is funnily enough, called, default.rb.

A cookbook is then used to describe the tool which needs installing, for example, git. Our cookbook would be called “git” and our default.rb recipe might look like this:

package "git"

This will ensure that the package from the local package management system, “git” is installed.

Of course, that doesn’t necessarily exist on every system’s package manager. Chef has a library called “Ohai” that sniffs out the node and reports what it is and what it can do. And to interact with Ohai, Chef’s DSL provides methods like value_for_platform.

You might wish to use Opscode’s ‘chef-repo’, but I’d prefer to build this up myself, at least for now. It is all about learning it, afterall. The same applies for using knife, which I mentioned before. This can be used to create a cookbook’s file structure for you.

Opscode, and the community maintain a collection of cookbooks that you can use. A lot of these include tested support for lots of different OS styles and Linux distributions. So you’ll probably often find you might wish to lean towards these. Opscode have a community page for shared Cookbooks. And an organisation on GitHub containing all of the ones which they manage themselves. I’ll use some of these later as submodules.

Vagrant Integration

Vagrant supports configuring Chef from inside the Vagrantfile. Typically, you’d define a node.json file. This would contain the “run list” — the list of recipes that a node should manage. Vagrant handles writing, and copying over the VM this node.json file for you on vagrant up or on vagrant reload. You can also assign roles, add recipes or assign a “data bag” that should be used. The configuration for the git cookbook above should look something like this:

Vagrant.configure("2") do |config|
  config.vm.box = "precise32"

  config.vm.provision :chef_solo do |chef|
    chef.add_recipe "git"
  end
end

This will ensure Chef Solo is invoked when you start up your Vagrant VM, adding “git” to it’s run list.

A Cookbook for `dotfiles`

That’s all well and good. But, a more complex worked example is more useful. I want to be able to run a git clone on my dotfiles and then run the included setup.sh script, but before I do this, I need to also ensure that all of it’s dependencies (and the tools I expect) are installed. On top of this, I want a set of keys copied over so that I can access the likes of GitHub and so forth.

To do this, I’ll create a cookbook called dotfiles which is tasked with installing the client utilities I expect (git, vim, tmux). Followed by configuring the keys. After this, it will pul down a clone of my dotfiles and running it’s setup.sh script.

But before this is installed, I’ll run a few more recipes which install the system libraries and build environment. Then we’ll end up with a Vagrantfile which looks something like this:

Vagrant.configure("2") do |config|
  config.vm.box = "precise32"

  config.vm.provision :chef_solo do |chef|
    chef.json = {
      "dotfiles" => {
        "user" => "vagrant",
        "group" => "vagrant",
        "public_key" => IO.read(File.expand_path("~/.ssh/id_rsa.pub")),
        "private_key" => IO.read(File.expand_path("~/.ssh/id_rsa"))
      }
    }

    chef.add_recipe "build-essential"
    chef.add_recipe "python"
    chef.add_recipe "dotfiles"
  end
end

The recipes will be installed in the same order as they are listed here. The chef.json call passes along a data bag. This is used to provide the target user (Chef is run as root, and I probably won’t always want to be using my dotfiles cookbook like this) and to pass along my own public/private key pair², by using a quick one-liner to suck up the file and pass it along as a string.

In the dotfiles cookbook, default.rb looks like this:

# tools
%w(git vim vim-scripts tmux).each do |pkg|
  package pkg
end

home_dir = "/home/#{node['dotfiles']['user']}"

# setup ssh keys
file "#{home_dir}/.ssh/id_rsa" do
  owner node['dotfiles']['user']
  group node['dotfiles']['group']
  mode "0600"
  content node['dotfiles']['private_key']
  action :create
end

file "#{home_dir}/.ssh/id_rsa.pub" do
  owner node['dotfiles']['user']
  group node['dotfiles']['group']
  mode "0600"
  content node['dotfiles']['public_key']
  action :create
end

# sync dotfiles
git "#{home_dir}/dotfiles" do
  repository "git://github.com/nickcharlton/dotfiles.git"
  reference "master"
  enable_submodules true
  user node['dotfiles']['user']
  group node['dotfiles']['group']
  action :checkout
end

# setup dotfiles
bash "setup_dotfiles" do
  cwd "#{home_dir}/dotfiles"
  user node['dotfiles']['user']
  group node['dotfiles']['group']
  environment "HOME" => home_dir
  code "./setup.sh"
end

You can use a Ruby whitespace array (the %w() bit) to iterate around a list of packages, and pass that to the package method. This works fine for me, as I’m only going to use Debian/Ubuntu. Then, we create the ssh key files, by using the values from the data bag.

For the last bit, we clone a copy of my dotfiles repo (including it’s multiple submodules) and run the setup script. The environment variable of $HOME specifies where the user’s home directory, so we override this so that the script can handle being run by root.

The process of setting up a new environment is now a matter of cloning the “test-environment” repository, then running vagrant up. On my 2009 MacBook Pro, on a crappy DSL connection (there’s a few packages to grab) this takes just under 5 minutes, but without grabbing the packages, the Chef run only takes 30 seconds of that. With a good connection (or a local apt-cache) and something a bit faster than this machine you could cut that down quite a bit.