nickcharlton.net


Ruby Subprocesses with stdout and stderr Streams

I’ve been doing a few things with Ruby which involve controlling and responding to long-running processes, where the Ruby-based ‘wrapper’ takes the task of automating something which is otherwise quite complex. Perhaps the best example is boxes, which uses a collection of Rake tasks to generate Vagrant boxes using Packer –– each build takes somewhere in the region of twenty minutes to complete.

But, I wanted to be able to more closely control the output (hiding much of it from view) and react to events like build failures, which wasn’t possible by using system(). This needed to handle the output line by line as it arrived without blocking (treating it as a stream), handle stdout and stderr independently, and allow me to collect all of the output from a subprocess (to provide as a sort-of stack trace).
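As an aside, when you don’t need to stream the output at all, Open3 can collect both streams in one call. This sketch (using echo as a stand-in command) shows that simpler approach, and why it doesn’t fit here: it blocks until the process exits.

```ruby
require 'open3'

# capture3 blocks until the command finishes, then hands back everything
# at once: no good for reacting to a twenty-minute Packer build.
stdout, stderr, status = Open3.capture3('echo hello')

puts stdout.inspect    # the collected stdout, "hello\n"
puts status.success?   # whether the process exited cleanly
```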

I went through many different solutions (using Open3.popen3, PTY and others), before coming across a hybrid solution in this Stack Overflow post, using popen3 and a separate thread for each output stream, which met most of my requirements.

This gave me a basic solution which looks like this:

require 'open3'

cmd = './packer_mock.sh'
data = {:out => [], :err => []}

# see: http://stackoverflow.com/a/1162850/83386
Open3.popen3(cmd) do |stdin, stdout, stderr, thread|
  # read each stream from a new thread
  { :out => stdout, :err => stderr }.each do |key, stream|
    Thread.new do
      until (raw_line = stream.gets).nil? do
        parsed_line = { :timestamp => Time.now, :line => raw_line }
        # append new lines
        data[key].push parsed_line
        
        puts "#{key}: #{parsed_line}"
      end
    end
  end

  thread.join # don't exit until the external process is done
end

The cmd variable specifies the command that will be run. This is just a shell script which prints a multitude of characters for testing. The data Hash defines the final data structure: two Arrays, one each for stdout and stderr.

The next interesting part creates a Thread for each of the stdout and stderr streams, so they can be handled separately. Inside each thread, the until block reads from the given stream, structures each line and stores it. I’m adding a Time object here to aid my presentation of it later.

The puts call would be replaced by a conditional depending on the amount of verbosity the user desired. Finally, thread.join waits until the external process has finished executing.

This, then, allows me to continue handling long processes as a stream, whilst handling each line individually. But the interface is a little awkward to use. Providing a simpler interface taking a single block could simplify this, something like:

Utils::Subprocess.new './packer_mock.sh' do |stdout, stderr, thread|
  puts "stdout: #{stdout}" # => "simple output"
  puts "stderr: #{stderr}" # => "error: an error happened"
  puts "pid: #{thread.pid}" # => 12345
end

Which could be implemented with this rather impressively nested bit of code:

require 'open3'

module Utils
  class Subprocess
    def initialize(cmd, &block)
      # see: http://stackoverflow.com/a/1162850/83386
      Open3.popen3(cmd) do |stdin, stdout, stderr, thread|
        # read each stream from a new thread
        { :out => stdout, :err => stderr }.each do |key, stream|
          Thread.new do
            until (line = stream.gets).nil? do
              # yield the block depending on the stream
              if key == :out
                yield line, nil, thread if block_given?
              else
                yield nil, line, thread if block_given?
              end
            end
          end
        end

        thread.join # don't exit until the external process is done
      end
    end
  end
end

Unlike the first approach, this just passes each line back as a string, which is nil if there’s no value for that stream. The final argument to the block is the thread the subprocess is run as; thread.pid will give the PID.
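That thread is popen3’s wait thread: as well as pid, it exposes value, the Process::Status of the finished subprocess. A small self-contained sketch, with echo standing in for a real command:

```ruby
require 'open3'

# popen3's last block argument is the wait thread for the subprocess;
# 'echo hi' stands in for a real long-running command here.
Open3.popen3('echo hi') do |stdin, stdout, stderr, thread|
  pid    = thread.pid    # PID of the subprocess
  line   = stdout.gets   # first line of output, "hi\n"
  status = thread.value  # Process::Status, available once it exits
  puts "pid: #{pid}, line: #{line.inspect}, success: #{status.success?}"
end
```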

For now, this works pretty well for boxes and will allow me to throw it into something like Jenkins without a ridiculous amount of logs to parse to see which ones build successfully.

Annual Review 2013

Introduction

For at least the past three years, I’ve been reading Chris Guillebeau’s Annual Review series. He publishes a set of blog posts in December each year running through what he (and his business) did, what he thought of it and what he’d like to do in the upcoming year.

I’ve been doing something similar myself for about the same time (I also used to come up with a plan for summers back when I had long expanses of free-time; usually a collection of things I wished to learn), but I never published them. Much of the concept of the “Annual Review” is in doing it, but without publishing, it’s all too easy to forget you did it and there’s a certain feeling of accountability when you’ve published a block of self-criticism and plans for the year ahead. Especially when they all seem to pass by so quickly.

This year was my final year of University, so that’s a general trend throughout. The final push after several years of studying seems to crowd out many of the other things you want to do, and so the latter few months are much more interesting.

But anyway, let’s begin…

Projects

Nearly all of my projects (big and small) are something to do with programming, and most recently these seem to have refocused on Ruby. For much of University I spent my time using Python (especially when graphs were involved), but after some niggling annoyances (no closing of blocks, pip, time and date handling) I found myself pushed back over to Ruby.

Outside of Ruby, I’ve been spending time working with iOS, with a little bit of Mac development interspersed. The introduction of iOS 7 has made for some interesting new challenges.

I continue to maintain several projects which are hosted on GitHub and of these I’ve been able to impart a stronger focus on automated tests. Several of them have been configured with Travis CI, but so far many of them have poor coverage, especially the ones with a web component or the need for a complex environment. Next year, I expect to keep going down this path.

There’s still a hardware project or two which I’ve not gotten around to thinking much about since the end of University. The new year may (or may not) bring me back to them.

I’m quite pleased with all of these smaller projects and I get quite a lot of satisfaction out of building small well-tested libraries (before going away for Christmas, I also released keypath-ruby –– an approach for accessing nested Ruby collections).

One of the things I’ve been less pleased with is the amount of time I’ve spent working with servers and infrastructure. Whilst the work I’ve done with Packer has been going rather well (this brought about boxes), automating infrastructure using Chef has been a slow and quite painful process. I find it to be a frustrating tool to work with, but I’m also not hugely convinced by the alternatives.

Sadly, this has kept me from working on some of the things I’ve really wanted to like Predict the Sky, and a few others.

But overall, this is probably the most pleasing section. I’m pleased with both the odds and ends I’ve spent my time working on, and the bigger ones too.

Writing

This year, I published 16 blog posts, most of them in August. I also added a link/commentary section to the site and published 23 links to (mostly long) articles I’ve come across. Over the year, the number of visitors to this site has gone up quite markedly. It’s nice to be read.

After August, I seem to have come to a bit of a halt as I took on a much bigger project. But I do wish I’d been writing along the way; instead I seem to have lots of drafts.

So, as a plan, I aim to write and publish more. I’m already quite good at keeping a journal.

Reading

Books. I love them. But I don’t read as many as I’d like. I’ve set myself a plan to read 50 of them in 2014 and I’ll be interested to see how well I do.

I’m currently reading Morrissey’s Autobiography, which turned out to be a surprisingly gripping dive into his mind, from his early days in school, to a discovery of music and poetry to the Smiths, legal battles and his solo career. It’s a bit unfocused, but interesting. I’m most of the way through now.

I’ve also ended up with subscriptions to the New Statesman, GQ and The New Yorker, which regularly fill me with lots of different (and notably offline) things to read.

Travel

With my final year of University, I didn’t travel much at all this year (apart from between Plymouth and Surrey1 which doesn’t really count). I hope to fix this next year, with at least a trip or two to mainland Europe.

Studying

Graduating is a curious thing. Suddenly you are propelled forth into the world, expected to have some sort of plan. For most of us, this certainly isn’t the case.

But after four years of University (one was on placement), I’m now the owner of a Bachelors Degree in Computer Science.

Work

After taking a few months off to recover from University, I started freelancing in June. First with some smaller projects, but then, from a lucky bit of timing, picking up something much larger.

Next year I look forward to expanding the client base, and mixing up the projects a little. Some of this is already in motion.

Health & Fitness

University seemed to put this into quite a sharp decline. I used to cycle daily when I was on placement, but I stopped on moving back to Plymouth. Now, I don’t do anywhere near what I’d like to be doing.

I intend to start running in 2014. But I’ve said this (mostly to people) before. I’ll see how I go in a year.

Into 2014

2013 was interesting, and 2014 will mark my first complete year out of some sort of education system. I’m really looking forward to it.

I still seem to be on an academic calendar, so what most would consider some sort of “new year’s resolution” or some such has been in progress since September. Be it balancing work with other projects, learning German through Duolingo or one of a few other things.

But in 2014, I want to finish learning to drive and acquire a car, get running regularly (I would like to be capable of running a 5k), keep going on my desire to dress much better, and chip away at all of those other things I do which irritate me.

It’s going to be a fun year.


  1. My Mum’s.

Switching Season Report, 2013 Edition

Alex Payne looks at the alternatives to his current setup and draws some conclusions.

In his article, he’s put words to things I’ve so far been unable to describe to people myself:

“the Galaxy S4 is uninspired but good”

“does not feel like a premium product. It is plasticky and sort of embarrassing to carry around, though its cheapness does lend it a sense of devil-may-care durability.”

And, then:

Let’s not even speak of the system fonts, ugly but ignorable on a phone and downright offensive on a tablet. No, while much improved from several years ago, one does not use Android in 2013 for its looks.

On Linux as a desktop:

“Trying to compute like a normal with Linux is, after all these years, still an exercise in masochism.”

Yep.

I sometimes get to play around with Android devices (it’s interesting to see how they’re changing), and recently I acquired a ZTE Open to have a look at Firefox OS. I’m intending to write a review about that soon.

Alfred Workflow: Paste Cleanly

There is nothing more annoying than seeing, or ending up sharing, URLs that look like this (it’s split so it doesn’t look completely terrible…):

http://www.nytimes.com/2013/08/25/opinion/sunday/ \
im-thinking-please-be-quiet.html? \
ref=opinion&_r=3&utm_source=buffer&utm_campaign=Buffer
&utm_content=buffer8fcfe&utm_medium=twitter&

It’s not so much the efforts of marketing people to track how their URLs spread around the web that annoys me, but more that it’s so damn ugly and far too long. It also happens to be the case that I don’t particularly care about marketing people’s feelings when I remove them.

I’m already an avid user of Alfred, where I use one of the example workflows to allow me to use “Cmd + Shift + V” to paste as plain text. I figured the addition of a little “cleaning” script based on a regex would be a nice way to implement it, so I did1.

Jump to the end if you’re just looking for the workflow.

Regex and Test Pattern

The Google Analytics arguments all start with “utm_” (presumably standing for the original Urchin product name). This is quite easy to match:

(?i)(?:utm_+)[a-zA-Z]*=[a-zA-Z0-9]*(&)?

This searches for “utm_”, a collection of other characters until an =, then another collection of characters until either the end is reached, or it walks into an “&”. It also does this case insensitively.

This will correctly match/remove the offending string from all of the URLs below:

http://example.com/slug?utm_content=test
http://example.com/slug?another=yep&utm_content=test
http://example.com/slug?utm_content=test&required=true

I was originally also matching either a ? or & at the start and removing that, too. But, with the last example (which, actually, I haven’t seen in the wild), this could potentially break the URL. Instead, I’m checking for a lost ? on the end of the URL before passing it back.

And so, the workflow’s regex implementation:

input = '{query}'

# remove any matching Google tracking strings
input.gsub!(/(?i)(?:utm_+)[a-zA-Z]*=[a-zA-Z0-9]*(&)?/, '')

# if there's a trailing ?, remove it
if input.end_with? '?'
  input.chop!
end

# pass it back, without the newline
print input.strip
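Pulled out of Alfred, the same logic can be wrapped in a small method for testing at the command line. strip_utm is my name for it here, not part of the workflow itself:

```ruby
# The same cleaning logic as the workflow script, wrapped for reuse
UTM_PATTERN = /(?i)(?:utm_+)[a-zA-Z]*=[a-zA-Z0-9]*(&)?/

def strip_utm(url)
  cleaned = url.gsub(UTM_PATTERN, '')
  # if there's a trailing ?, remove it
  cleaned.chop! if cleaned.end_with?('?')
  cleaned.strip
end

puts strip_utm('http://example.com/slug?utm_content=test')
# => "http://example.com/slug"
puts strip_utm('http://example.com/slug?utm_content=test&required=true')
# => "http://example.com/slug?required=true"
```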

Alfred Workflow

The workflow is based on the “Paste as plain text from hotkey” example, which I already had mapped to “Cmd + Shift + V”. I’m just slotting in a “Run Script” action before it is pasted.

The Paste Cleanly Workflow.

You can download an archive of the workflow here.

And so, there we go. As I get offended by more web-based atrocities2, I’ll likely add to the script more.


  1. Something similar might already exist. It was quicker for me to build it myself, than search the forums to see if someone else had done it already. But, I suppose, this is rendered a bit moot now I’ve written a blog post on it.

  2. Yeah, I know. That is a bit strong.

Mocking Web Requests with VCR and MiniTest

I just released moviesapi. In the post I introduced it, I mentioned wanting to be able to add reliable tests. Ben Keeping responded suggesting that I have a look at VCR. So I did.

It’s a Ruby library that records the web requests your application depends upon and saves them to disk. On subsequent test runs, it reuses (“replays”) the previously saved data, vastly increasing the speed. For screenscraping tools like moviesapi or UrbanScraper, I can verify that my code is behaving correctly, even if the source has changed (that is another problem), and without constantly hitting the remote web service.

Tutorials covering both VCR and MiniTest were a little thin on the ground, so I thought I’d write one. As an introduction to MiniTest, I’d suggest Matt Sears’ Quick Reference post. I’d also suggest giving the VCR README at least a skim read.

The overall application I’m testing is a Sinatra one, but here, I’m more interested in testing the class that handles the screenscraping. Some of the MiniTest suggestions came from the Sinatra Recipes site.

Gemfile

Firstly, I added development and tests groups to my Gemfile, like so:

group :development, :test do
  gem 'minitest', '~> 5.0.6'
  gem 'webmock', '~> 1.12.0'
  gem 'vcr', '~> 2.5.0'
end

VCR is a high-level wrapper around a group of different web mocking libraries; webmock is just one of those supported. We’ll need that too.

MiniTest has actually been in the standard library since Ruby 1.9, but I’m keeping a reference for clarity (and for a slightly newer version).

Spec Structure

I’m writing specs here, so everything is in a directory named “spec”:

spec/
    cassettes/
    support/vcr_setup.rb
    spec_helper.rb
    movie_spec.rb

cassettes holds the recorded requests. vcr_setup.rb contains VCR configuration (it’s loaded by spec_helper.rb). spec_helper.rb sets up the tests and provides any common configuration. Finally, movie_spec.rb is the spec I’ll be running. It’s from moviesapi.

vcr_setup.rb

require 'vcr'

VCR.configure do |c|
  c.cassette_library_dir = 'spec/cassettes'
  c.hook_into :webmock
end

This specifies where to find the cassettes (we assume everything is run from the root of the project). Then it specifies which mocking library to use: in this case, webmock.
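The configure block takes further options too. As a sketch, here are a couple from VCR 2.x that I think are worth knowing about, though whether you want them depends on the project:

```ruby
require 'vcr'

VCR.configure do |c|
  c.cassette_library_dir = 'spec/cassettes'
  c.hook_into :webmock

  # record new requests into an existing cassette, rather than raising
  c.default_cassette_options = { :record => :new_episodes }

  # don't record or replay requests to a locally-running server
  c.ignore_localhost = true
end
```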

spec_helper.rb

This contains common code to all of the specs (or tests). It’s typically used to load in the application and run any common load configuration (like setting ENV to test) and is then included in each spec (or test) to make it available on test run.

require 'minitest/autorun'
require 'minitest/pride'

# pull in the VCR setup
require File.expand_path './support/vcr_setup.rb', __dir__

# pull in the code to test
require File.expand_path '../movies.rb', __dir__

Rakefile

Finally, these additions to the Rakefile will allow your tests to run according to typical Ruby conventions. They will also run the test suite as the default rake task:

require 'rake/testtask'

Rake::TestTask.new :spec do |t|
  t.test_files = Dir['spec/*_spec.rb']
end

task :default => :spec

All of these helper files have been slimmed down a little. You may find the ones in the repo more helpful. (Also, these will work with Travis CI.)

Writing Specs that use VCR

A typical MiniTest spec looks a bit like this:

describe 'Something' do
  before do
    # something that should be done before a test starts
  end

  after do
    # something that should be done after a test ends
  end

  it 'does something' do
    # test things
  end
end

The MiniTest DSL provides several blocks that make up the spec. The describe block defines the behaviour you are specifying. The it block defines the test case. Then, inside here, “matchers” are used to confirm the output. MiniTest provides a reasonable collection of these in its docs, but you can also define your own1.
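As a small illustration of matchers in use, here is a hypothetical spec against plain String behaviour (note that newer MiniTest releases prefer wrapping the value under test in _(), where older code called the matcher on the object directly):

```ruby
require 'minitest/autorun'

describe 'String matchers' do
  it 'confirms equality with must_equal' do
    _('hello'.upcase).must_equal 'HELLO'
  end

  it 'confirms types with must_be_instance_of' do
    _('hello').must_be_instance_of String
  end
end
```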

When using VCR with MiniTest, there are two approaches to work with. The first is to manually specify a cassette to encapsulate the test run. This is described in the VCR Getting Started Guide and looks a bit like this:

VCR.use_cassette('cassette name') do
  # the test
end

The second approach is to use the before and after blocks along with some runtime metadata that MiniTest provides. That looks a bit like this:

describe 'Movies' do
  before do
    VCR.insert_cassette name
  end

  after do
    VCR.eject_cassette
  end

  it 'fetches a list of cinemas' do
    # the test
  end
end

before and after are executed around each it block. So here, name is “fetches a list of cinemas”. If you were to have multiple it blocks, cassettes would be defined for each. The cassettes are then saved in spec/cassettes, in this case it is: test_0001_fetches_a_list_of_cinemas.yml. This is quite a nice approach for having a cassette dynamically defined for each spec.

For testing the result of the screenscraping, I have been checking the contents of the eventual data structure. Unlike with a typical web service, I can’t check the exact data contained within. Similarly, the cassettes are committed to the repository because I am checking for the behavioural correctness of my code, not that the web service/site has changed and broken it. For testing the behaviour of code that depends upon this, mocking will fit perfectly.

MiniTest combines with VCR quite nicely, especially once you work out the conventions to follow for structuring the test suite. If nothing else, mocking out the web requests like this saves a significant amount of time when testing web service interaction.


  1. This gist by Jared Ning contains a good set of examples of defining your own.