Category Archives: Automated Testing

RSpec mocks and stubs in plain English

One of the most common questions I see from beginners to Rails testing is what mocks and stubs are and when to use them. If you’re confused about mocks and stubs, you’re not alone. In my experience very few people understand them. In this post I’ll attempt to help clarify the matter, particularly in the context of Rails/RSpec.

I want to preface my explanation with a disclaimer. This post is partly derived from personal experience but it’s mostly derived from me just reading a bunch of posts online and reading a few books to try to understand mocks and stubs better. It’s entirely possible that what follows is not 100% accurate. But I’m risking the inaccuracy because I think the internet needs a clear, simple explanation of mocks and stubs and so far I haven’t been able to find one.

Here’s what we’re going to cover in this post:

  • The problem with testing terminology
  • Test doubles
  • Stub explanation
  • Mock explanation
  • The difference between mocks and stubs
  • Example application code
  • Why I don’t often use mocks or stubs in Rails

The problem with testing terminology

The field of automated testing has about a million different terms and there’s not a complete consensus on what they all mean. For example, are end-to-end tests and acceptance tests the same thing? Some people would say yes, some would say no, and there’s no central authority to say who’s right and who’s wrong. That’s a problem with testing terminology in general.

A problem with mocks and stubs in particular is that programmers are often sloppy with the language. People say mock when they mean stub and vice versa.

The result of these two issues is that many explanations of mocks and stubs are very very very confusing.

Test doubles

I had a lightbulb moment when I read in Gerard Meszaros’ xUnit Test Patterns that mocks and stubs are each special types of test doubles.

To me this was a valuable piece of truth. Mocks and stubs are both types of test doubles. I can understand that. That’s a piece of knowledge that’s not likely to be invalidated by something else I’ll read later. My understanding has permanently advanced a little bit.

What’s a test double? The book xUnit Test Patterns (which I understand actually coined the term “test double”) likens a test double to a Hollywood stunt double. From the book’s online explanation of test doubles:

When the movie industry wants to film something that is potentially risky or dangerous for the leading actor to carry out, they hire a “stunt double” to take the place of the actor in the scene. The stunt double is a highly trained individual who is capable of meeting the specific requirements of the scene. They may not be able to act, but they know how to fall from great heights, crash a car, or whatever the scene calls for. How closely the stunt double needs to resemble the actor depends on the nature of the scene. Usually, things can be arranged such that someone who vaguely resembles the actor in stature can take their place.

The example I use later in this post is the example of a payment gateway. When we’re testing some code that interacts with a payment gateway, we of course don’t want our test code to actually hit e.g. the production Stripe API and charge people’s credit cards. We want to use a test double in place of the real payment gateway. The production Stripe API is like Tom Cruise in Mission Impossible, the payment gateway test double is of course like the stunt double.

Stub explanation

The best explanation of mocks and stubs I’ve been able to find online is this post by a guy named Michal Lipski.

I took his explanation, mixed it in my brain with other stuff I’ve read, and came up with this explanation:

A Test Stub is a fake object that’s used in place of a real object for the purpose of getting the program to behave the way we need it to in order to test it. A big part of a Test Stub’s job is to return pre-specified hard-coded values in response to method calls.

You can visit the xUnit Patterns Test Stub page for a more detailed and precise explanation. My explanation might not be 100% on the mark. I’m going for clarity over complete precision.

When would you want to use a test stub? We’ll see in my example code shortly.

Mock explanation

A Mock Object is a fake object that’s used in place of a real object for the purpose of listening to the methods called on the mock object. The main job of a Mock Object is to ensure that the right methods get called on it.

Again, for a more precise (but harder to understand) explanation, you can check out the xUnit Patterns Mock Object page.

We’ll also see a mock object use case in my example code.

The difference between mocks and stubs

As I understand it, and to paint with a very broad brush, Test Stubs help with inputs and Mock Objects help with outputs. A Test Stub is a fake thing you stick in there to trick your program into working properly under test. A Mock Object is a fake thing you stick in there to spy on your program in the cases where you’re not able to test something directly.

Again, I’m going for conciseness and clarity over 100% accuracy here. Now let’s take a look at a concrete example.

Example application code

Below is an example Ruby program I wrote. I tried to write it to meet the following conditions:

  • It’s as small and simple as possible
  • It would actually benefit from the use of one mock and one stub

My program involves three classes:

  • Payment: meant to simulate an ActiveRecord model
  • PaymentGateway: simulates a third-party payment gateway (e.g. Stripe)
  • Logger: logs payments

The code snippet below includes the program’s three classes as well as a single test case for the Payment class.

class Payment
  attr_accessor :total_cents

  def initialize(payment_gateway, logger)
    @payment_gateway = payment_gateway
    @logger = logger
  end

  def save
    response = @payment_gateway.charge(total_cents)
    @logger.record_payment(response[:payment_id])
  end
end

class PaymentGateway
  def charge(total_cents)
    puts "THIS HITS THE PRODUCTION API AND ALTERS PRODUCTION DATA. THAT'S BAD!"

    { payment_id: rand(1000) }
  end
end

class Logger
  def record_payment(payment_id)
    puts "Payment id: #{payment_id}"
  end
end

describe Payment do
  it 'records the payment' do
    payment_gateway = PaymentGateway.new
    logger = Logger.new

    payment = Payment.new(payment_gateway, logger)
    payment.total_cents = 1800
    payment.save
  end
end

Our test has two problems.

One is that it hits the real production API and alters production data. (My code doesn’t really do this; you can pretend that my puts statement hits a real payment gateway and charges somebody’s credit card.)

The other problem is that the test doesn’t verify that the payment gets logged. We could comment out the @logger.record_payment(response[:payment_id]) line and the test would still pass.

This is what we see when we run the test:

THIS HITS THE PRODUCTION API AND ALTERS PRODUCTION DATA. THAT'S BAD!
Payment id: 302
.

Finished in 0.00255 seconds (files took 0.09594 seconds to load)
1 example, 0 failures

Let’s first address the problem of altering production data. We can do this by replacing PaymentGateway with a special kind of test double, a stub.

Stub example

Below I’ve replaced payment_gateway = PaymentGateway.new with payment_gateway = double(). I’m also telling my new Test Double object (that is, my Test Stub) that it should expect to receive a charge method call, and when it does, return a payment id of 1234.

class Payment
  attr_accessor :total_cents

  def initialize(payment_gateway, logger)
    @payment_gateway = payment_gateway
    @logger = logger
  end

  def save
    response = @payment_gateway.charge(total_cents)
    @logger.record_payment(response[:payment_id])
  end
end

class PaymentGateway
  def charge(total_cents)
    puts "THIS HITS THE PRODUCTION API AND ALTERS PRODUCTION DATA. THAT'S BAD!"

    { payment_id: rand(1000) }
  end
end

class Logger
  def record_payment(payment_id)
    puts "Payment id: #{payment_id}"
  end
end

describe Payment do
  it 'records the payment' do
    payment_gateway = double()
    allow(payment_gateway).to receive(:charge).and_return(payment_id: 1234)

    logger = Logger.new

    payment = Payment.new(payment_gateway, logger)
    payment.total_cents = 1800
    payment.save
  end
end

If we run this test now, we can see that we no longer get the jarring “THIS HITS THE PRODUCTION API” message. This is because our test no longer calls the charge method on an instance of PaymentGateway, it calls the charge method on a test double.

Our payment_gateway variable is no longer actually an instance of PaymentGateway, it’s an instance of RSpec::Mocks::Double.

Payment id: 1234
.

Finished in 0.00877 seconds (files took 0.09877 seconds to load)
1 example, 0 failures

That takes care of the hitting-production problem. What about verifying the logging of the payment? This is a job for a different kind of test double, a mock object (or just mock).

Mock example

Now let’s replace Logger.new with logger = double(). Notice how RSpec doesn’t make a distinction between mocks and stubs. They’re all just Test Doubles. If we want to use a Test Double as a mock or as a stub, RSpec leaves that up to us and doesn’t care.

We’re also telling our new Mock Object that it must receive a record_payment method call with the value 1234 (not just that it can, but that it has to, and the test will fail if it doesn’t).

class Payment
  attr_accessor :total_cents

  def initialize(payment_gateway, logger)
    @payment_gateway = payment_gateway
    @logger = logger
  end

  def save
    response = @payment_gateway.charge(total_cents)
    @logger.record_payment(response[:payment_id])
  end
end

class PaymentGateway
  def charge(total_cents)
    puts "THIS HITS THE PRODUCTION API AND ALTERS PRODUCTION DATA. THAT'S BAD!"

    { payment_id: rand(1000) }
  end
end

class Logger
  def record_payment(payment_id)
    puts "Payment id: #{payment_id}"
  end
end

describe Payment do
  it 'records the payment' do
    payment_gateway = double()
    allow(payment_gateway).to receive(:charge).and_return(payment_id: 1234)

    logger = double()
    expect(logger).to receive(:record_payment).with(1234)

    payment = Payment.new(payment_gateway, logger)
    payment.total_cents = 1800
    payment.save
  end
end

Now that we have the line that says expect(logger).to receive(:record_payment).with(1234), our test is asserting that the payment gets logged. We can verify this by commenting out the @logger.record_payment(response[:payment_id]) line and running our test again. We get the following error:

Failures:

  1) Payment records the payment
     Failure/Error: expect(logger).to receive(:record_payment).with(1234)
     
       (Double (anonymous)).record_payment(1234)
           expected: 1 time with arguments: (1234)
           received: 0 times

Why I don’t often use mocks or stubs in Rails

Having said all this, I personally hardly ever use mocks or stubs in Rails. I can count on one hand all the times I’ve used mocks or stubs over my eight years so far of doing Rails.

The main reason is that I just don’t often have the problems that test doubles solve. In Rails we don’t really do true unit tests. For better or worse, I’ve never encountered a Rails developer, myself included, who truly wants to test an object completely in isolation from all other objects. Instead we write model tests that hit the database and have full access to every single other object in the application. Whether this is good or bad can be debated but one thing seems clear to me: this style of testing eliminates the need for test doubles the vast majority of the time.

Another reason is that I find that tests written using test doubles are often basically just a restatement of the implementation of whatever’s being tested. The code says @logger.record_payment and the test says expect(logger).to receive(:record_payment). Okay, what did we really accomplish? We’re not testing the result, we’re testing the implementation. That’s probably better than nothing but if possible I’d rather just test the result, and quite often there’s nothing stopping me from testing the result instead of the implementation.
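
To make this concrete, here’s a sketch of what a more result-focused version of the logging test could look like. It uses a hand-rolled fake logger that records payment ids in memory (the InMemoryLogger class is my own invention for illustration, not part of the example above), so the test can assert on what got logged rather than on which methods got called:

class InMemoryLogger
  attr_reader :payment_ids

  def initialize
    @payment_ids = []
  end

  # Same interface as Logger, but records payments instead of printing them.
  def record_payment(payment_id)
    @payment_ids << payment_id
  end
end

describe Payment do
  it 'records the payment' do
    payment_gateway = double()
    allow(payment_gateway).to receive(:charge).and_return(payment_id: 1234)

    logger = InMemoryLogger.new

    payment = Payment.new(payment_gateway, logger)
    payment.total_cents = 1800
    payment.save

    # Assert on the result (what got logged), not on the implementation.
    expect(logger.payment_ids).to eq([1234])
  end
end

(A hand-rolled fake like this is itself just another kind of test double, but the assertion is about state rather than about method calls.)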

Lastly, I personally haven’t found myself working on a lot of projects that use some external resource like a payment gateway API that would make test doubles useful. If I were to work on such a project I imagine I certainly would make use of test doubles, I just haven’t worked on that type of project very much.

The difference between integration tests and controller tests in Rails

I recently came across a Reddit post asking about the difference between integration tests and controller tests. Some of the comments were interesting:

“I always write controller tests. I only write integration tests if there’s some JavaScript interacting with the controller or if the page is really fucking valuable.”

“In this case I would say it depends if you are building a full rails app (front-end and back-end) or only and API (back-end). For the first I would say an integration test should go through the front-end (which eventually calls the controllers). If you are doing an API, integration and controller tests would be the same.”

“Controller tests attempt to test the controller in isolation, and integration tests mimic a browser clicking through the app, i.e. they touch the entire stack. Honestly I only write integration tests. Controller tests are basically exactly the same thing but worse, and they are harder to write.”

“As far as I understand for the direction of Rails, controller tests will be going away and you’ll need to use integration tests.”

“From my experience integration tests are always harder to maintain and are much more fragile. Not trying to neglect it’s value but it comes with a price. I wonder how will people going to test api only rails apps if controller tests are gone?”

Which of these things are accurate and which are BS? I’ll do my best here to clarify. I’ll also add my own explanation of the difference between integration tests and controller tests.

My explanation of integration tests vs. controller tests

Before I even share my explanation I need to provide some context. There are two realities that make questions like “What’s the difference between integration tests and controller tests?” hard to answer.

Problem: terminology

First, there’s no consensus in the testing world on terminology. What one person calls an integration test might be what another person would call an end-to-end test or an acceptance test. No one can say whether any particular definition of a term is right or wrong because there’s no agreed-upon standard.

Problem: framework differences

Second, in what context are we talking about integration tests vs. controller tests? RSpec? MiniTest? Something else? The concepts of integration tests and controller tests map slightly differently onto each framework. Here’s how I’d put it:

General Testing Concept    Relevant MiniTest Concept    Relevant RSpec Concept
Integration Test           Integration Test             Feature Spec
End-to-End Test            System Test                  Feature Spec
Controller Test            Functional Test              Request Spec (and, previously, Controller Spec)

Yikes. In order to talk about the difference between integration tests and controller tests I needed to involve no fewer than eight different terms: integration test, end-to-end test, system test, feature spec, controller test, functional test, request spec and controller spec.

I could share a treatment of integration and controller tests that’s very precise and technically correct but first let me try, for clarity’s sake, to share a useful approximation to the truth.

My approximately-true explanation

If a Rails application can be thought of as a stack listed top-to-bottom as views-controllers-models, controller tests exercise controllers and everything “lower”. So controller tests test the “controllers-models” part of the “views-controllers-models” stack.

Integration tests test the views and everything “lower”. So in the views-controllers-models stack, integration tests test all three layers.

The messier truth

The messier truth is that since RSpec is (as far as I can tell) substantially more popular than MiniTest for commercial Rails applications, you, dear reader, are more likely to be an RSpec user than a MiniTest user.

So instead of “integration tests”, the term that’s applicable to you is feature specs. Luckily the definition is pretty much the same, though. Feature specs exercise the whole application stack (views-controllers-models).

When we start to talk about controller tests (or controller specs in RSpec terminology), things get a little confusing. In addition to the concept of controller specs, there now exists the concept of request specs. What’s the difference between the two? As far as I can tell, the main difference is that request specs simulate a real application environment more closely than controller specs do. For that reason, the RSpec core team recommends using request specs over controller specs. So the main thing to be aware of is that when someone says “controller tests”, the RSpec concept to map that to is “request specs”.

To sum up: in RSpec, it’s not integration tests vs. controller tests, it’s feature specs vs. request specs. (For the record, I’m not a big fan of RSpec’s somewhat arcane terminology.)

When to use integration tests vs. controller tests

Now that we’ve covered what these two things are, how do we know when to use them?

When I use controller tests/request specs

I’ll start with controller tests (or again in RSpec terminology, request specs). As I discussed in detail in a different article, I only use controller/request specs in two specific scenarios: 1) when I’m maintaining a legacy project and 2) when I’m maintaining an API-only application.

Why are these the only two scenarios in which I use request specs? Because when I’m doing greenfield development, I try very hard to make my controllers do almost nothing by themselves. I push all possible behavior into models. Legacy projects, however, often possess bloated controllers containing lots of code. I find tests useful in teasing apart those bloated controllers so I can refactor the code to mostly use models instead. The reason I use request specs when developing API-only applications is because integration tests/feature specs just aren’t possible. There’s no browser with which to interact.

When I use integration tests/feature specs

Practically every feature I write gets an integration test. For example, when I’m building CRUD functionality for a resource (let’s say a resource called `Customer`), I’ll write a feature spec for attempting to create a new customer record (using both valid and invalid inputs, checking for either success or failure), a feature spec for attempting to update a customer record, and perhaps a feature spec for deleting a customer record. Since feature specs exercise the whole application stack including controllers, I pretty much always find redundant the idea of writing both a feature spec and a request spec for a particular feature.
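
As a rough sketch, the “create a customer” feature spec I have in mind might look something like this (the paths, field labels and flash message are hypothetical):

# spec/features/create_customer_spec.rb
require 'rails_helper'

RSpec.describe 'creating a customer', type: :feature do
  it 'creates the customer with valid inputs' do
    visit new_customer_path

    fill_in 'Name', with: 'Jane Doe'
    click_on 'Create Customer'

    expect(page).to have_content('Customer was successfully created')
  end

  it 'shows a validation error with invalid inputs' do
    visit new_customer_path

    # Submit the form without filling anything in.
    click_on 'Create Customer'

    expect(page).to have_content("Name can't be blank")
  end
end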

Addressing the comments

Finally I want to address some of the comments I saw on the Reddit post I referenced at the beginning of this article because I don’t think all of them are accurate.

Page value

“I always write controller tests. I only write integration tests if there’s some JavaScript interacting with the controller or if the page is really fucking valuable.”

I personally don’t buy this idea. To me, one of the main benefits of having integration test/feature spec coverage is that I can automatically check the whole application for regressions every time I make a new commit. I hate it when a client of mine has to point out an error to me, even if it’s something trivial. I’d much rather have my code get tested by automated tests than by my client. (And yes, I get that tests can’t prove the absence of bugs and that my test suite won’t always catch all regressions, but I think something is a lot better than nothing.)

API-only applications

“In this case I would say it depends if you are building a full rails app (front-end and back-end) or only and API (back-end). For the first I would say an integration test should go through the front-end (which eventually calls the controllers). If you are doing an API, integration and controller tests would be the same.”

Let me address the last sentence first. Yes, if you’re writing an API-only application, the max you can test is “from the controller down”, so the idea of adding an integration test that tests an additional layer doesn’t apply.

Now let me address the first part of the comment. The idea behind the comment makes sense to me. I think what the comment author is saying is that if the application is a “traditional” Rails application, then an integration test would hit the front-end, the user-facing part of the application, and exercise all parts of the code from that starting point.

“The same thing but worse”

“Controller tests attempt to test the controller in isolation, and integration tests mimic a browser clicking through the app, i.e. they touch the entire stack. Honestly I only write integration tests. Controller tests are basically exactly the same thing but worse, and they are harder to write”

I believe I basically agree with this comment but I’d like to get a little more specific. I wouldn’t exactly agree that “controller tests are basically the exact same thing” as integration tests. As I said above, integration tests/feature specs test one additional layer of the application beyond controllers. There’s a lot in that extra layer. It makes a big difference.

As for the second part, “Controller tests are basically the same thing but worse,” it would be helpful to say why they’re “worse”. I would repeat what I said earlier in that controller tests/requests specs for me are usually redundant to any integration tests/feature specs I might have. The exceptions, again, are legacy projects and API-only applications.

Controller tests going away?

“As far as I understand for the direction of Rails, controller tests will be going away and you’ll need to use integration tests.”

Unless I’m mistaken I believe this comment is a little confused. As I mentioned earlier in this article, it’s true that the RSpec core team no longer recommends using controller specs. The recommended replacement though isn’t integration tests but request specs. However, I personally tend to favor integration tests/feature specs over controller specs/request specs anyway.

I’m not able to find any evidence that MiniTest controller tests are going away.

Integration tests fragile and harder to maintain?

“From my experience integration tests are always harder to maintain and are much more fragile. Not trying to neglect it’s value but it comes with a price. I wonder how will people going to test api only rails apps if controller tests are gone?”

I wouldn’t necessarily disagree. Out of the test types I use, I find integration tests/feature specs to be the most expensive to maintain. I would, however, suggest that the additional value integration tests/feature specs provide over controller tests/request specs more than makes up for their extra cost.

My Vim setup for Rails

Vim is my favorite editor because using Vim I can edit circles around anyone using Atom, Visual Studio, Sublime Text or any other non-keyboard-based editor. If “computers are a bicycle for the mind”, then using Vim is like shifting that bicycle into high gear. I like to shift that gear even higher using a few certain plugins:

vim-rspec lets me run my tests with a keystroke.

vim-rails takes advantage of the common structure of all Rails applications to let me navigate any Rails application super quickly.

ctrlp.vim allows me to quickly search for and open files (not earth-shattering or unique but of course very useful).

Testing private methods

A question that has befuddled many developers, including myself for a long time, is: how do I test private methods? Should I test private methods?

My opinion is yes, but not directly. I test the behavior of a class’s private methods indirectly through the class’s public methods. In other words, I test my private methods the exact same way my private methods are used by the rest of the application.

The reasoning behind my testing approach has to do with why private methods exist. In my mind the value of private methods is that since a private method is hidden from the outside world, I can feel free to refactor the private methods at any time and in any way, knowing that I’m not going to mess up any of the code that uses my class.

Here’s a concrete example of a class that makes what I think is appropriate use of private methods:

class TypeaheadTag < ActionView::Helpers::Tags::Base
  def render
    hidden_field + text_field_tag
  end

  private

  def hidden_field
    @template_object.hidden_field(
      @object_name,
      "#{@method_name}_id",
      class: "#{@method_name}_id"
    )
  end

  def text_field_tag
    @template_object.text_field_tag(
      @method_name,
      value,
      class: "#{@options[:class]} #{@method_name}_typeahead"
    )
  end
end

I wrote this class to DRY up Twitter Bootstrap typeahead components, which appeared in many places in an application I was developing. The existence of this class (along with other supporting code) allows me to spit out a typeahead field as succinctly as this:

<%= form.typeahead :person, class: 'form-control' %>

The exact manner in which this typeahead class conjures up its necessary components – a hidden field and a text field – is its own private business. No external party needs to know or care how these things happen. I should be free to refactor these methods, split them into several methods if desired, or combine them all into one, all without altering the class’s public interface one bit.

This principle is why I think it’s useful for tests not to have any knowledge of a class’s private methods. Again, it’s not that the behavior inside the private methods doesn’t get tested – it does. It just gets tested via the class’s public interface.
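
Here’s a minimal sketch of what that looks like in test form. The PriceFormatter class is made up for illustration; a test for the TypeaheadTag class above would follow the same principle, going only through render:

class PriceFormatter
  def initialize(total_cents)
    @total_cents = total_cents
  end

  def formatted_total
    "$#{dollars}"
  end

  private

  # Private helper; the test never calls this directly.
  def dollars
    format('%.2f', @total_cents / 100.0)
  end
end

describe PriceFormatter do
  it 'formats the total' do
    # The behavior of the private dollars method gets exercised,
    # but only via the public interface.
    expect(PriceFormatter.new(1800).formatted_total).to eq('$18.00')
  end
end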

How to run system specs headlessly or not headlessly at will

I used to prefer seeing my system specs run in the browser but lately I’ve been preferring to run them headlessly. It’s a little faster that way and I find it a little less disruptive.

Sometimes I still find myself wanting to see a certain test in the browser though. Seeing the test run in the browser can make diagnosis of any problems much easier.

The dream

What would be ideal is if I could do something like the following. To run a spec headlessly, I could do this:

$ rspec spec/system/create_customer_spec.rb

To see the same spec run in the browser, I could do this:

$ SHOW_BROWSER=true rspec spec/system/create_customer_spec.rb

Let’s take a look at how to turn that dream into reality.

The implementation

The desired functionality can be implemented by adding the following to spec/rails_helper.rb:

# spec/rails_helper.rb

RSpec.configure do |config|
  config.before(:each, type: :system) do
    driven_by ENV['SHOW_BROWSER'] ? :selenium_chrome : :selenium_chrome_headless
  end
end

Now, for all system specs, Capybara will use the :selenium_chrome_headless driver normally and the :selenium_chrome driver when SHOW_BROWSER=true is specified.

Thanks to Reddit user jrochkind for helping me improve this solution.

Test smell: Obscure Test

You’ve probably heard of the idea of a “code smell” – a hint that something in the code is not quite right and ought to be changed.

Just as there are code smells, there are “test smells”. The book xUnit Test Patterns describes a number of them.

One of the smells described in the book is Obscure Test. An Obscure Test is a test that has a lot of noise in it, noise that’s making it hard to discern what the test is actually doing.

Here’s an example of an Obscure Test I wrote myself:

context 'the element does not exist' do
  before do
    contents = %(
      <?xml version="1.0" encoding="UTF-8"?>
      <rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
        <channel>
          <item></item>
        </channel>
      </rss>
    )

    xml_doc = Nokogiri::XML(contents)
    episode_element = xml_doc.xpath('//item').first
    @rss_feed_episode = RSSFeedEpisode.new(episode_element)
  end

  it 'returns an empty string' do
    expect(@rss_feed_episode.content('title')).to eq('')
  end
end

There’s a lot of noise in the contents variable (e.g. <?xml version="1.0" encoding="UTF-8"?>). All that stuff is irrelevant to what the test is actually supposed to be testing. All this test should really care about is that we have an empty set of tags.

Here’s a refactored version of the same test:

context 'the element does not exist' do
  let(:rss_feed_episode) do 
    RSSFeedEpisodeTestFactory.create("<item></item>")
  end

  it 'returns an empty string' do
    expect(rss_feed_episode.content('title')).to eq('')
  end
end

Hopefully this is much more clear. The gory details of how to bring an RSS feed episode into existence are abstracted away into RSSFeedEpisodeTestFactory, a new class I created. Here’s what that class looks like:

class RSSFeedEpisodeTestFactory
  def self.create(inner_contents)
    @inner_contents = inner_contents
    rss_feed_episode
  end

  def self.rss_feed_episode
    RSSFeedEpisode.new(xml_doc.xpath('//item').first)
  end

  def self.xml_doc
    Nokogiri::XML(contents)
  end

  def self.contents
    %(
      <?xml version="1.0" encoding="UTF-8"?>
      <rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
        <channel>#{@inner_contents}</channel>
      </rss>
    )
  end
end

Now I can use this factory class wherever I like. It not only helps me keep my tests more understandable but also helps cut down on duplication.
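
The factory makes the happy-path case just as concise. For example (assuming, as the code above suggests, that content('title') returns the text of the title element):

context 'the element exists' do
  let(:rss_feed_episode) do
    RSSFeedEpisodeTestFactory.create('<item><title>Episode 1</title></item>')
  end

  it 'returns the element contents' do
    expect(rss_feed_episode.content('title')).to eq('Episode 1')
  end
end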

In the video below you can watch me refactor the “obscure” version into the more readable version as part of one of my free live Rails testing workshops.

Using tests as a tool to wrangle legacy projects

Legacy projects

In my career I’ve worked on my fair share of legacy projects. In fact I’d say that almost all the 50+ production projects I’ve worked on in my career have been legacy projects to some degree.

My experience is undoubtedly not a unique one. I would bet that most of most developers’ work has been on legacy projects. Due to Sturgeon’s law, I think it’s probably safe to assume that most code in the world is legacy code.

The challenges of working on a legacy project

Maintaining a legacy project is often a bummer. Many of the following things are often the case.

  • Changes take much longer than they should, perhaps by a factor of 10 or more.
  • Deployments are scary. They’re preceded by stress and followed by firefighting.
  • Due to the fragility of the system and frequent appearance of bugs and outages, trust in the development team is low.
  • The development team is always “behind” and under pressure to cut corners to meet deadlines.
  • Stakeholders are mostly not thrilled.
  • Due to these bad conditions, developer turnover is high, meaning the team has to spend time training new developers, leaving less time for development.

How do you reverse these challenges? The various problems associated with legacy projects have different causes which are not all solved by the same solutions. No single solution is a panacea, although some solutions go further than others.

One thing I’ve found to have a pretty good ROI is automated testing.

The ways automated tests can help legacy projects

Changes take less time

Why do changes take longer in legacy projects than in cleanly-coded ones?

The first reason is that in order to fix a bug or add a feature, you often have to have an understanding of the area of code you’re changing before you can make the change. If you don’t understand the existing code, you might not even know where to try to slot in your new code.

The second reason is that while any change in any codebase represents a certain amount of risk, the risk in legacy projects is amplified. Legacy project features are often a delicate Jenga tower, ready to come crashing down at the slightest touch.

Tests can address both of these problems.

Tests can aid a developer in understanding a piece of code in multiple ways. Earliest on, before any changes take place, characterization tests can reveal the previously mysterious behavior of a piece of code.

The idea with a characterization test is that you write a test for a certain method that you know will fail (e.g. `expect(customer.name).to eq('asdf')`), run the test, then change the test to expect the real value you saw (e.g. `expect(customer.name).to eq('Gern Blanston')`). Through this “reverse TDD” method, a picture of the code’s behavior eventually becomes clear.
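
In RSpec form, the workflow might look roughly like this (the Customer class and the values are hypothetical, echoing the example above):

RSpec.describe Customer do
  it 'documents what #name currently returns' do
    customer = Customer.find(1)

    # First run: assert a value we know is wrong and read the real value
    # from the failure message...
    # expect(customer.name).to eq('asdf')

    # ...then paste the real value in. The test now passes and pins down
    # the current behavior.
    expect(customer.name).to eq('Gern Blanston')
  end
end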

Later on, once a robust suite of tests has been built up around an area of functionality, tests can aid in the understandability of the code by enabling refactoring.

If my code is well-covered by tests, I can feel free to give descriptive names to obscurely-named variables, break large methods into smaller ones, break large classes into smaller ones, and make whatever other changes I want. The knowledge that my tests protect against regressions can allow me to make changes with a sufficient level of confidence that my changes probably aren’t breaking anything.

Of course, the scope of my refactorings should be proportionate to my level of confidence that my changes are safe. Grand, sweeping refactorings will pretty much always come back to bite me.

Deployments become less scary

Legacy projects are often figuratively and literally more expensive to deploy than well-done projects.

The story often goes like this: management observes that each time a deployment happens, bugs are introduced. Bugs frustrate users, erode users’ trust, incur support cost, and of course incur additional development cost.

Because the organization wants to minimize these problematic and expensive bugs, an extensive manual pre-deployment QA process is put in place, which of course costs money. Often there will be a “feature freeze” leading up to the deployment during the QA period, which of course incurs the opportunity cost of discontinuous developer productivity.

And because releases are so risky and expensive, management wants to do the risky and expensive thing less frequently. But this unfortunately has the opposite of the intended effect.

Because each deployment is bigger, there’s more stuff to go wrong, a greater “risk surface area”. It also takes more time and effort to find the root cause of any issue the deployment introduced because a) there’s more stuff to sort through when searching for the root cause and b) the root cause was potentially introduced a long time ago and so may well not be fresh in the developers’ minds.

Tests can help make deployments less risky in two ways. For one, tests tend to:

  • Help protect against regressions
  • Help protect against new bugs
  • Help improve code quality by enabling refactoring

All these things make the application less fragile, decreasing the likelihood that any particular deployment will introduce bugs.

Second, the presence of automated tests enables the practice of continuous deployment.

If an application can be tested to a reasonable degree of certainty by running an automated test suite that takes just a few minutes to run, then each developer can potentially deploy many times per day. Spread across the team, this might mean dozens or hundreds of deployments per day.

Contrast this to the alternative. If there is no automated test suite and each deployment requires the QA team to go through a manual QA process first, then that QA process is a bottleneck that prevents deployments from happening very frequently. Even if the QA team were technically capable of running through the whole manual test process daily to allow daily deployments, they would probably not want to spend all their time doing nothing but pre-deployment testing. And that would still only get you one deployment per day instead of the dozens or hundreds that continuous deployment would allow.

The obstacles to putting tests on a legacy project

Let’s say you’ve read the above and you say, “Great. I’m sold on adding tests to my testless legacy project. How do I start?”

Unfortunately it’s not usually simple or easy to add tests to a currently-untested legacy project. There are two main obstacles.

Missing test infrastructure

Writing tests requires a test infrastructure. In order to write a test, I need to be able to get the part of the application I’m testing (often called the “system under test” or SUT) into a state that allows me to exercise the code and make the assertions that tell me the code is working.

Getting a test framework and related testing tools installed and configured can be non-trivial. I don’t know of any way to make this easier, I just think it’s a good thing to be aware of.

I’ve worked in teams where the boss says, “Let’s make sure we write tests for all new features.” This seems like a sensible rule on the surface but unfortunately there’s no way this rule can be followed if there’s not already a testing infrastructure and testing strategy in place.

The development team and leadership have to agree on what testing tools they’re going to use, which itself can be a very non-trivial step. All the right people have to have buy-in on the testing tooling and testing strategy or else the team will experience a lot of drag as they try to add tests to the application. You have to pull the car onto the road before you hit the gas. If you hit the gas while you’re still in the garage, you’re just going to crash into the wall. So make sure to think about the big picture and get the right team members on board before you try to add tests to your legacy application.
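
For what it’s worth, on a Rails project the mechanical part of that setup often starts with something like the following (the gem choices here are common ones, not prescriptions):

# Gemfile
group :development, :test do
  gem 'rspec-rails'
  gem 'factory_bot_rails'
end

group :test do
  gem 'capybara'
end

# Then:
#   $ bundle install
#   $ rails generate rspec:install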

Dependencies and tight coupling

Sometimes it’s easy to get the SUT into the necessary state. More often in legacy projects, it’s very difficult. Sometimes it’s practically impossible. The reason for the difficulty is often tight coupling between dependencies.

Here’s what I mean by tight coupling. Let’s say we want to test class A which depends on classes B and C, which in turn depend on classes D, E, F and G. This means we have to instantiate seven objects (instances of A, B, C, D, E, F and G) just to test class A. If the dependency chain is long enough, the test may be so painful to write that the developer will decide in frustration to toss his laptop off a bridge and begin a new life in subsistence farming rather than write the test.

How to overcome the obstacles to adding tests to legacy code

Missing test infrastructure

If a project has no tests at all, you have two jobs before you:

  1. Write some tests
  2. Set up the testing infrastructure to make it possible to write tests

Both these things are hard. To the extent possible, I would want to “play on easy mode” when I’m first starting to put test coverage on the project.

So rather than trying to identify the most mission-critical parts of the application and putting tests on those, I would ask myself, “What would be easiest to test?” and write some tests for that area of the code.

For example, maybe the application I’m working on has a checkout page and a contact page. The checkout page is obviously more mission-critical to the business than the contact page. If the checkout page breaks there are of course immediate financial and other business repercussions. If the contact page breaks the consequences might be negligible. So a set of tests covering the checkout page would clearly be more valuable than a set of tests covering the contact page.

However, the business value of the set of tests I’m considering writing isn’t the only factor to take into consideration. If a set of tests for the checkout page would be highly valuable but the setup and infrastructure work necessary to get those tests into place is sufficiently time-consuming that the team is too daunted by the work to ever even get started, then that area of the code is probably not the best candidate for the application’s first tests.

By adding some trivial tests to a trivial piece of functionality, a beachhead can be established that makes the addition of later tests that much easier. It may be that in order to add a test for the contact page, it takes 15 minutes of work to write the test and four hours of work to put the testing infrastructure in place. Now, when I finally do turn my attention to writing tests for the checkout page, much of the plumbing work has already been done, and now my job is that much easier.

Dependencies and tight coupling

If a project was developed without testing in mind, the code will often involve a lot of tight coupling (objects that depend closely on other objects) which makes test setup difficult.

The solution to this problem is simple in concept – just change the tightly coupled objects to loosely coupled ones – but the execution of this solution is often not simple or easy.

The challenge with legacy projects is that you often don’t want to touch any of this mysterious code before it has some test coverage, but it’s impossible to add tests without touching the code first, so it’s a chicken-egg problem. (Credit goes to Michael Feathers’ Working Effectively with Legacy Code for pointing out this chicken-egg problem.)

So how can this chicken-egg problem be overcome? One technique that can be applied is the Sprout Method technique, described both in WEWLC and in Martin Fowler’s Refactoring.

Here’s an example of some obscure code that uses tight coupling:

require 'open-uri'
file = open('http://www.gutenberg.org/files/11/11-0.txt')
text = file.read
text.gsub!(/-/, ' ')
words = text.split
cwords = []
words.each do |w|
  w.gsub!(/[,\?\.‘’“”\:;!\(\)]/, '')
  cwords << w.downcase
end
words = cwords
words.sort!
whash = {}
words.each do |w|
  if whash[w].nil?
    whash[w] = 0
  end
  whash[w] += 1
end
whash = whash.sort_by { |k, v| v }.to_h
swords = words.sort_by do |el|
  el.length
end
lword = swords.last
whash.each do |k, v|
  puts "#{k.ljust(lword.length + 3 - v.to_s.length, '.')}#{v}"
end

If you looked at this code, I would forgive you for not understanding it at a glance.

One thing that makes this code problematic to test is that it’s tightly coupled to two dependencies: 1) the content of the file at http://www.gutenberg.org/files/11/11-0.txt and 2) stdout (the puts on the second-to-last line).

If I want to separate part of this code from its dependencies, maybe I could grab this chunk of the code:

text.gsub!(/-/, ' ')
words = text.split
cwords = []
words.each do |w|
  w.gsub!(/[,\?\.‘’“”\:;!\(\)]/, '')
  cwords << w.downcase
end
words = cwords
words.sort!
whash = {}
words.each do |w|
  if whash[w].nil?
    whash[w] = 0
  end
  whash[w] += 1
end
whash = whash.sort_by { |k, v| v }.to_h
swords = words.sort_by do |el|
  el.length
end
lword = swords.last

I can now take these lines and put them in their own new method:

require 'open-uri'

def count_words(text)
  text.gsub!(/-/, ' ')
  words = text.split
  cwords = []
  words.each do |w|
    w.gsub!(/[,\?\.‘’“”\:;!\(\)]/, '')
    cwords << w.downcase
  end
  words = cwords
  words.sort!
  whash = {}
  words.each do |w|
    if whash[w].nil?
      whash[w] = 0
    end
    whash[w] += 1
  end
  whash = whash.sort_by { |k, v| v }.to_h
  swords = words.sort_by do |el|
    el.length
  end
  lword = swords.last
  [whash, lword]
end

file = open('http://www.gutenberg.org/files/11/11-0.txt')
whash, lword = count_words(file.read)

whash.each do |k, v|
  puts "#{k.ljust(lword.length + 3 - v.to_s.length, '.')}#{v}"
end

I might still not understand the code but at least now I can start to put tests on it. Whereas before the program would only operate on the contents of http://www.gutenberg.org/files/11/11-0.txt and output the results to the screen, now I can feed in whatever content I like and have the results available in the return value of the method. (If you’d like to see another example of the Sprout Method technique, I wrote a post about it here.)

In addition to the Sprout Method technique I’ve also brought another concept into the picture which I’ll describe now.

Dependency injection

Dependency injection (DI) is a fancy term for a simple concept: passing an object’s dependencies in from the outside, for example as constructor arguments that get stored in instance variables, rather than having the object create them itself (or, at the method level, passing a method’s dependencies in as arguments).

I applied DI in my above example when I defined a count_words method that takes a text argument. Instead of the method being responsible for knowing about the content it parses, the method will now happily parse whatever string we give it, not caring if it comes from open('http://www.gutenberg.org/files/11/11-0.txt') or a hard-coded string like 'please please please let me get what i want'.

This gives us a new capability: now, instead of being confined to testing the content of Alice’s Adventures in Wonderland, we can write a test that feeds in the extremely plain piece of content 'please please please let me get what i want' and assert that the resulting count of 'please' is 3. We can of course feed in other pieces of content, like 'hello-hello hello, hello' to ensure the proper behavior under other conditions (in this case, the condition when “weird” characters are present).
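
Here’s a sketch of the kind of test that’s now possible. It assumes the count_words method above has been extracted into a file called word_counter.rb (the filename is made up):

require_relative 'word_counter'

describe 'count_words' do
  it 'counts repeated words' do
    whash, _lword = count_words('please please please let me get what i want')
    expect(whash['please']).to eq(3)
  end

  it 'handles hyphens and punctuation' do
    whash, _lword = count_words('hello-hello hello, hello')
    expect(whash['hello']).to eq(4)
  end
end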

If you’d like to see an object-oriented example of applying dependency injection to our original piece of legacy code, here’s how I might do that:

require 'open-uri'

class Document
  def initialize(text)
    @text = text
  end

  def count_words
    @text.gsub!(/-/, ' ')
    words = @text.split
    cwords = []
    words.each do |w|
      w.gsub!(/[,\?\.‘’“”\:;!\(\)]/, '')
      cwords << w.downcase
    end
    words = cwords
    words.sort!
    whash = {}
    words.each do |w|
      if whash[w].nil?
        whash[w] = 0
      end
      whash[w] += 1
    end
    whash = whash.sort_by { |k, v| v }.to_h
    swords = words.sort_by do |el|
      el.length
    end
    lword = swords.last
    [whash, lword]
  end
end

file = open('http://www.gutenberg.org/files/11/11-0.txt')
document = Document.new(file.read)
whash, lword = document.count_words

whash.each do |k, v|
  puts "#{k.ljust(lword.length + 3 - v.to_s.length, '.')}#{v}"
end

Conclusion

Hopefully this post has armed you with a few useful techniques to help you get your legacy project under control and start turning things around.

“We don’t have time to write tests”

I wish I had a dollar for every time I heard some variation of “we would have written tests for this but we didn’t have time.” Let’s unpack the meaning of this statement.

The hidden (fallacious) beliefs behind “no time for tests”

If I say I didn’t write tests due to lack of time, I must view tests as something “extra”. Tests are an extra thing that take up time but aren’t strictly necessary. The real work of course is the feature development itself, writing application code. If I write tests for my code I can finish in 10 hours but if I skip the tests I can finish in 6 hours. So by excluding tests, I’m making the responsible choice for the business and focusing only on what’s essential.

If I’m thinking this way, I’m fooling myself.

Tests save time in the short run

In my experience most developers believe that testing saves time in the long run but not in the short run. Tests are an investment you make: you slow things down today so that in six months things can be faster.

I don’t believe that this is an accurate view of reality. It’s true it often takes longer to write a feature plus tests than it would take to write the same feature without tests. For example, if I build a very simple form that saves a database record and that’s all there is to it, then skipping the test would save time (and writing the test would arguably not add huge value). But the second things get somewhat nuanced, tests start being a big time saver.

The reason is that everything needs to get tested, it’s just a question of whether that testing happens via automated tests or manual tests. Let’s say I’m working on an e-commerce application that has a coupon feature at checkout. Certain coupons can be stacked and certain ones can’t. Certain products are eligible for discounts and certain ones aren’t. There are a lot of possible paths through this feature. All the outcomes need to be tested and, every time the code gets changed, everything needs to be re-tested to check for regressions. Doing this manually is a huge pain in the ass. It’s easier, faster, more reliable and more enjoyable to write automated tests that check all this stuff for us. In this case we don’t have to wait six months before the tests start making our work faster. We experience the benefits of testing immediately.

How to convince yourself or your manager to let you write tests

When the pressure of a deadline is bearing down upon you and you feel like all the stakeholders are breathing down your neck, it can often be very difficult to resist the temptation to skip testing.

Sometimes the pressure to skip tests comes from leadership/management. But in my experience the choice to skip tests usually belongs to the developers.

When you find yourself in those situations here’s my advice. First ask, “Why do I feel tempted not to write tests?” For me personally, I feel tempted not to write tests when doing so would require me to interrupt my feeling of being “in the zone” so I can stop and puzzle over how to go about writing tests for the code I’m writing. I don’t want to stop the momentum, I want to move onto the next to-do on my list and demonstrate progress to my stakeholders. In that moment, stopping to write a test feels almost irresponsible. So in these situations I remind myself not to listen to my feelings but instead listen to my brain. I let my brain remind me that it’s okay to take the time to do my job the most effective way I know how to do it. I give myself permission to do a good job. That might sound silly but for me it actually works.

If the instruction to skip testing comes from management I think there are three options.

One option is to try to educate management that writing tests actually saves time, not just in the long term but in the short term. Unfortunately, experience tells me that this approach is usually not successful. If management commands the development team not to write tests, the root cause is usually an unfixable “we’re too busy sawing to sharpen the saw” mindset that can’t be reversed with any amount of reasoning. Stupid managers can’t be trained to be smart. But if management is in fact smart but uneducated on testing, there’s hope. Managers really do want their developers to follow good practices to ensure sustainable productivity. If your manager is smart, he or she will probably be on your side on the testing issue, provided the manager really understands what it’s all about.

Another option is just to say fuck it and write tests anyway, against orders. I personally have never been big on the idea of doing my work in a dumb way just because my boss told me to. Sometimes I get in trouble for it, sometimes I don’t. One time I got fired for my uncooperativeness. Usually I don’t get in trouble though and everything works out fine. I will say, by the way, that I don’t think I’ve ever been ordered not to write tests. I have been ordered to do other dumb things though, like write code in isolation for three months before deploying to production. This leads me to option three.

If you’re continually ordered by management to skip tests, a very real option is just to leave and get a different job.

But no matter what your scenario, don’t fall into the fallacious belief that skipping tests saves time. It usually doesn’t.

Rails scaffolding and TDD are incompatible (but that’s okay)

Testing + TDD = a serious learning curve

Learning Rails testing is pretty hard. There are a lot of principles and tools to learn. Getting comfortable with testing in Rails (or any framework) often takes developers years.

Compounding the difficulty is TDD. If I’m just starting out with testing, should I learn TDD? Writing tests at all is hard enough. How am I supposed to write a test first?

And if the “testing + TDD” combo doesn’t generate enough cognitive turmoil, “testing + TDD + scaffolding” makes the scenario even murkier.

Scaffolding and TDD

In my experience most Rails developers take advantage of scaffolding to generate CRUD interfaces really quickly. Scaffolding is of course awesome and a big part of what makes Rails Rails.

Where things can get confusing is when you mix the ideas of “I want to use scaffolding to build an application quickly” and “I want to TDD my Rails app”. You cannot do both at the same time.

It seems obvious to me now but it took me a long time to realize it. In TDD, I write the tests before writing the code. Scaffolding generates code but not tests*, the exact opposite order. So I can’t apply TDD to scaffold-generated code. Test-after is my only option.

*It’s true that an application can be configured to automatically generate the shells of tests whenever a scaffold is generated, but they’re just shells of tests and they don’t actually test anything.

What to do about this TDD-scaffolding incompatibility

So, scaffolding and TDD are incompatible. What do we do about this?

My answer is nothing. The fact that scaffold-generated code can’t be TDD’d isn’t a bad thing, it’s just a fact. I’m making a conscious trade-off: in exchange for getting a bunch of CRUD code for free, I’m trading away the benefits of TDD in that particular area. I think on balance I come out ahead.

This fact does mean that I have to exercise a little more discipline when I’m working with scaffold-generated code. TDD automatically results in good code coverage because, if I follow the methodology to the letter (which BTW I don’t always), I never write a line of code without writing a test first. The rhythm of the red-green-refactor loop means that I don’t really have to apply much discipline. I just put one foot in front of the other according to the rules of TDD and when the dust settles I have good test coverage.

But when doing test-after, I need to resist the temptation to say “Hey, this code works. Do I really need to write a test? I’ll just move onto the next thing.”

How I write tests for scaffold-generated code

After I generate a scaffold I’ll usually follow something like the following process.

  1. Write model specs for attribute validation
  2. Write a feature spec for the “happy path” of creating the resource
  3. Write a feature spec for the “happy path” of updating the resource
  4. Look for anything that would be tedious to regression-test manually and write a feature spec for that

That’s about all I worry about for scaffold-generated code. If during the course of development a stupid bug pops up that a test would easily have prevented, I’ll backfill that code path with a test. And if I need to extend the functionality provided by the scaffold (in other words, if I need to write new code manually), I’ll TDD that, because at that point there’s nothing stopping me from doing so.
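
As an illustration of step 1, here’s roughly what a post-scaffold model spec might look like for a hypothetical Customer resource (the attributes are made up, and the validation matchers come from the shoulda-matchers gem). Steps 2 through 4 would be feature specs along the lines discussed earlier:

# spec/models/customer_spec.rb
require 'rails_helper'

RSpec.describe Customer, type: :model do
  # Pin down the validations the scaffolded model is supposed to enforce.
  it { is_expected.to validate_presence_of(:name) }
  it { is_expected.to validate_presence_of(:email) }
end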

Avoiding Network Calls In Rails Tests Without Using Mocks, Stubs or VCR

An important principle of testing is that tests should be deterministic. The passing or failing of a test shouldn’t depend on the date it was run, whether certain other tests ran before it, or whether some condition outside the application changed.

If a test hits an external URL then it’s susceptible to being non-deterministic. I’ll give you an example from something I was working on today.

Today I wrote a test that said, “When I save a new podcast to the database, all the episodes listed in that podcast’s XML feed should get saved to the database.” I pointed my code at the XML feed URL for the Tim Ferriss Show and expected that when I saved the Tim Ferriss Show, 349 episodes would go into the database.

You can probably sense the problem with this test. What happens when I run this test next week when Tim Ferriss has 350 episodes? The test will fail even though the code will still work.

The solution to this problem was to change the test so it doesn’t hit the URL. But I didn’t use VCR or mocks or stubs. I used dependency injection.

Basically I changed the level of responsibility of the method that parses the XML feed. The original “agreement” was “I’ll give you an XML feed URL and you grab its contents from the internet and parse the file and save the episodes to the database”. I changed that agreement to “I’ll give you the contents of an XML file (which could have come from the internet or from the filesystem) and you parse it and save the episodes to the database”.

Here’s what my parsing method looks like:

def save_and_parse!(xml_feed_contents)
  save!

  PodcastRSSFile.new(
    show: self,
    contents: xml_feed_contents
  ).consume!
end

This method (which you can see in context on GitHub here) doesn’t know or care where the XML feed contents came from.

This means that in production I can pass XML feed contents that came from the internet, and in test I can pass XML feed contents that came from the filesystem.
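
Here’s a sketch of what the test side of this looks like (the Podcast class, the fixture path and the way episodes are associated are illustrative assumptions, not the actual code from the project):

RSpec.describe Podcast do
  it 'saves an episode for each item in the feed' do
    # The fixture is a copy of the feed saved to the filesystem, so the
    # test never touches the network and the episode count never changes.
    feed_contents = File.read('spec/fixtures/tim_ferriss_show.xml')

    podcast = Podcast.new(name: 'The Tim Ferriss Show')
    podcast.save_and_parse!(feed_contents)

    expect(podcast.episodes.count).to eq(349)
  end
end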

Does this mean that the actual act of downloading the XML feed content from the internet is untested? Yes. That’s a downside, but I think it’s a small one and I prefer that downside over the downside of having non-deterministic tests.

By the way, I recorded myself writing this test as part of one of my free Rails testing workshops. You can see the recording on YouTube here.