
Common causes of flickering/flapping/flaky tests

A flapping test is a test that sometimes passes and sometimes fails even though the application code being tested hasn’t changed. Flapping tests can be hard to reproduce, diagnose, and fix. Knowing the common causes of flapping tests goes a long way toward diagnosis, and once a flapping test is diagnosed, the battle is half over. Here are some of the common causes I know of.

Race conditions

Let’s say you have a Capybara test that clicks a page element that fires off an AJAX request. The AJAX request completes, making some other page element clickable. Sometimes the AJAX request beats the test runner, meaning the test works fine. Sometimes the test runner beats the AJAX request, meaning the test breaks.

There’s a common race condition I’ve experienced in Capybara. A page will load, then Capybara will visit a subsequent page to verify some result. Unfortunately, the visit happens before the prior page gets a chance to fully do its thing, resulting in a failing test. An easy way to fix this is to add an expect(page).to have_content('some content') just before the visit. This way Capybara will wait for the first page to fully load before it tries to visit the second page.
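As a concrete sketch of this fix, here’s what the pattern looks like in a Capybara test. The paths and strings below are invented for illustration; the important part is the have_content expectation between the two steps, which makes Capybara retry (up to Capybara.default_max_wait_time) until the first page has finished before moving on:

```ruby
# All paths and content strings here are hypothetical.
visit new_order_path
click_button 'Place order'

# Without this expectation, the visit below can race the page above.
# have_content retries until the content appears or the wait time runs out.
expect(page).to have_content('Order placed')

visit orders_path
expect(page).to have_content('1 order')
```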

“Leaking” tests (non-deterministic test suite)

Sometimes a test will fail to clean up after itself and “leak” state into other tests. To take a contrived but obvious example, let’s say test A checks that the “list customers” page shows exactly 5 customers and test B verifies that a customer can be created successfully. If test B doesn’t clean up after itself, then running test B before test A will cause test A to fail because there are now 6 customers instead of 5. This exact scenario is almost never a problem in Rails (because each test runs inside a database transaction that gets rolled back), but database state isn’t the only kind of state a test can leak.
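To see the mechanics of a leak outside of any database, here’s a contrived plain-Ruby sketch (all names invented) where one “test” mutates class-level state and never restores it, so another “test” passes or fails depending on run order:

```ruby
# Contrived illustration of a non-database leak. Test B changes a
# class-level setting and never resets it, so test A's result depends
# on whether B ran first.
class AppConfig
  @@page_size = 5

  def self.page_size
    @@page_size
  end

  def self.page_size=(value)
    @@page_size = value
  end
end

def test_b_runs
  AppConfig.page_size = 10 # "leaks": never restored afterward
end

def test_a_passes?
  AppConfig.page_size == 5
end

results = []
results << test_a_passes? # A before B: passes
test_b_runs
results << test_a_passes? # A after B: fails
```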

When I’m fixing this sort of issue I like to a) determine an order of running the tests that always causes my flapping test to fail and then b) figure out what’s happening in the earlier test that makes the later test fail. In RSpec I like to take note of the seed number, then run rspec --seed <seed number> which will re-run my tests in the same order.
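The reason the seed reproduces the failure is that RSpec’s “random” ordering is a seeded shuffle, and a seeded shuffle is deterministic. A quick plain-Ruby illustration of that property:

```ruby
# Two shuffles with the same seed always produce the same order.
# RSpec does essentially this with the seed it prints after each run.
tests = %w[test_a test_b test_c test_d]

first_run  = tests.shuffle(random: Random.new(1234))
second_run = tests.shuffle(random: Random.new(1234))
```

This is why rspec --seed 1234 re-runs the suite in exactly the order that failed.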

Reliance on third-party code

Sometimes tests unexpectedly fail because some behavior of third-party code has changed. I actually sometimes consciously choose to live with this category of flapping test. If an API I rely on goes down and my tests fail, the tests arguably should fail because my application is genuinely broken. But if my tests are flapping in a way that’s not legitimate, I’d probably change my test to use something like VCR instead of hitting the real API.
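For flavor, here’s the core idea behind tools like VCR, hand-rolled in plain Ruby. This is not VCR’s real API, and all the names here are invented; it’s just a sketch of replaying a canned response instead of hitting the network:

```ruby
require 'json'

# The object under test depends on an injected client rather than
# making HTTP calls directly.
class WeatherReport
  def initialize(client)
    @client = client
  end

  def summary(city)
    data = JSON.parse(@client.get(city))
    "#{city}: #{data['temp']} degrees"
  end
end

# In production the client would hit a real API. In the test we
# substitute a stub that returns a recorded payload, the way a VCR
# cassette would.
class CannedClient
  def get(_city)
    '{"temp": 72}'
  end
end

report = WeatherReport.new(CannedClient.new)
```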

A Rails model test “hello world”

The following is an excerpt from my book, Rails Testing for Beginners.

What we’re going to do

What we’re going to do in this post is:

  1. Initialize a new Rails application
  2. Install RSpec using the rspec-rails gem
  3. Generate a User model
  4. Write a single test for that User model

The test we’ll write will be a trivial one. The user will have both a first and last name. On the User model we’ll define a full_name method that concatenates first and last name. Our test will verify that this method works properly. Let’s now begin the first step, initializing the Rails application.

Initializing the Rails application

Let’s initialize the application using the -T flag meaning “no test framework”. (We want RSpec in this case, not the Rails default of MiniTest.)

$ rails new model_test_hello_world -T
$ cd model_test_hello_world

Now we can install RSpec.

Installing RSpec

To install RSpec, all we have to do is add rspec-rails to our Gemfile and then do a bundle install.

# Gemfile

group :development, :test do
  gem 'rspec-rails'
end
$ bundle install

We’ve installed the RSpec gem but we still have to run rails g rspec:install which will create a couple configuration files for us.

$ rails g rspec:install

Now we can move on to creating the User model.

Creating the User model

Let’s create a User model with just two attributes: first_name and last_name.

$ rails g scaffold user first_name:string last_name:string
$ rails db:migrate

Now we can write our test.

The User test

Writing the test

Here’s the test code:

# spec/models/user_spec.rb

require 'rails_helper'

RSpec.describe User, type: :model do
  describe '#full_name' do
    it 'concatenates first and last name' do
      user = User.new(first_name: 'Abraham', last_name: 'Lincoln')
      expect(user.full_name).to eq('Abraham Lincoln')
    end
  end
end

Let’s take a closer look at this test code.

Examining the test

Let’s focus for a moment on this part of the test code:

it 'concatenates first and last name' do
  user = User.new(first_name: 'Abraham', last_name: 'Lincoln')
  expect(user.full_name).to eq('Abraham Lincoln')
end

This code is saying:

  1. Instantiate a new User object with first name Abraham and last name Lincoln.
  2. Verify that when we call this User object’s full_name method, it returns the full name, Abraham Lincoln.

What are the its and describes all about? Basically, these “it blocks” and “describe blocks” are there to allow us to put arbitrary labels on chunks of code to make the test more human-readable. (This is not precisely true but it’s true enough for our purposes for now.) As an illustration, I’ll drastically change the structure of the test:

# spec/models/user_spec.rb

require 'rails_helper'

RSpec.describe User, type: :model do
  it 'does stuff' do
    user = User.new(first_name: 'Abraham', last_name: 'Lincoln')
    expect(user.full_name).to eq('Abraham Lincoln')
  end
end

This test will pass just fine. Every it block needs to be nested inside a describe block, but RSpec doesn’t care how deeply nested our test cases are or whether the descriptions we put on our tests (e.g. the vague and mysterious it 'does stuff') make any sense.

Running the test

If we run this test it will fail. It fails because the full_name method is not yet defined.

F

Failures:

  1) User#full_name concatenates first and last name
     Failure/Error: expect(user.full_name).to eq('Abraham Lincoln')
     
     NoMethodError:
       undefined method `full_name' for #<User:0x00007ff73d419e20>
     # ./spec/models/user_spec.rb:7:in `block (3 levels) in <top (required)>'

Finished in 0.00695 seconds (files took 0.77932 seconds to load)
1 example, 1 failure

Failed examples:

rspec ./spec/models/user_spec.rb:5 # User#full_name concatenates first and last name

Let’s now write the full_name method.

# app/models/user.rb

class User < ApplicationRecord
  def full_name
    "#{first_name} #{last_name}"
  end
end

Now the test passes.

.                                              

Finished in 0.008 seconds (files took 0.80982 seconds to load)                                
1 example, 0 failures

Taming legacy Ruby code using the “Sprout Method” technique (example 2)

Note: this post is titled “example 2” because some time ago I wrote another Sprout Method example, but after I wrote that post I decided I could come up with a better example.

The Sprout Method technique

If a project was developed without testing in mind, the code will often involve a lot of tight coupling (objects that depend closely on other objects) which makes test setup difficult.

The solution to this problem is simple in concept – just change the tightly coupled objects to loosely coupled ones – but the execution of this solution is often not simple or easy.

The challenge with legacy projects is that you often don’t want to touch any of this mysterious code before it has some test coverage, but it’s impossible to add tests without touching the code first, so it’s a chicken-egg problem. (Credit goes to Michael Feathers’ Working Effectively with Legacy Code for pointing out this chicken-egg problem.)

So how can this chicken-egg problem be overcome? One technique that can be applied is the Sprout Method technique, described both in WEWLC as well as Martin Fowler’s Refactoring.

Here’s an example of some obscure code that uses tight coupling:

require 'open-uri'
file = open('http://www.gutenberg.org/files/11/11-0.txt')
text = file.read
text.gsub!(/-/, ' ')
words = text.split
cwords = []
words.each do |w|
  w.gsub!(/[,\?\.‘’“”\:;!\(\)]/, '')
  cwords << w.downcase
end
words = cwords
words.sort!
whash = {}
words.each do |w|
  if whash[w].nil?
    whash[w] = 0
  end
  whash[w] += 1
end
whash = whash.sort_by { |k, v| v }.to_h
swords = words.sort_by do |el|
  el.length
end
lword = swords.last
whash.each do |k, v|
  puts "#{k.ljust(lword.length + 3 - v.to_s.length, '.')}#{v}"
end

If you looked at this code, I would forgive you for not understanding it at a glance.

One thing that makes this code problematic to test is that it’s tightly coupled to two dependencies: 1) the content of the file at http://www.gutenberg.org/files/11/11-0.txt and 2) stdout (the puts on the second-to-last line).

If I want to separate part of this code from its dependencies, maybe I could grab this chunk of the code (which is the majority of the code, but doesn’t include the network dependency):

text.gsub!(/-/, ' ')
words = text.split
cwords = []
words.each do |w|
  w.gsub!(/[,\?\.‘’“”\:;!\(\)]/, '')
  cwords << w.downcase
end
words = cwords
words.sort!
whash = {}
words.each do |w|
  if whash[w].nil?
    whash[w] = 0
  end
  whash[w] += 1
end
whash = whash.sort_by { |k, v| v }.to_h
swords = words.sort_by do |el|
  el.length
end
lword = swords.last

I can now take these lines and put them in their own new method:

require 'open-uri'

def count_words(text)
  text.gsub!(/-/, ' ')
  words = text.split
  cwords = []
  words.each do |w|
    w.gsub!(/[,\?\.‘’“”\:;!\(\)]/, '')
    cwords << w.downcase
  end
  words = cwords
  words.sort!
  whash = {}
  words.each do |w|
    if whash[w].nil?
      whash[w] = 0
    end
    whash[w] += 1
  end
  whash = whash.sort_by { |k, v| v }.to_h
  swords = words.sort_by do |el|
    el.length
  end
  lword = swords.last
  [whash, lword]
end

file = open('http://www.gutenberg.org/files/11/11-0.txt')
whash, lword = count_words(file.read)

whash.each do |k, v|
  puts "#{k.ljust(lword.length + 3 - v.to_s.length, '.')}#{v}"
end

I might still not understand the code but at least now I can start to put tests on it. Whereas before the program would only operate on the contents of http://www.gutenberg.org/files/11/11-0.txt and output the results to the screen, now I can feed in whatever content I like and have the results available in the return value of the method. (If you’d like to see another example of the Sprout Method technique, I wrote one here.)
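Now that count_words takes its text as an argument and returns its results, a first characterization test can be as simple as feeding in a tiny known input and asserting on the output. A rough sketch (the method body is copied verbatim from the sprouted version above; the toy input and expected counts are mine):

```ruby
# count_words copied verbatim from the sprouted version above.
def count_words(text)
  text.gsub!(/-/, ' ')
  words = text.split
  cwords = []
  words.each do |w|
    w.gsub!(/[,\?\.‘’“”\:;!\(\)]/, '')
    cwords << w.downcase
  end
  words = cwords
  words.sort!
  whash = {}
  words.each do |w|
    if whash[w].nil?
      whash[w] = 0
    end
    whash[w] += 1
  end
  whash = whash.sort_by { |k, v| v }.to_h
  swords = words.sort_by do |el|
    el.length
  end
  lword = swords.last
  [whash, lword]
end

# Feed in a tiny, known input instead of the Project Gutenberg file.
whash, lword = count_words('a bb ccc a'.dup)
```

From here, each assertion on whash and lword pins down a piece of the current behavior before any further refactoring.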

Code smell: long, procedural background jobs (and how to fix it)

I’ve worked on a handful of Rails applications that have pretty decent test coverage in terms of model specs and feature specs but are lacking in tests for background jobs. It seems that developers often find background jobs more difficult to test than Active Record models—otherwise the test coverage would presumably be better in that area. I’ve found background jobs of 100+ lines in length with no tests, even though the same application’s Active Record models had pretty good test coverage.

Apparently background job code is somehow viewed as “separate” from the rest of the application code: for some reason it doesn’t get tests, and for some reason it doesn’t get the object-oriented structure the rest of the application has.

Whatever the root cause of this issue, the fix is fortunately often pretty simple. We can apply the extract class refactoring technique to turn the long, procedural background job into one or more neatly abstracted classes.

Just as I would when embarking on the risky task of refactoring any piece of untested legacy code, I would use characterization testing to help me make sure I’m preserving the original behavior of the background job code as I move it into its own class.
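As a sketch of what the extract class move might look like (the class and job names here are invented for illustration), the job shrinks to a thin shell and the behavior moves into a plain Ruby object that can be instantiated and tested directly:

```ruby
# The extracted class: plain Ruby, trivially testable in isolation.
class OverdueInvoiceNotifier
  def initialize(invoices)
    @invoices = invoices
  end

  def messages
    @invoices.select { |i| i[:overdue] }
             .map { |i| "Invoice #{i[:number]} is overdue" }
  end
end

# The job becomes a thin shell. (In a real app this would inherit from
# ApplicationJob; that detail is omitted so the sketch stands alone.)
class OverdueInvoiceJob
  def perform(invoices)
    OverdueInvoiceNotifier.new(invoices).messages.each { |m| deliver(m) }
  end

  def deliver(message)
    # send an email, post to Slack, etc.
  end
end

invoices = [{ number: 42, overdue: true }, { number: 43, overdue: false }]
notifier = OverdueInvoiceNotifier.new(invoices)
```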

A Rails testing “hello world” using RSpec and Capybara

The following is an excerpt from my book, The Complete Guide to Rails Testing.

What we’re going to do

One of the biggest challenges in getting started with testing is just that—the simple act of getting started. What I’d like is for you, dear reader, to get a small but meaningful “win” under your belt as early as possible. If you’ve never written a single test in a Rails application before, then by the end of this chapter, you will have written your first test.

Here’s what we’re going to do:

    • Initialize a plain-as-possible Rails application
    • Create exactly one static page that says “Hello, world!”
    • Write a single test (using RSpec and Capybara) that verifies that our static page in fact says “Hello, world!”

The goal here is just to walk through the motions of writing a test. The test itself is rather pointless. I would probably never write a test in real life that just verifies the content of a static page. But the goal is not to write a realistic test but to provide you with a mental “Lego brick” that you can combine with other mental Lego bricks later. The ins and outs of writing tests get complicated quickly enough that I think it’s valuable to start with an example that’s almost absurdly simple.

The tools we’ll be using

For this exercise we’ll be using RSpec, anecdotally the most popular of the Rails testing frameworks, and Capybara, a library that enables browser interaction (clicking buttons, filling out forms, etc.) using Ruby. You may find some of the RSpec or Capybara syntax confusing or indecipherable. You also may well not understand where the RSpec stops and the Capybara starts. For the purposes of this “hello world” exercise, that’s okay. Our goal right now is not deep understanding but to begin putting one foot in front of the other.

Initializing the application

Run the “rails new” command

First we’ll initialize this Rails app using the good old rails new command.

$ rails new hello_world -T

You may be familiar with the -T flag. This flag means “no tests”. If we had done rails new hello_world without the -T flag, we would have gotten a Rails application with a test directory containing MiniTest tests. I want to use RSpec, so I want us to start with no MiniTest.

Let’s also create our application’s database at this point since we’ll have to do that eventually anyway.

$ cd hello_world
$ rails db:create

Update our Gemfile

This is the step where RSpec and Capybara will start to come into the picture. Each library will be included in our application in the form of a gem. We’ll also include the webdrivers gem which is necessary in order for Capybara to interact with the browser.

Let’s add the following to our Gemfile under the :development, :test group. I’ve added a comment next to each gem or group of gems describing its role. Even with the comments, it may not be abundantly clear at this moment what each gem is for. At the end of the exercise we’ll take a step back and talk about which library enabled which step of what we just did.

group :development, :test do
  # The RSpec testing framework
  gem 'rspec-rails'

  # Capybara, the library that allows us to interact with the browser using Ruby
  gem 'capybara'

  # The following gem aids with the nuts and bolts
  # of interacting with the browser.
  gem 'webdrivers'
end

Don’t forget to bundle install.

Install RSpec

Although we’ve already installed the RSpec gem, we haven’t installed RSpec into our application. Just like the Devise gem, for example, which requires us not only to add devise to our Gemfile but to also run rails g devise:install, RSpec installation is a two-step process. After we run this command we’ll have a spec directory in our application containing a couple config files.

$ rails g rspec:install

Now that we’ve gotten the “plumbing” work out of the way, let’s write some actual application code.

Creating our static page

Let’s generate a controller, HelloWorldController, with just a single action, index.

$ rails g controller hello_world index

Let’s modify the action’s template so it just says “Hello, world!”.

<!-- app/views/hello_world/index.html.erb -->

Hello, world!

Just as a sanity check, let’s start the Rails server and open up our new page in the browser to make sure it works as expected.

$ rails server
$ open http://localhost:3000/hello_world/index

We have our test infrastructure. We have the feature we’re going to test. Now let’s write the test itself.

Writing our test

Write the actual test

Let’s create a file called spec/hello_world_spec.rb with the following content:

require 'rails_helper'

RSpec.describe 'Hello world', type: :system do
  describe 'index page' do
    it 'shows the right content' do
      visit hello_world_index_path
      expect(page).to have_content('Hello, world!')
    end
  end
end

If you focus on the “middle” two lines, the ones starting with visit and expect, you can see that this test takes the form of two steps: visit the hello world index path and verify that the page there says “Hello, world!” If you’d like to understand the test in more detail, below is an annotated version.

# This pulls in the config from spec/rails_helper.rb
# that's needed in order for the test to run.
require 'rails_helper'

# RSpec.describe, or just describe, is how all RSpec tests start.
# The 'Hello world' part is an arbitrary string and could have been anything.
# In this case we have something extra in our describe block, type: :system.
# The type: :system setting does have a functional purpose. It's what
# triggers RSpec to bring Capybara into the picture when we run the test.
RSpec.describe 'Hello world', type: :system do

  describe 'index page' do
    it 'shows the right content' do

      # This is where Capybara starts to come into the picture. "visit" is a
      # Capybara method. hello_world_index_path is just a Rails routing
      # helper method and has nothing to do with RSpec or Capybara.
      visit hello_world_index_path

      # The following is a mix of RSpec syntax and Capybara syntax. "expect"
      # and "to" are RSpec, "page" and "have_content" are Capybara. Newcomers
      # to RSpec and Capybara's English-sentence-like constructions often
      # have difficulty remembering when two words are separated by a dot or
      # an underscore or parenthesis, myself included. Don't worry, you'll
      # get familiar over time.
      expect(page).to have_content('Hello, world!')
    end
  end
end

Now that we’ve written and broken down our test, let’s run it!

Run the test

This test can be run by simply typing rspec followed by the path to the test file.

$ rspec spec/hello_world_spec.rb

The test should pass. There’s a problem with what we just did, though. How can we be sure that the test is testing what we think it’s testing? It’s possible that the test is passing because our code works. It’s also possible that the test is passing because we made a mistake in writing our test. This concern may seem remote but “false positives” happen a lot more than you might think. The only way to be sure our test is working is to see our test fail first.

Make the test fail

Why does seeing a test fail prior to passing give us confidence that the test works? What we’re doing is exercising two scenarios. In the first scenario, the application code is broken. In the second scenario, the application code is working correctly. If our test passes under that first scenario, the scenario where we know the application code is broken, then we know our test isn’t doing its job properly. A test should always fail when its feature is broken and always pass when its feature works. Let’s now break our “feature” by changing the text on our page to something wrong.

Jello, world!

When we run our test now, we should see it fail with an error message like expected to find text "Hello, world!" in "Jello, world!".

$ rspec spec/hello_world_spec.rb

Watch the test run in the browser

Add the following to the bottom of spec/rails_helper.rb:

Capybara.default_driver = :selenium_chrome

Then run the test again:

$ rspec spec/hello_world_spec.rb

This time, Capybara should pop open a browser where you can see our test run. When I’m actively working with a test and I want to see exactly what it’s doing, I often find the running of the test to be too fast for me, so I’ll temporarily add a sleep wherever I want my test to slow down. In this case I would put the sleep right after visiting hello_world_index_path.

require 'rails_helper'

RSpec.describe 'Hello world', type: :system do
  describe 'index page' do
    it 'shows the right content' do
      visit hello_world_index_path
      sleep(5)
      expect(page).to have_content('Hello, world!')
    end
  end
end

Review

There we have it: the simplest possible Rails + RSpec + Capybara test. Unfortunately “the simplest possible Rails + RSpec + Capybara test” is still not particularly simple in absolute terms, but it’s pretty simple compared to everything that goes on in a real production application.

Who’s to blame for bad code? Coders.

When time pressure from managers drives coders to write poor-quality code, some people think the responsible party is the managers. I say it’s the coders. Here’s why I think so.

The ubiquity of time pressure in software development work

I’ve worked on a lot of development projects with a lot of development teams. In the vast majority of cases, if not all cases, the development team is perceived as being “behind”. Sometimes the development team’s progress is slower than the timeline the team itself set. Sometimes development progress is seen as behind relative to some arbitrary, unrealistic deadline set by someone outside the development team. Either way, no one ever comes by and says to the development team, “Wow, you guys are way ahead of where we expected!”

Often leadership will beg, plead, cajole, flatter, guilt-trip or bribe in order to increase the chances that their timeline will be met. They’ll ask the developers things like, “How can we ‘get creative’ to meet this timeline?” They’ll offer to have lunch, dinner, soda and alcohol brought in for a stretch of time so the developers can focus on nothing but the coding task at hand. Some leaders, lacking a sense of short-term/long-term thinking, will ask the developers to cut corners, to consciously incur technical debt in order to work faster. They might say things like “users don’t care about code” or “perfect is the enemy of the good”.

Caving to the pressure

Unfortunately, in my experience, most developers, most of the time, cave to the pressure to cut corners and deliver poor quality work. If you think I’m mistaken, I would ask you to show me a high-quality production codebase. There are very few in existence.

I myself am by no means innocent. I’ve caved to the pressure and delivered poor-quality work. But that doesn’t mean I think it’s excusable to do poor work or even (as some developers seem to believe) heroic.

Why bad code is the fault of coders, not managers

Why are coders, not managers, the ones responsible for bad code? Because coders are the ones who wrote the code. It was their fingers on the keyboard. The managers weren’t holding a gun to the developers’ heads and demanding they do a bad job.

“But,” a developer might say, “I had to cut corners in order to meet the deadline, and if I didn’t meet the deadline, I might get fired.” I don’t believe this reasoning stands up very well to scrutiny. I’d like to examine this invalid excuse for doing poor work as well as some others.

Invalid excuses for shipping poor work

Fallacy: cutting corners is faster

Why exactly is “bad code” bad? Bad code is bad precisely because it’s slow and expensive to work with. Saying, “I know this isn’t the right way to do it but it was faster” is a self-contradiction. The exact reason “the right way” is the right way is because it’s faster. Faster in the long run, of course. Sloppy, rushed work can be quicker in the very short term, but any speed gain is usually negated the next time someone has to work with that area of code, which could be as soon as later that same day.

Fallacy: “they didn’t give me enough time to do a good job”

If I’m not given enough time to do a good job, the answer is not to do a bad job. The answer is that that thing can’t be done in that amount of time.

If my boss comes to me and says, “Can we do X in 6 weeks” and my response is “It would take 12 weeks to do X, but if we cut a lot of corners we could do it in 6 weeks” then all my boss hears is “WE CAN DO IT IN 6 WEEKS!”

Given the option between hitting a deadline with “bad code” and missing a deadline, managers will roughly always choose hitting the deadline with bad code. In this way managers are like children who ask over and over to eat candy for dinner. They can’t be trusted to take the pros and cons and make the right decision.

Fallacy: cutting corners is a business decision

It’s my boss’s role to tell me what to do and when to do it. It’s my job to, for the most part, comply. As a programmer I typically won’t have visibility into business decisions the way my boss does. As a programmer I will often not have the right skillset or level of information necessary to have a valid opinion on most business matters.

This idea that my boss can tell me what to do only applies down to a certain level of granularity though. I should let my boss tell me what to do but I shouldn’t let my boss tell me how to do it. I care about the professional reputation I have with my colleagues and with myself. I think there’s a certain minimum level of quality below which professionals should not be willing to go, even though it’s always an option to go lower.

Fallacy: “we can’t afford perfection”

I wish I had a dollar for every time I heard somebody say something like, “Guys, it doesn’t need to be perfect, it just needs to be good enough.”

The fallacy here is that the choice is between “perfect” and “good”. That’s a mistaken way of looking at it. The standard mode of operation during a rush period for developers is to pile sloppy Band-Aid fix on top of sloppy Band-Aid fix as if they desire nothing more strongly in life than to paint themselves into a corner from which they will never be able to escape. The alternative to this horror show isn’t “perfection”. The alternative is just what I might call a “minimally good job”.

Fallacy: “if I don’t comply with instructions to cut corners, I’ll get fired”

This might be true a very very very small percentage of the time. Most of the time it’s not, though. It’s pretty hard to get fired from a job. Firing someone is painful and expensive. Usually someone has to rack up many offenses before they finally get booted.

Fallacy: “we’ll clean it up after the deadline”

Fixing sloppy work after the deadline “when we have more time” or “when things slow down” never happens. Deadlines, such as a go-live date, often translate to increased pressure and time scarcity because now there’s more traffic and visibility on the system. Deadlines are usually followed by more deadlines. You never get “more time”. Pressure rarely goes down over time. It usually goes up.

Fallacy: “I would have done a better job, but it wasn’t up to me”

Nope. I alone am responsible for the work I do. If someone pressured me to do the wrong thing and in response I chose to do the wrong thing, then I’m responsible for choosing to cave to the pressure.

Fallacy: “I haven’t been empowered to make things better”

It’s rare that a superior will come along and tell me, “Jason, you explicitly have permission to make things better around here.” The reality is that it’s up to me to observe what needs to be done better and help make it happen. I’m a believer in the saying that “it’s easier to get forgiveness than permission”. I tend to do what my professional judgment tells me needs to be done without necessarily asking my boss if it’s okay. Does this get me in trouble sometimes? Yes. I look at those hand-slaps as the cost of self-empowerment.

Plus, all employees have power over their superiors in that they have the power to leave. I don’t mean this in a “let me have my way or I quit!” way. I mean that if I work somewhere that continually tries to force me to do a bad job, I don’t have to just accept it as my fate in life. I can go work somewhere else. That doesn’t fix anything for the organization I’m leaving (bad organizations and bad people can’t be fixed anyway) but it does fix the problem for me.

My admonition to my colleagues: don’t make excuses for poor work

I think developers make too many excuses and apologies for bad code. (And as a reminder, “bad code” means code that’s risky, time-consuming and expensive to work with.)

It’s not realistic to do a great job all the time, but I think we can at least do a little better. Right now I think we as a profession err too much on the side of justifying poor-quality work. I think we could stand to err more on the side of taking responsibility.

Sometimes it’s better for tests to hit the real API

A mysteriously broken test suite

The other day I ran my test suite and discovered that nine of my tests were suddenly and mysteriously broken. These tests were all related to the same feature: a physician search field that hits something called the NPPES NPI Registry, a public API.

My first thought was: is something wrong with the API? I visited the NPI Registry site and, sure enough, I got a 500 error when I clicked the link to their API URL. The API appeared to be broken. This was a Sunday so I didn’t necessarily expect the NPI API maintainers to fix it speedily. I decided to just ignore these nine specific tests for the time being.

By mid-Monday, to my surprise, my tests were still broken and the NPI API was still giving a 500 error. I would have expected them to have fixed it by then.

The surprising root cause of the bug

On Tuesday the API was still broken. On Wednesday the API was still broken. At this point I thought okay, maybe it’s not the case that the NPI API has just been down for four straight days with the NPI API maintainers not doing anything about it. Maybe the problem lies on my end. I decided to do some investigation.

Sure enough, the problem was on my end. Kind of. Evidently the NPI API maintainers had made a small tweak to how the API works. In addition to the fields I had already been sending like first_name and last_name, the NPI API now required an additional field, address_purpose. Without address_purpose, the NPI API would return a 500 error. I added address_purpose to my request and all my tests started passing again. And, of course, my application’s feature started working again.

The lesson: my tests were right to hit the real API

When I first discovered that nine of my tests were failing due to a broken external API, my first thought was, “Man, maybe I should mock out those API calls so my tests don’t fail when the API breaks.” But then I thought about it a little harder. The fact that the third-party API was broken meant my application was broken. The tests were failing because they ought to have been failing.

It’s been my observation that developers new to testing often develop the belief that “third-party API calls should be mocked out”. I don’t think this is always true. I think the decision whether to mock third-party APIs should be made on a case-by-case basis depending on certain factors. These factors include, but aren’t limited to:

  • Would allowing my tests to hit the real API mess up production data?
  • Is the API rate-limited?
  • Is the API prone to sluggishness or outages?

(Thanks to Kostis Kapelonis, who I interviewed on The Ruby Testing Podcast, for helping me think about API interaction this way.)

If hitting the real API poses no danger to production data, I would probably be more inclined to let my tests hit the real API than to mock out the responses. (Or, to be more precise, I would want my test suite to include some integration tests that hit the real API, even if I might have some other API-related tests that mock out API responses for economy of speed.) My application is only really working if the API is working too.

Refactorings should be atomic

It’s often seen as wise not to try to make huge, sweeping improvements to a codebase all in one go. I couldn’t agree more. Such changes carry way too much risk.

An alternative way of approaching codebase improvements is to apply the Boy Scout rule, “always leave the campground a little cleaner than you found it”. I think this rule is a pretty good one and appropriate under some circumstances, although I think some caution is in order as to how exactly to interpret the rule. The main danger of this rule is that if interpreted the wrong way, it discourages atomicity.

Tiny atomic refactorings

I have a mantra I repeat to myself regularly: “never underestimate your own ability to screw stuff up”. Seemingly safe, innocuous, inconsequential one-line changes have an amazing ability to introduce surprising bugs.

For this reason, I virtually never mix refactorings or cleanups in with my feature work or bugfixes, even very tiny ones, unless the thing I’m cleaning up is directly in the path of the work I need to do to complete my feature or bugfix. Let’s say for example that I’m adding a new capability to the sales report. I don’t want to toss in a small cleanup and accidentally break the password reset feature.

If I discover a tiny improvement I want to make, e.g. replacing a single instance of appointments.map { |a| a.user } to appointments.map(&:user), I’ll make a separate commit just with that tiny improvement instead of letting it become a “rider” on an unrelated commit. This is basically just an application of the idea of atomic commits.
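Incidentally, the two forms are behaviorally identical; `&:user` relies on Symbol#to_proc to call `.user` on each element. A quick self-contained check, using stand-in structs for illustration:

```ruby
# User and Appointment here are stand-ins for the real models.
User = Struct.new(:name)
Appointment = Struct.new(:user)

appointments = [
  Appointment.new(User.new("Ann")),
  Appointment.new(User.new("Bo"))
]

long_form  = appointments.map { |a| a.user }
short_form = appointments.map(&:user)

long_form == short_form # => true
```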

Bigger atomic refactorings

Sometimes a refactoring isn’t all that small. For example, sometimes you want to take a handful of classes and move them into a new namespace to give some structure to the files in your application.

In these cases I’ll try to create one single commit that effects the entire refactoring. This way, if I get 75% done with the refactoring and realize my whole premise is a blunder, I can just git checkout . && git clean -df and cleanly go back to the way things were before I started my refactoring work.

When to perform refactorings

In my experience there are two good times to perform a refactoring. If there’s an area of the code that I need to do some work in but it’s messy, I’ll perform a refactoring (or multiple refactorings) in that area of code as a separate piece of work before I start making my change in that area. I think of this as “cleaning the kitchen before I make dinner”. It’s faster and more enjoyable to make dinner in a clean kitchen than in a messy one.

One good time to perform a refactoring is before a change. The other is after a change. If I make a change to an area of code and I find that afterward that area is unacceptably messy, I’ll perform a refactoring, again as a separate piece of work, to clean it up. By the way, it’s possible to end up with a messy area of a codebase by simply writing poor-quality code, but that’s not the only way to introduce bad code. The accumulation of technical debt is natural and unavoidable; in fact, the only way not to accumulate technical debt is to periodically perform refactorings.

The one time that I don’t think is a good time to perform refactorings is in the middle of a change. If the refactoring introduces a bug but the refactoring change is mixed with a change that you want to keep, it can be hard to back out the refactoring change without also affecting the “good” change. It’s also harder to review a pull request that contains both a change and a refactoring than it is to review a pull request that only contains a change. Keeping refactorings in separate pull requests does mean that the development team (and perhaps technical leadership) will have to be on board with the idea that you’ll regularly have PRs that contain nothing but a refactoring.

By the way, credit goes to Ben Orenstein (who I interviewed on my podcast) for teaching me that just before a change and just after a change are good times to do refactorings.

When refactorings can’t be atomic

Sometimes a refactoring is too big to be atomic. Sometimes I embark upon a “refactoring project” that will touch many areas of the codebase over the course of multiple weeks or months. In these cases I deliberately spread the refactoring project into small pieces over time as a way of mitigating risk. I’ve found though that most big refactorings, while not atomic as a whole, can be split into a number of smaller refactorings that are themselves individually atomic.

How I write characterization tests

What characterization tests are and what problem they solve

Let’s say I come across a piece of code that I want to refactor but unfortunately a) I don’t understand what it does and b) it’s not covered by tests. Refactoring untested code I don’t understand is risky business. But how can I write tests for the code if I don’t even know what it’s supposed to do?

Luckily there’s a technique called Characterization Testing, which I first discovered in Michael Feathers’ book Working Effectively with Legacy Code, and which makes solving this challenge pretty straightforward. I’m going to show you how I do it.

What I’m going to show you

Below is a chunk of code I found in an old project of mine. The two methods below, services_with_info and products_with_info, have a couple problems. One is that they’re very duplicative of each other. Another is that the Appointment class, which clocks in at 708 lines of code (most of which I’ve omitted out of mercy for the reader), is doing way too much stuff, and these two methods aren’t helping. These two methods probably belong in some other abstraction so Appointment can focus on being an Appointment and not have to worry about serializing its appointment_services and appointment_products.

class Appointment < ActiveRecord::Base
  def services_with_info
    appointment_services.collect { |item|
      item.serializable_hash.merge(
        "price" => number_with_precision(item.price, :precision => 2),
        "label" => item.service.name,
        "item_id" => item.service.id,
        "type" => "service"
      )
    }
  end

  def products_with_info
    appointment_products.collect { |item|
      item.serializable_hash.merge(
        "price" => number_with_precision(item.price, :precision => 2),
        "label" => item.product.name,
        "item_id" => item.product.id,
        "type" => "product"
      )
    }
  end
end

Refactoring the services_with_info method

Of these two methods I’m going to focus first on services_with_info. My goal is to take the body of the method, ask myself “What currently-nonexistent abstraction does this group of lines represent?”, create that abstraction, and move the body of the method there.

I might come up with an abstraction that I’m happy with on the first try, I might not. The first abstraction idea I’ll try is a new abstraction called AppointmentServiceCollection. Below is the class. For the most part all I’ve done is copy and paste the contents of services_with_info into this new class.

class AppointmentServiceCollection
  def initialize(appointment_services)
    @appointment_services = appointment_services
  end

  def to_h
    @appointment_services.to_h { |item|
      item.serializable_hash.merge(
        "price" => number_with_precision(item.price, :precision => 2),
        "label" => item.service.name,
        "item_id" => item.service.id,
        "type" => "service"
      )
    }
  end
end

First goal: simply instantiate the object inside a test

When I start writing my test for this new class I’m going to put in as little effort as possible. My first question to myself will be: can I even instantiate an AppointmentServiceCollection?

require 'spec_helper'

describe AppointmentServiceCollection do
  describe '#to_h' do
    it 'works' do
      appointment_service = create(:appointment_service)
      collection = AppointmentServiceCollection.new([appointment_service])
    end
  end
end

Second goal: see what value the method-under-test returns

When I run the test it passes. So far so good. My next step will be to add a very silly assertion.

require 'spec_helper'

describe AppointmentServiceCollection do
  describe '#to_h' do
    it 'works' do
      appointment_service = create(:appointment_service)
      collection = AppointmentServiceCollection.new([appointment_service])
      expect(collection.to_h).to eq('asdf')
    end
  end
end

I of course know that collection.to_h won’t equal asdf, but I don’t know what it will equal, so any value I check for will be equally wrong. No point in exerting any mental effort guessing what the output will be. The test result will tell me soon enough anyway.

When I run my test now, here’s what I get:

F

Failures:

  1) AppointmentServiceCollection#to_h works
     Failure/Error: expect(collection.to_h).to eq('asdf')
     TypeError:
       wrong element type AppointmentService at 0 (expected array)
     # ./app/models/appointment_service_collection.rb:7:in `to_h'
     # ./app/models/appointment_service_collection.rb:7:in `to_h'
     # ./spec/models/appointment_service_collection_spec.rb:8:in `block (3 levels) in <top (required)>'

Finished in 0.50943 seconds (files took 3.44 seconds to load)
1 example, 1 failure

That’s unexpected. I could scratch my head and puzzle over why this is happening, but instead I’ll just comment out code until the error goes away.

class AppointmentServiceCollection
  def initialize(appointment_services)
    @appointment_services = appointment_services
  end

  def to_h
    @appointment_services.to_h { |item|
      #item.serializable_hash.merge(
        #"price" => number_with_precision(item.price, :precision => 2),
        #"label" => item.service.name,
        #"item_id" => item.service.id,
        #"type" => "service"
      #)
    }
  end
end

When I run the test again I get the same error again. I’ll comment out more code.

class AppointmentServiceCollection
  def initialize(appointment_services)
    @appointment_services = appointment_services
  end

  def to_h
    #@appointment_services.to_h { |item|
      #item.serializable_hash.merge(
        #"price" => number_with_precision(item.price, :precision => 2),
        #"label" => item.service.name,
        #"item_id" => item.service.id,
        #"type" => "service"
      #)
    #}
  end
end

Now, not too surprisingly, I get a different error instead:

F

Failures:

  1) AppointmentServiceCollection#to_h works
     Failure/Error: expect(collection.to_h).to eq('asdf')
       
       expected: "asdf"
            got: nil
       
       (compared using ==)
     # ./spec/models/appointment_service_collection_spec.rb:8:in `block (3 levels) in <top (required)>'

Finished in 0.49608 seconds (files took 3.41 seconds to load)
1 example, 1 failure

This makes a lightbulb go on for me. Clearly the problem lies in the @appointment_services.to_h { |item| line. The problem must be the to_h. What if I try map instead?

class AppointmentServiceCollection
  def initialize(appointment_services)
    @appointment_services = appointment_services
  end

  def to_h
    @appointment_services.map { |item|
      #item.serializable_hash.merge(
        #"price" => number_with_precision(item.price, :precision => 2),
        #"label" => item.service.name,
        #"item_id" => item.service.id,
        #"type" => "service"
      #)
    }
  end
end

That works. Instead of the “wrong element type” error I was getting before, I’m now getting a different error that’s more in line with my expectations:

F

Failures:

  1) AppointmentServiceCollection#to_h works
     Failure/Error: expect(collection.to_h).to eq('asdf')
       
       expected: "asdf"
            got: [nil]
       
       (compared using ==)
       
       Diff:
       @@ -1,2 +1,2 @@
       -"asdf"
       +[nil]
       
     # ./spec/models/appointment_service_collection_spec.rb:8:in `block (3 levels) in <top (required)>'

Finished in 0.48778 seconds (files took 3.5 seconds to load)
1 example, 1 failure

Now let me uncomment the full body of the to_h method and see what happens.

class AppointmentServiceCollection
  def initialize(appointment_services)
    @appointment_services = appointment_services
  end

  def to_h
    @appointment_services.map { |item|
      item.serializable_hash.merge(
        "price" => number_with_precision(item.price, :precision => 2),
        "label" => item.service.name,
        "item_id" => item.service.id,
        "type" => "service"
      )
    }
  end
end

Now I get a different error. It doesn’t like my use of number_with_precision.

F

Failures:

  1) AppointmentServiceCollection#to_h works
     Failure/Error: expect(collection.to_h).to eq('asdf')
     NoMethodError:
       undefined method `number_with_precision' for #<AppointmentServiceCollection:0x007ff60274e640>
     # ./app/models/appointment_service_collection.rb:9:in `block in to_h'
     # ./app/models/appointment_service_collection.rb:7:in `map'
     # ./app/models/appointment_service_collection.rb:7:in `to_h'
     # ./spec/models/appointment_service_collection_spec.rb:8:in `block (3 levels) in <top (required)>'

Finished in 0.52793 seconds (files took 3.32 seconds to load)
1 example, 1 failure

Luckily I happen to know exactly how to fix this. In order to use the number_with_precision helper in a model (as opposed to in a view), you need to include the ActionView::Helpers::NumberHelper module.

class AppointmentServiceCollection
  include ActionView::Helpers::NumberHelper

  def initialize(appointment_services)
    @appointment_services = appointment_services
  end

  def to_h
    @appointment_services.map { |item|
      item.serializable_hash.merge(
        "price" => number_with_precision(item.price, :precision => 2),
        "label" => item.service.name,
        "item_id" => item.service.id,
        "type" => "service"
      )
    }
  end
end

The return values, revealed

With this error fixed, I now get my most interesting test result yet:

F                                              

Failures:                                      

  1) AppointmentServiceCollection#to_h works   
     Failure/Error: expect(collection.to_h).to eq('asdf')                                     
                                               
       expected: "asdf"                        
            got: [{"id"=>1, "appointment_id"=>1, "service_id"=>1, "created_at"=>Sun, 24 Feb 2019 16:45:16 EST -05:00, "updated_at"=>Sun, 24 Feb 2019 16:45:16 EST -05:00, "length"=>nil, "stylist_id"=>2, "price"=>"0.00", "label"=>"ve0xttqqfl", "item_id"=>1, "type"=>"service"}]        
                                               
       (compared using ==)                     
                                               
       Diff:                                   
       @@ -1,2 +1,12 @@                        
       -"asdf"                                 
       +[{"id"=>1,                             
       +  "appointment_id"=>1,                 
       +  "service_id"=>1,                     
       +  "created_at"=>Sun, 24 Feb 2019 16:45:16 EST -05:00,                                 
       +  "updated_at"=>Sun, 24 Feb 2019 16:45:16 EST -05:00,                                 
       +  "length"=>nil,                       
       +  "stylist_id"=>2,                     
       +  "price"=>"0.00",                     
       +  "label"=>"ve0xttqqfl",               
       +  "item_id"=>1,                        
       +  "type"=>"service"}]                  
                                               
     # ./spec/models/appointment_service_collection_spec.rb:8:in `block (3 levels) in <top (required)>'                                                                                      

Finished in 0.72959 seconds (files took 3.24 seconds to load)                                 
1 example, 1 failure

The reason I say this result is interesting is because instead of an error I get the actual value that the method is returning. The secrets of the universe are being revealed to me.

More realistic assertions

At this point I’m able to come up with some more realistic values to put in my test. I know that my test will fail because my values are just arbitrary and made up, but I feel confident that my values are pretty close.

require 'spec_helper'

describe AppointmentServiceCollection do
  describe '#to_h' do
    it 'works' do
      appointment_service = create(:appointment_service)
      collection = AppointmentServiceCollection.new([appointment_service])

      item = collection.to_h.first
      expect(item['price']).to eq('30.00')
      expect(item['label']).to eq('Mens Haircut')
      expect(item['item_id']).to eq(appointment_service.service.id)
      expect(item['type']).to eq('service')
    end
  end
end

I run my test just for fun and it of course fails. My next step is to make the test setup match my assertion values.

require 'spec_helper'

describe AppointmentServiceCollection do
  describe '#to_h' do
    it 'works' do
      appointment_service = create(
        :appointment_service,
        price: 30,
        service: create(:service, name: 'Mens Haircut')
      )

      collection = AppointmentServiceCollection.new([appointment_service])

      item = collection.to_h.first
      expect(item['price']).to eq('30.00')
      expect(item['label']).to eq('Mens Haircut')
      expect(item['item_id']).to eq(appointment_service.service.id)
      expect(item['type']).to eq('service')
    end
  end
end

What I do after I get the test passing

The test now passes. So far I’ve touched my application code as little as possible because I wanted to maximize the chances of preserving the original functionality. Now that I have a test in place, my test can do the job of preserving the original functionality, and I’m free to clean up and refactor the code.

class AppointmentServiceCollection
  include ActionView::Helpers::NumberHelper
  ITEM_TYPE = 'service'

  def initialize(appointment_services)
    @appointment_services = appointment_services
  end

  def to_h
    @appointment_services.map do |appointment_service|
      appointment_service.serializable_hash.merge(extra_attributes(appointment_service))
    end
  end

  private

  def extra_attributes(appointment_service)
    {
      'price'   => number_with_precision(appointment_service.price, precision: 2),
      'label'   => appointment_service.service.name,
      'item_id' => appointment_service.service.id,
      'type'    => ITEM_TYPE
    }
  end
end

I’ll also want to clean up my test code.

require 'spec_helper'

describe AppointmentServiceCollection do
  describe '#to_h' do
    let(:appointment_service) do
      create(
        :appointment_service,
        price: 30,
        service: create(:service, name: 'Mens Haircut')
      )
    end

    let(:item) { AppointmentServiceCollection.new([appointment_service]).to_h.first }

    it 'adds the right attributes' do
      expect(item['price']).to eq('30.00')
      expect(item['label']).to eq('Mens Haircut')
      expect(item['item_id']).to eq(appointment_service.service.id)
      expect(item['type']).to eq('service')
    end
  end
end

Lastly, I’ll replace the original services_with_info method body with my new abstraction. Here’s the original code.

def services_with_info
  appointment_services.collect { |item|
    item.serializable_hash.merge(
      "price" => number_with_precision(item.price, :precision => 2),
      "label" => item.service.name,
      "item_id" => item.service.id,
      "type" => "service"
    )
  }
end

Here’s the new version which uses AppointmentServiceCollection.

def services_with_info
  AppointmentServiceCollection.new(appointment_services).to_h
end

What we’ve accomplished

This is only a drop in the bucket but it helps. “A journey of a thousand miles begins with a single step,” as they say. This area of the code is now slightly more understandable. Next I could give the products_with_info method the same treatment. Perhaps I could extract the duplication between the two methods into a superclass. If I apply this tactic to the Appointment class over and over, I could probably whittle it down from its current 708 lines to a more reasonable size of perhaps a few dozen lines.
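As a sketch of what that superclass extraction might look like (simplified and hypothetical; `Item` here is a stand-in for an ActiveRecord model that responds to `serializable_hash`, and I've dropped the price formatting for brevity):

```ruby
# The shared shape lives in a base class; subclasses supply only
# what differs between services and products.
class ItemCollection
  def initialize(items)
    @items = items
  end

  def to_h
    @items.map { |item| item.serializable_hash.merge(extra_attributes(item)) }
  end
end

class ServiceCollection < ItemCollection
  private

  def extra_attributes(item)
    { "label" => item.name, "type" => "service" }
  end
end

# Stand-in for an ActiveRecord model, for illustration only.
Item = Struct.new(:id, :name) do
  def serializable_hash
    { "id" => id }
  end
end

result = ServiceCollection.new([Item.new(1, "Mens Haircut")]).to_h
# => [{"id"=>1, "label"=>"Mens Haircut", "type"=>"service"}]
```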

How to clear up obscure Rails tests using Page Objects

The challenge of keeping test code clean

The hardest part of a programmer’s job isn’t usually figuring out super hard technical problems. The biggest challenge for most developers, in my experience, is to write code that can stand up over time without collapsing under the weight of its own complexity.

Just as it’s challenging to keep a clean and understandable codebase, it’s also challenging to keep a clean and understandable test suite, and for the same exact reasons.

If I look at a test file for the first time and I’m immediately able to grasp what the test is doing and why it’s doing it, then I have a clear test. The test has a high signal-to-noise ratio. That’s good.

The opposite scenario is when the test is full of noise. Perhaps the test contains so many low-level details that the high-level meaning of the test is obscured. The term for this condition, this “test smell”, is Obscure Test. That’s bad. If the test code is obscure when it could have been clear, that creates extra head-scratching time.

One tool that can be used to help make Obscure Tests more understandable is the concept of a Page Object.

What a Page Object is and when it’s helpful

As a preface: Page Objects are only relevant when dealing with integration tests—that is, tests that interact with the browser. Page Objects don’t come into the picture for model tests or anything else that doesn’t interact with a browser.

A Page Object is an abstraction of a component of a web page. For example, I might create a Page Object that represents the sign-in form for a web application. (In fact, virtually all my Page Objects represent forms, since that’s where I find them to be most helpful.)

The idea is that instead of using low-level, “computerey” commands to interact with a part of a page, a Page Object allows me to use higher-level language that’s more easily understandable by humans. Just as Capybara is an abstraction on top of Selenium that’s clearer for a programmer to read and write than using Selenium directly, Page Objects can be an abstraction layer on top of Capybara commands that’s clearer for a programmer to read and write than using Capybara directly (at least that’s arguably the case, and only in certain situations, and when done intelligently).

Last note on what a Page Object is and isn’t: I personally started with the misconception that a Page Object is an abstraction of a page. According to Martin Fowler, that’s not the idea. A Page Object is an abstraction of a certain part of a page. There’s of course no law saying you couldn’t create an abstraction that represents a whole page instead of a component on a page, but having done it both ways, I’ve found it more useful to create Page Objects as abstractions of components rather than of entire pages. I find that my Page Objects come together more neatly with the rest of my test code when that’s the case.

A Page Object example

I’m going to show you an example of a Page Object by showing you the following:

  1. A somewhat obscure test
  2. A second, cleaner version of the same test, using a Page Object
  3. The Page Object code that enables the second version of the test

Here’s the obscure test example. What exactly it does is not particularly important. Just observe how hard the test code is to understand.

The obscure test

RSpec.describe "Grocery list", type: :system do
  let!(:banana) { create(:grocery_item) }

  describe "deleting a grocery item" do
    it "removes the item" do
      within ".grocery-item-#{banana.id}" do
        click_on "Delete"
      end

      expect(page).not_to have_content(banana.name)
    end
  end
end

This test is okay but it’s a little bit noisy. We shouldn’t have to care about the DOM details of how the Delete button for this grocery item is located.

Let’s see if we can improve this code with the help of a Page Object.

RSpec.describe "Grocery list", type: :system do
  let!(:banana) { create(:grocery_item) }
  let!(:grocery_list) { PageObjects::GroceryList.new }

  describe "deleting a grocery item" do
    it "removes the item" do
      grocery_list.click_delete(banana)
      expect(grocery_list).not_to have_item(banana)
    end
  end
end

Here’s the Page Object code that made this cleanup possible:

module PageObjects
  class GroceryList
    include Capybara::DSL

    def click_delete(item)
      within ".grocery-item-#{item.id}" do
        click_on "Delete"
      end
    end

    def has_item?(item)
      page.has_content?(item.name)
    end
  end
end
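One detail worth noting: `expect(grocery_list).not_to have_item(banana)` works because RSpec dynamically converts a `have_something` matcher into a call to the subject's `has_something?` predicate method. Here's a stripped-down illustration of that convention, not RSpec's actual implementation:

```ruby
# Sketch of RSpec's dynamic predicate matchers: have_item(x) becomes
# a call to has_item?(x) on the subject.
def satisfies_have_matcher?(subject, matcher_name, *args)
  predicate = matcher_name.to_s.sub(/\Ahave_/, "has_") + "?"
  subject.public_send(predicate, *args)
end

# A simplified stand-in for the Page Object above.
class GroceryList
  def initialize(items)
    @items = items
  end

  def has_item?(item)
    @items.include?(item)
  end
end

list = GroceryList.new(["banana"])
satisfies_have_matcher?(list, :have_item, "banana") # => true
```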

I find that Page Objects only start to become valuable beyond a certain threshold of complexity. I tend to see how far I can get with a test before I consider bringing a Page Object into the picture. It might be several weeks or months between the time I first write a test and the time I modify it to use a Page Object. But in the cases where a Page Object is useful, I find them to be a huge help in adding clarity to my tests.