Category Archives: Programming

Common causes of flickering/flapping/flaky tests

A flapping test is a test that sometimes passes and sometimes fails even though the application code being tested hasn’t changed. Flapping tests can be hard to reproduce, diagnose, and fix. Here are some of the common causes I know of for flapping tests. Knowing these common causes can go a long way toward diagnosis, and once a problem is diagnosed, the battle is half over.

Race conditions

Let’s say you have a Capybara test that clicks a page element that fires off an AJAX request. The AJAX request completes, making some other page element clickable. Sometimes the AJAX request beats the test runner, meaning the test works fine. Sometimes the test runner beats the AJAX request, meaning the test breaks.
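The original post doesn’t include code for this scenario, but here’s a sketch of what the problem and the usual fix look like (the element IDs are hypothetical). Capybara’s matchers like have_selector retry until they match (up to Capybara.default_max_wait_time), so you can wait for evidence that the AJAX request finished before clicking:

```ruby
# Hypothetical example: #add_item fires an AJAX request that, on
# completion, enables the #save button.

# Flaky version: the second click races the AJAX request.
find("#add_item").click
find("#save").click # sometimes not yet clickable

# More reliable version: wait for the button to become enabled first.
# have_selector retries until it matches, so the test synchronizes
# with the AJAX request instead of racing it.
find("#add_item").click
expect(page).to have_selector("#save:not([disabled])")
find("#save").click
```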

“Leaking” tests (non-deterministic test suite)

Sometimes a test will fail to clean up after itself and “leak” state into other tests. To take a contrived but obvious example, let’s say test A checks that the “list customers” page shows exactly 5 customers and test B verifies that a customer can be created successfully. If test B doesn’t clean up after itself, then running test B before test A will cause test A to fail because there are now 6 customers instead of 5. This exact issue is almost never an issue in Rails (because each test runs inside a transaction) but database interaction isn’t the only possible form of leaky tests.

When I’m fixing this sort of issue I like to a) determine an order of running the tests that always causes my flapping test to fail and then b) figure out what’s happening in the earlier test that makes the later test fail. In RSpec I like to take note of the seed number, then run rspec --seed <seed number> which will re-run my tests in the same order.

Reliance on third-party code

Sometimes tests unexpectedly fail because some behavior of third-party code has changed. I actually sometimes consciously choose to live with this category of flapping test. If an API I rely on goes down and my tests fail, the tests arguably should fail because my application is genuinely broken. But if my tests are flapping in a way that’s not legitimate, I’d probably change my test to use something like VCR instead of hitting the real API.
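As a sketch of what that looks like (this minimal setup is my own example, not code from the post), VCR records a real API response once to a "cassette" file and replays it on subsequent runs:

```ruby
# spec/support/vcr.rb
require "vcr"

VCR.configure do |config|
  # Where recorded API responses ("cassettes") are stored
  config.cassette_library_dir = "spec/cassettes"
  # Intercept HTTP requests at the WebMock level
  config.hook_into :webmock
  # Lets a spec opt in with metadata, e.g. it "finds a user", :vcr do ... end
  config.configure_rspec_metadata!
end
```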

A Rails model test “hello world”

The following is an excerpt from my book, Rails Testing for Beginners.

What we’re going to do

What we’re going to do in this post is:

  1. Initialize a new Rails application
  2. Install RSpec using the rspec-rails gem
  3. Generate a User model
  4. Write a single test for that User model

The test we’ll write will be a trivial one. The user will have both a first and last name. On the User model we’ll define a full_name method that concatenates first and last name. Our test will verify that this method works properly. Let’s now begin the first step, initializing the Rails application.

Initializing the Rails application

Let’s initialize the application using the -T flag meaning “no test framework”. (We want RSpec in this case, not the Rails default of MiniTest.)
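The command itself isn’t shown in this excerpt; it would look something like this (the application name is a placeholder of mine, not necessarily the book’s):

```shell
$ rails new user_model_hello_world -T
```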

Now we can install RSpec.

Installing RSpec

To install RSpec, all we have to do is add rspec-rails to our Gemfile and then do a bundle install.
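The Gemfile change isn’t reproduced in this excerpt, but it amounts to something like the following, followed by a bundle install:

```ruby
# Gemfile
group :development, :test do
  gem "rspec-rails"
end
```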

We’ve installed the RSpec gem but we still have to run rails g rspec:install which will create a couple configuration files for us.

Now we can move on to creating the User model.

Creating the User model

Let’s create a User model with just two attributes: first_name and last_name.
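The generator command isn’t shown in this excerpt; it would be along these lines, followed by running the migration it generates:

```shell
$ rails g model user first_name:string last_name:string
$ rails db:migrate
```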

Now we can write our test.

The User test

Writing the test

Here’s the test code:
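The book’s listing isn’t reproduced in this excerpt; a spec consistent with the description would look something like this (the file path and description strings are my reconstruction):

```ruby
# spec/models/user_spec.rb
require "rails_helper"

RSpec.describe User do
  describe "#full_name" do
    it "returns the first and last name, separated by a space" do
      user = User.new(first_name: "Abraham", last_name: "Lincoln")
      expect(user.full_name).to eq("Abraham Lincoln")
    end
  end
end
```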

Let’s take a closer look at this test code.

Examining the test

Let’s focus for a moment on this part of the test code:
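The part in question is presumably the setup and expectation at the heart of the test:

```ruby
user = User.new(first_name: "Abraham", last_name: "Lincoln")
expect(user.full_name).to eq("Abraham Lincoln")
```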

This code is saying:

  1. Instantiate a new User object with first name Abraham and last name Lincoln.
  2. Verify that when we call this User object’s full_name method, it returns the full name, Abraham Lincoln.

What are the its and describes all about? Basically, these “it blocks” and “describe blocks” are there to allow us to put arbitrary labels on chunks of code to make the test more human-readable. (This is not precisely true but it’s true enough for our purposes for now.) As an illustration, I’ll drastically change the structure of the test:
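The restructured version isn’t shown in this excerpt; in the spirit of the description, it might look like this, with pointless nesting and a vague label:

```ruby
RSpec.describe User do
  describe "some arbitrary label" do
    describe "another arbitrary label" do
      it "does stuff" do
        user = User.new(first_name: "Abraham", last_name: "Lincoln")
        expect(user.full_name).to eq("Abraham Lincoln")
      end
    end
  end
end
```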

This test will pass just fine. Every it block needs to be nested inside a describe block, but RSpec doesn’t care how deeply nested our test cases are or whether the descriptions we put on our tests (e.g. the vague and mysterious it 'does stuff') make any sense.

Running the test

If we run this test it will fail. It fails because the full_name method is not yet defined.

Let’s now write the full_name method.
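The method itself is a one-liner. Here’s a plain-Ruby stand-in for the model that shows the behavior the test expects; in the real application the class would inherit from ApplicationRecord and get first_name and last_name from database columns:

```ruby
# A plain-Ruby stand-in for the User model. In the real app the class
# inherits from ApplicationRecord and the attributes come from the database.
class User
  attr_reader :first_name, :last_name

  def initialize(first_name:, last_name:)
    @first_name = first_name
    @last_name = last_name
  end

  # Concatenate first and last name with a single space between them.
  def full_name
    "#{first_name} #{last_name}"
  end
end

puts User.new(first_name: "Abraham", last_name: "Lincoln").full_name
# => Abraham Lincoln
```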

Now the test passes.

Code smell: long, procedural background jobs (and how to fix it)

I’ve worked on a handful of Rails applications that have pretty decent test coverage in terms of model specs and feature specs but are lacking in tests for background jobs. It seems that developers often find background jobs more difficult to test than Active Record models—otherwise the test coverage would presumably be better in that area. I’ve found background jobs of 100+ lines in length with no tests, even though the same application’s Active Record models had pretty good test coverage.

I’m not sure why this is. My suspicion is that background job code is somehow viewed as “separate” from the application code. For some reason the background job code doesn’t need tests, and for some reason the background job code wouldn’t benefit from an object-oriented structure like the rest of the application.

Whatever the root cause of this issue, the fix is fortunately often pretty simple. We can apply the extract class refactoring technique to turn the long, procedural background job into one or more neatly abstracted classes.

Just as I would when embarking on the risky task of refactoring any piece of untested legacy code, I would use characterization testing to help me make sure I’m preserving the original behavior of the background job code as I move it into its own class.

A Rails testing “hello world” using RSpec and Capybara

The following is an excerpt from my book, Rails Testing for Beginners.

What we’re going to do

One of the biggest challenges in getting started with testing is just that—the simple act of getting started. What I’d like is for you, dear reader, to get a small but meaningful “win” under your belt as early as possible. If you’ve never written a single test in a Rails application before, then by the end of this chapter, you will have written your first test.

Here’s what we’re going to do:

    • Initialize a plain-as-possible Rails application
    • Create exactly one static page that says “Hello, world!”
    • Write a single test (using RSpec and Capybara) that verifies that our static page in fact says “Hello, world!”

The goal here is just to walk through the motions of writing a test. The test itself is rather pointless. I would probably never write a test in real life that just verifies the content of a static page. But the goal is not to write a realistic test but to provide you with a mental “Lego brick” that you can combine with other mental Lego bricks later. The ins and outs of writing tests get complicated quick enough that I think it’s valuable to start with an example that’s almost absurdly simple.

The tools we’ll be using

For this exercise we’ll be using RSpec, anecdotally the most popular of the Rails testing frameworks, and Capybara, a library that enables browser interaction (clicking buttons, filling out forms, etc.) using Ruby. You may find some of the RSpec or Capybara syntax confusing or indecipherable. You also may well not understand where the RSpec stops and the Capybara starts. For the purposes of this “hello world” exercise, that’s okay. Our goal right now is not deep understanding but to begin putting one foot in front of the other.

Initializing the application

Run the “rails new” command

First we’ll initialize this Rails app using the good old rails new command.
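The command isn’t reproduced in this excerpt; given the application name used throughout this post, it would be:

```shell
$ rails new hello_world -T
```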

You may be familiar with the -T flag. This flag means “no tests”. If we had done rails new hello_world without the -T flag, we would have gotten a Rails application with a test directory containing MiniTest tests. I want to use RSpec, so I want us to start with no MiniTest.

Let’s also create our application’s database at this point since we’ll have to do that eventually anyway.
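That step would look something like this:

```shell
$ cd hello_world
$ rails db:create
```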

Update our Gemfile

This is the step where RSpec and Capybara will start to come into the picture. Each library will be included in our application in the form of a gem. We’ll also include two other gems, selenium-webdriver and chromedriver-helper, which are necessary in order for Capybara to interact with the browser.

Let’s add the following to our Gemfile under the :development, :test group. I’ve added a comment next to each gem or group of gems describing its role. Even with the comments, it may not be abundantly clear at this moment what each gem is for. At the end of the exercise we’ll take a step back and talk about which library enabled which step of what we just did.
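The original listing isn’t reproduced in this excerpt; based on the gems named above, it would look something like this:

```ruby
# Gemfile
group :development, :test do
  gem "rspec-rails"          # the RSpec testing framework, adapted for Rails
  gem "capybara"             # lets our tests drive a browser using Ruby
  gem "selenium-webdriver"   # the browser-automation layer Capybara drives
  gem "chromedriver-helper"  # installs the driver Selenium needs for Chrome
end
```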

Don’t forget to bundle install.

Install RSpec

Although we’ve already installed the RSpec gem, we haven’t installed RSpec into our application. Just like the Devise gem, for example, which requires us not only to add devise to our Gemfile but to also run rails g devise:install, RSpec installation is a two-step process. After we run this command we’ll have a spec directory in our application containing a couple config files.
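The command in question:

```shell
$ rails g rspec:install
```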

Now that we’ve gotten the “plumbing” work out of the way, let’s write some actual application code.

Creating our static page

Let’s generate a controller, HelloController, with just a single action, index.
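The generator command isn’t shown in this excerpt; it would be:

```shell
$ rails g controller hello index
```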

Let’s modify the action’s template so it just says “Hello, world!”.

Just as a sanity check, let’s start the Rails server and open up our new page in the browser to make sure it works as expected.

We have our test infrastructure. We have the feature we’re going to test. Now let’s write the test itself.

Writing our test

Write the actual test

Let’s create a file called spec/hello_world_spec.rb with the following content:
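The listing is missing from this excerpt; a version consistent with the rest of the walkthrough (the spec type and description strings are my reconstruction) would be:

```ruby
# spec/hello_world_spec.rb
require "rails_helper"

RSpec.describe "Hello world", type: :system do
  describe "index page" do
    it "shows the right content" do
      visit hello_index_path
      expect(page).to have_content("Hello, world!")
    end
  end
end
```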

If you focus on the “middle” two lines, the ones starting with visit and expect, you can see that this test takes the form of two steps: visit the hello world index path and verify that the page there says “Hello, world!” If you’d like to understand the test in more detail, below is an annotated version.

Now that we’ve written and broken down our test, let’s run it!

Run the test

This test can be run by simply typing rspec followed by the path to the test file.
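In this case:

```shell
$ rspec spec/hello_world_spec.rb
```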

The test should pass. There’s a problem with what we just did, though. How can we be sure that the test is testing what we think it’s testing? It’s possible that the test is passing because our code works. It’s also possible that the test is passing because we made a mistake in writing our test. This concern may seem remote but “false positives” happen a lot more than you might think. The only way to be sure our test is working is to see our test fail first.

Make the test fail

Why does seeing a test fail prior to passing give us confidence that the test works? What we’re doing is exercising two scenarios. In the first scenario, the application code is broken. In the second scenario, the application code is working correctly. If our test passes under that first scenario, the scenario where we know the application code is broken, then we know our test isn’t doing its job properly. A test should always fail when its feature is broken and always pass when its feature works. Let’s now break our “feature” by changing the text on our page to something wrong.

When we run our test now, we should see it fail with an error message like expected to find text "Hello, world!" in "Jello, world!".

Watch the test run in the browser

Add the following to the bottom of spec/rails_helper.rb:
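The snippet isn’t reproduced in this excerpt; one configuration that makes Capybara drive a visible Chrome window (an assumption of mine, not necessarily the book’s exact code) is:

```ruby
# spec/rails_helper.rb
Capybara.default_driver = :selenium_chrome
```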

Then run the test again:

This time, Capybara should pop open a browser where you can see our test run. When I’m actively working with a test and I want to see exactly what it’s doing, I often find the running of the test to be too fast for me, so I’ll temporarily add a sleep wherever I want my test to slow down. In this case I would put the sleep right after visiting hello_index_path.


There we have it: the simplest possible Rails + RSpec + Capybara test. Unfortunately “the simplest possible Rails + RSpec + Capybara test” is still not particularly simple in absolute terms, but it’s pretty simple compared to everything that goes on in a real production application.

Who’s to blame for bad code? Coders.

When time pressure from managers drives coders to write poor-quality code, some people think the responsible party is the managers. I say it’s the coders. Here’s why I think so.

The ubiquity of time pressure in software development work

I’ve worked on a lot of development projects with a lot of development teams. In the vast majority of cases, if not all cases, the development team is perceived as being “behind”. Sometimes the development team’s progress is slower than the pace the development team themselves projected. Sometimes development progress is seen as behind relative to some arbitrary, unrealistic deadline set by someone outside the development team. Either way, no one ever comes by and says to the development team, “Wow, you guys are way ahead of where we expected!”

Often leadership will beg, plead, cajole, flatter, guilt-trip or bribe in order to increase the chances that their timeline will be met. They’ll ask the developers things like, “How can we ‘get creative’ to meet this timeline?” They’ll offer to have lunch, dinner, soda and alcohol brought in for a stretch of time so the developers can focus on nothing but the coding task at hand. Some leaders, lacking a sense of short-term/long-term thinking, will ask the developers to cut corners, to consciously incur technical debt in order to work faster. They might say things like “users don’t care about code” or “perfect is the enemy of the good”.

Caving to the pressure

Unfortunately, in my experience, most developers, most of the time, cave to the pressure to cut corners and deliver poor quality work. If you think I’m mistaken, I would ask you to show me a high-quality production codebase. There are very few in existence.

I myself am by no means innocent. I’ve caved to the pressure and delivered poor-quality work. But that doesn’t mean I think it’s excusable to do poor work or even (as some developers seem to believe) heroic.

Why bad code is the fault of coders, not managers

Why are coders, not managers, the ones responsible for bad code? Because coders are the ones who wrote the code. It was their fingers on the keyboard. The managers weren’t holding a gun to the developers’ heads and demanding they do a bad job.

“But,” a developer might say, “I had to cut corners in order to meet the deadline, and if I didn’t meet the deadline, I might get fired.” I don’t believe this reasoning stands up very well to scrutiny. I’d like to examine this invalid excuse for doing poor work as well as some others.

Invalid excuses for shipping poor work

Fallacy: cutting corners is faster

Why exactly is “bad code” bad? Bad code is bad precisely because it’s slow and expensive to work with. Saying, “I know this isn’t the right way to do it but it was faster” is a self-contradiction. The exact reason “the right way” is the right way is because it’s faster. Faster in the long run, of course. Sloppy, rushed work can be faster in the very short-term, of course, but any speed gain is usually negated the next time someone has to work with that area of code, which could be as soon as later that same day.

Fallacy: “they didn’t give me enough time to do a good job”

If I’m not given enough time to do a good job, the answer is not to do a bad job. The answer is that that thing can’t be done in that amount of time.

If my boss comes to me and says, “Can we do X in 6 weeks” and my response is “It would take 12 weeks to do X, but if we cut a lot of corners we could do it in 6 weeks” then all my boss hears is “WE CAN DO IT IN 6 WEEKS!”

Given the option between hitting a deadline with “bad code” and missing a deadline, managers will roughly always choose hitting the deadline with bad code. In this way managers are like children who ask over and over to eat candy for dinner. They can’t be trusted to weigh the pros and cons and make the right decision.

Fallacy: cutting corners is a business decision

It’s my boss’s role to tell me what to do and when to do it. It’s my job to, for the most part, comply. As a programmer I typically won’t have visibility into business decisions the way my boss does. As a programmer I will often not have the right skillset or level of information necessary to have a valid opinion on most business matters.

This idea that my boss can tell me what to do only applies down to a certain level of granularity though. I should let my boss tell me what to do but I shouldn’t let my boss tell me how to do it. I care about the professional reputation I have with my colleagues and with myself. I think there’s a certain minimum level of quality below which professionals should not be willing to go, even though it’s always an option to go lower.

Fallacy: “we can’t afford perfection”

I wish I had a dollar for every time I heard somebody say something like, “Guys, it doesn’t need to be perfect, it just needs to be good enough.”

The fallacy here is that the choice is between “perfect” and “good”. That’s a mistaken way of looking at it. The standard mode of operation during a rush period for developers is to pile sloppy Band-Aid fix on top of sloppy Band-Aid fix as if they desire nothing more strongly in life than to paint themselves into a corner from which they will never be able to escape. The alternative to this horror show isn’t “perfection”. The alternative is just what I might call a “minimally good job”.

Fallacy: “if I don’t comply with instructions to cut corners, I’ll get fired”

This might be true a very very very small percentage of the time. Most of the time it’s not, though. It’s pretty hard to get fired from a job. Firing someone is painful and expensive. Usually someone has to rack up many offenses before they finally get booted.

Fallacy: “we’ll clean it up after the deadline”

I find it funny that some people can make this claim with a straight face. Fixing sloppy work after the deadline “when we have more time” or “when things slow down” never happens. Deadlines, such as a go-live date, often translate to increased pressure and time scarcity because now there’s more traffic and visibility on the system. Deadlines are usually followed by more deadlines. You never get “more time”. Pressure rarely goes down over time. It usually goes up.

Fallacy: “I would have done a better job, but it wasn’t up to me”

Nope. I alone am responsible for the work I do. If someone pressured me to do the wrong thing and in response I chose to do the wrong thing, then I’m responsible for choosing to cave to the pressure.

Fallacy: “I haven’t been empowered to make things better”

It’s rare that a superior will come along and tell me, “Jason, you explicitly have permission to make things better around here.” The reality is that it’s up to me to observe what needs to be done better and help make it happen. I’m a believer in the saying that “it’s easier to get forgiveness than permission”. I tend to do what my professional judgment tells me needs to be done without necessarily asking my boss if it’s okay. Does this get me in trouble sometimes? Yes. I look at those hand-slaps as the cost of self-empowerment.

Plus, all employees have power over their superiors in that they have the power to leave. I don’t mean this in a “let me have my way or I quit!” way. I mean that if I work somewhere that continually tries to force me to do a bad job, I don’t have to just accept it as my fate in life. I can go work somewhere else. That doesn’t fix anything for the organization I’m leaving (bad organizations and bad people can’t be fixed anyway) but it does fix the problem for me.

My plea to my colleagues: don’t make excuses for poor work

I think developers make too many excuses and apologies for bad code. (And as a reminder, “bad code” means code that’s risky, time-consuming and expensive to work with.)

It’s not realistic to do a great job all the time, but I think we can at least do a little better. Right now I think we as a profession err too much on the side of justifying poor-quality work. I think we could stand to err more on the side of taking responsibility.

Sometimes it’s better for tests to hit the real API

A mysteriously broken test suite

The other day I ran my test suite and discovered that nine of my tests were suddenly and mysteriously broken. These tests were all related to the same feature: a physician search field that hits something called the NPPES NPI Registry, a public API.

My first thought was: is something wrong with the API? I visited the NPI Registry site and, sure enough, I got a 500 error when I clicked the link to their API URL. The API appeared to be broken. This was a Sunday so I didn’t necessarily expect the NPI API maintainers to fix it speedily. I decided to just ignore these nine specific tests for the time being.

By mid-Monday, to my surprise, my tests were still broken and the NPI API was still giving a 500 error. I would have expected them to have fixed it by then.

The surprising root cause of the bug

On Tuesday the API was still broken. On Wednesday the API was still broken. At this point I thought okay, maybe it’s not the case that the NPI API has just been down for four straight days with the NPI API maintainers not doing anything about it. Maybe the problem lies on my end. I decided to do some investigation.

Sure enough, the problem was on my end. Kind of. Evidently the NPI API maintainers had made a small tweak to how the API works. In addition to the fields I had already been sending like first_name and last_name, the NPI API now required an additional field, address_purpose. Without address_purpose, the NPI API would return a 500 error. I added address_purpose to my request and all my tests started passing again. And, of course, my application’s feature started working again.

The lesson: my tests were right to hit the real API

When I first discovered that nine of my tests were failing due to a broken external API, my first thought was, “Man, maybe I should mock out those API calls so my tests don’t fail when the API breaks.” But then I thought about it a little harder. The fact that the third-party API was broken meant my application was broken. The tests were failing because they ought to have been failing.

It’s been my observation that developers new to testing often develop the belief that “third-party API calls should be mocked out”. I don’t think this is always true. I think the decision whether to mock third-party APIs should be made on a case-by-case basis depending on certain factors. These factors include, but aren’t limited to:

  • Would allowing my tests to hit the real API mess up production data?
  • Is the API rate-limited?
  • Is the API prone to sluggishness or outages?

(Thanks to Kostis Kapelonis, who I interviewed on The Ruby Testing Podcast, for helping me think about API interaction this way.)

If hitting the real API poses no danger to production data, I would probably be more inclined to let my tests hit the real API than to mock out the responses. (Or, to be more precise, I would want my test suite to include some integration tests that hit the real API, even if I might have some other API-related tests that mock out API responses for economy of speed.) My application is only really working if the API is working too.

Refactorings should be atomic

It’s often seen as wise not to try to make huge, sweeping improvements to a codebase all in a go. I couldn’t agree more. Such changes carry way too much risk.

An alternative way of approaching codebase improvements is to apply the Boy Scout rule, “always leave the campground a little cleaner than you found it”. I think this rule is a pretty good one and appropriate under some circumstances, although I think some caution is in order as to how exactly to interpret the rule. The main danger of this rule is that if interpreted the wrong way, it discourages atomicity.

Tiny atomic refactorings

I have a mantra I repeat to myself regularly: “never underestimate your own ability to screw stuff up”. Seemingly safe, innocuous, inconsequential one-line changes have an amazing ability to introduce surprising bugs.

For this reason, I virtually never mix refactorings or cleanups in with my feature work or bugfixes, even very tiny ones, unless the thing I’m cleaning up is directly in the path of the work I need to do to complete my feature or bugfix. Let’s say for example that I’m adding a new capability to the sales report. I don’t want to toss in a small cleanup and accidentally break the password reset feature.

If I discover a tiny improvement I want to make, e.g. replacing a single block like { |a| a.user } with a shorthand equivalent, I’ll make a separate commit just with that tiny improvement instead of letting it become a “rider” on an unrelated commit. This is basically just an application of the idea of atomic commits.

Bigger atomic refactorings

Sometimes a refactoring isn’t all that small. For example, sometimes you want to take a handful of classes and move them into a new namespace to give some structure to the files in your application.

In these cases I’ll try to create one single commit that effects the entire refactoring. This way, if I get 75% done with the refactoring and realize my whole premise is a blunder, I can just git checkout . && git clean -df and cleanly go back to the way things were before I started my refactoring work.

When to perform refactorings

In my experience there are two good times to perform a refactoring. If there’s an area of the code that I need to do some work in but it’s messy, I’ll perform a refactoring (or multiple refactorings) in that area of code as a separate piece of work before I start making my change in that area. I think of this as “cleaning the kitchen before I make dinner”. It’s faster and more enjoyable to make dinner in a clean kitchen than in a messy one.

One good time to perform a refactoring is before a change. The other is after a change. If I make a change to an area of code and find that afterward that area is unacceptably messy, I’ll perform a refactoring—again, as a separate piece of work—to clean it up. By the way, it’s possible to end up with a messy area of a codebase by simply writing poor-quality code, but that’s not the only way bad code gets introduced. The accumulation of technical debt is natural and unavoidable, and the only way to keep it from piling up is to periodically perform refactorings.

The one time that I don’t think is a good time to perform refactorings is in the middle of a change. If the refactoring introduces a bug but the refactoring change is mixed with a change that you want to keep, it can be hard to back out the refactoring change without also affecting the “good” change. It’s also harder to review a pull request that contains both a change and a refactoring than it is to review a pull request that only contains a change. In order to have refactorings as a separate pull request, this means that the development team (and perhaps technical leadership) will have to be on board with the idea that you’ll regularly have PRs that contain nothing but a refactoring.

By the way, credit goes to Ben Orenstein (who I interviewed on my podcast) for teaching me that just before a change and just after a change are good times to do refactorings.

When refactorings can’t be atomic

Sometimes a refactoring is too big to be atomic. Sometimes I embark upon a “refactoring project” that will touch many areas of the codebase over the course of multiple weeks or months. In these cases I deliberately spread the refactoring project into small pieces over time as a way of mitigating risk. I’ve found though that most big refactorings, while not atomic as a whole, can be split into a number of smaller refactorings that are themselves individually atomic.

How I write characterization tests

What characterization tests are and what problem they solve

Let’s say I come across a piece of code that I want to refactor but unfortunately a) I don’t understand what it does and b) it’s not covered by tests. Refactoring code that I don’t understand and that isn’t covered by tests is risky business. But how can I write tests for the code if I don’t even know what it’s supposed to do?

Luckily there’s a technique called Characterization Testing, which I first discovered in Michael Feathers’ book Working Effectively with Legacy Code, and which makes solving this challenge pretty straightforward. I’m going to show you how I do it.

What I’m going to show you

Below is a chunk of code I found in an old project of mine. The two methods below, services_with_info and products_with_info, have a couple problems. One is that they’re nearly exact duplicates of each other. Another is that the Appointment class, which clocks in at 708 lines of code (most of which I’ve omitted out of mercy for the reader), is doing way too much stuff, and these two methods aren’t helping. These two methods should probably belong to some other abstraction so Appointment can focus on being an Appointment and not have to worry about serializing its appointment_services and appointment_products.
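The original listing isn’t reproduced in this excerpt. Based on the details that come up later in this walkthrough (the to_h block over @appointment_services, the price formatting via number_with_precision), the two methods were shaped roughly like this; the association and attribute names are my guesses:

```ruby
class Appointment < ApplicationRecord
  # ...roughly 700 other lines omitted...

  def services_with_info
    appointment_services.to_h { |item|
      [item.id, {
        name: item.service.name,
        price: number_with_precision(item.price, precision: 2)
      }]
    }
  end

  def products_with_info
    appointment_products.to_h { |item|
      [item.id, {
        name: item.product.name,
        price: number_with_precision(item.price, precision: 2)
      }]
    }
  end
end
```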

Refactoring the services_with_info method

Of these two methods I’m going to focus first on services_with_info. My goal is to take the body of the method, ask myself “What currently-nonexistent abstraction does this group of lines represent?”, create that abstraction, and move the body of the method there.

I might come up with an abstraction that I’m happy with on the first try, I might not. The first abstraction idea I’ll try is a new abstraction called AppointmentServiceCollection. Below is the class. For the most part all I’ve done is copy and paste the contents of services_with_info into this new class.
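The class listing is missing from this excerpt; copying the method body into a new class per the description would give roughly the following (again, the attribute names are my guesses):

```ruby
class AppointmentServiceCollection
  def initialize(appointment_services)
    @appointment_services = appointment_services
  end

  def to_h
    @appointment_services.to_h { |item|
      [item.id, {
        name: item.service.name,
        price: number_with_precision(item.price, precision: 2)
      }]
    }
  end
end
```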

First goal: simply instantiate the object inside a test

When I start writing my test for this new class I’m going to put in as little effort as possible. My first question to myself will be: can I even instantiate an AppointmentServiceCollection?

Second goal: see what value the method-under-test returns

When I run the test it passes. So far so good. My next step will be to add a very silly assertion.

I of course know that collection.to_h won’t equal asdf, but I don’t know what it will equal, so any value I check for will be equally wrong. No point in exerting any mental effort guessing what the output will be. The test result will tell me soon enough anyway.

When I run my test now, here’s what I get:

That’s unexpected. I could scratch my head and puzzle over why this is happening, but instead I’ll just comment out code until the error goes away.

When I run the test again I get the same error again. I’ll comment out more code.

Now, not too surprisingly, I get a different error instead:

This makes a lightbulb go on for me. Clearly the problem lies in the @appointment_services.to_h { |item| line. The problem must be the to_h. What if I try map instead?

That works. Instead of the “wrong element type” error I was getting before, I’m now getting a different error that’s more in line with my expectations:

Now let me uncomment the full body of the to_h method and see what happens.

Now I get a different error. It doesn’t like my use of number_with_precision.

Luckily I happen to know exactly how to fix this. In order to use the number_with_precision helper in a model (as opposed to in a view), you need to include the ActionView::Helpers::NumberHelper module.
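The fix is a one-line include at the top of the class:

```ruby
class AppointmentServiceCollection
  include ActionView::Helpers::NumberHelper

  # ...rest of the class unchanged...
end
```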

The return values, revealed

With this error fixed, I now get my most interesting test result yet.

The reason I say this result is interesting is because instead of an error I get the actual value that the method is returning. The secrets of the universe are being revealed to me.

More realistic assertions

At this point I’m able to come up with some more realistic values to put in my test. I know that my test will fail because my values are just arbitrary and made up, but I feel confident that my values are pretty close.

I run my test just for fun and it of course fails. My next step is to make the test setup match my assertion values.

What I do after I get the test passing

The test now passes. So far I’ve touched my application code as little as possible because I wanted to maximize the chances of preserving the original functionality. Now that I have a test in place, my test can do the job of preserving the original functionality, and I’m free to clean up and refactor the code.

I’ll also want to clean up my test code.

Lastly, I’ll replace the original services_with_info method body with my new abstraction.

Here’s the new version, which simply delegates to AppointmentServiceCollection.
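Sketched out, with the surrounding Appointment details left illustrative:

```ruby
class Appointment
  # Before, services_with_info contained all the hash-building logic inline.
  # Now it simply delegates to the new abstraction. (appointment_services
  # stands in for the ActiveRecord association in the real model.)
  def services_with_info
    AppointmentServiceCollection.new(appointment_services).to_h
  end
end
```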

What we’ve accomplished

This is only a drop in the bucket, but it helps. “A journey of a thousand miles begins with a single step,” as they say. This area of the code is now slightly more understandable. Next I could give the products_with_info method the same treatment. Perhaps I could extract the duplication between the two methods into a superclass. If I apply this tactic to the Appointment class over and over, I could probably whittle it down from its current 708 lines to the more reasonable size of perhaps a few tens of lines.

How to clear up obscure Rails tests using Page Objects

The challenge of keeping test code clean

The hardest part of a programmer’s job isn’t usually figuring out super hard technical problems. The biggest challenge for most developers, in my experience, is to write code that can stand up over time without collapsing under the weight of its own complexity.

Just as it’s challenging to keep a clean and understandable codebase, it’s also challenging to keep a clean and understandable test suite, and for exactly the same reasons.

If I look at a test file for the first time and I’m immediately able to grasp what the test is doing and why it’s doing it, then I have a clear test. The test has a high signal-to-noise ratio. That’s good.

The opposite scenario is when the test is full of noise. Perhaps the test contains so many low-level details that the high-level meaning of the test is obscured. The term for this condition, this “test smell”, is Obscure Test. That’s bad. If the test code is obscure when it could have been clear, that creates extra head-scratching time.

One tool that can be used to help make Obscure Tests more understandable is the concept of a Page Object.

What a Page Object is and when it’s helpful

As a preface: Page Objects are only relevant when dealing with end-to-end tests—that is, tests that interact with the browser. Page Objects don’t come into the picture for model tests or anything else that doesn’t interact with a browser.

A Page Object is an abstraction of a component of a web page. For example, I might create a Page Object that represents the sign-in form for a web application. (In fact, virtually all my Page Objects represent forms, since that’s where I find them to be most helpful.)

The idea is that instead of using low-level, “computerey” commands to interact with a part of a page, a Page Object allows me to use higher-level language that’s more easily understandable by humans. Just as Capybara is an abstraction on top of Selenium that’s clearer for a programmer to read and write than using Selenium directly, Page Objects can be an abstraction layer on top of Capybara commands that’s clearer for a programmer to read and write than using Capybara directly (at least that’s arguably the case, and only in certain situations, and when done intelligently).

Last note on what a Page Object is and isn’t: I personally started with the misconception that a Page Object is an abstraction of a page. According to Martin Fowler, that’s not the idea. A Page Object is an abstraction of a certain part of a page. There’s of course no law saying you couldn’t create an abstraction that represents a whole page instead of a component on a page, but having done it both ways, I’ve found it more useful to create Page Objects as abstractions of components rather than of entire pages. I find that my Page Objects come together more neatly with the rest of my test code when that’s the case.

A Page Object example

I’m going to show you an example of a Page Object by showing you the following:

  1. A somewhat obscure test
  2. A second, cleaner version of the same test, using a Page Object
  3. The Page Object code that enables the second version of the test

Here’s the obscure test example. What exactly it does is not particularly important. Just observe how hard the test code is to understand.

The obscure test
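The original test isn’t reproduced here, but it looked something like the reconstruction below. The labels and values are illustrative, and the sketch assumes the app under test, so it’s not runnable on its own.

```ruby
scenario 'updating OCT module data', js: true do
  visit appointment_path(appointment)
  click_on 'OCT'
  click_on 'OCT'
  fill_in 'OD', with: '300'
  click_on 'Save'
  click_on 'OCT'
  click_on 'OCT'
  fill_in 'OD', with: '305'
  click_on 'Save'
  visit appointment_path(appointment)
  expect(page).to have_content('305')
end
```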

I find this test pretty hard to follow, and I’m the one who wrote the code. Why are we doing click_on 'OCT' twice in a row? Is that necessary or is that a mistake? (After I re-familiarized myself with the UI that this test exercises, I found that click_on 'OCT' is in fact necessary. The first click_on 'OCT' selects OCT from a menu of options, and as a result a new OCT button is added to the page. The second click_on 'OCT' clicks the new OCT button.)

Let’s see if we can improve this code with the help of a Page Object.
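Here’s a reconstruction of the cleaned-up version. ChartModuleForm and its method names are illustrative rather than the original app’s, and as above, the sketch assumes the app under test.

```ruby
scenario 'updating OCT module data', js: true do
  form = ChartModuleForm.new(page)
  form.add_module_to_chart_appointment('OCT')
  form.save_module_data('OD' => '300')
  form.save_module_data('OD' => '305')
  expect(form.persisted_module_data).to include('OD' => '305')
end
```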

I dare say that that’s much better. Even without understanding the domain-specific terms in this test, you can see now that we’re 1) adding a module to something called a “chart appointment”, then we’re 2) saving some module data, 3) saving some different module data, then finally 4) expecting that the most recent module data is what actually got persisted.

Here’s the Page Object code that made this cleanup possible:
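The sketch below shows the general shape: the Page Object wraps the Capybara session and exposes domain-level methods. The class and method names are illustrative, not necessarily the originals.

```ruby
class ChartModuleForm
  def initialize(session)
    @session = session
  end

  def add_module_to_chart_appointment(module_name)
    # The duplicate click is intentional: the first click selects the module
    # from a menu, which adds a new button to the page, and the second click
    # clicks that new button.
    @session.click_on(module_name)
    @session.click_on(module_name)
  end

  def save_module_data(values)
    values.each { |label, value| @session.fill_in(label, with: value) }
    @session.click_on('Save')
  end
end
```

One nice property of this design is that the Page Object takes the session as a collaborator, so its behavior can be exercised against a fake session in isolation.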

I find that Page Objects only start to become valuable beyond a certain threshold of complexity. I tend to see how far I can get with a test before I consider bringing a Page Object into the picture. It might be several weeks or months between the time I first write a test and the time I modify it to use a Page Object. But in the cases where a Page Object is useful, I find them to be a huge help in adding clarity to my tests.

RSpec mocks and stubs in plain English

One of the most common questions I see from beginners to Rails testing is what mocks and stubs are and when to use them. If you’re confused about mocks and stubs, you’re not alone. In my experience very few people understand them. In this post I’ll attempt to help clarify the matter, particularly in the context of Rails/RSpec.

I want to preface my explanation with a disclaimer. This post is partly derived from personal experience but it’s mostly derived from me just reading a bunch of posts online and reading a few books to try to understand mocks and stubs better. It’s entirely possible that what follows is not 100% accurate. But I’m risking the inaccuracy because I think the internet needs a clear, simple explanation of mocks and stubs and so far I haven’t been able to find one.

In this post we’ll cover the problem with testing terminology, what test doubles are, what stubs and mocks are and how they differ, an example of each, and why I don’t often use mocks or stubs in Rails.

The problem with testing terminology

The field of automated testing has about a million different terms and there’s not a complete consensus on what they all mean. For example, are end-to-end tests and acceptance tests the same thing? Some people would say yes, some would say no, and there’s no central authority to say who’s right and who’s wrong. That’s a problem with testing terminology in general.

A problem with mocks and stubs in particular is that programmers are often sloppy with the language. People say mock when they mean stub and vice versa.

The result of these two issues is that many explanations of mocks and stubs are very very very confusing.

Test doubles

I had a lightbulb moment when I read in Gerard Meszaros’ xUnit Test Patterns that mocks and stubs are each special types of test doubles.

To me this was a valuable piece of truth. Mocks and stubs are both types of test doubles. I can understand that. That’s a piece of knowledge that’s not likely to be invalidated by something else I’ll read later. My understanding has permanently advanced a little bit.

What’s a test double? The book xUnit Test Patterns (which I understand actually coined the term “test double”) likens a test double to a Hollywood stunt double. From the book’s online explanation of test doubles:

When the movie industry wants to film something that is potentially risky or dangerous for the leading actor to carry out, they hire a “stunt double” to take the place of the actor in the scene. The stunt double is a highly trained individual who is capable of meeting the specific requirements of the scene. They may not be able to act, but they know how to fall from great heights, crash a car, or whatever the scene calls for. How closely the stunt double needs to resemble the actor depends on the nature of the scene. Usually, things can be arranged such that someone who vaguely resembles the actor in stature can take their place.

The example I use later in this post is the example of a payment gateway. When we’re testing some code that interacts with a payment gateway, we of course don’t want our test code to actually hit e.g. the production Stripe API and charge people’s credit cards. We want to use a test double in place of the real payment gateway. The production Stripe API is like Tom Cruise in Mission Impossible, and the payment gateway test double is of course like the stunt double.

Stub explanation

The best explanation of mocks and stubs I’ve been able to find online is this post by a guy named Michal Lipski.

I took his explanation, mixed it in my brain with other stuff I’ve read, and came up with this explanation:

A Test Stub is a fake object that’s used in place of a real object for the purpose of getting the program to behave the way we need it to in order to test it. A big part of a Test Stub’s job is to return pre-specified hard-coded values in response to method calls.

You can visit the xUnit Patterns Test Stub page for a more detailed and precise explanation. My explanation might not be 100% on the mark. I’m going for clarity over complete precision.

When would you want to use a test stub? We’ll see in my example code shortly.

Mock explanation

A Mock Object is a fake object that’s used in place of a real object for the purpose of listening to the methods called on the mock object. The main job of a Mock Object is to ensure that the right methods get called on it.

Again, for a more precise (but harder to understand) explanation, you can check out the xUnit Patterns Mock Object page.

We’ll also see a mock object use case in my example code.

The difference between mocks and stubs

As I understand it, and to paint with a very broad brush, Test Stubs help with inputs and Mock Objects help with outputs. A Test Stub is a fake thing you stick in there to trick your program into working properly under test. A Mock Object is a fake thing you stick in there to spy on your program in the cases where you’re not able to test something directly.

Again, I’m going for conciseness and clarity over 100% accuracy here. Now let’s take a look at a concrete example.

Example application code

Below is an example Ruby program I wrote. I tried to write it to meet the following conditions:

  • It’s as small and simple as possible
  • It would actually benefit from the use of one mock and one stub

My program involves three classes:

  • Payment: meant to simulate an ActiveRecord model
  • PaymentGateway: simulates a third-party payment gateway (e.g. Stripe)
  • Logger: logs payments

The code snippet below includes the program’s three classes as well as a single test case for the Payment class.

Our test has two problems.

One is that it hits the real production API and alters production data. (My code doesn’t really do this; you can pretend that my puts statement hits a real payment gateway and charges somebody’s credit card.)

The other problem is that the test doesn’t verify that the payment gets logged. We could comment out the @logger.record_payment(response[:payment_id]) line and the test would still pass.

When we run the test, the output includes the jarring “THIS HITS THE PRODUCTION API” message.

Let’s first address the problem of altering production data. We can do this by replacing PaymentGateway with a special kind of test double, a stub.

Stub example

Below I’ve replaced payment_gateway = PaymentGateway.new with payment_gateway = double(). I’m also telling my new Test Double object (that is, my Test Stub) that it should expect to receive a charge method call, and when it does, return a payment id of 1234.

If we run this test now, we can see that we no longer get the jarring “THIS HITS THE PRODUCTION API” message. This is because our test no longer calls the charge method on an instance of PaymentGateway, it calls the charge method on a test double.

Our payment_gateway variable is no longer actually an instance of PaymentGateway, it’s an instance of RSpec::Mocks::Double.

That takes care of the hitting-production problem. What about verifying the logging of the payment? This is a job for a different kind of test double, a mock object (or just mock).

Mock example

Now let’s replace logger = Logger.new with logger = double(). Notice how RSpec doesn’t make a distinction between mocks and stubs. They’re all just Test Doubles. If we want to use a Test Double as a mock or as a stub, RSpec leaves that up to us and doesn’t care.

We’re also telling our new Mock Object that it needs to receive (not just can, but has to, and it will raise an exception if not) a record_payment method call with the value 1234.

Now that we have the line that says expect(logger).to receive(:record_payment).with(1234), our test is asserting that the payment gets logged. We can verify this by commenting out the @logger.record_payment(response[:payment_id]) line and running our test again. The test then fails with an error telling us the double never received the expected record_payment message.

Why I don’t often use mocks or stubs in Rails

Having said all this, I personally hardly ever use mocks or stubs in Rails. I can count on one hand all the times I’ve used mocks or stubs over my eight years so far of doing Rails.

The main reason is that I just don’t often have the problems that test doubles solve. In Rails we don’t really do true unit tests. For better or worse, I’ve never encountered a Rails developer, myself included, who truly wants to test an object completely in isolation from all other objects. Instead we write model tests that hit the database and have full access to every single other object in the application. Whether this is good or bad can be debated but one thing seems clear to me: this style of testing eliminates the need for test doubles the vast majority of the time.

Another reason is that I find that tests written using test doubles are often basically just a restatement of the implementation of whatever’s being tested. The code says @logger.record_payment and the test says expect(logger).to receive(:record_payment). Okay, what did we really accomplish? We’re not testing the result, we’re testing the implementation. That’s probably better than nothing but if possible I’d rather just test the result, and quite often there’s nothing stopping me from testing the result instead of the implementation.

Lastly, I personally haven’t found myself working on a lot of projects that use some external resource like a payment gateway API that would make test doubles useful. If I were to work on such a project I imagine I certainly would make use of test doubles, I just haven’t worked on that type of project very much.