Why use Factory Bot instead of creating test data manually?

by Jason Swett

I recently taught a Rails testing class where we wrote tests using RSpec, Capybara, Factory Bot and Faker.

During the class, one of the students asked, why do we use Factory Bot? What’s the advantage over creating test data manually?

The answer to this is perhaps most easily explained with an example. I’m going to show an example of a test setup that creates data manually, then a test setup that uses Factory Bot, so you can see the difference.

In both examples, the test setup is for a test that needs two Customer records to exist. A Customer object has several attributes and a couple of associations.

Non-Factory-Bot example

Below is what it looks like when I create my two Customer records manually.

The upside to this approach is that there’s no extra tooling involved. It’s just Active Record. There are a few downsides, though.

First, it’s wasteful and tedious to have to think of and type out all this fake data (e.g. “555-123-4567”).

Second, it’s going to be unclear to an outside reader what data is significant and what’s just arbitrary. For example, is it significant that John Smith lives in Minneapolis or could his address have been anywhere?

Third, all this test data adds noise to the test and makes it an Obscure Test. When a test contains a lot of setup data, it’s much harder to tell at a glance what the essence of the test is and what behavior it’s checking.

This example is tedious enough with only two associations on the Customer model (State and User). You can imagine how bad things might get when you have truly complex associations.

RSpec.describe Customer do
  before do
    @customers = Customer.create!([
      {
        first_name: 'John',
        last_name: 'Smith',
        phone_number: '555-123-4567',
        address_line_1: '123 Fake Street',
        city: 'Minneapolis',
        state: State.create!(name: 'Minnesota', abbreviation: 'MN'),
        zip_code: '55111',
        user: User.create!(
          email: 'john.smith@example.com',
          password: 'gdfkgfgasdf18233'
        )
      },
      {
        first_name: 'Kim',
        last_name: 'Jones',
        phone_number: '555-883-2283',
        address_line_1: '338 Notreal Ave',
        city: 'Chicago',
        state: State.create!(name: 'Illinois', abbreviation: 'IL'),
        zip_code: '60606',
        user: User.create!(
          email: 'kim.jones@example.com',
          password: 'eejkgsfg238231188'
        )
      }
    ])
  end
end

Factory Bot example

Here’s a version of the test setup that uses Factory Bot. It achieves the same result: the creation of two Customer records. The code for this version is obviously much more concise.

RSpec.describe Customer do
  before do
    @customers = FactoryBot.create_list(:customer, 2)
  end
end

This simple code is made possible through factory definitions. In this case there are three factory definitions: one for Customer, one for State and one for User.

In all three of the factory definitions I’m using an additional gem called Faker. Faker helps with the generation of things like random names, phone numbers, email addresses, etc.

Here are the three factory definitions.

FactoryBot.define do
  factory :customer do
    # Faker 2+ expects keyword arguments, e.g. characters(number: 10)
    first_name { Faker::Lorem.characters(number: 10) }
    last_name { Faker::Lorem.characters(number: 10) }
    phone_number { Faker::PhoneNumber.cell_phone }
    address_line_1 { Faker::Lorem.characters(number: 10) }
    city { Faker::Lorem.characters(number: 10) }

    # This line will generate an associated State record
    # using the factory definition for State
    state

    # Same here, but for User
    user
  end
end

FactoryBot.define do
  factory :state do
    name { Faker::Lorem.characters(number: 10) }
    abbreviation { Faker::Lorem.characters(number: 10) }
  end
end

FactoryBot.define do
  factory :user do
    email { Faker::Internet.email }
    password { Faker::Internet.password }
  end
end
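
For readers curious about what these definitions buy you, here is a plain-Ruby sketch of the mechanism. It is not Factory Bot’s implementation: `define_factory`, `build`, `build_list`, and the `Struct`-based `Customer` and `State` are hypothetical stand-ins (no gems, no database), and the default values are arbitrary. It shows three things a factory definition provides: lazily evaluated defaults, per-call overrides, and associations built from another factory.

```ruby
# Hypothetical, gem-free sketch of what a factory definition does.
# Struct stands in for the Active Record models; nothing is saved.
Customer = Struct.new(:first_name, :city, :state, keyword_init: true)
State    = Struct.new(:name, :abbreviation, keyword_init: true)

FACTORIES = {}

# Register a factory: a class plus one default-producing lambda per attribute.
def define_factory(name, klass, **defaults)
  FACTORIES[name] = [klass, defaults]
end

# Build one object. A default lambda runs only when the caller did not
# override that attribute, so an overridden association is never built.
def build(name, **overrides)
  klass, defaults = FACTORIES.fetch(name)
  attributes = defaults.to_h do |attr, default|
    [attr, overrides.fetch(attr) { default.call }]
  end
  klass.new(**attributes)
end

def build_list(name, count, **overrides)
  Array.new(count) { build(name, **overrides) }
end

define_factory(:state, State,
  name:         -> { "Some State" },
  abbreviation: -> { "SS" })

define_factory(:customer, Customer,
  first_name: -> { "Arbitrary" },
  city:       -> { "Anytown" },
  # Association: each customer gets its own State from the :state factory.
  state:      -> { build(:state) })

customers = build_list(:customer, 2)
customers.map(&:city)   # => ["Anytown", "Anytown"]

mpls = build(:customer, city: "Minneapolis")
mpls.city               # => "Minneapolis"
mpls.state.abbreviation # => "SS"
```

The override is what addresses the “significant vs. arbitrary” problem from the manual example: a test that cares about Minneapolis says so explicitly, and every other attribute stays a default. With the real gem, the comparable call would be something like `FactoryBot.create_list(:customer, 2, city: 'Minneapolis')`, which also saves the records.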

Takeaways

If you were wondering why exactly we use Factory Bot, the answer is that it makes our tests more convenient to write and more understandable to read. In addition to Factory Bot, the Faker gem can help take away some of the tedium of having to create test data values.

There’s also one other popular way to create test data: fixtures. Fixtures have the advantage of speeding up a test suite because they’re loaded only once, at the beginning of the test suite run, whereas factories typically run once per test. Still, I prefer factories because I feel they make tests easier to understand. You can read more about fixtures vs. factories here.

12 thoughts on “Why use Factory Bot instead of creating test data manually?”

  1. Peter Nagy

    I have some problems with Faker. It creates unstable tests when your code is vulnerable to certain corner-case values. These tests behave like a Heisenbug: they fail randomly, and because you cannot see which value Faker generated for the test, it’s a pain in the back to find the vulnerable part of the code.

    1. Carlos Schallenberger

      In this case I’d say that you should use a specific string that will always be valid in your factory.
      In the example below, suppose the name field couldn’t contain the character H (just an example):

      FactoryBot.define do
        factory :person do
          name { "James" } # fixed value that is always valid
          address { Faker::Address.full_address }
          age { Faker::Number.number(digits: 8) }
        end
      end

      Does it make sense?
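
Peter’s underlying problem is that a random failure can’t be replayed. One common mitigation, which doesn’t come up in this thread, is to seed the random generator and report the seed. In a real suite the hooks would be `Faker::Config.random = Random.new(seed)` for Faker and RSpec’s `--seed` option (its generated `spec_helper.rb` calls `Kernel.srand config.seed`). The gem-free sketch below only demonstrates the property those hooks rely on; `fake_word` and `generate_words` are hypothetical stand-ins for Faker calls.

```ruby
# Gem-free sketch: fake_word is a hypothetical stand-in for a Faker call.
# The key point is that it draws from an explicit, seedable Random instance.
ALPHABET = ("a".."z").to_a.freeze

def fake_word(rng, length = 10)
  Array.new(length) { ALPHABET[rng.rand(ALPHABET.size)] }.join
end

# Generate a batch of "random" test values from a single seeded generator.
def generate_words(seed, count = 3)
  rng = Random.new(seed)
  Array.new(count) { fake_word(rng) }
end

# Choose a seed once per run and print it, so a failing run can be replayed.
seed = Random.new_seed
puts "Random seed for this run: #{seed}"

original = generate_words(seed)
replay   = generate_words(seed) # e.g. a rerun using the printed seed
original == replay # => true: same seed, same generated data
```

Printing the seed answers the “you cannot see what value Faker generated” complaint indirectly: you may not see the value, but you can regenerate it exactly.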

  2. HT

    It’s important to use the right tool for the job. FactoryBot+Faker can make your tests dramatically slower to run if not used wisely. Sometimes Fixtures+ERB or a low-key home-baked factory solution is the way to go.

  3. Sam Livingston-Gray

    I ripped all of the Faker calls out of one of my test suites last year. We found that it was occasionally generating data that caused tests to fail (for illustration purposes: an extra “John Smith” in a test that creates three users, two of them named Smith, and then does something with all the Smiths). This was causing flaky specs in CI, which was bad enough. Even worse, every time I tried to use `rspec --bisect` to narrow a specific failure down to a minimal reproduction case, bisect basically ran forever, because the problematic data was only generated on the Nth call to Faker, so the failure would only happen when a specific set of tests was run in a specific order with a specific PRNG seed.

    I replaced all of the Faker calls with FactoryBot sequences, and our test suite has been *much* more stable ever since.
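
For reference, a Factory Bot sequence is declared inside a factory with something like `sequence(:email) { |n| "person#{n}@example.com" }`. The gem-free sketch below (a hypothetical `sequence` helper built on Ruby’s `Enumerator`, not Factory Bot’s code) shows why this sidesteps the problem described in the comment: every value comes from a counter, so values are unique within a run and no PRNG seed is involved.

```ruby
# Hypothetical, gem-free stand-in for a Factory Bot sequence.
# Each call to #next feeds the next integer (1, 2, 3, ...) to the block.
def sequence(&format)
  Enumerator.new do |yielder|
    n = 0
    loop { yielder << format.call(n += 1) }
  end
end

emails = sequence { |n| "person#{n}@example.com" }
emails.next # => "person1@example.com"
emails.next # => "person2@example.com"

# Unlike random data, a counter cannot produce a duplicate "Smith".
last_names = sequence { |n| "Smith#{n}" }
Array.new(1_000) { last_names.next }.uniq.size # => 1000
```

One caveat: with random test ordering, which test receives which number can still vary between runs, but the values themselves never collide.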

    1. Jason Swett Post author

      Very interesting. I haven’t experienced these pains myself yet but I can imagine how after the test suite gets to a certain size these sorts of issues would become more probable.

  4. Youri van der Lans

    Instead of defining the entire customer object within the test, you can also use fixtures as defaults. In your before hook you could then update the fixture to suit your needs.

    Personally, I feel Factory Bot is another DSL we have to learn and understand, when we could just use the default provided by Rails.

  5. Steven Chanin

    Nice write up. Thanks for taking the time to share.

    I noticed that the factory for State has a 10 character string for abbreviation:

    abbreviation { Faker::Lorem.characters(10) }

    I think you’d just want 2 characters

    abbreviation { Faker::Lorem.characters(2) }

    1. Jason Swett Post author

      Thanks! The reason for 10 is that there are only 36^2 = 1296 possibilities for 2 characters. With that small of a set, a collision is practically guaranteed, so I had to make the number bigger.
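
The reasoning about two-character abbreviations can be quantified with the birthday problem. Assuming, as the reply does, 36 possible symbols per character (a-z and 0-9), and assuming the `State` model enforces uniqueness on `abbreviation` (not shown in the post), the sketch below computes the chance that at least two of n generated abbreviations collide. `collision_probability` is a hypothetical helper name.

```ruby
# Birthday-problem probability that at least two of `draws` values,
# each chosen uniformly from `space` possibilities, are equal.
def collision_probability(draws, space)
  all_distinct = (0...draws).reduce(1.0) { |acc, i| acc * (space - i) / space }
  1.0 - all_distinct
end

two_chars = 36**2  # 1296 possible two-character strings
ten_chars = 36**10 # about 3.7e15 possible ten-character strings

collision_probability(42, two_chars)  # just under 0.5
collision_probability(43, two_chars)  # just over 0.5
collision_probability(100, two_chars) # about 0.98
collision_probability(100, ten_chars) # on the order of 1e-12
```

So with two-character abbreviations, a suite that creates a few dozen states already has roughly even odds of a duplicate, while ten characters makes a collision vanishingly unlikely.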

  6. Julián

    Why not Faker::Name.name and Faker::Name.last_name for names instead of Faker::Lorem.characters(10)?

  7. Thomas

    The only problem I sometimes encounter is that if a model has business logic (especially callbacks), you have to reload the model in your test to get the updated data. It feels tricky and unpredictable compared to a simple Rails model. I’d love a post on the limitations of factories compared to plain AR objects.
