Author Archives: Jason Swett

The difference between let, let! and instance variables in RSpec

The purpose of let and the differences between let and instance variables

RSpec’s let helper method is a way of defining values that are used in tests. Below is a typical example.

require 'rspec'

RSpec.describe User do
  let(:user) { User.new }

  it 'does not have an id when first instantiated' do
    expect(user.id).to be nil
  end
end

Another common way of setting values is to use instance variables in a before block like in the following example.

require 'rspec'

RSpec.describe User do
  before { @user = User.new }

  it 'does not have an id when first instantiated' do
    expect(@user.id).to be nil
  end
end

There are some differences between the let approach and the instance variable approach, with one in particular that’s quite significant.

Differences between let and instance variables

First, there’s the stylistic difference. The syntax is of course a little different between the two approaches; instance variables, for example, are prefixed with @. Some people might prefer one syntax over the other. I personally find the let syntax ever so slightly tidier.

There are also a couple of mechanical differences. Because of how instance variables work in Ruby, you can reference an undefined instance variable and Ruby won’t complain. This presents a slight danger. You could, for example, accidentally pass a misspelled (and therefore undefined) instance variable to a method, meaning you’d really be passing nil as the argument. You might then be testing something other than the behavior you meant to test. This danger is admittedly remote. Nonetheless, the let helper defines not an instance variable but a new method (specifically, a memoized method, which we’ll look at more closely shortly), meaning that if you misspell the method’s name, Ruby will certainly complain, which is of course good.
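
To illustrate that difference, here’s a minimal sketch (reusing the User example from above): the misspelled instance variable quietly evaluates to nil, while the misspelled let method raises a NameError.

require 'rspec'

RSpec.describe User do
  let(:user) { User.new }
  before { @user = User.new }

  it 'quietly returns nil for a misspelled instance variable' do
    expect(@usr).to be nil # typo: @usr instead of @user
  end

  it 'raises NameError for a misspelled let method' do
    expect { usr }.to raise_error(NameError) # typo: usr instead of user
  end
end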

The other mechanical difference is that let can create values that get evaluated lazily. I personally find this to be a dangerous and bad idea, which I’ll explain below, but it is a capability that the helper offers.

Perhaps the most important difference between let and instance variables is that instance variables, when set in a before block, can leak from one file to another. If for example an instance variable called @customer is set in “File A”, then “File B” can reference @customer and get the value that was set in File A. Obviously this is bad because we want our tests to be completely deterministic and independent of one another.

How let works and the difference between let and let!

How let works

I used to assume that let simply defines a new variable for me to use. Upon closer inspection, I learned that let doesn’t define a variable at all. Instead, it defines a method. More specifically, let defines a memoized method, a method whose body only gets evaluated once no matter how many times the method is called.

Since that’s perhaps kind of mind-bending, let’s take a closer look at what exactly this means.

An example method

Consider this method that 1) prints something and then 2) returns a value.

def my_name
  puts 'thinking about what my name is...'
  'Jason Swett'
end

puts my_name

When we run puts my_name, we see the string that the method prints (thinking about what my name is...) followed by the value that the method returns (Jason Swett).

$ ruby my_name.rb
thinking about what my name is...
Jason Swett

Now let’s take a look at some let syntax that will create the same method.

require 'rspec'

describe 'my_name' do
  let(:my_name) do
    puts 'thinking about what my name is...'
    'Jason Swett'
  end

  it 'returns my name' do
    puts my_name
  end
end

When we run this test file and invoke the my_name method, the same exact thing happens: the method `puts`es some text and returns my name.

$ rspec my_name_spec.rb
thinking about what my name is...
Jason Swett
.

Finished in 0.00193 seconds (files took 0.08757 seconds to load)
1 example, 0 failures

Just to make it blatantly obvious and to prove that my_name is indeed a method call and not a variable reference, here’s a version of this file with parentheses after the method call.

require 'rspec'

describe 'my_name' do
  let(:my_name) do
    puts 'thinking about what my name is...'
    'Jason Swett'
  end

  it 'returns my name' do
    puts my_name() # this explicitly shows that my_name() is a method call
  end
end

Memoization

Here’s a version of the test that calls my_name twice. Even though the method gets called twice, its body only actually gets evaluated once.

require 'rspec'

describe 'my_name' do
  let(:my_name) do
    puts 'thinking about what my name is...'
    'Jason Swett'
  end

  it 'returns my name' do
    puts my_name
    puts my_name
  end
end

If we run this test, we can see that the return value of my_name gets printed twice and the thinking about what my name is... part only gets printed once.

$ rspec my_name_spec.rb
thinking about what my name is...
Jason Swett
Jason Swett
.

Finished in 0.002 seconds (files took 0.08838 seconds to load)
1 example, 0 failures
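
In other words, the my_name block above behaves roughly like this hand-written memoized method. (This is a simplification, not RSpec’s actual implementation, which among other things handles nil and false return values correctly.)

def my_name
  @my_name ||= begin
    puts 'thinking about what my name is...'
    'Jason Swett'
  end
end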

The lazy evaluation of let vs. the immediate evaluation of let!

When we use let, the code inside our block gets evaluated lazily. In other words, none of the code inside the block gets evaluated until we actually call the method created by our let block.

Take a look at the following example.

require 'rspec'

describe 'let' do
  let(:message) do
    puts 'let block is running'
    'VALUE'
  end

  it 'does stuff' do
    puts 'start of example'
    puts message
    puts 'end of example'
  end
end

When we run this, we’ll see start of example first because the code inside our let block doesn’t get evaluated until we call the message method.

$ rspec let_example_spec.rb
start of example
let block is running
VALUE
end of example
.

Finished in 0.00233 seconds (files took 0.09836 seconds to load)
1 example, 0 failures

The “bang” version of let, let!, evaluates the contents of our block immediately, without waiting for the method to get called.

require 'rspec'

describe 'let!' do
  let!(:message) do
    puts 'let block is running'
    'VALUE'
  end

  it 'does stuff' do
    puts 'start of example'
    puts message
    puts 'end of example'
  end
end

When we run this version, we see let block is running appearing before start of example.

$ rspec let_example_spec.rb 
let block is running
start of example
VALUE
end of example
.

Finished in 0.00224 seconds (files took 0.09131 seconds to load)
1 example, 0 failures

I always use let! instead of let. I’ve never encountered a situation where the lazily-evaluated version would be helpful but I have encountered situations where the lazily-evaluated version would be subtly confusing (e.g. a let block is saving a record to the database but it’s not abundantly clear exactly at what point in the execution sequence the record gets saved). Perhaps there’s some performance benefit to allowing the lazy evaluation but in most cases it’s probably negligible. Confusion is often more expensive than slowness anyway.
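
To illustrate the kind of confusion I mean, here’s a sketch (assuming a User Active Record model): with the lazy version of let, the record “defined” in the let block doesn’t exist until the first time the method is called.

require 'rspec'

RSpec.describe 'lazy let and database records' do
  let(:user) { User.create!(email: 'test@example.com') }

  it 'creates the record later than you might expect' do
    expect(User.count).to eq(0) # the let block has not run yet
    user                        # the let block runs here
    expect(User.count).to eq(1)
  end
end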

Takeaways

  • The biggest advantage to using let over instance variables is that instance variables can leak from test to test, which isn’t true of let.
  • The difference between let and let! is that the former is lazily evaluated while the latter is immediately evaluated.
  • I always use the let! version because I find the execution path to be more easily understandable.

When to refactor

As a system grows, constant refactoring is needed in order to keep the code understandable.

There are four possible times to perform any piece of refactoring.

  1. During a change
  2. Independently of changes
  3. After a change
  4. Before a change

Some of these times to perform refactorings are good and some are less good. Let’s evaluate each of these times.

Evaluating the various times to refactor

Refactoring during a change

Of the four possible times to refactor, refactoring during a change is the worst time to do it. The reason is that when you look back at the changes you made, you have no easy way to tell what’s a behavior change and what’s a refactoring. If you introduce a bug during your work, you won’t know whether the bug came from the behavior change or from a mistake in your refactoring. It also makes pull requests harder to review.

Mixing refactoring with behavior changes also creates problems around version control. If you discover later that your commit introduced a bug and you need to roll back the commit, you’ll be forced to roll back both the refactoring and the behavior change, even if only one of the two is problematic and the other one is fine. Mixing refactoring with behavior changes also makes it hard to pinpoint bugs using git bisect.

Refactoring independently of changes

Refactoring independently of changes is an okay approach, but it’s not my favorite. Bad code only costs something if and when that code needs to be worked with. Having bad code sitting in your codebase is kind of like having a dull saw sitting in your garage. Sharp saws are better than dull saws, but dull saws don’t cause problems until you actually have to use them. Better, in my opinion, to sharpen a saw either right before or right after you use it, because then you can be sure your sharpening effort is going toward a saw that actually gets used.

Refactoring after a change

Refactoring code right after a change is a great time to do it. For anything but the smallest changes, it’s basically impossible to write good code on the first try. Sometimes a change triggers small, local refactorings. Sometimes a change alters the story of the codebase enough that a broad refactoring scattered across the whole application is called for.

One downside to refactoring after a change is that it can be hard to tell exactly what needs refactoring. If you’re refactoring something you wrote five minutes ago, then you’re not a very good simulation of a future maintainer who has to look at your code for the first time and try to understand it. For this reason I don’t refactor my code to an obsessive degree right after a change, because I know that my idea of what’s “best” is likely to be wrong at that point in time. I’ll probably have a better idea of how to express the code clearly after I’ve had some time to forget what it does, because by then the gaps between what the code does and how well the code explains what it does will be more obvious.

Refactoring before a change

Refactoring before a change is also a great time to do it. As mentioned above, it’s often better to refactor a piece of code you don’t understand than a piece of code you do understand, because if you don’t understand the code, then you have a stronger need for the code to clearly explain to you what it does. If you understand a piece of code thoroughly then the code might seem clear enough even when it’s not.

Another reason that refactoring before a change is a good idea is that you know for sure that your refactoring is going to be helpful. In fact, you may well discover that the change you want to make is basically impossible without some refactoring first.

A metaphor to remember this by

The Boy Scout Rule, “leave the campground cleaner than you found it”, is a popular rule among programmers, but I think the Boy Scout Rule is too easily misinterpreted. In what way exactly are you supposed to leave the campground cleaner than you found it?

I prefer a kitchen metaphor. I think “clean the kitchen before you make dinner” and “clean the kitchen after you make dinner” make pretty good metaphors for refactoring before and after a change. Obviously it’s more annoying and time-consuming to try to cook dinner in a kitchen full of dirty dishes than a clean one. And if you clean the kitchen after you make dinner, it just makes it that much faster and easier to make dinner next time because there will be less cleaning.

Takeaways

Don’t mix refactoring with behavior changes. The best time to do refactoring is either right before or right after a change.

Understanding Ruby blocks

Blocks are a fundamental concept in Ruby. Many common Ruby methods use blocks. Blocks are also an integral part of many domain-specific languages (DSLs) in libraries like RSpec, Factory Bot, and Rails itself.

In this post we’ll discuss what a block is. Then we’ll take a look at four different native Ruby methods that take blocks (times, each, map and tap) in order to better understand what use cases blocks are good for.

Lastly, we’ll see how to define our own custom method that takes a block.

What a block is

Virtually all languages have a way for functions to take arguments. You pass data into a function and then the function does something with that data.

A block takes that idea to a new level. A block is a way of passing behavior rather than data to a method. The examples that follow will illustrate exactly what is meant by this.

Native Ruby methods that take blocks

Here are four native Ruby methods that take blocks. For each one I’ll give a description of the method, show an example of the method being used, and then show the output that that example would generate.

Remember that blocks are a way to pass behavior rather than data into methods. In each description, I’ll use the phrase “Behavior X” to describe the behavior that might be passed to the method.

Method: times

Description: “However many times I specify, repeat Behavior X.”

Example: three times, print the text “hello”. (Behavior X is printing “hello”.)

3.times do
  puts "hello"
end

Output:

hello
hello
hello

Method: each

Description: “Take this array. For each element in the array, execute Behavior X.”

Example: iterate over an array containing three elements and print each element. (Behavior X is printing the element.)

[1, 2, 3].each do |n|
  puts n
end

Output:

1
2
3

Method: map

Description: “Take this array. For each element in the array, execute Behavior X, append the return value of X to a new array, and then after all the iterations are complete, return the newly-created array.”

Example: iterate over an array and square each element. (Behavior X is squaring the element.)

squares = [1, 2, 3].map do |n|
  n * n
end

puts squares.join(",")

Output:

1,4,9

Method: tap

Description: “See this value? Perform Behavior X and then return that value.”

Example: initialize a file, write some content to it, then return the original file. (Behavior X is writing to the file.)

require "tempfile"

file = Tempfile.new.tap do |f|
  f.write("hello world")
  f.rewind
end

puts file.read

Output:

hello world

Now let’s look at how we can write our own method that can take a block.

Custom methods that take blocks

An HTML generator

Here’s a method which we can give an HTML tag as well as a piece of behavior. The method will execute our behavior. Before and after the behavior will be the opening and closing HTML tags.

inside_tag("p") do
  puts "Hello"
  puts "How are you?"
end

The output of this code looks like this.

<p>
Hello
How are you?
</p>

In this example, the “Behavior X” that we’re passing to our method is printing the text “Hello” and then “How are you?”.

The method definition

Here’s what the definition of such a method might look like.

def inside_tag(tag, &block)
  puts "<#{tag}>"  # output the opening tag
  block.call       # call the block that we were passed
  puts "</#{tag}>" # output the closing tag
end
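
As a side note, an equivalent way to write this method is with yield, which calls the passed-in block without needing an explicit &block parameter.

def inside_tag(tag)
  puts "<#{tag}>"  # output the opening tag
  yield            # call the block that we were passed
  puts "</#{tag}>" # output the closing tag
end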

Adding an argument to the block

Blocks get more interesting when we add arguments.

In the example below, the inside_tag method now passes an instance of Tag to the block, allowing the behavior in the block to call tag.content rather than just puts. This allows our content to be indented.

class Tag
  def content(value)
    puts "  #{value}"
  end
end

def inside_tag(tag, &block)
  puts "<#{tag}>"
  block.call(Tag.new)
  puts "</#{tag}>"
end

inside_tag("p") do |tag|
  tag.content "Hello"
  tag.content "How are you?"
end

The above code gives the following output.

<p>
  Hello
  How are you?
</p>

Passing an object back to a block is a common DSL technique used in libraries like RSpec, Factory Bot, and Rails itself.
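
For example, here’s a sketch of a made-up configuration DSL built on the same technique of yielding an object to the block. (The Settings class and configure method are invented for illustration.)

class Settings
  attr_accessor :host, :port
end

def configure(&block)
  settings = Settings.new
  block.call(settings) # pass the object back to the block
  settings
end

settings = configure do |config|
  config.host = 'localhost'
  config.port = 3000
end

puts settings.host # => localhost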

The technical details of blocks

There are a lot of technical details to learn about blocks, and some interesting questions you could ask about them, such as how blocks relate to procs and lambdas, or how the &block parameter syntax works.

These are all good questions worth knowing the answers to, but understanding those details is not necessary in order to understand the high-level gist of blocks.

Takeaway

A block is a way of passing behavior rather than data to a method. Not only do native Ruby methods make liberal use of blocks, but so do many popular Ruby libraries. Custom methods that take blocks can also sometimes be a good way to add expressiveness to your own applications.

How to Dockerize a Rails application (“hello world” version)

Reasons to Dockerize a Rails application

I had to hear Docker explained many times before I finally grasped why it’s useful.

I’ll explain in my own words why I think Dockerizing my applications is something worth exploring. There are two benefits that I recognize: one for the development environment and one for the production environment.

Development environment benefits

When a new developer joins a team, that developer has to get set up with the team’s codebase(s) and get a development environment running. This can often be time-consuming and tedious. I’ve had experiences where getting a dev environment set up takes multiple days. If an application is Dockerized, spinning up a dev environment can be as simple as running a single command.

In addition to simplifying the setup of a development environment, Docker can simplify the running of a development environment. For example, in addition to running the Rails server, my application also needs to run Redis and Sidekiq. These services are listed in my Procfile.dev file, but you have to know that file is there, and you need to know to start the app using foreman start -f Procfile.dev. With Docker you can tell it what services you need to run and Docker will just run the services for you.

Production environment benefits

There are a lot of different ways to deploy an application to production. None of them is particularly simple. As of this writing, my main production application is deployed to AWS using Ansible for infrastructure management. This is nice in many ways but it’s also somewhat duplicative. I’m using one (currently manual) way to set up my development environment and then another (codified, using Ansible) way to set up my production environment.

Docker allows me to set up both my development environment and production environment the same way, or at least close to the same way. Once I have my application Dockerized I can use a tool like Kubernetes to deploy my application to any cloud provider without having to do a large amount of unique infrastructure configuration myself the way I currently am with Ansible. (At least that’s my understanding. I’m not at the stage of actually running my application in production with Docker yet.)

What we’ll be doing in this tutorial

In this tutorial we’ll be Dockerizing a Rails application using Docker and a tool called Docker Compose.

The Dockerization we’ll be doing will be the kind that will give us a development environment. Dockerizing a Rails app for use in production hosting will be a separate later tutorial.

My aim for this tutorial is to cover the simplest possible example of Dockerizing a Rails application. What you’ll get as a result is unfortunately not robust enough to be usable as a development environment as-is, but it will hopefully build your Docker confidence and serve as a good jumping-off point for creating a more robust Docker configuration.

In this example our Rails application will have a PostgreSQL database and no other external dependencies. No Redis, no Sidekiq. Just a database.

Prerequisites

I’m assuming that before you begin this tutorial you have both Docker and Docker Compose installed. I’m assuming you’re using a Mac. Nothing else is required.

If you’ve never Dockerized anything before, I’d recommend that you check out my other post, How to Dockerize a Sinatra application, before digging into this one. The other post is simpler because there’s less stuff involved.

Fundamental Docker concepts

Let’s say I have a tiny Dockerized Ruby (not Rails, just Ruby) application. How did the application get Dockerized, and what does it mean for it to be Dockerized?

I’ll answer this question by walking sequentially through the concepts we’d make use of during the process of Dockerizing the application.

Images

When I run a Dockerized application, I’m running it from an image. An image is kind of like a blueprint for an application. The image doesn’t actually do anything; it’s just a definition.

If I wanted to Dockerize a Ruby application, I might create an image that says “I want to use Ruby 2.7.1, I have such-and-such application files, and I use such-and-such command to start my application”. I specify all these things in a particular way inside a Dockerfile, which I then use to build my Docker image.

Using the image, I’d be able to run my Dockerized application. The application would run inside a container.

Containers

Images are persistent. Once I create an image, it’s there on my computer (findable using the docker images command) until I delete it.

Containers are more ephemeral. When I use the docker run command to run an image, part of what happens is that I get a container. (Containers can be listed using docker container ls.) The container will exist for a while and then, when I kill my docker run process, the container will go away.

The difference between Docker and Docker Compose

One of the things that confused me in other Rails + Docker tutorials was the usage of Docker Compose. What is Docker Compose? Why do we need to use it in order to Dockerize Rails?

Docker Compose is a tool that lets you Dockerize an application that’s composed of multiple containers.

When in this example we Dockerize a Rails application that uses PostgreSQL, we can’t use just one image/container for that. We have to have one container for Rails and one container for PostgreSQL. Docker Compose lets us say “hey, my application has a Rails container AND a PostgreSQL container” and it lets us say how our various containers need to talk to each other.

The files involved in Dockerizing our application

Our Dockerized Rails application will have two containers: one for Rails and one for PostgreSQL. The PostgreSQL container can mostly be grabbed off the shelf using a base image. Since certain container needs are really common—e.g. a container for Python, a container for MySQL, etc.—Docker provides images for these things that we can grab and use in our application.

For our PostgreSQL need, we’ll grab the PostgreSQL 11.5 image from Docker Hub. Not much more than that is necessary for our PostgreSQL container.

Our Rails container is a little more involved. For that one we’ll use a Ruby 2.7.1 image plus our own Dockerfile that describes the Rails application’s dependencies.

All in all, Dockerizing our Rails application will involve two major files and one minor one. An explanation of each follows.

Dockerfile

The first file we’ll need is a Dockerfile which describes the configuration for our Rails application. The Dockerfile will basically say “use this version of Ruby, put the code in this particular place, install the gems using Bundler, install the JavaScript dependencies using Yarn, and run the application using this command”.

You’ll see the contents of the Dockerfile later in the tutorial.

docker-compose.yml

The docker-compose.yml file describes what our containers are and how they’re interrelated. Again, we’ll see the contents of this file shortly.

init.sql

This file plays a more minor role. In order for the PostgreSQL part of our application to function, we need a user with which to connect to the PostgreSQL instance. The only way to have a user is for us to create one. The PostgreSQL image lets us supply a file, which we’ll call init.sql, that gets executed exactly once, the first time the container is created, and never again after that.

Dockerizing the application

Start from this repo called boats.

$ git clone git@github.com:jasonswett/boats.git

The master branch is un-Dockerized. You can start here and Dockerize the app yourself or you can switch to the docker branch which I’ve already Dockerized.

Dockerfile

Paste the following into a file called Dockerfile and put it right at the project root.

# Use the Ruby 2.7.1 image from Docker Hub
# as the base image (https://hub.docker.com/_/ruby)
FROM ruby:2.7.1

# Use a directory called /code in which to store
# this application's files. (The directory name
# is arbitrary and could have been anything.)
WORKDIR /code

# Copy all the application's files into the /code
# directory.
COPY . /code

# Run bundle install to install the Ruby dependencies.
RUN bundle install

# Install Yarn.
RUN curl -sS https://dl.yarnpkg.com/debian/pubkey.gpg | apt-key add -
RUN echo "deb https://dl.yarnpkg.com/debian/ stable main" | tee /etc/apt/sources.list.d/yarn.list
RUN apt-get update && apt-get install -y yarn

# Run yarn install to install JavaScript dependencies.
RUN yarn install --check-files

# Set "rails server -b 0.0.0.0" as the command to
# run when this container starts.
CMD ["rails", "server", "-b", "0.0.0.0"]

docker-compose.yml

Create another file called docker-compose.yml. Put this one at the project root as well.

# Use the file format compatible with Docker Compose 3.8
version: "3.8"

# Each thing that Docker Compose runs is referred to as
# a "service". In our case, our Rails application is one
# service ("web") and our PostgreSQL database instance
# is another service ("database").
services:

  database:
    # Use the postgres 11.5 base image for this container.
    image: postgres:11.5

    volumes:
      # We need to tell Docker where on the PostgreSQL
      # container we want to keep the PostgreSQL data.
      # In this case we're telling it to use a directory
      # called /var/lib/postgresql/data, although it
      # conceivably could have been something else.
      #
      # We're associating this directory with something
      # called a volume. (You can see all your Docker
      # volumes by running +docker volume ls+.) The name
      # of our volume is db_data.
      - db_data:/var/lib/postgresql/data

      # This copies our init.sql into our container, to
      # a special file called
      # /docker-entrypoint-initdb.d/init.sql. Anything
      # at this location will get executed once per
      # container, i.e. it will get executed the first
      # time the container is created but not again.
      #
      # The init.sql file is a one-liner that creates a
      # user called (arbitrarily) boats_development.
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql

  web:
    # The root directory from which we're building.
    build: .

    # This makes it so any code changes inside the project
    # directory get synced with Docker. Without this line,
    # we'd have to restart the container every time we
    # changed a file.
    volumes:
      - .:/code:cached

    # The command to be run when we run "docker-compose up".
    command: bash -c "rm -f tmp/pids/server.pid && bundle exec rails s -p 3000 -b '0.0.0.0'"

    # Expose port 3000.
    ports:
      - "3000:3000"

    # Specify that this container depends on the other
    # container which we've called "database".
    depends_on:
      - database

    # Specify the values of the environment variables
    # used in this container.
    environment:
      RAILS_ENV: development
      DATABASE_NAME: boats_development
      DATABASE_USER: boats_development
      DATABASE_PASSWORD: 
      DATABASE_HOST: database

# Declare the volumes that our application uses.
volumes:
  db_data:

init.sql

This one-liner is our third and final Docker-related file to add. It will create a PostgreSQL user for us called boats_development. Like the other two files, this one can also go at the project root.

CREATE USER boats_development SUPERUSER;

config/database.yml

We’re done adding our Docker files but we still need to make one change to the Rails application itself. We need to modify the Rails app’s database configuration so that it knows it needs to be pointed at a PostgreSQL instance running in the container called database, not the same container the Rails app is running in.

default: &default
  adapter: postgresql
  encoding: unicode
  database: <%= ENV['DATABASE_NAME'] %>
  username: <%= ENV['DATABASE_USER'] %>
  password: <%= ENV['DATABASE_PASSWORD'] %>
  port: 5432
  host: <%= ENV['DATABASE_HOST'] %>
  pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
  timeout: 5000

development:
  <<: *default

test:
  <<: *default

production:
  <<: *default

Building and running our Dockerized application

Run the following command to build the application we’ve just described in our configuration files.

$ docker-compose build

Once that has successfully completed, run docker-compose up to run our application’s containers.

$ docker-compose up

The very last step before we can see our Rails application in action is to create the database, just like we would if we were running a Rails app without Docker.

$ docker-compose run web rails db:create
$ docker-compose run web rails db:migrate

The docker-compose run command is what we use to run commands inside a container. Running docker-compose run web means “run this command in the container called web”.

Finally, open http://localhost:3000 to see your Dockerized Rails application running in the browser.

$ open http://localhost:3000

Congratulations. You now have a Dockerized Rails application.

If you followed this tutorial and it didn’t work for you, please leave a comment with your problem, and if I can, I’ll help troubleshoot. Good luck.

How to Dockerize a Sinatra application

Why we’re doing this

Docker is difficult

In my experience, Dockerizing a Rails application for the first time is pretty hard. Actually, doing anything with Docker seems pretty hard. The documentation isn’t that good. Clear examples are hard to find.

Dockerizing Rails is too ambitious as a first goal

Whenever I do anything for the first time, I want to do the simplest, easiest possible version of that thing before I try anything more complicated. I also never want to try to learn more than one thing at once.

If I try to Dockerize a Rails application without any prior Docker experience, then I’m trying to learn the particulars of Dockerizing a Rails application while also learning the general principles of Docker at the same time. This isn’t a great way to go.

Dockerizing a Sinatra application gives us practice

Dockerizing a Sinatra application lets us learn some of the principles of Docker, and lets us get a small Docker win under our belt, without having to confront all the complications of Dockerizing a Rails application. (Sinatra is a very simple Ruby web application framework.)

After we Dockerize our Sinatra application we’ll have a little more confidence and a little more understanding than we did before. This confidence and understanding will be useful when we go to try to Dockerize a Rails application (which will be a future post).

By the way, if you’ve never worked with Sinatra before, don’t worry. No prior Sinatra experience is necessary.

What we’re going to do

Here’s what we’re going to do:

  1. Create a Sinatra application
  2. Run the Sinatra application to make sure it works
  3. Dockerize the Sinatra application
  4. Run the Sinatra application using Docker
  5. Shotgun a beer in celebration (optional)

I’m assuming you’re on a Mac and that you already have Docker installed. If you don’t want to copy/paste everything, I have a repo of all the files here.

(Side note: I must give credit to Marko Anastasov’s Dockerize a Sinatra Microservice post, from which this post draws heavily.)

Let’s get started.

Creating the Sinatra application

Our Sinatra “application” will have just one file. The application will have just one endpoint. Create a file called hello.rb with the following content.

# hello.rb

require 'sinatra'

get '/' do
  'It works!'
end

We’ll also need to create a Gemfile that says Sinatra is a dependency.

# Gemfile

source 'https://rubygems.org'

gem 'sinatra'

Lastly for the Sinatra application, we’ll need to add the rackup file, config.ru.

# config.ru

require './hello'

run Sinatra::Application

After we run bundle install to install the Sinatra gem, we can run the Sinatra application by running ruby hello.rb.

$ bundle install
$ ruby hello.rb

Sinatra apps run on port 4567 by default, so let’s open up http://localhost:4567 in a browser.

$ open http://localhost:4567

If everything works properly, you should see “It works!” in the browser.

Dockerizing the Sinatra application

Dockerizing the Sinatra application will involve two steps. First, we’ll create a Dockerfile that tells Docker how to package up the application. Next we’ll use our Dockerfile to build a Docker image of our Sinatra application.

Creating the Dockerfile

Here’s what our Dockerfile looks like. You can put this file right at the root of the project alongside the Sinatra application files.

# Dockerfile

FROM ruby:2.7.4

WORKDIR /code
COPY . /code
RUN bundle install

EXPOSE 4567

CMD ["bundle", "exec", "rackup", "--host", "0.0.0.0", "-p", "4567"]

Since it might not be clear what each part of this file does, here’s an annotated version.

# Dockerfile

# Include the Ruby base image (https://hub.docker.com/_/ruby)
# in the image for this application, version 2.7.4.
FROM ruby:2.7.4

# Put all this application's files in a directory called /code.
# This directory name is arbitrary and could be anything.
WORKDIR /code
COPY . /code

# Run this command. RUN can be used to run anything. In our
# case we're using it to install our dependencies.
RUN bundle install

# Tell Docker to listen on port 4567.
EXPOSE 4567

# Tell Docker that when we run "docker run", we want it to
# run the following command:
# $ bundle exec rackup --host 0.0.0.0 -p 4567.
CMD ["bundle", "exec", "rackup", "--host", "0.0.0.0", "-p", "4567"]

Building the Docker image

All we need to do to build the Docker image is to run the following command.

$ docker build --tag hello .

I’m choosing to tag this image as hello, although that’s an arbitrary choice that doesn’t connect with anything inside our Sinatra application. We could have tagged it with anything.

The . part of the command tells docker build that we’re targeting the current directory. In order for this to work, the command needs to be run at the project root.

Once the docker build command successfully completes, you should be able to run docker images and see the hello image listed.

Running the Docker image

To run the Docker image, we’ll run docker run. The -p 4567:4567 portion says “take whatever’s on port 4567 on the container and expose it on port 4567 on the host machine”.

$ docker run -p 4567:4567 hello

If we visit http://localhost:4567, we should see the Sinatra application being served.

$ open http://localhost:4567

Conclusion

Congratulations. You now have a Dockerized Ruby application!

With this experience behind you, you’ll be better equipped to Dockerize a Rails application the next time you try to take on that task.

Intro to logical arguments for programmers

What it means to be wrong and why it’s bad

Logical incorrectness

It’s generally better to be right than to be wrong. Since there’s more than one way to be wrong, I want to be specific about the type of wrongness I want to address in this post before I move on.

The type of wrongness I’m interested in in this post is logical incorrectness, like two plus two equals five.

The danger of being wrong

Being right or wrong isn’t just an academic concern. In programming, being wrong often has concrete negative economic (and other) impacts. Developers who are often wrong will be much less efficient and burn up much more payroll cost and opportunity cost than developers who are wrong less often.

Being wrong is also not something that happens every great once in a while. Most humans are wrong about a whole bunch of stuff, a lot of the time, because that’s just human nature. Even really smart people are wrong about things a very nonzero amount of the time.

So I want to share some things we developers can do in order to be wrong less. But first let me share a concrete example of the kind of mistakenness I’m talking about.

An example of being wrong

Bug: an appointment goes missing

Let’s say I’m building some scheduling software. One of my users, Rachel, reports to me that yesterday she rescheduled someone’s appointment from July 1st at 10am to July 3rd at 10am. Today, Rachel looked at both the schedule for July 1st and July 3rd and the appointment isn’t present on either day. Apparently there’s a bug that removes appointments from the schedule when you try to reschedule them.

So I start to look at the code and see if I can find any evidence that this buggy behavior is present. Unfortunately, the code is very complicated, and my investigation takes a long time. My investigation lasts an entire day. By the end of the day I’ve made almost no progress toward fixing the bug.

The bug was not the bug

Unbeknownst to me, the thing I thought was the bug was not actually the bug. In fact, there was no bug at all. Between the time Rachel rescheduled the appointment and the time Rachel found the appointment missing, another user, Janice, deleted the appointment. I was wrong, and I wasted a whole day as a consequence of being wrong.

How to be less wrong

We developers can be wrong less of the time by studying epistemology. Epistemology is a branch of philosophy which deals with the acquisition of knowledge. Epistemology tells us how we can know, with certainty, what’s true and what isn’t.

More narrowly, we can study logic. Logic is a branch of philosophy that deals with a formal system of reasoning. One of the central ideas of logic is that of an argument. Arguments are the ideas we’ll be focusing on in this post.

The definition of a logical argument

An argument is a group of statements including one or more premises and one and only one conclusion. (I shamelessly stole this definition word-for-word from this web page.)

Now let’s talk about what a premise is and what a conclusion is. As an aid I’ll share an example of a logical argument, henceforth just referred to as an “argument”.

Argument example

All fish live in water.
All sharks are fish.
Therefore, all sharks live in water.

This argument contains two premises. “All fish live in water” is a premise. “All sharks are fish” is also a premise.

This argument’s conclusion is of course “Therefore, all sharks live in water”. If it’s true that all fish live in water and it’s true that all sharks are fish, then it’s of course true that all sharks live in water.

Validity and soundness

Not all arguments are good ones. An argument can be valid or invalid and sound or unsound.

Validity

An argument is valid if its conclusion follows logically from its premises. Our above fish/shark argument is a valid argument because, if the argument’s premises are true, its conclusion must necessarily be true. We could make the argument invalid by changing some things.

All fish live in water.
All sharks are fish.
Therefore, all sharks have fins.

This argument isn’t valid because its conclusion doesn’t logically flow from its premises. It happens to be true that all sharks have fins, but that fact isn’t true as a natural consequence of this argument’s premises, so the argument isn’t valid.

Note that validity doesn’t have anything to do with truth. An argument can be valid even if its premises aren’t true.

All turtles are invisible.
Everyone has a turtle in their brain.
Therefore, everyone has an invisible turtle in their brain.

The premises of the above argument aren’t true (at least as far as I know) but the argument is nonetheless valid.

Soundness

An argument is sound if the argument is valid and its premises are true. Our first argument (“all fish live in water, all sharks are fish, therefore all sharks live in water”) is sound because the argument is valid and its premises are true. Sound arguments always have true conclusions.

Here’s another sound argument.

Every 20th century American president has been male.
Richard Nixon was a 20th century American president.
Richard Nixon was male.

Now comes the fun part, where we apply logical arguments to programming.

Arguments in programming

Read the following argument, keeping in mind the definitions of validity and soundness. See if you can tell if the argument is valid or invalid, sound or unsound. (If you don’t want a spoiler, don’t scroll past the argument until you’ve read the full argument.)

The site is unusually slow today.
We performed a large deployment this morning.
The deployment is the cause of the slowness.

This argument is unsound. Even if the premises are true, we can’t know based on the premises that the deployment was the cause of the slowness. How do we know it’s not a coincidence? For all we know, our site got featured on Hacker News and a big traffic spike is the cause of the slowness. The reason the argument is unsound is that, even though its premises are true, its logic is invalid: the conclusion isn’t necessarily true based on the premises.

Here’s another example. Instead of just two premises, this argument has three.

Sometimes slowness is caused by code changes.
Sometimes slowness is caused by traffic spikes.
The site is unusually slow today.
The cause of the slowness is either a code change or a traffic spike.

This argument is also unsound. There are more possible reasons for a site to be slow than just a code change or a traffic spike. For example, maybe our DevOps person killed half the servers in the middle of the night last night without our knowing it. So despite true premises, this argument, like the preceding one, is invalid.

Here’s another example.

The code in the most recent deployment introduced a bug.
The only thing that went out in the most recent deployment was Josh’s code.
Josh’s code caused the bug.

As long as this argument’s premises are true, this argument is sound. If we know for sure that the most recent deployment introduced a bug, and we know for sure that the only thing that went out in the most recent deployment was Josh’s code, then it does logically follow that Josh’s code caused the bug.

Here’s a final example. This one is a little more detailed than the previous ones.

At 10:32am, a duplicate $20 charge appeared in the system for patient #5225.
Also at 10:32am, Jason carelessly performed a manual action on patient #5225, an action that was related to that patient’s $20 charge.
Jason caused the duplicate charge.

This is another unsound argument. Even though it sounds likely that my action caused the duplicate charge, it’s not logically valid to make that inference based on the premises. The invalidity is perhaps not obvious, but can be made more apparent by asking the question: “Are there any possible circumstances under which Jason’s manual action would NOT have created the duplicate charge?” For example, it’s possible that the card could have accidentally been run twice, and the timing was a coincidence.

I have empirical proof of the invalidity of the above argument because this is a real-life example and, in fact, my action was not the cause of the duplicate charge. Part of what helped me determine this is the following sound argument.

It’s impossible to create a charge without having a patient’s credit card information.
It would have been physically impossible for me to involve the patient’s credit card information when I performed my manual action because we don’t store credit card information in the system.
My action couldn’t have created the duplicate charge.

The preceding sound argument (and remember, sound arguments always have true conclusions) led me to investigate more deeply. What I ultimately discovered was that the patient’s card did in fact get run twice and the timing was just a coincidence. Why wasn’t it obvious from the outset that the patient’s card was run twice? Because we use Authorize.net as a payment gateway, and apparently sometimes the Authorize.net API returns a failure response even when the charge was successfully incurred, so from the perspective of our application there was only one charge that got successfully created, even though in reality there were two.

Good luck with your arguments

Next time you’re confronted with a programming mystery, I invite you to frame your mystery in terms of arguments. Write down your premises, and make sure to write down only premises that are true, otherwise your argument will be unsound. Then try to come up with a conclusion and make sure that your conclusion necessarily follows from your premises so that your argument is valid. If your argument is valid and sound, your conclusion will necessarily be true.

If you’d like to have an argument with me on Twitter, you can find me here.

Why I don’t use Shoulda matchers

One of the testing questions I commonly get is about Shoulda matchers. People ask if I use Shoulda matchers and if Shoulda matchers are a good idea.

I’ll share my thoughts on this. First I’ll explain what Shoulda is, then I’ll explain why I don’t use it.

What Shoulda is

If you’re unfamiliar with Shoulda matchers, the premise, from the GitHub description, is: “Shoulda Matchers provides RSpec- and Minitest-compatible one-liners to test common Rails functionality that, if written by hand, would be much longer, more complex, and error-prone.”

A few examples of specific Shoulda matchers are validate_presence_of (expects that a model attribute has a presence validator), have_many (expects that a has_many association exists), and redirect_to (expects that a redirection takes place).
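
In an RSpec file, those one-liners look something like this. (The matchers are real; the User model and its attributes are made up for illustration.)

RSpec.describe User, type: :model do
  it { is_expected.to validate_presence_of(:email) }
  it { is_expected.to have_many(:posts) }
end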

I like the idea of a library that can clean up a lot of my repetitive test code. Unfortunately, Shoulda matchers only apply to the kinds of tests I would never write.

Test behavior, not implementation

To me it doesn’t make much sense to, for example, write a test that only checks for the presence of an Active Record association and doesn’t do anything else.

If I have an association, presumably that association exists in order to enable some piece of behavior, or else it would be pointless for the association to exist. For example, if a User class has_many :posts, then that association only makes sense if there’s some Post-related behavior.

So when it comes to testing that the User class has_many :posts, there are two possibilities. One is that I write a test for both the association itself and the behavior enabled by the association, in which case the test for the association is redundant and adds no value. The other possibility is that I write a test only for the post association, but not for the post behavior, which wouldn’t make much sense because why wouldn’t I write a test for the post behavior?

To me it only makes sense in this example to write tests for the post behavior and write no tests directly for the association. The logic of this decision can be proved by imagining what would happen if the has_many :posts line were removed. Any tests for the post behavior would start failing because the behavior would be broken without the association line present.
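
As a sketch of the alternative, here’s what a test for the post behavior (rather than the association) might look like. (The post_titles method is hypothetical, invented for illustration.)

RSpec.describe User do
  describe '#post_titles' do
    it "returns the titles of the user's posts" do
      user = User.create!(email: 'test@example.com')
      user.posts.create!(title: 'Hello')
      expect(user.post_titles).to eq(['Hello'])
    end
  end
end

If the has_many :posts line were removed, this test would fail, so the association is covered without a dedicated association test.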

Takeaway

Don’t test low-level implementations. It’s pointless. Test behavior instead.

Since Shoulda is only good for testing low-level implementations, I don’t recommend using it.

Lessons I learned converting all my database IDs to UUIDs

The motivation

About seven months ago I became aware of a slightly worrisome problem in the application I maintain at work.

The application is a medical application. Each patient in the system has a numeric, incremental account number like 4220. Due to coincidental timing of business and technology, the database ID of each patient is pretty close to the account number. The patient with account number 4220 might have a database ID of something like 4838.

You can imagine how this might cause confusion. Imagine wanting to check which patient you’re viewing and you see 4220 on the screen and patients/4838 in the URL. If you’re not paying close attention it can be confusing.

I brought this issue up with my boss. Turns out my boss had actually gotten tripped up on this issue himself. I brought up the option of switching from numeric IDs to UUIDs as a way to fix the problem and he agreed that it was a good idea. I also brought up my judgment that this would likely be a large, risky, time-consuming project, but we both agreed that the risks and costs associated with the UUID change were less than the risks and costs associated with leaving things the way they were.

The approach

Acknowledging the risk of the project, I decided to distribute the risk in small pieces over time so that no single change would carry too much risk.

Rather than trying to convert all my 87 tables to UUIDs at once, which would be way too risky, I decided to convert 1-10 tables at a time, starting with just one and ramping it up over time as the level of uncertainty decreased.

The planned cadence was once per week, spreading the work across 20-40 weeks, depending on how many tables could be converted in a batch. I applied each UUID change on a Saturday morning, although I didn’t start off doing them Saturdays. I was prompted to start doing it this way after one of the early migrations caused a problem. This brings me to my first lesson.

Lesson 1: apply UUID changes off-hours with a significant time buffer

My first mistake was that I didn’t properly calibrate my methodology to the true level of risk involved.

For one of the earlier UUID migrations I performed, I did it maybe an hour and a half before open of business while no one was doing anything. Unfortunately, something went wrong with the migration, and I didn’t have time to fully fix the problem (nor did I have time to roll it back) before people started using the system.

This incident showed me that I needed a bigger buffer time.

If I wanted to perform the migrations off-hours, my options were to a) perform the migrations on weekdays between about 10pm and 6am or b) perform the migrations on weekends. I decided to go with weekends.

In addition to adding buffer time, I added a way for me to conveniently put the app into “maintenance mode” which would block all users except me from using the app while maintenance mode was on.

Since the time I added these changes to my change process, there have been no UUID-related incidents.

Lesson 2: beware of subtle references

When a table’s IDs get converted to UUIDs, all the references to that table of course need to get updated too. A table called customers needs to have every customer_id column updated in order for the association to be preserved.

This is easy enough when the referencing column’s name matches the referenced table’s name (e.g. customer_id matches customers) and/or when a foreign key constraint is present, making it physically impossible to forget about the association. It’s harder when neither a matching column name nor a foreign key constraint exists to clue you in.
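
One way to at least generate a list of candidate columns to audit is to query information_schema directly. Here’s a minimal sketch, assuming PostgreSQL and Rails (the name patterns are just examples, not an exhaustive rule, which is exactly why columns like record_id are dangerous).

sql = <<~SQL
  SELECT table_name, column_name
  FROM information_schema.columns
  WHERE table_schema = 'public'
    AND (column_name ~ '_id$' OR column_name = 'record_id')
SQL

ActiveRecord::Base.connection.select_rows(sql).each do |table, column|
  puts "#{table}.#{column}" # candidate reference column to check by hand
end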

You know that incident I mentioned in Lesson 1? That incident was caused by my failure to detect active_storage_attachments.record_id as a column that needed its values changed from numeric IDs to UUIDs. This caused a peculiar bug where most of the file uploads in the system started to appear on records to which they did not belong. Not always, just sometimes. The random behavior had to do with the fact that an expression like '123abc'.to_i evaluates to 123 while an expression like 'abc123'.to_i evaluates to 0.

Anyway, the fix to that issue was conceptually straightforward enough once I knew what was happening, even if the fix was a little tedious.

At this point you might wonder whether good test coverage might have caught such a bug. I must have been taking stupid pills on the day this incident happened because, in addition to everything else I did wrong, I ran the test suite but didn’t bother to check the results before I deployed, even though I knew my change was particularly risky. If I had checked the test suite results I would have seen that there were a few failing tests related to file uploads. I haven’t made that mistake since.

Lesson 3: keep the numeric IDs around

For the first several tables I converted, it didn’t occur to me that I might want to keep the numeric IDs around for any reason. Two things that happened later showed me that it is in fact a good idea to keep the numeric IDs around.

First, keeping the numeric IDs makes for easier rollback. The incident from Lesson 1 could have been rolled back fairly trivially if I had just kept the numeric ID column.

Second, sometimes an incremental ID is handy. I recently built a feature that lets patients pay their statements online. The patients can find their statement by entering their statement ID. It’s not realistic to ask un-tech-savvy 80-year-olds to enter values like CB08B2 where they have to make the distinction between a zero and a letter O. So for that feature I used the numeric, sequential IDs of the statements to show to patients.

Lesson 4: there are a lot of special cases

As of this writing I’ve converted 37 tables from numeric IDs to UUIDs. These were done, again, in small batches of 1 to 10, usually more like 1 to 3.

Almost every batch presented me with some small, new, unique obstacle. Most of these obstacles were too small and uninteresting for it to be useful for me to list every one. For example, I learned on one batch that if a view (that is, database view, not Rails view) references the table I’m changing, then the view needs to be dropped before I make the UUID change and then re-created afterward. There was a lot of little stuff like that.

For this reason, I didn’t have much success with the gems that exist to assist with UUID migrations. Everything I did was custom. It wasn’t much code, though, mostly just a script that I used for each migration and added to each time I encountered a new issue. The current total size of the script is 91 lines.
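
To give a flavor of what’s involved, here’s a heavily simplified sketch of one such migration, for a hypothetical customers table referenced by orders.customer_id. (This is illustrative rather than the actual script I used; among other things it skips indexes, foreign keys, the view-dropping mentioned above and the rollback path.)

class ConvertCustomersToUuid < ActiveRecord::Migration[6.0]
  def up
    # pgcrypto provides gen_random_uuid() in PostgreSQL.
    enable_extension 'pgcrypto'

    # Add and backfill a UUID column on the parent table.
    add_column :customers, :uuid, :uuid, default: -> { 'gen_random_uuid()' }, null: false

    # Backfill the referencing column (the step that's easy to miss).
    add_column :orders, :customer_uuid, :uuid
    execute <<~SQL
      UPDATE orders
      SET customer_uuid = customers.uuid
      FROM customers
      WHERE orders.customer_id = customers.id
    SQL

    # Swap the primary key, keeping the numeric ID around (Lesson 3).
    execute 'ALTER TABLE customers DROP CONSTRAINT customers_pkey'
    rename_column :customers, :id, :numeric_id
    rename_column :customers, :uuid, :id
    execute 'ALTER TABLE customers ADD PRIMARY KEY (id)'
  end
end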

Lesson 5: performance is fine (for me)

I’ve read some comments online that say UUIDs are bad for performance. Some cursory research tells me that, at least for PostgreSQL, this isn’t really true, at least not enough to matter.

I’ve also experienced no discernible performance hits as a result of converting tables from incremental primary keys to UUID primary keys.

Takeaways

If I had a time machine, here’s some advice I would give my past self.

  • Apply the UUID changes off-hours with a big buffer.
  • For each change, look really carefully for referencing columns.
  • Keep the numeric IDs around.
  • Expect to hit a lot of little snags along the way.

All in all this project has been worth it. Only one production incident was caused. The incident was recovered from relatively easily. The work hasn’t been that painful. And there’s a great satisfaction I get from looking at patient URLs and knowing they will never cause confusion with patient account numbers again.

Why use Factory Bot instead of creating test data manually?

I recently taught a Rails testing class where we wrote tests using RSpec, Capybara, Factory Bot and Faker.

During the class, one of the students asked, why do we use Factory Bot? What’s the advantage over creating test data manually?

The answer to this is perhaps most easily explained with an example. I’m going to show an example of a test setup that creates data manually, then a test setup that uses Factory Bot, so you can see the difference.

In both examples, the test setup is for a test that needs two Customer records to exist. A Customer object has several attributes and a couple associations.

Non-Factory-Bot example

Here’s what it looks like when I create my two Customer records manually. (The code appears below, after a few comments on this approach.)

RSpec.describe Customer do
  before do
    @customers = Customer.create!([
      {
        first_name: 'John',
        last_name: 'Smith',
        phone_number: '555-123-4567',
        address_line_1: '123 Fake Street',
        city: 'Minneapolis',
        state: State.create!(name: 'Minnesota', abbreviation: 'MN'),
        zip_code: '55111',
        user: User.create!(
          email: 'john.smith@example.com',
          password: 'gdfkgfgasdf18233'
        )
      },
      {
        first_name: 'Kim',
        last_name: 'Jones',
        phone_number: '555-883-2283',
        address_line_1: '338 Notreal Ave',
        city: 'Chicago',
        state: State.create!(name: 'Illinois', abbreviation: 'IL'),
        zip_code: '60606',
        user: User.create!(
          email: 'kim.jones@example.com',
          password: 'eejkgsfg238231188'
        )
      }
    ])
  end
end

The upside to this approach is that there’s no extra tooling involved. It’s just Active Record. There are a few downsides though.

First, it’s wasteful and tedious to have to think up and type out all this fake data (e.g. “555-123-4567”).

Second, it’s unclear to an outside reader which data is significant and which is just arbitrary. For example, is it significant that John Smith lives in Minneapolis, or could his address have been anywhere?

Third, all this test data adds noise to the test and makes it an Obscure Test. When a test contains a bunch of test data, it’s much harder to tell at a glance what the essence of the test is and what behavior it’s testing.

This example is tedious enough with only two associations on the Customer model (State and User). You can imagine how bad things might get with truly complex associations.

Factory Bot example

Here’s a version of the test setup that uses Factory Bot. It achieves the same result, the creation of two Customer records. The code for this version is obviously much more concise.

RSpec.describe Customer do
  before do
    @customers = FactoryBot.create_list(:customer, 2)
  end
end
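A related benefit: when a particular value is significant to a test, you can override just that attribute using Factory Bot’s standard attribute-override syntax, which makes the significant data stand out from the arbitrary data. For example:

# Only the first name matters to this hypothetical test, so it's
# the only value spelled out; everything else stays random.
customer = FactoryBot.create(:customer, first_name: 'John')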

This simple code is made possible through factory definitions. In this case there are three factory definitions: one for Customer, one for State and one for User.

In all three of the factory definitions I’m using an additional gem called Faker. Faker helps with the generation of things like random names, phone numbers, email addresses, etc.

Here are the three factory definitions.

FactoryBot.define do
  factory :customer do
    first_name { Faker::Lorem.characters(number: 10) }
    last_name { Faker::Lorem.characters(number: 10) }
    phone_number { Faker::PhoneNumber.cell_phone }
    address_line_1 { Faker::Lorem.characters(number: 10) }
    city { Faker::Lorem.characters(number: 10) }

    # This line will generate an associated State record
    # using the factory definition for State
    state

    # Same here, but for User
    user
  end
end

FactoryBot.define do
  factory :state do
    name { Faker::Lorem.characters(number: 10) }
    abbreviation { Faker::Lorem.characters(number: 10) }
  end
end

FactoryBot.define do
  factory :user do
    email { Faker::Internet.email }
    password { Faker::Internet.password }
  end
end

Takeaways

If you were wondering why exactly we use Factory Bot, the answer is that it makes our tests more convenient to write and more understandable to read. In addition to Factory Bot, the Faker gem can help take away some of the tedium of having to create test data values.

There’s also one other popular method of creating test data: fixtures. Fixtures have the advantage of speeding up a test suite because they’re only loaded once at the beginning of the test suite run (as opposed to factories, which typically run once per test), but I prefer factories because I feel they make tests easier to understand. You can read more about fixtures vs. factories here.

How I set up Factory Bot on a fresh Rails project

A reader of mine recently asked me how I set up Factory Bot for a new Rails project.

There are four steps I go through to set up Factory Bot.

  1. Install the factory_bot_rails gem
  2. Set up one or more factory definitions
  3. Install Faker
  4. Add the Factory Bot syntax methods to my rails_helper.rb file

Following are the details for each step.

Install the factory_bot_rails gem

The first thing I do is to include the factory_bot_rails gem (not the factory_bot gem) in my Gemfile. I include it under the :development, :test group.

Here’s a sample Gemfile from a project with only the default gems plus a few that I added for testing.

Remember that after you add a gem to your Gemfile you’ll need to run bundle install in order to actually install the gem.

source 'https://rubygems.org'
git_source(:github) { |repo| "https://github.com/#{repo}.git" }

ruby '2.7.0'

# Bundle edge Rails instead: gem 'rails', github: 'rails/rails'
gem 'rails', '~> 6.0.2', '>= 6.0.2.2'
# Use postgresql as the database for Active Record
gem 'pg', '>= 0.18', '< 2.0'
# Use Puma as the app server
gem 'puma', '~> 4.1'
# Use SCSS for stylesheets
gem 'sass-rails', '>= 6'
# Transpile app-like JavaScript. Read more: https://github.com/rails/webpacker
gem 'webpacker', '~> 4.0'
# Turbolinks makes navigating your web application faster. Read more: https://github.com/turbolinks/turbolinks
gem 'turbolinks', '~> 5'
# Build JSON APIs with ease. Read more: https://github.com/rails/jbuilder
gem 'jbuilder', '~> 2.7'

gem 'devise'

# Reduces boot times through caching; required in config/boot.rb
gem 'bootsnap', '>= 1.4.2', require: false

group :development, :test do
  gem 'pry'
  gem 'rspec-rails'
  gem 'capybara'
  gem 'webdrivers'
  gem 'factory_bot_rails'
end

group :development do
  # Access an interactive console on exception pages or by calling 'console' anywhere in the code.
  gem 'web-console', '>= 3.3.0'
  gem 'listen', '>= 3.0.5', '< 3.2'
  # Spring speeds up development by keeping your application running in the background. Read more: https://github.com/rails/spring
  gem 'spring'
  gem 'spring-watcher-listen', '~> 2.0.0'
end

# Windows does not include zoneinfo files, so bundle the tzinfo-data gem
gem 'tzinfo-data', platforms: [:mingw, :mswin, :x64_mingw, :jruby]

Set up one or more factory definitions

Factory definitions are kind of the “templates” that are used for generating new objects.

For example, if I have a user object that needs an email and a password, then I would create a factory definition saying “hey, make me a user with an email and password”. The actual code might look like this:

FactoryBot.define do
  factory :user do
    email { 'test@example.com' }
    password { 'password1' }
  end
end

Factory Bot is smart enough to know that when I say factory :user do, I’m talking about an Active Record class called User.
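If the factory name doesn’t happen to match a class name, Factory Bot lets you specify the class explicitly. Here’s a quick sketch (the :admin factory is a made-up example):

FactoryBot.define do
  # The :admin name doesn't match any class, so we point
  # Factory Bot at the User class explicitly.
  factory :admin, class: 'User' do
    email { 'admin@example.com' }
    password { 'password1' }
  end
end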

There’s a problem with this way of defining my User factory though. If I have a unique constraint on the users.email column in the database (for example), then I won’t ever be able to generate more than one User object. The first user’s email address will be test@example.com (no problem so far) but then when I go to create a second user, its email address will also be test@example.com, and if I have a unique constraint on users.email, the creation of this second record will not be allowed.
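To make the failure concrete, here’s roughly what would happen with the factory above, assuming a unique index on users.email and no uniqueness validation on the model:

FactoryBot.create(:user) # first user, email test@example.com, no problem
FactoryBot.create(:user) # same email again; the database rejects it
                         # with ActiveRecord::RecordNotUnique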

We need a way of making the factories’ values unique. One way, which I’ve done before, is to append a random value to the end of the email address, e.g. "test#{SecureRandom.hex}@example.com". There’s a different way to do it, though, that I find nicer: another gem called Faker.
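For reference, here’s roughly what that random-value workaround looks like in the factory definition:

FactoryBot.define do
  factory :user do
    # SecureRandom.hex gives each user a unique email address,
    # so the unique constraint is never violated.
    email { "test#{SecureRandom.hex}@example.com" }
    password { 'password1' }
  end
end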

Install Faker

Just like I showed with factory_bot_rails above, the Faker gem can be added by putting it into the :development, :test group of the Gemfile.

Then we can change our User factory definition as follows.

FactoryBot.define do
  factory :user do
    email { Faker::Internet.email }
    password { Faker::Internet.password }
  end
end

This will give us random values like eldora@jones.net and lazaromertz@ko.name.

Add the Factory Bot syntax methods to my rails_helper.rb file

The syntax for actually using a Factory Bot factory in a test is as follows:

FactoryBot.create(:user)

There’s nothing wrong with this, but I find that these FactoryBot prefixes are so numerous in my test files that their presence feels a little noisy.

There’s a way to make it so that instead we can just write this:

create(:user)

The way to do that is to add a bit of code to spec/rails_helper.rb.

RSpec.configure do |config|
  config.include FactoryBot::Syntax::Methods
end

(You don’t actually need to add the RSpec.configure do |config| block to the spec/rails_helper.rb file; it’s already there. I’m including it here just to show which block the config.include FactoryBot::Syntax::Methods line goes inside.)
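With that configuration in place, a spec can use the bare syntax methods. Here’s a minimal example:

RSpec.describe User, type: :model do
  it 'can be created from the factory' do
    user = create(:user) # no FactoryBot. prefix needed
    expect(user).to be_persisted
  end
end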

What to do next

If you’re curious how to put Factory Bot together with the other testing tools to write some complete Rails tests, I might suggest my Rails testing “hello world” tutorial using RSpec and Capybara.