Author Archives: Jason Swett

Why use Factory Bot instead of creating test data manually?

I recently taught a Rails testing class where we wrote tests using RSpec, Capybara, Factory Bot and Faker.

During the class, one of the students asked, why do we use Factory Bot? What’s the advantage over creating test data manually?

The answer to this is perhaps most easily explained with an example. I’m going to show an example of a test setup that creates data manually, then a test setup that uses Factory Bot, so you can see the difference.

In both examples, the test setup is for a test that needs two Customer records to exist. A Customer object has several attributes and a couple associations.

Non-Factory-Bot example

Here’s what it looks like when I create my two Customer records manually.

The upside to this approach is that there’s no extra tooling involved. It’s just Active Record. There are a few downsides though.

First, it’s wasteful and tedious to have to think of and type out all this fake data (e.g. “555-123-4567”).

Second, it’s going to be unclear to an outside reader what data is significant and what’s just arbitrary. For example, is it significant that John Smith lives in Minneapolis or could his address have been anywhere?

Third, all this test data adds noise to the test and makes it an Obscure Test. When there’s a bunch of test data in the test it makes it much harder to tell at a glance what the essence of the test is and what behavior it’s testing.

This example is tedious enough with only two associations on the Customer model (State and User). You can imagine how bad things might get when you have truly complex associations.

RSpec.describe Customer do
  before do
    @customers = Customer.create!([
      {
        first_name: 'John',
        last_name: 'Smith',
        phone_number: '555-123-4567',
        address_line_1: '123 Fake Street',
        city: 'Minneapolis',
        state: State.create!(name: 'Minnesota', abbreviation: 'MN'),
        zip_code: '55111',
        user: User.create!(
          email: 'john.smith@example.com',
          password: 'gdfkgfgasdf18233'
        )
      },
      {
        first_name: 'Kim',
        last_name: 'Jones',
        phone_number: '555-883-2283',
        address_line_1: '338 Notreal Ave',
        city: 'Chicago',
        state: State.create!(name: 'Illinois', abbreviation: 'IL'),
        zip_code: '60606',
        user: User.create!(
          email: 'kim.jones@example.com',
          password: 'eejkgsfg238231188'
        )
      }
    ])
  end
end

Factory Bot example

Here’s a version of the test setup that uses Factory Bot. It achieves the same result, the creation of two Customer records. The code for this version is obviously much more concise.

RSpec.describe Customer do
  before do
    @customers = FactoryBot.create_list(:customer, 2)
  end
end

This simple code is made possible through factory definitions. In this case there are three factory definitions: one for Customer, one for State and one for User.

In all three of the factory definitions I’m using an additional gem called Faker. Faker helps with the generation of things like random names, phone numbers, email addresses, etc.

Here are the three factory definitions.

FactoryBot.define do
  factory :customer do
    first_name { Faker::Lorem.characters(10) }
    last_name { Faker::Lorem.characters(10) }
    phone_number { Faker::PhoneNumber.cell_phone }
    address_line_1 { Faker::Lorem.characters(10) }
    city { Faker::Lorem.characters(10) }

    # This line will generate an associated State record
    # using the factory definition for State
    state

    # Same here, but for User
    user
  end
end

FactoryBot.define do
  factory :state do
    name { Faker::Lorem.characters(10) }
    abbreviation { Faker::Lorem.characters(10) }
  end
end

FactoryBot.define do
  factory :user do
    email { Faker::Internet.email }
    password { Faker::Internet.password }
  end
end

Takeaways

If you were wondering why exactly we use Factory Bot, the answer is that it makes our tests more convenient to write and more understandable to read. In addition to Factory Bot, the Faker gem can help take away some of the tedium of having to create test data values.

There’s also one other popular method of creating test data which is to use fixtures. Fixtures have the advantage of speeding up a test suite because they’re only loaded once at the beginning of the test suite run (as opposed to factories which are typically run once per test) but I prefer factories because I feel they make tests easier to understand. You can read more about fixtures vs. factories here.

How I set up Factory Bot on a fresh Rails project

A reader of mine recently asked me how I set up Factory Bot for a new Rails project.

There are four steps I go through to set up Factory Bot.

  1. Install the factory_bot_rails gem
  2. Set up one or more factory definitions
  3. Install Faker
  4. Add the Factory Bot syntax methods to my rails_helper.rb file

Following are the details for each step.

Install the factory_bot_rails gem

The first thing I do is to include the factory_bot_rails gem (not the factory_bot gem) in my Gemfile. I include it under the :development, :test group.

Here’s a sample Gemfile from a project with only the default gems plus a few that I added for testing.

Remember that after you add a gem to your Gemfile you’ll need to run bundle install in order to actually install the gem.

source 'https://rubygems.org'
git_source(:github) { |repo| "https://github.com/#{repo}.git" }

ruby '2.7.0'

# Bundle edge Rails instead: gem 'rails', github: 'rails/rails'
gem 'rails', '~> 6.0.2', '>= 6.0.2.2'
# Use postgresql as the database for Active Record
gem 'pg', '>= 0.18', '< 2.0'
# Use Puma as the app server
gem 'puma', '~> 4.1'
# Use SCSS for stylesheets
gem 'sass-rails', '>= 6'
# Transpile app-like JavaScript. Read more: https://github.com/rails/webpacker
gem 'webpacker', '~> 4.0'
# Turbolinks makes navigating your web application faster. Read more: https://github.com/turbolinks/turbolinks
gem 'turbolinks', '~> 5'
# Build JSON APIs with ease. Read more: https://github.com/rails/jbuilder

gem 'devise'

# Reduces boot times through caching; required in config/boot.rb
gem 'bootsnap', '>= 1.4.2', require: false

group :development, :test do
  gem 'pry'
  gem 'rspec-rails'
  gem 'capybara'
  gem 'webdrivers'
  gem 'factory_bot_rails'
end

group :development do
  # Access an interactive console on exception pages or by calling 'console' anywhere in the code.
  gem 'web-console', '>= 3.3.0'
  gem 'listen', '>= 3.0.5', '< 3.2'
  # Spring speeds up development by keeping your application running in the background. Read more: https://github.com/rails/spring
  gem 'spring'
  gem 'spring-watcher-listen', '~> 2.0.0'
end

# Windows does not include zoneinfo files, so bundle the tzinfo-data gem
gem 'tzinfo-data', platforms: [:mingw, :mswin, :x64_mingw, :jruby]

Set up one or more factory definitions

Factory definitions are kind of the “templates” that are used for generating new objects.

For example, I have a user object that needs an email and a password, then I would create a factory definition saying “hey, make me a user with an email and password”. The actual code might look like this:

FactoryBot.define do
  factory :user do
    email { 'test@example.com' }
    password { 'password1' }
  end
end

Factory Bot is smart enough to know that when I say factory :user do, I’m talking about an Active Record class called User.

There’s a problem with this way of defining my User factory though. If I have a unique constraint on the users.email column in the database (for example), then I won’t ever be able to generate more than one User object. The first user’s email address will be test@example.com (no problem so far) but then when I go to create a second user, its email address will also be test@example.com, and if I have a unique constraint on users.email, the creation of this second record will not be allowed.

We need a way of making it so the factories’ values can be unique. One way, which I’ve done before, is to append a random number to the end of the email address, e.g. "test#{SecureRandom.hex}@example.com". There’s a different way to do it, though, that I find nicer. That way is to use another gem called Faker.

Install Faker

Just like I showed with factory_bot_rails above, the Faker gem can be added by putting it into the :development, :test group of the Gemfile.

Then we can change our User factory definition as follows.

FactoryBot.define do
  factory :user do
    email { Faker::Internet.email }
    password { Faker::Internet.password }
  end
end

This will give us random values like eldora@jones.net and lazaromertz@ko.name.

Add the Factory Bot syntax methods to my rails_helper.rb file

The syntax for actually using a Factory Bot factory in a test is as follows:

FactoryBot.create(:user)

There’s nothing wrong with this, but I find that these FactoryBot are so numerous in my test files that their presence feels a little noisy.

There’s a way to make it so that instead we can just write this:

create(:user)

The way to do that is to add a bit of code to spec/rails_helper.rb.

RSpec.configure do |config|
  config.include FactoryBot::Syntax::Methods
end

(You don’t actually add the RSpec.configure do |config| to the spec/rails_helper.rb file. It’s already there. I’m just including it here to show that that’s the block inside of which the config.include FactoryBot::Syntax::Methods line goes.)

What to do next

If you’re curious how to put Factory Bot together with the other testing tools to write some complete Rails tests, I might suggest my Rails testing “hello world” tutorial using RSpec and Capybara.

The difference between system specs and feature specs

If you’re like me, you might have found the difference between RSpec’s “feature specs” and “system specs” to be a little too nuanced to be easily understandable. Here’s my explanation of the difference between the two.

Two levels of Rails tests

For some background, I want to talk about the main two types of Rails tests that I use, and where feature specs, the predecessor to system specs, come intro the picture.

I think of Rails tests as existing on two basic levels.

High-level, course-grained

One level of tests is at a high level. These tests simulate a user visiting pages, filling in forms, clicking links and buttons, etc. Different people use different terms for these tests including “integration test”, “end-to-end test” and “acceptance test”.

Because these types of tests are often expensive to write and run, they tend not to cover every nook and cranny of the application’s behavior.

In RSpec terminology, these types of tests have historically been called feature specs.

Low-level, fine-grained

The other level of tests is at a lower level. There are all kinds of tests you could possibly write with Rails/RSpec (model specs, request specs, view specs, helper specs) but I tend to skip most of those in most scenarios and only write model specs.

From feature spec to system spec

The backstory

When Rails 5.1 came out in April 2017, one of the features it introduced was system tests. Here’s why this was done.

By default Rails ships with Minitest. The inclusion of Minitest provides a way to write model tests and certain other kinds of tests, but it historically hasn’t provided a way to write full-blown end-to-end without doing some extra work yourself, like bringing Capybara and Database Cleaner into the picture. This is the rationale for Rails 5.1’s addition of system tests, in my understanding. (To be clear, we’re still talking about system tests, not system specs.)

System specs wrap system tests

According to the RSpec docs, “System specs are RSpec’s wrapper around Rails’ own system tests.” This means that it’s no longer required to explicitly include the Capybara gem, and because system tests are already run inside a transaction, you don’t need Database Cleaner.

Summary

System specs are a wrapper around Rails’ system tests. The benefits of using system specs instead of feature specs is that you don’t have to explicitly include the Capybara gem, nor do you have to use Database Cleaner.

My method of systematic troubleshooting

Why systematic troubleshooting is valuable

Computers are complicated, programming is complicated, and the problems we programmers have to solve are often complicated.

Because human brains are only so powerful, and because humans are susceptible to logical fallacies, it’s very important to have a systematic approach to troubleshooting if we want to have a hope of solving our problems and solving them in a timely manner.

If you don’t have a good methodology, you’ll guess and flail and get frustrated and ultimately probably fail to fix the issue. If you do have a good methodology, almost any problem will eventually collapse under the crushing weight of your capabilities.

Here are some questions I tend to ask myself when troubleshooting technical issues.

Do I know, with certainty, exactly what’s wrong?

Don’t get fooled

Most of the time, when I’m presented with a problem, I don’t know exactly what’s wrong at first. It’s of course impossible to fix a problem if I don’t know what needs fixing.

The challenge here is not to fool yourself into thinking the problem is something other than what it really is.

I’ve been guilty on a number of occasions of taking a bug report at face value and then discovering later that the reporter of the bug was mistaken and that the bug was something else. If someone tells me “we’re not able to charge American Express cards in the system,” I should translate that statement most of the time to “something seems to be wrong with credit card payments”.

State only what you know for sure to be true

Here’s a quote from Zen and the Art of Motorcycle Maintenance (using motorcycles as the subject of investigation rather than computers):

It is much better to enter a statement “Solve Problem: Why doesn’t cycle work?” which sounds dumb but is correct, than it is to enter a statement “Solve Problem: What is wrong with the electrical system?” when you don’t absolutely know the trouble is in the electrical system. What you should state is “Solve Problem: What is wrong with cycle?” and then state as the first entry of Part Two: “Hypothesis Number One: The trouble is in the electrical system.” You think of as many hypotheses as you can, then you design experiments to test them to see which are true and which are false.

Good advice. (And a good book.)

How can I narrow the scope of my investigation?

Most problems are too big

With most problems, there are so many variables interacting with each other that it’s impossible to model the whole situation in my head. So I try to see what I can eliminate from the picture.

Let’s say I’m trying to deploy a Rails application to AWS. Some of my deployment worked but I can’t connect to my RDS instance (that is, my database instance).

In this case, my EC2 instance can’t connect to my RDS instance. Or, more specifically, the Rails app on my EC2 instance can’t connect to my RDS instance. How do I know whether the problem lies with my RDS instance, my EC2 instance, my Rails app, all of these, none of these, or some combination? That’s a lot of possibilities.

Narrowing it down

So rather than trying to tackle the whole problem, I’ll narrow it down to just the RDS instance. Is anything wrong with the RDS instance? How can I interrogate the RDS instance without bringing all the other parts into the picture?

One thing I can do in this case is to use the PostgreSQL CLI client (that is to say, the psql command) on my laptop (not on the EC2 instance, on my laptop) to try to connect to the RDS instance. This way I’m not involving my EC2 instance, I’m not involving environment variables, I’m not involving Rails, I’m only dealing with the RDS instance.

I can run the command psql my-db-name -U postgres -h my-rds-hostname and see what happens. If I can connect to my database that way and run a query, then I can be sure that my database name, my password and my hostname are all good. If I get prompted for a password but it tells me the password is wrong, then I know the problem is my password. If I don’t even get prompted for a password, then something else is wrong.

If I don’t know what’s wrong, in what areas might the problem lie and what tests can I perform to get a yes/no in each area?

List hypotheses

Before I actually get to work investigating a problem, I’ll usually list a number of hypotheses that I can test.

Here are some hypotheses for the RDS connection problem.

  • I’m using the wrong database credentials
  • I have the URL for my RDS instance wrong
  • The environment variables for my database credentials, RDS URL, etc., aren’t even set
  • My RDS instance’s security group is set to block traffic
  • Something else is wrong that I can’t think of

Test the hypotheses

Then I’ll try to get an answer to each question. I might start with what’s easiest to test or I might start with what I think is most likely to be the culprit.

I already shared one way of checking the second hypothesis, “I have the URL for my RDS instance wrong”. If I believe I have my RDS instance URL right, or if my tests in that area are inconclusive, I might try testing a different hypothesis.

When working with AWS, I find it easy to accidentally make my security groups (i.e. the sets of rules that control what types of traffic various entities can receive, like HTTP traffic, SSH traffic, etc.) too restrictive. PostgreSQL requests typically travel on port 5432, so if I don’t have port 5432 open for my RDS instance, I won’t be able to connect on port 5432.

To see if this is the issue I can use an open port checker to hit my RDS URL at port 5432. Again, this test tests only one thing at a time, which means if it doesn’t work, I can be pretty sure exactly what doesn’t work. If I perform a test that could have a large number reasons for returning negative, I haven’t learned very much. Simple, narrow tests are the most useful kinds of tests to perform.

Have I tried everything that can possibly be tried? If not, what haven’t I tried yet?

This is a question I often ask myself when I get stuck. It might sound like a dumb question but it turns out to be a productive question to ask a surprising portion of the time.

Persistence conquers all things

The point of this question is to facilitate persistence. It’s basically true that every problem is solvable and that the only way to fail is to give up. After all, if you understood everything about the problem and all the parts surrounding it, you would know exactly how to fix the problem, and there wouldn’t be a problem.

For example, with the RDS example, if you knew everything about networking, and databases, and Linux, and Rails, then you’d know exactly how to fix the problem. Luckily it never gets to that point. Usually there’s a relatively small amount of knowledge and understanding that lies between you and the solution to the problem.

So next time you’re confronted with a hairy issue, remember to use a systematic methodology, remember not to get fooled into believing things about the situation that aren’t true, remember to be persistent, and it’s entirely likely that you’ll be able to solve your problem.

How to restart Sidekiq automatically for each deployment

This post will cover why it’s necessary to restart Sidekiq for each deployment as well as how to achieve it.

Why it’s necessary to restart Sidekiq for each deployment

Example: database column name change

Let’s say you have a model in your Rails application called Customer. The corresponding customers table has a column that someone has called fname.

Because you’re not a fan of making your code harder to work with through overabbreviation, you decide to alter the customers table and rename fname to first_name. (You also change any instances of fname to first_name in the Rails code.) You deploy this change and the database migration runs in production.

Problem: out-of-date Sidekiq code

Unfortunately, unless you restart the Sidekiq process, there will be a problem. At the time the Sidekiq process was started (and by “Sidekiq process” I of course mean the thing that runs when you run bundle exec sidekiq), the Rails code referred to a column called fname.

If you changed the column name without restarting Sidekiq, the code the Sidekiq process is running will be referring to the non-existent fname column and your application will break.

The fix is simple: restart Sidekiq each time you deploy. The way to achieve this, however, is not so simple.

How to get Sidekiq to restart

systemd and systemctl

Before we get into the specifics of how to get Sidekiq to restart, a little background is necessary.

You’ve probably run Linux commands before that look something like sudo service nginx restart. I’ve been running such commands for many years without understanding what the “service” part is all about.

The “service” part is related to a pair of Linux concepts, systemd and systemctl. In the words of the DigitalOcean article I just linked, systemd is “an init system and system manager” and systemctl is “the central management tool for controlling the init system”. To put it in something closer to layperson’s terms, systemd and systemctl are a way to manage processes in a convenient way.

Using systemd to manage Sidekiq

If we register Sidekiq as a service with systemd, we gain the ability to run commands like service sidekiq start, service sidekiq restart and service sidekiq stop.

Once we have that ability, restarting Sidekiq on each deployment is as simple as adding service sidekiq restart as a step in our deployment script.

Adding a sidekiq.service file

The first step in telling systemd about Sidekiq is to put a sidekiq.service file in a certain directory. Which directory depends on which Linux distribution you’re using. I use Ubuntu so my file goes at /lib/systemd/system/sidekiq.service.

For the contents of the file you can use this sidekiq.service file from the Sidekiq repo as a starting point.

Important modifications to make to sidekiq.service

Change WorkingDirectory from /opt/myapp/current to the directory where your app is served from.

Change User=deploy and Group=deploy to whatever user and group you use for deployment.

If you’re using a version of Sidekiq that’s earlier than 6.0.6, you’ll need to change Type=notify to Type=simple and remove the WatchdogSec=5 line.

Bringing in environment variables

If your application depends on environment variables, which it of course almost certainly does, the Sidekiq process will need to be aware of those environment variable values in order to run.

There are multiple ways to pull environment variables into the Sidekiq service. In my case I did so by creating a file called /etc/profile.d/sidekiq_env (the name and location are arbitrary) and adding a line EnvironmentFile=/etc/profile.d/sidekiq_env to my sidekiq.service. Here’s a sample sidekiq_env file so you know how it’s formatted:

RAILS_ENV=production
RAILS_FORCE_SSL=true
RAILS_SKIP_MIGRATIONS=false
# etc

Trying out your Sidekiq service

Once you’ve done all the above you can run systemctl enable sidekiq and then service sidekiq start. If you get something other than no output, something is wrong. If you get no output, the service theoretically started successfully, although you won’t know for absolute certain until you verify by running a job while tailing the logs.

You can tail the logs by running journalctl -u sidekiq -f. I verified my Sidekiq systemd setup by watching the output of those logs while invoking a Sidekiq job.

Troubleshooting

If everything is working at this point, congratulations. In the much more likely case that something is wrong, there are a couple ways you can troubleshoot.

First of all, you should know that you need to run systemctl daemon-reload after each change to sidekiq.service in order for the change to take effect.

One way is to use the output from the same journalctl -u sidekiq -f command above. Another is to run systemctl status sidekiq and see what it says.

If Sidekiq doesn’t run properly via systemd, try manually running whatever command is in ExecStart, which, if you used the default, is /usr/local/bin/bundle exec sidekiq -e production. That command should of course work, so if it doesn’t, then there’s a clue.

But it is possible for the command in ExecStart to work and for the systemd setup to still be broken. If, for example, you have your environment variables loaded properly in the shell but you don’t have environment variables loaded properly in your sidekiq.service file, the service start sidekiq command won’t work. Examine any Rails errors closely to determine whether the problem might be due to a missing environment variable value.

Bonus: restarting Sidekiq via Ansible

If you happen to use Ansible to manage your infrastructure like I do, here’s how you can add a task to your Ansible playbooks for restarting Sidekiq.

The restart task itself looks like this:

- name: Restart sidekiq
  service:
    name: sidekiq
    state: restarted
    daemon_reload: yes

This task alone wasn’t enough for me though. I wanted to add a task to copy the sidekiq.service file. I also needed to add a task to enable the Sidekiq service.

tasks:
  - name: Copy sidekiq.service
    template:
      src: sidekiq.service
      dest: /lib/systemd/system/sidekiq.service
      force: yes
      owner: root
      group: root
      mode: 0644

  - name: Enable sidekiq
    service:
      name: sidekiq
      enabled: yes

  - name: Restart sidekiq
    service:
      name: sidekiq
      state: restarted
      daemon_reload: yes

Good luck

Good luck, and please leave a comment if you have troubles or need clarification.

How to launch an EC2 instance using Ansible

What this post covers

In this post I’m going to show what could be considered a “hello world” of Ansible + AWS, using Ansible to launch an EC2 instance.

Aside from the time required to set up an AWS account and install Ansible, you should be able to get your EC2 instance running in 20 minutes or less.

Why Ansible + AWS for Rails hosting?

AWS vs. Heroku

For hosting Rails applications, the service I’ve reached for the most in the past is Heroku.

Unfortunately it’s not always possible or desirable to use Heroku. Heroku can get expensive at scale. There are also sometimes legal barriers due to e.g. HIPAA.

So, for whatever reason, AWS is sometimes a more viable option than Herokou.

The challenges with AWS + Rails

Unfortunately once you leave Heroku and enter the land of AWS, you’re largely on your own in many ways. Unlike with Heroku, there’s not one single way to do your deployment. There are basically infinite possible ways.

One way is to deploy manually, but that has all the disadvantages you can imagine a manual solution would have, such as, most obviously, a bunch of manual work to do each time you deploy.

Another option is to use Elastic Beanstalk. Elastic Beanstalk is kind of like AWS’s answer to Heroku, but it’s not nearly as nice as Heroku, and customizations can be a little tricky/hacky.

The advantages of using Ansible

I’ve been using Ansible for the last several months both on a commercial project and for my own personal projects.

If you’re not familiar with Ansible, here’s a description from the docs: “Ansible is an IT automation tool. It can configure systems, deploy software, and orchestrate more advanced IT tasks such as continuous deployments or zero downtime rolling updates.”

Ansible is often seen mentioned with similar tools like Puppet and Chef. I went with Ansible because among Ansible, Puppet, and Chef, Ansible had documentation that I could actually comprehend.

I’ve personally been using Ansible for two things so far: 1) provisioning EC2 instances and 2) deploying my Rails application. When I say “provisioning” I mainly mean spinning up an EC2 instance and installing all the software on it that my Rails app needs, like PostgreSQL, Bundler, Yarn, etc.

I like using Ansible because it allows me to manage my infrastructure using (at least to an extent so far) infrastructure as code. Rather than e.g. manually installing PostgreSQL and Bundler each time I provision a new EC2 instance, I write playbooks comprised of tasks (playbook and task are Ansible terms) that say things like “install Postgresql” and “install the Bundler gem”. This makes the cognitive burden of maintenance way lower and it also makes my infrastructure setup less of a black box.

Instructions for provisioning an EC2 instance

Before you start you’ll need to have Ansible installed and of course have an AWS account available for use.

For this exercise we’re going to create two files. The first will be an Ansible playbook in a file called launch.yml which can be placed anywhere on your filesystem.

The launch playbook

Copy and paste from the content below into a file called launch.yml placed, again, wherever you want.

Make careful note of the region entry. I use us-west-2, so if you use that region also, you’ll need to make sure to look for your EC2 instance in that region and not in another one.

Also make note of the image entry. I believe EC2 images can vary from region to region, so make sure that the image ID you use does in fact exist in the region you use.

Lastly, replace my_ssh_key_name with the full path to whatever SSH key you normally use to SSH into your EC2 instances, for example, ~/.ssh/aws-key.

---
- hosts: localhost
  gather_facts: false
  vars_files:
    - vars.yml

  tasks:
    - name: Provision instance
      ec2:
        aws_access_key: "{{ aws_access_key }}"
        aws_secret_key: "{{ aws_secret_key }}"
        key_name: my_ssh_key_name
        instance_type: t2.micro
        image: ami-0d1cd67c26f5fca19
        wait: yes
        count: 1
        region: us-west-2

The vars file

Rather than hard-coding the entries for aws_access_key and aws_secret_key, which would of course be a bad idea if we were to commit our playbook to version control, we can have a separate file where we keep secret values. This separate file can either be added to a .gitignore or managed with something called Ansible Vault (which is outside the scope of this post).

Create a file called vars.yml in the same directory where you put launch.yml. This file will only need the two lines below. You’ll of course need to replace my placeholder values with your real AWS access key and secret key.

---
aws_access_key: XXXXXXXXXXXXXXXX
aws_secret_key: XXXXXXXXXXXXXXXX

The launch command

With our playbook and vars file in place, we can now run the command to execute the playbook:

$ ansible-playbook -v launch.yml

You’ll probably see a couple warnings including No config file found; using defaults and [WARNING]: No inventory was parsed, only implicit localhost is available. These are normal and can be ignored. In many use cases for Ansible, we’re running our playbooks against remote server instances, but in this case we don’t even have any server instances yet, so we’re just running our playbook right on localhost. For whatever reason Ansible feels the need to warn us about this.

Verification

If you now open up your AWS console and go to EC2, you should now be able to see a fresh new EC2 instance.

To me this sure beats manually clicking through the EC2 instance launch wizard.

Good luck, and if you have any troubles, please let me know in the comments.

How to do multi-step forms in Rails

Two kinds of multi-step forms

The creation of multi-step forms is a relatively common challenge faced in Rails programming (and probably in web development in general of course). Unlike regular CRUD interfaces, there’s not a prescribed “Rails Way” to do multi-step forms. This, plus the fact that multi-step forms are often inherently complicated, can make them a challenge.

Following is an explanation of I do multi-step forms. First, it’s helpful to understand that there are two types of them. There’s an easy kind and a hard kind.

After I explain what the easy kind and the hard kind of multi-step forms are, I’ll show an illustration of the hard kind.

The easy kind

The easy kind of multi-step form is when each step involves just one Active Record model.

In this scenario you can have a one-to-one mapping between a controller and a step in the form.

For example, let’s say that you have a form with two steps. In the first step you collect information about a user’s vehicle. The corresponding model to this first step is Vehicle. In the second step you collect information about the user’s home. The corresponding model for the second step is Home.

Your multi-step form code could involve two controllers, one called Intake::VehiclesController and the other called Intake::HomesController. (I’m making up the Instake namespace prefix because maybe we’ll decide to call this multi-step form the “intake” form.)

To be clear, the Intake::VehiclesController would exist in addition to the main scaffold-generated VehiclesController, not instead of it. Same of course with the Intake::HomesController. The reason for this is that it’s entirely likely that the “intake” controllers would have different behavior from the main controllers for those resources. For example, Intake::VehiclesController would probably only have the new and create actions, and none of the other ones like delete. Intake::VehiclesController#create action would also probably have a redirect_to that sends the user to the second step, Intake::HomesController#new, once the vehicle information is successfully collected.

To summarize the solution to the easy type of multi step form: For each step of your form, create a controller that corresponds to the Active Record model that’s associated with that step.

Again, this solution only works if your form steps and your Active Record models have a one-to-one relationship. If that’s not the case, you’re probably not dealing with the easy type of multi-step form. You’re probably dealing with the hard kind.

The hard kind

The more difficult kind of multi-step form is when there’s not a tidy one-to-one mapping between models and steps.

Let’s say, for example, that the multi-step form needs to collect user profile/account information. The first step collects first name and last name. The second step collects email and password. All four of these attributes (first name, last name, email and password) exist on the same model, User. How do we validate the first name and last name attributes in the first step when the User model wants to validate a whole User object at a time?

The answer is that we create two new concepts/objects called, perhaps, Intake::UserProfile and Intake::UserAccount. The Intake::UserProfile object knows how to validate first_name and last_name. The Intake::UserAccount knows how to validate email and password. Only after each form step is validated do we attempt to save a User record to the database.

If you found the last paragraph difficult to follow, it’s probably because I was describing a scenario that isn’t very common in Rails applications. I’m talking about creating models that don’t inherit from ActiveRecord::Base but rather that mix in ActiveModel::Model in order to gain some Active-Record-like capabilities.

All this is easier to illustrate with an example than to describe with words, so let’s get into the details of how this might be accomplished.

The tutorial

Overview

This tutorial will illustrate a multi-step form where the first step collects “user profile” information (first name and last name) and the second step collects “user account” information (email and password).

Although I’m aware of the Wicked gem, my approach doesn’t use any external libraries. I think gems lend themselves well to certain types of problems/tasks, like tasks where a uniform solution works pretty well for everyone. In my experience multi-step forms are different enough from case to case that a gem to “take out the repetitive work” doesn’t really make sense because most of the time-consuming work is unique to that app, and the rest of the work is easily handled by the benefits that Rails itself already provides.

Here’s the code for my multi-step form, starting with the user profile model.

The user profile model

What will ultimately be created in the database as a result of the user completing the multi-step form is a single User record.

For each of the two forms (again, user profile and user account) we want to be able to validate the form but we don’t necessarily want to persist the data yet. We only want to persist the data after the successful completion of the last step.

One way to achieve this is to create forms that don’t connect to an ActiveRecord::Base object, but instead connect to an ActiveModel::Model object.

If you mix in ActiveModel::Model into a plain old Ruby object, you gain the ability to plug that object into a Rails form just as you would with a regular ActiveRecord object. You also gain the ability to do validations just as you would with a regular ActiveRecord object.

Below I’ve created a class that will be used in the user profile step. I’ve called it UserProfile and put it under an arbitrarily-named Intake namespace.

module Intake
  class UserProfile
    include ActiveModel::Model
    attr_accessor :first_name, :last_name
    validates :first_name, presence: true
    validates :last_name, presence: true
  end
end

The user profile controller

This model will connect to a controller I’m calling UserProfilesController. Similar to how I put the UserProfile model class inside a namespace, I’ve put my controller class inside a namespace just so my root-level namespace doesn’t get cluttered up with the complex details of my multi-step form.

I’ve annotated the controller code with comments to explain what’s happening.

module Intake
  class UserProfilesController < ApplicationController
    def new
      # An instance of UserProfile is created just the
      # same as you would for any Active Record object.
      @user_profile = UserProfile.new
    end

    def create
      # Again, an instance of UserProfile is created
      # just the same as you would for any Active
      # Record object.
      @user_profile = UserProfile.new(user_profile_params)

      # The valid? method is also called just the same
      # as for any Active Record object.
      if @user_profile.valid?

        # Instead of persisting the values to the
        # database, we're temporarily storing the
        # values in the session.
        session[:user_profile] = {
          'first_name' => @user_profile.first_name,
          'last_name' => @user_profile.last_name
        }

        redirect_to new_intake_user_account_path
      else
        render :new
      end
    end

    private

    # The strong params work exactly as they would
    # for an Active Record object.
    def user_profile_params
      params.require(:intake_user_profile).permit(
        :first_name,
        :last_name
      )
    end
  end
end

The user profile view

Similar to how the user profile controller is very nearly the same as a “regular” controller even though no Active Record class or underlying database table is involved, the user profile form markup is indistinguishable from the code that would be used for an Active-Record-backed class.

<h2>User Profile Information</h2>

<%= form_with(model: @user_profile, url: local: true) do |f| %>
  <% if @user_profile.errors.any? %>
    <div id="error_explanation">
      <b><%= pluralize(@user_profile.errors.count, "error") %> prohibited this user profile from being saved:</b>

      <ul>
        <% @user_profile.errors.full_messages.each do |message| %>
          <li><%= message %></li>
        <% end %>
      </ul>
    </div>
  <% end %>

  <div class="form-group">
    <%= f.label :first_name %>
    <%= f.text_field :first_name, class: 'form-control' %>
  </div>

  <div class="form-group">
    <%= f.label :last_name %>
    <%= f.text_field :last_name, class: 'form-control' %>
  </div>

  <%= f.submit 'Next Step', class: 'btn btn-primary' %>
<% end %>

The user account model

The user account model follows the exact same principles as the user profile model. Instead of inheriting from ActiveRecord::Base, it mixes in ActiveModel::Model.

module Intake
  class UserAccount
    include ActiveModel::Model
    attr_accessor :email, :password
    validates :email, presence: true
    validates :password, presence: true
  end
end

The user account controller

The user account controller differs slightly from the user profile controller because the user account controller step is the last step of the multi-step form process. I’ve added annotations to this controller’s code to explain the differences.

module Intake
  class UserAccountsController < ApplicationController
    def new
      @user_account = UserAccount.new
    end

    def create
      @user_account = UserAccount.new(user_account_params)

      if @user_account.valid?

        # The values from the previous form step need to be
        # retrieved from the session store.
        full_params = user_account_params.merge(
          first_name: session['user_profile']['first_name'],
          last_name: session['user_profile']['last_name']
        )

        # Here we finally carry out the ultimate objective:
        # creating a User record in the database.
        User.create!(full_params)

        # Upon successful completion of the form we need to
        # clean up the session.
        session.delete('user_profile')

        redirect_to users_path
      else
        render :new
      end
    end

    private

    def user_account_params
      params.require(:intake_user_account).permit(
        :email,
        :password
      )
    end
  end
end

The user account view

Like the user profile view, the user account view is indistinguishable from a regular Active Record view.

<h2>User Account Information</h2>

<%= form_with(model: @user_account, local: true) do |f| %>
  <% if @user_account.errors.any? %>
    <div id="error_explanation">
      <b><%= pluralize(@user_account.errors.count, "error") %> prohibited this user account from being saved:</b>

      <ul>
        <% @user_account.errors.full_messages.each do |message| %>
          <li><%= message %></li>
        <% end %>
      </ul>
    </div>
  <% end %>

  <div class="form-group">
    <%= f.label :email %>
    <%= f.email_field :email, class: 'form-control' %>
  </div>

  <div class="form-group">
    <%= f.label :password %>
    <%= f.password_field :password, class: 'form-control' %>
  </div>

  <%= f.submit 'Save and Finish', class: 'btn btn-primary' %>
<% end %>

Routing

Lastly, we need to tie everything together with some routing directives.

Rails.application.routes.draw do
  namespace :intake do
    resources :user_profiles, only: %i[new create]
    resources :user_accounts, only: %i[new create]
  end
end

Demonstration

Below is a recording of me interacting with a multi-step form using all the code shown above.

Takeaways

There are two types of multi-step forms. The easy kind is where each step corresponds to a single Active Record model. In those cases you can use a dedicated controller per step and redirect among the steps. The harder kind is where there’s not a one-to-one mapping. In those cases you can mix in ActiveModel::Model to lend Active-Record-like behaviors to plain old Ruby objects.

In addition to the bare functionality I described above, you can imagine multi-step forms involving more sophisticated requirements, like the ability to go back to previous steps, for example. I didn’t want to clutter my tutorial with those details but I think those behaviors would be manageable enough to add on top of the underlying overall approach I describe.

As a last side note, you don’t have to use session storage to save the data from each form step until the final aggregation at the end. You could store that data any way you want, including using a full-blown Active Record class for each step. There are pros and cons to the various possible ways of doing it. One thing I like about this approach is that I don’t have any “residual” data lying around at the end that I have to clean up. I can imagine other scenarios where other approaches would be more appropriate though, e.g. multi-step forms that the user would want to be able to save and finishing filling out at a later time.

The most important thing in my mind is to keep the multi-step form code easily understandable and to make its code sufficiently modularized (e.g. using namespaces) that it doesn’t clutter up the other concerns in the application.

Git workflow anti-patterns

Here are some Git workflow anti-patterns I’ve observed in my career. This is not meant to be an exhaustive list. These are just some of the common ones I’ve come across.

And as a side note: I firmly believe that the workflows I describe in the post are the best way to do things most of the time, but at the same time I know that there’s almost never a right answer that applies to everything all the time. For example, what’s appropriate for a 20-person team isn’t appropriate for a solo developer. Keep that in mind as you’re reading.

Now let’s get into the anti-patterns.

Long-lived feature branches

I don’t like feature branches to last for more than a few days. The longer a branch lives, the more likely it is that there will be certain problems, discussed below.

My feature branches tend to last less than one day because often the entire cycle of coding, review, deployment and verification all happens inside of one day.

Problem: merge conflicts are more likely

The longer a feature branch lives, a) the more code there will be to merge and b) the more opportunity the master branch will have had to get out of sync with the feature branch.

Problem: the PR will be harder to review

This one is kind of self-explanatory. The more stuff there is to review, the harder it is to review it.

Now let’s talk about some of the upstream causes of overly long/large feature branches.

Upstream cause: feature was too big

A very common cause of a branch that’s too big is a feature that was too big. In my experience programmers and team leaders are often bad about letting user stories through that are too big and too nebulously-defined. If there’s not a super clear answer to the question, “How exactly do we know when this feature is done?” then it’s a sign the story isn’t “shovel-ready” and that it should be clarified more before someone starts working on it.

Upstream cause: ignorance of feature flags

Sometimes it seems that a long-lived branch is necessary because the feature is irreducibly large and the feature can’t be released to users until it’s all the way done.

I’ve never seen this problem stand up to an intelligent use of feature flags and a judicious sequencing of the coding work.

In my experience it’s always possible to work in a sequence more or less like this:

  1. Perform the foundational work for the feature (e.g. creation of new database tables, etc.), then merge and deploy
  2. Write the model-level code (i.e. not the UI) for the feature, then merge and deploy
  3. Build the UI for the feature, but put the UI behind a feature flag, then merge and deploy
  4. Toggle the feature flag “on” to surface the feature to users

It’s true that using feature flags adds overhead. It’s more tedious and time-consuming to use feature flags than to not. But using feature flags is less time-consuming than dealing with all the problems introduced by long-lived feature branches.

Misunderstanding the reasons for merging and deploying

Git workflows and deployment workflows are inextricably linked. The way a team handles deployments has a bearing on how it uses Git and vice versa. Let’s talk about the reasons for merging and deploying.

Deployments aren’t just for shipping code

On the surface, it seems like the reason for deploying is to deliver new value (features and bugfixes) to users. This is true, but all by itself, it’s overly simplistic. There are practical considerations involved as well.

Every deployment carries a risk of introducing a problem. The bigger the deployments, the more stuff there is to potentially go wrong. The less frequently deployments are performed, the bigger the opportunity for someone to forget a crucial step in the deployment workflow, for software rot to cause a problem with the deployment mechanisms, or for some other problem to creep in.

We do deployments to deliver value to users but also to maintain dev/prod parity. Small, frequent deployments make it less likely that any one deployment will be problematic.

Frequent merging is essential to dev/prod parity

I can’t have tight parity between dev and prod if I don’t have tight parity between my master branch and my feature branches. The delta between production and development is “code in development minus code in production”. All the code in my feature branch is “code in development”, so it all counts toward widening dev/prod parity.

To summarize: long-lived feature branches make deployments riskier.

Using a complicated Git workflow like git-flow

I’ve worked at one or two places that used git-flow. I’ve always found it painful due to its needless complexity and its sensitivity to human error.

With git-flow, you have a develop branch that exists in perpetuity alongside master. In addition there are potentially some ephemeral release and hotfix branches. All feature branches are branched off of develop are merged back into develop. At one place I worked we also had a staging branch that lived in perpetuity.

Some people can handle all this mental juggling but I can’t. I also don’t see the point since other branching models exist that are both simpler and better.

My preferred branching workflow

I like to use a branching workflow like this:

  1. Create a feature branch off master
  2. Build the feature
  3. Create a PR for that branch
  4. Merge and deploy
  5. Delete the feature branch

That’s it! No need for a “branching model” that takes a lengthy blog post to explain. Just make a branch, merge the branch, then delete the branch. It’s very similar to GitHub’s Git workflow.

Key takeaways

  • Don’t take on stories that you think will take more than a few days to build. If a feature seems bigger, break it down into several smaller stories.
  • Don’t create branches that last more than a few days. If the feature seems irreducibly large, consider using feature flags and a smart sequencing of the development work to release the feature in smaller chunks.
  • Deploy frequently, preferably every time you merge to master. Deployment workflows and Git workflows are deeply linked.
  • Use a simple branching workflow like GitHub flow.

The broken fax and the duped programmer

The big infrastructure switchover

For the last few weeks at work I’ve been working on a project to move our AWS infrastructure off of Elastic Beanstalk and onto a more custom architecture using Ansible.

Today was the riskiest point of the project. The reason is that this last Saturday I performed the big switchover where I pointed our main DNS record at the new infrastructure, leaving our old infrastructure behind. Today, Monday, is the first day that the new infrastructure is being used in production.

A problem with faxes

Our system has a component that sends faxes. It’s a relatively “risky” feature because the successful transmission of a fax involves a background job, a cron job, and a third-party API. Anytime I mess with infrastructure changes, the fax component is the first area I test for breakage.

This morning, around open of business, I got an error notification email from Bugsnag. The error was one I had never seen before. It was an error about not being able to send a fax because the recipient number was missing.

It was probably (probably) the infrastructure

My initial reaction was of course to put two and two together. The fax component is particularly sensitive to infrastructure changes. Today is the first day after a big infrastructure change. Therefore, it’s highly likely that the infrastructure change caused the fax bug.

However, to my admitted surprise, the infrastructure change was not the cause of the bug. The fax component was working just as well as it always had. The never-seen-before error appeared today not because the infrastructure change broke something but because of coincidental timing.

The fax component was working just as well as it always had. The error would have appeared anytime the fax recipient number was missing. It just so happened that the first time the fax recipient number was ever missing was also the first day the new infrastructure was being used in production.

My takeaway

I know the difference between a likelihood and a certainty. Yet I allowed myself to get fooled because I was sloppy with my application of logic.

I should have slowed down and said, “Yes, the infrastructure change is almost certainly the reason for the new bug, but that’s not the same thing as it being certainly the reason for the bug. There is a non-zero chance that the culprit is something else.

And in this case the culprit was in fact something else.

When used intelligently, Rails concerns are great

Is a knife a good thing?

There seem to be two main camps regarding Rails concerns. One camp says concerns are good. The other camp says concerns are bad.

I reject this dichotomy. To state what ought to be obvious, not everything is either good or bad. One example is a knife.

You could use a knife to stab me in the leg, or you could use a knife to whittle me a beautiful wooden sculpture.

Or, if you were uninformed regarding good uses for knives, you could use a knife instead of a spoon in a misguided attempt to eat a bowl of soup. (Then you could write a blog post about why knives are terrible because you can’t eat soup with them.)

I want to discuss where concerns might be a good idea and where they might not be. In order to do this it might be helpful to start by articulating which types of problems people tend to use concerns to try to solve.

The problems that people try to solve with concerns

There are two problems I’m aware of that people try to use concerns to solve.

  1. Models that are too big to be easily understood
  2. Duplication across models

Both these problems could be solved in a number of ways, including by using concerns.

Another way could be to decompose the model object into a number of smaller plain old Ruby objects. Yet more options include service objects or Interactors. All these approaches have pros and cons and could be used appropriately or inappropriately.

The objections some programmers have to concerns

Here are some objections to concerns that I’ve read. Some I agree with. Some I don’t think are valid.

Concerns are inheritance, and composition is better than inheritance

I totally agree with this one. An often-repeated piece of advice in OOP is “prefer composition over inheritance”. I think it’s good advice.

Having said that, the advice is not “don’t ever use inheritance!” There are a lot of cases where inheritance is more appropriate than composition, even if inheritance isn’t the first thing I reach for. In the same way, the argument that “concerns are inheritance and composition is better than inheritance” does not translate to “concerns are always bad”.

Concerns don’t necessarily remove dependencies, they just spread them across multiple files

I agree with this point as well. It’s totally possible to slice a model up into several concerns and end up with the result that the “after” code is actually harder to understand than the “before” code.

My response to this point would be: If you’re going to use concerns, don’t use them like that!

Concerns create circular dependencies

As far as I can reason, this is true. A well-defined inheritance relationship doesn’t create a circular dependency because the parent class doesn’t know anything about the child class. Similarly, in a well-defined composition relationship, the “finer-grained” class doesn’t know about the “coarser-grained” one.

Concerns often don’t work this way. Concerns often refer to the particulars of the models that use them.

But I would say: Why, exactly, is this sort of circular dependency bad? And if I choose not to use a concern in a particular case because I want to avoid a circular dependency, what exactly my alternative? The drawbacks of whatever I use instead of a concern aren’t automatically guaranteed not to be worse than the drawbacks of using a concern.

When code is spread across multiple files, it can be unclear where methods are defined

This argument has a very small amount of merit. This argument could be made equally for not just concerns but for inheritance as well. So it’s really not an argument against concerns specifically, it’s an argument against concerns and inheritance equally.

But in any case, the problem can be easily overcome using search.

What I do when I encounter fat or duplicative models

When I encounter a fat or duplicative model, concerns are pretty much never the first solution that enters my mind.

My first instinct, again, is to consider decomposing my model into some number of plain old Ruby objects. I think sometimes Rails developers forget that plain old OOP, using plain old objects, can get you a very long way.

I often find that a large class will have a “hidden abstraction” inside it. There’s some sub-concept hiding in the class that doesn’t have a name or a cohesive bundle of code, yet it’s there.

In those cases I try to come up with a name for the hidden abstraction. Then I create a class named after that abstraction and move the appropriate code into my new class.

I don’t always find a hidden abstraction though. Sometimes some of the bloat is just some extra flab, a love handle if you will, that neither fits neatly into the model it’s in nor makes sense as a standalone concept. This is when I (sometimes) think of using a concern.

When I use concerns

In DHH’s post about concerns he says “Concerns are also a helpful way of extracting a slice of model that doesn’t seem part of its essence” (emphasis mine).

Good use cases for concerns

To me, the perfect use case for a concern is when I have, say, 10 methods on a model that have a high level of cohesion with one another and then 2 methods that relate to each other but (to use DHH’s words again) don’t seem to be part of the model’s essence.

If those 2 odd-man-out methods could make a new object of their own, great. I’ll do that. If not, I might consider moving them to a concern.

And of course, if multiple models require highly symmetrical behavior (e.g. some certain file attachment behavior), I’d usually rather factor that symmetrical behavior into a concern than to an inheritance tree. And usually a concern in these cases fits much more tidily over the models than trying to create a composition relationship out of it.

Lastly, I don’t want to give the impression that I use concerns all over the place. In general when I’m programming, concerns are among the last things I reach for, not the first.

My advice concerning concerns

I think concerns are fine and, in certain scenarios, great. That of course doesn’t mean I think concerns are the best solution to every problem.

When faced with any programming problem, take stock of the tools in your toolbox and use what seems appropriate. Object composition can be good. Inheritance can be good. Concerns can be good too.