Category Archives: Programming

How to Dockerize a Sinatra application

Why we’re doing this

Docker is difficult

In my experience, Dockerizing a Rails application for the first time is pretty hard. Actually, doing anything with Docker seems pretty hard. The documentation isn’t that good. Clear examples are hard to find.

Dockerizing Rails is too ambitious as a first goal

Whenever I do anything for the first time, I want to do the simplest, easiest possible version of that thing before I try anything more complicated. I also never want to try to learn more than one thing at once.

If I try to Dockerize a Rails application without any prior Docker experience, then I’m trying to learn the particulars of Dockerizing a Rails application while also learning the general principles of Docker at the same time. This isn’t a great way to go.

Dockerizing a Sinatra application gives us practice

Dockerizing a Sinatra application lets us learn some of the principles of Docker, and lets us get a small Docker win under our belt, without having to confront all the complications of Dockerizing a Rails application. (Sinatra is a very simple Ruby web application framework.)

After we Dockerize our Sinatra application we’ll have a little more confidence and a little more understanding than we did before. This confidence and understanding will be useful when we go to try to Dockerize a Rails application (which will be a future post).

By the way, if you’ve never worked with Sinatra before, don’t worry. No prior Sinatra experience is necessary.

What we’re going to do

Here’s what we’re going to do:

  1. Create a Sinatra application
  2. Run the Sinatra application to make sure it works
  3. Dockerize the Sinatra application
  4. Run the Sinatra application using Docker
  5. Shotgun a beer in celebration (optional)

I’m assuming you’re on a Mac and that you already have Docker installed. If you don’t want to copy/paste everything, I have a repo of all the files here.

(Side note: I must give credit to Marko Anastasov’s Dockerize a Sinatra Microservice post, from which this post draws heavily.)

Let’s get started.

Creating the Sinatra application

Our Sinatra “application” will have just one file and just one endpoint. Create a file called hello.rb with the following content.

# hello.rb

require 'sinatra'

get '/' do
  'It works!'
end

We’ll also need to create a Gemfile that says Sinatra is a dependency.

# Gemfile

source 'https://rubygems.org'

gem 'sinatra'

Lastly for the Sinatra application, we’ll need to add the rackup file, config.ru.

# config.ru

require './hello'

run Sinatra::Application

After we run bundle install to install the Sinatra gem, we can run the Sinatra application by running ruby hello.rb.

$ bundle install
$ ruby hello.rb

Sinatra apps run on port 4567 by default, so let’s open up http://localhost:4567 in a browser.

$ open http://localhost:4567

If everything works properly, you should see “It works!” in the browser.

Dockerizing the Sinatra application

Dockerizing the Sinatra application will involve two steps. First, we’ll create a Dockerfile, which tells Docker how to package up the application. Next we’ll use our Dockerfile to build a Docker image of our Sinatra application.

Creating the Dockerfile

Here’s what our Dockerfile looks like. You can put this file right at the root of the project alongside the Sinatra application files.

# Dockerfile

FROM ruby:2.7.1

WORKDIR /code
COPY . /code
RUN bundle install

EXPOSE 4567

CMD ["bundle", "exec", "rackup", "--host", "0.0.0.0", "-p", "4567"]

Since it might not be clear what each part of this file does, here’s an annotated version.

# Dockerfile

# Include the Ruby base image (https://hub.docker.com/_/ruby)
# in the image for this application, version 2.7.1.
FROM ruby:2.7.1

# Set /code as the working directory and copy all of this
# application's files into it. The directory name is arbitrary
# and could be anything.
WORKDIR /code
COPY . /code

# Run this command. RUN can be used to run anything. In our
# case we're using it to install our dependencies.
RUN bundle install

# Document that the application inside the container listens on port 4567.
# (Publishing the port happens later, with "docker run -p".)
EXPOSE 4567

# Tell Docker that when we run "docker run", we want it to
# run the following command:
# $ bundle exec rackup --host 0.0.0.0 -p 4567.
CMD ["bundle", "exec", "rackup", "--host", "0.0.0.0", "-p", "4567"]

Building the Docker image

All we need to do to build the Docker image is to run the following command.

$ docker build --tag hello .

I’m choosing to tag this image as hello, although that’s an arbitrary choice that doesn’t connect with anything inside our Sinatra application. We could have tagged it with anything.

The . part of the command tells docker build to use the current directory as the build context. In order to work, this command needs to be run at the project root.

Once the docker build command successfully completes, you should be able to run docker images and see the hello image listed.
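
The output will look something like this (the image ID, timestamp and size will of course differ):

$ docker images
REPOSITORY   TAG      IMAGE ID       CREATED          SIZE
hello        latest   0123456789ab   10 seconds ago   850MB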

Running the Docker image

To run the Docker image, we’ll run docker run. The -p 80:4567 portion says “map port 80 on the host machine to port 4567 inside the container”. The Sinatra app inside the container listens on port 4567, and we have to publish that port in order to reach the app from outside the container. Mapping it to port 80 on the host is just a convenience so we can visit the app without typing a port number; Docker can map to any host port.

$ docker run -p 80:4567 hello
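
If you’d rather keep using port 4567 in the browser, you can instead map the container’s port to the same port on the host:

$ docker run -p 4567:4567 hello
$ open http://localhost:4567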

If we visit http://localhost (not http://localhost:4567, just http://localhost) we should see the Sinatra application being served.

$ open http://localhost

Conclusion

Congratulations. You now have a Dockerized Ruby application!

With this experience behind you, you’ll be better equipped to Dockerize a Rails application the next time you try to take on that task.

Why validation matchers are the only Shoulda matchers I use

One of the testing questions I commonly get is about Shoulda matchers. People ask if I use Shoulda matchers and if Shoulda matchers are a good idea.

I’ll share my thoughts on this. First I’ll explain what Shoulda is, then I’ll explain why the only Shoulda matchers I use are validation matchers.

What Shoulda is

If you’re unfamiliar with Shoulda matchers, the premise, from the GitHub description, is: “Shoulda Matchers provides RSpec- and Minitest-compatible one-liners to test common Rails functionality that, if written by hand, would be much longer, more complex, and error-prone.”

A few examples of specific Shoulda matchers are validate_presence_of (expects that a model attribute has a presence validation), have_many (expects that a has_many association exists), and redirect_to (expects that a redirection takes place).
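
To give a rough idea of what these look like in practice, here’s a hypothetical model spec that uses a couple of them:

RSpec.describe User, type: :model do
  it { is_expected.to validate_presence_of(:email) }
  it { is_expected.to have_many(:posts) }
end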

I like the idea of a library that can clean up a lot of my repetitive test code. Unfortunately, the majority of Shoulda matchers only apply to the kinds of tests I would never write.

Test behavior, not implementation

To me it doesn’t make much sense to, for example, write a test that only checks for the presence of an Active Record association and doesn’t do anything else.

If I have an association, presumably that association exists in order to enable some piece of behavior, or else it would be pointless for the association to exist. For example, if a User class has_many :posts, then that association only makes sense if there’s some Post-related behavior.

So if I’m thinking about testing that the User class has_many :posts, there are two possibilities. One is that I write a test for both the association itself and the behavior enabled by the association, in which case the test for the association is redundant and adds no value. The other is that I write a test only for the post association but not for the post behavior, which wouldn’t make much sense, because why wouldn’t I write a test for the post behavior?

To me it only makes sense in this example to write tests for the post behavior and write no tests directly for the association. The logic of this decision can be proved by imagining what would happen if the has_many :posts line were removed. Any tests for the post behavior would start failing because the behavior would be broken without the association line present.
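
To make this concrete, here’s a hypothetical behavior-level test (the post_count method and the attribute values are made up for illustration). It exercises the has_many :posts association without testing the association directly, so it would start failing if the has_many :posts line were removed:

RSpec.describe User do
  describe '#post_count' do
    it 'returns the number of posts the user has' do
      user = User.create!(email: 'test@example.com', password: 'password1')
      user.posts.create!(title: 'Hello')

      expect(user.post_count).to eq(1)
    end
  end
end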

Why validation matchers are different

I mentioned at the top of the post that validation matchers are the only Shoulda matchers I use. The reason I do use the validation matchers is that validations aren’t just a means to an end, they’re an end in themselves. In other words, validations are a feature.

The alternatives to using Shoulda matchers to check for validations are to not write validation tests at all or to write really repetitive tests for validations. Both those alternatives seem bad to me, so Shoulda matchers it is.
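
To give a sense of that repetitiveness, here’s roughly what a hand-written presence validation test looks like next to the Shoulda one-liner (using a hypothetical Customer model):

# By hand
it 'requires a first name' do
  customer = Customer.new(first_name: nil)
  customer.valid?
  expect(customer.errors[:first_name]).to include("can't be blank")
end

# With a Shoulda matcher
it { is_expected.to validate_presence_of(:first_name) }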

Lessons I learned converting all my database IDs to UUIDs

The motivation

About seven months ago I became aware of a slightly worrisome problem in the application I maintain at work.

The application is a medical application. Each patient in the system has a numeric, incremental account number like 4220. Due to coincidental timing of business and technology, the database ID of each patient is pretty close to the account number. The patient with account number 4220 might have a database ID of something like 4838.

You can imagine how this might cause confusion. Imagine wanting to check which patient you’re viewing and you see 4220 on the screen and patients/4838 in the URL. If you’re not paying close attention it can be confusing.

I brought this issue up with my boss. Turns out my boss had actually gotten tripped up on this issue himself. I brought up the option of switching from numeric IDs to UUIDs as a way to fix the problem and he agreed that it was a good idea. I also brought up my judgment that this would likely be a large, risky, time-consuming project, but we both agreed that the risks and costs associated with the UUID change were less than the risks and costs associated with leaving things the way they were.

The approach

Acknowledging the risk of the project, I decided to distribute the risk in small pieces over time so that no single change would carry too much risk.

Rather than trying to convert all my 87 tables to UUIDs at once, which would be way too risky, I decided to convert 1-10 tables at a time, starting with just one and ramping it up over time as the level of uncertainty decreased.

The planned cadence was once per week, spreading the work across 20-40 weeks, depending on how many tables could be converted in a batch. I applied each UUID change on a Saturday morning, although I didn’t start off doing them Saturdays. I was prompted to start doing it this way after one of the early migrations caused a problem. This brings me to my first lesson.

Lesson 1: apply UUID changes off-hours with a significant time buffer

My first mistake was that I didn’t properly calibrate my methodology to the true level of risk involved.

For one of the earlier UUID migrations I performed, I did it maybe an hour and a half before open of business while no one was doing anything. Unfortunately, something went wrong with the migration, and I didn’t have time to fully fix the problem (nor did I have time to roll it back) before people started using the system.

This incident showed me that I needed a bigger buffer time.

If I wanted to perform the migrations off-hours, my options were to a) perform the migrations on weekdays between about 10pm and 6am or b) perform the migrations on weekends. I decided to go with weekends.

In addition to adding buffer time, I added a way for me to conveniently put the app into “maintenance mode” which would block all users except me from using the app while maintenance mode was on.
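
The details of that maintenance mode aren’t important, but the idea is roughly this (a simplified sketch; the environment variable name and the admin check are made up):

# app/controllers/application_controller.rb
class ApplicationController < ActionController::Base
  before_action :check_maintenance_mode

  private

  def check_maintenance_mode
    return unless ENV['MAINTENANCE_MODE'] == 'true'
    return if current_user&.admin?

    render plain: 'Down for maintenance. Please check back shortly.',
           status: :service_unavailable
  end
end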

Since the time I added these changes to my change process, there have been no UUID-related incidents.

Lesson 2: beware of subtle references

When a table’s IDs get converted to UUIDs, all the references to that table of course need to get updated too. A table called customers needs to have every customer_id column updated in order for the association to be preserved.

This is easy enough when the referencing column’s name matches the referenced table’s name (e.g. customer_id matches customers) and/or when a foreign key constraint is present, making it physically impossible to forget about the association. It’s harder when neither a matching column name nor a foreign key constraint exists to clue you in.
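
For a rough idea of what updating a referencing column involves, here’s a simplified sketch (the table names are made up, it assumes the pgcrypto extension for gen_random_uuid(), and my real script handled more steps than this):

class ConvertCustomersToUuid < ActiveRecord::Migration[6.0]
  def up
    # Give customers a UUID alongside the existing numeric ID.
    add_column :customers, :uuid, :uuid, default: -> { 'gen_random_uuid()' }, null: false

    # Add a UUID version of the referencing column and remap it
    # using the old numeric IDs.
    add_column :orders, :customer_uuid, :uuid
    execute <<~SQL
      UPDATE orders
      SET customer_uuid = customers.uuid
      FROM customers
      WHERE orders.customer_id = customers.id
    SQL

    # ...then swap the primary key and rename the columns.
  end
end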

You know that incident I mentioned in Lesson 1? That incident was caused by my failure to detect active_storage_attachments.record_id as a column that needed its values changed from numeric IDs to UUIDs. This caused a peculiar bug where most of the file uploads in the system started to appear on records to which they did not belong. Not always, just sometimes. The random behavior had to do with the fact that an expression like '123abc'.to_i evaluates to 123 while an expression like 'abc123'.to_i evaluates to 0.

Anyway, the fix to that issue was conceptually straightforward enough once I knew what was happening, even if the fix was a little tedious.

At this point you might wonder whether good test coverage might have caught such a bug. I must have been taking stupid pills on the day this incident happened because, in addition to everything else I did wrong, I ran the test suite but didn’t bother to check the results before I deployed, even though I knew my change was particularly risky. If I had checked the test suite results I would have seen that there were a few failing tests related to file uploads. I haven’t made that mistake since.

Lesson 3: keep the numeric IDs around

For the first several tables I converted, it didn’t occur to me that I might want to keep the numeric IDs around for any reason. Two things that happened later showed me that it is in fact a good idea to keep the numeric IDs around.

First, keeping the numeric IDs makes for easier rollback. The incident from Lesson 1 could have been rolled back fairly trivially if I had just kept the numeric ID column.

Second, sometimes an incremental ID is handy. I recently built a feature that lets patients pay their statements online. The patients can find their statement by entering their statement ID. It’s not realistic to ask 80 year-old un-tech-savvy people to enter values like CB08B2 where they have to make the distinction between a zero and a letter O. So for that feature I used the numeric, sequential ID of the statements to show to patients.

Lesson 4: there are a lot of special cases

As of this writing I’ve converted 37 tables from numeric IDs to UUIDs. These were done, again, in small batches of 1 to 10, usually more like 1 to 3.

Almost every batch presented me with some small, new, unique obstacle. Most of these obstacles were too small and uninteresting for it to be useful for me to list every one. For example, I learned on one batch that if a view (that is, database view, not Rails view) references the table I’m changing, then the view needs to be dropped before I make the UUID change and then re-created afterward. There was a lot of little stuff like that.

For this reason, I didn’t have much success with the gems that exist to assist with UUID migrations. Everything I did was custom. It wasn’t much code, though, mostly just a script that I used for each migration and added to each time I encountered a new issue. The current total size of the script is 91 lines.

Lesson 5: performance is fine (for me)

I’ve read some comments online that say UUIDs are bad for performance. Some cursory research tells me that, at least for PostgreSQL, this isn’t really true, at least not enough to matter.

I’ve also experienced no discernible performance hits as a result of converting tables from incremental primary keys to UUID primary keys.

Takeaways

If I had a time machine, here’s some advice I would give my past self.

  • Apply the UUID changes off-hours with a big buffer.
  • For each change, look really carefully for referencing columns.
  • Keep the numeric IDs around.
  • Expect to hit a lot of little snags along the way.

All in all this project has been worth it. Only one production incident was caused. The incident was recovered from relatively easily. The work hasn’t been that painful. And there’s a great satisfaction I get from looking at patient URLs and knowing they will never cause confusion with patient account numbers again.

Why use Factory Bot instead of creating test data manually?

I recently taught a Rails testing class where we wrote tests using RSpec, Capybara, Factory Bot and Faker.

During the class, one of the students asked, why do we use Factory Bot? What’s the advantage over creating test data manually?

The answer to this is perhaps most easily explained with an example. I’m going to show an example of a test setup that creates data manually, then a test setup that uses Factory Bot, so you can see the difference.

In both examples, the test setup is for a test that needs two Customer records to exist. A Customer object has several attributes and a couple associations.

Non-Factory-Bot example

Here’s what it looks like when I create my two Customer records manually.

The upside to this approach is that there’s no extra tooling involved. It’s just Active Record. There are a few downsides though.

First, it’s wasteful and tedious to have to think of and type out all this fake data (e.g. “555-123-4567”).

Second, it’s going to be unclear to an outside reader what data is significant and what’s just arbitrary. For example, is it significant that John Smith lives in Minneapolis or could his address have been anywhere?

Third, all this test data adds noise to the test and makes it an Obscure Test. When there’s a bunch of test data in the test it makes it much harder to tell at a glance what the essence of the test is and what behavior it’s testing.

This example is tedious enough with only two associations on the Customer model (State and User). You can imagine how bad things might get when you have truly complex associations.

RSpec.describe Customer do
  before do
    @customers = Customer.create!([
      {
        first_name: 'John',
        last_name: 'Smith',
        phone_number: '555-123-4567',
        address_line_1: '123 Fake Street',
        city: 'Minneapolis',
        state: State.create!(name: 'Minnesota', abbreviation: 'MN'),
        zip_code: '55111',
        user: User.create!(
          email: 'john.smith@example.com',
          password: 'gdfkgfgasdf18233'
        )
      },
      {
        first_name: 'Kim',
        last_name: 'Jones',
        phone_number: '555-883-2283',
        address_line_1: '338 Notreal Ave',
        city: 'Chicago',
        state: State.create!(name: 'Illinois', abbreviation: 'IL'),
        zip_code: '60606',
        user: User.create!(
          email: 'kim.jones@example.com',
          password: 'eejkgsfg238231188'
        )
      }
    ])
  end
end

Factory Bot example

Here’s a version of the test setup that uses Factory Bot. It achieves the same result, the creation of two Customer records. The code for this version is obviously much more concise.

RSpec.describe Customer do
  before do
    @customers = FactoryBot.create_list(:customer, 2)
  end
end

This simple code is made possible through factory definitions. In this case there are three factory definitions: one for Customer, one for State and one for User.

In all three of the factory definitions I’m using an additional gem called Faker. Faker helps with the generation of things like random names, phone numbers, email addresses, etc.

Here are the three factory definitions.

FactoryBot.define do
  factory :customer do
    first_name { Faker::Lorem.characters(10) }
    last_name { Faker::Lorem.characters(10) }
    phone_number { Faker::PhoneNumber.cell_phone }
    address_line_1 { Faker::Lorem.characters(10) }
    city { Faker::Lorem.characters(10) }

    # This line will generate an associated State record
    # using the factory definition for State
    state

    # Same here, but for User
    user
  end
end

FactoryBot.define do
  factory :state do
    name { Faker::Lorem.characters(10) }
    abbreviation { Faker::Lorem.characters(10) }
  end
end

FactoryBot.define do
  factory :user do
    email { Faker::Internet.email }
    password { Faker::Internet.password }
  end
end

Takeaways

If you were wondering why exactly we use Factory Bot, the answer is that it makes our tests more convenient to write and more understandable to read. In addition to Factory Bot, the Faker gem can help take away some of the tedium of having to create test data values.

There’s also one other popular method of creating test data which is to use fixtures. Fixtures have the advantage of speeding up a test suite because they’re only loaded once at the beginning of the test suite run (as opposed to factories which are typically run once per test) but I prefer factories because I feel they make tests easier to understand. You can read more about fixtures vs. factories here.

How I set up Factory Bot on a fresh Rails project

A reader of mine recently asked me how I set up Factory Bot for a new Rails project.

There are four steps I go through to set up Factory Bot.

  1. Install the factory_bot_rails gem
  2. Set up one or more factory definitions
  3. Install Faker
  4. Add the Factory Bot syntax methods to my rails_helper.rb file

Following are the details for each step.

Install the factory_bot_rails gem

The first thing I do is to include the factory_bot_rails gem (not the factory_bot gem) in my Gemfile. I include it under the :development, :test group.

Here’s a sample Gemfile from a project with only the default gems plus a few that I added for testing.

Remember that after you add a gem to your Gemfile you’ll need to run bundle install in order to actually install the gem.

source 'https://rubygems.org'
git_source(:github) { |repo| "https://github.com/#{repo}.git" }

ruby '2.7.0'

# Bundle edge Rails instead: gem 'rails', github: 'rails/rails'
gem 'rails', '~> 6.0.2', '>= 6.0.2.2'
# Use postgresql as the database for Active Record
gem 'pg', '>= 0.18', '< 2.0'
# Use Puma as the app server
gem 'puma', '~> 4.1'
# Use SCSS for stylesheets
gem 'sass-rails', '>= 6'
# Transpile app-like JavaScript. Read more: https://github.com/rails/webpacker
gem 'webpacker', '~> 4.0'
# Turbolinks makes navigating your web application faster. Read more: https://github.com/turbolinks/turbolinks
gem 'turbolinks', '~> 5'
# Build JSON APIs with ease. Read more: https://github.com/rails/jbuilder

gem 'devise'

# Reduces boot times through caching; required in config/boot.rb
gem 'bootsnap', '>= 1.4.2', require: false

group :development, :test do
  gem 'pry'
  gem 'rspec-rails'
  gem 'capybara'
  gem 'webdrivers'
  gem 'factory_bot_rails'
end

group :development do
  # Access an interactive console on exception pages or by calling 'console' anywhere in the code.
  gem 'web-console', '>= 3.3.0'
  gem 'listen', '>= 3.0.5', '< 3.2'
  # Spring speeds up development by keeping your application running in the background. Read more: https://github.com/rails/spring
  gem 'spring'
  gem 'spring-watcher-listen', '~> 2.0.0'
end

# Windows does not include zoneinfo files, so bundle the tzinfo-data gem
gem 'tzinfo-data', platforms: [:mingw, :mswin, :x64_mingw, :jruby]

Set up one or more factory definitions

Factory definitions are kind of the “templates” that are used for generating new objects.

For example, if I have a user object that needs an email and a password, then I would create a factory definition saying “hey, make me a user with an email and a password”. The actual code might look like this:

FactoryBot.define do
  factory :user do
    email { 'test@example.com' }
    password { 'password1' }
  end
end

Factory Bot is smart enough to know that when I say factory :user do, I’m talking about an Active Record class called User.

There’s a problem with this way of defining my User factory though. If I have a unique constraint on the users.email column in the database (for example), then I won’t ever be able to generate more than one User object. The first user’s email address will be test@example.com (no problem so far) but then when I go to create a second user, its email address will also be test@example.com, and if I have a unique constraint on users.email, the creation of this second record will not be allowed.

We need a way of making it so the factories’ values can be unique. One way, which I’ve done before, is to append a random number to the end of the email address, e.g. "test#{SecureRandom.hex}@example.com". There’s a different way to do it, though, that I find nicer. That way is to use another gem called Faker.
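
For reference, that SecureRandom approach would look something like this:

FactoryBot.define do
  factory :user do
    email { "test#{SecureRandom.hex}@example.com" }
    password { 'password1' }
  end
end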

Install Faker

Just like I showed with factory_bot_rails above, the Faker gem can be added by putting it into the :development, :test group of the Gemfile.
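
In the Gemfile, that looks like this:

group :development, :test do
  gem 'factory_bot_rails'
  gem 'faker'
end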

Then we can change our User factory definition as follows.

FactoryBot.define do
  factory :user do
    email { Faker::Internet.email }
    password { Faker::Internet.password }
  end
end

This will give us random values like eldora@jones.net and lazaromertz@ko.name.

Add the Factory Bot syntax methods to my rails_helper.rb file

The syntax for actually using a Factory Bot factory in a test is as follows:

FactoryBot.create(:user)

There’s nothing wrong with this, but I find that these FactoryBot prefixes are so numerous in my test files that their presence feels a little noisy.

There’s a way to make it so that instead we can just write this:

create(:user)

The way to do that is to add a bit of code to spec/rails_helper.rb.

RSpec.configure do |config|
  config.include FactoryBot::Syntax::Methods
end

(You don’t actually add the RSpec.configure do |config| to the spec/rails_helper.rb file. It’s already there. I’m just including it here to show that that’s the block inside of which the config.include FactoryBot::Syntax::Methods line goes.)

What to do next

If you’re curious how to put Factory Bot together with the other testing tools to write some complete Rails tests, I might suggest my Rails testing “hello world” tutorial using RSpec and Capybara.

The difference between system specs and feature specs

If you’re like me, you might have found the difference between RSpec’s “feature specs” and “system specs” to be a little too nuanced to be easily understandable. Here’s my explanation of the difference between the two.

Two levels of Rails tests

For some background, I want to talk about the main two types of Rails tests that I use, and where feature specs, the predecessor to system specs, come into the picture.

I think of Rails tests as existing on two basic levels.

High-level, coarse-grained

One level of tests is at a high level. These tests simulate a user visiting pages, filling in forms, clicking links and buttons, etc. Different people use different terms for these tests including “integration test”, “end-to-end test” and “acceptance test”.

Because these types of tests are often expensive to write and run, they tend not to cover every nook and cranny of the application’s behavior.

In RSpec terminology, these types of tests have historically been called feature specs.

Low-level, fine-grained

The other level of tests is at a lower level. There are all kinds of tests you could possibly write with Rails/RSpec (model specs, request specs, view specs, helper specs) but I tend to skip most of those in most scenarios and only write model specs.

From feature spec to system spec

The backstory

When Rails 5.1 came out in April 2017, one of the features it introduced was system tests. Here’s why this was done.

By default Rails ships with Minitest. The inclusion of Minitest provides a way to write model tests and certain other kinds of tests, but it historically hasn’t provided a way to write full-blown end-to-end tests without doing some extra work yourself, like bringing Capybara and Database Cleaner into the picture. This is the rationale for the addition of system tests, in my understanding.

System specs wrap system tests

According to the RSpec docs, “System specs are RSpec’s wrapper around Rails’ own system tests.” This means that it’s no longer required to explicitly include the Capybara gem, and because system tests are already run inside a transaction, you don’t need Database Cleaner.

The syntactical difference

Here are two examples from the RSpec docs for feature specs and for system specs. Hopefully the syntactical difference is clear enough to spot without explanation.

(Note: it’s been pointed out to me by Reddit user jrochkind that the “feature/scenario” and “describe/it” syntaxes are interchangeable and independent of system specs vs. features specs. Apparently the RSpec team has developed a preference for the “describe/it” version.)

Feature spec example

require "rails_helper"

RSpec.feature "Widget management", :type => :feature do
  scenario "User creates a new widget" do
    visit "/widgets/new"

    fill_in "Name", :with => "My Widget"
    click_button "Create Widget"

    expect(page).to have_text("Widget was successfully created.")
  end
end

System spec example

require "rails_helper"

RSpec.describe "Widget management", :type => :system do
  it "enables me to create widgets" do
    visit "/widgets/new"

    fill_in "Name", :with => "My Widget"
    click_button "Create Widget"

    expect(page).to have_text("Widget was successfully created.")
  end
end

Summary

System specs are a wrapper around Rails’ system tests. The benefit of using system specs instead of feature specs is that you don’t have to explicitly include the Capybara gem, nor do you have to use Database Cleaner. If you prefer the new syntax (as I do), which more closely matches the RSpec syntax for all the other RSpec spec types, then that’s an additional benefit as well.

My method of systematic troubleshooting

Why systematic troubleshooting is valuable

Computers are complicated, programming is complicated, and the problems we programmers have to solve are often complicated.

Because human brains are only so powerful, and because humans are susceptible to logical fallacies, it’s very important to have a systematic approach to troubleshooting if we want to have a hope of solving our problems and solving them in a timely manner.

If you don’t have a good methodology, you’ll guess and flail and get frustrated and ultimately probably fail to fix the issue. If you do have a good methodology, almost any problem will eventually collapse under the crushing weight of your capabilities.

Here are some questions I tend to ask myself when troubleshooting technical issues.

Do I know, with certainty, exactly what’s wrong?

Don’t get fooled

Most of the time, when I’m presented with a problem, I don’t know exactly what’s wrong at first. It’s of course impossible to fix a problem if I don’t know what needs fixing.

The challenge here is not to fool yourself into thinking the problem is something other than what it really is.

I’ve been guilty on a number of occasions of taking a bug report at face value and then discovering later that the reporter of the bug was mistaken and that the bug was something else. If someone tells me “we’re not able to charge American Express cards in the system,” I should translate that statement most of the time to “something seems to be wrong with credit card payments”.

State only what you know for sure to be true

Here’s a quote from Zen and the Art of Motorcycle Maintenance (using motorcycles as the subject of investigation rather than computers):

It is much better to enter a statement “Solve Problem: Why doesn’t cycle work?” which sounds dumb but is correct, than it is to enter a statement “Solve Problem: What is wrong with the electrical system?” when you don’t absolutely know the trouble is in the electrical system. What you should state is “Solve Problem: What is wrong with cycle?” and then state as the first entry of Part Two: “Hypothesis Number One: The trouble is in the electrical system.” You think of as many hypotheses as you can, then you design experiments to test them to see which are true and which are false.

Good advice. (And a good book.)

How can I narrow the scope of my investigation?

Most problems are too big

With most problems, there are so many variables interacting with each other that it’s impossible to model the whole situation in my head. So I try to see what I can eliminate from the picture.

Let’s say I’m trying to deploy a Rails application to AWS. Some of my deployment worked but I can’t connect to my RDS instance (that is, my database instance).

In this case, my EC2 instance can’t connect to my RDS instance. Or, more specifically, the Rails app on my EC2 instance can’t connect to my RDS instance. How do I know whether the problem lies with my RDS instance, my EC2 instance, my Rails app, all of these, none of these, or some combination? That’s a lot of possibilities.

Narrowing it down

So rather than trying to tackle the whole problem, I’ll narrow it down to just the RDS instance. Is anything wrong with the RDS instance? How can I interrogate the RDS instance without bringing all the other parts into the picture?

One thing I can do in this case is to use the PostgreSQL CLI client (that is to say, the psql command) on my laptop (not on the EC2 instance, on my laptop) to try to connect to the RDS instance. This way I’m not involving my EC2 instance, I’m not involving environment variables, I’m not involving Rails, I’m only dealing with the RDS instance.

I can run the command psql my-db-name -U postgres -h my-rds-hostname and see what happens. If I can connect to my database that way and run a query, then I can be sure that my database name, my password and my hostname are all good. If I get prompted for a password but it tells me the password is wrong, then I know the problem is my password. If I don’t even get prompted for a password, then something else is wrong.

If I don’t know what’s wrong, in what areas might the problem lie and what tests can I perform to get a yes/no in each area?

List hypotheses

Before I actually get to work investigating a problem, I’ll usually list a number of hypotheses that I can test.

Here are some hypotheses for the RDS connection problem.

  • I’m using the wrong database credentials
  • I have the URL for my RDS instance wrong
  • The environment variables for my database credentials, RDS URL, etc., aren’t even set
  • My RDS instance’s security group is set to block traffic
  • Something else is wrong that I can’t think of

Test the hypotheses

Then I’ll try to get an answer to each question. I might start with what’s easiest to test or I might start with what I think is most likely to be the culprit.

I already shared one way of checking the second hypothesis, “I have the URL for my RDS instance wrong”. If I believe I have my RDS instance URL right, or if my tests in that area are inconclusive, I might try testing a different hypothesis.

When working with AWS, I find it easy to accidentally make my security groups (i.e. the sets of rules that control what types of traffic various entities can receive, like HTTP traffic, SSH traffic, etc.) too restrictive. PostgreSQL requests typically travel on port 5432, so if I don’t have port 5432 open for my RDS instance, I won’t be able to connect on port 5432.

To see if this is the issue I can use an open port checker to hit my RDS URL at port 5432. Again, this test tests only one thing at a time, which means if it doesn’t work, I can be pretty sure exactly what doesn’t work. If I perform a test that could have a large number of reasons for returning negative, I haven’t learned very much. Simple, narrow tests are the most useful kinds of tests to perform.
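
As a concrete example of that kind of narrow test, here’s how the port check might look from the command line using netcat (the hostname is a placeholder):

$ nc -zv my-rds-hostname.us-west-2.rds.amazonaws.com 5432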

Have I tried everything that can possibly be tried? If not, what haven’t I tried yet?

This is a question I often ask myself when I get stuck. It might sound like a dumb question but it turns out to be a productive question to ask a surprising portion of the time.

Persistence conquers all things

The point of this question is to facilitate persistence. It’s basically true that every problem is solvable and that the only way to fail is to give up. After all, if you understood everything about the problem and all the parts surrounding it, you would know exactly how to fix the problem, and there wouldn’t be a problem.

For example, with the RDS example, if you knew everything about networking, and databases, and Linux, and Rails, then you’d know exactly how to fix the problem. Luckily it never gets to that point. Usually there’s a relatively small amount of knowledge and understanding that lies between you and the solution to the problem.

So next time you’re confronted with a hairy issue, remember to use a systematic methodology, remember not to get fooled into believing things about the situation that aren’t true, remember to be persistent, and it’s entirely likely that you’ll be able to solve your problem.

How to restart Sidekiq automatically for each deployment

This post will cover why it’s necessary to restart Sidekiq for each deployment as well as how to achieve it.

Why it’s necessary to restart Sidekiq for each deployment

Example: database column name change

Let’s say you have a model in your Rails application called Customer. The corresponding customers table has a column that someone has called fname.

Because you’re not a fan of making your code harder to work with through overabbreviation, you decide to alter the customers table and rename fname to first_name. (You also change any instances of fname to first_name in the Rails code.) You deploy this change and the database migration runs in production.
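
That migration would look something like this:

class RenameFnameToFirstName < ActiveRecord::Migration[6.0]
  def change
    rename_column :customers, :fname, :first_name
  end
end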

Problem: out-of-date Sidekiq code

Unfortunately, unless you restart the Sidekiq process, there will be a problem. At the time the Sidekiq process was started (and by “Sidekiq process” I of course mean the thing that runs when you run bundle exec sidekiq), the Rails code referred to a column called fname.

If you changed the column name without restarting Sidekiq, the code the Sidekiq process is running will be referring to the non-existent fname column and your application will break.

The fix is simple: restart Sidekiq each time you deploy. The way to achieve this, however, is not so simple.

How to get Sidekiq to restart

systemd and systemctl

Before we get into the specifics of how to get Sidekiq to restart, a little background is necessary.

You’ve probably run Linux commands before that look something like sudo service nginx restart. I’ve been running such commands for many years without understanding what the “service” part is all about.

The “service” part is related to a pair of Linux concepts, systemd and systemctl. In the words of the DigitalOcean article I just linked, systemd is “an init system and system manager” and systemctl is “the central management tool for controlling the init system”. To put it in something closer to layperson’s terms, systemd and systemctl are a way to manage processes in a convenient way.

Using systemd to manage Sidekiq

If we register Sidekiq as a service with systemd, we gain the ability to run commands like service sidekiq start, service sidekiq restart and service sidekiq stop.

Once we have that ability, restarting Sidekiq on each deployment is as simple as adding service sidekiq restart as a step in our deployment script.

Adding a sidekiq.service file

The first step in telling systemd about Sidekiq is to put a sidekiq.service file in a certain directory. Which directory depends on which Linux distribution you’re using. I use Ubuntu so my file goes at /lib/systemd/system/sidekiq.service.

For the contents of the file you can use this sidekiq.service file from the Sidekiq repo as a starting point.

Important modifications to make to sidekiq.service

Change WorkingDirectory from /opt/myapp/current to the directory where your app is served from.

Change User=deploy and Group=deploy to whatever user and group you use for deployment.

If you’re using a version of Sidekiq that’s earlier than 6.0.6, you’ll need to change Type=notify to Type=simple and remove the WatchdogSec=5 line.
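
Putting those modifications together, a trimmed-down sidekiq.service might look roughly like this (the file in the Sidekiq repo has more in it and is the better real-world starting point):

# /lib/systemd/system/sidekiq.service
[Unit]
Description=sidekiq
After=syslog.target network.target

[Service]
Type=notify
WatchdogSec=5
# Change this to the directory your app is served from.
WorkingDirectory=/opt/myapp/current
ExecStart=/usr/local/bin/bundle exec sidekiq -e production
# Change these to your deployment user and group.
User=deploy
Group=deploy
Restart=on-failure

[Install]
WantedBy=multi-user.target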

Bringing in environment variables

If your application depends on environment variables, which it of course almost certainly does, the Sidekiq process will need to be aware of those environment variable values in order to run.

There are multiple ways to pull environment variables into the Sidekiq service. In my case I did so by creating a file called /etc/profile.d/sidekiq_env (the name and location are arbitrary) and adding a line EnvironmentFile=/etc/profile.d/sidekiq_env to my sidekiq.service. Here’s a sample sidekiq_env file so you know how it’s formatted:

RAILS_ENV=production
RAILS_FORCE_SSL=true
RAILS_SKIP_MIGRATIONS=false
# etc

Trying out your Sidekiq service

Once you’ve done all the above you can run systemctl enable sidekiq and then service sidekiq start. If you get something other than no output, something is wrong. If you get no output, the service theoretically started successfully, although you won’t know for absolute certain until you verify by running a job while tailing the logs.

You can tail the logs by running journalctl -u sidekiq -f. I verified my Sidekiq systemd setup by watching the output of those logs while invoking a Sidekiq job.

Troubleshooting

If everything is working at this point, congratulations. In the much more likely case that something is wrong, there are a couple ways you can troubleshoot.

First of all, you should know that you need to run systemctl daemon-reload after each change to sidekiq.service in order for the change to take effect.

One way is to use the output from the same journalctl -u sidekiq -f command above. Another is to run systemctl status sidekiq and see what it says.

If Sidekiq doesn’t run properly via systemd, try manually running whatever command is in ExecStart, which, if you used the default, is /usr/local/bin/bundle exec sidekiq -e production. That command should of course work, so if it doesn’t, then there’s a clue.

But it is possible for the command in ExecStart to work and for the systemd setup to still be broken. If, for example, you have your environment variables loaded properly in the shell but you don’t have environment variables loaded properly in your sidekiq.service file, the service sidekiq start command won’t work. Examine any Rails errors closely to determine whether the problem might be due to a missing environment variable value.

Bonus: restarting Sidekiq via Ansible

If you happen to use Ansible to manage your infrastructure like I do, here’s how you can add a task to your Ansible playbooks for restarting Sidekiq.

The restart task itself looks like this:

- name: Restart sidekiq
  service:
    name: sidekiq
    state: restarted
    daemon_reload: yes

This task alone wasn’t enough for me though. I wanted to add a task to copy the sidekiq.service file. I also needed to add a task to enable the Sidekiq service.

tasks:
  - name: Copy sidekiq.service
    template:
      src: sidekiq.service
      dest: /lib/systemd/system/sidekiq.service
      force: yes
      owner: root
      group: root
      mode: 0644

  - name: Enable sidekiq
    service:
      name: sidekiq
      enabled: yes

  - name: Restart sidekiq
    service:
      name: sidekiq
      state: restarted
      daemon_reload: yes

Good luck

Good luck, and please leave a comment if you have troubles or need clarification.

How to launch an EC2 instance using Ansible

What this post covers

In this post I’m going to show what could be considered a “hello world” of Ansible + AWS, using Ansible to launch an EC2 instance.

Aside from the time required to set up an AWS account and install Ansible, you should be able to get your EC2 instance running in 20 minutes or less.

Why Ansible + AWS for Rails hosting?

AWS vs. Heroku

For hosting Rails applications, the service I’ve reached for the most in the past is Heroku.

Unfortunately it’s not always possible or desirable to use Heroku. Heroku can get expensive at scale. There are also sometimes legal barriers due to e.g. HIPAA.

So, for whatever reason, AWS is sometimes a more viable option than Heroku.

The challenges with AWS + Rails

Unfortunately once you leave Heroku and enter the land of AWS, you’re largely on your own in many ways. Unlike with Heroku, there’s not one single way to do your deployment. There are basically infinite possible ways.

One way is to deploy manually, but that has all the disadvantages you can imagine a manual solution would have, such as, most obviously, a bunch of manual work to do each time you deploy.

Another option is to use Elastic Beanstalk. Elastic Beanstalk is kind of like AWS’s answer to Heroku, but it’s not nearly as nice as Heroku, and customizations can be a little tricky/hacky.

The advantages of using Ansible

I’ve been using Ansible for the last several months both on a commercial project and for my own personal projects.

If you’re not familiar with Ansible, here’s a description from the docs: “Ansible is an IT automation tool. It can configure systems, deploy software, and orchestrate more advanced IT tasks such as continuous deployments or zero downtime rolling updates.”

Ansible is often seen mentioned with similar tools like Puppet and Chef. I went with Ansible because among Ansible, Puppet, and Chef, Ansible had documentation that I could actually comprehend.

I’ve personally been using Ansible for two things so far: 1) provisioning EC2 instances and 2) deploying my Rails application. When I say “provisioning” I mainly mean spinning up an EC2 instance and installing all the software on it that my Rails app needs, like PostgreSQL, Bundler, Yarn, etc.

I like using Ansible because it allows me to manage my infrastructure using (at least to an extent so far) infrastructure as code. Rather than e.g. manually installing PostgreSQL and Bundler each time I provision a new EC2 instance, I write playbooks made up of tasks (playbook and task are Ansible terms) that say things like “install PostgreSQL” and “install the Bundler gem”. This makes the cognitive burden of maintenance way lower and it also makes my infrastructure setup less of a black box.

Instructions for provisioning an EC2 instance

Before you start you’ll need to have Ansible installed and of course have an AWS account available for use.

For this exercise we’re going to create two files. The first will be an Ansible playbook in a file called launch.yml which can be placed anywhere on your filesystem.

The launch playbook

Copy and paste from the content below into a file called launch.yml placed, again, wherever you want.

Make careful note of the region entry. I use us-west-2, so if you use that region also, you’ll need to make sure to look for your EC2 instance in that region and not in another one.

Also make note of the image entry. I believe EC2 images can vary from region to region, so make sure that the image ID you use does in fact exist in the region you use.

Lastly, replace my_ssh_key_name with the name of the EC2 key pair you normally use to SSH into your EC2 instances. (This is the key pair’s name as it appears in the AWS console, not the path to a key file on your machine.)

---
- hosts: localhost
  gather_facts: false
  vars_files:
    - vars.yml

  tasks:
    - name: Provision instance
      ec2:
        aws_access_key: "{{ aws_access_key }}"
        aws_secret_key: "{{ aws_secret_key }}"
        key_name: my_ssh_key_name
        instance_type: t2.micro
        image: ami-0d1cd67c26f5fca19
        wait: yes
        count: 1
        region: us-west-2

The vars file

Rather than hard-coding the entries for aws_access_key and aws_secret_key, which would of course be a bad idea if we were to commit our playbook to version control, we can have a separate file where we keep secret values. This separate file can either be added to a .gitignore or managed with something called Ansible Vault (which is outside the scope of this post).

Create a file called vars.yml in the same directory where you put launch.yml. This file will only need the two lines below. You’ll of course need to replace my placeholder values with your real AWS access key and secret key.

---
aws_access_key: XXXXXXXXXXXXXXXX
aws_secret_key: XXXXXXXXXXXXXXXX

The launch command

With our playbook and vars file in place, we can now run the command to execute the playbook:

$ ansible-playbook -v launch.yml

You’ll probably see a couple warnings including No config file found; using defaults and [WARNING]: No inventory was parsed, only implicit localhost is available. These are normal and can be ignored. In many use cases for Ansible, we’re running our playbooks against remote server instances, but in this case we don’t even have any server instances yet, so we’re just running our playbook right on localhost. For whatever reason Ansible feels the need to warn us about this.

Verification

If you now open up your AWS console and go to EC2, you should now be able to see a fresh new EC2 instance.

To me this sure beats manually clicking through the EC2 instance launch wizard.
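
If you’d also like the playbook to print details about the instance it created, one option (not shown in the playbook above) is to add register: ec2_result to the “Provision instance” task and then add a debug task after it:

    - name: Show the new instance's details
      debug:
        var: ec2_result.instances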

Good luck, and if you have any troubles, please let me know in the comments.