Ruby memoization

What is memoization?

Memoization is a performance optimization technique.

The idea with memoization is: “When a method invokes an expensive operation, don’t perform that operation each time the method is called. Instead, just invoke the expensive operation once, remember the answer, and use that answer from now on each time the method is called.”

Below is an example that shows the benefit of memoization. The example is a class with two methods which both return the same result, but one is memoized and one is not.

The expensive operation in the example takes one second to run. As you can see from the benchmark I performed, the memoized method is dramatically more performant than the un-memoized one.

Running the un-memoized version 10 times takes 10 seconds (one second per run). Running the memoized version 10 times takes only just over one second. That’s because the first call takes one second but the calls after that take a negligibly small amount of time.

class Product
  # This method is NOT memoized. This method will invoke the
  # expensive operation every single time it's called.
  def price
    expensive_calculation
  end

  # This method IS memoized. It will invoke the expensive
  # operation the first time it's called but never again
  # after that.
  def memoized_price
    @memoized_price ||= expensive_calculation
  end
  
  def expensive_calculation
    sleep(1)
    500
  end
end

require "benchmark"

product = Product.new
puts Benchmark.measure { 10.times { product.price } }
puts Benchmark.measure { 10.times { product.memoized_price } }

$ ruby memoized.rb
  0.000318   0.000362   0.000680 ( 10.038078)
  0.000040   0.000049   0.000089 (  1.003962)

Why is memoization called memoization?

I’ve always thought memoization was an awkward term due to its similarity to “memorization”. The obscurity of the name bugged me a little so I decided to look up its etymology.

According to Wikipedia, “memoization” is derived from the Latin word “memorandum”, which means “to be remembered”. “Memo” is short for memorandum, hence “memoization”.

When to use memoization

The art of performance optimization is a bag of many tricks: query optimization, background processing, caching, lazy UI loading, and other techniques.

Memoization is one trick in this bag of tricks. You can recognize its use case when an expensive method is called repeatedly without a change in return value.

This doesn’t mean that every expensive method that’s called repeatedly without a change in return value is automatically a good candidate for memoization. Memoization (just like all performance techniques) is not without a cost, as we’ll see shortly. Memoization should only be used when the benefit exceeds the cost.

As with all performance techniques, memoization should only be used a) when you’re sure it’s needed and b) when you have a plan to measure the before/after performance effect. Otherwise what you’re doing is not performance optimization. You’re just randomly adding code (i.e. incurring costs) without knowing whether the costs you’re incurring are actually providing a benefit.

The costs of memoization

The main cost of memoization is that you risk introducing subtle bugs. Here are a couple examples of the kinds of bugs to which memoization is susceptible.

Instance confusion

Memoization works if and only if the return value will always be the same. Let’s say, for example, that you have a loop that makes use of an object which has a memoized method. Maybe this loop uses the same object instance in every single iteration, but you’re under the mistaken belief that a fresh instance is used for each iteration.

In this case the value from the object in the first iteration will be correct, but all the subsequent iterations risk being incorrect because they’ll use the value from the first iteration rather than getting their own fresh values.

If this type of bug sounds contrived, it’s not. It comes from a real example of a bug I once caused myself!
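Here’s a minimal sketch of what that kind of bug can look like. The PriceLookup class and its price table are hypothetical names I made up for illustration; they’re not from the real bug. The first loop reuses one instance, so the memoized value from the first iteration sticks. The second loop creates a fresh instance each time.

```ruby
# Hypothetical class for illustration only.
class PriceLookup
  PRICES = { "apple" => 100, "pear" => 250 }.freeze

  attr_accessor :sku

  # Memoized: computed on the first call, then reused even if sku changes.
  def price
    @price ||= PRICES.fetch(sku)
  end
end

lookup = PriceLookup.new # one instance, reused in every iteration

stale_prices = ["apple", "pear"].map do |sku|
  lookup.sku = sku
  lookup.price
end

fresh_prices = ["apple", "pear"].map do |sku|
  fresh_lookup = PriceLookup.new # a new instance per iteration
  fresh_lookup.sku = sku
  fresh_lookup.price
end

p stale_prices # [100, 100]
p fresh_prices # [100, 250]
```

The pear gets the apple’s price in the first loop because @price was already set by the time the second iteration asked for it.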

Nil return values

In the example above, if expensive_calculation had returned nil (or false), then the value wouldn’t get memoized. @memoized_price would be set to nil, and because nil is falsy, ||= would run expensive_calculation again on every call.

The risk of such a bug is probably low, and the consequences of the bug are probably small in most cases, but it’s a good category of bug to be aware of. An alternative solution is to check whether the instance variable has been set using defined? rather than relying on ||=. That approach is not susceptible to the nil-is-falsy bug.
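Here’s a rough sketch of the difference. The class and method names (NilSafeProduct, discount_with_or_equals, discount_with_defined) are made up for this illustration, and the “expensive” calculation just counts how many times it runs and returns nil.

```ruby
# Hypothetical class for illustration only.
class NilSafeProduct
  attr_reader :calculation_count

  def initialize
    @calculation_count = 0
  end

  # Re-runs the calculation every time because nil is falsy.
  def discount_with_or_equals
    @discount_with_or_equals ||= expensive_calculation
  end

  # Runs the calculation once, even though the result is nil.
  def discount_with_defined
    return @discount_with_defined if defined?(@discount_with_defined)

    @discount_with_defined = expensive_calculation
  end

  private

  def expensive_calculation
    @calculation_count += 1
    nil
  end
end

or_equals_product = NilSafeProduct.new
3.times { or_equals_product.discount_with_or_equals }
puts or_equals_product.calculation_count # 3

defined_product = NilSafeProduct.new
3.times { defined_product.discount_with_defined }
puts defined_product.calculation_count # 1
```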

Prudence pays off

The risk of introducing bugs as a side effect of memoization is admittedly low but it’s not zero. Because memoization isn’t free, it’s not a good idea to reflexively add memoization to methods as a default policy. Instead, add memoization on a case-by-case basis when it’s clearly justified.

Takeaways

  • Memoization is a performance optimization technique that prevents wasteful repeated calls to an expensive operation when the return value is the same each time.
  • Memoization should only be added when you’re sure it’s needed and you have a plan to verify the performance difference.
  • A good use case for memoization is when an expensive method is called repeatedly without a change in return value.
  • Memoization isn’t free. It carries with it the risk of subtle bugs. Therefore, don’t apply memoization indiscriminately. Only use it in cases where there’s a clear benefit.

How Ruby’s method_missing works

What we’re going to cover

In this post we’ll take a look at an example of the kinds of things you can do with method_missing as well as how and why method_missing works.

The example

The following example is a DSL which will allow us to construct an HTML document using Ruby.

(This example, by the way, is shamelessly stolen from a great post by Emmanuel Hayford who apparently himself took the example from the book The Ruby Programming Language. Check out the post and maybe the book too.)

Basic version

Here’s some Ruby code that will generate a rudimentary HTML document.

HTMLDocument.new do
  html do
    body do
      puts "Hello world"
    end
  end
end

The generated HTML code looks like this.

<html>
<body>
Hello world
</body>
</html>

Here’s the code that can get the above DSL code to generate the above HTML output.

class HTMLDocument
  def initialize(&block)
    # This instance_exec means that any message that's
    # sent inside the HTMLDocument.new block (html, body,
    # etc.) will use HTMLDocument as its recipient.
    # See https://www.codewithjason.com/ruby-instance-exec/
    instance_exec(&block)
  end

  private

  def method_missing(method_name, *args, &block)
    puts "<#{method_name}>"
    block.call
    puts "</#{method_name}>"
  end
end

HTMLDocument.new do
  html do
    body do
      puts "Hello world"
    end
  end
end

This works because, thanks to instance_exec, every message that gets sent inside of the HTMLDocument.new do block gets sent to the HTMLDocument instance. Then, because HTMLDocument doesn’t actually respond to a message called html or a message called body, method_missing gets invoked for html and body.

And in case it’s not clear, when I say “messages being sent”, I more or less mean “methods being called”. Inside the HTMLDocument.new block, the messages being sent are html, body and puts. The subtle distinction between a message being sent and a method being called is that there’s not always a one-to-one match between messages and methods. There are obviously no methods defined called html or body. Those are just messages that are being sent.

Anyway, inside of our method_missing definition, we say “output an opening tag and a closing tag, and in between those two things, call whatever block that was passed”. The result is the HTML output you see above.

“Advanced” example

Here’s a slightly more “advanced” example. This example will force us to use the *args part of method_missing which we’re not using in the example above.

In this version we use an a tag with an href attribute and a target attribute.

<html>
<body>
Hello world
<a href="https://www.codewithjason.com" target="_top">
Code with Jason
</a>
</body>
</html>

The idea here is that, for any tag we use, we can optionally specify attributes for the tag by passing arguments. Below is the code that would produce the a tag above.

a(href: "https://www.codewithjason.com", target: "_top") do
  puts "Code with Jason"
end

In order for code like this to work, we need to add something that will take the hash we pass as *args and convert the hash into stringified HTML attributes.

class HTMLDocument
  def initialize(&block)
    instance_exec(&block)
  end

  private

  def method_missing(method_name, *args, &block)
    # args is equal to:
    # [{:href=>"https://www.codewithjason.com", :target=>"_top"}]
    # We're interested in the "first" (and only) element
    puts "<#{method_name}#{hash_to_html_attributes(args[0])}>"
    block.call
    puts "</#{method_name}>"
  end

  def hash_to_html_attributes(hash)
    return unless hash

    stringified_attributes = hash.map do |key, value|
      "#{key}=\"#{value}\""
    end

    " #{stringified_attributes.join(" ")}"
  end
end

HTMLDocument.new do
  html do
    body do
      puts "Hello world"

      a(href: "https://www.codewithjason.com", target: "_top") do
        puts "Code with Jason"
      end
    end
  end
end

Now we’re able to pass arguments to our “missing methods” and HTMLDocument will pick up our arguments.

But why? How does *args work? Let’s go through each of method_missing‘s parameters in detail so we can see.

method_missing, parameter by parameter

We’re going to go through each of method_missing‘s parameters, but out of order. The first parameter, method_name, is a normal one, but the other two are special, each in their own way.

method_name

The method_name argument is simply the name of whatever message was sent, as a symbol. If I do my_object.hello, then method_name will be :hello. Nothing more to it than that.
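If you’d like to see for yourself what method_missing receives, here’s a small sketch. MessageInspector is a hypothetical class I’m using only for illustration. Its method_missing just returns what it was given instead of doing anything useful with it.

```ruby
# Hypothetical class for illustration only.
class MessageInspector
  private

  # Returns the message name, the arguments, and whether a block was passed.
  def method_missing(method_name, *args, &block)
    [method_name, args, !block.nil?]
  end
end

inspector = MessageInspector.new

without_arguments = inspector.hello
with_arguments = inspector.greet("Jason", 2) { }

p without_arguments             # [:hello, [], false]
p with_arguments                # [:greet, ["Jason", 2], true]
p without_arguments.first.class # Symbol
```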

&block

The &block parameter represents an optional block that we can pass to our method.

You might wonder what the & in front of &block is all about. The explanation is actually pretty long and involved, but the short version is that in order to be able to work with a block, the block first needs to be converted into an instance of the Proc class. The & is what converts the block into a Proc object.

If you’re curious about the details of how this works, see my other post about what the ampersand in front of &block means.

*args

The *args parameter is another slightly wacky one. What’s the * all about?

The * at the beginning of *args is the Ruby splat operator. When the splat operator appears in a parameter list, it basically says “send me as many arguments as you want, and I’ll treat them as an array.”

Here’s a method that uses the splat operator.

def list(store, *items)
  "get #{items.join(", ")} from #{store}"
end

puts list("grocery store", "bread", "bananas", "beer")
puts list("hardware store", "nuts", "bolts")

Notice how items gets treated just like an array, even though we didn’t use array syntax to pass the list values.

Here’s the output of the above script.

get bread, bananas, beer from grocery store
get nuts, bolts from hardware store
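This also explains what happened with the a tag in the HTMLDocument example. Here’s a small sketch (capture is a hypothetical helper method, not part of the DSL) showing that when keyword-style arguments are passed to a method that only has a splat parameter, they arrive as a single hash inside the array. When nothing is passed, the array is empty and its first element is nil.

```ruby
# Hypothetical helper that simply returns whatever the splat collected.
def capture(*args)
  args
end

no_arguments = capture
positional = capture("bread", "bananas")
keyword_style = capture(href: "/about", target: "_top")

p no_arguments.length      # 0
p no_arguments[0]          # nil
p positional               # ["bread", "bananas"]
p keyword_style.length     # 1
p keyword_style[0].class   # Hash
p keyword_style[0][:href]  # "/about"
```

That nil for the first element is why hash_to_html_attributes starts with return unless hash: tags like html and body are called without any arguments.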

Takeaways

  • method_missing can be useful for constructing DSLs.
  • method_missing can be added to any object to endow that object with special behavior when the object gets sent a message for which it doesn’t have a method defined.
  • method_missing takes the name of the method that was called, an arbitrary number of arguments, and (optionally) a block.

How Ruby’s instance_exec works

What instance_exec is and why it exists

instance_exec is a method that executes a block in the context of whatever object you call it on.

Why would we want to do this? As we’ll see, the usage of instance_exec can make DSL code less noisy and verbose. We can see this in a couple of examples of DSL code that benefit from instance_exec. The first example is the code for a Factory Bot factory definition.

FactoryBot.define do
  factory :user do
    first_name { "John" }
    last_name { "Smith" }
  end
end

In this snippet, the factory method seems to come from nowhere. We’ll see shortly how this mystery can be explained.

This RSpec snippet has similar mysteries. Where does the describe method on the second line come from? How about the it method on the third line?

RSpec.describe "Hello world", type: :system do
  describe "index page" do
    it "shows the right content" do
      visit hello_world_index_path
      expect(page).to have_content("Hello, world!")
    end
  end
end

In this post we’re going to demonstrate how libraries like Factory Bot and RSpec use instance_exec in order to achieve the effect shown above.

A custom use of instance_exec

Perhaps the easiest way to illustrate how instance_exec works is for us to go through the process of writing our own custom method that uses instance_exec.

Here’s a piece of code that’s loosely analogous to the Factory Bot and RSpec examples above. What this code has in common with the above examples is that there’s a method of mysterious origin (content) being called inside a block.

inside_tag("p") do
  content "Hello"
  content "World"
end

We’re going to write a definition for the inside_tag method in such a way that, when we call it in the above manner, the following output is produced.

<p>
  Hello
  World
</p>

First pass at the inside_tag method

Here’s a crude version of the inside_tag method which just uses puts inside the block instead of the content method calls we’re ultimately after.

def inside_tag(name, &block)
  puts "<#{name}>"
  block.call
  puts "</#{name}>"
end

inside_tag("p") do
  puts "  Hello"
  puts "  World"
end

This code produces the desired output but we’re not using a content method yet. It’s also a little bit inelegant that we have to achieve the indentation by manually prefixing each line with spaces. We’ll address that shortcoming in the next pass.

Second pass at the inside_tag method

Here’s a version of the method that’s slightly closer to what we’re shooting for. In this version, we define a class called Tag, an instance of which is passed to the block so that the body of the block can call the Tag object’s content method. Doing it this way also allows us to add an indent to each piece of content automatically.

class Tag
  def content(value)
    puts "  #{value}"
  end
end

def inside_tag(name, &block)
  puts "<#{name}>"
  block.call(Tag.new)
  puts "</#{name}>"
end

inside_tag("p") do |tag|
  tag.content "Hello"
  tag.content "World"
end

Third pass at inside_tag using instance_exec

Lastly, we can use instance_exec to make the block execute in the context of a Tag object. We now no longer have to do tag.content because the block is already executing in the context of a Tag.

class Tag
  def content(value)
    puts "  #{value}"
  end
end

def inside_tag(name, &block)
  puts "<#{name}>"
  Tag.new.instance_exec(&block)
  puts "</#{name}>"
end

inside_tag("p") do
  content "Hello"
  content "World"
end

Now that we’ve seen that this works, let’s take a closer look at why it works.

Why this works

The main object

Every method that’s called in Ruby is called in the context of some object.

Even when we don’t seem to be in the context of any object, we are. If you open an irb console and type self, you’ll see that the return value is main.

> self
 => main

There’s an object called main which receives any message that we send from the console.

If you type foo and hit enter, you’ll get an error that says:

NameError (undefined local variable or method `foo' for main:Object)

This is proof that when we send a message to no object in particular, the recipient of that message is main. (It says main:Object because main is an instance of the native Ruby class called Object.)

Changing the context

Put the following code into a Ruby file and then run it.

class Tag
  def execute1(&block)
    block.call
  end

  def execute2(&block)
    instance_exec(&block)
  end
end

puts "Current context is #{self}"
Tag.new.execute1 { puts "Current context is #{self}" }
Tag.new.execute2 { puts "Current context is #{self}" }

The output will look something like this:

Current context is main
Current context is main
Current context is #<Tag:0x0000000155045078>

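The first line outputs main because, as was said earlier, the default recipient for any message that’s not sent to a particular object is main.

The second line also outputs main. Even though the block is called from inside a Tag method, block.call runs the block in the context where the block was written, which here is the top level of the file, so self is still main.

The third line outputs a Tag instance because the Tag#execute2 method invokes the block it’s given using instance_exec.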
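Changing the context affects more than what self prints. It also changes which object’s instance variables and methods the block can see. Here’s a minimal sketch, using a hypothetical Thermostat class, where the very same Proc returns different results depending on whether it’s called normally or run through instance_exec. It also shows that instance_exec passes any arguments it’s given along to the block.

```ruby
# Hypothetical class for illustration only.
class Thermostat
  def initialize
    @temperature = 20
  end
end

thermostat = Thermostat.new
read_temperature = Proc.new { @temperature }

# Called normally, the block runs in the context where it was written.
# There's no @temperature set there, so we get nil.
called_normally = read_temperature.call

# Run with instance_exec, the block sees the Thermostat's @temperature.
called_in_context = thermostat.instance_exec(&read_temperature)

# Arguments given to instance_exec are passed through to the block.
adjusted = thermostat.instance_exec(5) { |delta| @temperature + delta }

p called_normally   # nil
p called_in_context # 20
p adjusted          # 25
```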

Takeaways

  • instance_exec is a method that executes a block in the context of a certain object.
  • instance_exec can help make DSL syntax less noisy and verbose.
  • Methods like Factory Bot’s factory and RSpec’s it and describe are possible because of instance_exec.

How map(&:some_method) works

The map method’s shorthand syntax

One of the most common uses of map in Ruby is to take each element of a collection and call some method on it, like this.

[1, 2, 3].map { |number| number.to_s }

To save us from redundancy, Ruby has a shorthand version which is functionally equivalent to the above.

[1, 2, 3].map(&:to_s)

The shorthand version is nice but its syntax is a little mysterious. In this post I’ll explain why the syntax is what it is.

Passing symbols as blocks

Let’s leave the world of map for a moment and deal with a “regular” method.

I’m going to show you a method which takes a block and then demonstrate four different ways of passing a block to that method.

Side note: if you’re not too familiar with Proc objects yet, I would suggest reading my other posts on how Proc objects work and what the & in front of &block means before continuing.

First way: using a normal block

You’ve of course seen this way before. We call my_method and pass a regular block to it.

def my_method(&block)
  block.call("hello")
end

puts my_method { |value| value.upcase } # outputs "HELLO"

Second way: using Proc.new

If we wanted to, we could instead pass our block using Proc.new. Since my_method takes a block and not a Proc object, we would have to prefix Proc.new with an ampersand to convert the Proc object into a block.

(If you didn’t know, prefixing an expression with & will convert a proc to a block and a block to a proc. See this post for more details on how that works.)

def my_method(&block)
  block.call("hello")
end

puts my_method(&Proc.new { |value| value.upcase })

There would never really be a practical reason to express the syntax this way, but I wanted to show that it’s possible. This “second way” example will also connect the first way and the third way.

Third way: using to_proc

All symbols respond to a method called to_proc which returns a Proc object. If we do :upcase.to_proc, it gives us a Proc object that’s equivalent to what we would have gotten by doing Proc.new { |value| value.upcase }.

def my_method(&block)
  block.call("hello")
end

puts my_method(&:upcase.to_proc)

Fourth way: passing a symbol

I’ll show one final way. When Ruby sees an argument that’s prefixed with an ampersand, it attempts to call to_proc on the argument. So our to_proc on &:upcase.to_proc is actually superfluous. We can just pass &:upcase all by itself.

def my_method(&block)
  block.call("hello")
end

puts my_method(&:upcase)

What ultimately gets passed is the Proc object that results from calling :upcase.to_proc. Actually, more precisely, what gets passed is the block that results from calling &:upcase.to_proc, since the & converts the Proc object to a block.
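The to_proc call isn’t special to symbols. As a sketch of the same mechanism, here’s a made-up Doubler class that defines its own to_proc. Because Ruby calls to_proc on whatever follows the ampersand, an instance of this class can be passed with & just like a symbol can.

```ruby
# Hypothetical class for illustration only.
class Doubler
  # Ruby calls this when a Doubler is passed with a leading ampersand.
  def to_proc
    Proc.new { |number| number * 2 }
  end
end

doubled = [1, 2, 3].map(&Doubler.new)
upcased = :upcase.to_proc.call("hello")

p doubled    # [2, 4, 6]
puts upcased # HELLO
```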

Passing symbols to map

With the understanding of the above, you now know that this:

[1, 2, 3].map(&:to_s)

Is equivalent to this:

[1, 2, 3].map(&:to_s.to_proc)

Which is equivalent to this:

[1, 2, 3].map(&Proc.new { |number| number.to_s })

Which, finally, is equivalent to this:

[1, 2, 3].map { |number| number.to_s }

So, contrary to the way it may seem, there aren’t two different “versions” of the map method. The shorthand syntax is owing to the way that Ruby passes Proc objects.

Takeaways

  • When an argument is prefixed with &, Ruby attempts to call to_proc on it.
  • All symbols respond to the to_proc method.
  • There aren’t two different versions of the map method. The shorthand syntax is possible due to the two points above.

Understanding Ruby closures

Why you’d want to know about Ruby closures

Ruby blocks are simultaneously one of the most fundamental parts of the language and perhaps one of the hardest to understand.

Ruby methods like map and each operate using blocks. You’ll also find heavy use of blocks in popular Ruby libraries including Ruby on Rails itself.

If you start to dig into Ruby blocks, you’ll discover that, in order to understand blocks, you have to understand something else called Proc objects.

And as if that weren’t enough, you’ll then discover that if you want to deeply understand Proc objects, you’ll have to understand closures.

The concept of a closure is one that suffers from 1) an arguably misleading name (more about this soon) and 2) unhelpful, jargony explanations online.

My goal with this post is to provide an explanation of closures in plain language that can be understood by someone without a Computer Science background. And in fact, a Computer Science background is not needed. It’s only the poor explanations of closures that make it seem so.

Let’s dig deeper into what a closure actually is.

What a closure is

A closure is a record which stores a function plus (potentially) some variables.

I’m going to break this definition into parts to reduce the chances that any part of it is misunderstood.

  • A closure is a record
  • which stores a function
  • plus (potentially) some variables

I’m going to discuss each part of this definition individually.

First, a reminder: the whole reason we’re interested in Ruby closures is because of the Ruby concept called a Proc object, which is heavily involved in blocks. A Proc object is a closure. Therefore, all the examples of closures in this post will take the form of Proc objects.

If you’re not yet familiar with Proc objects, I would suggest taking a look at my other post, Understanding Ruby Proc objects, before continuing. It will help you understand the ideas in this post better.

First point: “A closure is a record”

A closure is a value that can be assigned to a variable or some other kind of “record”. The term “record” doesn’t have a special technical meaning here. We just use the word “record” because it’s broader than “variable”. Shortly we’ll see an example of a closure being assigned to something other than a variable.

Here’s a Proc object that’s assigned to a variable.

my_proc = Proc.new { puts "I'm in a closure!" }
my_proc.call

Remember that every Proc object is a closure. When we do Proc.new we’re creating a Proc object and thus a closure.

Here’s another closure example. Here, instead of assigning the closure to a variable, we’re assigning the closure to a key in a hash. The point here is that the thing a closure gets assigned to isn’t always a variable. That’s why we say “record” and not “variable”.

my_stuff = { my_proc: Proc.new { puts "I'm in a closure too!" } }
my_stuff[:my_proc].call

Second point: “which stores a function”

As you may have deduced, or as you may have already known, the code between the braces (puts "I'm in a closure!") is the function we’re talking about when we say “a closure is a record which stores a function”.

my_proc = Proc.new { puts "I'm in a closure!" }
my_proc.call

A closure can be thought of as a function “packed up” into a variable. (Or, more precisely, a variable or some other kind of record.)

Third point: “plus (potentially) some variables”

Here’s a Proc object (which, remember, is a closure) that involves an outside variable.

The variable number_of_exclamation_points gets included in the “environment” of the closure. Each time we call the closure that we’ve named amplifier, the number_of_exclamation_points variable gets incremented and one additional exclamation point gets added to the string that gets outputted.

number_of_exclamation_points = 0

amplifier = Proc.new do
  number_of_exclamation_points += 1
  "louder" + ("!" * number_of_exclamation_points)
end

puts amplifier.call # louder!
puts amplifier.call # louder!!
puts amplifier.call # louder!!!
puts amplifier.call # louder!!!!
puts number_of_exclamation_points # 4 - the original variable was mutated

As a side note, I find the name “closure” to be misleading. The fact that the above closure can mutate number_of_exclamation_points, a variable outside the function’s scope, seems to me like a decidedly un-closed idea. In fact, it seems like there’s a tunnel, an opening, between the closure and the outside scope, through which changes can leak.

I personally started having an easy time understanding closures once I stopped trying to connect the idea of “a closed thing” with the mechanics of how closures actually work.
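One more sketch that may help make the “plus some variables” part concrete. Here the variable lives inside a method (make_counter is a hypothetical name I picked for this example). The method returns, but the Proc it created keeps its own copy of the environment, so count survives between calls. Each call to make_counter produces a separate closure with a separate count.

```ruby
# Hypothetical method for illustration only.
def make_counter
  count = 0
  # The Proc captures count, which stays alive after make_counter returns.
  Proc.new { count += 1 }
end

first_counter = make_counter
second_counter = make_counter

first_counter.call
first_counter.call
first_result = first_counter.call
second_result = second_counter.call

puts first_result  # 3
puts second_result # 1
```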

Takeaways

  • Ruby blocks heavily involve Proc objects.
  • Every Proc object is a closure.
  • A closure is a record which stores a function plus (potentially) some variables.

The two common ways to call a Ruby block

Ruby blocks can be difficult to understand. One of the details which presents an obstacle to fully understanding blocks is the fact that there is more than one way to call a block.

In this post we’ll go over the two most common ways of calling a Ruby block: block.call and yield.

There are also other ways to call a block, e.g. instance_exec. But that’s an “advanced” topic which I’ll leave out of the scope of this post.

Here are the two common ways of calling a Ruby block and why they exist.

The first way: block.call

Below is a method that accepts a block, then calls that block.

def hello(&block)
  block.call
end

hello { puts "hey!" }

If you run this code, you’ll see the output hey!.

You may wonder what the & in front of &block is all about. As I explained in a different post, the & converts the block into a Proc object. The block can’t be called directly using .call. The block has to be converted into a Proc object first and then .call is called on the Proc object.

I encourage you to read my two other posts about Proc objects and the & at the beginning of &block if you’d like to understand these parts more deeply.

The second way: yield

The example below is very similar to the first example, except instead of using block.call we’re using yield.

def hello(&block)
  yield
end

hello { puts "hey!" }

You may wonder: if we already have block.call, why does Ruby provide a second, slightly different way of calling a block?

One reason is that yield gives us a capability that block.call doesn’t have. In the below example, we define a method and then pass a block to it, but we never have to explicitly specify that the method takes a block.

def hello
  yield
end

hello { puts "hey!" }

As you can see, yield gives us the ability to call a block even if our method doesn’t explicitly take a block. (Side note: any Ruby method can be passed a block, even if the method doesn’t explicitly take one.)
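One thing to be aware of with an implicit block: if the method calls yield and no block was passed, Ruby raises a LocalJumpError. Here’s a small sketch of that, along with Ruby’s built-in block_given? check, which lets a method find out whether a block was passed. The method names are my own for illustration.

```ruby
# Checks for a block before yielding.
def hello
  return "no block given" unless block_given?

  yield
end

# Yields unconditionally, which fails when there is no block.
def strict_hello
  yield
end

with_block = hello { "hey!" }
without_block = hello

error_class =
  begin
    strict_hello
  rescue LocalJumpError => error
    error.class
  end

puts with_block    # hey!
puts without_block # no block given
puts error_class   # LocalJumpError
```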

The fact that yield exists raises the question: why not just use yield all the time?

The answer is that when you use block.call, you have the ability to pass the block to another method if you so choose, which is something you can’t do with yield.

When we put &block in a method’s signature, we can do more with the block than just call it using block.call. We could also, for example, choose not to call the block but rather pass the block to a different method which then calls the block.
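Here’s a minimal sketch of what passing a block along can look like. The shout_each method is a hypothetical example. It never calls the block itself. Instead it hands the block to each, and each is what ends up calling it.

```ruby
# Hypothetical method for illustration only.
def shout_each(words, &block)
  # The block is forwarded to each rather than being called here.
  words.map(&:upcase).each(&block)
end

collected = []
shout_each(["hey", "hi"]) { |word| collected << word }

p collected # ["HEY", "HI"]
```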

Takeaways

  • There are two common ways to call a Ruby block: block.call and yield.
  • Unlike block.call, yield gives us the ability to call a block even if our method doesn’t explicitly take a block.
  • Unlike using an implicit block and yield, using an explicit block allows us to pass a block to another method.

What the ampersand in front of &block means

Here’s a code sample that I’ve grabbed more or less at random from the Rails codebase.

def form_for(record, options = {}, &block)

The first two arguments, record and options = {}, are straightforward to someone who’s familiar with Ruby. But the third argument, &block, is a little more mysterious. Why the leading ampersand?

This post will be the answer to that question. In order to begin to understand what the leading ampersand is all about, let’s talk about how blocks relate to Proc objects.

Blocks and Proc objects

Let’s talk about blocks and Proc objects a little bit, starting with Proc objects.

Here’s a method which takes an argument. The method doesn’t care what type the argument is. All the method does is output the argument’s class.

After we define the method, we call the method and pass it a Proc object. (If you’re not too familiar with Proc objects, you may want to check out my other post, Understanding Ruby Proc objects.)

def proc_me(my_proc)
  puts my_proc.class
end

proc_me(Proc.new { puts "hi" })

If you run this code, the output will be:

Proc

Not too surprising. We’re passing a Proc object as an argument to the proc_me method. Naturally, it thinks that my_proc is a Proc object.

Now let’s add another method, block_me, which accepts a block.

def proc_me(my_proc)
  puts my_proc.class
end

def block_me(&my_block)
  puts my_block.class
end

proc_me(Proc.new { puts "hi" })
block_me { puts "hi" }

If we run this code the output will be:

Proc
Proc

Even though we’re passing a Proc object the first time and a block the second time, we see Proc for both lines.

The reason that the result of my_block.class is Proc is because a leading ampersand converts a block to a Proc object.

Before moving on I encourage you to try out the above code in a console. Poke around at the code and change some things to see if it enhances your understanding of what’s happening.

Converting the Proc object to a block before passing the Proc object

Here’s a slightly altered version of the above example. Notice how my_proc has changed to &my_proc. The other change is that Proc.new has changed to &Proc.new.

def proc_me(&my_proc) # an & was added here
  puts my_proc.class
end

def block_me(&my_block)
  puts my_block.class
end

proc_me(&Proc.new { puts "hi" }) # an & was added here
block_me { puts "hi" }

If we run this code the output is the exact same.

Proc
Proc

This is because not only does a leading ampersand convert a block to a Proc object, but a leading ampersand also converts a Proc object to a block.

When we do &Proc.new, the leading ampersand converts the Proc object to a block. Then the leading ampersand in def proc_me(&my_proc) converts the block back to a Proc object.

I again encourage you to run this code example for yourself in order to more clearly understand what’s happening.

The differences between blocks and Proc objects

Ruby has a class called Proc but no class called Block. Because there’s no class called Block, nothing can be an instance of a Block. The material that Ruby blocks are made out of is Proc objects.

What happens when we try this?

my_block = { puts "hi" }

If we try to run this, we get:

$ ruby block.rb
block.rb:1: syntax error, unexpected string literal, expecting `do' or '{' or '('
my_block = { puts "hi" }
block.rb:1: syntax error, unexpected '}', expecting end-of-input
my_block = { puts "hi" }

That’s because the syntax { puts "hi" } doesn’t make any syntactical sense on its own. If we want to say { puts "hi" }, there are only two ways we can do it.

First way: put it inside a Proc object

That would look like this:

Proc.new { puts "hi" }

In this way the { puts "hi" } behavior is “packaged up” into an entity that we can then do whatever we want with. (Again, see my other post on Ruby proc objects for more details.)

Second way: use it to call a method that takes a block

That would look like this:

some_method { puts "hi" }

Why converting a block to a Proc object is necessary

Let’s take another look at our code sample from the beginning of the post.

def form_for(record, options = {}, &block)

In methods that take a block, the syntax is pretty much always &block, never just block. And as we’ve discussed, the leading ampersand converts the block into a Proc object. But why does the block get converted to a Proc object?

Since everything in Ruby is an instance of some object, and since there’s no such thing as a Ruby Block class, there can never be an object that’s a block. In order to be able to have an instance of something that represents the behavior of a block, that thing has to take the form of a Proc object, i.e. an instance of the class Proc, the stuff that Ruby blocks are made out of. That’s why methods that explicitly deal with blocks convert those blocks to Proc objects first.
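Here's a small sketch of why having the block as a Proc object is useful. The Button class and its methods are made up for illustration. The point is that once the block has been converted to a Proc object, it can be stored in an instance variable and called later, long after the original method call has finished.

```ruby
# Hypothetical class for illustration only.
class Button
  def on_click(&handler)
    # handler is a Proc object here, so it can be stored like any other object.
    @handler = handler
  end

  def click
    @handler.call
  end
end

button = Button.new
button.on_click { "clicked!" }

puts button.click # clicked!
```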

Takeaways

  • A leading ampersand converts a block to a Proc object and a Proc object to a block.
  • There’s no such thing as a Ruby Block class. Therefore no object can be an instance of a block. The material that Ruby blocks are made out of is Proc objects.
  • The previous points taken together are why Ruby block arguments always appear as e.g. &block. The block can’t be captured in a variable unless it’s first converted to a Proc object.

Understanding Ruby Proc objects

What we’re going to do and why

If you’re a Ruby programmer, you almost certainly use Proc objects all the time, although you might not always be consciously aware of it. Blocks, which are ubiquitous in Ruby, and lambdas, which are used for things like Rails scopes, both involve Proc objects.

In this post we’re going to take a close look at Proc objects. First we’ll do a Proc object “hello world” to see what we’re dealing with. Then we’ll unpack the definition of Proc objects that the official Ruby docs give us. Lastly we’ll see how Proc objects relate to other concepts like blocks and lambdas.

A Proc object “hello world”

Before we talk about what Proc objects are and how they’re used, let’s take a look at a Proc object and mess around with it a little bit, just to see what one looks like.

The official Ruby docs provide a pretty good Proc object “hello world” example:

square = Proc.new { |x| x**2 }

We can see how this Proc object works by opening up an irb console and defining the Proc object there.

> square = Proc.new { |x| x**2 }
 => #<Proc:0x00000001333a8660 (irb):1> 
> square.call(3)
 => 9 
> square.call(4)
 => 16 
> square.call(5)
 => 25

We can kind of intuitively understand how this works. A Proc object behaves somewhat like a method: you define some behavior and then you can use that behavior repeatedly wherever you want.

Now that we have a loose intuitive understanding, let’s get a firmer grasp on what Proc objects are all about.

Understanding Proc objects more deeply

The official Ruby docs’ definition of Proc objects

According to the official Ruby docs on Proc objects, “a Proc object is an encapsulation of a block of code, which can be stored in a local variable, passed to a method or another Proc, and can be called.”

This definition is a bit of a mouthful. When I encounter wordy definitions like this, I like to separate them into chunks to make them easier to understand.

The Ruby Proc object definition, broken into chunks

A Proc object is:

  • an encapsulation of a block of code
  • which can be stored in a local variable
  • or passed to a method or another Proc
  • and can be called

Let’s take these things one-by-one.

A Proc object is an encapsulation of a block of code

What could it mean for something to be an encapsulation of a block of code? In general, when you “encapsulate” something, you metaphorically put it in a capsule. Things that are in capsules are isolated from whatever’s on the outside of the capsule. Encapsulating something also implies that it’s “packaged up”.

So when the docs say that a Proc object is “an encapsulation of a block of code”, they must mean that the code in a Proc object is packaged up and isolated from the code outside it.
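Here's a rough sketch of what "packaged up and isolated" can look like in practice. The variable names are made up. Notice that the code inside the Proc object doesn't run when the Proc object is defined, and that a local variable created inside the Proc object isn't visible outside it.

```ruby
packaged = Proc.new do
  secret = 42 # this local variable exists only inside the Proc object
  secret * 2
end

# Nothing has run yet, and secret is not visible out here.
p(defined?(secret)) # nil

puts packaged.call # 84
```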

A Proc object can be stored in a local variable

For this one let’s look at an example, straight from the docs:

square = Proc.new { |x| x**2 }

As we can see, this piece of code creates a Proc object and stores it in a local variable called square. So this part of the definition, that a Proc object can be stored in a local variable, seems easy enough to understand.

A Proc object can be passed to a method or another Proc

This one’s a two-parter so let’s take each part individually. First let’s focus on “A Proc object can be passed to a method”.

Here’s a method which can accept a Proc object. The method is followed by the definition of two Proc objects: square, which squares whatever number you give it, and double, which doubles whatever number you give it.

def perform_operation_on(number, operation)
  operation.call(number)
end

square = Proc.new { |x| x**2 }
double = Proc.new { |x| x * 2 }

puts perform_operation_on(5, square)
puts perform_operation_on(5, double)

If you were to run this code you would get the following output:

25
10

So that’s what it means to pass a Proc object into a method. Instead of passing data as a method argument like normal, you can pass behavior. Or, to put it another way, you can pass an encapsulation of a block of code. It’s then up to that method to execute that encapsulated block of code whenever and however it sees fit.

If we want to pass a Proc object into another Proc object, the code looks pretty similar to our other example above.

perform_operation_on = Proc.new do |number, operation|
  operation.call(number)
end

square = Proc.new { |x| x**2 }
double = Proc.new { |x| x * 2 }

puts perform_operation_on.call(5, square)
puts perform_operation_on.call(5, double)

The only difference between this example and the one above it is that, in this example, perform_operation_on is defined as a Proc object rather than a method. The ultimate behavior is exactly the same though.

A Proc object can be called

This last part of the definition of a Proc object, “a Proc object can be called”, is perhaps obvious at this point but let’s address it anyway for completeness’ sake.

A Proc object can be called using the #call method. Here’s an example.

square = Proc.new { |x| x**2 }
puts square.call(3)

There are other ways to call a Proc object but they’re not important for understanding Proc objects conceptually.
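For the curious, here's a quick sketch of a few of those other ways. Each line below does the same thing as square.call(3).

```ruby
square = Proc.new { |x| x**2 }

puts square.call(3)  # 9
puts square.(3)      # 9
puts square[3]       # 9
puts square.yield(3) # 9
```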

Closures

In order to fully understand Proc objects, we need to understand something called closures. The concept of a closure is a broader concept that’s not unique to Ruby.

Closures are too nuanced a concept to be included in the scope of this article, unfortunately. If you’d like to understand closures, I’d suggest checking out my other post, Understanding Ruby closures.

But the TL;DR version is that a closure is a record which stores a function plus (potentially) some variables.
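To make that a little more concrete, here's a minimal sketch. The make_counter method is a made-up example. The Proc object it returns holds onto the count variable even after make_counter has finished running, which is the "function plus some variables" idea in action.

```ruby
# Hypothetical method for illustration.
def make_counter
  count = 0
  Proc.new { count += 1 } # the Proc object remembers count
end

counter = make_counter
puts counter.call # 1
puts counter.call # 2

# A second counter gets its own separate count variable.
other_counter = make_counter
puts other_counter.call # 1
```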

Proc objects and blocks

Every block in Ruby is a Proc object, loosely speaking. Here’s a custom method that accepts a block as an argument.

def my_method(&block)
  puts block.class
end

my_method { "hello" }

If you were to run the above code, the output would be:

Proc

That’s because the block we passed when calling my_method is a Proc object.

Below is an example that’s functionally equivalent to the above. The & in front of my_proc converts the Proc object into a block.

def my_method(&block)
  puts block.class
end

my_proc = Proc.new { "hello" }
my_method(&my_proc)

By the way, if you’re curious about the & at the beginning of &block and &my_proc, I have a whole post about that here.

Proc objects and lambdas

Lambdas are also Proc objects. This can be proven by running the following in an irb console:

> my_lambda = lambda { |x| x**2 }
 => #<Proc:0x00000001241e82a8 (irb):1 (lambda)> 
> my_lambda.class
 => Proc

Lambdas differ from regular Proc objects in a few subtle ways. For example, in lambdas, return means “exit from this lambda”. In regular Proc objects, return means “exit from embracing method”, i.e. the method in which the Proc object was defined. I won’t go into detail on the differences between lambdas and Proc objects because it’s outside the scope of what I’m trying to convey in this post. Instead you can read these details in the official docs.
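Just to give a taste, here's a small sketch of the return difference. The method names are made up for illustration. I've also included a second difference, argument strictness, which isn't discussed above but which the official docs describe.

```ruby
def return_from_lambda
  my_lambda = lambda { return :from_lambda }
  my_lambda.call
  :after_lambda # reached, because return only exited the lambda
end

def return_from_proc
  my_proc = Proc.new { return :from_proc }
  my_proc.call
  :after_proc # never reached, because return exited return_from_proc
end

puts return_from_lambda # after_lambda
puts return_from_proc   # from_proc

# Lambdas are strict about argument counts. Regular Proc objects are lenient.
strict  = lambda   { |x| x }
lenient = Proc.new { |x| x }

puts lenient.call(1, 2) # 1 (the extra argument is dropped)

begin
  strict.call(1, 2)
rescue ArgumentError => e
  puts e.class # ArgumentError
end
```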

Takeaways

  • A Proc object is an encapsulation of a block of code, which can be stored in a local variable, passed to a method or another Proc, and can be called.
  • A closure is a record which stores a function plus some variables. Proc objects are closures.
  • Blocks are Proc objects.
  • Lambdas are Proc objects too, although a special kind of Proc object with subtly different behavior.

Don’t wrap instance variables in attr_reader unless necessary

I’ve occasionally come across advice to wrap instance variables in attr_reader. There are two supposed benefits of this practice, which I’ll describe shortly. First I’ll provide a piece of example code.

Example of wrapping instance variables in attr_reader

Original version

Here’s a tiny class that has just one instance variable, @name. You can see that @name is used in the loud_name method.

class User
  def initialize(name)
    @name = name
  end

  def loud_name
    "#{@name.upcase}!!!"
  end
end

attr_reader version

Here’s that same class with an attr_reader added. Notice how loud_name now references name rather than @name.

class User
  attr_reader :name

  def initialize(name)
    @name = name
  end

  def loud_name
    "#{name.upcase}!!!"
  end
end

The purported benefits

Advocates of this technique seem to find two benefits in it. (I’ve also come across other supposed benefits but I don’t find them strong enough to merit mentioning.)

It makes refactoring easier

Rationale: If you ever want to change @name from being defined by an instance variable to being defined by a method, then you don’t have to go changing all the instances of @name to name.

This reasoning isn’t wrong but it is weak. First, it’s very rare that a value will start its life as an instance variable and then at some later point need to change to a method. This has happened to me so few times that I can’t recall a single instance of it happening.

Second, the refactoring work that the attr_reader saves is only a trivial amount of work. The cost of skipping the attr_reader is that you have to e.g. change a handful of instances of @name, in one file, to name. Considering that this is a tiny amount of work and that it needs to happen perhaps once every couple years per developer, this justification seems very weak.

It saves you from typo failures

Rationale: If you’re using instance variables and you accidentally type @nme instead of @name, @nme will just return nil rather than raising an error. If you’re using attr_reader and you accidentally type nme instead of name, nme will in fact raise an error.
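Here's a small sketch of the two typo scenarios side by side. The typo_with_ivar and typo_with_reader methods are made up purely to demonstrate the difference.

```ruby
class User
  attr_reader :name

  def initialize(name)
    @name = name
  end

  # Hypothetical method: typo of @name
  def typo_with_ivar
    @nme # an instance variable that was never set just evaluates to nil
  end

  # Hypothetical method: typo of name
  def typo_with_reader
    nme # there is no variable or method called nme, so this raises
  end
end

user = User.new("Alice")

p user.typo_with_ivar # nil

begin
  user.typo_with_reader
rescue NameError => e
  puts e.class # NameError
end
```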

This justification is also true, but also weak. If typo-ing an instance variable allows a bug to silently enter your application, then your application is not tested well enough.

I would be in favor of saving myself from the typo problem if the attr_reader technique hardly cost anything to use, but as we’ll see shortly, the attr_reader technique’s cost is too high to justify its benefit. Since the benefit is so tiny, the cost would have to be almost nothing, which it’s not.

Reasons why the attr_reader technique is a bad idea

Adding a public attr_reader throws away the benefits of encapsulation

Private instance variables are useful for the same reason as private methods: because you know they’re not depended on by outside clients.

If I have a class that has an instance variable called @price, I know that I can rename that instance variable to @cost or change it to @price_cents (changing the whole meaning of the value) or even kill @price altogether. What I want to do with @price is 100% my business. This is great.

But if I add attr_reader :price to my class, my class suddenly has responsibilities. I can no longer be sure that the class where @price is defined is the only thing that depends on @price. Other clients throughout my application may be reading @price by calling price. I’m no longer free to do away with @price or change its meaning. This makes my code riskier and harder to change.

You can add a private attr_reader, but that’s unnatural

If you want to make use of the attr_reader technique but you don’t want to throw away the benefits of encapsulation, you can add a private attr_reader. Here’s what that would look like.

# attr_reader version

class User
  def initialize(name)
    @name = name
  end

  def loud_name
    "#{name.upcase}!!!"
  end

  private

  attr_reader :name
end
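To see the difference in visibility, here's a quick sketch that puts the two approaches side by side. The class names PublicReaderUser and PrivateReaderUser are made up for the comparison.

```ruby
# Hypothetical class with a public attr_reader.
class PublicReaderUser
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

# Hypothetical class with a private attr_reader.
class PrivateReaderUser
  def initialize(name)
    @name = name
  end

  def loud_name
    "#{name.upcase}!!!"
  end

  private

  attr_reader :name
end

puts PublicReaderUser.new("alice").name       # alice
puts PrivateReaderUser.new("alice").loud_name # ALICE!!!

begin
  PrivateReaderUser.new("alice").name
rescue NoMethodError => e
  puts e.class # NoMethodError, because name is private
end
```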

This solves the encapsulation problem, but what have we really gained on balance? In exchange for not having to change @name to name on the off chance that we change name from an instance variable to a method, we have to pay the price of having this weird private attr_reader :name thing at the bottom of our class.

And consider that we would have to do this on every single class that has at least one instance variable!

Don’t wrap instance variables in attr_reader

Wrapping your instance variables in a public attr_reader changes your instance variables from private to public, increasing the public surface area of your class’s API and making your application a little bit harder to understand.

Wrapping your instance variables in a private attr_reader adds an unnatural piece of boilerplate to all your Ruby classes.

Given the tiny and dubious benefits that the attr_reader technique provides, this cost isn’t worth it.

Using attr_reader is good and necessary for values that really need to be public. As a default policy, wrapping instance variables in attr_reader is a bad idea.

Understanding Ruby blocks

Blocks are a fundamental concept in Ruby. Many common Ruby methods use blocks. Blocks are also an integral part of many domain-specific languages (DSLs) in libraries like RSpec, Factory Bot, and Rails itself.

In this post we’ll discuss what a block is. Then we’ll take a look at four different native Ruby methods that take blocks (times, each, map and tap) in order to better understand what use cases blocks are good for.

Lastly, we’ll see how to define our own custom method that takes a block.

What a block is

Virtually all languages have a way for functions to take arguments. You pass data into a function and then the function does something with that data.

A block takes that idea to a new level. A block is a way of passing behavior rather than data to a method. The examples that follow will illustrate exactly what is meant by this.

Native Ruby methods that take blocks

Here are four native Ruby methods that take blocks. For each one I’ll give a description of the method, show an example of the method being used, and then show the output that that example would generate.

Remember that blocks are a way to pass behavior rather than data into methods. In each description, I’ll use the phrase “Behavior X” to describe the behavior that might be passed to the method.

Method: times

Description: “However many times I specify, repeat Behavior X.”

Example: three times, print the text “hello”. (Behavior X is printing “hello”.)

3.times do
  puts "hello"
end

Output:

hello
hello
hello

Method: each

Description: “Take this array. For each element in the array, execute Behavior X.”

Example: iterate over an array containing three elements and print each element. (Behavior X is printing the element.)

[1, 2, 3].each do |n|
  puts n
end

Output:

1
2
3

Method: map

Description: “Take this array. For each element in the array, execute Behavior X, append the return value of X to a new array, and then after all the iterations are complete, return the newly-created array.”

Example: iterate over an array and square each element. (Behavior X is squaring the element.)

squares = [1, 2, 3].map do |n|
  n * n
end

puts squares.join(",")

Output:

1,4,9
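To make the description of map more tangible, here's a rough sketch of how a map-like method could be written by hand. The name my_map is made up, and this is not how Ruby actually implements map. It just follows the description step by step.

```ruby
# Hypothetical hand-rolled version of map, for illustration only.
def my_map(array, &block)
  results = []

  array.each do |element|
    # Execute Behavior X and append its return value to the new array.
    results << block.call(element)
  end

  results # return the newly-created array
end

squares = my_map([1, 2, 3]) { |n| n * n }

puts squares.join(",") # 1,4,9
```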

Method: tap

Description: “See this value? Perform Behavior X and then return that value.”

Example: initialize a file, write some content to it, then return the original file. (Behavior X is writing to the file.)

require "tempfile"

file = Tempfile.new.tap do |f|
  f.write("hello world")
  f.rewind
end

puts file.read

Output:

hello world
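One detail worth seeing for yourself is that tap returns the original value no matter what the block returns. Here's a tiny sketch of that.

```ruby
result = 5.tap { |n| n * 100 }

# The block's return value (500) is thrown away.
puts result # 5
```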

Now let’s look at how we can write our own method that can take a block.

Custom methods that take blocks

An HTML generator

Here’s a method which we can give an HTML tag as well as a piece of behavior. The method will execute our behavior. Before and after the behavior will be the opening and closing HTML tags.

inside_tag("p") do
  puts "Hello"
  puts "How are you?"
end

The output of this code looks like this.

<p>
Hello
How are you?
</p>

In this example, the “Behavior X” that we’re passing to our method is printing the text “Hello” and then “How are you?”.

The method definition

Here’s what the definition of such a method might look like.

def inside_tag(tag, &block)
  puts "<#{tag}>"  # output the opening tag
  block.call       # call the block that we were passed
  puts "</#{tag}>" # output the closing tag
end
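For what it's worth, Ruby also lets a method call its block without naming it, using the yield keyword. Below is a sketch of an alternative version of inside_tag written that way. The block_given? check is my own addition so that the method still works when no block is passed.

```ruby
def inside_tag(tag)
  puts "<#{tag}>"       # output the opening tag
  yield if block_given? # call the block, if we were passed one
  puts "</#{tag}>"      # output the closing tag
end

inside_tag("p") do
  puts "Hello"
  puts "How are you?"
end
```

This produces the same four lines of output as the &block version.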

Adding an argument to the block

Blocks can get more interesting when we add arguments.

In the below example, the inside_tag method now passes an instance of Tag to the block, allowing the behavior in the block to call tag.content rather than just puts. This allows our content to be indented.

class Tag
  def content(value)
    puts "  #{value}"
  end
end

def inside_tag(tag, &block)
  puts "<#{tag}>"
  block.call(Tag.new)
  puts "</#{tag}>"
end

inside_tag("p") do |tag|
  tag.content "Hello"
  tag.content "How are you?"
end

The above code gives the following output.

<p>
  Hello
  How are you?
</p>

Passing an object back to a block is a common DSL technique used in libraries like RSpec, Factory Bot, and Rails itself.

The technical details of blocks

There are a lot of technical details to learn about blocks, and plenty of interesting questions you could ask about them.

Those questions are worth knowing the answers to. But understanding these details is not necessary in order to understand the high-level gist of blocks.

Takeaway

A block is a way of passing behavior rather than data to a method. Not only do native Ruby methods make liberal use of blocks, but so do many popular Ruby libraries. Custom methods that take blocks can also sometimes be a good way to add expressiveness to your own applications.