When faced with a bug they have to diagnose, many developers will start making guesses as to what the problem is.
Guessing can be fine, especially when the guesses are good ones and when the guesses are inexpensive to test. But often, the guesses quickly degrade into long shots and the developer spends his or her time flailing around randomly rather than progressing steadily toward a solution.
Diagnosing bugs more methodically
One of my principles of debugging is that it’s almost always easier to determine where the cause of a bug is than what the cause of the bug is. When I need to diagnose a bug, the first thing I ask is not “What is the problem?” but “Where is the problem?” Not only is “where” a much easier question to answer than “what”, but once I’ve found the “where”, the “what” is often plainly evident.
When you change the question from “what” to “where”, the problem changes from a thought-intensive mystery-solving exercise to a relatively straightforward search problem.
When I want to determine where a bug lies, I identify an area of code and ask “Does the bug lie in this area of code?” If the answer is yes, then I perform the search again on a narrower area. If the answer is no, I continue my search in a different area.
When I ask the question “Does the bug lie in this area of code?”, the answer can be obtained as follows.
Perform the steps that reproduce the bug on the latest code. Observe that the bug is present.
Delete or disable some section of the code.
Perform the reproduction steps again.
Observe whether the bug is present. If the bug is gone, the answer is yes. If the bug is still present, the answer is no.
But the question remains: how do you determine which areas of your code to inspect for the bug? Obviously it’s not going to be efficient to just randomly choose areas for inspection. That’s where binary search comes in.
Binary search
Imagine I wanted to know someone’s birthday but I wasn’t allowed to ask them when their birthday was. I was only allowed to ask yes-or-no questions.
In order to figure out this person’s birthday, I could of course ask, “Is it on January 1st?” “Is it on January 2nd?” etc. But that would take me up to 365 guesses (or actually 366 because of leap year birthdays) and on average it would take me 366/2 = 183 guesses (assuming even distribution of birthdays).
Instead I could ask the person “Is your birthday before July 1st?” If so, I can rule out July 1st and every day after it. Then I can cut the remaining portion of the year in half. The next question would be “Is your birthday before April 1st?” If so, I do the same thing again. “Before February 16th?” “Before January 23rd?” “Before January 13th?” and so on. With this method, it takes no more than nine questions to arrive at the right answer. That’s a lot better than 183!
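To make the mechanics concrete, here’s a minimal Ruby sketch of the birthday game. The day-of-year number and the method name are made up for illustration; the point is that each yes-or-no question halves the remaining range.

  # Find an unknown day of the year (1..366) using only yes-or-no questions
  # of the form "is it on day `mid` or earlier?"
  def find_birthday(secret_day)
    low, high = 1, 366
    questions = 0

    while low < high
      mid = (low + high) / 2
      questions += 1

      if secret_day <= mid
        high = mid      # "yes" rules out everything after mid
      else
        low = mid + 1   # "no" rules out mid and everything before it
      end
    end

    [low, questions]
  end

  find_birthday(300) # => [300, 9] -- at most 9 questions, versus up to 366 sequential guesses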
Binary search in code
This binary search method can be applied to searching code as well. Let’s say you have a 100-line file that contains a bug but you don’t know where the bug is. You can (at least in theory) delete the last 50 lines of code and then check the remaining code for the presence of the bug. If the bug is still there, then you can divide that code in half by deleting the last 25 lines, and so on. Obviously you usually can’t literally delete the last 50 lines of a file because then it wouldn’t be syntactically valid and so on, but the principle of course still stands.
Binary search can also be used on the level of an entire codebase. You can devise questions that will divide the codebase roughly in half and then check for the presence or absence of the bug in each half. You don’t even need to necessarily delete code in order for this method to work. You just need a way to eliminate half of the codebase from your search area somehow. (Think about the game 20 Questions. “Is it living? Is it an animal? Is it a mammal?” and so on.)
You can also perform searches across time. Git bisect works by taking a range of commits and then repeatedly dividing it in half, asking you to answer the question “Does the bug lie in this half?” at each step. When you perform a bisect, you’re asking not “What is this bug?” but rather “What commit introduced this bug?” (In other words, “where is this bug”.) If your commits are small and atomic, then the cause of the bug will often be obvious once the offending commit is identified. If the offending commit is large, you might need to do another binary search on the code the commit introduced in order to isolate the root cause.
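In practice, a bisect session looks something like the following. The good revision is whatever you know to be bug-free; v2.3.0 here is just a placeholder.

  git bisect start
  git bisect bad            # the current commit exhibits the bug
  git bisect good v2.3.0    # a commit or tag known not to have the bug

  # Git checks out a commit roughly halfway between good and bad.
  # Run your reproduction steps, then tell git what you observed:
  git bisect good           # or: git bisect bad

  # Repeat until git reports the first bad commit, then clean up:
  git bisect reset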
The beauty of binary search debugging
Before I figured out how to diagnose bugs methodically, debugging was often an extremely frustrating exercise. I would stare at the screen and wonder why what was happening was happening. I would read the code over and over to try to make sense of it. I would make guesses as to what the problem could be. And after my guess turned out to be wrong, I often felt no closer to a solution than before.
Now things are different. The bug diagnosis methodology that I use now—which combines the “where, not what” principle with binary search—has two big benefits. First, this methodology allows me to progress systematically and inexorably toward a solution rather than taking shots in the dark. Second, it almost always works. It’s hard to overstate how good it feels to know that whatever bug I’m faced with, I have the ability to diagnose it in a timely manner, without much mental strain, and with a success rate close to 100%. That sure as hell beats guessing.
Takeaways
When diagnosing bugs, guessing is sometimes fine, but it’s often inefficient.
It’s almost always easier to find where a bug is than what a bug is.
Using binary search to find where a bug is makes the search process more efficient.
A flaky test is a test that passes sometimes and fails sometimes, even though no code has changed.
In other words, a flaky test is a test that’s non-deterministic.
A test can be non-deterministic if either a) the test code is non-deterministic or b) the application code being tested is non-deterministic, or both.
Below are some common causes of flaky tests. I’ll briefly discuss the fix for some of these common causes, but the focus of this post isn’t to provide a guide to fixing flaky tests, it’s to give you a familiarity with the most common causes for flaky tests so that you can know what to go looking for when you do your investigation work. (I have a separate post for fixing flaky tests.)
The causes I’ll discuss are race conditions, leaked state, network/third-party dependency, randomness and fixed time dependency.
Race conditions
A race condition is when the correct functioning of a program depends on two or more parallel actions completing in a certain sequence, but the actions sometimes complete in a different sequence, resulting in incorrect behavior.
Real-life race condition example
Let’s say I’m going to pick up my friend to go to a party. I know it takes 15 minutes to get to his house so I text him 15 minutes prior so he can make sure to be ready in time. The sequence of events is supposed to be 1) my friend finishes getting ready and then 2) I arrive at my friend’s house. If the events happen in the opposite order then I end up having to wait for my friend, and I lay on the horn and shout obscenities until he finally comes out of his house.
Race conditions are especially likely to occur when the times are very close. Imagine, for example, that it always takes me exactly 15 minutes to get to my friend’s house, and it usually takes him 14 minutes to get ready, but about one time out of 50, say, it takes him 16 minutes. Or maybe it takes my friend 14 minutes to get ready but one day I somehow get to his house in 13 minutes instead of the usual 15. You can imagine how it wouldn’t take a big deviation from the norm to cause the race condition problem to occur.
Hopefully this real-life example illustrates that the significant thing that gives rise to race conditions is parallelism and sequence dependence, and it doesn’t matter what form the parallelism takes. The parallelism could take the form of multithreading, asynchronicity, two entirely separate systems (e.g. two calls to two different third-party APIs), or literally anything else.
Race conditions in DOM interaction/system tests
Race conditions are fairly common in system tests (tests that exercise the full application stack including the browser).
Let’s say there’s a test that 1) submits a form and then 2) clicks a link on the subsequent page. To tie this to the pick-up-my-friend analogy, the submission of the form would be analogous to me texting my friend saying I’ll be there in 15 minutes, and the loading of the subsequent page would be analogous to my friend getting ready. The race condition creates a problem when the test attempts to click the link before the page loads (analogous to me arriving at my friend’s house before he’s ready).
To make the analogy more precise, this sort of failure is analogous to me arriving at my friend’s house before he’s ready, and then just leaving after five minutes because I don’t want to wait (timeout error!).
The solution to this sort of race condition is easy: just remove the asynchronicity. Instead of allowing the test to execute at its natural pace, add a step after the form submission that waits for the next page to load before trying to click the link. This is analogous to me adding a step and saying “text me when you’re ready” before I leave to pick up my friend. If we arrange it like that then there’s no longer a race.
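In a Capybara-based system spec, that waiting step might look something like this. The page text and labels are made up for illustration; the key is that the have_content matcher retries until the next page has rendered, so the click can’t race it.

  it "lets the user continue after submitting the form" do
    fill_in "Email", with: "test@example.com"
    click_button "Sign up"

    # The "text me when you're ready" step: this matcher waits (and retries)
    # until the next page has loaded, so the click below can't happen too early.
    expect(page).to have_content("Welcome! Your account has been created.")

    click_link "Account settings"
  end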
Because DOM interactions often involve asynchronicity, DOM interaction is a common area for race conditions, and therefore flaky tests, to be present.
Edge-of-timeout race conditions
A race condition can also occur when the amount of time an action takes to complete is just under a timeout value, e.g. a timeout value is five seconds and the action takes four seconds (but sometimes six seconds).
In these cases you can increase the timeout and/or improve the performance of the action such that the timeout value and typical run length are no longer close to each other.
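With Capybara, for example, the relevant knob is default_max_wait_time, and the timeout can also be raised just around a known-slow step. The page text below is made up.

  Capybara.default_max_wait_time = 10   # seconds; the default is 2

  # Or, only where the slow action happens:
  Capybara.using_wait_time(15) do
    expect(page).to have_content("Report generated")
  end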
Leaked state
Tests can create flaky behavior when they leak state into other tests.
Let me shoot an apple off your head
Let’s use another analogy to illustrate this one. Let’s say I wanted to perform two tests on myself. The first test is to see if I can shoot an apple off the top of my friend’s head with a bow and arrow. The second test is to see if I can drink 10 shots of tequila in under an hour.
If I were to perform the arrow test immediately followed by the tequila test and do that once a week, I could expect to get basically the same test results each time.
But if I were to perform the tequila test immediately followed by the arrow test, my aim would probably be compromised, and I might miss the apple once in a while. (Sorry, friend.) The problem is that the tequila test “leaks state”: it creates a lasting alteration in the global state, and that alteration affects subsequent tests.
And if I were to perform these two tests in random order, the tequila test would give the same result each time because I’d always be starting it sober, but the arrow test would appear to “flake” because sometimes I’d start it sober and sometimes I’d start it drunk. I might even suspect that there’s a problem with the arrow test because that’s the test that’s showing the symptom, but I’d be wrong. The problem is a different test with leaky state.
Ways for tests to leak state
Returning to computers, there are a lot of ways a test can alter the global state and create non-deterministic behavior.
One way is to alter database data. Imagine there are two tests, each of which creates a user with the email address test@example.com, and the database isn’t cleaned between tests. Whichever test runs first will pass; if there’s a unique constraint on users.email, whichever test runs second will raise an error due to the unique constraint violation. So which test fails depends on the order you run them in.
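Here’s a sketch of that scenario. It assumes the database isn’t cleaned between tests and that users.email has a unique index; the test descriptions are invented.

  RSpec.describe User do
    it "sends a welcome email" do
      User.create!(email: "test@example.com")
      # ...
    end

    it "defaults to the free plan" do
      # Passes when run first; raises a uniqueness error when run after the
      # test above, because that test's record leaked into this one.
      User.create!(email: "test@example.com")
      # ...
    end
  end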
Another way that a test could leak state is to change a configuration setting. Let’s say that your test environment has background jobs configured not to run for most tests because most background jobs are irrelevant to what’s being tested and would just slow things down. But then let’s imagine that you have one test where you do want background jobs to run, and so at the beginning of that test you change the background job setting from “don’t run” to “run”. If you don’t remember to change the setting back to “don’t run” at the end, background jobs will run for all later tests and potentially cause problematic behavior.
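One way to avoid that kind of leak is to restore the setting no matter what happens, e.g. in an ensure block. The BackgroundJobs setting object here is hypothetical; the pattern is what matters.

  it "generates the invoice in the background" do
    original = BackgroundJobs.mode
    begin
      BackgroundJobs.mode = :run
      # ...exercise the code that enqueues the job...
    ensure
      BackgroundJobs.mode = original  # without this, :run leaks into later tests
    end
  end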
State can also be leaked by altering environment variables, altering the contents of the filesystem, or any number of other ways.
Network/third-party dependency
The main reason why network dependency can create non-deterministic behavior doesn’t take a lot of explaining: sometimes the network is up and sometimes it’s not.
Moreover, when you’re depending on the network, you’re often depending on some third-party service. Even if the network itself is working just fine, the third-party service could suffer an outage at any time, causing your tests to fail. I’ve also seen cases where a test makes a third-party service call over and over until it gets rate-limited, and from that point on, for a period of time, that test fails.
The way to prevent flaky tests caused by network dependence is to use test doubles in your tests rather than hitting live services.
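With WebMock, for instance, a test can stub the HTTP request that the code under test would otherwise send over the network. The URL and response body below are made up.

  require "webmock/rspec"

  it "marks the payment as settled" do
    stub_request(:post, "https://api.example-payments.com/charges")
      .to_return(status: 200, body: { status: "settled" }.to_json)

    # ...exercise the code that would otherwise hit the live service...
  end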
Randomness
Randomness is, by definition, non-deterministic. If you, for example, have a test that generates a random integer between 1 and 2 and then asserts that that number is 1, that test is obviously going to fail about half the time. Random inputs lead to random failures.
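Written out literally, that (deliberately bad) test looks like this:

  it "comes up heads" do
    expect(rand(1..2)).to eq(1) # passes roughly half the time it's run
  end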
One way to get bitten by randomness is to grab the first item in a list of things that’s usually in the same order but not always. That’s why it’s usually better to specify a definite item rather than grab the nth item in a list.
Fixed time dependency
Once I was working late and I noticed that certain tests started to fail for no apparent reason, even though I hadn’t changed any code.
After some investigation I realized that, due to the way they were written, these tests would always fail when run at a certain time of day. I had just never worked that late before.
This is common with tests that cross the boundary of a day (or month or year). Let’s say you have a test that creates an appointment that occurs four hours from the current time, and then asserts that that appointment is included on today’s list of appointments. That test will pass when it’s run at 8am because the appointment will appear at 12pm which is the same day. But the test will fail when it’s run at 10pm because four hours after 10pm is 2am which is the next day.
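One way to reproduce (and then pin down) this kind of failure is to control the clock in the test, for example with ActiveSupport’s time helpers. The Appointment model and its today scope are hypothetical.

  # Assumes ActiveSupport::Testing::TimeHelpers is included in the test
  # (e.g. via `config.include ActiveSupport::Testing::TimeHelpers`).
  it "includes the appointment in today's list" do
    travel_to Time.zone.local(2024, 3, 1, 22, 0, 0) do      # 10pm: the failing case
      appointment = Appointment.create!(starts_at: 4.hours.from_now) # 2am the next day
      expect(Appointment.today).to include(appointment)     # `today` is a hypothetical scope
    end
  end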
Takeaways
A flaky test is a test that passes sometimes and fails sometimes, even though no code has changed.
Flaky tests are caused by non-determinism either in the test code or in the application code.
Some of the most common causes of flaky tests are race conditions, leaked state, network/third-party dependency, randomness and fixed time dependency.
Sometimes you’ll be tempted to add things to your application code that don’t affect the functionality of your application but do make testing a little easier.
The drawback to doing this is that it causes your application code to lose cohesion. Instead of doing just one job—making your application work—your code is now doing two jobs: 1) making your application work and 2) helping to test the application. This mixture of jobs is a straw on the camel’s back that makes the application code just that much harder to understand.
Next time you’re tempted to add something to your application code to make testing more convenient, resist the temptation. Instead add it somewhere in the code that supports your tests. This may take more time and thought initially but the investment will pay itself back in the long run.
When you want to understand what a legacy program you’re working on is supposed to do, what’s your first instinct? Often it’s to look at the code.
But unfortunately legacy code is often so convoluted and inscrutable that it’s virtually impossible to tell what the code is supposed to do just by looking at it.
In these cases it may seem that you’re out of luck. But fortunately you’re not.
Modeling
Due to what philosophers call the veil of perception, direct knowledge of anything in the world is impossible. There’s very little that we can know with absolute completeness and correctness.
For most things, we don’t have knowledge of how they work; we only have a model of how they work. A model is a useful approximation to the truth. Our models may not be absolutely complete or correct, but they’re useful enough to get the job done on a day-to-day basis.
Take a car, for example. I don’t have a complete understanding of how every part of a car works. And what I do know probably isn’t even 100% accurate. But my model is sufficiently comprehensive and accurate that I can operate my car without being regularly surprised and confounded by differences in the ways I expect my car to behave versus the way it actually behaves.
Let’s use another example, trees. My understanding of trees is fragmentary. But like my model of cars, my model of trees is sufficiently comprehensive and accurate that trees don’t regularly throw me surprises. I understand trees to be stationary, and so when I cross the street, I never worry that I may be struck by a speeding tree. Although I do know that wind can blow over dead trees and so I’m cautious about entering the woods on a windy day.
How models are developed
How did I develop my models of cars and trees?
Trees, unlike cars, are natural. I can’t look at a tree’s source code or schematics to gain an understanding of how trees work. Everything humans know about trees has been discovered by observation and experimentation—in other words, scientific inquiry. We observe that trees start off small and then grow large. We observe that deciduous trees lose their leaves in the winter and coniferous trees don’t. And so on.
We can of course also learn about trees by reading books and learning from teachers, but that’s not source material, it’s only a regurgitation of the products of someone else’s scientific inquiry.
Cars are a little different. Cars are man-made, so in addition to learning about cars through observation and experimentation, we can learn by, for example, reading the owner’s manual that came with the car. And in theory, we could look at engineering diagrams and talk with the people who designed the car in order to gain a direct understanding of how the car works straight from the source material.
Cars are also different from trees in the sense that much of a car’s mechanics is plainly visible. You can learn something about how a car works by taking apart its engine. Not so much with a tree.
Inspecting a car’s transmission is analogous to reading a line of code. The thing you’re looking at is, itself, an instruction. It can’t lie to you. You can be mystified by it and you can misunderstand its purpose, but the car part can’t outright lie to you because the car part is how the car works.
Modeling the mind
Before we connect all this back to legacy code, I want to share one more example of scientific modeling.
The human brain is a complex machine. So is a car, so is a tree, and so is a codebase. But unlike a car or a codebase, we can’t observe the mechanics of the brain directly, at least not yet, and not in much detail. We can’t look at the brain’s source code or insert breakpoints. For the purposes of understanding how it works, a brain is much more like a tree than a car.
But cognitive scientists have still managed to learn a lot about how the brain works. Or, more precisely, cognitive scientists have gained an understanding of the behavior that the brain produces. They’ve gained an understanding of how the mind (the outward expression of the machinery of the brain) works. They’ve developed a model.
This model of the mind has been developed in the same way that any accurate model has been developed: through scientific inquiry. A scientist can’t have a chat with the brain’s designer or have a look at its schematics, but a scientist can compare the behavior of people with normal brains against people who have had certain parts of their brains excised, for example, and make inferences about the roles that those parts of the brain play.
(Scientists can make diagrams of how they believe the brain works, but remember, those diagrams aren’t source material. They’re not code. They’re only a documentation of our current best understandings based on our scientific inquiry.)
So: if scientists can develop a model of the behavior generated by the brain without having access to the source machinery of the brain, what can we learn from scientists about how to understand the behavior of legacy systems without having access to comprehensible code?
Applying scientific inquiry to software systems
If you haven’t already done so, I’d like to invite you to make a conscious distinction between two ways of learning the behavior of a software system. One way is the obvious way that everyone’s familiar with: reading the code. The other way is the way that many people have probably used informally quite a bit but may not have consciously put a name to, which is scientific inquiry.
A full instruction in scientific inquiry is of course outside the scope of this post, and I wouldn’t be qualified to give one anyway. The point of this post is to invite you to consciously realize that you can develop a usefully accurate model of a software system not just by reading its code, but by using the methods of scientific inquiry.
Takeaways
Direct knowledge of anything in the world is impossible due to the veil of perception. All we can do is develop models, which are useful approximations to the truth.
Models are developed through the process of scientific inquiry.
In addition to reading the code, a model of a software system can be developed by using the process of scientific inquiry.
Most programmers are familiar with the concept of premature optimization and the reasons why it’s bad. As a reminder, the main reason premature optimization is bad is because it’s an effort to solve problems that probably aren’t real. It’s more economical to wait and observe where the performance bottlenecks are than to try to predict where the bottlenecks are going to be.
Perhaps fewer programmers are familiar with the idea of premature generalization, also known as the code smell Speculative Generality. Premature generalization is when you generalize a piece of code beyond its current requirements in anticipation of more general future requirements. In my experience it’s a very common mistake.
Premature generalization is bad for the same exact reason premature optimization is bad: because it’s an effort to solve problems that probably aren’t real.
Making a piece of code more general than it needs to be in anticipation of future needs might seem like a smart planning measure. If you can see that the code you’re writing will probably need to accommodate more use cases in the future, why not just make the code general enough now? That way you only have to write the code once.
When programmers do this they’re making a bet. Sometimes their bet is right and sometimes it’s wrong. In my experience, these sorts of bets are wrong enough of the time that you lose on average. It’s like betting $50 for a 10% chance at winning $100. If you were to do that 10 times, you’d spend $500 and win just once (on average), meaning you’ll have paid $500 to win $100.
It’s more economical to make your code no more general than what’s called for by today’s requirements and accept the risk that you might have to rework the code later to generalize it. This is also a bet but it’s a sounder one. Imagine a lottery system where you can either buy a ticket for $50 for a 10% chance of winning $100, or you can choose not to play and accept a 10% chance of getting fined $30. (I know it’s a weird lottery but bear with me.) If you buy a ticket ten times then on average you lose $400 because you’ve paid $500 to win $100. If ten times in a row you choose not to buy a ticket, then on average you get fined $30. So you’re obviously way better off with a policy of never buying the ticket.
Takeaways
Premature generalization is when you generalize a piece of code beyond its current requirements in anticipation of more general future requirements.
On average, premature generalization doesn’t pay. It’s more economical to write the code in such a way as to only accommodate today’s requirements and then only generalize if and when a genuine need arises.
Every codebase is a story. Well-designed programs tell a coherent, easy-to-understand story. Other programs are poorly designed and tell a confusing, hard-to-understand story. And it’s often the case that a program wasn’t designed at all, and so no attempt was made to tell a coherent story. But there’s some sort of story in the code no matter what.
If a codebase is like a story, a file in a codebase is like a chapter in a book. A well-written chapter will clearly let the reader know what the most important points are and will feature those important points most prominently. A chapter is most understandable when it principally sticks to just one topic.
The telling of the story may unavoidably require the conveyance of incidental details. When this happens, those incidental details will be put in their proper place and not mixed confusingly with essential points. If a detail would pose too much of a distraction or an interruption, it gets moved to a footnote or appendix or parenthetical clause.
A piece of code is cohesive if a) everything in it shares one single idea and b) it doesn’t mix incidental details with essential points.
Now let’s talk about ways that cohesion tends to get lost as well as ways to maintain cohesion.
How cohesion gets lost
Fresh new projects are usually pretty easy to work with. This is because a) when you don’t have very much code, it’s easier to keep your code organized, and b) when the total amount of code is small, you can afford to be fairly disorganized without hurting overall understandability too much.
Things get tougher as the project grows. Entropy (the tendency for all things to decline into disorder) unavoidably sets in. Unless there are constant efforts to fight back against entropy, the codebase grows increasingly disordered. The code grows harder to understand and work with.
One common manifestation of entropy is the tendency for developers to hang new methods onto objects like ornaments on a Christmas tree. A developer is tasked with adding a new behavior. He or she goes looking for the object that seems like the most fitting home for that behavior. He or she adds the new behavior, which doesn’t perfectly fit the object where it was placed, but the new code only makes the object 5% less cohesive, and it’s not clear where might be a better place for that behavior, so in it goes.
This ornament-hanging habit is never curtailed because no individual “offense” appears to be all that bad. This is the nature of entropy: disorder sets in not because anything bad was done but simply because no one is going out of their way to stave off disorder.
So, even though no individual change appears to be all that bad, the result of all these changes in aggregate is a surprisingly bad mess. The objects are huge. They confusingly mix unrelated ideas. Their essential points are obscured by incidental details. They’re virtually impossible to understand. They lack cohesion.
How can this problem be prevented?
How cohesion can be preserved
The first key to maintaining cohesion is to make a clear distinction between what’s essential and what’s incidental. More specifically, a distinction must be made between what’s essential and what’s incidental with respect to the object in question.
For example, let’s say I have a class called Appointment. The concerns of Appointment include, among other things, a start time, a client and some matters related to caching.
I would say that the start time and client are essential concerns of the appointment and that the caching is probably incidental. In the story of Appointment, start time and client are important highlights, whereas caching concerns are incidental details and should be tucked away in a footnote or appendix.
That explains how to identify incidental details conceptually but it doesn’t explain how to separate incidental details mechanically. So, how do we do that?
The primary way I do this is to simply move the incidental details into different objects. Let’s say for example that I have a Customer object with certain methods including one called balance.
Over time the balance calculation becomes increasingly complicated to the point that it causes Customer to lose cohesion. No problem: I can just move the guts of the balance method into a new object (a PORO) called CustomerBalance and delegate all the gory details of balance calculation to that object. Now Customer can once again focus on the essential points and forget about the incidental details.
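Here’s roughly what that extraction might look like. The associations and the calculation itself are made up; the shape of the delegation is the point.

  class Customer < ApplicationRecord
    has_many :invoices
    has_many :payments

    def balance
      CustomerBalance.new(self).amount
    end
  end

  # A PORO that holds all the gory details of balance calculation
  class CustomerBalance
    def initialize(customer)
      @customer = customer
    end

    def amount
      @customer.invoices.sum(&:amount_due) - @customer.payments.sum(&:amount)
    end
  end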
Now, in this case it made perfect sense to recognize the concept of a customer balance as a brand new abstraction. But it doesn’t always work out this way. In our earlier Appointment example, for example, it’s maybe not so natural to take our caching concerns and conceive of them as a new abstraction. It’s not particularly clear how that would go.
What we can do in these cases, when we want to move an incidental detail out of an object but we can’t put our finger on a befitting new abstraction, is we can use a mixin instead. I view mixins as a good way to hold a bit of code which has cohesion with itself but which doesn’t quite qualify as an abstraction and so doesn’t make sense as an object. For me, mixins usually don’t have standalone value, and they’re usually only ever “mixed in” to one object as opposed to being reusable.
(I could have said concern instead of mixin, but a) to me it’s a distinction without a meaningful difference, and b) concerns come along with some conceptual baggage that I didn’t want to bring into the picture here.)
So for our Appointment example, we could move the caching code into a mixin in order to get it out of Appointment so that Appointment could once again focus solely on its essential points and forget about its incidental details.
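In code, that might look something like the following. The caching details are invented; the point is just where they live.

  module AppointmentCaching
    def cache_key_for_schedule
      "appointments/#{id}-#{updated_at.to_i}"
    end

    # ...other caching-related methods...
  end

  class Appointment < ApplicationRecord
    include AppointmentCaching

    belongs_to :client
    # start time, client and other essential concerns stay front and center here
  end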
Where to put these newly-sprouted files
When I make an object more cohesive by breaking out its incidental details into a new model file, you might wonder where I put that new file.
The short answer is that I put these files into app/models, with additional subfolders based on the meaning of the code.
So for the Appointment, I might have app/models/appointment.rb and app/models/scheduling/appointment_caching.rb, provided that the caching code is related specifically to scheduling. The rationale here is that the caching logic will only ever be relevant to scheduling whereas an appointment might be viewed in multiple contexts, e.g. sometimes scheduling and sometimes billing.
For the customer balance example, I might have app/models/customer.rb and app/models/billing/customer_balance.rb. Again, a customer balance is always a billing concern whereas a customer could be looked at through a billing lens or conceivably through some other sort of lens.
Note that even though appointment_caching.rb is a mixin or concern, I don’t put it in a concerns or mixins folder. That’s because I believe in organizing files by meaning rather than type. I find that doing so makes it easier to find what I want to find when I want to find it.
Takeaways
A piece of code is cohesive if a) everything in it shares a single idea and b) it doesn’t mix incidental details with essential points.
Cohesion naturally erodes over time due to entropy.
The first key to maintaining cohesion is to make a clear distinction between what’s essential and what’s incidental.
Incidental details can be moved into either new objects or into mixins/concerns in order to help preserve cohesion.
If you’re going to make a change to an area of code, you have to understand that area. If you don’t understand the area you’re changing very well, your lack of understanding might lead you to accidentally introduce a bug.
Well-written code is loosely coupled from the other pieces of code it touches. “Loosely coupled” means that if you have classes A and B which talk to each other, you can understand class A without having to know much about class B and vice versa.
Conversely, if A and B are tightly coupled, then you might have to understand both class A and class B just to understand class A. Tight coupling makes code harder to work with.
One aspect of loosely-coupled code is that it has crisp boundaries, which are the opposite of blurry boundaries. Here’s an example of a piece of code with blurry boundaries.
class Person
  def initialize(params)
    @name = params[:name]
  end
end

person = Person.new(params)
The only thing Person needs from the outside world is a name, but Person is accepting the broader params as an argument.
Looking outward from inside the Person class, we might wonder: what exactly is in params? Who knows! It could be anything.
The inclusion of params is a “leak” from the outside world into Person.
Looking the other direction, from outside Person inward, we might see Person.new(params) and wonder what exactly Person needs from params in order to do its job. Does Person need everything inside of params? Just some of it? Who knows! Could be anything.
Let’s contrast the blurry-boundary code above with the crisp-boundary code below.
class Person
  def initialize(name)
    @name = name
  end
end

person = Person.new(params[:name])
In this case, looking outward from inside the Person class, it’s clear that Person takes a name and that’s it.
And then looking in from outside, Person.new(params[:name]) makes it clear exactly what’s being sent to Person.
In order to make your classes and methods more understandable, keep your boundaries crisp by accepting the minimum amount of argument information necessary in order to get the job done.
In Rails apps that use RSpec, it’s customary to have a spec directory with certain subdirectories named for the types of tests they contain: models, requests, system. The Minitest organization scheme doesn’t use the exact same names, but it does share the custom of organizing by test type.
I would like to raise the question: Why do we do it this way?
To get at the answer to that question I’d like to ask a broader question: What’s the benefit of organizing test files at all? Why not just throw all the tests in a single directory? For me there are two reasons.
Reasons to organize test files into directories
Finding tests
When I’m making a change to a feature, I usually want to know where the tests are that are related to that feature so I can update or extend the tests accordingly. Or, if that feature doesn’t have tests, I want to know so, and with a reasonable degree of certainty, so that I don’t accidentally create new tests that duplicate existing ones.
Running tests in groups
If tests are organized into directories then they can be conveniently run in groups.
It is of course possible, at least in some frameworks, to apply certain tags to tests and then run the tagged tests as a group. But doing so depends on developers remembering to add tags. This seems to me like a fragile link in the chain.
I find directories to be better than tags for this purpose since it’s of course impossible to forget to put a file in a directory.
Test type vs. meaning
At some point I realized that if I organize my test files based on meaning rather than test type, it makes it much easier to both a) find the tests when I want to find them and b) run the tests in groups that serve my purposes. Here’s why.
Finding tests
When I want to find the tests that correspond to a certain feature, I don’t necessarily know a lot about the characteristics of those tests. There might be a test that matches the filename of the application code file that I’m working on, but also there might not be. I’m also not always sure whether the application code I’m working on is covered by a model test, a system test, some other type of test, some combination of test types, or no test at all. The best I can do is either guess, search manually, or grep for some keywords and hope that the results aren’t too numerous to be able to examine one-by-one.
If on the other hand the files are organized in a directory tree that corresponds to the tests’ meaning in the domain model, then finding the tests is easier. If I’m working in the application’s billing area, for example, I can look in the spec/billing folder to see if the relevant tests are there. If I use a nested structure, I can look in spec/billing/payments to find tests that are specifically related to payments.
I don’t need to worry about whether the payments-related tests are model tests, system tests or some other type of tests. I can just look in spec/billing/payments and work with whatever’s there. (I do, however, like to keep folders at the leaf level with names like models, system, etc. because it can be disorienting to not know what types of tests you’re looking at, and also it can create naming conflicts if you don’t separate files by type.)
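As a sketch, a meaning-based spec directory might look like this (the domain and file names are made up; note the type folders at the leaf level):

  spec/
    billing/
      payments/
        models/
          payment_spec.rb
        system/
          refund_spec.rb
    scheduling/
      models/
        appointment_spec.rb
      system/
        booking_spec.rb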
Running tests in groups
I don’t often find it particularly useful to, say, run all my model tests or all my system tests. I do however find it useful to run all the tests in a certain conceptual area.
When I make a change in a certain area and I want to check for regressions, I of course want to check in the most likely places first. It’s usually more likely that I’ve introduced a regression to a conceptually related area than a conceptually unrelated area.
To continue the example from above, if I make a change to the payments area, then I can run all the tests in spec/billing/payments to conveniently check for regressions. If those tests all pass then I can zoom out one level and run all the tests in spec/billing. This gives me four “levels” of progressively broader regression testing: 1) a single file in spec/billing/payments, 2) all the tests in spec/billing/payments, 3) all the tests in spec/billing, and 4) all the tests in the whole test suite. If I organize my tests by type, I don’t have that ability.
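With RSpec, those four levels map directly onto paths passed on the command line (the paths are illustrative):

  bundle exec rspec spec/billing/payments/models/payment_spec.rb   # one file
  bundle exec rspec spec/billing/payments                          # one conceptual area
  bundle exec rspec spec/billing                                   # the broader area
  bundle exec rspec                                                # the whole suite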
On breaking convention
I’m not often a big fan of diverging from framework conventions. Breaking conventions often results in a loss of convenience which isn’t made up for by whatever is gained by breaking convention.
But don’t mistake this break from convention for other types of breaks from convention you might have seen. Test directory structure is a very weak convention, and it’s not even a Rails convention; it’s a convention of RSpec or Minitest. And in fact, it’s not even a technical convention, it’s a cultural convention. Unless I’m mistaken, there’s not actually any functionality tied to the test directory structure in RSpec or Minitest, and so diverging from the cultural standard doesn’t translate to a loss of functionality. It’s virtually all upside.
Takeaways
The benefits of organizing tests into directories include being able to find tests and being able to run tests in groups.
Organizing tests by meaning rather than type makes it easier to find tests and to run them in groups in a way that’s more logical for the purpose of finding regressions.
Every once in a while I come across the question “Where should I put my POROs in Rails?”
In order to answer this question, I would actually zoom out and ask a broader question: How should we organize our files in Rails in general?
Rails’ organizational limits
To some it might seem that this question already has an answer. Rails already gives us app/controllers for controllers, app/models for models, app/helpers for helpers and so on.
But after a person works with a growing Rails app for a while, it eventually becomes clear that Rails can only take you so far. The sheer quantity of code overwhelms Rails’ ability to help keep the code organized. It’s like piling pound after pound of spaghetti onto a single dinner plate. It only makes sense up to a certain point. Past that point the result is a mess. (This isn’t a criticism of Rails. It’s a natural fact of frameworks in general.)
A Rails codebase can grow both “horizontally” and “vertically”. Horizontal growth means adding more resources: more database tables, more model files, more controller files, etc. Rails can handle horizontal growth just fine, indefinitely.
Vertical growth means a growth in complexity. If the amount of domain logic in an application continues to grow but the number of controllers/models/etc. stays the same, then the result is that the individual files all grow. If the “fat models, skinny controllers” heuristic is followed, then the complexity accumulates in the model files. The result is huge models. These huge models are hard to understand because of their sheer size and because they lack cohesion, meaning that each model isn’t “about” one thing, but rather each model file is just a dumping ground for everything that might be loosely related to that model.
Common (poor) attempts to manage complexity growth
A common way to address the complexity problem is to split the code according to design patterns (decorators, builders, etc.) and put the files in folders that are named for the design patterns: app/decorators, app/builders and so on. The logic of this approach is that it’s a continuation of what Rails is already doing for us, which is to divide files by design pattern. At first glance it seems like a sensible approach.
However, I don’t think this approach does a very good job of addressing the problem of being able to find what we need to find when we need to find it. Here’s why.
Let’s say for example that I need to make a change to some billing-related logic. I know that the code I’m looking for has something to do with billing of course, but I might not know much else about the code I’m looking for. I have no idea whether the code I’m interested in might lie in app/models, app/decorators or anywhere else. I probably have a sense of whether the code is display-related (app/views), domain-logic-related (app/models) or related to the request/response lifecycle (app/controllers), but beyond that, I probably have no clue where the code is located. How could I?
When people try to extend Rails’ convention of dividing files by design pattern, they’re missing an important point. Decorators, builders, commands, queries, etc. are all different from each other, but they’re different from each other in a different way than models, views and controllers are different from each other.
Think of it this way. Imagine if, instead of being divided into meat, produce, dairy, etc. sections, a grocery store were organized by “things in boxes”, “things in plastic bags”, etc. The former is an essential difference while the latter is an incidental difference. Unless you know how everything is packaged, you won’t be sure where to look. The difference between models, views and controllers is like the difference between meat, produce and dairy. The difference between decorators, builders, commands, queries, etc. is more like the difference between how the items are packaged. Again, the former is essential while the latter is incidental.
Organizing by meaning
A better way to organize Rails code is by meaning. Instead of having one folder for each design pattern, I can have one folder for each conceptual area of my app. For example, if my app has a lot of billing code, I can have folders called app/models/billing, app/controllers/billing and so on. This makes it much easier to find a piece of code when I don’t know anything about the code’s structure but only about its meaning.
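As a sketch (with made-up file names), that might look like:

  app/models/billing/
    customer_balance.rb
    invoice_builder.rb
    payment_processor.rb
  app/controllers/billing/
    invoices_controller.rb
    payments_controller.rb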
Regarding design patterns, I think design patterns are both overrated and overemphasized, at least in the Rails world. A lot of Rails developers seem to have the idea that every file they create must belong to some category: model, controller, worker, helper, decorator, service, etc. Maybe this is because in a vanilla Rails app, pretty much everything is slotted in to a category in some way. But there’s no logical reason that every piece of code has to fit into some design pattern. The plain old “object” is an extremely powerful and versatile device.
But what if everything in a Rails app is just plain old Ruby objects? Won’t the app lose structure? Not necessarily. Most objects represent models in the broad sense of the term “model”, which is that the code represents some aspect of the world in a way that’s easy to understand and work with. Therefore, the objects that comprise the app’s domain logic can go in app/models, organized hierarchically by domain concept. Plain old objects can sit quite comfortably in app/models alongside Active Record models.
Now let’s go all the way back to the original question: where should you put POROs in Rails?
The answer depends on how you organize your Rails code in general. It also depends on what you consider POROs to be. I consider most of my POROs to be models, so I put them in app/models.
Takeaways
Rails can only help with code organization when the amount of code is small. Past a certain point it’s up to you to impose your own structure.
If the aim is to be able to find the code you need to find when you need to find it, organizing by design pattern doesn’t help much if you don’t already know how the code is structured. Organizing the code by meaning is better.
In this post I’ll show what duplication is, why it’s such a surprisingly complicated issue, why the popular advice is dubious, and what can be done to address duplication.
We’ll start with a definition of duplication.
What duplication is
We could imagine that duplication could be defined as a piece of code that appears in two or more places. Indeed, this sounds like a very reasonable and accurate definition. But it’s actually wrong.
Here’s what duplication really is. Duplication is when there’s a single behavior that’s specified in two or more places.
Just because two identical pieces of code are present doesn’t necessarily mean duplication exists. And just because there are no two identical pieces of code present doesn’t mean there’s no duplication.
Two pieces of code could happen to be identical, but if they actually serve different purposes and lead separate lives, then they don’t represent the same behavior, and they don’t constitute duplication. To “DRY up” these identical-looking pieces of code would create new problems, like handcuffing two people together who need to walk in two different directions.
On the other hand, it’s possible for a single behavior to be represented in a codebase but with non-identical code. The way to tell if two pieces of code are duplicative isn’t to see if their code matches (although most of the time duplicative behavior and duplicative code do appear together). The question that determines duplication is: if I changed one piece of code in order to meet a new requirement, would it be logically necessary to update the other piece of code the same way? If so, then the two pieces of code are probably duplicates of each other, even if their behavior is not achieved using the same exact syntax.
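For example, these two methods (imagine them living in two different files) share no identical lines, but they encode the same rule. A change to the rule would require changing both, which makes them duplicates in the sense described above. The method and attribute names are invented.

  # In one file:
  def upcoming?
    starts_at >= Time.current
  end

  # In another file, written differently but expressing the same rule:
  def upcoming?
    !starts_at.before?(Time.current)
  end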
Why duplication is bad
The main reason duplication is bad is because it leaves a program susceptible to developing logical inconsistencies.
If a behavior is expressed in two different places in a program, and one of them accidentally doesn’t match the other, then the deviating behavior is necessarily wrong. (Or, if the deviating behavior happens to still meet its requirements, it only does so by accident.)
Another reason duplication can be bad is because it can pose an extra maintenance burden. It takes longer, and requires more mental energy, to apply a change to two areas of code instead of just one.
But not all instances of duplication are equally bad. Some kinds of duplication are more dangerous than others.
When duplication is more dangerous or less dangerous
There are three factors that determine the degree of harm of an instance of duplication: 1) how easily discoverable the duplication is, 2) how much extra overhead the presence of the duplication incurs, and 3) how much “traffic” that area receives, i.e. how frequently that area of code needs to be changed or understood. Let’s look at each of these factors more closely.
Discoverability
If there’s a piece of behavior that’s specified twice in the codebase, but the two pieces of code are only separated by one line, then there’s not a big problem because everyone is basically guaranteed to notice the problem. If someone updates one of the copies of the behavior to meet a new requirement, they’re very unlikely to accidentally miss updating the other one. You might call this the proximity factor.
If two pieces of duplicated behavior appear in different files in different areas of the application, then a “miss” is much more likely to occur, and therefore the duplication constitutes a worse problem.
Another quality that makes discovery of duplication easier is similitude. If two pieces of code look very similar, then their duplication is more likely to be noticed than if the two pieces of code don’t look the same. You might call this the similitude factor.
If the proximity factor is bad (the pieces of duplicated code are at a great distance from each other) and/or if the similitude factor is bad (the duplication is obscured by the pieces of duplicated code not being similar enough to appear obviously duplicative) then it means the duplication is riskier.
Overhead
Some instances of duplication are easier to live with than others. Two short lines of very similar code, located right next to each other, are very easy to keep in sync with one another. Other types of duplication are much more deeply baked into the system and can cause a much bigger headache.
For example, if a piece of duplication exists as part of the database schema, that’s a much higher maintenance cost than a short code duplication. Instances of duplication that are big and aren’t represented by identical code can also be costly to maintain because, in those cases, you can’t just type the same thing twice, you have to perform a potentially expensive translation step in your head.
Traffic level
Duplication is a type of “bad code”, and so principles that apply to bad code apply to duplication as well. One of these principles is that bad code in heavily-trafficked areas costs more than bad code in lightly-trafficked areas.
When considering how much a piece of bad code costs, it’s worth considering when that cost is incurred. When a piece of bad code incurs a cost, we might think of this as analogous to paying a toll on a toll road.
One toll is charged when a piece of code is changed. The more frequently the code is changed, the more of a toll it’s going to incur, and so the bigger a problem it is.
Another toll is charged when a piece of code needs to be understood as a prerequisite to understanding a different piece of code. Every codebase has “leaf code” and “branch code”. If a piece of code is leaf code, meaning nothing depends on it, then we can afford for that code to be pretty bad and it doesn’t matter much. Branch code, on the other hand, gets heavy intellectual traffic, and so incurs a higher toll, and so is a bigger problem.
How to decide whether to DRY up a piece of code or to keep the duplication
The way to decide whether or not to DRY up a piece of duplication is pretty simple, although it’s not easy. There are two factors to consider.
Severity
If a piece of duplication is “severe”—i.e. it has low discoverability, poses high overhead, and/or has a high traffic level—it should probably be fixed. If not, it should probably be left alone.
Quality of alternative
Just because a piece of duplication costs something doesn’t automatically mean that the de-duplicated version costs less. It doesn’t happen very often, but sometimes a de-duplication unavoidably results in code that’s so generalized that it’s virtually impossible to understand. In these cases the duplicated version may be the lesser of two evils.
But be careful to make the distinction between “this code can’t be de-duplicated without making it worse” and “this particular attempt to de-duplicate this code made it worse”. Like all refactoring projects, sometimes you just need to try a few times before you land on something you’re happy with. And sometimes you just need to be careful not to go overboard.
Why the popular guidelines make little sense
It currently seems to be fashionable to hold the belief that developers apply DRY too eagerly. This hasn’t been my experience. The opposite has been my experience.
Claims that developers apply DRY too eagerly are often accompanied by advice to follow WET (“write everything twice”) or the “rule of three”, or “duplication is cheaper than the wrong abstraction”. Here’s why I think these popular guidelines make little sense.
Rule of three/”write everything twice”
Here’s my way of deciding whether to DRY up a duplication: Is the duplication very bad? Are we able to come up with a fix that’s better than the duplicated version and not worse? If so, then clean it up. If not, leave it alone.
Notice that my criteria do not include “Does the duplication appear three times?” I can’t see how that could be among the most meaningful factors.
Imagine, for example, a piece of duplication in the form of three very simple and nearly-identical lines, grouped together in a single file. The file is an unimportant one which only gets touched a couple times a year, and no one needs to understand that piece of code as a prerequisite to understanding anything else.
Now imagine another piece of duplication. The duplication appears in only two places, but the places are distant from one another and therefore the duplication is hard to discover. The two places where the duplicated behavior appear are expressed differently enough that the code would elude detection by a code quality tool or a manual human search. The behavior is a vitally central and important one. It doesn’t get changed often enough that it stays at the top of everyone’s mind, but it gets changed often enough that there are lots of opportunities for divergences to arise. And the two places the behavior appears are brutally painful to keep in sync.
Given this scenario, why on earth would I choose to fix the triple-duplicate and leave the double-duplicate alone?
The rule of three and “write everything twice” (WET) make little sense. The number of times a piece of duplication appears is not the main factor in judging its harmfulness.
Duplication is cheaper than the wrong abstraction
This statement is repeated very frequently in the Ruby community, usually to discourage people from applying the DRY principle too eagerly.
I wish we would think about this statement more deeply. Why are we setting up such a strong connection between duplication and abstractions? It strikes me as a non sequitur.
And why are we imagining such a strong danger of creating the wrong abstraction? Do we not trust ourselves to DRY up a piece of code and end up with something good? And again, why does the result of our de-duplicating have to be an abstraction? I find it an illogical connection.
If we take out the word “abstraction” then the sentiment that remains is “duplicated code is better than a de-duplicated version that’s even worse”. In which case I of course agree, but the statement is so banal that it’s not even a statement worth making.
I think “duplication is cheaper than the wrong abstraction” is a statement devoid of any useful meaning, and one we should stop repeating.
How to fix instances of duplication
A duplication-removal project is just a special case of a refactoring project. (Remember, refactoring means “changing the structure of code without changing its behavior”). Any guidelines that apply to general refactoring projects also apply to de-duplication projects.
When de-duplicating, it helps to work in small, atomic units. If the refactoring was triggered by a need to make a behavior change, don’t mix the behavior change with the refactoring. Perform the refactoring either before implementing the change or after or both, not during. And when you reach the point when you’re no longer sure that your refactorings are an improvement, stop.
When I’m de-duplicating two pieces of code, it’s often not clear how the unification will be achieved. In these cases I like to make it my first step to make the duplicate pieces of code completely identical while still keeping them separate. Merging two subtly different pieces of code can be tricky but merging two identical pieces of code is trivial. So make them identical first, then merge.
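Here’s a tiny illustration of that two-step approach (the methods are made up):

  # Before: two subtly different expressions of the same behavior.
  def full_name
    "#{first_name} #{last_name}"
  end

  def display_name
    [first_name, last_name].join(" ")
  end

  # Step 1: edit one copy until both are character-for-character identical,
  # confirming along the way that behavior hasn't changed.
  def display_name
    "#{first_name} #{last_name}"
  end

  # Step 2: the merge is now trivial: delete one method and point its
  # callers at the other.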
Takeaways
Duplication exists when there’s a single behavior that’s specified in two or more places.
The main reason duplication is bad is because it leaves a program susceptible to developing logical inconsistencies.
Not all instances of duplication are equally dangerous. The severity of a piece of duplication can be judged based on its discoverability, overhead cost and traffic level.
In order to decide whether an instance of duplication is worth fixing, consider the severity of the duplication. Also compare the duplicative code with the de-duplicated code, and only keep the “fixed” version if the fixed version is actually better.
The rule of three/”write everything twice” makes little sense because it doesn’t take into account the factors that determine whether a piece of duplication is dangerous or innocuous. “Duplication is cheaper than the wrong abstraction” makes little sense because it sets up a false dichotomy between duplication and “the wrong abstraction”.
To get good at removing duplication, get good at refactoring.
When attempting to remove an instance of duplication, it’s often helpful to make the duplicative code completely identical as a first step, and then merge the identical code as a second step.