Diagnosing bugs by guessing
When faced with a bug they have to diagnose, many developers will start making guesses as to what the problem is.
Guessing can be fine, especially when the guesses are good ones and when the guesses are inexpensive to test. But often, the guesses quickly degrade into long shots and the developer spends his or her time flailing around randomly rather than progressing steadily toward a solution.
Diagnosing bugs more methodically
One of my principles of debugging is that it’s almost always easier to determine where the cause of a bug is than what the cause of the bug is. When I need to diagnose a bug, the first thing I ask is not “What is the problem?” but “Where is the problem?” Not only is “where” a much easier question to answer than “what”, but once I’ve found the “where”, the “what” is often plainly evident.
When you change the question from “what” to “where”, the problem changes from a thought-intensive mystery-solving exercise to a relatively straightforward search problem.
When I want to determine where a bug lies, I identify an area of code and ask “Does the bug lie in this area of code?” If the answer is yes, then I perform the search again on a narrower area. If the answer is no, I continue my search in a different area.
When I ask the question “Does the bug lie in this area of code?” the answer can be gotten like this.
- Perform the steps that reproduce the bug on the latest code. Observe that the bug is present.
- Delete or disable some section of the code.
- Perform the reproduction steps again.
- Observe whether the bug is present. If the bug is gone, the answer is yes. If the bug is still present, the answer is no.
But the question remains: how do you determine which areas of your code to inspect for the bug? Obviously it’s not going to be efficient to just randomly choose areas for inspection. That’s where binary search comes in.
Binary search
Imagine I wanted to know someone’s birthday but I wasn’t allowed to ask them when their birthday was. I was only allowed to ask yes-or-no questions.
In order to figure out this person’s birthday, I could of course ask, “Is it on January 1st?” “Is it on January 2nd?” etc. But that would take me up to 365 guesses (or actually 366 because of leap year birthdays) and on average it would take me 366/2 = 183 guesses (assuming even distribution of birthdays).
Instead I could ask the person “Is your birthday before July 1st?” If so, I can rule out all the days after July 1st. Then I can cut the remaining portion of the year in half. The next question would be “Is your birthday before April 1st?” If so, I do the same thing again. “Before February 16th?” “Before January 23rd?” “Before January 13th?” and so on. With this method, it takes not more than nine questions to arrive at the right answer. That’s a lot better than 183!
Binary search in code
This binary search method can be applied to searching code as well. Let’s say you have a 100-line file that contains a bug but you don’t know where the bug is. You can (at least in theory) delete the last 50 lines of code and then check the remaining code for the presence of the bug. If the bug is still there, then you can divide that code in half by deleting the last 25 lines, and so on. Obviously you usually can’t literally delete the last 50 lines of a file because then it wouldn’t be syntactically valid and so on, but the principle of course still stands.
Binary search can also be used on the level of an entire codebase. You can devise questions that will divide the codebase roughly in half and then check for the presence or absence of the bug in each half. You don’t even need to necessarily delete code in order for this method to work. You just need a way to eliminate half of the codebase from your search area somehow. (Think about the game 20 Questions. “Is it living? Is it an animal? Is it a mammal?” and so on.)
You can also perform searches across time. Git bisect works by taking a range of commits and then repeatedly dividing it in half, asking you to answer the question “Does the bug lie in this half?” at each step. When you perform a bisect, you’re asking not “What is this bug?” but rather “What commit introduced this bug?” (In other words, “where is this bug”.) If your commits are small and atomic, then often the cause of the bug will be often obvious once the offending commit is identified. If the offending commit is large, you might need to do another binary search on the code the commit introduced in order to isolate the root cause.
The beauty of binary search debugging
Before I figured out how to diagnose bugs methodically, debugging was often an extremely frustrating exercise. I would stare at the screen and wonder why what was happening was happening. I would read the code over and over to try to make sense of it. I would make guesses as to what the problem could be. And after my guess turned out to be wrong, I often felt no closer to a solution than before.
Now things are different. The bug diagnosis methodology that I use now—which combines the “where, not what” principle with binary search—has two big benefits. First, this methodology allows me to progress systematically and inexorably toward a solution rather than taking shots in the dark. Second, it almost always works. It’s hard to overstate how good it feels to know that whatever bug I’m faced with, I have the ability to diagnose it in a timely manner, and without much mental strain, and with a success rate close to 100%. That sure as hell beats guessing.
Takeaways
- When diagnosing bugs, guessing is sometimes fine, but it’s often inefficient.
- It’s almost always easier to find where a bug is than what a bug is.
- Using binary search to find where a bug is makes the search process more efficient.
- Binary search almost always works.
One awesome advice I am definitely going to remember and keep. Thank you.
When I read this, I thought: ‘I know this. Yesterday’s news.’ But today it helped me find the answer to ‘why does including this CSS file break the layout?’, which I failed to find last week.