What it means to be wrong and why it’s bad
It’s generally better to be right than to be wrong. Since there’s more than one way to be wrong, I want to be specific about the type of wrongness I want to address in this post before I move on.
The type of wrongness I’m interested in here is logical incorrectness, like believing that two plus two equals five.
The danger of being wrong
Being right or wrong isn’t just an academic concern. In programming, being wrong often has concrete negative economic (and other) impacts. Developers who are often wrong will be much less efficient and burn up much more payroll cost and opportunity cost than developers who are wrong less often.
Being wrong is also not something that happens every great once in a while. Most humans are wrong about a whole bunch of stuff, a lot of the time, because that’s just human nature. Even really smart people are wrong about things a very nonzero amount of the time.
So I want to share some things we developers can do in order to be wrong less. But first let me share a concrete example of the kind of mistakenness I’m talking about.
An example of being wrong
Bug: an appointment goes missing
Let’s say I’m building some scheduling software. One of my users, Rachel, reports to me that yesterday she rescheduled someone’s appointment from July 1st at 10am to July 3rd at 10am. Today, Rachel looked at the schedules for both July 1st and July 3rd and the appointment isn’t present on either day. Apparently there’s a bug that removes appointments from the schedule when you try to reschedule them.
So I start to look at the code to see if I can find any evidence of this buggy behavior. Unfortunately, the code is very complicated, and my investigation takes an entire day. By the end of the day I’ve made almost no progress toward fixing the bug.
The bug was not the bug
Unbeknownst to me, the thing I thought was the bug was not actually the bug. In fact, there was no bug at all. Between the time Rachel rescheduled the appointment and the time Rachel found the appointment missing, another user, Janice, deleted the appointment. I was wrong, and I wasted a whole day as a consequence.
How to be less wrong
We developers can be wrong less of the time by studying epistemology. Epistemology is a branch of philosophy which deals with the acquisition of knowledge. Epistemology tells us how we can know, with certainty, what’s true and what isn’t.
More narrowly, we can study logic. Logic is a branch of philosophy that deals with a formal system of reasoning. One of the central ideas of logic is that of an argument. Arguments are the ideas we’ll be focusing on in this post.
The definition of a logical argument
An argument is a group of statements including one or more premises and one and only one conclusion. (I shamelessly stole this definition word-for-word from this web page.)
Now let’s talk about what a premise is and what a conclusion is. As an aid I’ll share an example of a logical argument, henceforth just referred to as an “argument”.
All fish live in water.
All sharks are fish.
Therefore, all sharks live in water.
This argument contains two premises. “All fish live in water” is a premise. “All sharks are fish” is also a premise.
This argument’s conclusion is of course “Therefore, all sharks live in water”. If it’s true that all fish live in water and it’s true that all sharks are fish, then it’s of course true that all sharks live in water.
Validity and soundness
Not all arguments are good ones. An argument can be valid or invalid and sound or unsound.
An argument is valid if its conclusion follows necessarily from its premises. In other words, if the premises are true, the conclusion can’t be false. Our above fish/shark argument is a valid argument because, if the argument’s premises are true, its conclusion must necessarily be true. We could make the argument invalid by changing the conclusion.
All fish live in water.
All sharks are fish.
Therefore, all sharks have fins.
This argument isn’t valid because its conclusion doesn’t logically flow from its premises. It happens to be true that all sharks have fins, but that fact isn’t true as a natural consequence of this argument’s premises, so the argument isn’t valid.
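One way to make the idea of validity concrete is to go looking for a counterexample: a situation where every premise is true and the conclusion is still false. Below is a minimal sketch of that search, assuming we model each category ("fish", "sharks" and so on) as a set of individuals in a tiny made-up universe. The names `find_countermodel`, `UNIVERSE` and the set labels are my own illustrative choices. Finding a counterexample shows an argument is invalid; finding none in such a small universe is only suggestive, not a proof.

```python
from itertools import product

UNIVERSE = range(2)  # two anonymous individuals; an arbitrary, illustrative size
NAMES = ["fish", "sharks", "water_dwellers", "finned"]


def all_subsets(universe):
    """Return every subset of the universe as a frozenset."""
    items = list(universe)
    return [
        frozenset(x for x, keep in zip(items, mask) if keep)
        for mask in product([False, True], repeat=len(items))
    ]


def find_countermodel(premises, conclusion):
    """Return a model where all premises hold but the conclusion fails, or None."""
    for sets in product(all_subsets(UNIVERSE), repeat=len(NAMES)):
        model = dict(zip(NAMES, sets))
        if all(p(model) for p in premises) and not conclusion(model):
            return model
    return None


def all_fish_live_in_water(m):
    return m["fish"] <= m["water_dwellers"]  # "<=" is the subset test


def all_sharks_are_fish(m):
    return m["sharks"] <= m["fish"]


def all_sharks_live_in_water(m):
    return m["sharks"] <= m["water_dwellers"]


def all_sharks_have_fins(m):
    return m["sharks"] <= m["finned"]


premises = [all_fish_live_in_water, all_sharks_are_fish]

# No counterexample exists for the first conclusion in this universe.
valid_case = find_countermodel(premises, all_sharks_live_in_water)

# A counterexample does exist for the second one: some shark without fins.
invalid_case = find_countermodel(premises, all_sharks_have_fins)
```

Notice that the search never asks whether sharks really have fins. It only asks whether the premises force the conclusion, which is exactly what validity is about.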
Note that validity doesn’t have anything to do with truth. An argument can be valid even if its premises aren’t true.
All turtles are invisible.
Everyone has a turtle in their brain.
Therefore, everyone has an invisible turtle in their brain.
The premises of the above argument aren’t true (at least as far as I know) but the argument is nonetheless valid.
An argument is sound if the argument is valid and its premises are true. Our first argument (“all fish live in water, all sharks are fish, therefore all sharks live in water”) is sound because the argument is valid and its premises are true. Sound arguments always have true conclusions.
Here’s another sound argument.
Every 20th century American president has been male.
Richard Nixon was a 20th century American president.
Richard Nixon was male.
Now comes the fun part, where we apply logical arguments to programming.
Arguments in programming
Read the following argument, keeping in mind the definitions of validity and soundness. See if you can tell if the argument is valid or invalid, sound or unsound. (If you don’t want a spoiler, don’t scroll past the argument until you’ve read the full argument.)
The site is unusually slow today.
We performed a large deployment this morning.
The deployment is the cause of the slowness.
This argument is unsound. Even if the premises are true, we can’t know based on the premises that the deployment was the cause of the slowness. How do we know it’s not a coincidence? For all we know, our site got featured on Hacker News and a big traffic spike is the cause of the slowness. The conclusion doesn’t necessarily follow from the premises, which makes the argument invalid, and an invalid argument can’t be sound no matter how true its premises are.
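The same counterexample hunt works for this argument. Here is a small sketch, assuming we treat each statement as a plain true/false variable (the variable names are mine, chosen for illustration) and enumerate every combination to see whether the premises can be true while the conclusion is false.

```python
from itertools import product

VARIABLES = ["site_is_slow", "deployed_this_morning", "deployment_caused_slowness"]


def premises_hold(world):
    """Both premises: the site is slow, and we deployed this morning."""
    return world["site_is_slow"] and world["deployed_this_morning"]


def conclusion_holds(world):
    """The conclusion: the deployment is the cause of the slowness."""
    return world["deployment_caused_slowness"]


def counterexamples():
    """Collect every assignment where the premises are true but the conclusion is false."""
    found = []
    for values in product([True, False], repeat=len(VARIABLES)):
        world = dict(zip(VARIABLES, values))
        if premises_hold(world) and not conclusion_holds(world):
            found.append(world)
    return found


worlds = counterexamples()
# The one counterexample is the "coincidence" world: the site is slow,
# we did deploy, and the deployment still isn't the cause.
```

A single counterexample is enough to show the argument is invalid. To make it valid we would need another premise that rules that world out, and we would then have to know that premise is true.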
Here’s another example. Instead of just two premises, this argument has three.
Sometimes slowness is caused by code changes.
Sometimes slowness is caused by traffic spikes.
The site is unusually slow today.
The cause of the slowness is either a code change or a traffic spike.
This argument is also unsound. There are more possible reasons for a site to be slow than just a code change or a traffic spike. For example, maybe our DevOps person killed half the servers in the middle of the night last night without our knowing it. So despite true premises, this argument, like the preceding one, is invalid and therefore unsound.
Here’s another example.
The code in the most recent deployment introduced a bug.
The only thing that went out in the most recent deployment was Josh’s code.
Josh’s code caused the bug.
As long as this argument’s premises are true, this argument is sound. If we know for sure that the most recent deployment introduced a bug, and we know for sure that the only thing that went out in the most recent deployment was Josh’s code, then it does logically follow that Josh’s code caused the bug.
Here’s a final example. This one is a little more detailed than the previous ones.
At 10:32am, a duplicate $20 charge appeared in the system for patient #5225.
Also at 10:32am, Jason carelessly performed a manual action on patient #5225, an action that was related to that patient’s $20 charge.
Jason caused the duplicate charge.
This is another unsound argument. Even though it sounds likely that my action caused the duplicate charge, it’s not logically valid to make that inference based on the premises. The invalidity is perhaps not obvious, but can be made more apparent by asking the question: “Are there any possible circumstances under which Jason’s manual action would NOT have created the duplicate charge?” For example, it’s possible that the card could have accidentally been run twice, and the timing was a coincidence.
I have empirical proof of the invalidity of the above argument because this is a real-life example and, in fact, my action was not the cause of the duplicate charge. Part of what helped me determine this is the following sound argument.
It’s impossible to create a charge without having a patient’s credit card information.
It would have been physically impossible for me to involve the patient’s credit card information when I performed my manual action because we don’t store credit card information in the system.
My action couldn’t have created the duplicate charge.
The preceding sound argument (and remember, sound arguments always have true conclusions) led me to investigate more deeply. What I ultimately discovered was that the patient’s card did in fact get run twice and the timing was just a coincidence. Why wasn’t it obvious from the outset that the patient’s card was run twice? Because we use Authorize.net as a payment gateway, and apparently the Authorize.net API sometimes returns a failure response even when the charge went through. So from the perspective of our application only one charge was successfully created, even though in reality there were two.
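To illustrate how an application’s records and reality can drift apart like this, here is a rough simulation. Everything in it is hypothetical: `FlakyGateway` is a stand-in I made up and is not the Authorize.net API, and I’m assuming for illustration that the second attempt came from a simple retry after the reported failure.

```python
class FlakyGateway:
    """Hypothetical stand-in for a payment gateway (NOT the Authorize.net API)."""

    def __init__(self, misreport_first_n):
        self.captured = []  # charges the gateway actually processed
        self._misreports_left = misreport_first_n

    def charge(self, patient_id, amount):
        # The charge always goes through on the gateway side...
        self.captured.append((patient_id, amount))
        if self._misreports_left > 0:
            self._misreports_left -= 1
            return False  # ...but the first N responses wrongly report failure.
        return True


def charge_with_retry(gateway, ledger, patient_id, amount, attempts=2):
    """Record a charge in our ledger only when the gateway reports success."""
    for _ in range(attempts):
        if gateway.charge(patient_id, amount):
            ledger.append((patient_id, amount))
            return True
    return False


gateway = FlakyGateway(misreport_first_n=1)
ledger = []  # what our application believes happened
charge_with_retry(gateway, ledger, patient_id=5225, amount=20)

charges_we_recorded = len(ledger)
charges_that_happened = len(gateway.captured)
```

In this sketch our ledger ends up with one charge while the gateway has processed two. A premise like "our system shows only one charge was created" can be true about our records and still say nothing certain about what the gateway did.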
Good luck with your arguments
Next time you’re confronted with a programming mystery, I invite you to frame your mystery in terms of arguments. Write down your premises, and make sure to write down only premises that you know to be true, because otherwise your argument will be unsound. Then try to come up with a conclusion and make sure that your conclusion necessarily follows from your premises so that your argument is valid. If your argument is valid and its premises are true, the argument is sound and your conclusion will necessarily be true.
If you’d like to have an argument with me on Twitter, you can find me here.
I had a similar situation with an ActiveMerchant gateway.
There are more duplicate transactions on days with high volume.
A failed web request due to not enough puma processes could lead users to retry the charge.
The problem is not enough capacity.
The duplicates are always 30 seconds apart.
The library sends a second request to a secondary server after 30 seconds.
Swapping the primary and secondary gateway endpoints reduced the duplicate requests.
This turned out to be the solution while the other was a systematic coincidence. Related but not the cause. Setting up an alert for latency on both endpoints confirmed it.