What atomic commits are and why they’re advantageous
I like to make Git commits very frequently. Judging by the Git log for a recent project, I tend to commit about every 5 to 15 minutes.
I find that the smaller my commits are, the easier I make life for myself. I remember painful occasions in the past where I would do a big chunk of work, maybe two hours or so. Things would be going fine, going fine, and then all of a sudden my work would collapse on itself and I wouldn’t be able to figure out how to get it back to a working state.
At that point my options would be either to scratch my head for the next hour to try to figure out what went wrong or to revert the whole two hours’ worth of work. I had painted myself into a corner. When I teach programming classes I see students paint themselves into a corner like this all the time.
This kind of stuff doesn’t really happen to me anymore. One of the main reasons is that I practice atomic commits.
“Atomic commit” is basically a fancy way of saying a commit that commits one and only one thing. It’s a single complete unit of work.
Students of mine often find it funny that I’ll make a commit after changing just one or two lines of code. But a commit is not just something I do after completing a piece of work. A commit is also something I do before I start a piece of work.
Let’s say I have Chunk A and Chunk B, two unrelated pieces of work. Chunk A is a 30-second change. Chunk B is a 20-minute change. A lot of people might not bother committing Chunk A because it’s so small. But then Chunk A “comes along for the ride” when I’m working on Chunk B. If I screw up Chunk B and have to bail and revert and start over, then I also end up reverting Chunk A. Or if a week later I find out Chunk B introduced a bug and I need to revert Chunk B at that point, Chunk A gets reverted as well even though it has nothing to do with Chunk B.
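Here is a minimal sketch of the Chunk A / Chunk B scenario, assuming a throwaway repository. The file names and commit messages are made up for illustration. Because each chunk gets its own commit, reverting Chunk B leaves Chunk A alone.

```shell
#!/bin/sh
set -eu

# Throwaway repository so the demo does not touch any real project.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"   # placeholder identity
git config user.name "Demo"

echo "base" > app.txt
git add app.txt
git commit -q -m "Add app.txt"

# Chunk A: the 30-second change, committed on its own.
echo "chunk A" > a.txt
git add a.txt
git commit -q -m "Add chunk A"

# Chunk B: the 20-minute change, committed separately.
echo "chunk B" > b.txt
git add b.txt
git commit -q -m "Add chunk B"

# Later, Chunk B turns out to be buggy. Revert only that commit.
git revert --no-edit HEAD >/dev/null

# Chunk A is still here; only Chunk B's file is gone.
ls
```

Had both chunks been lumped into one commit, the same revert would have taken a.txt with it.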
These are the things I have in mind when I commit what might seem to others like a comically tiny change.
Atomic commits also make it easier to track down the source of mysterious regressions. Let’s say a feature was known to be working on January 5th and broken on June 11th and nobody knows when exactly the feature broke. Git bisect can make it very quick and easy to find out exactly which commit introduced the regression. At least, it’s easy if the team has been practicing atomic commits. If each commit is huge and contains multiple pieces of work, git bisect loses a lot of its usefulness.
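To make the bisect workflow concrete, here is a rough sketch in a throwaway repository. The eight commits and the feature.txt file are invented for the demo. The loop stands in for a person manually checking the feature at each step and answering "good" or "bad".

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"   # placeholder identity
git config user.name "Demo"

# Eight small commits; the fifth one breaks the (hypothetical) feature.
for i in 1 2 3 4 5 6 7 8; do
  echo "note $i" >> notes.txt
  if [ "$i" -lt 5 ]; then echo working > feature.txt; else echo broken > feature.txt; fi
  git add .
  git commit -q -m "Change $i"
done

first=$(git rev-list --max-parents=0 HEAD)

# Tell git which commit is known bad and which is known good.
git bisect start >/dev/null
git bisect bad HEAD >/dev/null
git bisect good "$first" >/dev/null

# git now checks out a commit halfway between the two. Normally a person
# would check the feature by hand here; the grep simulates that check.
out=""
until printf '%s\n' "$out" | grep -q "is the first bad commit"; do
  if grep -q "^working$" feature.txt; then
    out=$(git bisect good)
  else
    out=$(git bisect bad)
  fi
done

# refs/bisect/bad points at the commit that introduced the regression.
culprit=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "First bad commit: $culprit"
```

Because every commit in this history is one small change, the commit that bisect lands on points straight at the offending change.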
By the way, I think “WIP” (work in progress) is one of the worst possible commit messages. First, it basically screams “This commit is not a complete unit of work.” Second, it’s about as vague as it gets. Committing “WIP” is basically saying “Rather than take 30 seconds to think of a meaningful commit message, I’d rather make you take several minutes to try to figure out what this commit is all about.” Please don’t commit “WIP”.
What atomic commits have to do with testing
I find it advantageous to make sure each commit leaves the application’s test suite in a passing state.
This way you’re free to check out any commit in the application’s history and you can have a reasonable expectation that the test suite will pass. (This becomes less true the further back in history you go, of course, but it’s at least true for the recent commit history which is usually the history you’re most interested in.)
Some developers don’t find it important to keep the test suite passing on every commit. When this is the case I might check out a certain revision and see that the tests don’t pass. Then I’m forced to wonder: is it supposed to be like this or was this an accident? There’s nothing more frustrating than getting tripped up by a failing test, finally asking someone about it, and getting the response, “Oh, yeah, that test is supposed to fail.” Allowing failing tests to be committed to the repo arguably defeats the purpose of having tests.
It’s okay for a commit not to add a test. It’s okay for a commit to add a passing test. But if a commit commits a failing test then the commit is not a complete unit of work, and the benefits of a repo full of atomic commits are to some extent lost.
I also often find it handy to use git bisect in conjunction with an application’s test suite. Using git bisect is all about going back to a certain point in time and asking “Does feature X work at this commit?” Sometimes the test that answers that question is a manual check. Sometimes it’s an automated test. If the team has a habit of making only small, atomic commits, using git bisect together with tests is a lot easier.
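When the check is automated, git bisect run can drive the whole search. The sketch below assumes a throwaway repository and a hypothetical one-line test script standing in for a real test suite. The script exits 0 when the feature works and non-zero when it doesn't, which is how git bisect run tells good commits from bad ones.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"   # placeholder identity
git config user.name "Demo"

# Stand-in for a test suite. It lives outside the repository so the same
# version of the test runs against every commit.
test_script="$repo.test.sh"
cat > "$test_script" <<'EOF'
#!/bin/sh
# Passes (exit 0) only while price.txt holds the expected value.
[ "$(cat price.txt)" = "100" ]
EOF
chmod +x "$test_script"

# Ten small commits; the seventh one quietly changes the price.
for i in 1 2 3 4 5 6 7 8 9 10; do
  echo "note $i" >> notes.txt
  if [ "$i" -lt 7 ]; then echo 100 > price.txt; else echo 90 > price.txt; fi
  git add .
  git commit -q -m "Change $i"
done

first=$(git rev-list --max-parents=0 HEAD)

# Bad commit first, then good commit.
git bisect start HEAD "$first" >/dev/null

# Let git run the test at each step and mark commits automatically.
git bisect run "$test_script" >/dev/null

culprit=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "First bad commit: $culprit"
```

This only works well if the test suite passes on every good commit. If unrelated tests fail at random points in history, the automated run can't tell the regression you're hunting from the noise.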