Note: If you disagree with what’s expressed in this post, I encourage you to also take a look at my more nuanced and comprehensive post on the topic of code duplication, which gives a more thorough refutation of some of the popular ideas around duplication.
“Duplication is cheaper than the wrong abstraction” is a saying I often hear repeated in the Rails community. I don’t really feel like the expression makes complete sense. I fear that it may lead developers to make poor decisions. Here’s what I take the expression to mean, why I can’t completely get on board with it, and what I would advise instead.
The idea (as I understand it)
My understanding of the “duplication is cheaper than the wrong abstraction” idea, based on Sandi Metz’s post about it, is as follows. When a programmer refactors a piece of code to be less duplicative, that programmer replaces the duplicative code with a new, non-duplicative abstraction. So far so good, perhaps. But, by creating this new abstraction, the programmer signals to posterity that this new abstraction is “the way things should be” and that this new abstraction ought not to be messed with. As a result, this abstraction gets bastardized over time as maintainers of the code need to change it yet simultaneously feel compelled to preserve it. The abstraction gets littered with conditional logic to behave different ways in different scenarios, and eventually becomes an unreadable mess.
I hope this is a fair and reasonably faithful paraphrasing of Sandi’s idea. Here are the parts of the idea that I agree with, followed by the parts of the idea that I take issue with.
The parts I agree with
In my experience it’s absolutely true that existing code has a certain inertia to it. Whether it’s out of caution or laziness or some other motive, it seems that a common approach to existing code is “don’t mess with it too much”. This is often a pretty reasonable approach, especially in codebases with poor test coverage where broad refactorings aren’t very safe. Unfortunately the “don’t mess with it too much” approach (as Sandi correctly points out) often makes bad code even worse.
I also of course agree that it’s bad when an abstraction gets cluttered up with a bunch of conditional logic to behave differently in different scenarios. Once that happens, the abstraction can hardly be called an abstraction anymore. It’s like two people trying to live in one body.
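As a minimal sketch of what that clutter can look like (the method name, the `kind` flag, and the formatting rules below are all hypothetical, not taken from Sandi's post), consider a shared price formatter that has grown a branch for each new caller:

```ruby
# Hypothetical example: a method that began as a shared "abstraction"
# and picked up a new branch or flag every time a caller needed
# slightly different behavior.
def format_price(amount_cents, kind: :invoice, currency: "USD", hide_cents: false)
  dollars = amount_cents / 100.0

  if kind == :invoice
    text = format("%.2f %s", dollars, currency)
  elsif kind == :receipt
    # hide_cents only means something for receipts
    text = hide_cents ? "$#{dollars.round}" : format("$%.2f", dollars)
  else
    text = dollars.to_s
  end

  # A later special case bolted on for one more caller
  text = "#{text} (estimate)" if kind == :quote
  text
end

puts format_price(1250)                 # invoice-style output
puts format_price(1250, kind: :receipt) # receipt-style output
puts format_price(1250, kind: :quote)   # quote-style output
```

To know what any single call returns, a reader has to trace every flag, which is the "two people in one body" problem in miniature.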
I also agree with Sandi’s approach to cleaning up poorly-deduplicated code. First, back out the “improvements” and return to the duplicated state. Then begin the de-duplication work anew. Good approach.
The parts I take issue with
What exactly is meant by “the wrong abstraction”?
I think “the wrong abstraction” is a confused way of referring to poorly-de-duplicated code. Here’s why.
It seems to me that what’s meant by “the wrong abstraction” is “a confusing piece of code littered with conditional logic”. I don’t really see how it makes sense to call that an abstraction at all, let alone the wrong abstraction.
Not every piece of code is an abstraction of course. To me, an abstraction is a piece of code that’s expressed in high-level language so that the distracting details are abstracted away. If I were to see a confusing piece of code littered with conditional logic, I wouldn’t see it and think “oh, there’s an incorrect abstraction”, I would just think, “oh, there’s a piece of crappy code”. It’s neither an abstraction nor wrong, it’s just bad code.
So instead of “duplication is cheaper than the wrong abstraction”, I would say “duplication is cheaper than confusing code littered with conditional logic”. But I actually wouldn’t say that, because I don’t believe duplication is cheaper. I think it’s usually much more expensive.
When duplication is dearer
I don’t see how it can be said, without qualification, that duplication is cheaper than the wrong abstraction. A few things must be considered first. How bad is the duplication? How bad is the de-duplicated code? Sometimes the duplication is cheaper but sometimes it’s more expensive. How do you know unless you know how good or bad each alternative is? It depends on the scenario.
If the duplication is very small and obvious, and the alternative is to create a puzzling mess, then that duplication is absolutely cheaper than the bad code. But if the duplication is horrendous (for example, the same several lines of code duplicated across distant parts of the codebase dozens of times, and with inconsistent names which make the duplication hard to notice or track down) and the alternative is a piece of code that’s merely imperfect, then I would say that the duplication is more expensive.
In general, I find duplication to be much more expensive than the de-duplicated alternative. Duplication can bite you, hard. The worst is when there’s a piece of code that’s duplicated but you don’t know it’s duplicated. In that case you risk changing the code in one place without realizing you’re creating an inconsistency. It’s hard for a poor abstraction to have consequences worse than that.
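A small, hypothetical sketch of that failure mode: the same discount rule lives in two classes under different names, and only one copy gets updated. `Cart`, `InvoiceMailer`, and the thresholds are invented for illustration.

```ruby
# Hypothetical: one discount rule copied into two places under
# different names, so nothing signals that they must change together.
class Cart
  def initialize(subtotal_cents)
    @subtotal_cents = subtotal_cents
  end

  def total_cents
    # This copy was updated: threshold lowered from 10_000 to 5_000
    @subtotal_cents >= 5_000 ? @subtotal_cents * 90 / 100 : @subtotal_cents
  end
end

class InvoiceMailer
  def amount_due_cents(order_value_cents)
    # This copy was missed: it still uses the old threshold
    order_value_cents >= 10_000 ? order_value_cents * 90 / 100 : order_value_cents
  end
end

cart_total = Cart.new(8_000).total_cents
emailed    = InvoiceMailer.new.amount_due_cents(8_000)

puts "cart shows #{cart_total}, email says #{emailed}"
puts "inconsistent!" if cart_total != emailed
```

Neither class is wrong in isolation, and no test that covers just one of them would catch the mismatch.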
Programmers’ reluctance to refactor isn’t a good justification for not DRYing up code
If a programmer “feels honor-bound to retain the existing abstraction” (to quote Sandi’s post), then to me that sounds like a symptom of a problem that’s distinct from duplication or bad abstractions. If developers are afraid to clean up poor code, then I don’t think the answer is to hold off on fixing duplication problems. I think the answer is to address the reasons why developers are reluctant to clean up existing code. Maybe that reason is a lack of automated tests and code review, or a lack of a culture of collective ownership. Whatever the underlying problem is, fixing that problem surely must be a better response than allowing duplication to live in your codebase.
My alternative take, summarized
Instead of “duplication is cheaper than the wrong abstraction”, I would say the following.
Duplication is bad. In fact, duplication is one of the most dangerous mistakes in coding. Except in very minor cases, duplication is virtually always worth fixing. But not all possible ways of addressing duplication are equally good. Don’t replace a piece of duplicative code with a confusing piece of code that’s made out of if statements.
When you find yourself adding if statements to a piece of code in order to get it to behave differently under different scenarios, you’re creating a confusion. Don’t try to make one thing act like two things. Instead, separate it into two things.
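One way this separation might look, sketched with hypothetical names (`Money`, `InvoicePrice`, `ReceiptPrice`): rather than a single formatter that branches on a flag, each use case gets its own small object, and only the part that is truly identical is shared.

```ruby
# Hypothetical names throughout. The genuinely shared logic lives in
# one place; the parts that differ are separate objects, not branches.
module Money
  def self.dollars(amount_cents)
    amount_cents / 100.0
  end
end

class InvoicePrice
  def initialize(amount_cents, currency: "USD")
    @amount_cents = amount_cents
    @currency = currency
  end

  def to_s
    format("%.2f %s", Money.dollars(@amount_cents), @currency)
  end
end

class ReceiptPrice
  def initialize(amount_cents)
    @amount_cents = amount_cents
  end

  def to_s
    format("$%.2f", Money.dollars(@amount_cents))
  end
end

puts InvoicePrice.new(1250) # puts calls to_s on each object
puts ReceiptPrice.new(1250)
```

Each class can now change for its own reasons without an `if` guarding the other's behavior.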
If you feel reluctant to modify someone else’s code, ask why that is. Is it because you feel like you’ll get in trouble if you do? Is it because you don’t understand the code, and there’s little test coverage, and you’re afraid you’ll break something if you make changes that are too drastic? Whatever the underlying reason for your reluctance is, it’s a problem, because it’s holding your organization back from improving its code. Instead of adding more bad code on top of the existing bad code, see if there’s anything you can do to try to address these underlying issues.
Takeaways
- As Sandi Metz says (but in my words), confusing code littered with conditionals is not a good way to address duplicative code.
- A piece of code filled with conditionals isn’t really an abstraction or even “wrong”, it’s just a confusing piece of code.
- Duplication is one of the most dangerous mistakes in coding, and almost always worth fixing. Unless someone really botches the job when de-duplicating a piece of code, the duplicated version is almost always more expensive to maintain than the de-duplicated version.
- Try to foster a culture of collective ownership in your organization so that developers aren’t afraid to question or change existing code when the existing code gets out of sync with current needs.
- Try to use risk-mitigating practices like automated testing, small changes, and continuous deployment so that when refactorings are needed, you’re not afraid to do them.
This is a great post! Thank you for writing it. The statement I use is “code duplication is better than the wrong business abstraction”.
A lot of these statements/concepts come out of Agile, Object-Oriented Programming, and Extreme Programming, which recognize that there are two things evolving unevenly: the understanding of the business domain and, lagging behind it, the development of the code that models that understanding.
A better expansion here as I understand the concept would be “duplication of code is acceptable if the proposed extraction/abstraction does not meaningfully model the current understanding of the business domain”.
The power of the statement, as I use it (and as I see these things go from Agile/OOPS/XP) is to push someone to get closer to the business domain (and the user) in their thinking/proposals/justifications and as a counterweight to DRY. The problem with these things is when they get imposed as a criticism, rather than the starting place of a conversation.
I think you give solid advice:
> Don’t try to make one thing act like two things. Instead, separate it into two things. If you feel reluctant to modify someone else’s code, ask why that is.
Thanks for the thoughtful comment and the kind words!
Great post.
I’ve always found this advice to be suspect. What I’ve found is that how often you hear this saying correlates strongly with teams whose code suffers from very little abstraction. What has bothered me about it is that it’s often said by influential senior engineers meaning one thing, but heard by junior members of the team as ‘avoid abstraction, you can do it wrong’.
It is interesting to see other interpretations of Sandi’s words. Thank you for this insightful article!
My initial understanding of “wrong abstraction” overlaps with Ben’s, i.e., wrong in relation to the business domain. Yet, I would omit “present understanding” because it is our present understanding, in the first place, that led us to the wrong abstraction.
As knowledge accumulates with time, our understanding of a domain will never be more lacking than it is now. In other words, the longer we wait to introduce an abstraction, the better. Yet waiting has a cost.
A wrong abstraction, as I understand it, would be one that is not backed by sufficient knowledge of the problem domain.
One thing I believe needs more clarification is the statement “Duplication is bad. In fact, duplication is one of the most dangerous mistakes in coding”.
Duplication is not always bad. In the context of distributed work/services it can provide flexibility and loose coupling. In chapter 2 of their new book “Software Mistakes and Tradeoffs”, Tomasz Lelek and Jon Skeet provide a compelling argument supporting code duplication. It is worth reading.
Thanks again and have a great day!
Thanks for this article, Jason. It’s really interesting, as your writings always are.
I’ve read Sandi’s article and I agreed with it because what I’ve understood from it is she was talking about what in my mind looks like a case of premature optimization.
Reintroducing the duplication is a way to “start fresh” and create a better abstraction, not to keep the duplication forever.
In her words:
‘Once you completely remove the old abstraction you can start anew, re-isolating duplication and re-extracting abstractions.’
It wasn’t in that article, but she said something along those lines in her book: it is easier to promote code from a concrete class to an abstract one than the other way around. I like that, let’s say, methodical approach.
As always your insights are really helpful. Thanks for that Jason.
> Duplication is bad. In fact, duplication is one of the most dangerous mistakes in coding. Except in very minor cases, duplication is virtually always worth fixing
Without qualification (as you say about Sandi’s viewpoint), this take is pretty poor. Duplication is definitely not “virtually always worth fixing”.
In most cases that I’ve seen, the problems always begin from over-DRYing code. Then juniors come along and feel honor-bound, just like Sandi said, and, in 12-24 months, it’s completely incomprehensible.
> Duplication is definitely not “virtually always worth fixing”
To me that doesn’t sound like an argument against DRYing code, it sounds like an argument against over-DRYing code – which I would agree with.
3 copies and you refactor.
There are levels of DRY and most people imo misunderstand actual duplication. A is A. A is not Á.
Duplication has to be exact. Everything in a piece of code has to be the exact same for it to be a duplicate.
Getting out of an abstraction is harder than getting an abstraction right. It locks you in more than duplication.
This doesn’t make duplication correct in terms of DRY. I would suggest small abstractions that still leave the door open for not utilising them.
A good example of this is a code base that enforces, e.g., a repository pattern or generic base classes. It’s too much abstraction and leaves the door closed on flexibility.
Abstractions should be done very very carefully.
The problem with all of this is that it’s hard to quantify poor code from developer to developer. It’s practically standard operating procedure for a programmer taking over an existing code base to say how crappy all the code is. Often, “crappy code” is code that the current developer doesn’t have the whole story or context on.
The DRY principle deserves a second look in some situations, especially when it is weighed against other principles such as [The rule of three](https://en.wikipedia.org/wiki/Rule_of_three_(computer_programming)) and the *Duplication is far cheaper than the wrong abstraction* principle. Briefly put, *DRY* requires some kind of abstraction under which the duplication hides, and that abstraction requires a certain level of accuracy and expertise, so that it can handle the repeated cases without resorting to nasty conditionals for the cases it missed. There is no free lunch!