How to get familiar with a new codebase

by Jason Swett

The question

Someone submitted the following question to me on my “ask me a question” page:

One of the things I learned from my RoR bootcamp (Le Wagon) was test driven dev. Using that knowledge, when I joined my first job, I wasn’t given much guidance to onboard (start up and I was first internal dev), so I started by reading and learning the system from the rspec files.

What would your advice be for new to industry, junior devs as to the most effective way to learn a new monolith / code base?

Here’s my advice for how to get familiar with a new codebase.

Having the right objective

The first step in getting familiar with a new codebase is to realize that it’s an impossible goal!

Unless the codebase is really tiny, no one, no matter how smart or experienced, can understand the whole thing. And even if you could understand the whole thing, I think it would be a waste of effort.

Rather than trying to understand an entire codebase, I think it’s more useful to try to understand the area of the code where you need to make a change. After all, the only reason you need to understand an area of code is to safely make a change to it.

Now let’s talk about how to understand a piece of code. I find it helpful to list the obstacles to understanding a piece of code.

The obstacles to understanding code

The following things can stand in the way of understanding a piece of code:

  • Unfamiliar technologies
  • Unfamiliar domain concepts
  • Sheer complexity
  • Poor-quality code

Let’s discuss each, including how to address it.

Unfamiliar technologies

The answer to this one is straightforward although not easy: get familiar with those technologies.

When I have to learn a new technology, I personally like to spin up a scratch project to teach myself about it in isolation. Learning a new technology is easier when it’s not mixed with other stuff like unfamiliar domain concepts and somebody else’s code.

Sometimes I like to bounce between a scratch project and a production project. Too much scratch coding and the learning can get too detached from what’s relevant to the production project. Too much production coding and it can be hard to separate the difficulties presented by the unfamiliar technology from the difficulties presented by everything else.

Unfamiliar domain concepts

Domain knowledge can be hard to acquire. Software technologies often have documentation you can read, but domain knowledge often has to be acquired through experience or just by having someone tell you.

The unfortunate truth about domain knowledge is that, quite often, you just have to ask your co-workers to tell you. If you’re lucky you may be able to supplement your learning with things like Wikipedia and books.

Sheer complexity

Complex things are obviously harder to understand than simple things. Sometimes it’s helpful to acknowledge that in addition to unfamiliar technologies, unfamiliar domain concepts and hard-to-understand code, some things are just complex. To me, articulating precisely why something is hard to understand is half the battle toward understanding it.

When I want to try to understand something complex, I try to break it down into parts. For example, I started to understand cars a lot better when I understood that a car consists of several somewhat separate systems including the engine, the braking system, the steering system, the heating and cooling system, etc. Cars became a little easier for me to understand once I realized that these separate systems were present and that I could understand each system more or less in isolation.

Poor-quality code

My definition of bad code is code that’s hard to understand and change. Sadly, quite a lot of code in the world, perhaps the vast majority of it, is pretty bad, and therefore hard to work with.

One of the highest-yield techniques I’ve encountered for understanding bad code (which I learned from Working Effectively with Legacy Code) is to do a “scratch refactoring”. With this technique, I take a piece of code and freely rename variables, move things around, etc., with no intention of ever committing my changes. Sometimes this act can lead to a useful burst of insight.
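To make the idea concrete, here is a minimal sketch of what a scratch refactoring might look like. Everything in it is hypothetical: the method `calc`, the `:p` and `:q` hash keys, and the discount rule are invented for illustration and don't come from any real codebase. The first method stands in for the hard-to-read original; the second is the throwaway version with names that reveal what the code seems to be doing.

```ruby
# Hypothetical "before" code: terse names hide the intent.
def calc(o)
  t = 0
  o.each { |i| t += i[:p] * i[:q] }
  t > 100 ? t * 0.9 : t
end

# Scratch-refactored version: same behavior, but the names now
# spell out the rule. The constants and method name are guesses
# made while reading, which is the whole point of the exercise.
BULK_DISCOUNT_THRESHOLD = 100
BULK_DISCOUNT_MULTIPLIER = 0.9

def order_total(line_items)
  subtotal = line_items.sum do |item|
    price = item[:p]
    quantity = item[:q]
    price * quantity
  end

  if subtotal > BULK_DISCOUNT_THRESHOLD
    subtotal * BULK_DISCOUNT_MULTIPLIER
  else
    subtotal
  end
end
```

Once the insight has landed, the scratch version can be thrown away (for example, by discarding the uncommitted changes in version control), since the goal was understanding rather than a change to ship.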

Honestly, the most helpful thing I can say about dealing with legacy code is that you should buy Working Effectively with Legacy Code and read it. There’s too much to say about legacy code for me to repeat it all here, and most of what I could say would be redundant with the book anyway.

Takeaways

  • Getting familiar with an entire codebase is impossible. Instead, focus on getting familiar with the parts you need to change.
  • An area of code can be hard to understand for several different reasons. When trying to understand a piece of code, try to identify the reason(s) the code is hard to understand and then address each reason individually.
