Modeling legacy code behavior using science

by Jason Swett,

When you want to understand what a legacy program you’re working on is supposed to do, what’s your first instinct? Often it’s to look at the code.

But unfortunately legacy code is often so convoluted and inscrutable that it’s virtually impossible to tell what the code is supposed to do just by looking at it.

In these cases it may seem that you’re out of luck. But fortunately you’re not.


Due to what philosophers call the veil of perception, direct knowledge of anything in the world is impossible. There’s very little that we can know with absolute completeness and correctness.

We don’t have knowledge about how most things work, we only have a model of how it works. A model is a useful approximation to the truth. Our models may not be absolutely complete or correct but they’re useful enough to get the job done on a day-to-day basis.

Take a car, for example. I don’t have a complete understanding of how every part of a car works. And what I do know probably isn’t even 100% accurate. But my model is sufficiently comprehensive and accurate that I can operate my car without being regularly surprised and confounded by differences in the ways I expect my car to behave versus the way it actually behaves.

Let’s use another example, trees. My understanding of trees is fragmentary. But like my model of cars, my model of trees is sufficiently comprehensive and accurate that trees don’t regularly throw me surprises. I understand trees to be stationary, and so when I cross the street, I never worry that I may be struck by a speeding tree. Although I do know that wind can blow over dead trees and so I’m cautious about entering the woods on a windy day.

How models are developed

How did I develop my models of cars and trees?

Trees, unlike cars, are natural. I can’t look at a tree’s source code or schematics to gain an understanding of how trees work. Everything humans know about trees has been discovered by observation and experimentation—in other words, scientific inquiry. We observe that trees start off small and then grow large. We observe that deciduous trees lose their leaves in the winter and coniferous trees don’t. And so on.

We can of course also learn about trees by reading books and learning from teachers, but that’s not source material, it’s only a regurgitation of the products of someone else’s scientific inquiry.

Cars are a little different. Cars are man-made, so in addition to learning about cars through observation and experimentation, we can learn by, for example, reading the owner’s manual that came with the car. And in theory, we could look at engineering diagrams and talk with the people who designed the car in order to gain a direct understanding of how the car works straight from the source material.

Cars are also different from trees in the sense that much of the mechanics of a car are very self-evident. You can learn something about how a car works by taking apart its engine. Not so much with a tree.

Inspecting a car’s transmission is analogous to reading a line of code. The thing you’re looking at is, itself, an instruction. It can’t lie to you. You can be mystified by it and you can misunderstand its purpose, but the car part can’t outright lie to you because the car part is how the car works.

Modeling the mind

Before we connect all this back to legacy code, I want to share one more example of scientific modeling.

The human brain is a complex machine. So is a car, so is a tree, and so is a codebase. But unlike a car or a codebase, we can’t observe the mechanics of the brain directly, at least not yet, not very much. We can’t look at the brain’s source code or insert breakpoints. For the purposes of understanding how it works, a brain is much more like a tree than a car.

But cognitive scientists have still managed to learn a lot about how the brain works. Or, more precisely, cognitive scientists have gained an understanding of the behavior that the brain produces. They’ve gained an understanding of how the mind (the outward expression of the machinery of the brain) works. They’ve developed a model.

This model of the mind has been developed in the same way that any accurate model has been developed: through scientific inquiry. A scientist can’t have a chat with the brain’s designer or have a look at its schematics, but a scientist can compare the behavior of people with normal brains against people who have had certain parts of their brains excised, for example, and make inferences about the roles that those parts of the brain play.

(Scientists can make diagrams of how they believe the brain works, but remember, those diagrams aren’t source material. They’re not code. They’re only a documentation of our current best understandings based on our scientific inquiry.)

So: if scientists can develop a model of the behavior generated by the brain without having access to the source machinery of the brain, what can we learn from scientists about how to understand the behavior of legacy systems without having access to comprehensible code?

Applying scientific inquiry to software systems

If you haven’t already done so, I’d like to invite you to make a conscious distinction between two ways of learning the behavior of a software system. One way is the obvious way that everyone’s familiar with: reading the code. The other way is the way that many people have probably used informally quite a bit but may not have consciously put a name to so much, which is scientific inquiry.

A full instruction in scientific inquiry is of course outside the scope of this post, and I wouldn’t be qualified to give one anyway. The point of this post is to invite you to consciously realize that you can develop a usefully accurate model of a software system not just by reading its code, but by using the methods of scientific inquiry.

If you’re interested in learning more about the methods of science, I would recommend The Magic of Reality by Richard Dawkins and The Demon-Haunted World by Carl Sagan.


  • Direct knowledge of anything in the world is impossible due to the veil of perception. All we can do is develop models, which are useful approximations to the truth.
  • Models are developed through the process of scientific inquiry.
  • In addition to reading the code, a model of a software system can be developed by using the process of scientific inquiry.

Leave a Reply

Your email address will not be published. Required fields are marked *