Top-down vs bottom-up: how to deep dive into a codebase

It’s not impossible to cruise through a programming task on a new codebase and actually complete it without understanding its logic, purpose, or context. Yet there are really only a couple of main approaches to sailing into uncharted waters and coming out the other end alive.

I find it helpful to imagine piecing together a puzzle: we can start by tackling a dominant piece and working our way out from it, or we can study the big picture before getting started. Should the approach be the same for a 26-piece puzzle and a 5,000-piece puzzle? Do we first write (automation or behavioral) tests, learn the business logic and draw diagrams, or rely on conversations with colleagues?

Read the documentation? Nobody has time for that! #100DaysOfCode #CodeNewbie #programming pic.twitter.com/aOXy2uJCeU
— Danny Thompson (@DThompsonDev) August 19, 2020

Whether you are contributing to a new open source project or need to dive into a new repo for work, the approach you opt for will impact your success rate and your team’s. It’s important to cherry-pick the right steps at the right time. Below I list some tips and quick wins I learned along the way for getting into a new repository.

Bridging the gap: top-down vs. bottom-up

Where would you start, if you just arrived at a new team and you were tasked with a backlog issue, for example fixing a bug, creating a button on screen or adding an integration somewhere?

The bottom-up approach means trying to get right to the lines of code that handle the task at hand. For example, you might search for a unique string to pinpoint the code area closest to the feature or problem you’re looking to tackle. This approach can lead you to a quick solution. On the other hand, you may unknowingly reimplement components that already exist, or fail to understand the impact of your changes on other parts of the codebase.
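As a rough illustration of the bottom-up "unique string" search, here is a minimal Python sketch. It assumes you already know some text that appears in the feature, such as a button label. The function name find_string, the list of file suffixes, and the skipped folders are all assumptions made for this example, not part of any particular tool. In practice grep or your editor's search does the same job.

```python
from pathlib import Path

# Folders that rarely contain the code we are looking for (an assumption).
SKIPPED_DIRS = {".git", "node_modules"}


def find_string(root, needle, suffixes=(".py", ".js", ".ts", ".html")):
    """Yield (path, line_number, line) for each line under root containing needle."""
    root_path = Path(root)
    for path in sorted(root_path.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        # Look only at the part of the path below root when skipping folders.
        if SKIPPED_DIRS.intersection(path.relative_to(root_path).parts):
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file, move on
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                yield path, number, line.strip()


if __name__ == "__main__":
    # "Save draft" is a hypothetical button label used for illustration.
    for path, number, line in find_string(".", "Save draft"):
        print(f"{path}:{number}: {line}")
```

Each hit is a candidate entry point. From there you can read outwards to see which component renders the string and what it calls.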

With top-down, we look for the larger picture – starting off with the modules, understanding each module’s responsibility, and understanding the main logic. But there are levels of top-down. For example, you may run a micro top-down process on a specific module (like the module responsible for DB access), or you might look at a higher-level unit, such as the entire backend. The downside of an exaggerated top-down pass over a monolithic codebase is that it is, plainly speaking, inefficient. It’s easy to hit the end of your exploration capacity and lose sight of the bigger picture when you’re not being hands-on.

While both approaches have merit, they are two extreme ends on the same continuum that are especially problematic for first-timers facing a daunting monolithic codebase. The truth is probably somewhere in the middle (as always). Otherwise, these relatively tempting dichotomous approaches will prove stressful and ineffective when facing delivery.

Rules of thumb I try to follow when approaching my first task on a new codebase:

Get super specific context [Readme, tests and context]

Even if I want to get to know a very specific code area to complete a task on a repo, I’ll first get some general understanding by looking at the codebase’s Readme files and docs, and even try to look at recent bug-fixes and breakage points to understand the logic. Run the tests and make sure all the frameworks are working and suites are running without errors before starting the task at hand. For a better understanding of the code area, start writing your own tests. If part of a team, I would ask my team lead or mentor which modules and components my first task touches upon.

Big picture components

There are some things you should really understand before tackling the task itself. If you’re not sure, ask your team leader / mentor. Why are you writing this code? Why is this important and what’s the objective? What modules will your code interact with? Get a good overview as part of your onboarding but without aimlessly parsing through the codebase. Essentially, this means – start to look for the big picture of the components.

Quick history check or Git extras

We love to get frustrated with legacy code, but we also have to learn why some code came to be like that and why it’s still there. Check the busy areas of the code through commit messages (look at Git Extras, and specifically its git effort command, to see the number of commits per file and which files have the most activity). It also helps to ask: What libraries and utilities do I need to know? What has already been implemented?
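If Git Extras is not installed, a similar "busy areas" view can be approximated with plain Git. The sketch below is an assumption-laden stand-in, not how git effort itself is implemented: it counts how many commits touched each file by parsing the output of git log --name-only. The function names are made up for this example.

```python
import subprocess
from collections import Counter


def count_commits_per_file(log_output):
    """Count how often each file name appears in `git log --name-only` output."""
    counts = Counter()
    for line in log_output.splitlines():
        name = line.strip()
        if name:  # blank lines separate commits
            counts[name] += 1
    return counts


def busiest_files(repo=".", top=10):
    """Return the `top` most frequently committed files in the repo."""
    # --pretty=format: suppresses commit headers so only file names remain.
    result = subprocess.run(
        ["git", "-C", repo, "log", "--name-only", "--pretty=format:"],
        capture_output=True,
        text=True,
        check=True,
    )
    return count_commits_per_file(result.stdout).most_common(top)


if __name__ == "__main__":
    for name, commits in busiest_files():
        print(f"{commits:5d}  {name}")
```

Files near the top of the list are usually worth reading first, since they are where most of the change (and often most of the bugs) has happened.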

Check for interdependencies and interfaces

You can ask: “If I change X, what other modules do I need to change accordingly?”

At Swimm, for example, users create and run Unit files within the CLI. Users can then collaborate and document information from the repo to highlight different areas of the codebase for other users. We gave a new engineer on our team a task – to save the name of the author of such a Unit upon its creation. This appears to be a super-simple and focused task. Nevertheless, he needed to learn that another area of our code makes sure that whatever we save in the database follows a certain format, so that we never send client information that, by design, we do not save.

The developer who received this task had to understand how three different areas interact – saving the Unit’s author name (the CLI), doing so without saving client information to the database (DB interaction), and displaying the name to the user (the front end).
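One way to start answering the "If I change X" question is to list which files import the module you are about to touch. The sketch below assumes a Python codebase and only sees static import statements, so treat its output as a starting list rather than a complete answer. The file names and the module name db are hypothetical.

```python
from __future__ import annotations

import ast


def imported_modules(source: str) -> set[str]:
    """Return the module names imported by a piece of Python source code."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module)
    return found


def dependents(sources: dict[str, str], target: str) -> list[str]:
    """Return the files whose imports mention target or one of its submodules."""
    return sorted(
        name
        for name, code in sources.items()
        if any(
            module == target or module.startswith(target + ".")
            for module in imported_modules(code)
        )
    )


if __name__ == "__main__":
    # Hypothetical files, given as {file name: source code}.
    sources = {
        "cli.py": "import db\n",
        "ui.py": "from db.models import Unit\n",
        "util.py": "import os\n",
    }
    print(dependents(sources, "db"))
```

Anything this list misses, such as calls made over HTTP or through configuration, is exactly what a conversation with a teammate tends to surface.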

High exposure = patterns and structure

Once you’ve explored several areas and functionalities, even in poorly documented legacy code, you’re bound to start making sense of the mess. You’ll find patterns, see how the code is organized, and gain an understanding of how the different components are connected and, inevitably, of the codebase as a whole. This has to do with high-quantity exposure and the way our brain can make up for missing pieces of data. In her new book, Badass: Making Users Awesome, Kathy Sierra states:

“after enough exposure with feedback, your brain [begins] detecting patterns and underlying structures, without your conscious awareness.”

Frontend vs backend: dive in like you’re the Captain

So far I have described some high-level tips that apply to any module or code snippet you have to tackle. Some tips, though, apply better to either frontend or backend code, as I describe below.

Frontend tips:

After spending 80 hours on the frontend…

"Whew! I almost died. Ok let me knock out the API. Gimme 2 hours"
— Angie Jones (@techgirl1908) August 21, 2020

Before you start looking at the code – interact with it! Sign up and become a regular user. Try to imagine how each event or action happens in the backend. Keep playing with the app or website as you work, so you constantly see how the changes you make look and feel. This will also help you consider what your code might affect.
Try to reuse existing components as much as possible. Be sure to ask your mentor or team lead if there are components you will find helpful in your first task(s).
Try the “friends and family test”. See how they experience the product, and learn from their user logic.

Backend tips:

Research the dependencies, the functions and main features you must deal with in the backend. Do some frontend investigating as well. It will help you understand how the changes you make in the backend might affect the user’s experience. You’ll need this along the way.
Find all places in the frontend that interact with your backend code. See if the behavior makes sense before and after your changes.
Skim some areas in Git and treat Your Code as a Crime Scene. Understand the areas that you touch upon, and look at the code history. The Git Extras commands are a good place to start.
Read the existing automation tests.

Summarizing thoughts:

Experts are not what they know, but what they do

– Kathy Sierra on Designing for Badass

Write code that others simply get. That means that your problem solving approach and line of thought are clearly reflected in the code.
Talk to your colleagues. Successful codebase-onboarding may be heavily influenced by good communication, remote or not, by mentoring or pairing programs, the quality of the tasks you receive, and learning more about the company culture and the people you work with.

While onboarding to a new codebase we have a rare opportunity to show immediate value to the team by filling in missing documentation. Once you start to become an expert on a specific area you’ll grow more attentive to documentation gaps, and you’ll have reached the level of Ultimate Onboardee if you’re able to add steps to Readme files or update deprecated documents.


Omer Rosenbaum

CTO & Co-founder

Omer founded the Check Point Security Academy and was the Cyber Security Lead at ITC, an educational organization that trains talented professionals to develop careers in technology. Omer has an MA in Linguistics from Tel Aviv University and is the creator behind the Brief YouTube Channel.