Previously (here and here), we explored the significance of context in AI coding assistants, highlighting the various forms it can take and the unique challenges each presents. In this article, we turn our attention to another type of context, code documentation, which I believe is critical for the effectiveness of AI coding assistants.
This is part of a series of articles about AI tools for developers.
Code documentation
Finding the perfect context comes with its own set of challenges. Let’s look at an additional source of context: code documentation. This refers to documents written in natural language that explain the code. Our focus in this post is on standalone documents rather than code comments.
Take, for example, a document that details how to add a new command to Git’s Command Line Interface (CLI). Imagine you’re contributing to Git and want to introduce a new command, such as git new-command. We delved into this specific document more thoroughly in another article.
Why is this document helpful?
Because it clearly explains how different parts of the code are interconnected. In this example, it guides the reader through a process that involves four different files. Although the files are separate, they all logically contribute to the addition of a single CLI command.
This is the ideal contextual input for LLMs as it’s a natural language description presented alongside the code itself. Since these models are predominantly trained on natural language, they find it much simpler to “comprehend” this format than raw code. Additionally, even for seasoned engineers, it’s much easier to grasp the code and, importantly, the relationships between different parts of the code when it’s accompanied by natural language explanations and chunked together into manageable segments.
Given a specific task to solve, the harder it is to understand the code, the more helpful such documentation is.
Note that the explanation here is tightly coupled to the codebase and highlights many specific details: the naming convention of using the cmd_ prefix, the folder hierarchy (specifically, the builtin folder), the signature of commands, filenames (builtin.h, git.c, …), names of variables (BUILTIN_OBJS), and many more. All of this information can ground a generative model, making the code it generates more likely to be correct.
Given that such a document exists, the AI assistant’s job becomes much easier when trying to solve a relevant task.
Additionally, code documentation can explain things that are simply not in the code, such as why we chose to implement something in a certain way.
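To make the idea of documentation as context concrete, here is a minimal sketch of how a document and the code it explains might be combined into a single prompt for an assistant. The `Snippet` class, the `build_prompt` function, the section headings, and the placeholder file content are all hypothetical names chosen for illustration; they are not part of Git, Swimm, or any particular assistant.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    """A piece of code that the document refers to (hypothetical structure)."""
    path: str
    code: str


def build_prompt(task: str, doc_text: str, snippets: list[Snippet]) -> str:
    """Combine a task, a natural-language document, and the code it explains."""
    parts = [
        "## Task",
        task.strip(),
        "## Documentation",
        doc_text.strip(),
        "## Referenced code",
    ]
    for snippet in snippets:
        # Keep each snippet next to its file path so the model can relate
        # the prose in the document to a concrete location in the codebase.
        parts.append(f"# File: {snippet.path}")
        parts.append(snippet.code.strip())
    return "\n\n".join(parts)


# Illustrative inputs only: the file content below is a placeholder,
# not Git's real source.
prompt = build_prompt(
    task="Add a new CLI command called git new-command.",
    doc_text="New commands live in the builtin folder and use the cmd_ prefix.",
    snippets=[Snippet(path="builtin.h", code="/* command declarations go here */")],
)
print(prompt)
```

The ordering is a design choice rather than a requirement: placing the natural-language explanation before the code mirrors how a human would read the document first and the files second.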
Why is documentation important?
- Documents serve as a concise method to illustrate how different parts of the code are related. Understanding the code, especially the connections between various parts, becomes simpler when it’s paired with natural language explanations and organized into clear sections.
- Documents can shed light on aspects that aren’t evident in the code itself, such as the rationale behind choosing a specific implementation approach.
- The format of these documents is particularly suitable for Large Language Models (LLMs). They combine natural language with code, which is easier for a model trained primarily on natural language to interpret and understand than code alone.
Helping humans create documentation that highlights their individual expertise
Code documentation is a great source of context for AI coding assistants. And in some cases, having an engineer in the loop is crucial for creating the right context. The key challenge here is encouraging humans to infuse their distinct knowledge into the documentation while automating the rest of the process.
At Swimm, our speciality is helping engineers transfer their unique and individual insights into written documents, while handling the rest of the process automatically. The result? Code documentation that is the optimal context for LLMs.
Here are a few of the ways that Swimm helps developers create documentation that includes their specialized knowledge:
- PR2Doc automatically generates documentation from pull requests, allowing engineers to add their unique knowledge to the documentation. PR2Doc is a great way to document business logic and design decisions as part of the SDLC.
- Chat to Doc is a new feature that we call /ask. Simply put, developers ask questions and get answers. They can then add their own input to the conversation, and Swimm will create documentation based on that conversation with the AI coding assistant.
Nothing really matters unless it’s up to date
The wrong context leads to the wrong results, and it’s as simple as that.
This is especially true of code documentation and one of its classic problems: it quickly falls out of date. Without a mechanism that updates documentation in tandem with code changes, the whole effort of creating it may seem futile. This is where Swimm’s patented Auto-sync technology comes into play, addressing this exact challenge by ensuring documentation stays current with the evolving code.
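As a rough illustration of what "out of date" can mean in practice, the sketch below flags references in a document that no longer match anything in the code. It is a deliberately naive check and is not a description of how Swimm’s Auto-sync works. It assumes that the document marks code references with backticks and that the codebase is available as a mapping from file path to file content; the function name and the sample file contents are made up for the example.

```python
import re

# Assumption: the document marks code references with backticks, e.g. `cmd_foo`.
CODE_REF = re.compile(r"`([^`\n]+)`")


def find_stale_references(doc_text: str, files: dict[str, str]) -> list[str]:
    """Return backticked references that match no file path and no file content."""
    stale = []
    for ref in CODE_REF.findall(doc_text):
        matches_path = ref in files
        matches_content = any(ref in body for body in files.values())
        if not (matches_path or matches_content) and ref not in stale:
            stale.append(ref)
    return stale


doc = (
    "Declare the command in `builtin.h` and implement "
    "`cmd_example` in `builtin/example.c`."
)

# Toy codebase (not Git's real source): the function was renamed to
# cmd_sample after the document was written.
codebase = {
    "builtin.h": "int cmd_sample(void);",
    "builtin/example.c": "int cmd_sample(void) { return 0; }",
}

print(find_stale_references(doc, codebase))  # prints ['cmd_example']
```

A check like this only detects references that have disappeared. It cannot tell whether the surrounding explanation is still accurate, which is why keeping documentation in sync with code is a harder problem than a simple text search.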
Wrapping up
Documentation is an indispensable tool for improving the responses generated by AI coding assistants. Well-written and up-to-date code documentation serves as a bridge, connecting the world of code with the intuitive understanding of natural language, and therefore empowers AI coding assistants to function more effectively and accurately.