In this series, we’re discussing the different types of context that go into building AI coding assistants. If you missed part 1, here is a brief (and very simplified) recap:
AI coding assistants are designed to take on a specific task, equipped with relevant contextual information, and generate a solution based on the given input. A key component of these assistants is RAG (Retrieval-Augmented Generation), which enhances LLMs by combining their generative capabilities with the ability to retrieve up-to-date, relevant, and diverse information from external sources.
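To make that recap concrete, here is a minimal sketch of the RAG loop with made-up data. Production systems use embeddings and a vector store for retrieval; a toy keyword-overlap score stands in for that step here, and all names and documents are invented for illustration.

```python
# A minimal RAG sketch: retrieve relevant snippets for a task, then
# augment the prompt an LLM would receive. The corpus and scoring are
# toy stand-ins for a real embedding-based retriever.

def overlap_score(query: str, doc: str) -> int:
    # Count how many words the query and the document share.
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Rank every document by overlap and keep the top k as context.
    return sorted(corpus, key=lambda d: overlap_score(query, d), reverse=True)[:k]

def build_prompt(task: str, corpus: list[str]) -> str:
    # Augment the task with retrieved context before it reaches the model.
    context = "\n".join(retrieve(task, corpus))
    return f"Context:\n{context}\n\nTask: {task}"

corpus = [
    "The payments service retries failed charges three times.",
    "Login tokens expire after 24 hours.",
    "The deploy pipeline runs integration tests on every merge.",
]
prompt = build_prompt("Why are failed charges retried?", corpus)
```

The important part is the shape of the loop, not the scoring: retrieval narrows a large knowledge pool down to the few snippets worth spending prompt space on.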
In this article, we will move beyond source code as context and discuss other sources: those written in natural language.
This is part of a series of articles about AI tools for developers
Issues, pull requests & commits as contextual sources
This trio serves as a valuable source for AI coding assistants. These elements, mainly written in natural language, provide insights that can’t be deduced directly from the code itself.
Here’s what they can help illuminate:
- Business logic: They help in understanding the reasons behind code changes, including the business needs or user requirements the code is meant to address. These are most commonly found in the issue description.
- Technical reasoning for specific changes: Mostly found in commit messages, where engineers describe the rationale behind a technical implementation that is not explicitly written anywhere else.
- Code functionality and code changes: Commits and PRs create a multi-layered view of the codebase: not only the most recent state of the code, but also how it was built, layer by layer. This gives a clear picture of the code's evolution, which sometimes reveals approaches to avoid (because they were tried and later reverted) or the initial idea behind an implementation.
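That layer-by-layer view can be surfaced quite directly. The sketch below turns commit messages that touched a given file into a context block; the `Commit` shape and `history_context` helper are hypothetical, and a real assistant would pull this data from git itself or a hosting provider's API.

```python
# Hypothetical sketch: commit messages for one file, collected as
# natural-language context describing that file's evolution.
from dataclasses import dataclass

@dataclass
class Commit:
    sha: str
    message: str
    files: list[str]

def history_context(path: str, commits: list[Commit]) -> str:
    # Keep the messages of every commit that touched the file, in order,
    # so the layer-by-layer evolution stays visible.
    relevant = [c for c in commits if path in c.files]
    return "\n".join(f"- {c.sha[:7]}: {c.message}" for c in relevant)

commits = [
    Commit("a1b2c3d4", "Add caching to session lookups", ["auth/session.py"]),
    Commit("e5f6a7b8", "Revert caching: stale sessions after password reset", ["auth/session.py"]),
    Commit("c9d0e1f2", "Tune DB pool size", ["db/pool.py"]),
]
context = history_context("auth/session.py", commits)
```

Note how the second (invented) commit message captures exactly the kind of "thing to avoid" that the final code alone would never reveal.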
Although these contextual sources theoretically offer great value, using them effectively presents some challenges. One hurdle is determining which items are actually relevant to the specific context or problem at hand. Additionally, the information in these sources may be outdated or inaccurate, and validating its current relevance is a challenge in and of itself. To be explicit: if we feed the AI assistant outdated textual information, the assistant is liable to treat it as true and accurate, generating erroneous and misleading answers.
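Validating relevance properly is hard, but even a crude guard helps. As one illustration, the sketch below drops retrieved snippets whose last update falls outside an age window before they can reach the model; the snippet shape and the 180-day threshold are assumptions, not a recommendation.

```python
# A simple staleness guard (hypothetical data shape): filter retrieved
# snippets by last-updated date before adding them to the prompt.
from datetime import date, timedelta

def fresh_only(snippets: list[dict], today: date, max_age_days: int = 180) -> list[dict]:
    # Keep only snippets updated within the allowed window.
    cutoff = today - timedelta(days=max_age_days)
    return [s for s in snippets if s["updated"] >= cutoff]

snippets = [
    {"text": "Use the v2 billing endpoint", "updated": date(2024, 5, 1)},
    {"text": "Use the v1 billing endpoint", "updated": date(2021, 2, 1)},
]
kept = fresh_only(snippets, today=date(2024, 6, 1))
```

Age is only a proxy for accuracy, of course: an old note can still be true and a recent one can be wrong, which is why staleness checks complement rather than replace relevance ranking.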
Internal knowledge bases
In addition to sources within repositories, context can also be pulled from other internal tools. Internal knowledge bases can include insights from Project Management tools like Jira, and AI assistants can extract information about tasks, bugs, and feature requests, providing context on a project’s progress and priorities.
There are also more informal knowledge bases that could be great sources of context, like Slack. Slack channels are treasure troves of informal knowledge sharing – from quick bug fixes to discussions about architectural decisions.
Despite their potential, these knowledge bases present challenges similar to those discussed above. As with issues and PRs, the information stored in these tools may not always be up-to-date or accurate, requiring careful scrutiny and cross-referencing.
Additionally, extracting relevant information from Slack and Jira is complicated by their detachment from the codebase. Finding the exact conversation or ticket that holds the key to a particular coding problem is like searching for a needle in a haystack.
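One common way to shrink that haystack is to exploit identifiers that appear in both places, such as ticket keys mentioned in Slack messages and commit messages alike. The sketch below indexes texts by Jira-style keys; the key pattern and data shapes are assumptions for illustration.

```python
# A sketch of linking conversations to code via shared ticket keys.
# The Jira-style key pattern (e.g. "PAY-142") is an assumption.
import re

TICKET_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")

def index_by_ticket(texts: list[str]) -> dict[str, list[str]]:
    # Map each ticket key to every text that mentions it, so a commit
    # and a Slack thread about the same ticket end up grouped together.
    index: dict[str, list[str]] = {}
    for text in texts:
        for key in TICKET_KEY.findall(text):
            index.setdefault(key, []).append(text)
    return index

messages = [
    "PAY-142: the retry loop double-charges on timeout, see thread",
    "Lunch options for Friday?",
]
commits = ["Fix double charge on timeout (PAY-142)"]
index = index_by_ticket(messages + commits)
```

Cross-referencing like this turns an unsearchable pile of chatter into something an assistant can retrieve from deliberately, though it only works where teams actually use the identifiers consistently.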
Even though issues, PRs, and commits, along with tools like Slack and Jira, are packed with helpful information for AI coding assistants, making the most of this data is not easy. Relevance, accuracy, and simply finding the information needed are just a few of the challenges to consider.
Next up: we’re going to discuss documentation.