Automatically generated documentation is the dream for most developers – especially those working at organizations that struggle with poorly understood legacy code. However, the rise of LLMs and AI tools for developers has introduced a new challenge: their tendency to “hallucinate”, producing highly convincing but incorrect information.
In this post, we’ll break down Swimm’s approach that addresses hallucinations head-on, combining static analysis with the controlled usage of AI and LLMs.
The foundation of our approach
Our solution is a deterministic methodology for explaining code, one that prioritizes reliability over unconstrained AI generation. The process involves three steps:
- Code mapping: Using deterministic static analysis, we identify all relevant flows and logical components within the codebase. This creates a solid foundation for understanding the code’s structure and relationships.
- Retrieval: The system deterministically retrieves relevant context for specific topics or questions. This ensures that all documentation is grounded in actual code rather than AI-generated assumptions.
- Generation: Only in this step do we introduce LLMs, transforming the accurately retrieved context into coherent documentation. The crucial distinction here is that every part of the model’s output is anchored to specific parts of your codebase, preventing hallucinations.
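The three steps above can be sketched as a toy pipeline. Swimm hasn't published its internals, so everything here is a hypothetical illustration: a regex stands in for real static analysis and a stub stands in for the LLM, but the key property holds — the model only rephrases deterministically retrieved context, and every piece of output carries an anchor back to a real code location.

```python
import re
from dataclasses import dataclass

@dataclass
class Snippet:
    path: str
    line: int
    text: str

def map_code(files):
    """Step 1 (sketch): deterministically index function definitions.
    A regex stands in for real static analysis here."""
    index = []
    for path, source in files.items():
        for i, line in enumerate(source.splitlines(), start=1):
            if re.match(r"\s*def\s+\w+", line):
                index.append(Snippet(path, i, line.strip()))
    return index

def retrieve(index, topic):
    """Step 2: deterministic retrieval -- no model involved."""
    return [s for s in index if topic in s.text]

def generate(snippets, llm):
    """Step 3: the LLM only rephrases retrieved context; every output
    section is anchored to a concrete file and line."""
    return [{"anchor": f"{s.path}:{s.line}", "doc": llm(s.text)}
            for s in snippets]

# Toy usage with a stubbed-out "LLM":
files = {"billing.py": "def charge_card(amount):\n    pass\n"}
fake_llm = lambda ctx: f"Explains `{ctx}`."
docs = generate(retrieve(map_code(files), "charge"), fake_llm)
```

The point of the structure is that steps 1 and 2 never touch a model, so the context handed to step 3 is verifiably drawn from the codebase.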
Quality assurance through multiple channels
Our commitment to quality extends beyond this process. We maintain a comprehensive set of evaluators: curated repositories and knowledge topics representing diverse use cases. Every AI feature update or release undergoes rigorous testing against these evaluators.
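A minimal version of such a release gate might look like the sketch below. The names and the substring-based checks are assumptions for illustration; a real evaluator suite would score far richer criteria. Each case pairs a curated code context with terms the generated doc must contain, plus terms whose presence would indicate a hallucination.

```python
def run_evals(generate_fn, cases):
    """Hypothetical release gate: run doc generation against curated
    cases and collect the names of any that fail."""
    failures = []
    for name, case in cases.items():
        doc = generate_fn(case["context"])
        # Required terms must appear in the generated documentation.
        if not all(term in doc for term in case["expect"]):
            failures.append(name)
        # Forbidden terms are concepts absent from the code context;
        # mentioning them would be a hallucination.
        elif any(term in doc for term in case.get("forbid", [])):
            failures.append(name)
    return failures

cases = {
    "retry-logic": {
        "context": "def retry(op, attempts=3): ...",
        "expect": ["retry", "attempts"],
        "forbid": ["backoff"],  # not in the code, so flag it
    },
}
echo_llm = lambda ctx: f"Documents `{ctx}`."
```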
We also incorporate feedback mechanisms to continuously improve the quality of generated documentation:
- Acceptance rate tracking: Monitors how often generated documents are committed to codebases
- User satisfaction metrics: Implements a simple yet effective thumbs up/down system with optional feedback
- Collaborative improvement: Features like “ask a teammate” let users source explanations of unclear or internal logic from colleagues, with the added information incorporated back into the generation context
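The first two mechanisms boil down to simple event counting. Here is a toy sketch, with hypothetical names (Swimm's actual telemetry is not public):

```python
from collections import Counter

class FeedbackTracker:
    """Illustrative sketch of acceptance-rate tracking and
    thumbs up/down satisfaction metrics."""

    def __init__(self):
        self.events = Counter()

    def record_generation(self):
        self.events["generated"] += 1

    def record_commit(self):
        # A generated doc was committed to the codebase.
        self.events["committed"] += 1

    def record_vote(self, up: bool):
        self.events["up" if up else "down"] += 1

    def acceptance_rate(self) -> float:
        generated = self.events["generated"]
        return self.events["committed"] / generated if generated else 0.0
```

Tracking commits rather than views is the interesting design choice: a doc that reaches the repository has implicitly been reviewed by a developer.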
We’re LLM and language-agnostic
Our core technology works with a variety of LLMs. This lets you use approved internal models and access the latest advancements, and it lets us test and deploy different models as needed.
Additionally, the Swimm platform features comprehensive language support through a language-agnostic design and specialized Language Plugins. This is how we’re able to handle both modern and legacy languages effectively, with the ability to create custom plugins for organization-specific languages or file formats.
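One common way to structure such a design is a plugin interface plus a registry keyed by file extension. The sketch below is an assumption about the shape of the idea, not Swimm's actual API; a real plugin would use a proper parser rather than a regex.

```python
import re
from typing import Protocol

class LanguagePlugin(Protocol):
    """Hypothetical interface: the core pipeline stays language-agnostic
    and delegates symbol discovery to per-language plugins."""
    extensions: tuple
    def find_symbols(self, source: str) -> list: ...

class PythonPlugin:
    extensions = (".py",)

    def find_symbols(self, source):
        # Toy heuristic; a production plugin would use a real parser.
        return re.findall(r"^\s*def\s+(\w+)", source, re.MULTILINE)

class PluginRegistry:
    """Organizations could register custom plugins for in-house
    languages or file formats alongside the built-in ones."""

    def __init__(self):
        self._by_ext = {}

    def register(self, plugin):
        for ext in plugin.extensions:
            self._by_ext[ext] = plugin

    def for_file(self, path):
        return self._by_ext.get("." + path.rsplit(".", 1)[-1])
```

Because the core only depends on the `LanguagePlugin` interface, adding support for a legacy or proprietary language means writing one plugin, not changing the pipeline.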
Why it matters
Traditional documentation methods are time-consuming and often outdated as soon as they’re written. Meanwhile, pure LLM-based solutions risk introducing incorrect information that could mislead developers.
At Swimm, we leverage the power of AI while maintaining the accuracy and reliability of traditional documentation methods. By grounding every piece of generated documentation in actual code through static analysis, we ensure that teams can trust the documentation they’re working with.
This method combines the efficiency of automation with the reliability of traditional documentation, all while remaining flexible enough to adapt to specific organizational needs and future technological advances.