Leapfrogging traditional vector-based RAG with language maps
TL;DR: Vector-based RAG performs poorly for many real-world applications like codebase chats, and you should consider 'language maps'. Also, our codebase chat at wiki.mutable.ai is available today.
At Mutable.ai, we want to make it much easier for developers to understand and build software, with a long-term mission to accelerate entire organizations with AI.
It might seem simple to plug your codebase into a state-of-the-art LLM, but LLMs have two limitations that make human-level assistance with code difficult:
They currently have context windows that are too small to accommodate most codebases, let alone your entire organization's codebases (see the rough estimate after this list).
They are pushed to answer any question immediately, without first thinking through the answer "step by step."
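To put the first limitation in perspective, here is a rough back-of-the-envelope estimate. The 4-characters-per-token heuristic, the 128k window figure, and the file extensions are illustrative assumptions, not a real tokenizer or any particular model:

```python
import os

# Rough heuristic: ~4 characters per token (an approximation, not a real tokenizer).
CHARS_PER_TOKEN = 4
CONTEXT_WINDOW = 128_000  # e.g. a 128k-token context window (illustrative)

def estimate_repo_tokens(root: str, exts=(".c", ".cpp", ".h", ".py", ".md")) -> int:
    """Estimate how many tokens a codebase would occupy if pasted verbatim."""
    total_chars = 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(exts):
                path = os.path.join(dirpath, name)
                try:
                    with open(path, encoding="utf-8", errors="ignore") as f:
                        total_chars += len(f.read())
                except OSError:
                    continue
    return total_chars // CHARS_PER_TOKEN

if __name__ == "__main__":
    tokens = estimate_repo_tokens(".")
    print(f"~{tokens:,} estimated tokens vs. a {CONTEXT_WINDOW:,}-token window "
          f"(~{tokens / CONTEXT_WINDOW:.1f}x the window)")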
We built a codebase chat about a year ago based on keyword retrieval and vector embeddings. No matter how hard we tried, including training our own dedicated embedding model, we could not get good performance out of it.
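For readers unfamiliar with that style of pipeline, here is a minimal sketch of the chunk-embed-retrieve loop we mean. The hashed "embedding" and the helper names are illustrative stand-ins for a learned embedding model, not our production code:

```python
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Stand-in for a real embedding model: a hashed bag-of-words vector."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def chunk(source: str, lines_per_chunk: int = 40) -> list[str]:
    """Split a file into fixed-size line windows, the usual RAG chunking step."""
    lines = source.splitlines()
    return ["\n".join(lines[i:i + lines_per_chunk])
            for i in range(0, len(lines), lines_per_chunk)]

def retrieve(query: str, chunks: list[str], k: int = 5) -> list[str]:
    """Return the k chunks nearest to the query in embedding space."""
    q = embed(query)
    return sorted(chunks, key=lambda c: float(np.dot(q, embed(c))), reverse=True)[:k]
```

The retrieved chunks are pasted into the prompt as context. The failure mode is that "nearby in embedding space" is not the same as "relevant to the question," which is exactly what we kept running into.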
Below is a typical example asked of the llama.cpp repo:
As you can see, the answers were oddly specific and seemed to pull in the wrong context consistently, especially from tests. We could, of course, take countermeasures, but it felt like a losing battle.
So we went back to step 1: understand the code and do our homework. For us, that meant actually writing an understanding of the codebase down in a document, a Wikipedia-style article we call Auto Wiki. The wiki features diagrams and citations to your codebase.
This wiki is useful in and of itself for onboarding and understanding the business logic of a codebase, but one of the hopes for constructing such a document was that we’d be able to circumvent traditional keyword and vector-based RAG approaches.
It turns out using a wiki to find context for an LLM overcomes many of the weaknesses of our previous approach, while still scaling to arbitrarily large codebases:
Instead of context retrieval through vectors or keywords, the context is retrieved by looking at the sources that the wiki cites.
The answers are based both on the relevant section(s) of the wiki AND the content of the actual code that we put into memory; this functions as a "language map" of the codebase (sketched after this list).
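To make that flow concrete, here is a minimal sketch of the idea. The WikiSection shape, the keyword-overlap ranking, and the read_source helper are illustrative assumptions for this post, not Auto Wiki's actual schema or ranking step:

```python
from dataclasses import dataclass, field

@dataclass
class WikiSection:
    """One section of the generated wiki: prose plus citations into the codebase."""
    title: str
    prose: str
    citations: list[str] = field(default_factory=list)  # e.g. file paths or line ranges

def relevant_sections(question: str, wiki: list[WikiSection]) -> list[WikiSection]:
    """Pick wiki sections that look relevant to the question.
    A naive keyword overlap stands in here for whatever ranking step is actually used."""
    q_terms = set(question.lower().split())
    return [s for s in wiki if q_terms & set((s.title + " " + s.prose).lower().split())]

def build_context(question: str, wiki: list[WikiSection], read_source) -> str:
    """Assemble the LLM prompt: relevant wiki prose plus the code those sections cite."""
    parts = []
    for s in relevant_sections(question, wiki):
        parts.append(f"## {s.title}\n{s.prose}")
        for ref in s.citations:
            parts.append(f"### Cited source: {ref}\n{read_source(ref)}")
    return "\n\n".join(parts)

if __name__ == "__main__":
    # Placeholder wiki section and source reader, just to show the assembled prompt shape.
    wiki = [WikiSection(title="Tokenizer",
                        prose="How input text is split into tokens before inference.",
                        citations=["src/tokenizer.py"])]
    print(build_context("How does the tokenizer work?", wiki,
                        read_source=lambda ref: f"<contents of {ref}>"))
```

The key point is that retrieval is anchored to the wiki's citations rather than to raw similarity scores: the model sees connected prose about the codebase plus the exact code that prose was written from.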
See it in action below for the same query as our old codebase chat:
The quality of the answer is dramatically improved - it is more accurate, relevant, and comprehensive.
It turns out language models love being given language and not a bunch of text snippets that are nearby in vector space or that have certain keywords! We find strong performance consistently across codebases of all sizes. The results from the chat are so good they even surprised us a little bit - you should check it out on a codebase of your own, at https://wiki.mutable.ai
We are introducing evals demonstrating how much better our chat is with this approach; more on this soon!