Adrian Raudaschl

RAG Fusion

Open-source methodology combining retrieval-augmented generation with reciprocal rank fusion and generated queries. 900+ GitHub stars and widely adopted across the AI community.

2023–Present
Technologies
Python · RAG · LLM · Reciprocal Rank Fusion · Information Retrieval · ChromaDB · BM25 · Open Source
The Wonderful World of RAG Fusion, illustrated with a cartoon cat professor

Project Overview

My journey into retrieval techniques started at Mendeley, where I rebuilt the search experience from scratch. I was fortunate to be surrounded by mentors who guided me through the fundamentals: what precision and recall actually mean, how lexical search differs from vector search, how to think about relevance at scale. After Mendeley, I moved from user-facing product work to helping build platform search capabilities across Elsevier's product ecosystem, and that's where I started encountering more complex retrieval problems. One technique kept catching my attention: Reciprocal Rank Fusion. The idea of combining multiple ranked lists into a single, better ranking felt elegant. The problem was that it wasn't very practical unless you were already sitting down and manually pre-formulating a range of queries before firing them off. A useful technique with no scalable workflow.

Then LLMs changed the equation. During my early experiments on the Scopus AI prototype, I was thinking about retrieval in two ways. One was straightforward: generate variations of the same query to improve recall, so you weren't missing papers that used different terminology. The other was more ambitious: what if you asked a question from many different perspectives, so that when you combined the results, you'd get a genuinely comprehensive overview of a topic rather than just the papers that happened to match one phrasing?

At the time, I was quite obsessed with knowledge management systems like Obsidian, using them to collect notes and then applying search techniques to explore a topic from different angles to help with my writing. The thought kept nagging: wouldn't it be brilliant if you could apply that to everything? What if the LLM itself could write the alternative queries, which in turn would lead to a much better search experience?

Project Ramble: my personal search system, where I hooked up my Obsidian notes to vector search combined with GPT-3 in 2022


The first question I remember testing it with was "how to live a good life." It's been my go-to ever since, precisely because it's multifaceted, doesn't have a single right answer, and forces a system to think broadly. I remember seeing the results come back and feeling genuinely elated. The difference was obvious right away. I was so excited that I called up friends and close colleagues and talked endlessly about it, probably to the point of annoyance. But I could see I'd uncovered something valuable.

"How to live a good life": the first query I tested, and still my go-to for evaluating new systems

Project Sebright v0.8 demo showing a keyword graph for 'How to live a good life' with branches to Anthropology of Morality, Science of Well-being, Positive Psychology

That's really what became RAG Fusion. It started as a way to improve retrieval in my own knowledge base, but it quickly became obvious the idea had much broader application. I wrote it up with a deliberately provocative title, "Forget RAG, the Future is RAG Fusion", and open-sourced the implementation. The repo has since grown to over 900 stars on GitHub. I'll admit the title was a bit much, but the technique does what it says on the tin.

The Problem with Standard RAG

Standard RAG is brilliant at what it does: take a query, find relevant documents via vector search, and use them to ground a language model's response. But it inherits a fundamental constraint from the search step. You get one query, which means one angle, one vocabulary, one way of describing what you're after.

Google Trends: searches for RAG skyrocketing in 2023, but the technique still had a core limitation

Researchers rarely think in single queries. Someone investigating the relationship between gut microbiota and depression might also be interested in papers about the gut-brain axis, serotonin production in the intestinal tract, or probiotic interventions for mood disorders. These are all facets of the same question, but a single vector search will only find documents close to whichever phrasing the user happened to type first. The rest of the potentially transformative work stays buried.

The issue isn't that vector search is bad. It's that human questions are multidimensional, and a single retrieval pass only captures one dimension. What I wanted was a system that could think about the question from several angles before deciding what was relevant.

How It Works

Multi-Query Generation

One question, many perspectives

An LLM takes the original query and generates multiple reformulations, each capturing a different angle or vocabulary for the same underlying question.

Diagram showing how a single query is expanded into multiple generated queries, each exploring different facets of the original question
  • Generates 4+ alternative queries from different perspectives
  • Diverse prompt variant explicitly requests different angles, synonyms, and varied specificity
  • Casts a wider net across the document space than any single query could
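The generation step can be sketched in a few lines. Here `llm` is any callable mapping a prompt string to a completion string (a stand-in for whatever chat API you use), and the prompt wording is illustrative rather than the exact prompt from the repo:

```python
def generate_queries(question, llm, n=4):
    """Ask an LLM for n reformulations of the user's question.

    `llm` is any prompt -> completion callable; swap in your own
    chat-API wrapper. The prompt mirrors the "diverse" variant:
    different angles, synonyms, varied specificity.
    """
    prompt = (
        f"Generate {n} search queries that approach the question below "
        f"from genuinely different angles, using synonyms and varied "
        f"levels of specificity. Return one query per line.\n\n"
        f"Question: {question}"
    )
    # Strip list markers the model may add, drop blank lines
    lines = (line.strip("-• ").strip() for line in llm(prompt).splitlines())
    return [q for q in lines if q][:n]
```

Because the LLM is injected as a parameter, the same function works with any provider, and can be tested with a stubbed completion.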

Parallel Retrieval

Search from every angle

Each generated query runs independently against the vector store, producing separate ranked result lists. The Hybrid variant also runs BM25 keyword search alongside vector search for each query.

  • Vector search via ChromaDB embeddings for semantic matching
  • Optional BM25 keyword search for exact term matching (the "free lunch")
  • Cross-disciplinary discovery: finds relevant work across fields the user might not have considered
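A minimal sketch of the retrieval fan-out. The toy `keyword_search` here is a stand-in for BM25 (the real pipeline would pair a proper BM25 implementation with a ChromaDB vector query); both function names are illustrative, not the repo's actual code:

```python
def keyword_search(query, docs, top_k=10):
    """Toy lexical retriever: rank docs by query-term overlap.
    Stand-in for BM25; `docs` maps doc_id -> text."""
    terms = set(query.lower().split())
    scores = {doc_id: len(terms & set(text.lower().split()))
              for doc_id, text in docs.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [d for d in ranked if scores[d] > 0][:top_k]


def parallel_retrieve(queries, retrievers, top_k=10):
    """Run every generated query through every retriever (vector,
    lexical, ...), producing one ranked list per (query, retriever)
    pair, ready for rank fusion."""
    return [fn(q)[:top_k] for q in queries for fn in retrievers]
```

In the Hybrid variant, `retrievers` would hold both a vector search (ChromaDB embeddings) and a BM25 keyword search, so each query contributes two ranked lists to the fusion step.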

Reciprocal Rank Fusion

The best of every perspective

All ranked results are combined using Reciprocal Rank Fusion, a technique developed by Cormack, Clarke, and Büttcher at the University of Waterloo and Google (SIGIR 2009). RRF boosts documents that appear consistently well across multiple query perspectives. In their words, it "yields results better than any individual system, and better results than standard reranking methods." Documents that are only relevant to one framing get naturally deprioritised.

The Reciprocal Rank Fusion algorithm showing how scores from multiple ranked lists are combined using the formula score = sum of 1/(k + rank)
  • Produces a final re-ranked list prioritising consistently relevant documents
  • No training required: works as a drop-in enhancement to any existing RAG pipeline
  • Optionally synthesised into a natural language answer via LLM
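The fusion step itself is only a few lines. This is a sketch of the standard formula with the paper's k=60, not necessarily the exact code in the repo:

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of doc IDs: each appearance at position r
    contributes 1 / (k + r), and scores are summed across lists, so
    documents that rank consistently well across lists rise to the
    top. k=60 follows Cormack, Clarke and Buettcher (SIGIR 2009)."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document sitting at rank 2 in three lists (3/62 ≈ 0.048) beats one at rank 1 in a single list (1/61 ≈ 0.016), which is exactly the "consistently relevant" behaviour described above.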

The Full Pipeline

The RAG Fusion pipeline: one question in, multiple perspectives out, re-ranked results back

RAG Fusion architecture diagram: User Query flows to Generate Similar Queries, which fans out to multiple Vector Search queries, results merge via Reciprocal Rank Fusion into Re-ranked Results and finally Generative Output

The mechanism diagram above shows how RAG Fusion's pipeline connects together. What made this approach click for me was realising that the individual components aren't new. Multi-query generation, vector search, and rank fusion are all established techniques. The contribution is in how they combine: the LLM handles the creative work of reframing the question, vector search handles retrieval at scale, and RRF handles the aggregation without needing any training data or parameter tuning.
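The wiring described above reduces to a handful of lines once each stage is a callable. The stage names here are illustrative, not the repo's actual function names:

```python
def rag_fusion_pipeline(question, generate, retrieve, fuse, answer=None):
    """question -> generated queries -> parallel ranked lists ->
    fused ranking -> optional LLM-synthesised answer.
    Stages are injected, so any query generator, retriever set, or
    fusion function can be slotted in without retraining anything."""
    queries = [question] + generate(question)  # keep the original query too
    ranked_lists = retrieve(queries)           # one list per (query, retriever)
    fused = fuse(ranked_lists)                 # e.g. reciprocal rank fusion
    return answer(question, fused) if answer else fused
```

Keeping the original query in the list guards against the generated reformulations drifting away from the user's intent.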

The RAG Fusion mechanism in detail: query generation, parallel vector searches, and reciprocal rank fusion

How RRF re-ranks results by positional scoring


Evaluation Results

The repo now includes a quantitative evaluation harness using the NFCorpus dataset from the BEIR benchmark: 3,633 medical and nutrition documents with 323 test queries and graded relevance judgements. Six retrieval methods are compared, from basic BM25 keyword search through to the full Hybrid+Diverse pipeline.

+22% NDCG@5 (ranking quality)
+40% Recall@10 (coverage)
+25% MRR (first relevant result)
+15% Precision@5 (top-result accuracy)

Hybrid+Diverse vs single-query vector search baseline (50 queries, seed=42)

Three things stood out from the evaluation. First, hybrid search is essentially a free lunch: fusing BM25 and vector results via RRF costs nothing extra and consistently improves ranking quality, especially MRR. Second, the diverse prompt outperforms standard RAG Fusion by forcing the model to explore genuinely different angles rather than generating semantically close variations. Third, the two techniques complement each other cleanly through RRF, producing the strongest results across every metric.
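For readers who want to sanity-check the numbers, MRR (the metric hybrid search moves most) is simple to compute. A minimal sketch, where `relevant[i]` is the set of relevant doc IDs for query i:

```python
def mean_reciprocal_rank(ranked_lists, relevant):
    """Average of 1/rank of the first relevant document per query;
    queries with no relevant hit in their list contribute 0."""
    total = 0.0
    for ranking, rel in zip(ranked_lists, relevant):
        for rank, doc_id in enumerate(ranking, start=1):
            if doc_id in rel:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)
```

Because only the first relevant hit counts, MRR rewards exactly what BM25 fusion tends to fix: exact-term matches being pushed to the top of the list.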

Impact

Open Source Adoption

900+ GitHub stars and 110+ forks. Referenced in production systems, research papers, and conference talks across the AI community.

Production Impact

Core component of the Scopus AI prototype, which became a research platform used by millions of researchers worldwide.

Multiple Patents

The retrieval techniques developed during this work are covered by multiple patent applications at Elsevier.

Viral Article

"Forget RAG, the Future is RAG Fusion" became one of the most widely shared articles on retrieval-augmented generation, reaching audiences well beyond the AI research community.

What surprised me most was the speed and breadth of adoption. It started with posts on LinkedIn and X, then people began writing academic papers about it. One researcher from South America even asked if I'd be willing to help integrate it into drone technology for warfare, which was certainly not an application I'd anticipated when writing about search relevance. The GitHub stars kept climbing, forks multiplied, and people started publishing their own articles explaining and extending the technique. It felt like I'd finally made something that had driven real impact, and on a topic I genuinely cared about. That's exactly the reason I wanted to be in the technology space.

Where It Stands Today

I still think RAG Fusion is a solid technique, and the evaluation results back that up. But the field moves quickly, and there are alternatives emerging. The one I've come across that sometimes performs a bit better is having the AI generate an answer first and then working backwards from there to find supporting evidence. The trade-off is that approach inherits all the biases and knowledge limitations of the language model itself.

In medicine, there's a name for this failure mode: anchoring bias. A doctor latches onto the first diagnosis and then unconsciously filters all subsequent evidence through that lens. The generate-answer-first approach is essentially anchoring bias as a retrieval strategy. It can look impressive on benchmarks, but it will only ever find evidence for what the model already believes.

What RAG Fusion does differently is work from first principles. You're building knowledge from the ground up rather than starting from what the model already thinks it knows. I think that matters, especially when you're trying to explore a space where no good answer exists yet, or where you need to discover things the model hasn't been trained on. And the evaluation shows it performs even better when you combine different search techniques together, vector and lexical, through RRF. That complementarity is the real strength.

Try It Yourself

RAG Fusion is fully open source and MIT licensed. The evaluation harness lets you benchmark different retrieval strategies on a real dataset with a single command.