
How Re-Ranked RAG Outperforms Naive RAG in Information Retrieval

Understanding the Difference: Naive RAG and Re-Ranked RAG Compared in a Real Insurance Scenario

💡
All code and notebooks used in this post are available here → github.com/kdshreyas/rag-comparative-demo

Retrieval-Augmented Generation (RAG) has transformed how we build domain-aware AI applications. But while Naive RAG gets you started, advanced retrieval techniques like CrossEncoder Re-Ranking can drastically improve your results, especially when accuracy matters.

In this post, we’ll walk through a practical comparison between Naive RAG and Re-Ranked RAG, using real code and examples from an insurance domain use case.


🧠 The Problem

Let’s say you're building a chatbot that answers policy-related questions from a PDF. You’re using a local LLM (Gemma 3 via Ollama) and a vector store (ChromaDB) powered by nomic-embed-text-v1.5.

The goal? Build a retrieval-based QA system that can:

  • Extract accurate answers from policy documents.

  • Handle ambiguous or layered queries.

  • Minimize hallucinations and irrelevant responses.


🔧 The Stack

Here’s what we used:

  • Local LLM: gemma3:latest via Ollama

  • Embedding model: nomic-embed-text-v1.5 (local)

  • Vector DB: ChromaDB

  • Document loader: UnstructuredLoader

  • Reranker: cross-encoder/ms-marco-MiniLM-L-6-v2 via sentence-transformers


🧱 Step 1: Basic Naive RAG Setup

We start with a classic RAG pipeline:

  1. Load and clean the PDF.

  2. Split it into chunks using SentenceTransformersTokenTextSplitter.

  3. Generate embeddings.

  4. Store in ChromaDB.

  5. Retrieve the top 10 docs and pass them to the LLM for generation.
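
The five steps above can be sketched end-to-end in plain Python. Note this is a toy illustration, not the actual pipeline: in the real setup the embeddings come from nomic-embed-text-v1.5 and are stored in ChromaDB, while here hand-made 3-d vectors and a simple cosine similarity stand in for both, so the retrieval logic (embed, compare, keep top-k) runs standalone.

```python
import math

# Toy sketch of steps 3-5: hand-made 3-d vectors stand in for real
# embeddings, and cosine similarity stands in for the vector store's
# nearest-neighbour search.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Pretend "vector store": chunk text -> its (toy) embedding.
corpus = {
    "IDV is the vehicle's insured declared value.": [0.9, 0.1, 0.0],
    "NCB is a discount for claim-free years.": [0.1, 0.9, 0.0],
    "Total loss: repair cost exceeds 75% of IDV.": [0.7, 0.0, 0.7],
}

def retrieve(query_vec, k=2):
    # Rank chunks by similarity to the query vector, best first.
    ranked = sorted(corpus, key=lambda doc: cosine(query_vec, corpus[doc]), reverse=True)
    return ranked[:k]

# A query "about IDV" points mostly along the first axis, so the
# IDV-related chunks come back first.
print(retrieve([1.0, 0.0, 0.1]))
```

With real embeddings the similarity search happens inside ChromaDB's `query` call rather than in Python, but the ordering logic is the same.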

Retrieval + Generation (Naive RAG)

def generate_answer(question):
    # Embed the question and pull the 10 nearest chunks from ChromaDB.
    query_embedding = embedding_model.embed_query(question)
    results = retriever.query(query_embeddings=[query_embedding], n_results=10, include=["documents"])
    retrieved_documents = results["documents"][0]
    context = "\n".join(retrieved_documents)

    prompt = f"""
    You are an assistant for answering questions using the provided context.
    Context:
    {context}
    Question: {question}
    Answer in 2–3 sentences.
    """

    # Ask the local LLM to answer using only the retrieved context.
    response = llm.invoke([HumanMessage(content=prompt)])
    return response.content

🧠 Step 2: Add Cross-Encoder Re-Ranking

Naive RAG may retrieve semantically similar chunks, but not all are equally relevant. So we apply a CrossEncoder to re-score the top 10 chunks based on their relevance to the query.
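
The mechanics are easy to see with a stand-in scorer. In the sketch below, a simple word-overlap count (the hypothetical `overlap_score`) plays the role of the cross-encoder's relevance score so the example runs without downloading ms-marco-MiniLM-L-6-v2; the score, sort, keep-top-k flow is the same.

```python
# Stand-in for the cross-encoder: count how many query words appear in the
# chunk. Only the scorer differs from the real pipeline; the flow is the same.
def overlap_score(question, doc):
    q_words = set(question.lower().split())
    return len(q_words & set(doc.lower().split()))

def rerank_demo(docs, question, top_k=2):
    # Score every chunk against the question, sort best-first, keep top_k.
    ranked = sorted(docs, key=lambda d: overlap_score(question, d), reverse=True)
    return ranked[:top_k]

docs = [
    "Premiums are payable annually.",
    "No Claim Bonus is a discount on the premium for claim-free years.",
    "The bonus resets to zero if a claim is made during the policy year.",
]
# The NCB chunks outrank the unrelated premium-schedule chunk.
print(rerank_demo(docs, "How is no claim bonus calculated?"))
```

Swapping `overlap_score` for a real `cross_encoder.predict` call on (question, chunk) pairs turns this toy into the actual re-ranking step.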

Reranking Function

cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(docs, question, top_k=5):
    # Score each (question, chunk) pair with the cross-encoder.
    pairs = [[question, doc] for doc in docs]
    scores = cross_encoder.predict(pairs)
    # Sort by score (highest first) and keep only the top_k chunks.
    ranked = sorted(zip(scores, docs), key=lambda pair: pair[0], reverse=True)
    return "\n\n".join(doc for _, doc in ranked[:top_k])

Re-Ranked Answer Generator

def generate_answer_with_rerank(question):
    # Retrieve 10 candidate chunks, exactly as in the naive pipeline.
    query_embedding = embedding_model.embed_query(question)
    results = retriever.query(query_embeddings=[query_embedding], n_results=10, include=["documents"])
    raw_docs = results["documents"][0]
    # Re-rank the candidates and keep only the 5 most relevant as context.
    context = rerank(raw_docs, question)

    prompt = f"""
    You are an assistant for answering questions using the provided context.
    Context:
    {context}
    Question: {question}
    Answer in 2–3 sentences.
    """

    response = llm.invoke([HumanMessage(content=prompt)])
    return response.content

🔍 Real Examples – Naive vs Re-Ranked RAG

✅ Example 1: “What is Insured’s Declared Value (IDV)?”

Naive RAG:

"IDV is the manufacturer’s listed price adjusted for depreciation."

✅ Accurate but incomplete.

Re-Ranked RAG:

"IDV is the market value of the vehicle and the maximum amount payable in case of a total loss or theft, without deducting depreciation."

🎯 It adds critical details about claims, making it more useful.


✅ Example 2: “How is No Claim Bonus (NCB) calculated?”

Naive RAG:

"No Claim Bonus is a discount given for claim-free years."

🟡 Basic info, but leaves out specifics.

Re-Ranked RAG:

"NCB is calculated as a percentage discount on the premium, starting at 20% and increasing up to 50% over claim-free years. It resets to zero if a claim is made."

✔️ Complete with thresholds and conditions.


✅ Example 3: “What does total loss mean in a motor insurance policy?”

Naive RAG:

"A total loss occurs when the cost of repair exceeds the vehicle's IDV."

⚠️ Technically okay but vague.

Re-Ranked RAG:

"In insurance terms, a total loss means the cost of repairing the damaged vehicle exceeds 75% of the IDV, and compensation is paid as per the IDV without depreciation."

📌 Adds numerical threshold + real-world implication.


🧪 Results Summary

| Query | Naive RAG | Re-Ranked RAG |
| --- | --- | --- |
| What is IDV? | Basic definition | Full explanation + claims context |
| How is NCB calculated? | Generic statement | Specific formula & conditions |
| What is a total loss? | Vague threshold | Clearly defined + policy terminology |

🎯 Key Takeaways

  • Naive RAG retrieves semantically similar content, but that’s not always relevant content.

  • Adding CrossEncoder re-ranking significantly improves the relevance of chunks passed to the LLM.

  • For high-stakes domains like insurance, law, or finance, reranking isn’t optional; it’s essential.


Implementing re-ranking in RAG takes just a few lines of code but delivers massive quality improvements, especially when working with open-ended LLMs or verbose, domain-specific content like insurance policy documents.

If you're serious about building robust AI systems that don't just sound smart but actually are smart, I'd love to hear what you're working on, whether that's brainstorming ideas, exploring RAG pipelines, or building something together. Feel free to reach out if you'd like to collaborate: contact me


👉 Curious to see the code in action?
Check out my GitHub repo: 🔗 rag-comparative-demo
Clone it, run it locally, and test both Naive and Re-Ranked RAG pipelines for yourself.
I'd love to hear your feedback!


Cheers, and until next time! 🚀

Exploring Generative AI: Concepts & Code

Part 1 of 3

A deep dive into generative AI, this blog series focuses on the core concepts behind AI models, their underlying algorithms, and practical coding techniques. Perfect for developers, researchers, and tech enthusiasts looking to master the world of AI.
