
How Re-Ranked RAG Outperforms Naive RAG in Information Retrieval

Understanding the Difference: Naive RAG and Re-Ranked RAG Compared in a Real Insurance Scenario

💡
All code and notebooks used in this post are available here → github.com/kdshreyas/rag-comparative-demo

Retrieval-Augmented Generation (RAG) has transformed how we build domain-aware AI applications. But while Naive RAG gets you started, advanced retrieval techniques like CrossEncoder Re-Ranking can drastically improve your results, especially when accuracy matters.

In this post, we’ll walk through a practical comparison between Naive RAG and Re-Ranked RAG, using real code and examples from an insurance domain use case.


🧠 The Problem

Let’s say you're building a chatbot that answers policy-related questions from a PDF. You’re using a local LLM (Gemma 3 via Ollama) and a vector store (ChromaDB) powered by nomic-embed-text-v1.5.

The goal? Build a retrieval-based QA system that can:

  • Extract accurate answers from policy documents.

  • Handle ambiguous or layered queries.

  • Minimize hallucinations and irrelevant responses.


🔧 The Stack

Here’s what we used:

  • Local LLM: gemma3:latest via Ollama

  • Embedding model: nomic-embed-text-v1.5 (local)

  • Vector DB: ChromaDB

  • Document loader: UnstructuredLoader

  • Reranker: cross-encoder/ms-marco-MiniLM-L-6-v2 via sentence-transformers


🧱 Step 1: Basic Naive RAG Setup

We start with a classic RAG pipeline:

  1. Load and clean the PDF.

  2. Split it into chunks using SentenceTransformersTokenTextSplitter.

  3. Generate embeddings.

  4. Store in ChromaDB.

  5. Retrieve the top 10 docs and pass them to the LLM for generation.
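
The five steps above can be sketched end-to-end in plain Python. Note this is a toy illustration, not the actual pipeline: in the real setup the embeddings come from nomic-embed-text-v1.5 and are stored in ChromaDB, while here hand-made 3-d vectors and a simple cosine similarity stand in for both, so the retrieval logic (embed, compare, keep top-k) runs standalone.

```python
import math

# Toy sketch of steps 3-5: hand-made 3-d vectors stand in for real
# embeddings, and cosine similarity stands in for the vector store's
# nearest-neighbour search.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Pretend "vector store": chunk text -> its (toy) embedding.
corpus = {
    "IDV is the vehicle's insured declared value.": [0.9, 0.1, 0.0],
    "NCB is a discount for claim-free years.": [0.1, 0.9, 0.0],
    "Total loss: repair cost exceeds 75% of IDV.": [0.7, 0.0, 0.7],
}

def retrieve(query_vec, k=2):
    # Rank chunks by similarity to the query vector, best first.
    ranked = sorted(corpus, key=lambda doc: cosine(query_vec, corpus[doc]), reverse=True)
    return ranked[:k]

# A query "about IDV" points mostly along the first axis, so the
# IDV-related chunks come back first.
print(retrieve([1.0, 0.0, 0.1]))
```

With real embeddings the similarity search happens inside ChromaDB's `query` call rather than in Python, but the ordering logic is the same.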

Retrieval + Generation (Naive RAG)

def generate_answer(question):
    # Embed the question and pull the 10 nearest chunks from ChromaDB.
    query_embedding = embedding_model.embed_query(question)
    results = retriever.query(query_embeddings=[query_embedding], n_results=10, include=["documents"])
    retrieved_documents = results["documents"][0]
    context = "\n".join(retrieved_documents)

    prompt = f"""
    You are an assistant for answering questions using the provided context.
    Context:
    {context}
    Question: {question}
    Answer in 2–3 sentences.
    """

    # Ask the local LLM to answer using only the retrieved context.
    response = llm.invoke([HumanMessage(content=prompt)])
    return response.content

🧠 Step 2: Add Cross-Encoder Re-Ranking

Naive RAG may retrieve semantically similar chunks, but not all are equally relevant. So we apply a CrossEncoder to re-score the top 10 chunks based on their relevance to the query.
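
The mechanics are easy to see with a stand-in scorer. In the sketch below, a simple word-overlap count (the hypothetical `overlap_score`) plays the role of the cross-encoder's relevance score so the example runs without downloading ms-marco-MiniLM-L-6-v2; the score, sort, keep-top-k flow is the same.

```python
# Stand-in for the cross-encoder: count how many query words appear in the
# chunk. Only the scorer differs from the real pipeline; the flow is the same.
def overlap_score(question, doc):
    q_words = set(question.lower().split())
    return len(q_words & set(doc.lower().split()))

def rerank_demo(docs, question, top_k=2):
    # Score every chunk against the question, sort best-first, keep top_k.
    ranked = sorted(docs, key=lambda d: overlap_score(question, d), reverse=True)
    return ranked[:top_k]

docs = [
    "Premiums are payable annually.",
    "No Claim Bonus is a discount on the premium for claim-free years.",
    "The bonus resets to zero if a claim is made during the policy year.",
]
# The NCB chunks outrank the unrelated premium-schedule chunk.
print(rerank_demo(docs, "How is no claim bonus calculated?"))
```

Swapping `overlap_score` for a real `cross_encoder.predict` call on (question, chunk) pairs turns this toy into the actual re-ranking step.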

Reranking Function

cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(docs, question, top_k=5):
    # Score each (question, chunk) pair with the cross-encoder.
    pairs = [[question, doc] for doc in docs]
    scores = cross_encoder.predict(pairs)
    # Sort by score (highest first) and keep only the top_k chunks.
    ranked = sorted(zip(scores, docs), key=lambda pair: pair[0], reverse=True)
    return "\n\n".join(doc for _, doc in ranked[:top_k])

Re-Ranked Answer Generator

def generate_answer_with_rerank(question):
    # Retrieve 10 candidate chunks, exactly as in the naive pipeline.
    query_embedding = embedding_model.embed_query(question)
    results = retriever.query(query_embeddings=[query_embedding], n_results=10, include=["documents"])
    raw_docs = results["documents"][0]
    # Re-rank the candidates and keep only the 5 most relevant as context.
    context = rerank(raw_docs, question)

    prompt = f"""
    You are an assistant for answering questions using the provided context.
    Context:
    {context}
    Question: {question}
    Answer in 2–3 sentences.
    """

    response = llm.invoke([HumanMessage(content=prompt)])
    return response.content

🔍 Real Examples – Naive vs Re-Ranked RAG

✅ Example 1: “What is Insured’s Declared Value (IDV)?”

Naive RAG:

"IDV is the manufacturer’s listed price adjusted for depreciation."

✅ Accurate but incomplete.

Re-Ranked RAG:

"IDV is the market value of the vehicle and the maximum amount payable in case of a total loss or theft, without deducting depreciation."

🎯 It adds critical details about claims, making it more useful.


✅ Example 2: “How is No Claim Bonus (NCB) calculated?”

Naive RAG:

"No Claim Bonus is a discount given for claim-free years."

🟡 Basic info, but leaves out specifics.

Re-Ranked RAG:

"NCB is calculated as a percentage discount on the premium, starting at 20% and increasing up to 50% over claim-free years. It resets to zero if a claim is made."

✔️ Complete with thresholds and conditions.


✅ Example 3: “What does total loss mean in a motor insurance policy?”

Naive RAG:

"A total loss occurs when the cost of repair exceeds the vehicle's IDV."

⚠️ Technically okay but vague.

Re-Ranked RAG:

"In insurance terms, a total loss means the cost of repairing the damaged vehicle exceeds 75% of the IDV, and compensation is paid as per the IDV without depreciation."

📌 Adds numerical threshold + real-world implication.


🧪 Results Summary

| Query | Naive RAG | Re-Ranked RAG |
| --- | --- | --- |
| What is IDV? | Basic definition | Full explanation + claims context |
| How is NCB calculated? | Generic statement | Specific formula & conditions |
| What is a total loss? | Vague threshold | Clearly defined + policy terminology |

🎯 Key Takeaways

  • Naive RAG retrieves semantically similar content, but that’s not always relevant content.

  • Adding CrossEncoder re-ranking significantly improves the relevance of chunks passed to the LLM.

  • For high-stakes domains like insurance, law, or finance, reranking isn’t optional; it’s essential.


Implementing re-ranking in RAG takes just a few lines of code but delivers massive quality improvements, especially when working with open-ended LLMs or verbose, domain-specific content like insurance policy documents.

If you're serious about building robust AI systems that don't just sound smart but actually are smart, I'd love to hear what you're working on, whether that's brainstorming ideas, exploring RAG pipelines, or building something together. Feel free to reach out if you'd like to collaborate: contact me


👉 Curious to see the code in action?
Check out my GitHub repo: 🔗 rag-comparative-demo
Clone it, run it locally, and test both Naive and Re-Ranked RAG pipelines for yourself.
I'd love to hear your feedback!


Cheers, and until next time! 🚀

Exploring Generative AI: Concepts & Code

Part 1 of 3

A deep dive into generative AI, this blog series focuses on the core concepts behind AI models, their underlying algorithms, and practical coding techniques. Perfect for developers, researchers, and tech enthusiasts looking to master the world of AI.
