# How Re-Ranked RAG Outperforms Naive RAG in Information Retrieval

<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text"><strong>All code and notebooks used in this post are available here → </strong><a target="_self" rel="noopener noreferrer nofollow" href="http://github.com/kdshreyas/rag-comparative-demo" style="pointer-events: none"><strong>github.com/kdshreyas/rag-comparative-demo</strong></a></div>
</div>

Retrieval-Augmented Generation (RAG) has transformed how we build domain-aware AI applications. But while Naive RAG gets you started, advanced retrieval techniques like **CrossEncoder Re-Ranking** can noticeably improve the relevance of what reaches the LLM, which matters most when accuracy is critical.

In this post, we’ll walk through a practical comparison between **Naive RAG** and **Re-Ranked RAG**, using real code and examples from an insurance domain use case.

---

## 🧠 The Problem

Let’s say you're building a chatbot that answers policy-related questions from a PDF. You’re using a local LLM (Gemma 3 via Ollama) and a vector store (ChromaDB) populated with embeddings from `nomic-embed-text-v1.5`.

The goal? Build a retrieval-based QA system that can:

* Extract accurate answers from policy documents.
    
* Handle ambiguous or layered queries.
    
* Minimize hallucinations and irrelevant responses.
    

---

## 🔧 The Stack

Here’s what we used:

* **Local LLM**: `gemma3:latest` via Ollama
    
* **Embedding model**: `nomic-embed-text-v1.5` (local)
    
* **Vector DB**: ChromaDB
    
* **Document loader**: `UnstructuredLoader`
    
* **Reranker**: `cross-encoder/ms-marco-MiniLM-L-6-v2` via `sentence-transformers`
    

---

## 🧱 Step 1: Basic Naive RAG Setup

We start with a classic RAG pipeline:

1. Load and clean the PDF.
    
2. Split it into chunks using `SentenceTransformersTokenTextSplitter`.
    
3. Generate embeddings.
    
4. Store in ChromaDB.
    
5. Retrieve the top 10 docs and pass them to the LLM for generation.
    
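The retrieval half of these steps can be sketched without any of the real components. The snippet below is a toy stand-in, not the code from the repo: `chunk_words`, `embed`, `cosine`, and `top_k` are hypothetical helpers, word counts replace the `nomic-embed-text-v1.5` vectors, a plain list replaces ChromaDB, and the sample text is invented. It only illustrates the shape of chunk, embed, store, and top-k lookup.

```python
import math
import re
from collections import Counter

def chunk_words(text, size=8, overlap=2):
    """Split text into overlapping word windows (toy stand-in for a token splitter)."""
    words = text.split()
    step = size - overlap  # assumes size > overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks

def embed(text):
    """Toy 'embedding': lowercase word counts instead of a dense vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(question, store, k=10):
    """Return the k stored chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# Invented sample text, used only to exercise the helpers
sample_text = (
    "No Claim Bonus is a discount for claim-free years. "
    "A total loss is assessed against the declared value of the vehicle."
)
# Steps 2-4: chunk, embed, and store each chunk next to its vector
store = [(chunk, embed(chunk)) for chunk in chunk_words(sample_text)]
# Step 5: look up the chunks closest to the question
print(top_k("What is No Claim Bonus?", store, k=2))
```

In the real pipeline each of these helpers is replaced by the corresponding library component, but the data flow stays the same.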

### Retrieval + Generation (Naive RAG)

```python
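from langchain_core.messages import HumanMessage

# Assumes `embedding_model`, `retriever` (the ChromaDB collection), and `llm`
# were created during setup.
def generate_answer(question):
    # Embed the question and fetch the 10 nearest chunks from the vector store
    query_embedding = embedding_model.embed_query(question)
    results = retriever.query(query_embeddings=[query_embedding], n_results=10, include=["documents"])
    retrieved_documents = results["documents"][0]  # one result list per query
    context = "\n".join(retrieved_documents)
    
    # Pass all 10 chunks to the LLM, in vector-similarity order
    prompt = f"""
    You are an assistant for answering questions using the provided context.
    Context:
    {context}
    Question: {question}
    Answer in 2–3 sentences.
    """
    
    response = llm.invoke([HumanMessage(content=prompt)])
    return response.content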
```

---

## 🧠 Step 2: Add Cross-Encoder Re-Ranking

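Naive RAG may retrieve semantically similar chunks, but not all are equally relevant. Embedding search encodes the query and each chunk separately and compares the vectors, whereas a cross-encoder reads the query and a chunk together and scores that pair directly. So we apply a CrossEncoder to **re-score the top 10 chunks** based on their relevance to the query and keep only the best 5.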

### Reranking Function

```python
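from sentence_transformers import CrossEncoder

cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(docs, question, top_n=5):
    # Score each (question, chunk) pair jointly; a higher score means more relevant
    pairs = [[question, doc] for doc in docs]
    scores = cross_encoder.predict(pairs)
    # Sort on the score alone and keep the top_n chunks
    ranked = sorted(zip(scores, docs), key=lambda pair: pair[0], reverse=True)
    top_docs = [doc for _, doc in ranked[:top_n]]
    return "\n\n".join(top_docs)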
```
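
To see the re-ranking step in isolation, here is a minimal sketch that keeps the same retrieve-then-rescore shape but takes the scorer as a parameter. `rerank_with`, `score_fn`, and `overlap_score` are hypothetical names, the sample chunks are invented, and the word-overlap scorer is only a placeholder for `cross_encoder.predict`, so the snippet runs without downloading a model.

```python
import re

def rerank_with(score_fn, question, docs, top_n=5):
    """Re-score (question, doc) pairs and keep the top_n highest-scoring docs."""
    pairs = [(question, doc) for doc in docs]
    scores = score_fn(pairs)  # one score per pair, in the same order as docs
    # Sort on the score only, so ties never fall back to comparing document text
    ranked = sorted(zip(scores, docs), key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:top_n]]

def overlap_score(pairs):
    """Placeholder scorer: count of distinct question words found in each doc."""
    scores = []
    for question, doc in pairs:
        q_words = set(re.findall(r"[a-z0-9]+", question.lower()))
        d_words = set(re.findall(r"[a-z0-9]+", doc.lower()))
        scores.append(len(q_words & d_words))
    return scores

# Invented chunks, listed in the order a vector search might have returned them
docs = [
    "Premium is due annually.",
    "No Claim Bonus is a discount on the premium.",
    "The bonus grows each claim-free year.",
]
print(rerank_with(overlap_score, "How is No Claim Bonus calculated?", docs, top_n=2))
```

Passing `cross_encoder.predict` as `score_fn` should give the model-based version, since it accepts the same list of pairs and returns one score per pair.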

### Re-Ranked Answer Generator

```python
def generate_answer_with_rerank(question):
    # Same first stage as Naive RAG: fetch the 10 nearest chunks
    query_embedding = embedding_model.embed_query(question)
    results = retriever.query(query_embeddings=[query_embedding], n_results=10, include=["documents"])
    raw_docs = results["documents"][0]
    # Second stage: re-score the 10 chunks and keep only the 5 most relevant
    context = rerank(raw_docs, question)
    
    prompt = f"""
    You are an assistant for answering questions using the provided context.
    Context:
    {context}
    Question: {question}
    Answer in 2–3 sentences.
    """
    
    response = llm.invoke([HumanMessage(content=prompt)])
    return response.content
```
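
Both generators build the same prompt and differ only in how `context` is produced. One way to remove the duplication, sketched here with a hypothetical `build_prompt` helper and `PROMPT_TEMPLATE` constant, is to factor the template out. As a side effect, `textwrap.dedent` strips the leading indentation that an indented triple-quoted f-string would otherwise send to the model.

```python
import textwrap

# Shared template; dedent removes the common 4-space indentation
PROMPT_TEMPLATE = textwrap.dedent("""\
    You are an assistant for answering questions using the provided context.
    Context:
    {context}
    Question: {question}
    Answer in 2–3 sentences.
""")

def build_prompt(context, question):
    """Fill the shared template; the context is inserted after dedenting."""
    return PROMPT_TEMPLATE.format(context=context, question=question)

print(build_prompt("chunk A\n\nchunk B", "What is IDV?"))
```

Each generator would then call `build_prompt(context, question)` and keep only its own retrieval logic.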

---

## 🔍 Real Examples – Naive vs Re-Ranked RAG

### ✅ Example 1: “What is Insured’s Declared Value (IDV)?”

**Naive RAG:**

> "IDV is the manufacturer’s listed price adjusted for depreciation."

✅ Accurate but incomplete.

**Re-Ranked RAG:**

> "IDV is the market value of the vehicle and the maximum amount payable in case of a total loss or theft, without deducting depreciation."

🎯 It adds critical details about claims, making it more useful.

---

### ✅ Example 2: “How is No Claim Bonus (NCB) calculated?”

**Naive RAG:**

> "No Claim Bonus is a discount given for claim-free years."

🟡 Basic info, but leaves out specifics.

**Re-Ranked RAG:**

> "NCB is calculated as a percentage discount on the premium, starting at 20% and increasing up to 50% over claim-free years. It resets to zero if a claim is made."

✔️ Complete with thresholds and conditions.

---

### ✅ Example 3: “What does total loss mean in a motor insurance policy?”

**Naive RAG:**

> "A total loss occurs when the cost of repair exceeds the vehicle's IDV."

⚠️ Technically okay but vague.

**Re-Ranked RAG:**

> "In insurance terms, a total loss means the cost of repairing the damaged vehicle exceeds 75% of the IDV, and compensation is paid as per the IDV without depreciation."

📌 Adds numerical threshold + real-world implication.

---

## 🧪 Results Summary

| Query | Naive RAG | Re-Ranked RAG |
| --- | --- | --- |
| What is IDV? | Basic definition | Full explanation + claims context |
| How is NCB calculated? | Generic statement | Specific formula & conditions |
| What is a total loss? | Vague threshold | Clearly defined + policy terminology |

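A side-by-side comparison like this is easy to regenerate if both pipelines are passed in as plain callables. The `compare` and `to_markdown` helpers below are a hypothetical sketch; in this post's setup the two callables would be `generate_answer` and `generate_answer_with_rerank`, and the lambdas shown here are stand-ins so the snippet runs on its own.

```python
def compare(questions, pipelines):
    """Run every question through every pipeline; return one row per question."""
    rows = []
    for question in questions:
        row = {"Query": question}
        for name, answer_fn in pipelines.items():
            row[name] = answer_fn(question)
        rows.append(row)
    return rows

def to_markdown(rows):
    """Render the rows as a markdown table, using the first row's keys as headers."""
    headers = list(rows[0])
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(row[h]) for h in headers) + " |")
    return "\n".join(lines)

# Stand-in pipelines; swap in the real generator functions to reproduce the table
pipelines = {
    "Naive RAG": lambda q: "naive:" + q,
    "Re-Ranked RAG": lambda q: "reranked:" + q,
}
print(to_markdown(compare(["What is IDV?"], pipelines)))
```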
---

## 🎯 Key Takeaways

* Naive RAG retrieves semantically similar content, but that’s not always *relevant* content.
    
* Adding **CrossEncoder re-ranking** improves the relevance of the chunks passed to the LLM; in the three examples above, it led to more specific and complete answers.
    
* For high-stakes domains like **insurance, law, or finance**, reranking isn’t optional; it’s essential.
    

---

Implementing re-ranking in RAG takes only a few lines of code, yet in our examples it delivered **noticeably more specific and complete answers**, especially when working with open-ended LLMs or verbose, domain-specific content like insurance policy documents.

If you're serious about building robust AI systems that don’t just *sound* smart but **actually are smart**, I'd love to hear what you’re working on, whether it's **brainstorming ideas, exploring RAG pipelines, or even building something together**.  
Feel free to [contact me](https://bio.link/helloshreyas) if you'd like to collaborate!

---

👉 **Curious to see the code in action?**  
Check out my GitHub repo: 🔗[rag-comparative-demo](https://github.com/kdshreyas/rag-comparative-demo)  
Clone it, run it locally, and test both **Naive** and **Re-Ranked** RAG pipelines for yourself.  
I'd love to hear your feedback!

---

**Cheers, and until next time! 🚀**
