Agentic RAG Pipeline

An agent-driven Retrieval-Augmented Generation (RAG) system that retrieves information from a document knowledge base using multiple retrieval strategies and dynamically orchestrates them through a LangGraph workflow.

The system combines:

  • Semantic vector retrieval
  • Keyword search (BM25)
  • Query decomposition
  • Cross-encoder reranking
  • Self-evaluation and retry strategies

This architecture enables more reliable and grounded answers compared to traditional RAG pipelines.


Architecture Overview

(Diagram: Agentic RAG graph)

The pipeline is implemented as a LangGraph workflow that dynamically routes queries through multiple retrieval strategies before generating and evaluating an answer.

The graph is defined in graph_builder.py using LangGraph's StateGraph, and processes every query through the following stages:

| Stage | Description |
| --- | --- |
| 1. Query Router | Classifies the query as factual or complex |
| 2. Query Decomposition | Breaks complex queries into sub-questions |
| 3. Vector Retrieval | Semantic search via ChromaDB embeddings |
| 4. Keyword Retrieval | BM25-based keyword search |
| 5. Hybrid Combination | Merges and deduplicates results from both retrievers |
| 6. Cross-Encoder Reranking | Reranks results by relevance |
| 7. Answer Generation | Generates a grounded answer from retrieved context |
| 8. Self-Evaluation | Scores the answer on relevance, completeness, and grounding |
| 9. Retry / Fallback | Rewrites the query and retries if the score is below threshold |

The workflow automatically retries if the generated answer is judged to be low quality.
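The control flow above can be sketched as a plain Python loop. This is a simplified, library-free illustration, not the project's code: the real pipeline implements each stage as a LangGraph node in graph_builder.py, and the function names and defaults here are hypothetical.

```python
# Hypothetical sketch of the retrieve -> generate -> evaluate -> retry loop.
# Each stage is a plain function here so the control flow is easy to follow.

def run_pipeline(query, retrieve, generate, evaluate, rewrite,
                 threshold=0.7, max_retries=2):
    """Return (answer, score); retry with a rewritten query when the score is low."""
    for attempt in range(max_retries + 1):
        docs = retrieve(query)                 # hybrid retrieval + reranking
        answer = generate(query, docs)         # grounded answer from context
        score = evaluate(query, answer, docs)  # self-evaluation in [0, 1]
        if score >= threshold:
            return answer, score
        query = rewrite(query)                 # broaden the query before retrying
    # All retries exhausted: return a safe fallback instead of a low-quality answer.
    return ("I could not find relevant information in the knowledge base "
            "to answer this question."), 0.0
```

Swapping the stage functions for LangGraph nodes and the loop for conditional edges gives the graph described above.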


Retrieval Strategies

The system uses multiple complementary retrieval strategies to maximize the chance of retrieving relevant context. Each method focuses on a different aspect of information retrieval.

1. Vector Search (Semantic Retrieval)

Vector search retrieves documents based on semantic similarity, not exact words.

Implementation Overview

  1. Documents are converted into embeddings using a SentenceTransformer model.
  2. The embeddings are stored in a Chroma vector database.
  3. When a query is received, it is embedded and compared against stored document vectors.
  4. The most semantically similar chunks are returned.

The retrieved documents are returned along with a similarity score and used as context for the LLM.
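Steps 3 and 4 reduce to comparing the query vector against stored vectors by cosine similarity. The toy sketch below illustrates only that mechanic; the real system uses SentenceTransformer embeddings stored in Chroma, and the vectors in the test are made up.

```python
import math

# Toy illustration of the similarity-search step: embed the query, score it
# against every stored vector with cosine similarity, return the best matches.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def vector_search(query_vec, index, top_k=2):
    """index: list of (doc_text, embedding) pairs. Returns (doc, score) pairs."""
    scored = [(doc, cosine(query_vec, emb)) for doc, emb in index]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]
```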

2. Keyword Search (BM25 Retrieval)

Keyword search is implemented using BM25, a probabilistic ranking algorithm commonly used in search engines.

Unlike vector search, BM25 focuses on exact keyword matching and term frequency.

Implementation Overview

  1. Documents are tokenized and cleaned.
  2. Stopwords are removed.
  3. A BM25 index is built over the document tokens.
  4. Queries are matched against the index to rank documents.

Keyword Extraction

The query is normalized and filtered to remove stopwords.

```python
import re
from nltk.corpus import stopwords  # requires: python -m nltk.downloader stopwords

STOPWORDS = set(stopwords.words("english"))
tokens = re.findall(r"\b\w+\b", query.lower())
keywords = [word for word in tokens if word not in STOPWORDS]
```

BM25 Index Construction

```python
from rank_bm25 import BM25Okapi

tokenized_docs = [extract_keywords(doc.page_content) for doc in docs]
bm25 = BM25Okapi(tokenized_docs)
```

Retrieval

```python
scores = bm25.get_scores(tokenized_query)
```

Documents with the highest BM25 scores are returned.
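For intuition, the score the library computes can be written out from scratch. This is standard Okapi BM25 with the usual parameter values (k1 = 1.5, b = 0.75), shown for illustration only; the project uses the rank_bm25 library rather than this code.

```python
import math

# From-scratch Okapi BM25 scoring. k1 controls term-frequency saturation;
# b controls document-length normalization.

def bm25_scores(query_tokens, tokenized_docs, k1=1.5, b=0.75):
    n_docs = len(tokenized_docs)
    avgdl = sum(len(d) for d in tokenized_docs) / n_docs
    # Document frequency of each query term.
    df = {t: sum(1 for d in tokenized_docs if t in d) for t in set(query_tokens)}
    scores = []
    for doc in tokenized_docs:
        score = 0.0
        for term in query_tokens:
            tf = doc.count(term)
            if tf == 0:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores
```

Rare terms (high IDF) and repeated matches raise a document's score, while very long documents are penalized.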

3. Hybrid Retrieval

To improve retrieval quality, the system combines both retrieval strategies.

Process

  1. Vector search retrieves semantically relevant documents.
  2. BM25 retrieves keyword-matching documents.
  3. Results from both searches are merged.
  4. Duplicate documents are removed.

This hybrid approach improves:

  • Recall (more relevant documents retrieved)
  • Precision (better ranking after reranking)
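The merge-and-deduplicate step is straightforward; a minimal sketch follows. Here documents are deduplicated by value, keeping the first occurrence so vector-search hits stay ahead of keyword hits; the real code may key on document IDs or metadata instead.

```python
# Merge two ranked result lists, dropping duplicates while preserving order.

def hybrid_merge(vector_docs, keyword_docs):
    seen = set()
    merged = []
    for doc in vector_docs + keyword_docs:
        if doc not in seen:
            seen.add(doc)
            merged.append(doc)
    return merged
```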

4. Cross-Encoder Reranking

After hybrid retrieval, the results are reranked using a Cross-Encoder model.

Unlike embedding similarity, cross-encoders evaluate the query and document together, producing a more accurate relevance score.

Reranking Process

  1. Query-document pairs are created.

  2. Each pair is scored by the cross-encoder.

  3. Documents are sorted by predicted relevance.

The top-ranked documents are used for answer generation.

This step significantly improves answer quality.
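The reranking step reduces to scoring (query, document) pairs and sorting. The sketch below abstracts the scorer away as a callable; in the real system that role is played by a cross-encoder model's prediction, which sees query and document together.

```python
# Rerank documents by a cross-encoder-style pair scorer. `score_pair` stands in
# for the model call; any callable (query, doc) -> float works for the sketch.

def rerank(query, docs, score_pair, top_k=5):
    scored = [(score_pair(query, doc), doc) for doc in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]
```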


Retry Strategy

One of the key features of the system is its self-correcting retry mechanism.

Instead of immediately returning a low-quality answer, the system evaluates its output and attempts to improve retrieval automatically.

Step 1: Answer Evaluation

After generating an answer, the system evaluates it using an LLM.

The evaluation prompt asks the model to score the answer based on:

  • factual grounding
  • completeness
  • relevance

The evaluator returns a score between 0 and 1.
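Because the evaluator is an LLM, its reply is free text and the numeric score has to be extracted and clamped. The sketch below assumes a reply that contains a decimal number; the actual prompt and parsing live in evaluation.py and may differ.

```python
import re

# Pull the first number out of an evaluator reply and clamp it to [0, 1].
# The reply format is an assumption for this sketch.

def parse_score(reply, default=0.0):
    match = re.search(r"\d*\.?\d+", reply)
    if not match:
        return default
    return max(0.0, min(1.0, float(match.group())))
```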

Step 2: Decision Logic

The evaluation score determines the next action.

| Score | Action |
| --- | --- |
| >= threshold | Accept answer |
| < threshold | Retry retrieval |
| Retries exceeded | Return fallback |

The threshold is configurable in the graph configuration.
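The decision table maps directly onto a small routing function. The threshold and retry limit below are example values, not the project's configured defaults.

```python
# Decision logic for the evaluation score. 0.7 and 2 are illustrative defaults;
# the real values come from the graph configuration.

def next_action(score, retries, threshold=0.7, max_retries=2):
    if score >= threshold:
        return "accept"
    if retries >= max_retries:
        return "fallback"
    return "retry"
```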

Step 3: Query Expansion

During the second retry attempt, the query is rewritten to broaden retrieval coverage.

Example:

Original query:
"What is Linux CFS scheduling?"

Expanded query:
"Explain Linux process scheduling and the Completely Fair Scheduler."

This allows the system to retrieve documents that may not match the original query exactly.

Step 4: Expanded Retrieval

If another retry occurs:

  • Retrieval top_k is increased
  • More documents are searched
  • Hybrid retrieval is executed again

Example adjustment:

top_k = 20 → 30

This increases the chances of finding relevant information.

Step 5: Fallback Response

If the system reaches the maximum number of retries and still cannot produce a sufficiently good answer, it returns a safe fallback.

Example:

"I could not find relevant information in the knowledge base to answer this question."

This prevents the LLM from hallucinating unsupported answers.

Benefits of the Architecture

This architecture ensures that answers are:

  • grounded in retrieved documents
  • automatically improved if quality is low
  • safe against hallucinations

Tech Stack

Backend

| Component | Role |
| --- | --- |
| FastAPI | REST API |
| LangGraph | Agent workflow orchestration |
| LangChain | LLM integration |
| ChromaDB | Vector database |
| Sentence Transformers | Embeddings |
| Cross-Encoder | Reranking |
| BM25 | Keyword retrieval |

Frontend

| Component | Role |
| --- | --- |
| React 19 | UI framework |
| Vite | Build tool and dev server |
| Tailwind CSS | Styling |
| ReactMarkdown | Markdown rendering |

Project Structure

```
Agentic-RAG-Pipeline/
│
├── backend/
│   ├── graph/
│   │   ├── graph_builder.py   # LangGraph workflow definition
│   │   ├── nodes.py           # Individual node implementations
│   │   └── state.py           # Shared state schema
│   │
│   ├── ingestion.py           # Document chunking and indexing
│   ├── retriever.py           # Vector and BM25 retrieval logic
│   ├── evaluation.py          # LLM-based answer scoring
│   └── server.py              # FastAPI application
│
├── frontend/
│   └── src/
│       ├── components/        # React UI components
│       ├── api.js             # API client
│       └── main.jsx           # App entry point
│
├── requirements.txt
└── README.md
```

Setup Instructions

Prerequisites

  • Python 3 and pip
  • Node.js and npm (for the frontend)

Backend Setup

1. Clone the repository

```shell
git clone https://github.com/vicky150612/Agentic-RAG
cd Agentic-RAG
```

2. Create and activate a virtual environment

```shell
python -m venv venv
```

On Windows:

```shell
venv\Scripts\activate
```

On macOS / Linux:

```shell
source venv/bin/activate
```

3. Install Python dependencies

```shell
pip install -r requirements.txt
```

4. Download NLTK stopwords

```shell
python -m nltk.downloader stopwords
```

5. Configure environment variables

```shell
cp backend/.env.example backend/.env
```

Change the values in the .env file as required.

6. Start the backend server

```shell
cd backend
python server.py
```

The server will be available at http://localhost:8000.


Frontend Setup

1. Navigate to the frontend directory

```shell
cd frontend
```

2. Install dependencies

```shell
npm install
```

3. Start the development server

```shell
npm run dev
```

The frontend will be available at http://localhost:5173.


API Reference

Health Check

GET /health

Returns the current system status.

Ingest Documents

POST /ingest

Upload one or more documents (.pdf or .txt). Documents are chunked using RecursiveCharacterTextSplitter and stored in the vector database.

Query

POST /query

Request:

```json
{
  "query": "What is Linux fair scheduling?"
}
```

Response:

```json
{
  "query": "What is Linux fair scheduling?",
  "route": "hybrid",
  "subquestions": [
    "What is Linux process scheduling?",
    "How does the Linux kernel schedule tasks?",
    "What is the Completely Fair Scheduler?"
  ],
  "retrieved_docs": [...],
  "keyword_docs": [...],
  "hybrid_docs": [...],
  "final_docs": [...],
  "answer": "The Completely Fair Scheduler (CFS) is the default CPU scheduler in Linux...",
  "evaluation_score": 0.86
}
```

Supported Document Formats

  • PDF (.pdf)
  • Plain text (.txt)

Future Improvements

  • Streaming responses with Server-Sent Events
  • Better evaluation metrics
  • Observability with LangSmith
  • Graph visualization in frontend
  • Multi-document reasoning

Acknowledgements
