🚀 RAG-based Document Search API

This repository contains a Retrieval-Augmented Generation (RAG) API that queries a vector database (ChromaDB) for relevant document chunks and generates responses with the Mistral model running on Ollama.

📌 Features

✅ FastAPI-based API for document retrieval and comparison
✅ Embeddings with Ollama (nomic-embed-text) for vector search
✅ PDF document loading and chunking with PyPDFDirectoryLoader
✅ Cosine similarity-based document comparison
✅ Automated testing with pytest
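The chunking step can be pictured with a simplified character splitter. This is a hypothetical stand-in: the README does not say which splitter or chunk sizes the pipeline uses, and LangChain's own splitters also try to break on paragraph and sentence boundaries. The name `split_text` and the default sizes are illustrative assumptions.

```python
from typing import List


def split_text(text: str, chunk_size: int = 800, chunk_overlap: int = 80) -> List[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap characters.

    Hypothetical helper: the defaults are illustrative, not the project's settings.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text, so no tiny tail chunk is emitted.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk, which helps retrieval.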

🛠️ Installation

1️⃣ Clone the Repository

git clone https://github.com/smshelar/rag_pipeline.git
cd rag_pipeline

2️⃣ Create a Virtual Environment

python -m venv venv
source venv/bin/activate  # On macOS/Linux
venv\Scripts\activate     # On Windows

3️⃣ Install Dependencies

pip install -r requirements.txt

4️⃣ Start the API

uvicorn rag_api:app --host 0.0.0.0 --port 8000 --reload

🚀 API Endpoints

🔍 1. Query Documents (RAG)

Endpoint:

POST /query/

Request Body:

{
  "query_text": "What is the company name?"
}

Response:

{
  "response": "The company name is ConocoPhillips.",
  "sources": ["document_1.pdf(page_num:chunk_num)", 
              "document_2.pdf(page_num:chunk_num)", 
              "document_3.pdf(page_num:chunk_num)"]
}
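A minimal Python client for these endpoints, using only the standard library. It assumes the server is reachable at `http://localhost:8000` (the port from the uvicorn command) and that the endpoints accept and return JSON as shown; `BASE_URL`, `build_request`, and `post_json` are hypothetical helper names, not part of this repository.

```python
import json
import urllib.request

# Assumption: the API was started with the uvicorn command above on port 8000.
BASE_URL = "http://localhost:8000"


def build_request(path: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST request for one of the API endpoints."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict, base_url: str = BASE_URL, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON response (requires a running server)."""
    req = build_request(path, payload, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `post_json("/query/", {"query_text": "What is the company name?"})` should return a dict with `response` and `sources` keys, and the same helper can send the `/compare/` and `/populate/` bodies. The long default timeout is an arbitrary choice, since generation with a local model can be slow.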

🔄 2. Compare Two Documents

Endpoint:

POST /compare/

Request Body:

{
  "query_1": "Impact of climate change",
  "query_2": "Rising sea levels"
}

Response:

{
  "query_1": "Impact of climate change",
  "query_2": "Rising sea levels",
  "similarity_score": 0.87,
  "source_1": "doc1.pdf",
  "source_2": "doc2.pdf"
}
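The `similarity_score` above is described as cosine similarity. As a sketch of what that score measures (not the repository's code), here is the formula applied to two embedding vectors, such as the ones nomic-embed-text would produce for `query_1` and `query_2`; the README does not state exactly which vectors the endpoint compares, and `cosine_similarity` is a hypothetical helper name.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    # 1 means same direction, 0 means orthogonal, -1 means opposite direction.
    return dot / (norm_a * norm_b)
```

Because cosine similarity ignores vector length and compares only direction, it is a common choice for text embeddings; a score such as 0.87 suggests the two texts are closely related.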

📂 3. Populate the Database

Endpoint:

POST /populate/

Request Body:

{
  "reset": true
}

Response:

{
  "message": "Database populated with 100 chunks"
}
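The `sources` entries returned by `/query/` have the form `document.pdf(page_num:chunk_num)`. A minimal sketch of how such labels could be assigned while populating the database, assuming each chunk carries `source` and `page` metadata like the kind LangChain's PDF loaders attach. The function name `assign_chunk_ids` and the restart-per-page counter are illustrative assumptions, not taken from `load_model.py`.

```python
from typing import Dict, Iterable, List


def assign_chunk_ids(chunks: Iterable[Dict]) -> List[str]:
    """Label chunks as "<source>(<page>:<chunk>)", restarting the chunk counter on each page.

    Hypothetical helper. Assumes chunks arrive in document order, so all chunks
    from one page are consecutive.
    """
    ids = []
    last_page_key = None
    chunk_index = 0
    for chunk in chunks:
        page_key = (chunk["source"], chunk["page"])
        if page_key == last_page_key:
            chunk_index += 1  # another chunk from the same page
        else:
            chunk_index = 0   # first chunk of a new page or document
        last_page_key = page_key
        ids.append(f'{chunk["source"]}({chunk["page"]}:{chunk_index})')
    return ids
```

Deterministic IDs like these would let a repeated populate run recognize chunks already stored in ChromaDB instead of inserting duplicates, while `"reset": true` presumably clears the existing collection first.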

🧪 Running Tests

Run all tests using:

pytest test.py

🏗️ Folder Structure

📁 rag_pipeline
│-- 📂 data/                   # Directory for PDFs
│-- 📂 chroma/                 # ChromaDB storage
│-- 📜 embedding_function.py   # Ollama embedding function
│-- 📜 query.py                # Query processing
│-- 📜 embeddings_compare.py   # Document similarity comparison
│-- 📜 load_model.py           # Data pipeline for ChromaDB
│-- 📜 rag_api.py              # FastAPI server
│-- 📜 test.py                 # Pytest-based tests
│-- 📜 requirements.txt        # Dependencies
│-- 📜 README.md               # Project documentation

🎯 Future Enhancements

🔹 Dockerization for deployment
🔹 Support for more document formats (TXT, DOCX)
🔹 Advanced ranking using LLM-generated summaries

🚀 Developed with ❤️ using Python, LangChain & FastAPI
