A medical question-answering system that leverages Retrieval-Augmented Generation (RAG) to provide accurate and context-aware medical information. This application combines the power of large language models with vector search to deliver precise answers to medical queries.
- Advanced RAG Pipeline: Implements a robust Retrieval-Augmented Generation system for accurate medical information retrieval
- Multi-document Support: Processes and indexes multiple PDF documents from the `data/` directory
- Semantic Search: Utilizes Pinecone's vector database for efficient similarity search across medical documents
- State-of-the-Art LLM: Powered by Google's Gemini model through LangChain for high-quality response generation
- Web Interface: User-friendly Flask-based web interface for seamless interaction
- Scalable Architecture: Designed for easy extension and integration with additional data sources
- Customizable Prompts: Easily adjustable system prompts to tailor responses to medical domain requirements
- Efficient Chunking: Smart text splitting to maintain context while processing large documents
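The overlap-based chunking mentioned above can be sketched in plain Python. This is an illustrative stand-in (the project itself most likely uses a LangChain text splitter); `chunk_text`, `size`, and `overlap` are hypothetical names chosen for the example:

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping windows so neighbouring chunks share context."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    chunks = []
    step = size - overlap  # advance less than the window size to create overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + size]
        if chunk:
            chunks.append(chunk)
    return chunks

pages = "A" * 1200  # stand-in for text extracted from a PDF
chunks = chunk_text(pages, size=500, overlap=50)
print(len(chunks), [len(c) for c in chunks])  # → 3 [500, 500, 300]
```

The last 50 characters of each chunk reappear at the start of the next one, so a sentence cut at a boundary is still seen whole by at least one chunk.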
The application follows a simple, modular architecture with the following components:
- Frontend: Lightweight HTML/JS interface with responsive design
- Backend: Flask web server handling API requests
- Vector Database: Pinecone for efficient vector similarity search
- Embedding Model: `all-MiniLM-L6-v2` for creating document embeddings
- LLM Integration: Google's Gemini model for generating human-like responses
- Document Processing: Automated pipeline for PDF ingestion and text extraction
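At retrieval time, the vector database ranks stored chunks by cosine similarity between their embeddings and the query embedding. A toy sketch of that ranking with hand-made 3-dimensional vectors (the real `all-MiniLM-L6-v2` embeddings have 384 dimensions, and Pinecone performs this search server-side):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dim "embeddings" standing in for real 384-dim vectors.
docs = {
    "aspirin dosage": [0.9, 0.1, 0.0],
    "flu symptoms":   [0.1, 0.9, 0.2],
}
query = [0.85, 0.15, 0.05]

best = max(docs, key=lambda name: cosine(query, docs[name]))
print(best)  # → aspirin dosage
```

The chunk whose embedding points in nearly the same direction as the query wins, regardless of vector magnitude — which is why cosine (not raw dot product) is the usual metric for sentence embeddings.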
- `app.py`: Flask app and RAG pipeline
- `store_index.py`: Builds Pinecone index from PDFs in `data/`
- `src/helper.py`: Load PDF(s), split text, and create embeddings
- `src/prompt.py`: System prompt for the assistant
- `templates/chat.html`: Frontend chat page
- `static/style.css`: Simple styles
- Python 3.10+
- A Pinecone account and API key
- A Google AI Studio API key (for Gemini)
python -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
python -m pip install -r requirements.txt

Create a `.env` file in the project root:
PINECONE_API_KEY="your_pinecone_api_key"
GOOGLE_API_KEY="your_google_api_key"

Notes:
- The code uses the Pinecone index name `medical-catboot` and expects embeddings of dimension 384 (`all-MiniLM-L6-v2`).
- Default serverless spec: cloud `aws`, region `us-east-1`.
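Since both API keys are required before anything else runs, a fail-fast startup check is a useful pattern. This is a sketch, not the repo's actual code; `require_env` is a hypothetical helper (the project presumably loads `.env` via python-dotenv):

```python
import os

def require_env(*names: str) -> dict:
    """Fail fast with a clear message if any required key is missing."""
    missing = [n for n in names if not os.environ.get(n)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {n: os.environ[n] for n in names}

# For illustration only — real values come from the .env file:
os.environ.setdefault("PINECONE_API_KEY", "dummy")
os.environ.setdefault("GOOGLE_API_KEY", "dummy")
keys = require_env("PINECONE_API_KEY", "GOOGLE_API_KEY")
print(sorted(keys))  # → ['GOOGLE_API_KEY', 'PINECONE_API_KEY']
```

Raising at import time with the exact missing names is far easier to debug than a 401 deep inside a Pinecone or Gemini call.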
Place your PDFs in the data/ folder (the repo includes data/Medical_book.pdf). Rebuild the index after you change files.
.\.venv\Scripts\python store_index.py

This will:
- Read PDFs from `data/`
- Split into chunks
- Create sentence-transformer embeddings (384 dims)
- Create or reuse Pinecone index `medical-catboot`
- Upsert embeddings
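The "create or reuse" step above amounts to checking for the index name before creating it. The sketch below shows the logic against a duck-typed client so it runs without credentials; with the real `pinecone.Pinecone` client the corresponding calls would be `list_indexes()` and `create_index(...)` (the `FakeClient` is purely illustrative):

```python
def ensure_index(client, name: str, dimension: int = 384):
    """Create the index only if it does not already exist, then return it."""
    if name not in client.list_indexes():
        client.create_index(name=name, dimension=dimension)
    return client.Index(name)

# Tiny in-memory stand-in for the real Pinecone client, for illustration:
class FakeClient:
    def __init__(self):
        self._indexes = {}

    def list_indexes(self):
        return list(self._indexes)

    def create_index(self, name, dimension):
        self._indexes[name] = {"dimension": dimension, "vectors": {}}

    def Index(self, name):
        return self._indexes[name]

pc = FakeClient()
idx = ensure_index(pc, "medical-catboot", dimension=384)
idx2 = ensure_index(pc, "medical-catboot")  # second call reuses, not recreates
print(idx is idx2, idx["dimension"])  # → True 384
```

Because `store_index.py` is rerun whenever the PDFs change, this idempotent create-or-reuse check is what keeps repeated runs from failing on an existing index.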
.\.venv\Scripts\python app.py

Open http://localhost:8080 in your browser.
- Ensure `.env` contains valid `PINECONE_API_KEY` and `GOOGLE_API_KEY`.
- If you change PDFs, rerun `store_index.py` to refresh embeddings.
- If the index doesn't exist, the script will create it (serverless `us-east-1`).
- On corporate networks, set proxy env vars for `pip`/downloads if needed.
- Python, Flask
- LangChain (Retrieval chain)
- Sentence Transformers (`all-MiniLM-L6-v2`)
- Pinecone (Vector DB)
- Google Gemini (via `langchain-google-genai`)