An AI-powered Retrieval-Augmented Generation (RAG) system for analyzing medical documents such as PDFs and scanned images. Upload files and ask questions in natural language; the assistant responds with accurate, medically aware answers grounded in the documents you provide.
Watch the full demo here: Medical Report Assistant Demo
- PDF/Image Parsing with OCR
- Automatic Medical Field Extraction (e.g., glucose, hemoglobin)
- RAG-based Chat using GPT-4
- Session Memory for Multi-Turn Interaction
- File Source Attribution with Expandable View
- Streamlit UI with Sidebar Uploads
- LangSmith Tracing (auto-enabled via env vars)
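The "automatic medical field extraction" feature above can be sketched with plain regexes over OCR output. This is an illustrative stand-in, not the project's actual extraction code; the field names and patterns are assumptions.

```python
import re

# Illustrative patterns for common lab values; the real project may use
# different fields, units, and matching logic.
FIELD_PATTERNS = {
    "glucose": re.compile(r"glucose\s*[:\-]?\s*(\d+(?:\.\d+)?)\s*(mg/dL|mmol/L)?", re.I),
    "hemoglobin": re.compile(r"h(?:a?e)moglobin\s*[:\-]?\s*(\d+(?:\.\d+)?)\s*(g/dL)?", re.I),
}

def extract_fields(ocr_text: str) -> dict:
    """Return {field: (value, unit)} for every pattern that matches."""
    results = {}
    for name, pattern in FIELD_PATTERNS.items():
        m = pattern.search(ocr_text)
        if m:
            results[name] = (float(m.group(1)), m.group(2) or "unknown")
    return results

sample = "CBC Report\nHemoglobin: 13.5 g/dL\nFasting Glucose - 98 mg/dL"
print(extract_fields(sample))
```

Extracted values like these can be attached to chunks as metadata, so a query such as "What is the glucose level?" can be answered directly as well as via retrieval.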
```mermaid
graph TD
    A[Medical Docs] --> B[Text/OCR Extraction]
    B --> C[Field Parsing] --> D[Chunking & Embedding]
    D --> E[Qdrant Vector DB]
    F[User Query] --> G[Vector Search]
    E --> G --> H[Context] --> I[GPT-4 Answer]
```
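The chunk → embed → search flow in the diagram can be sketched with a toy bag-of-words "embedding" and brute-force cosine search. In the real pipeline, OpenAI/HuggingFace embeddings and Qdrant replace these stand-ins; the helper names here are illustrative.

```python
import math
from collections import Counter

def chunk(text: str, size: int = 40) -> list[str]:
    """Split text into fixed-size word windows (stand-in for real chunking)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> Counter:
    """Bag-of-words vector; a real system uses dense model embeddings."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# "Index" the document chunks, then retrieve the best match for a query.
doc = "Hemoglobin 13.5 g/dL within normal range. Glucose 98 mg/dL fasting."
index = [(c, embed(c)) for c in chunk(doc, size=6)]

def search(query: str) -> str:
    q = embed(query)
    return max(index, key=lambda item: cosine(q, item[1]))[0]

print(search("what is the glucose level"))
```

The retrieved chunk becomes the "Context" node in the diagram and is passed to GPT-4 along with the user's question.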
| Component | Tech Stack |
|---|---|
| OCR | Tesseract, pdfplumber |
| Embeddings | OpenAI or HuggingFace (configurable) |
| Vector Storage | Qdrant |
| Chat Interface | Streamlit |
| RAG Engine | LangChain + GPT-4 |
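To show how the RAG engine row fits together: retrieved chunks are stitched into a prompt for the chat model. The real project does this through LangChain chains; the template wording and function name below are assumptions, kept minimal for illustration.

```python
# Hypothetical prompt assembly: combine retrieved chunks with the user's
# question, labeling each chunk so the answer can cite its source.
def build_prompt(question: str, chunks: list[str]) -> str:
    context = "\n\n".join(f"[source {i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "You are a medical report assistant. Answer ONLY from the context "
        "below and cite the [source N] you used.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_prompt(
    "What is the glucose level?",
    ["Glucose 98 mg/dL fasting.", "Hemoglobin 13.5 g/dL."],
)
print(prompt)
```

Constraining the model to the supplied context is what keeps answers grounded in the uploaded documents rather than the model's general medical knowledge.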
- Python 3.8+
- OpenAI API key (or local embeddings)
- Qdrant running locally or remotely
- Tesseract OCR installed
```bash
git clone https://github.com/yourusername/medical_chatbot.git
cd medical_chatbot
pip install -r requirements.txt
```

Install Tesseract:

- macOS: `brew install tesseract`
- Ubuntu: `sudo apt install tesseract-ocr`
- Windows: Download from GitHub

Start Qdrant:

```bash
docker run -p 6333:6333 qdrant/qdrant
```

Set the environment variables:

```bash
OPENAI_API_KEY=sk-xxx...
QDRANT_URL=http://localhost:6333
LANGSMITH_API_KEY=your_langsmith_key  # Optional, enables tracing
LANGSMITH_PROJECT=medical_chatbot    # Optional
```

Run the app:

```bash
cd app
python init_qdrant.py  # optional init
streamlit run ui.py
```

Then open http://localhost:8501
- Upload reports via the sidebar (PDF, PNG, JPG)
- Ask questions like:
- "What is the glucose level?"
- "Any abnormalities in CBC report?"
- Click "📂 Sources" to see which files were referenced
- This is not a diagnostic tool — use for informational purposes only
- Handle medical data with care (HIPAA/GDPR compliance)
- Secure your API keys and environment configs