Authors: Samira Sadeghi, Daniel Noroozi
A PACS‑style Streamlit dashboard that takes a chest X‑ray, predicts multi‑label findings using BiomedCLIP image embeddings, retrieves similar historical cases (RAG), and drafts a radiology report that is checked (and optionally auto‑revised) against a small knowledge graph (neuro‑symbolic verification).
Not a clinical device.
A printable version is included as assets/FinalDashboard.pdf.
### Dataset prep (Notebook)
- Loads Indiana CXR metadata (`indiana_projections.csv`, `indiana_reports.csv`) and resolves valid image paths.
- Repairs common filename mismatches by scanning the image folder and keeping only valid files.
- Builds the “caption” text by joining Findings + Impression.
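The caption-building step can be sketched with pandas; the column names (`uid`, `findings`, `impression`) and toy rows below are illustrative assumptions, not the guaranteed CSV schema:

```python
import pandas as pd

# Hypothetical miniature of the two Indiana CXR tables.
projections = pd.DataFrame({
    "uid": [1, 2],
    "filename": ["1_IM-0001.png", "2_IM-0002.png"],
})
reports = pd.DataFrame({
    "uid": [1, 2],
    "findings": ["Heart size normal.", "Mild cardiomegaly."],
    "impression": ["No acute disease.", "Cardiomegaly."],
})

# Join projections to their reports, then build the caption text
# as Findings + Impression, tolerating missing fields.
df = projections.merge(reports, on="uid", how="inner")
df["caption"] = (
    df["findings"].fillna("") + " " + df["impression"].fillna("")
).str.strip()
print(df[["filename", "caption"]])
```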
### Embeddings + multi‑label classifier (Notebook)
- Computes BiomedCLIP image embeddings (ViT‑B/16 backbone).
- Trains a One‑vs‑Rest Random Forest for multi‑label prediction over 8 labels: Cardiomegaly, Pneumonia, Atelectasis, Edema, PleuralEffusion, Fracture, Pneumothorax, Normal.
- Calibrates per‑class probabilities with sigmoid / Platt scaling on a held‑out calibration split.
- Tunes per‑class decision thresholds on a validation split (no test leakage).
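A minimal scikit-learn sketch of this pipeline, using random stand-in embeddings instead of real BiomedCLIP features; note that here sigmoid calibration runs via `CalibratedClassifierCV`'s internal cross-validation rather than the notebook's dedicated calibration split:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 512))          # stand-in for 512-dim image embeddings
Y = rng.integers(0, 2, size=(200, 8))    # 8 binary labels

X_tr, X_val, Y_tr, Y_val = train_test_split(X, Y, test_size=0.25, random_state=0)

# One-vs-Rest Random Forest; each per-class model is Platt-calibrated (sigmoid).
clf = OneVsRestClassifier(
    CalibratedClassifierCV(
        RandomForestClassifier(n_estimators=50, random_state=0),
        method="sigmoid", cv=3,
    )
)
clf.fit(X_tr, Y_tr)
proba = clf.predict_proba(X_val)         # shape (n_samples, 8)

# Tune a per-class decision threshold on the validation split (by F1 here).
thresholds = []
for k in range(Y.shape[1]):
    grid = np.linspace(0.1, 0.9, 17)
    best = max(grid, key=lambda t: f1_score(Y_val[:, k], proba[:, k] >= t,
                                            zero_division=0))
    thresholds.append(float(best))
print(thresholds)
```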
### Neuro‑symbolic layer (Notebook + App)
- Constructs a small RDF knowledge graph (definitions, anatomy tree, disease→location hints).
- Validates predicted labels and provides KG facts (definitions + expected locations).
### RAG evidence retrieval (App)
- Retrieves top‑K similar cases by cosine similarity in embedding space.
- Highlights detected pathology terms inside the retrieved radiologist report (rule‑based entity mapping).
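Top‑K retrieval by cosine similarity can be sketched in NumPy; the embedding bank here is random stand-in data:

```python
import numpy as np

def top_k_similar(query_emb, case_embs, k=3):
    """Return indices and scores of the k most cosine-similar cases."""
    q = query_emb / np.linalg.norm(query_emb)
    C = case_embs / np.linalg.norm(case_embs, axis=1, keepdims=True)
    sims = C @ q                       # cosine similarity to every case
    order = np.argsort(-sims)[:k]      # best k, highest first
    return order, sims[order]

rng = np.random.default_rng(0)
bank = rng.normal(size=(100, 512))              # historical case embeddings
query = bank[42] + 0.01 * rng.normal(size=512)  # near-duplicate of case 42
idx, scores = top_k_similar(query, bank, k=3)
print(idx, scores)
```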
### Report drafting (App)
- Optionally calls a local LLM (DeepSeek‑R1 via Ollama) to draft a report from: predicted labels + retrieved case snippets + KG definitions.
- Runs a KG verifier over the draft; if it violates constraints, the app attempts automatic revision (up to 2 passes) and shows a revision trail.
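A standard-library sketch of prompt assembly and the call to Ollama's OpenAI‑compatible endpoint; the prompt layout and both helper functions are hypothetical, while the payload shape follows the OpenAI chat-completions format:

```python
import json
import urllib.request

def build_prompt(labels, snippets, kg_defs):
    """Assemble the drafting prompt from labels, retrieved snippets, and KG facts."""
    return (
        "Draft a chest X-ray report.\n"
        f"Predicted findings: {', '.join(labels)}\n"
        "Similar cases:\n- " + "\n- ".join(snippets) + "\n"
        "Definitions:\n" + "\n".join(f"{k}: {v}" for k, v in kg_defs.items())
    )

def draft_report(prompt, base_url="http://localhost:11434/v1"):
    """POST the prompt to Ollama's OpenAI-compatible chat endpoint."""
    payload = {"model": "deepseek-r1",
               "messages": [{"role": "user", "content": prompt}]}
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

prompt = build_prompt(
    ["Cardiomegaly"],
    ["Enlarged cardiac silhouette, no effusion."],
    {"Cardiomegaly": "Enlargement of the cardiac silhouette."},
)
print(prompt)
```

`draft_report` requires a running Ollama instance, so it is defined but not called here.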
### Explainability (App)
- CLIP embedding saliency (not disease‑specific) to visualize what influenced the embedding.
- Label‑specific occlusion sensitivity heatmap (slower, grid‑based).
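The grid-based occlusion idea can be sketched as follows; `score_fn` stands in for the label-specific classifier score, and the toy score here is just the mean brightness of the top-left quadrant, so importance should concentrate there:

```python
import numpy as np

def occlusion_map(image, score_fn, patch=32, stride=32, fill=0.0):
    """Grid-based occlusion sensitivity: score drop when each patch is hidden."""
    h, w = image.shape[:2]
    base = score_fn(image)
    heat = np.zeros(((h - 1) // stride + 1, (w - 1) // stride + 1))
    for i, y in enumerate(range(0, h, stride)):
        for j, x in enumerate(range(0, w, stride)):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = fill
            heat[i, j] = base - score_fn(occluded)  # higher = more important
    return heat

img = np.ones((128, 128))
score = lambda im: im[:64, :64].mean()  # toy label score
heat = occlusion_map(img, score)
print(heat)
```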
```bash
python -m venv .venv
# Windows: .venv\Scripts\activate
# macOS/Linux: source .venv/bin/activate
python -m pip install -U pip
pip install -r requirements.txt
```

GPU is optional. If you have CUDA, PyTorch will use it automatically.
The notebook expects this structure (dataset is not included in this repo):
```
dataset/
  images/
    images_normalized/
      <image files...>
  indiana_projections.csv
  indiana_reports.csv
```
Open and run `notebooks/AI_RAG_train_test_split_Final.ipynb`.
It will generate several .pkl artifacts (dataset + embeddings + models + thresholds).
To keep the repo clean, move them into ./artifacts:
```bash
python scripts/collect_artifacts.py
```

The Streamlit app is already configured to prefer `./artifacts/*.pkl` automatically.
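The artifacts-first lookup could look like this minimal sketch; `load_artifact` is a hypothetical helper, not the app's actual code:

```python
import pickle
import tempfile
from pathlib import Path

def load_artifact(name, search_dirs):
    """Return the first <name>.pkl found across search_dirs (artifacts first)."""
    for d in search_dirs:
        p = Path(d) / f"{name}.pkl"
        if p.exists():
            with p.open("rb") as f:
                return pickle.load(f)
    raise FileNotFoundError(f"{name}.pkl not found in {search_dirs}")

# Demo with a temporary "artifacts" directory standing in for ./artifacts.
tmp = Path(tempfile.mkdtemp())
art = tmp / "artifacts"
art.mkdir()
(art / "thresholds.pkl").write_bytes(pickle.dumps([0.5] * 8))
thresholds = load_artifact("thresholds", [art, tmp])
print(thresholds)
```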
```bash
streamlit run app/app.py
```

If you want the LLM report drafting feature:
- Install Ollama
- Pull a model (example):
  ```bash
  ollama pull deepseek-r1
  ```
- Make sure Ollama’s OpenAI‑compatible endpoint is reachable at:
  `http://localhost:11434/v1`
If the LLM is not available, the app falls back to a rule‑based report layout.
- This is a portfolio project; outputs must be reviewed by qualified professionals.
- Embedding saliency ≠ diagnosis explanation; it shows sensitivity of the embedding, not a certified clinical rationale.
- RAG evidence is similarity‑based and may retrieve imperfect matches.
- BiomedCLIP (Microsoft) via `open_clip`
- Indiana University chest X‑ray dataset (refer to the dataset’s original license/terms)

