Automated Radiology Report Generation with Vision Transformers & GPT-2
Read the Full Thesis (FAST NUCES, 2024)
"A Hybrid Approach for Automated Radiology Report Generation and Summarization using Vision Transformers and Language Models"
By Kheem Parkash Dharmani | Supervised by Dr. Ejaz Ahmed | FAST NUCES Islamabad, 2024
- Hybrid Vision-Language Model: ViT for X-ray image features + fine-tuned GPT-2 for clinical report generation.
- Research-Driven: Directly implements the methods of the underlying 2024 MS thesis.
- Clinically Relevant: Delivers detailed, accurate, context-aware radiology reports.
- Explainable & Modular: Clean pipeline, highly extendable, code fully documented.
- Professional Portfolio: Production-grade repository for real-world, research, or demo use.
| System Architecture | Example Output |
|---|---|
| ![]() | ![]() |
RadiologyReportGen-AI is a robust deep learning pipeline for automated generation of chest X-ray radiology reports, combining Vision Transformers (ViT) for high-fidelity image analysis and a fine-tuned GPT-2 for natural language report generation.
All methodology is grounded in this MS Thesis (2024) and addresses major clinical and computational challenges in AI-based radiology.
- Project Highlights
- Screenshots & Visuals
- Overview
- System Architecture
- Methodology
- Dataset
- Installation
- Quickstart Usage
- Project Structure
- Results & Evaluation
- Tips for Reproducibility & Extension
- References
- License
- Contact & Acknowledgements
Main Steps:
- Input: Chest X-ray image.
- Preprocessing: Resize, crop, normalize.
- Feature Extraction: ViT encodes the image.
- Similarity Matching: Cosine similarity to database features.
- Prompt Construction: Most similar image's MeSH/clinical findings used as GPT-2 prompt.
- Report Generation: Fine-tuned GPT-2 generates detailed radiology report.
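The retrieval and prompt-construction steps above can be sketched in a few lines. This is a minimal, library-free illustration: the toy 4-dimensional vectors stand in for the real 768-dimensional ViT features, and the prompt format and function names (`retrieve_most_similar`, `build_prompt`) are illustrative assumptions, not the repository's actual API.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve_most_similar(query_feat: np.ndarray,
                          db_feats: np.ndarray,
                          db_mesh_terms: list[str]) -> str:
    """Return the MeSH/clinical findings of the closest database image."""
    scores = [cosine_similarity(query_feat, f) for f in db_feats]
    return db_mesh_terms[int(np.argmax(scores))]

def build_prompt(mesh_terms: str) -> str:
    """Construct a GPT-2 prompt from retrieved findings (format is illustrative)."""
    return f"Findings: {mesh_terms}\nReport:"

# Toy stand-ins for the precomputed ViT feature database.
db_feats = np.array([[1.0, 0.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0, 0.0]])
db_mesh = ["Cardiomegaly; Pulmonary Congestion", "normal"]

query = np.array([0.9, 0.1, 0.0, 0.0])  # closest to the first database entry
prompt = build_prompt(retrieve_most_similar(query, db_feats, db_mesh))
print(prompt)
```

In the full pipeline, this prompt is then passed to the fine-tuned GPT-2 model for report generation.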
- Merges Indiana OpenI datasets, cleans and structures report text, applies standard image preprocessing.
- Leverages a ViT-Base model (16×16 patches, 224×224 input), extracting 768-dimensional features per patch for every X-ray.
- Efficient GPU processing for large datasets.
- GPT-2 model is fine-tuned on cleaned radiology reports for coherent, clinical text.
- Uses a custom dataset class, batching, and loss monitoring.
- New X-rays are matched by feature similarity; MeSH terms from the most similar image act as GPT-2 prompts.
- Generated report is closely tied to actual radiological findings.
- Automatic scoring: Perplexity, BLEU, ROUGE, BERTScore.
- Visual analysis: t-SNE feature plots, loss curves, word clouds.
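To make the scoring step concrete, here is a simplified, dependency-free version of ROUGE-1 F1 (unigram overlap). The repository itself presumably uses standard packages such as `rouge-score` or `nltk`; this sketch only illustrates what the metric measures.

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified unigram ROUGE-1 F1 between a reference and a candidate report."""
    ref, cand = reference.split(), candidate.split()
    overlap = sum((Counter(ref) & Counter(cand)).values())  # shared unigram count
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

ref = "no acute cardiopulmonary abnormality"
hyp = "no acute cardiopulmonary abnormality"
print(round(rouge1_f1(ref, hyp), 2))  # identical texts score 1.0
```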
- Source: Indiana University Chest X-ray OpenI
- Included: PNG images, `indiana_reports.csv`, `indiana_projections.csv`
- Data Path: Place all data inside `/data/` (see below).
- Privacy: Fully anonymized, public research dataset.
Prerequisites:
- Python 3.10+
- CUDA-enabled GPU recommended
Clone and Install:
```bash
git clone https://github.com/Kheem-Dh/RadiologyReportGen-AI.git
cd RadiologyReportGen-AI
pip install -r requirements.txt
```

Quickstart Usage:

1. Preprocess Data: `python scripts/preprocess_data.py`
2. Extract ViT Features: `python scripts/extract_features.py`
3. Fine-Tune GPT-2: `python scripts/train_gpt2.py`
4. Generate a Report: `python scripts/generate_report.py`
5. Evaluate Performance: `python scripts/evaluate.py`
```
RadiologyReportGen-AI/
├── data/                # Place your dataset files and images here
│   ├── indiana_reports.csv
│   ├── indiana_projections.csv
│   └── images/
├── screenshots/         # Place all screenshots and diagrams here
├── notebooks/
│   └── Radiology_Report_Generation.ipynb
├── src/
│   ├── data_preprocessing.py
│   ├── feature_extraction.py
│   ├── report_generation.py
│   ├── integration.py
│   ├── evaluation.py
│   └── utils.py
├── scripts/
│   ├── preprocess_data.py
│   ├── extract_features.py
│   ├── train_gpt2.py
│   ├── generate_report.py
│   └── evaluate.py
├── requirements.txt
├── README.md
└── LICENSE
```
- Quantitative Metrics:
  - Perplexity: lower = better model confidence.
  - BLEU, ROUGE, BERTScore: high scores indicate strong clinical and linguistic relevance.
- Qualitative:
  - Reports accurately reflect findings, impressions, and medical context.
  - Handles complex or rare cases effectively.
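Since perplexity is the headline quantitative metric, a small worked example helps: perplexity is the exponential of the average negative log-likelihood the model assigns to each token, so confident predictions drive it toward 1. This is a generic illustration with made-up token probabilities, not output from the repo's model.

```python
import math

def perplexity(token_probs: list[float]) -> float:
    """Perplexity = exp of the average negative log-likelihood per token."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

confident = [0.9, 0.8, 0.95]  # model assigns high probability to each token
uncertain = [0.2, 0.1, 0.3]
assert perplexity(confident) < perplexity(uncertain)  # lower = more confident
```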
Sample Evaluation Table:
| Example | BLEU | ROUGE-1 | BERTScore F1 | Report Quality |
|---|---|---|---|---|
| 1 | 0.14 | 1.00 | 0.91 | Excellent |
| 2 | 0.32 | 1.00 | 0.83 | Excellent |
- Use a virtual environment (venv or conda) for clean installs.
- Place only small sample data in the repo; reference large datasets via `/data/`.
- Add your own X-ray images for demos by dropping them in `/data/images/` and updating paths.
- Save your generated outputs (loss curves, t-SNE plots, report examples) in `/screenshots/` for your portfolio.
- Notebook for EDA & Exploration: Use the provided Jupyter notebook for visualization, prototyping, and presentation.
- Colab Demo: Add a Colab badge for quick web-based demos (see badge at top).
- Extend for new tasks: The modular `/src/` codebase can be adapted for MRI, CT, or other modalities with minimal changes.
- Full MS Thesis PDF:
A Hybrid Approach for Automated Radiology Report Generation and Summarization using Vision Transformers and Language Models
by Kheem Parkash Dharmani, FAST NUCES, Islamabad, 2024.
See the full thesis PDF for the complete methodology, extended results, in-depth literature review, and reference list.
- Mohsan, M. M., Akram, M. U., et al. "Vision Transformer and Language Model Based Radiology Report Generation." IEEE Access, 2022.
- Li, M., Liu, R., Wang, F., et al. "Auxiliary signal-guided knowledge encoder-decoder for medical report generation." WWW, 2023.
- Sirshar, M., Paracha, M. F. K., et al. "Attention based automated radiology report generation using CNN and LSTM." PLOS ONE, 2022.
- Full reference list available in the thesis and in `REFERENCES.md` if desired.
Author: Kheem Parkash Dharmani, MS Data Science, FAST NUCES Islamabad
Supervisor: Dr. Ejaz Ahmed
Acknowledgements: Dr. Ahmad Raza Shahid, family, mentors, and the FAST NUCES community.
For questions, issues, or collaboration, please open an issue on this repository.
This repository is based on the MS Thesis: "A Hybrid Approach for Automated Radiology Report Generation and Summarization using Vision Transformers and Language Models", 2024.




