This repository contains the code and experiments for my work on patient-specific Beam Orientation Optimization (BOO) in head-and-neck radiotherapy using Deep Q-Learning (DQN). The agent learns to select clinically meaningful gantry angles directly from voxel-level anatomy, without repeated Monte Carlo dose simulations.
TL;DR: Given CT anatomy and organ masks, we learn to predict 5 high-value beam angles in under a second, improving PTV coverage and sparing OARs compared to standard equiangular plans.
- 🔍 Overview & Motivation
- 📈 Results (100 Patients)
- 📂 Repository Structure
- ⚙️ Installation
- ▶️ Evaluation
- 🏋️ Training
- 🧬 Model Summary
- 🔮 Future Work
- 📄 Citation
- 🙏 Acknowledgements
Selecting clinically optimal beam orientations is crucial in radiotherapy.
Conventional BOO methods:
- Not personalized to anatomy ❌
- Computationally infeasible for large search spaces ❌
- Insensitive to voxel-level geometry ❌
- Require repeated full dose simulations ❌
We formulate BOO as a sequential decision-making problem and train a Deep Q-Network to:
- Extract voxel-level anatomical structure from CT + organ masks
- Sequentially choose 5 distinct beam angles
- Accumulate a pseudo-physical dose surrogate over timesteps
- Optimize a reward that balances:
  - PTV coverage (deliver dose to the target)
  - OAR sparing (avoid toxicity)
Inference time: <1 second per patient.
Deep-Reinforcement-Learning-for-Personalized-Radiotherapy-Beam-Orientation-Optimization/
├── configs/
│ └── experiments.json
├── figures/
│ ├── success_cases/ # Best examples
│ ├── typical_cases/ # Typical
│ ├── failure_cases/ # Failure cases
│ └── anomaly_cases/ # Special discussion
├── models/
│ └── best_dqn_model.pt
├── results/
│ ├── summary_results.md
│ └── test_results.csv
├── utils/
│ └── repro.py
├── baselines.py
├── eval_main.py
├── train.py
├── requirements.txt
└── README.md
git clone https://github.com/krishdef7/Deep-Reinforcement-Learning-for-Personalized-Radiotherapy-Beam-Orientation-Optimization.git
cd Deep-Reinforcement-Learning-for-Personalized-Radiotherapy-Beam-Orientation-Optimization
pip install -r requirements.txt

We use the OpenKBP dataset (head-and-neck):
- CT volumes
- PTV mask
- OAR masks (cord, brainstem, L/R parotids, mandible)
Split:
- Train: 200
- Validation: 40
- Test: 100
🛠 Users must download OpenKBP separately and update paths in configs/experiments.json
python eval_main.py

Outputs will include:
- results/test_results.csv: per-patient metrics (D95, coverage, OAR doses)
- figures/patient_XXX_dose_dqn.png: DQN dose maps overlaid on CT
- figures/patient_XXX_dvh_dqn.png: dose–volume histogram plots
python train.py

Training summary:
- Replay buffer: 3000
- Batch size: 32
- γ = 0.95
- ε-greedy: 0.90 → 0.10
- Target network update every 5 epochs
- Converges in ~3.5 hours on CPU
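The hyperparameters above can be wired together roughly as follows. This is a minimal sketch, not the code in train.py: the linear shape of the ε decay, the helper names `epsilon_at` and `td_target`, and the placeholder transitions are all assumptions made for illustration. Only the numeric values come from the training summary.

```python
import random
from collections import deque

BUFFER_SIZE = 3000            # replay buffer capacity (from the summary above)
BATCH_SIZE = 32
GAMMA = 0.95
EPS_START, EPS_END = 0.90, 0.10


def epsilon_at(epoch, n_epochs):
    """Exploration rate for a given epoch.

    Assumption: linear decay. The README only states the two endpoints.
    """
    if n_epochs <= 1:
        return EPS_END
    frac = min(epoch / (n_epochs - 1), 1.0)
    return EPS_START + frac * (EPS_END - EPS_START)


def td_target(reward, next_q_values, done):
    """Standard one-step DQN target: r + gamma * max_a' Q_target(s', a')."""
    if done or not next_q_values:
        return reward
    return reward + GAMMA * max(next_q_values)


# A deque with maxlen drops the oldest transition once the buffer is full.
replay = deque(maxlen=BUFFER_SIZE)
for step in range(5000):
    # Placeholder transition: (state, action, reward, next_state, done).
    replay.append((step, step % 36, 0.0, step + 1, False))

# Uniform random minibatch, as in vanilla DQN.
batch = random.sample(list(replay), BATCH_SIZE)
```

In a real training loop, `next_q_values` would come from the target network, which the summary says is refreshed every 5 epochs.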
| Method | Coverage | D95 |
|---|---|---|
| DQN (ours) | 0.8059 | 0.2405 |
| Equiangular | 0.6867 | 0.1207 |
| Heuristic | 0.6397 | 0.0949 |
| RandomMean | 0.5883 | 0.0554 |
- +11.9 percentage points of PTV coverage over the equiangular baseline (0.8059 vs. 0.6867)
- ~2× higher D95 than the equiangular baseline (0.2405 vs. 0.1207)
- <1 second per patient (post-training)
- Strong generalization across 100 unseen CT cases
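The headline gains can be recomputed directly from the table. The snippet below is only a convenience for checking the arithmetic; the dictionary layout and the `gains_over` helper are illustrative and are not part of the repository.

```python
# Values copied from the results table above.
results = {
    "DQN (ours)":  {"coverage": 0.8059, "d95": 0.2405},
    "Equiangular": {"coverage": 0.6867, "d95": 0.1207},
    "Heuristic":   {"coverage": 0.6397, "d95": 0.0949},
    "RandomMean":  {"coverage": 0.5883, "d95": 0.0554},
}


def gains_over(baseline, ours="DQN (ours)"):
    """Return (coverage gain in percentage points, D95 ratio) versus a baseline."""
    cov_gain_pp = 100 * (results[ours]["coverage"] - results[baseline]["coverage"])
    d95_ratio = results[ours]["d95"] / results[baseline]["d95"]
    return cov_gain_pp, d95_ratio


for name in results:
    if name != "DQN (ours)":
        cov, ratio = gains_over(name)
        print(f"{name:12s} coverage +{cov:.1f} pp, D95 x{ratio:.2f}")
```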
State (8 channels):
- CT
- PTV mask
- 5 OAR masks
- Accumulating dose surrogate
Actions:
- 36 discrete gantry angles (0–350° at 10° spacing)
- DQN selects 5 sequential non-repeating beams
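The action space and the non-repeating constraint can be sketched as below. This is an illustration under assumptions: `select_beams` and `toy_q` are hypothetical names, and `toy_q` is a stand-in for the trained network that simply prefers angles near 90°.

```python
import random

ANGLES = list(range(0, 360, 10))   # 36 gantry angles: 0, 10, ..., 350
N_BEAMS = 5


def select_beams(q_fn, epsilon=0.0, rng=random):
    """Pick N_BEAMS distinct angles, never offering an already-used action."""
    chosen = []
    for _ in range(N_BEAMS):
        available = [a for a in range(len(ANGLES)) if a not in chosen]
        if rng.random() < epsilon:
            action = rng.choice(available)               # explore
        else:
            q = q_fn(chosen)                             # one Q-value per angle
            action = max(available, key=lambda a: q[a])  # exploit among unmasked
        chosen.append(action)
    return [ANGLES[a] for a in chosen]


def toy_q(chosen):
    # Hypothetical stand-in for the DQN: higher score closer to 90 degrees.
    return [-abs(angle - 90) for angle in ANGLES]


beams = select_beams(toy_q)
print(beams)
```

Restricting the argmax to the unused actions has the same effect as setting the Q-values of used actions to negative infinity before taking the maximum.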
Architecture:
- 5× Conv layers + BN + ReLU
- Bottleneck: 4×4×256
- Fully connected head
- Masking to prevent repeated beams
- Model parameters: ~3.4M
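One way five convolutional layers can end at a 4×4 spatial bottleneck is to halve the resolution at every layer. The sketch below checks that arithmetic under loudly stated assumptions: a 128×128 input, 3×3 kernels, stride 2, and padding 1. None of these values is given in this README, so treat them as a plausible configuration rather than the actual one. The sketch also makes no attempt to reproduce the ~3.4M parameter count.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


size = 128                 # ASSUMED input side length (not stated in the README)
sizes = [size]
for _ in range(5):         # 5 conv layers, each assumed to use stride 2
    size = conv_out(size)
    sizes.append(size)

bottleneck_channels = 256  # from the architecture summary above
flat_features = size * size * bottleneck_channels

print(sizes)               # spatial side length after each layer
print(flat_features)       # input width of the fully connected head
```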
Dose Surrogate:
- Ray-traced geometric field
- Gaussian blur → approximate scatter
- Accumulate dose per timestep
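A pure-Python toy version of the three steps above might look like this. It is a sketch under assumptions, not the surrogate used in the repository: the beam is a constant-width band through the grid centre with no attenuation or divergence, the angle convention (0° along the x-axis) is arbitrary, and `half_width`, `sigma`, and `radius` are illustrative values.

```python
import math


def beam_field(n, angle_deg, half_width=3.0):
    """Binary geometric field: pixels within half_width of the central ray."""
    theta = math.radians(angle_deg)
    dx, dy = math.cos(theta), math.sin(theta)   # beam direction
    c = (n - 1) / 2.0                           # grid centre (isocentre)
    field = [[0.0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            # Perpendicular distance from the pixel to the central ray.
            dist = abs((x - c) * dy - (y - c) * dx)
            if dist <= half_width:
                field[y][x] = 1.0
    return field


def gaussian_kernel(sigma, radius):
    k = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]


def blur_rows(img, kernel):
    """1D convolution along each row, clamping indices at the edges."""
    r = len(kernel) // 2
    n = len(img[0])
    return [
        [sum(kernel[j + r] * row[min(max(x + j, 0), n - 1)] for j in range(-r, r + 1))
         for x in range(n)]
        for row in img
    ]


def transpose(img):
    return [list(col) for col in zip(*img)]


def gaussian_blur(img, sigma=1.5, radius=4):
    """Separable Gaussian blur (rows, then columns) as a crude scatter model."""
    k = gaussian_kernel(sigma, radius)
    return transpose(blur_rows(transpose(blur_rows(img, k)), k))


def accumulate_dose(n, angles):
    """Sum the blurred field of each beam, one beam per timestep."""
    dose = [[0.0] * n for _ in range(n)]
    for angle in angles:
        beam = gaussian_blur(beam_field(n, angle))
        dose = [[d + b for d, b in zip(drow, brow)] for drow, brow in zip(dose, beam)]
    return dose


dose = accumulate_dose(33, [0, 90])
```

With two perpendicular beams, the centre pixel lies in both bands and therefore accumulates roughly twice the dose of a pixel covered by a single beam.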
Reward:
- Terminal reward based on:
  - ↑ D95 and coverage
  - ↓ mean OAR dose
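A terminal reward with this shape could be computed as in the sketch below. The exact formula is not given in this README, so several things here are assumptions: the linear combination, the unit weights `w_d95`, `w_cov`, and `w_oar`, and the definition of coverage as the fraction of PTV voxels at or above a dose `threshold`. D95 is taken as the dose received by at least 95% of PTV voxels.

```python
def d95(ptv_doses):
    """Dose received by at least 95% of the PTV voxels (nearest-rank)."""
    ordered = sorted(ptv_doses, reverse=True)
    n = len(ordered)
    # Integer ceil(0.95 * n) - 1 avoids floating-point rounding surprises.
    idx = max((95 * n + 99) // 100 - 1, 0)
    return ordered[idx]


def coverage(ptv_doses, threshold):
    """ASSUMED definition: fraction of PTV voxels with dose >= threshold."""
    return sum(d >= threshold for d in ptv_doses) / len(ptv_doses)


def terminal_reward(ptv_doses, oar_doses, threshold,
                    w_d95=1.0, w_cov=1.0, w_oar=1.0):
    """Reward target dose and coverage, penalise mean OAR dose (weights are hypothetical)."""
    mean_oar = sum(oar_doses) / len(oar_doses) if oar_doses else 0.0
    return (w_d95 * d95(ptv_doses)
            + w_cov * coverage(ptv_doses, threshold)
            - w_oar * mean_oar)


# Toy example: 20 PTV voxels, one of them cold, and two OAR voxels.
ptv = [1.0] * 19 + [0.2]
oar = [0.1, 0.3]
reward = terminal_reward(ptv, oar, threshold=0.9)
```

Because the reward is only issued after the fifth beam, intermediate steps would receive zero reward in this formulation.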
Example figures are located in:
figures/success_cases/
figures/typical_cases/
figures/failure_cases/
figures/anomaly_cases/
High-dose regions remain inside PTV and spare critical OARs.
DVH curves reflect improved target coverage.
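For readers unfamiliar with DVH plots, a cumulative dose–volume histogram can be computed from a flat list of voxel doses as sketched below. The function name `cumulative_dvh` and the binning choices are illustrative assumptions; the plotting code in this repository may differ.

```python
def cumulative_dvh(doses, n_bins=50, max_dose=None):
    """Return (dose_levels, volume_fractions).

    volume_fractions[i] is the fraction of voxels receiving at least
    dose_levels[i], so the curve starts at 1.0 and never increases.
    """
    if not doses:
        raise ValueError("doses must not be empty")
    top = max_dose if max_dose is not None else max(doses)
    levels = [top * i / n_bins for i in range(n_bins + 1)]
    n = len(doses)
    fractions = [sum(d >= level for d in doses) / n for level in levels]
    return levels, fractions


# Toy structure with four voxels.
levels, fractions = cumulative_dvh([0.0, 0.5, 1.0, 1.0], n_bins=2)
```

A PTV curve that stays near 1.0 up to the prescription dose indicates good coverage, while OAR curves that fall off early indicate sparing.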
All baselines are evaluated under the same surrogate dose model as the DQN to ensure a fair comparison:
- Equiangular beams
- Geometry heuristic
- Random non-repeating beams (mean)
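Two of these baselines are simple enough to sketch. The code below is an assumption-laden illustration rather than the contents of baselines.py: snapping the equiangular plan to the 10° action grid, the number of random trials, and the `score_fn` callback are all hypothetical. The geometry heuristic is omitted because its rule is not described here.

```python
import random

ANGLES = list(range(0, 360, 10))   # same 36-angle grid as the agent


def equiangular_beams(n_beams=5, start=0):
    """Evenly spaced angles, snapped to the nearest 10-degree grid point."""
    step = 360 / n_beams
    return [int(round((start + i * step) / 10.0)) * 10 % 360 for i in range(n_beams)]


def random_beams(n_beams=5, rng=random):
    """Random non-repeating beams drawn from the action grid."""
    return sorted(rng.sample(ANGLES, n_beams))


def random_mean_score(score_fn, n_trials=100, seed=0):
    """Average a plan score over many random beam sets (trial count is assumed)."""
    rng = random.Random(seed)
    scores = [score_fn(random_beams(rng=rng)) for _ in range(n_trials)]
    return sum(scores) / len(scores)


print(equiangular_beams())
```

Here `score_fn` stands for whatever maps a beam set to a metric such as coverage or D95 under the shared surrogate dose.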
Clinical relevance:
- Higher D95 → higher local tumor control likelihood
- OAR avoidance reduces severe toxicity risk
- <1s runtime enables:
  - Adaptive planning
  - Online replanning
  - QA workflow assistive tools
Limitations:
- Surrogate dose ≠ true Monte Carlo dose
- Current version operates on a single 2D slice
- Trained only on head-and-neck geometry
- Research prototype — not clinically deployable
Future work:
- 3D DQN / U-Net encoders
- GPU-based Monte Carlo integration
- Learned neural surrogate physics
- Multi-objective RL (Pareto optimal)
- Online robustness against anatomical changes
- Multi-disease training (lung, pelvis, liver)
If you use this repository, please cite:
Deep Reinforcement Learning for Personalized Radiotherapy Beam Orientation Optimization.
Krish Garg, IIT Roorkee, 2025.
- OpenKBP dataset contributors
- IIT Roorkee — Department of Physics (institutional affiliation)
- No external funding used