📄 Paper | 🤗 Dataset | 📚 Notebooks | 📖 Docs
TransUNet is a hybrid deep learning architecture combining CNN encoders with Vision Transformers for medical image segmentation. This implementation focuses on automated pancreas segmentation from CT scans using the Medical Segmentation Decathlon dataset.
Pancreas segmentation from CT imaging is challenging due to:
- Small organ size (<1% of scan volume)
- Variable shape across patients
- Low contrast with surrounding soft tissue
TransUNet addresses these challenges through:
- ✅ Multi-scale CNN feature extraction
- ✅ Global context modeling via transformer self-attention
- ✅ U-Net decoder with skip connections for precise localization
- ✅ Hybrid loss (Dice + Cross-Entropy) for class imbalance
- 🏗️ Complete TransUNet implementation from scratch (533 lines)
- 🔬 MONAI preprocessing pipeline with HU windowing & isotropic resampling
- ⚡ 2D slice training for memory efficiency on consumer GPUs
- 📊 3D volume inference with slice-and-stack aggregation
- 🎯 Hybrid loss function combining Dice & Cross-Entropy
- 🔍 Attention visualization for model interpretability
TransUNet pipeline consists of three components:
Input CT (224×224) → CNN Encoder → Transformer → CNN Decoder → Segmentation Map
↓ ↓ ↑
Multi-scale Global Skip
Features Context Connections
| Variant | Embed Dim | Heads | Layers | Parameters | Memory |
|---|---|---|---|---|---|
| Small | 384 | 6 | 6 | 17M | ~4GB |
| Base | 768 | 12 | 12 | 105M | ~12GB |
| Large | 1024 | 16 | 24 | 300M | ~24GB |
Medical Segmentation Decathlon - Task07 Pancreas
| Attribute | Details |
|---|---|
| Volumes | 420 CT scans (282 train, 139 test) |
| Modality | Portal venous phase CT |
| Labels | Background (0), Pancreas (1), Tumor (2) |
| Format | NIfTI compressed (.nii.gz) |
| Size | ~11.4GB (compressed) |
| Source | Memorial Sloan Kettering Cancer Center |
| License | CC-BY-SA 4.0 |
Dataset is auto-downloaded via monai.apps.download_and_extract in Notebook 01.
Hardware Requirements:
- CUDA-capable GPU (recommended for training)
- 8GB+ VRAM for base variant, 4GB for small variant
- 16GB+ system RAM
Software Requirements:
- Python 3.9+
- CUDA 11.8+ (for GPU acceleration)
# Clone the repository
git clone https://github.com/ihatesea69/TransUNet-Pancreas-Segmentation.git
cd TransUNet-Pancreas-Segmentation
# Create virtual environment and install dependencies
uv sync
# Activate virtual environment
.venv\Scripts\activate # Windows
source .venv/bin/activate # Linux/macOSDependencies (managed via pyproject.toml):
torch>=2.0.0- Deep learning frameworkmonai>=1.3.0- Medical imaging toolkitnibabel>=5.0.0- NIfTI file I/Oeinops>=0.8.0- Tensor operationsmatplotlib,numpy,scikit-learn- Scientific computing
# Launch Jupyter for interactive exploration
jupyter notebook notebooks/01_Data_Exploration_and_Processing.ipynb
# Or use CLI for training/inference
python main.py train --variant small --epochs 50
python main.py inference --checkpoint model.pth --input scan.nii.gzTransUNet-Pancreas-Segmentation/
│
├── 📦 src/ # Core source code
│ ├── model.py # TransUNet architecture (533 lines)
│ ├── dataset.py # SlicingDataset for 2D extraction
│ ├── transforms.py # MONAI preprocessing pipeline
│ ├── loss.py # HybridLoss (Dice + CrossEntropy)
│ └── utils.py # Visualization utilities
│
├── 📓 notebooks/ # Interactive Jupyter notebooks
│ ├── 01_Data_Exploration_and_Processing.ipynb
│ ├── 02_Model_Architecture.ipynb
│ ├── 03_Training_Pipeline.ipynb
│ └── 04_Evaluation_and_Demo.ipynb
│
├── 📄 paper/ # Academic paper (LaTeX)
│ ├── paper.tex # Main document
│ ├── references.bib # Bibliography
│ └── paper.pdf # Compiled PDF
│
├── 📖 docs/ # Documentation
│ ├── QUICKSTART.md # Quick setup guide
│ ├── CHANGELOG.md # Version history
│ └── DEPLOY.md # GitHub Pages deployment
│
├── 🔧 scripts/ # Utility scripts
│ ├── compile.ps1 # LaTeX compilation
│ └── clean.ps1 # Project cleanup
│
├── 📁 data/ # Dataset (~11.4GB, gitignored)
├── 💾 checkpoints/ # Model weights (gitignored)
├── 📊 outputs/ # Results & metadata
├── 🖼️ assets/ # Images & diagrams
│
├── main.py # CLI entry point
├── pyproject.toml # UV project config
├── index.html # GitHub Pages landing page
├── LICENSE # MIT License
└── README.md # This file
The project is organized into 4 self-contained Jupyter notebooks:
jupyter notebook notebooks/01_Data_Exploration_and_Processing.ipynb- Downloads MSD Task07 Pancreas dataset (~11.4GB)
- Creates 80/10/10 train/val/test splits
- Defines MONAI preprocessing pipeline
- Visualizes CT volumes with segmentation masks
jupyter notebook notebooks/02_Model_Architecture.ipynb- TransUNet component-by-component explanation
- Multi-head attention visualization
- Forward pass verification
- Parameter count analysis
jupyter notebook notebooks/03_Training_Pipeline.ipynb- SlicingDataset for 2D slice extraction
- Hybrid loss (Dice + CrossEntropy)
- Training loop with validation
- Model checkpointing
Training Configuration:
CONFIG = {
"model_variant": "small", # or "base", "large"
"batch_size": 8,
"num_epochs": 50,
"learning_rate": 1e-4,
"img_size": 224,
}jupyter notebook notebooks/04_Evaluation_and_Demo.ipynb- Load trained checkpoints
- 3D volume inference (slice-by-slice)
- Dice score & Hausdorff distance metrics
- Visualization with mask overlays
For scripted workflows, use the CLI:
# Training
python main.py train \
--variant small \
--batch-size 8 \
--epochs 50 \
--lr 1e-4
# Inference
python main.py inference \
--checkpoint checkpoints/best_model.pth \
--input data/test_scan.nii.gz \
--output outputs/prediction.nii.gzOptions:
--variant: Model size (small|base|large)--batch-size: Training batch size (default: 8)--epochs: Number of training epochs (default: 50)--lr: Learning rate (default: 1e-4)
from src.model import create_transunet
from src.transforms import get_val_transforms
import torch
# Create model
model = create_transunet(
img_size=224,
in_channels=1,
out_channels=2,
variant="small"
)
# Load trained weights
checkpoint = torch.load("checkpoints/best_model.pth")
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()
# Inference on 2D slice
with torch.no_grad():
input_slice = torch.randn(1, 1, 224, 224)
output = model(input_slice)
prediction = torch.argmax(output, dim=1)MONAI-based preprocessing ensures consistent data formatting:
| Step | Transform | Purpose |
|---|---|---|
| 1 | LoadImaged |
Load NIfTI files |
| 2 | EnsureChannelFirstd |
Add channel dimension |
| 3 | Orientationd |
Standardize to RAS orientation |
| 4 | Spacingd |
Resample to 1.0mm isotropic |
| 5 | ScaleIntensityRanged |
HU windowing [-175, 250] → [0, 1] |
| 6 | CropForegroundd |
Remove empty background |
HU Windowing: Focuses on soft tissue (pancreas, liver, kidneys) while suppressing bone and air.
Hybrid Loss combines complementary objectives:
Total Loss = 0.5 × Dice Loss + 0.5 × Cross-Entropy Loss- Dice Loss: Handles extreme class imbalance (pancreas < 1% of volume)
- Cross-Entropy Loss: Provides stable per-pixel gradients
Based on TransUNet paper benchmarks:
| Metric | Expected Range |
|---|---|
| Dice Score | 0.75 - 0.85 |
| Hausdorff Distance (95%) | 5 - 15 mm |
| Inference Time | ~2-3 sec/slice (GPU) |
Performance varies with model variant and GPU hardware.
This project is licensed under the MIT License. See LICENSE for details.
The MSD Task07 Pancreas dataset is licensed under CC-BY-SA 4.0.
If you use TransUNet in your research, please cite the original paper:
@article{chen2021transunet,
title={TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation},
author={Chen, Jieneng and Lu, Yongyi and Yu, Qihang and Luo, Xiangde and Adeli, Ehsan and Wang, Yan and Lu, Le and Yuille, Alan L and Zhou, Yuyin},
journal={arXiv preprint arXiv:2102.04306},
year={2021}
}For the MSD dataset:
@article{simpson2019large,
title={A large annotated medical image dataset for the development and evaluation of segmentation algorithms},
author={Simpson, Amber L and Antonelli, Michela and Bakas, Spyridon and others},
journal={arXiv preprint arXiv:1902.09063},
year={2019}
}-
Chen et al. (2021) - TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation. arXiv:2102.04306
-
Simpson et al. (2019) - A large annotated medical image dataset for the development and evaluation of segmentation algorithms. arXiv:1902.09063
-
Ronneberger et al. (2015) - U-Net: Convolutional Networks for Biomedical Image Segmentation. MICCAI 2015.
-
Dosovitskiy et al. (2020) - An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. ICLR 2021.
-
Medical Segmentation Decathlon - http://medicaldecathlon.com/
- Medical Segmentation Decathlon organizers for the publicly available dataset
- MONAI team for the medical imaging framework
- TransUNet authors for the architecture design
- Memorial Sloan Kettering Cancer Center for data collection
- Issues: GitHub Issues
- Discussions: GitHub Discussions
For questions about the implementation, please open an issue on GitHub.
