- Overview
- Installation
- Configuration
- Usage
- Project Structure
- Training
- Results
- Contributing
- License
- Acknowledgments
This repository provides a comprehensive pipeline for anatomical segmentation of the prostate gland from MRI using deep learning, together with the reports from training a segmentation model with the nnU-Net framework. It focuses on multi-dataset preprocessing, analysis, and preparation for training segmentation models compatible with frameworks such as nnU-Net and MONAI. A U-Net training script is also provided, although the reported metrics and results come from nnU-Net. Notebooks for dataset exploration are included, as well as modules for data analysis and manipulation.
The results, discussed in the Results section, show performance comparable to state-of-the-art methods.
Automatic segmentation of prostate zones (PZ, TZ) is essential since ≈70–75% of clinically significant prostate cancers (csPCa) originate in the peripheral zone (PZ).
Also, the PI-RADS scoring system depends on anatomical zoning—DWI/ADC for lesions in PZ, T2W for TZ—requiring accurate zonal masks for correct lesion assessment.
Furthermore, manual zonal delineation on T2W images is time-consuming, tedious, and prone to inter- and intra-observer variability, especially given anatomical heterogeneity and low tissue contrast.
Recent reviews report Dice scores of ~0.90 (whole gland), ~0.87 (TZ), and ~0.79 (PZ), comparable to expert radiologists across varied datasets. The objective is therefore to develop a model that achieves similar performance, and to make it more robust by leveraging multiple datasets.
- Multi-dataset support: Works with PICAI, Prostate158, and Medical Segmentation Decathlon datasets
- Comprehensive preprocessing pipeline: Includes resampling, ROI extraction, N4 bias correction, and other data preparation steps
- Parallel processing: Efficient batch processing with progress tracking
- Data exploration tools: Interactive analysis and visualization of medical imaging datasets
- nnU-Net compatibility: Automatic data structuring following nnU-Net conventions
- Zonal segmentation: Support for peripheral zone (PZ) and transition zone (TZ) segmentation
| Dataset | Description | Cases | Modalities | Link |
|---|---|---|---|---|
| PICAI | PI-CAI Challenge dataset with expert annotations for prostate cancer detection | 1,500 public cases | T2w, DWI, ADC | pi-cai.grand-challenge.org |
| Prostate158 | Expert-annotated 3T MRI dataset for anatomical zones and cancer detection | 158 cases | T2w, DWI, ADC | GitHub |
| Medical Decathlon Task05 | Prostate segmentation task from Medical Segmentation Decathlon | 48 cases | T2w, ADC | medicaldecathlon.com |
- Python 3.8 or higher
- CUDA-compatible GPU (required for training; preprocessing can run on CPU in reasonable time)
- Clone the repository:
git clone https://github.com/your-username/mri-prostate-segmentation-deeplearning.git
cd mri-prostate-segmentation-deeplearning
- Install required packages:
pip install -r requirements.txt
- monai: Medical imaging AI framework
- nibabel: NIfTI file reading and writing
- simpleitk: Medical image processing and analysis
- matplotlib: Data visualization
- numpy: Numerical computing
- pandas: Data analysis and manipulation
- ipywidgets: Interactive Jupyter notebook widgets
- Download datasets using the provided script:
./download.sh
This downloads the PICAI dataset folds from Zenodo. The remaining datasets (Prostate158 and Medical Decathlon) must be downloaded separately from the links above.
- Configure data paths in the preprocessing scripts:
# Update paths in loadingData/build_data_*.py
RAW_DATA_ROOT = "/path/to/your/datasets"
OUT_ROOT = "/path/to/output/nnUNet_raw/"
- Set up output directories for nnU-Net format:
mkdir -p /path/to/output/nnUNet_raw/Dataset001_picai/{imagesTr,labelsTr}
Key parameters can be configured in the build scripts:
# Resampling spacing (x, y, z) in mm
spacing = (0.5, 0.5, 3.0)
# ROI crop factor (0.65 = 65% crop around the center)
crop_factor = 0.65
# Label value swapping for consistency across datasets
bool_swap_mask_values = False
- Data Exploration:
# Open Jupyter notebooks for dataset analysis
jupyter notebook exploratoryAnalysis/explore_picai.ipynb
- Dataset Preprocessing:
# Preprocess PICAI dataset
python loadingData/build_data_picai.py
# Preprocess other datasets
python loadingData/build_data_decathlon.py
python loadingData/build_data_158.py
- Preprocessing Pipeline Testing:
# Test individual preprocessing functions
python preprocessing/TestPreprocessing.py
Use the `DataAnalyzer` class to explore datasets. Example:
from exploratoryAnalysis.DataAnalyzer import DataAnalyzer
# Initialize analyzer
analyzer = DataAnalyzer("/path/to/datasets")
analyzer.regex = ".*_t2w.mha$" # Filter for T2-weighted images
# Collect metadata
df = analyzer.collect_metadata_from_subdirs("picai_folds/picai_images_fold0")
# Visualize images
analyzer.show_image("path/to/image.mha", save="output.png")
# Generate intensity histograms
analyzer.image_intensity_histogram("path/to/image.mha", plot=True)
Create custom preprocessing pipelines. Example:
import SimpleITK as sitk

from preprocessing.Pipeline import Pipeline
from preprocessing.PreProcessor import *
# Initialize pipeline
pipeline = Pipeline()
# Add preprocessing steps
pipeline.add(load_image) \
.add(resample_image, interpolator=sitk.sitkLinear, out_spacing=(0.5, 0.5, 3.0)) \
.add(get_region_of_interest, crop=0.65) \
.add(n4_bias_field_correction) \
.add(normalize_image, method="minmax")
# Process images
processed_images = pipeline.run(image_paths, parallel=True, max_workers=4)
Process large datasets efficiently. Example:
# Process image-label pairs in parallel
import SimpleITK as sitk

from preprocessing.Pipeline import Pipeline
from preprocessing.PreProcessor import load_image, resample_image
from preprocessing.Utils import preprocess_pairs_parallel, save_pairs_parallel
# Define pipelines for images and labels
img_pipeline = Pipeline().add(load_image).add(resample_image)
lbl_pipeline = Pipeline().add(load_image).add(resample_image, interpolator=sitk.sitkNearestNeighbor)
# Process pairs
paired_results = preprocess_pairs_parallel(
list(zip(image_paths, label_paths)),
img_pipeline,
lbl_pipeline,
workers=8
)
# Save results
out_images, out_labels = save_pairs_parallel(
paired_results,
output_image_paths,
output_label_paths,
workers=8
)
- `load_image(path)`: Load medical images, with 4D→3D conversion if needed
- `resample_image(image, out_spacing, interpolator)`: Resample to the target spacing
- `get_region_of_interest(image, crop)`: Extract the ROI around the prostate
- `n4_bias_field_correction(image)`: Remove MRI bias field artifacts
- `normalize_image(image, method)`: Z-score or min-max normalization
- `combine_zonal_masks(mask, pz_val, tz_val)`: Combine PZ and TZ masks to form a whole-gland (WG) mask
- `swap_zonal_mask_values(mask, val1, val2)`: Swap mask label values
- `reorient_image(image, orientation)`: Reorient to a standard orientation
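For example, a minimal sketch of the zonal mask utilities (the label values PZ=1, TZ=2 and the file path are illustrative assumptions; check each dataset's label convention):

```python
from preprocessing.PreProcessor import (
    load_image,
    combine_zonal_masks,
    swap_zonal_mask_values,
)

# Load a zonal segmentation mask (hypothetical path)
mask = load_image("/path/to/datasets/case_0001_zones.nii.gz")

# Merge the PZ and TZ labels into a single whole-gland mask
# (assumed label convention: PZ=1, TZ=2)
wg_mask = combine_zonal_masks(mask, pz_val=1, tz_val=2)

# Swap label values to harmonize conventions across datasets
harmonized = swap_zonal_mask_values(mask, val1=1, val2=2)
```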
.
├── exploratoryAnalysis/ # Dataset exploration and analysis
│ ├── DataAnalyzer.py # Main analysis class
│ ├── explore_picai.ipynb # PICAI dataset exploration
│ ├── explore_decathlon.ipynb # Decathlon dataset exploration
│ └── explore_158.ipynb # Prostate158 dataset exploration
├── loadingData/ # Dataset preprocessing scripts
│ ├── build_data_picai.py # PICAI preprocessing
│ ├── build_data_decathlon.py # Decathlon preprocessing
│ ├── build_data_158.py # Prostate158 preprocessing
│ └── data.template.json # nnU-Net metadata template
├── preprocessing/ # Core preprocessing modules
│ ├── Pipeline.py # Preprocessing pipeline framework
│ ├── PreProcessor.py # Individual preprocessing functions
│ ├── Utils.py # Utility functions
│ └── TestPreprocessing.py # Visual testing functions
├── requirements.txt # Python dependencies
├── download.sh # Dataset download script
└── README.md # This file
A training script is provided to train a U-Net model on the preprocessed datasets.
The final training of the model, however, was done with the nnU-Net framework, which is highly recommended for its automated configuration, training, and evaluation. The 3D full-resolution (3d_fullres) U-Net configuration was used, which is well suited to volumetric data such as MRI. The model was trained on the T2-weighted images and corresponding masks across the five PICAI folds and tested on a holdout set.
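For reference, a typical nnU-Net v2 run looks like the sketch below (assuming the `Dataset001_picai` folder created earlier maps to dataset ID 1; adjust the environment paths to your setup):

```bash
# Tell nnU-Net where its raw, preprocessed, and results folders live
export nnUNet_raw="/path/to/output/nnUNet_raw"
export nnUNet_preprocessed="/path/to/output/nnUNet_preprocessed"
export nnUNet_results="/path/to/output/nnUNet_results"

# Verify the dataset, generate plans, and preprocess
nnUNetv2_plan_and_preprocess -d 1 --verify_dataset_integrity

# Train the 3d_fullres configuration on all five folds
for FOLD in 0 1 2 3 4; do
    nnUNetv2_train 1 3d_fullres $FOLD
done
```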
The parameters reported by the nnU-Net training script were as follows:
- Architecture: 3D-fullres U-Net
- Epochs: 1000
- Batch size: 2
- Learning rate: 0.01
- Optimizer: SGD
- Loss function: DiceCE (RobustCrossEntropyLoss + MemoryEfficientSoftDiceLoss). As defined in the nnU-Net paper, this is a combination of cross-entropy and soft Dice loss:

$$\mathcal{L}_{total} = \mathcal{L}_{CE} + \mathcal{L}_{Dice}, \qquad \mathcal{L}_{Dice} = -\frac{2}{|K|} \sum_{k \in K} \frac{\sum_{i \in I} u_i^k \, v_i^k}{\sum_{i \in I} u_i^k + \sum_{i \in I} v_i^k}$$

where $u$ is the softmax output of the network, $v$ is the one-hot encoding of the ground truth, $i$ runs over the voxels of the training batch, and $k$ over the classes.
- Weight decay: 3e-05
- Activation: LeakyReLU
- Patch size: 24 × 256 × 256
Results from the nnU-Net training on the PICAI dataset show promising performance, with competitive Dice scores across anatomical zones, comparable to state-of-the-art methods, which stand at around 0.90 for the whole gland, 0.87 for the transition zone (TZ), and 0.79 for the peripheral zone (PZ).
The reported metrics were calculated on the validation set of each of the five PICAI folds, and are as follows:
The global overview of the validation results across all folds shows a performance gap between the PZ and TZ, as the latter is easier to segment than the former.
The Dice score is defined as:

$$\mathrm{Dice} = \frac{2\,TP}{2\,TP + FP + FN}$$

IoU (Intersection over Union) is also reported, defined as:

$$\mathrm{IoU} = \frac{TP}{TP + FP + FN}$$
True positive (TP), false positive (FP), and false negative (FN) counts are also provided for each fold. They were calculated voxel-wise: TP is the number of voxels correctly predicted as part of the prostate zone, FP the number of voxels incorrectly predicted as part of the zone, and FN the number of voxels belonging to the zone that were not predicted as such.
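As an illustration of how these voxel-wise metrics relate, the following sketch computes them from a pair of binary masks with NumPy (nnU-Net reports them automatically; this is not the framework's own code):

```python
import numpy as np

def voxelwise_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Voxel-wise TP/FP/FN, Dice, and IoU for a binary zone mask."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = int(np.sum(pred & gt))    # zone voxels correctly predicted
    fp = int(np.sum(pred & ~gt))   # voxels wrongly predicted as zone
    fn = int(np.sum(~pred & gt))   # zone voxels that were missed
    union = tp + fp + fn
    # Convention: empty prediction and ground truth count as a perfect match
    dice = 2 * tp / (2 * tp + fp + fn) if union else 1.0
    iou = tp / union if union else 1.0
    return {"TP": tp, "FP": fp, "FN": fn, "Dice": dice, "IoU": iou}
```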
The per-case results show the distribution of Dice scores across all cases in the validation sets.
A fold comparison is provided (the third fold is missing as it wasn't reported in the nnU-Net training):
The remaining datasets have yet to be tested.
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (`git checkout -b feature/new-feature`)
- Commit your changes (`git commit -am 'Add new feature'`)
- Push to the branch (`git push origin feature/new-feature`)
- Create a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- PI-CAI Challenge for providing the PICAI dataset
- Prostate158 dataset contributors
- Medical Segmentation Decathlon organizers
- nnU-Net framework developers
- MONAI community for medical imaging AI tools
For training, the annotations derived from Bosma et al. were used: https://grand-challenge.org/algorithms/prostate-segmentation/
@article{PICAI_Study_design,
  author = {Anindo Saha and Jasper J. Twilt and Joeran S. Bosma and Bram van Ginneken and Derya Yakar and Mattijs Elschot and Jeroen Veltman and Jurgen Fütterer and Maarten de Rooij and Henkjan Huisman},
  title  = {{Artificial Intelligence and Radiologists at Prostate Cancer Detection in MRI: The PI-CAI Challenge (Study Protocol)}},
  year   = {2022},
  doi    = {10.5281/zenodo.6667655}
}