- Overview
- Installation
- Configuration
- Usage
- Project Structure
- Training
- Results
- Contributing
- License
- Acknowledgments
This repository provides a comprehensive pipeline for anatomical segmentation of the prostate gland from MRI using deep learning, together with the reports from training a segmentation model with the nnU-Net framework. It focuses on multi-dataset preprocessing, analysis, and preparation for training segmentation models compatible with frameworks such as nnU-Net and MONAI. A U-Net training script is also provided, although the reported metrics and results come from nnU-Net. Notebooks for dataset exploration are included, as well as modules for data analysis and manipulation.
The results, discussed in the Results section, show performance comparable to state-of-the-art methods.
Automatic segmentation of prostate zones (PZ, TZ) is essential since ≈70–75% of clinically significant prostate cancers (csPCa) originate in the peripheral zone (PZ).
Also, the PI-RADS scoring system depends on anatomical zoning—DWI/ADC for lesions in PZ, T2W for TZ—requiring accurate zonal masks for correct lesion assessment.
Furthermore, manual zonal delineation on T2W images is time-consuming, tedious, and prone to inter- and intra-observer variability, especially given anatomical heterogeneity and low tissue contrast.
Recent reviews report Dice scores of ~0.90 (whole gland), ~0.87 (TZ), and ~0.79 (PZ), comparable to expert radiologists across varied datasets. The objective is therefore to develop a model that achieves similar performance, and to make it more robust by leveraging multiple datasets.
- Multi-dataset support: Works with PICAI, Prostate158, and Medical Segmentation Decathlon datasets
- Comprehensive preprocessing pipeline: Includes resampling, ROI extraction, N4 bias correction, and other data preparation steps
- Parallel processing: Efficient batch processing with progress tracking
- Data exploration tools: Interactive analysis and visualization of medical imaging datasets
- nnU-Net compatibility: Automatic data structuring following nnU-Net conventions
- Zonal segmentation: Support for peripheral zone (PZ) and transition zone (TZ) segmentation
| Dataset | Description | Cases | Modalities | Link |
|---|---|---|---|---|
| PICAI | PI-CAI Challenge dataset with expert annotations for prostate cancer detection | 1,500 public cases | T2w, DWI, ADC | pi-cai.grand-challenge.org |
| Prostate158 | Expert-annotated 3T MRI dataset for anatomical zones and cancer detection | 158 cases | T2w, DWI, ADC | GitHub |
| Medical Decathlon Task05 | Prostate segmentation task from Medical Segmentation Decathlon | 48 cases | T2w, ADC | medicaldecathlon.com |
- Python 3.8 or higher
- CUDA-compatible GPU (required for training; preprocessing can run on CPU in reasonable time)
- Clone the repository:
git clone https://github.com/your-username/mri-prostate-segmentation-deeplearning.git
cd mri-prostate-segmentation-deeplearning
- Install required packages:
pip install -r requirements.txt
- monai: Medical imaging AI framework
- nibabel: NIfTI file reading and writing
- simpleitk: Medical image processing and analysis
- matplotlib: Data visualization
- numpy: Numerical computing
- pandas: Data analysis and manipulation
- ipywidgets: Interactive Jupyter notebook widgets
- Download datasets using the provided script:
./download.sh
This downloads the PICAI dataset folds from Zenodo. The remaining datasets (Prostate158 and Medical Decathlon) must be downloaded separately from the links above.
- Configure data paths in the preprocessing scripts:
# Update paths in loadingData/build_data_*.py
RAW_DATA_ROOT = "/path/to/your/datasets"
OUT_ROOT = "/path/to/output/nnUNet_raw/"
- Set up output directories for nnU-Net format:
mkdir -p /path/to/output/nnUNet_raw/Dataset001_picai/{imagesTr,labelsTr}
Key parameters can be configured in the build scripts:
# Resampling spacing (x, y, z) in mm
spacing = (0.5, 0.5, 3.0)
# ROI crop factor (0.65 = 65% crop around the center)
crop_factor = 0.65
# Label value swapping for consistency across datasets
bool_swap_mask_values = False
- Data Exploration:
# Open Jupyter notebooks for dataset analysis
jupyter notebook exploratoryAnalysis/explore_picai.ipynb
- Dataset Preprocessing:
# Preprocess PICAI dataset
python loadingData/build_data_picai.py
# Preprocess other datasets
python loadingData/build_data_decathlon.py
python loadingData/build_data_158.py
- Preprocessing Pipeline Testing:
# Test individual preprocessing functions
python preprocessing/TestPreprocessing.py
Use the `DataAnalyzer` class to explore datasets. Example:
from exploratoryAnalysis.DataAnalyzer import DataAnalyzer
# Initialize analyzer
analyzer = DataAnalyzer("/path/to/datasets")
analyzer.regex = ".*_t2w.mha$" # Filter for T2-weighted images
# Collect metadata
df = analyzer.collect_metadata_from_subdirs("picai_folds/picai_images_fold0")
# Visualize images
analyzer.show_image("path/to/image.mha", save="output.png")
# Generate intensity histograms
analyzer.image_intensity_histogram("path/to/image.mha", plot=True)
Create custom preprocessing pipelines. Example:
import SimpleITK as sitk

from preprocessing.Pipeline import Pipeline
from preprocessing.PreProcessor import *
# Initialize pipeline
pipeline = Pipeline()
# Add preprocessing steps
pipeline.add(load_image) \
.add(resample_image, interpolator=sitk.sitkLinear, out_spacing=(0.5, 0.5, 3.0)) \
.add(get_region_of_interest, crop=0.65) \
.add(n4_bias_field_correction) \
.add(normalize_image, method="minmax")
# Process images
processed_images = pipeline.run(image_paths, parallel=True, max_workers=4)
Process large datasets efficiently. Example:
# Process image-label pairs in parallel
import SimpleITK as sitk

from preprocessing.Pipeline import Pipeline
from preprocessing.PreProcessor import load_image, resample_image
from preprocessing.Utils import preprocess_pairs_parallel, save_pairs_parallel
# Define pipelines for images and labels
img_pipeline = Pipeline().add(load_image).add(resample_image)
lbl_pipeline = Pipeline().add(load_image).add(resample_image, interpolator=sitk.sitkNearestNeighbor)
# Process pairs
paired_results = preprocess_pairs_parallel(
list(zip(image_paths, label_paths)),
img_pipeline,
lbl_pipeline,
workers=8
)
# Save results
out_images, out_labels = save_pairs_parallel(
paired_results,
output_image_paths,
output_label_paths,
workers=8
)
- `load_image(path)`: Load medical images, with 4D→3D conversion if needed
- `resample_image(image, out_spacing, interpolator)`: Resample to the target spacing
- `get_region_of_interest(image, crop)`: Extract the ROI around the prostate
- `n4_bias_field_correction(image)`: Remove MRI bias field artifacts
- `normalize_image(image, method)`: Z-score or min-max normalization
- `combine_zonal_masks(mask, pz_val, tz_val)`: Combine PZ and TZ masks to form a whole-gland (WG) mask
- `swap_zonal_mask_values(mask, val1, val2)`: Swap mask label values
- `reorient_image(image, orientation)`: Reorient to a standard orientation
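For example, a minimal sketch of the zonal mask utilities (the label values PZ=1, TZ=2 and the file path are illustrative assumptions; check each dataset's label convention):

```python
from preprocessing.PreProcessor import (
    load_image,
    combine_zonal_masks,
    swap_zonal_mask_values,
)

# Load a zonal segmentation mask (hypothetical path)
mask = load_image("/path/to/datasets/case_0001_zones.nii.gz")

# Merge the PZ and TZ labels into a single whole-gland mask
# (assumed label convention: PZ=1, TZ=2)
wg_mask = combine_zonal_masks(mask, pz_val=1, tz_val=2)

# Swap label values to harmonize conventions across datasets
harmonized = swap_zonal_mask_values(mask, val1=1, val2=2)
```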
.
├── exploratoryAnalysis/ # Dataset exploration and analysis
│ ├── DataAnalyzer.py # Main analysis class
│ ├── explore_picai.ipynb # PICAI dataset exploration
│ ├── explore_decathlon.ipynb # Decathlon dataset exploration
│ └── explore_158.ipynb # Prostate158 dataset exploration
├── loadingData/ # Dataset preprocessing scripts
│ ├── build_data_picai.py # PICAI preprocessing
│ ├── build_data_decathlon.py # Decathlon preprocessing
│ ├── build_data_158.py # Prostate158 preprocessing
│ └── data.template.json # nnU-Net metadata template
├── preprocessing/ # Core preprocessing modules
│ ├── Pipeline.py # Preprocessing pipeline framework
│ ├── PreProcessor.py # Individual preprocessing functions
│ ├── Utils.py # Utility functions
│ └── TestPreprocessing.py # Visual testing functions
├── requirements.txt # Python dependencies
├── download.sh # Dataset download script
└── README.md # This file
A training script is provided to train a U-Net model on the preprocessed datasets.
The final training of the model, however, was done with the nnU-Net framework, which is highly recommended for its automated configuration, training, and evaluation. The 3D full-resolution (3d_fullres) U-Net configuration was used, which is well suited to volumetric data such as MRI. The model was trained on the T2-weighted images and corresponding masks across the five PICAI folds and tested on a holdout set.
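For reference, a typical nnU-Net v2 run looks like the sketch below (assuming the `Dataset001_picai` folder created earlier maps to dataset ID 1; adjust the environment paths to your setup):

```bash
# Tell nnU-Net where its raw, preprocessed, and results folders live
export nnUNet_raw="/path/to/output/nnUNet_raw"
export nnUNet_preprocessed="/path/to/output/nnUNet_preprocessed"
export nnUNet_results="/path/to/output/nnUNet_results"

# Verify the dataset, generate plans, and preprocess
nnUNetv2_plan_and_preprocess -d 1 --verify_dataset_integrity

# Train the 3d_fullres configuration on all five folds
for FOLD in 0 1 2 3 4; do
    nnUNetv2_train 1 3d_fullres $FOLD
done
```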
The parameters reported by the nnU-Net training script were as follows:
- Architecture: 3D-fullres U-Net
- Epochs: 1000
- Batch size: 2
- Learning rate: 0.01
- Optimizer: SGD
- Loss function: DiceCE (RobustCrossEntropyLoss + MemoryEfficientSoftDiceLoss). As defined in the nnU-Net paper, this is a combination of cross-entropy and soft Dice loss:

$$\mathcal{L}_{total} = \mathcal{L}_{CE} + \mathcal{L}_{Dice}, \qquad \mathcal{L}_{Dice} = -\frac{2}{|K|} \sum_{k \in K} \frac{\sum_{i \in I} u_i^k \, v_i^k}{\sum_{i \in I} u_i^k + \sum_{i \in I} v_i^k}$$

where $u$ is the softmax output of the network, $v$ is the one-hot encoding of the ground truth, $i$ runs over the voxels of the training batch, and $k$ over the classes.
- Weight decay: 3e-05
- Activation: LeakyReLU
- Patch size: 24 × 256 × 256
Results from the nnU-Net training on the PICAI dataset show promising performance, with competitive Dice scores across anatomical zones, comparable to state-of-the-art methods, which stand at around 0.90 for the whole gland, 0.87 for the transition zone (TZ), and 0.79 for the peripheral zone (PZ).
The reported metrics were calculated on the validation set of each of the five PICAI folds, and are as follows:
The global overview of the validation results across all folds shows a performance gap between the PZ and TZ, as the latter is easier to segment than the former.
The Dice score is defined as:

$$\mathrm{Dice} = \frac{2\,TP}{2\,TP + FP + FN}$$

IoU (Intersection over Union) is also reported, defined as:

$$\mathrm{IoU} = \frac{TP}{TP + FP + FN}$$
True positive (TP), false positive (FP), and false negative (FN) counts are also provided for each fold. They were calculated voxel-wise: TP is the number of voxels correctly predicted as part of the prostate zone, FP the number of voxels incorrectly predicted as part of the zone, and FN the number of voxels belonging to the zone that were not predicted as such.
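As an illustration of how these voxel-wise metrics relate, the following sketch computes them from a pair of binary masks with NumPy (nnU-Net reports them automatically; this is not the framework's own code):

```python
import numpy as np

def voxelwise_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Voxel-wise TP/FP/FN, Dice, and IoU for a binary zone mask."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = int(np.sum(pred & gt))    # zone voxels correctly predicted
    fp = int(np.sum(pred & ~gt))   # voxels wrongly predicted as zone
    fn = int(np.sum(~pred & gt))   # zone voxels that were missed
    union = tp + fp + fn
    # Convention: empty prediction and ground truth count as a perfect match
    dice = 2 * tp / (2 * tp + fp + fn) if union else 1.0
    iou = tp / union if union else 1.0
    return {"TP": tp, "FP": fp, "FN": fn, "Dice": dice, "IoU": iou}
```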
The per-case results show the distribution of Dice scores across all cases in the validation sets.
A fold comparison is provided (the third fold is missing as it wasn't reported in the nnU-Net training):
The remaining datasets have yet to be tested.
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (`git checkout -b feature/new-feature`)
- Commit your changes (`git commit -am 'Add new feature'`)
- Push to the branch (`git push origin feature/new-feature`)
- Create a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- PI-CAI Challenge for providing the PICAI dataset
- Prostate158 dataset contributors
- Medical Segmentation Decathlon organizers
- nnU-Net framework developers
- MONAI community for medical imaging AI tools
For training, the annotations derived from Bosma et al. were used: https://grand-challenge.org/algorithms/prostate-segmentation/
@article{PICAI_Study_design,
  author = {Anindo Saha and Jasper J. Twilt and Joeran S. Bosma and Bram van Ginneken and Derya Yakar and Mattijs Elschot and Jeroen Veltman and Jurgen Fütterer and Maarten de Rooij and Henkjan Huisman},
  title  = {{Artificial Intelligence and Radiologists at Prostate Cancer Detection in MRI: The PI-CAI Challenge (Study Protocol)}},
  year   = {2022},
  doi    = {10.5281/zenodo.6667655}
}