Prostate segmentation of T2 weighted Magnetic Resonance Images using Deep Learning methods with full data acquisition and preprocessing pipeline.

usbt0p/mri-prostate-segmentation-deeplearning

Anatomical Segmentation of the Prostate Gland from MRI Images using Deep Learning

Python 3.8+ License: MIT

Overview

This repository provides a complete pipeline for anatomical segmentation of the prostate gland from MRI images using deep learning, together with the reports from training a segmentation model with the nnU-Net framework. It focuses on multi-dataset preprocessing, analysis, and preparation of data for frameworks such as nnU-Net and MONAI. A U-Net training script is also included, but the reported metrics and results come from nnU-Net. Dataset exploration notebooks and modules for data analysis and manipulation are provided as well.

The results, discussed in the Results section, show promising performance similar to state-of-the-art methods.

sample segmentation

Goal and motivation

Automatic segmentation of prostate zones (PZ, TZ) is essential since ≈70–75% of clinically significant prostate cancers (csPCa) originate in the peripheral zone (PZ).

Also, the PI-RADS scoring system depends on anatomical zoning—DWI/ADC for lesions in PZ, T2W for TZ—requiring accurate zonal masks for correct lesion assessment.

Furthermore, manual zonal delineation on T2W images is time-consuming, tedious, and prone to inter- and intra-observer variability, especially given anatomical heterogeneity and low tissue contrast.

Recent reviews report Dice scores of about 0.90 (whole gland), 0.87 (TZ), and 0.79 (PZ), comparable to expert radiologists across varied datasets. The objective here is to develop a model that reaches similar performance, and to make it more robust by using multiple datasets.

Key Features

  • Multi-dataset support: Works with PICAI, Prostate158, and Medical Segmentation Decathlon datasets
  • Comprehensive preprocessing pipeline: Includes resampling, ROI extraction, N4 bias correction, and other data preparation steps
  • Parallel processing: Efficient batch processing with progress tracking
  • Data exploration tools: Interactive analysis and visualization of medical imaging datasets
  • nnU-Net compatibility: Automatic data structuring following nnU-Net conventions
  • Zonal segmentation: Support for peripheral zone (PZ) and transition zone (TZ) segmentation

Supported Datasets

| Dataset | Description | Cases | Modalities | Link |
|---|---|---|---|---|
| PICAI | PI-CAI Challenge dataset with expert annotations for prostate cancer detection | 1,500 public cases | T2w, DWI, ADC | pi-cai.grand-challenge.org |
| Prostate158 | Expert-annotated 3T MRI dataset for anatomical zones and cancer detection | 158 cases | T2w, DWI, ADC | GitHub |
| Medical Decathlon | Task05 Prostate segmentation task from Medical Segmentation Decathlon | 48 cases | T2w, ADC | medicaldecathlon.com |

Installation

Prerequisites

  • Python 3.8 or higher
  • CUDA-compatible GPU (for training; preprocessing runs on CPU in reasonable time)

Dependencies Installation

  1. Clone the repository:
git clone https://github.com/your-username/mri-prostate-segmentation-deeplearning.git
cd mri-prostate-segmentation-deeplearning
  2. Install required packages:
pip install -r requirements.txt

Core Dependencies

  • monai: Medical imaging AI framework
  • nibabel: NIfTI file reading and writing
  • simpleitk: Medical image processing and analysis
  • matplotlib: Data visualization
  • numpy: Numerical computing
  • pandas: Data analysis and manipulation
  • ipywidgets: Interactive Jupyter notebook widgets

Configuration

Dataset Setup

  1. Download datasets using the provided script:
./download.sh

This downloads the PICAI dataset folds from Zenodo. The other datasets (Prostate158 and Medical Decathlon) are not fetched by this script; download them separately if you want to use them.

  2. Configure data paths in the preprocessing scripts:
# Update paths in loadingData/build_data_*.py
RAW_DATA_ROOT = "/path/to/your/datasets"
OUT_ROOT = "/path/to/output/nnUNet_raw/"
  3. Set up output directories for nnU-Net format:
mkdir -p /path/to/output/nnUNet_raw/Dataset001_picai/{imagesTr,labelsTr}
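
The directory layout above can also be prepared from Python. The following is a minimal sketch, not the repository's build script: `init_nnunet_dataset` and `nnunet_image_name` are hypothetical helper names, and the label values and channel name passed in are assumptions that must match your own masks. The `dataset.json` keys follow the nnU-Net v2 raw-data convention.

```python
import json
from pathlib import Path


def init_nnunet_dataset(out_root, dataset_name, labels, channel_names,
                        num_training, file_ending=".nii.gz"):
    """Create imagesTr/labelsTr and write a minimal nnU-Net v2 dataset.json.

    Hypothetical helper for illustration; not part of this repository.
    """
    dataset_dir = Path(out_root) / dataset_name
    for sub in ("imagesTr", "labelsTr"):
        (dataset_dir / sub).mkdir(parents=True, exist_ok=True)

    meta = {
        "channel_names": channel_names,  # e.g. {"0": "T2"} (assumed name)
        "labels": labels,                # label name -> integer value in the masks
        "numTraining": num_training,
        "file_ending": file_ending,
    }
    (dataset_dir / "dataset.json").write_text(json.dumps(meta, indent=2))
    return dataset_dir


def nnunet_image_name(case_id, channel=0, file_ending=".nii.gz"):
    """Image files carry a 4-digit channel suffix; label files do not."""
    return f"{case_id}_{channel:04d}{file_ending}"
```

A call such as `init_nnunet_dataset(OUT_ROOT, "Dataset001_picai", {"background": 0, "PZ": 1, "TZ": 2}, {"0": "T2"}, num_training=n)` would then produce the structure nnU-Net expects; the PZ/TZ values shown here are placeholders.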

Preprocessing Parameters

Key parameters can be configured in the build scripts:

# Resampling spacing (x, y, z) in mm
spacing = (0.5, 0.5, 3.0)

# ROI crop factor (e.g. 0.65 = 65% crop around center)
crop_factor = 0.65

# Label value swapping for consistency across datasets
bool_swap_mask_values = False
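
To see what these parameters imply for the voxel grid, the sketch below computes the grid size after resampling and the index range of a centered crop. It is an illustration under stated assumptions, not the repository's code: both function names are hypothetical, and it assumes that the crop factor is the fraction of the in-plane extent that is kept while the slice axis is left whole.

```python
def resampled_size(size, spacing, out_spacing=(0.5, 0.5, 3.0)):
    """Grid size that preserves the physical extent at the new spacing."""
    return tuple(
        max(1, int(round(n * s / o)))
        for n, s, o in zip(size, spacing, out_spacing)
    )


def center_crop_bounds(size, crop_factor=0.65):
    """(start, stop) indices per axis for a centered in-plane crop.

    Assumption: only x and y are cropped; z is kept whole.
    """
    bounds = []
    for axis, n in enumerate(size):
        if axis < 2:
            keep = max(1, int(round(n * crop_factor)))
            start = (n - keep) // 2
            bounds.append((start, start + keep))
        else:
            bounds.append((0, n))
    return bounds
```

For instance, a 640 × 640 × 20 volume with spacing (0.3, 0.3, 3.6) mm covers the same physical extent as a 384 × 384 × 24 grid at (0.5, 0.5, 3.0) mm.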

Usage

Quick Start

  1. Data Exploration:
# Open Jupyter notebooks for dataset analysis
jupyter notebook exploratoryAnalysis/explore_picai.ipynb
  2. Dataset Preprocessing:
# Preprocess PICAI dataset
python loadingData/build_data_picai.py

# Preprocess other datasets
python loadingData/build_data_decathlon.py
python loadingData/build_data_158.py
  3. Preprocessing Pipeline Testing:
# Test individual preprocessing functions
python preprocessing/TestPreprocessing.py

Detailed Workflow

1. Dataset Exploration

Use the DataAnalyzer class to explore datasets. Example:

from exploratoryAnalysis.DataAnalyzer import DataAnalyzer

# Initialize analyzer
analyzer = DataAnalyzer("/path/to/datasets")
analyzer.regex = ".*_t2w.mha$"  # Filter for T2-weighted images

# Collect metadata
df = analyzer.collect_metadata_from_subdirs("picai_folds/picai_images_fold0")

# Visualize images
analyzer.show_image("path/to/image.mha", save="output.png")

# Generate intensity histograms
analyzer.image_intensity_histogram("path/to/image.mha", plot=True)

2. Preprocessing Pipeline

Create custom preprocessing pipelines. Example:

import SimpleITK as sitk  # needed for the interpolator constants below

from preprocessing.Pipeline import Pipeline
from preprocessing.PreProcessor import *

# Initialize pipeline
pipeline = Pipeline()

# Add preprocessing steps (each add() returns the pipeline, so calls chain)
pipeline.add(load_image) \
        .add(resample_image, interpolator=sitk.sitkLinear, out_spacing=(0.5, 0.5, 3.0)) \
        .add(get_region_of_interest, crop=0.65) \
        .add(n4_bias_field_correction) \
        .add(normalize_image, method="minmax")

# Process images; image_paths is a list of paths to the input volumes
processed_images = pipeline.run(image_paths, parallel=True, max_workers=4)
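
The chaining pattern used above can be sketched in a few lines. `SketchPipeline` below is a hypothetical stand-in that mirrors the `add(...)` / `run(...)` calls shown in this README; it is not the code in `preprocessing/Pipeline.py`, and the choice of a thread pool is an assumption made only to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


class SketchPipeline:
    """Hypothetical illustration of a chainable preprocessing pipeline."""

    def __init__(self):
        self.steps = []

    def add(self, func, **kwargs):
        # Bind keyword arguments now; return self so calls can be chained
        self.steps.append(partial(func, **kwargs))
        return self

    def _run_one(self, item):
        # Feed the output of each step into the next one
        for step in self.steps:
            item = step(item)
        return item

    def run(self, items, parallel=False, max_workers=4):
        if not parallel:
            return [self._run_one(item) for item in items]
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # map() returns results in input order
            return list(pool.map(self._run_one, items))
```

Because every step takes the previous step's output as its first argument, any function of the form `f(image, **options)` can be added without further glue code. For CPU-bound steps a process pool may be a better fit than threads.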

3. Batch Processing

Process large datasets efficiently. Example:

# Process image-label pairs in parallel
import SimpleITK as sitk

from preprocessing.Pipeline import Pipeline
from preprocessing.PreProcessor import load_image, resample_image
from preprocessing.Utils import preprocess_pairs_parallel, save_pairs_parallel

# Define pipelines for images and labels
# Labels use nearest-neighbor interpolation so no new label values are created
img_pipeline = Pipeline().add(load_image).add(resample_image)
lbl_pipeline = Pipeline().add(load_image).add(resample_image, interpolator=sitk.sitkNearestNeighbor)

# Process pairs (image_paths and label_paths must be in the same case order)
paired_results = preprocess_pairs_parallel(
    list(zip(image_paths, label_paths)),
    img_pipeline,
    lbl_pipeline,
    workers=8
)

# Save results
out_images, out_labels = save_pairs_parallel(
    paired_results,
    output_image_paths,
    output_label_paths,
    workers=8
)
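
Zipping the two path lists only works if they are aligned case by case. The sketch below shows one way to build aligned pairs by matching case IDs taken from the file names. `match_pairs` is a hypothetical helper, and the suffixes are assumptions: `_t2w.mha` follows the regex used in the exploration example, while `.nii.gz` for labels should be adjusted to your data.

```python
from pathlib import Path


def match_pairs(image_paths, label_paths,
                image_suffix="_t2w.mha", label_suffix=".nii.gz"):
    """Return (image, label) pairs that share a case ID, sorted by case ID.

    Hypothetical helper; the suffix conventions are assumptions.
    """
    def case_id(path, suffix):
        name = Path(path).name
        return name[: -len(suffix)] if name.endswith(suffix) else None

    images = {case_id(p, image_suffix): p for p in image_paths}
    labels = {case_id(p, label_suffix): p for p in label_paths}
    # Drop files that do not follow the expected naming scheme
    images.pop(None, None)
    labels.pop(None, None)

    shared = sorted(images.keys() & labels.keys())
    return [(images[c], labels[c]) for c in shared]
```

Cases that have an image but no label (or the reverse) are silently skipped here; comparing `len(pairs)` with the number of input images is a quick check for missing annotations.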

Available Preprocessing Functions

  • load_image(path): Load medical images with 4D→3D conversion if needed
  • resample_image(image, out_spacing, interpolator): Resample to target spacing
  • get_region_of_interest(image, crop): Extract ROI around prostate
  • n4_bias_field_correction(image): Remove MRI bias field artifacts
  • normalize_image(image, method): Z-score or min-max normalization
  • combine_zonal_masks(mask, pz_val, tz_val): Combine PZ and TZ masks to form a WG mask
  • swap_zonal_mask_values(mask, val1, val2): Swap mask label values
  • reorient_image(image, orientation): Reorient to standard orientation
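
The two mask operations in the list above amount to label remapping. The sketch below shows the idea on a flat list of voxel labels. It is an illustration only: the function names differ from the repository's on purpose, the default values `pz_val=1` and `tz_val=2` are assumptions, and the actual functions presumably operate on image objects rather than lists.

```python
def combine_zonal_labels(labels, pz_val=1, tz_val=2, wg_val=1):
    """Map PZ and TZ voxels to one whole-gland label; everything else becomes 0."""
    return [wg_val if v in (pz_val, tz_val) else 0 for v in labels]


def swap_labels(labels, val1, val2):
    """Exchange two label values in one pass, leaving other values untouched."""
    mapping = {val1: val2, val2: val1}
    return [mapping.get(v, v) for v in labels]
```

The swap is done in a single pass through a lookup table. Replacing `val1` with `val2` first and then `val2` with `val1` would merge both zones into one label.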

Project Structure

.
├── exploratoryAnalysis/          # Dataset exploration and analysis
│   ├── DataAnalyzer.py           # Main analysis class
│   ├── explore_picai.ipynb       # PICAI dataset exploration
│   ├── explore_decathlon.ipynb   # Decathlon dataset exploration
│   └── explore_158.ipynb         # Prostate158 dataset exploration
├── loadingData/                  # Dataset preprocessing scripts
│   ├── build_data_picai.py       # PICAI preprocessing
│   ├── build_data_decathlon.py   # Decathlon preprocessing
│   ├── build_data_158.py         # Prostate158 preprocessing
│   └── data.template.json        # nnU-Net metadata template
├── preprocessing/                # Core preprocessing modules
│   ├── Pipeline.py               # Preprocessing pipeline framework
│   ├── PreProcessor.py           # Individual preprocessing functions
│   ├── Utils.py                  # Utility functions
│   └── TestPreprocessing.py      # Visual testing functions
├── requirements.txt              # Python dependencies
├── download.sh                   # Dataset download script
└── README.md                     # This file

Training

A training script is provided to train a U-Net model on the preprocessed datasets.

The final training of the model, however, was done using the nnU-Net framework, which is highly recommended for its automated training and evaluation capabilities. The 3D-fullres U-Net architecture was used, which is suitable for volumetric data like MRI. Five folds of the PICAI dataset were used for training, with the model trained on the T2-weighted images and corresponding masks, and tested on a holdout set.

The parameters reported by the nnU-Net training script were as follows:

  • Architecture: 3D-fullres U-Net

  • Epochs: 1000

  • Batch size: 2

  • Learning rate: 0.01

  • Optimizer: SGD

  • Loss function: DiceCE (RobustCrossEntropyLoss + MemoryEfficientSoftDiceLoss)

    • As defined in the nnU-Net paper, the loss is the sum of a soft Dice term and a cross-entropy term:
$$L_{total} = L_{dice} + L_{CE}$$
$$L_{dice} = -\frac{2}{|K|} \sum_{k \in K} \frac{\sum_{i \in I} u_i^k v_i^k}{\sum_{i \in I} u_i^k + \sum_{i \in I} v_i^k}$$

where $u$ is the softmax output of the network and $v$ is a one-hot encoding of the ground truth segmentation map; $i \in I$ runs over the voxels of the training patch or batch and $k \in K$ over the classes.

  • Weight decay: 3e-05
  • Activation: LeakyReLU
  • Patch size: 24 × 256 × 256
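
The Dice term of the loss listed above can be checked numerically with a few lines of plain Python. This is a sketch of the formula, not nnU-Net's implementation: the `eps` terms are additions for numerical safety that do not appear in the formula, and the cross-entropy shown is the standard mean over voxels, which this README does not spell out.

```python
import math


def soft_dice_loss(u, v, eps=1e-8):
    """u, v: lists of shape I x K (softmax outputs, one-hot targets)."""
    num_classes = len(u[0])
    total = 0.0
    for k in range(num_classes):
        intersection = sum(ui[k] * vi[k] for ui, vi in zip(u, v))
        denominator = sum(ui[k] for ui in u) + sum(vi[k] for vi in v)
        # eps avoids division by zero for classes absent from u and v
        total += intersection / (denominator + eps)
    return -2.0 / num_classes * total


def cross_entropy_loss(u, v, eps=1e-12):
    """Standard cross-entropy averaged over voxels (assumed definition)."""
    total = 0.0
    for ui, vi in zip(u, v):
        total -= sum(vk * math.log(uk + eps) for uk, vk in zip(ui, vi))
    return total / len(u)


def dice_ce_loss(u, v):
    return soft_dice_loss(u, v) + cross_entropy_loss(u, v)
```

With this sign convention a perfect prediction gives a Dice term of -1, so negative total loss values during training are expected rather than a sign of a bug.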

Results

Results from the nnU-Net training on the PICAI dataset showed promising performance, with the model achieving competitive Dice scores across different anatomical zones similar to state-of-the-art methods, which stand at around 0.90 for the whole gland, 0.87 for the transition zone (TZ), and 0.79 for the peripheral zone (PZ).

The reported metrics were calculated on the validation set of each of the five folds of the PICAI dataset.

The global overview shows the results on the validation set for all folds. Performance differs between the PZ and the TZ, since the TZ is easier to segment than the PZ.

The dice score formulation is as follows:

$$\text{Dice} = \frac{2 \cdot |A \cap B|}{|A| + |B|}$$

where $A$ is the predicted mask and $B$ is the ground truth mask.

IoU (Intersection over Union) is also reported, which is defined as:

$$\text{IoU} = \frac{|A \cap B|}{|A \cup B|}$$

True positive (TP), false positive (FP), and false negative (FN) counts are also provided for each fold. They were calculated voxel-wise: TP is the number of voxels correctly predicted as part of a prostate zone, FP is the number of voxels incorrectly predicted as part of that zone, and FN is the number of voxels that belong to the zone but were not predicted as such.
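
Both metrics follow directly from the voxel-wise counts, since $|A \cap B| = TP$, $|A| = TP + FP$ and $|B| = TP + FN$. The sketch below makes this explicit for one label on flat lists of voxel labels. The function names are hypothetical, and returning NaN when a zone is absent from both masks is a convention chosen here, not something stated in this README.

```python
def voxel_counts(pred, truth, label):
    """Voxel-wise TP, FP and FN for one label value."""
    tp = fp = fn = 0
    for p, t in zip(pred, truth):
        if p == label and t == label:
            tp += 1
        elif p == label:
            fp += 1
        elif t == label:
            fn += 1
    return tp, fp, fn


def dice_and_iou(tp, fp, fn):
    """Dice = 2TP / (2TP + FP + FN), IoU = TP / (TP + FP + FN)."""
    if tp + fp + fn == 0:
        # Zone absent from both masks: undefined (chosen convention)
        return float("nan"), float("nan")
    dice = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return dice, iou
```

The two scores are monotonically related through Dice = 2·IoU / (1 + IoU), so they always rank predictions for a single case in the same order.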

global overview

The per-case results show the distribution of Dice scores across all cases in the validation sets.

percase

A fold comparison is provided (the third fold is missing as it wasn't reported in the nnU-Net training):

fold comparison

The model has not yet been tested on the remaining datasets.

Best case example (dice score ~0.98)

Prediction

best case example

Ground Truth

best case example ground truth

Worst case example (dice score ~0.7)

worst case example

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/new-feature)
  3. Commit your changes (git commit -am 'Add new feature')
  4. Push to the branch (git push origin feature/new-feature)
  5. Create a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments


Citations

For training, the annotations derived from Bosma et al. were used. https://grand-challenge.org/algorithms/prostate-segmentation/

@article{PICAI_Study_design,
  author={Anindo Saha AND Jasper J. Twilt AND Joeran S. Bosma AND Bram van Ginneken AND Derya Yakar AND Mattijs Elschot AND Jeroen Veltman AND Jurgen Fütterer AND Maarten de Rooij AND Henkjan Huisman},
  title={{Artificial Intelligence and Radiologists at Prostate Cancer Detection in MRI: The PI-CAI Challenge (Study Protocol)}},
  year={2022},
  doi={10.5281/zenodo.6667655}
}
