ML4MIP is a machine learning framework designed for medical image processing tasks such as training, validation, inference, preprocessing, postprocessing, and graph extraction. The project is built around a structured workflow and utilizes the Hydra configuration manager for flexible configuration handling.
- Training: Train deep learning models on medical imaging datasets.
- Validation: Evaluate trained models using validation datasets.
- Preprocessing: Prepare datasets for training and inference.
- Inference: Run inference on new medical images.
- Graph Extraction: Extract graph representations from medical images.
- Postprocessing: Apply filtering and cleanup operations on model outputs.
- MLFlow Integration: Log experiments, model parameters, and metrics.
ML4MIP uses `pyproject.toml` for dependency management. To install the project and its dependencies, follow these steps:
```bash
# Clone the repository
git clone [email protected]:pvmeng/ML4MIP.git
cd ML4MIP

# Create and activate a virtual environment (optional but recommended)
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`

# Install the package
pip install .
```
Alternatively, install it in editable mode for development:

```bash
pip install -e .
```
ML4MIP provides a set of command-line scripts for the tasks above. You can inspect each command's parameters using, for example:

```bash
train --help
```
The default configuration, defined in `conf/config.yaml`, looks like this:

```yaml
ml_flow_uri: file://${hydra:runtime.cwd}/runs
model_dir: ${hydra:runtime.cwd}/models
model_tag: unet.pt
batch_size: 16
lr: 0.01
num_epochs: 100
model:
  model_type: UNETMONAI2
  model_path: null
  base_model_jit_path: null
  checkpoint_path: null
dataset:
  train:
    data_dir: ${hydra:runtime.cwd}/data/rand_patch_96
    mask_dir: ${hydra:runtime.cwd}/data/rand_patch_96
    image_affix:
    - ''
    - .img.nii.gz
    mask_affix:
    - ''
    - .label.nii.gz
    transform: PATCH_POS_CENTER
    size:
    - 96
    - 96
    - 96
    train: true
    split_ratio: 0.9
    target_pixel_dim:
    - 0.35
    - 0.35
    - 0.5
    target_spatial_size:
    - 600
    - 600
    - 280
    sigma_ratio: 0.1
    pos_center_prob: 0.75
    max_samples: null
    cache: false
    cache_pooling: 0
    mask_operation: STD
    max_epochs: 40
    grouped: true
  val:
    data_dir: /data/training_data
    mask_dir: /data/training_data
    image_affix:
    - ''
    - .img.nii.gz
    mask_affix:
    - ''
    - .label.nii.gz
    transform: STD
    size:
    - 96
    - 96
    - 96
    train: true
    split_ratio: 0.9
    target_pixel_dim:
    - 0.35
    - 0.35
    - 0.5
    target_spatial_size:
    - 600
    - 600
    - 280
    sigma_ratio: 0.1
    pos_center_prob: 0.75
    max_samples: 2
    cache: false
    cache_pooling: 0
    mask_operation: STD
    max_epochs: 1
    grouped: false
visualize_model: true
visualize_model_val_batches: 1
visualize_model_train_batches: 4
plot_3d: true
extract_graph: false
epoch_profiling_torch: false
epoch_profiling_cpy: false
inference:
  mode: SLIDING_WINDOW
  sw_size:
  - 96
  - 96
  - 96
  sw_batch_size: 4
  sw_overlap: 0.25
  model_input_size:
  - 96
  - 96
  - 96
loss:
  loss_type: CE_DICE
  lambda_dice: 1.0
  lambda_ce: 0.3
  cedice_batch: false
  alpha: 0.5
scheduler:
  scheheduler_type: LINEARLR
  linear_start_factor: 1.0
  linear_end_factor: 0.01
  linear_total_iters: null
  resume_schedule: true
```
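The `CE_DICE` loss combines a Dice term and a cross-entropy term, weighted by `lambda_dice` and `lambda_ce`. As a rough illustration of that weighting scheme (a minimal NumPy sketch of the general technique, not ML4MIP's actual implementation):

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss for binary probability maps: 1 - Dice coefficient."""
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def ce_dice_loss(pred, target, lambda_dice=1.0, lambda_ce=0.3, eps=1e-6):
    """Weighted sum of binary cross-entropy and Dice, mirroring loss_type: CE_DICE."""
    p = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    ce = -np.mean(target * np.log(p) + (1 - target) * np.log(1 - p))
    return lambda_dice * dice_loss(pred, target, eps) + lambda_ce * ce

# A perfect prediction drives both terms toward zero
pred = np.array([1.0, 1.0, 0.0, 0.0])
target = np.array([1.0, 1.0, 0.0, 0.0])
loss = ce_dice_loss(pred, target)
```

With the default weights above, the Dice term dominates and cross-entropy acts as a smoother auxiliary signal.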
ML4MIP uses Hydra to manage configurations. All parameters can be modified in `conf/config.yaml`, and any setting can be overridden directly on the command line:

```bash
train batch_size=16 lr=0.01 num_epochs=100
```
The following entry points are available:
- `train`
- `validate`
- `inference`
- `preprocessing`
- `extract_graph`
- `postprocessing`
For dataset settings and advanced configurations, refer to `conf/config.yaml`.
ML4MIP integrates with MLFlow for tracking experiments. Logs, model checkpoints, and evaluation metrics are stored under `ml_flow_uri`, which can be set in the configuration.
```
src/          # Main codebase for pipeline components and workflows
experiments/  # Jupytext notebooks for various experimental analyses
```