
Tutorial on ML-Based Image Processing for Electrochemical Applications

A hands-on Jupyter notebook tutorial for machine-learning-based image processing of electrochemical and scientific microscopy data, developed for the AKPIK session at DPG 2026 (Dresden, March 2026).


Table of Contents

  1. Introduction
  2. Credits / Contributors
  3. References
  4. Tutorial Notebooks
  5. Dependencies
  6. Getting Started
  7. License

Introduction

DPG2026 is a tutorial repository built around Jupyter notebooks, covering the full machine-learning pipeline for scientific image analysis — from raw data to segmented results.

The tutorial is structured around three main themes:

  • Image Preprocessing — normalization, denoising, contrast adjustment, edge detection, binarization, and morphological operations.
  • Synthetic Image Generation — classical augmentation, physics-based synthesis, DCGAN-based generation, and Stable Diffusion-based image-to-image synthesis.
  • Image Segmentation — U-Net training and inference, Segment Anything Model (SAM v1 & v2), NASA MicroNet, and particle tracking with TrackPy.

The tutorial targets electrochemical imaging applications (e.g., SEM, EBC, battery-material microscopy) but the methods generalize to any scientific imaging domain.

Event: DPG 2026, AKPIK session — Dresden, March 2026.


Credits / Contributors

  • Amir Omidvarnia — Forschungszentrum Jülich — a.omidvarnia@fz-juelich.de
  • Simone Koecher — Forschungszentrum Jülich — s.koecher@fz-juelich.de
  • Mobina Azimi — Forschungszentrum Jülich — m.azimi@fz-juelich.de

References

Datasets

Libraries


Tutorial Notebooks

The tutorial is split into three notebook groups, intended to be followed in order:

1. notebooks/preprocess/ — Image Preprocessing

  • preprocessing_basics.ipynb — Fundamentals of digital image processing: normalization, denoising, contrast enhancement, edge detection, binarization, morphology, and connected-component labeling.
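To give a feel for the kinds of operations this notebook covers, here is a minimal, dependency-free sketch of min-max normalization followed by fixed-threshold binarization. It is an illustration only, not the notebook's actual code, which works on real image arrays.

```python
def normalize(pixels):
    """Min-max normalize a flat list of pixel intensities to [0, 1]."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:  # constant image: map everything to 0
        return [0.0 for _ in pixels]
    return [(p - lo) / (hi - lo) for p in pixels]

def binarize(pixels, threshold=0.5):
    """Threshold normalized intensities into a binary mask (0 or 1)."""
    return [1 if p >= threshold else 0 for p in pixels]

raw = [12, 40, 200, 250, 90]   # toy grayscale values
norm = normalize(raw)          # scaled into [0, 1]
mask = binarize(norm)          # foreground/background split -> [0, 0, 1, 1, 0]
```

The notebook applies the same ideas with NumPy arrays and library routines, where thresholds are typically chosen automatically (e.g. Otsu's method) rather than fixed.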

2. notebooks/synth_data/ — Synthetic Data Generation

  • example_Aug.ipynb — Classical augmentation: geometric and intensity transforms to expand labeled datasets.
  • example_PB.ipynb — Physics-based synthesis: combining real backgrounds with parameterized particle models.
  • example_DCGAN.ipynb — Deep Convolutional GAN (DCGAN) for generating realistic synthetic microscopy images.
  • example_SDiff.ipynb — Stable Diffusion image-to-image synthesis for creating new realistic sample variations.

3. notebooks/segmentation/ — Image Segmentation & Tracking

  • example_unet.ipynb — Train and evaluate a U-Net segmentation model on synthetic/real pairs.
  • example_sam1.ipynb — Zero-shot and prompt-based segmentation using SAM v1.
  • example_sam2.ipynb — Segmentation in images and video using SAM 2.
  • example_nasa_micronet.ipynb — Apply NASA MicroNet pretrained models to electrochemical microscopy data.
  • example_trackpy.ipynb — Particle detection, trajectory linking, drift correction, and motion analysis using TrackPy.
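The trajectory-linking step in the TrackPy notebook can be understood as matching particle detections across consecutive frames. The sketch below is a deliberately simplified, dependency-free illustration of greedy nearest-neighbor linking within a search radius; it is not TrackPy's actual algorithm, which solves the assignment more globally and supports gap closing via a memory parameter.

```python
import math

def link_frames(frame_a, frame_b, search_range):
    """Greedily link each particle in frame_a to its nearest unclaimed
    neighbor in frame_b, provided it lies within search_range pixels.
    Returns (index_a, index_b) pairs; unmatched particles pair with None."""
    links = []
    taken = set()
    for i, (xa, ya) in enumerate(frame_a):
        best, best_d = None, search_range
        for j, (xb, yb) in enumerate(frame_b):
            if j in taken:
                continue
            d = math.hypot(xb - xa, yb - ya)
            if d <= best_d:
                best, best_d = j, d
        if best is not None:
            taken.add(best)
        links.append((i, best))
    return links

# Toy particle centroids (x, y): particle 0 drifts slightly between frames,
# while particle 1 jumps beyond the search radius and stays unlinked.
frame1 = [(10.0, 10.0), (50.0, 50.0)]
frame2 = [(11.0, 10.5), (90.0, 90.0)]
links = link_frames(frame1, frame2, search_range=5.0)  # [(0, 0), (1, None)]
```

In the notebook itself, detection and linking are done with TrackPy's own routines, which are robust to crowded fields where greedy matching would fail.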

How notebooks use YAML configuration and output folders

Most tutorial notebooks share a common configuration and directory layout driven by a single YAML file and a small helper module:

  • Central configuration: All generation and segmentation notebooks load parameters from a YAML file at the repository root (tutorial_parameters.yaml) using ConfigLoader from src/synth_data_module. Typical patterns look like:

    • config_path = repo_root / 'tutorial_parameters.yaml'
    • config = ConfigLoader(config_path)
    • Scalars and paths are then accessed as dictionary-style keys, e.g. config['pretrained_models_dir'] or config.get_dataset_params('EBC1').
  • Synthetic data notebooks (notebooks/synth_data/*):

    • ConfigLoader and PreparationManager read dataset- and method-specific blocks (e.g. PB, Aug, DCGAN, SDiff) from the YAML file.
    • New folders for preprocessed images, masks, and generated synthetic data are created under a configurable base directory (e.g. inside preprocessed_data/ and method-specific subfolders such as PB_EBC1, PB_SEM, etc.).
    • For each method, the notebooks initialise a PreparationManager and a SynthDataGenerator with repo_root, dataset, and method_name; these classes internally create and manage:
      • training and validation image/mask directories,
      • synthetic image and binary-mask output folders,
      • optional labelled-mask and augmentation folders (for advanced workflows).
  • Segmentation notebooks (notebooks/segmentation/*):

    • All segmentation notebooks (U-Net, SAM1, SAM2, NASA MicroNet, TrackPy) load the same tutorial_parameters.yaml via ConfigLoader to obtain shared paths for models and pretrained weights.
    • From this YAML, they derive repository-relative output locations, for example:
      • pretrained_model_dir = os.path.join(repo_root, config['pretrained_models_dir'], 'segment_anything1_META')
      • pretrained_model_dir = os.path.join(repo_root, config['pretrained_models_dir'], 'segment_anything2_META')
      • pretrained_model_dir = os.path.join(repo_root, config['pretrained_models_dir'], 'NASA_Micronet')
    • Each notebook then creates additional experiment-specific folders as needed, such as:
      • U-Net: output_dir for saving training curves, predictions, and masks.
      • SAM1/SAM2: model checkpoint directories under pretrained_models_dir for downloaded .pth/.pt files.
      • NASA MicroNet: model snapshots and fine-tuned checkpoints under the NASA-specific subfolder.

In practice, this design means you only have to edit the YAML file once (for data paths, model roots, and high-level hyperparameters); all notebooks pick up consistent settings and write their outputs into predictable, method-specific subdirectories under preprocessed_data/, Pretrained_models/, and synthetic data folders.
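The access pattern described above can be sketched as follows. Note that this ConfigLoader is a hypothetical stand-in, a thin wrapper around an already-parsed parameter dictionary; the real class lives in src/synth_data_module, parses tutorial_parameters.yaml itself, and may differ in detail.

```python
# Hypothetical stand-in for src/synth_data_module.ConfigLoader: the real
# class parses tutorial_parameters.yaml; here we wrap a plain dict so the
# notebook-style access pattern is visible without any file I/O.
class ConfigLoader:
    def __init__(self, params):
        self._params = params

    def __getitem__(self, key):
        # dictionary-style access to scalars and paths
        return self._params[key]

    def get_dataset_params(self, name):
        # per-dataset parameter block (e.g. 'EBC1')
        return self._params['datasets'][name]

# Toy parameters mirroring the keys the notebooks reference (illustrative
# values only; the real YAML defines the actual paths and hyperparameters).
config = ConfigLoader({
    'pretrained_models_dir': 'Pretrained_models',
    'datasets': {'EBC1': {'image_size': 256, 'method': 'PB'}},
})

models_dir = config['pretrained_models_dir']     # -> 'Pretrained_models'
ebc1 = config.get_dataset_params('EBC1')         # dataset-specific block
```

The notebooks then join such values onto repo_root (e.g. with os.path.join) to build the pretrained-model and output directories listed above.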


Dependencies

The tutorial targets Python 3.11. The full dependency list is in requirements.txt.


Getting Started

1. Download and Install VS Code

2. Install uv Package Manager

uv is a fast Python package installer and resolver.

Linux / macOS:

curl -LsSf https://astral.sh/uv/install.sh | sh

Windows:

powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

Verify:

uv --version

3. Clone the Repository

git clone https://jugit.fz-juelich.de/iet-1/dpg2026_mlip_tutorial.git
cd dpg2026_mlip_tutorial

4. Create a Virtual Environment with Python 3.11

uv venv .dpg2026 --python 3.11

5. Activate the Environment

Linux / macOS:

source .dpg2026/bin/activate

Windows:

.dpg2026\Scripts\activate

6. Install Dependencies

uv pip install --upgrade pip
uv pip install -r requirements.txt

Note on NASA MicroNet: Install manually with:

pip install git+https://github.com/nasa/pretrained-microscopy-models

Note on TrackPy: If you encounter compatibility issues, install from source:

uv pip install https://github.com/soft-matter/trackpy/archive/master.zip

7. Install the DPG2026 Package

pip install -e .

8. Configure VS Code

  1. Open the Command Palette: Ctrl+Shift+P (macOS: Cmd+Shift+P)
  2. Select "Python: Select Interpreter"
  3. Choose ./.dpg2026/bin/python (or browse to the path)

To set the Jupyter kernel:

  1. Open a .ipynb notebook in VS Code
  2. Click "Select Kernel" (top-right corner)
  3. Choose "Python Environments" → select .dpg2026

9. Launch Jupyter (Optional)

To run notebooks outside VS Code:

source .dpg2026/bin/activate
jupyter notebook --no-browser --ip=0.0.0.0 --port=8888

Notes for Clusters with AMD GPUs

Some clusters use AMD GPUs. In that case, load a ROCm module before starting Python, for example:

module purge
module load rocm/6.4

Install tensorflow-rocm instead of standard TensorFlow:

pip install tensorflow-rocm

For PyTorch with ROCm support:

pip install --index-url https://download.pytorch.org/whl/rocm6.1 torch torchvision torchaudio

Notes for Clusters with NVIDIA GPUs

Ensure the NVIDIA CUDA toolkit is loaded:

module load cuda

Use the standard packages:

pip install tensorflow

License

GNU GPL v3.0

This project is licensed under the GNU General Public License v3.0.

See the full license text at: https://www.gnu.org/licenses/gpl-3.0.html
