A generative model is a type of machine learning model that can generate new data similar to the data it was trained on [1][2]. This tutorial gives a short introduction to diffusion models, which are a specific type of generative model.
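A diffusion model is based on the idea of incrementally inverting a forward process that adds noise to data. In this forward process, the input data is gradually corrupted with noise over many small steps until it is close to pure noise; the model is then trained to undo these steps one at a time.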
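As a rough illustration of the forward (noising) process, the sketch below uses the closed-form sampling step and the linear variance schedule from the DDPM paper by Ho et al. It is plain NumPy and is not the code used by denoising-diffusion-pytorch; the function names `make_alpha_bar` and `add_noise` and the random stand-in image are hypothetical, chosen only for this example.

```python
import numpy as np


def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear variance schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def add_noise(x0, t, alpha_bar, rng):
    """Sample x_t from x_0 in one shot:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I).
    """
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps


rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()

# Stand-in for a single image normalized to [-1, 1] (assumption for this demo).
x0 = rng.uniform(-1.0, 1.0, size=(64, 64))

# The fraction of the original signal that survives shrinks as t grows.
for t in (0, 499, 999):
    xt, _ = add_noise(x0, t, alpha_bar, rng)
    print(f"t={t:4d}  signal fraction sqrt(alpha_bar)={np.sqrt(alpha_bar[t]):.4f}")
```

At the last step almost none of the original image remains, which is why sampling can start from pure Gaussian noise and apply the learned denoising steps in reverse.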
We will use denoising-diffusion-pytorch, a diffusion model library written by lucidrains. For training we will use lightning.
First, clone and cd into the repository:
git clone https://github.com/weigertlab/diffusion_model_tutorial.git
cd diffusion_model_tutorial
If you are using BARD, open the notebook in VSCode as usual and choose the kernel diffusion.
Otherwise, all setup, including environment creation and data download, can be done by running this command:
source setup.sh
Generative models are typically trained on a dataset of (unlabeled) images. In this tutorial we will use two example datasets, showing images of
- flywing membrane, or
- dual color zebrafish retina.
Please choose one of the two datasets to train your own models. If you want to train a model on your own data, you can skip these steps. Note that the 2D images are provided as an npz file, which is a compressed archive of NumPy arrays. For your custom data, you could also use a folder of TIFF files.
1. Flywing membrane [3]
2. Dual color zebrafish retina [4]
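To check what a downloaded npz file contains before training, something like the following sketch may help. The array names stored in the tutorial's archives are not stated here, so the helper simply lists whatever is present; `load_npz_images`, the file name `demo.npz`, and the key `images` are hypothetical and used only for this self-contained demo.

```python
import os
import tempfile

import numpy as np


def load_npz_images(path):
    """Return a dict mapping each array name in the archive to its array."""
    with np.load(path) as archive:
        return {name: archive[name] for name in archive.files}


# Build a tiny synthetic archive so the example runs without the real data.
# "images" is a made-up key; inspect the real file to find its actual keys.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.npz")
    np.savez_compressed(demo_path, images=np.zeros((4, 32, 32), dtype=np.float32))
    arrays = load_npz_images(demo_path)

# Print name, shape, and dtype of every array found in the archive.
for name, arr in arrays.items():
    print(name, arr.shape, arr.dtype)
```

For the real datasets, replace `demo_path` with the path of the downloaded file and use the printed names to pick the array that holds the images.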
See diffusion.ipynb for how to train a diffusion model on one of the two datasets.
Footnotes
1. Ho, Jonathan, Ajay Jain, and Pieter Abbeel. "Denoising diffusion probabilistic models." Advances in Neural Information Processing Systems 33 (2020): 6840-6851.
2. Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. "Score-Based Generative Modeling through Stochastic Differential Equations." arXiv preprint arXiv:2011.13456 (2020).
3. Prakash, M., Buchholz, T.-O., Schmidt, D., Krull, A., & Jug, F. (2020). Flywing (noise 0) dataset for microscopy image denoising and segmentation benchmark as used in DenoiSeg paper. Zenodo dataset.
4. Martin Weigert, Uwe Schmidt, Tobias Boothe, Andreas Müller, Alexandr Dibrov, Akanksha Jain, Benjamin Wilhelm, Deborah Schmidt, Coleman Broaddus, Sian Culley, Mauricio Rocha-Martins, Fabián Segovia-Miranda, Caren Norden, Ricardo Henriques, Marino Zerial, Michele Solimena, Jochen Rink, Pavel Tomancak, Loic Royer, Florian Jug, Eugene W. Myers. "Content Aware Image Restoration: Pushing the Limits of Fluorescence Microscopy." Dataset.