<h2>Part 2 – Implementing the UNet from scratch</h2>
Now that we know how a UNet can be used to generate images in a denoising model, we will implement one from scratch. More specifically, we will attempt to generate digits similar to those in the MNIST dataset from pure noise, using a denoising UNet that we will create.
<h3>Training an Unconditioned UNet</h3>
The most basic denoiser is a one-step denoiser. Formally, given a noisy image <code>z</code>, we aim to train a denoiser <code>D<sub>θ</sub>(z)</code> that maps it to the clean image <code>x</code>. To do this, we minimize the L<sup>2</sup> loss E<sub>z,x</sub>||D<sub>θ</sub>(z) - x||<sup>2</sup> during training.<br>
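As a quick sanity check of the loss above, here is a minimal numpy sketch of the batched L<sup>2</sup> loss (the array shapes and the toy "denoiser" output are illustrative assumptions, not the actual model):

```python
import numpy as np

def l2_loss(denoised, clean):
    # Estimate E_{z,x} ||D(z) - x||^2 over a batch:
    # sum squared error per image, then average over the batch.
    return np.mean(np.sum((denoised - clean) ** 2, axis=(1, 2)))

# Toy batch of 4 "images" of size 28x28 (MNIST-shaped)
clean = np.zeros((4, 28, 28))
denoised = clean + 0.1  # a hypothetical denoiser that is off by 0.1 everywhere
print(l2_loss(denoised, clean))  # 0.1^2 * 28 * 28 = 7.84 (up to float error)
```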
<br>To create a noisy image, we can use the process z = x + σε, where σ ∈ [0, 1] and ε ~ 𝒩(0, 1). Here, 𝒩 is the standard normal distribution. To visualize the kind of images this process produces, below is an example of an MNIST digit with progressively more noise as σ increases from 0 to 1:
<div class="image-row">
<figure>
<img src="images/unet/00.png" alt="00.png" />
<figcaption>σ = 0.0</figcaption>
</figure>
<figure>
<img src="images/unet/02.png" alt="02.png" />
<figcaption>σ = 0.2</figcaption>
</figure>
<figure>
<img src="images/unet/04.png" alt="04.png" />
<figcaption>σ = 0.4</figcaption>
</figure>
<figure>
<img src="images/unet/05.png" alt="05.png" />
<figcaption>σ = 0.5</figcaption>
</figure>
<figure>
<img src="images/unet/06.png" alt="06.png" />
<figcaption>σ = 0.6</figcaption>
</figure>
<figure>
<img src="images/unet/08.png" alt="08.png" />
<figcaption>σ = 0.8</figcaption>
</figure>
<figure>
<img src="images/unet/10.png" alt="10.png" />
<figcaption>σ = 1.0</figcaption>
</figure>
</div>
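The noising process z = x + σε can be sketched in a few lines of numpy (the zero image below is just a stand-in for an MNIST digit in [0, 1]):

```python
import numpy as np

def add_noise(x, sigma, rng=None):
    """Apply z = x + sigma * eps, with eps ~ N(0, 1) sampled per pixel."""
    rng = np.random.default_rng() if rng is None else rng
    return x + sigma * rng.standard_normal(x.shape)

rng = np.random.default_rng(0)
x = np.zeros((28, 28))  # stand-in for an MNIST digit
for sigma in [0.0, 0.2, 0.4, 0.5, 0.6, 0.8, 1.0]:
    z = add_noise(x, sigma, rng)
    print(sigma, round(z.std(), 2))  # sample std tracks sigma
```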
To start building the model, we will use the following architecture, where <code>D</code> is the number of hidden dimensions:
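To make the shape bookkeeping concrete, here is a minimal UNet-style sketch in PyTorch with a single downsample/upsample level and one skip connection. The specific layer choices (GELU activations, strided conv downsampling, transposed-conv upsampling) are illustrative assumptions, not the exact block structure of the architecture above:

```python
import torch
import torch.nn as nn

class SimpleUNet(nn.Module):
    """Minimal UNet-style denoiser sketch with hidden dimension D."""
    def __init__(self, D=128):
        super().__init__()
        self.down = nn.Sequential(nn.Conv2d(1, D, 3, padding=1), nn.GELU())
        self.pool = nn.Conv2d(D, D, 2, stride=2)         # 28x28 -> 14x14
        self.mid = nn.Sequential(nn.Conv2d(D, D, 3, padding=1), nn.GELU())
        self.up = nn.ConvTranspose2d(D, D, 2, stride=2)  # 14x14 -> 28x28
        self.out = nn.Conv2d(2 * D, 1, 3, padding=1)     # after skip concat

    def forward(self, z):
        h1 = self.down(z)
        h2 = self.mid(self.pool(h1))
        h3 = self.up(h2)
        # UNet skip connection: concatenate encoder features with upsampled ones
        return self.out(torch.cat([h1, h3], dim=1))

x_hat = SimpleUNet(D=32)(torch.randn(4, 1, 28, 28))
print(x_hat.shape)  # torch.Size([4, 1, 28, 28])
```

The key property to check is that the output shape matches the input, so the network can predict a clean image of the same size as the noisy one.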
<h4>Training hyperparameters</h4>
For the hyperparameters, we will use a batch size of 256, a hidden dimension of 128, the Adam optimizer with a learning rate of 1e-4, and a training time of 5 epochs. A fixed noise level of σ = 0.5 will be used to noise the training images.
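The training procedure with these hyperparameters can be sketched as follows. The model and the MNIST DataLoader (batch size 256) are assumed to exist elsewhere; this only shows the per-batch optimization step, with images noised on the fly at the fixed σ:

```python
import torch

def train(model, loader, epochs=5, lr=1e-4, sigma=0.5):
    """Train a one-step denoiser; returns the per-batch loss curve."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    losses = []
    for _ in range(epochs):
        for x, _ in loader:                        # labels unused (unconditioned)
            z = x + sigma * torch.randn_like(x)    # noise at fixed sigma = 0.5
            loss = ((model(z) - x) ** 2).mean()    # L2 loss
            opt.zero_grad()
            loss.backward()
            opt.step()
            losses.append(loss.item())             # record for the loss curve
    return losses
```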
<h4>Evaluation results</h4>
After the model is trained, below is the training loss curve, with the model's loss plotted for every batch processed:
We can see that the model performs decently well. To illustrate its effectiveness at different noise levels, below is the model after the 5th epoch denoising the same image for σ ∈ [0.0, 0.2, 0.4, 0.5, 0.6, 0.8, 1.0]:
Although the model is decent at removing noise from images, our goal is to generate digits from pure noise. This proves to be an issue: with MSE loss, the model learns to predict the image that minimizes the sum of its squared distances to the training images. Because pure noise carries no information about any particular training image, the result is an average of all digits in the training set. This is illustrated in the following inputs and the outputs of the model after the 1st and 5th epochs:
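This averaging effect is a direct property of MSE: the constant prediction that minimizes mean squared error over a set of targets is their mean. A toy numpy check (with random arrays standing in for MNIST digits):

```python
import numpy as np

# The constant c minimizing mean((c - x_i)^2) over a training set is the
# mean of the x_i -- so a denoiser fed pure noise, which cannot tell the
# targets apart, tends toward an "average digit".
rng = np.random.default_rng(0)
train_imgs = rng.random((100, 28, 28))  # stand-in for MNIST training digits

candidates = np.linspace(0, 1, 101)
mse = [np.mean((c - train_imgs) ** 2) for c in candidates]
best = candidates[int(np.argmin(mse))]
print(best, train_imgs.mean())  # the best constant is ~ the dataset mean
```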