<h2>Part 2 – Implementing the UNet from scratch</h2>
Now that we know how a UNet can be used to generate images in a denoising model, we will implement one from scratch. More specifically, we will attempt to generate digits similar to those in the MNIST dataset from pure noise, using a denoising UNet that we will create.
<h3>Training an Unconditioned UNet</h3>
The most basic denoiser is a one-step denoiser. Formally, given a noisy image <code>z</code>, we aim to train a denoiser <code>D<sub>θ</sub>(z)</code> that maps it to the clean image <code>x</code>. To do this, we minimize the L<sup>2</sup> loss E<sub>z,x</sub>||D<sub>θ</sub>(z) - x||<sup>2</sup> during training.<br>
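As a quick sanity check of the loss above, here is a minimal numpy sketch of the batched L<sup>2</sup> loss (the array shapes and the toy "denoiser" output are illustrative assumptions, not the actual model):

```python
import numpy as np

def l2_loss(denoised, clean):
    # Estimate E_{z,x} ||D(z) - x||^2 over a batch:
    # sum squared error per image, then average over the batch.
    return np.mean(np.sum((denoised - clean) ** 2, axis=(1, 2)))

# Toy batch of 4 "images" of size 28x28 (MNIST-shaped)
clean = np.zeros((4, 28, 28))
denoised = clean + 0.1  # a hypothetical denoiser that is off by 0.1 everywhere
print(l2_loss(denoised, clean))  # 0.1^2 * 28 * 28 = 7.84 (up to float error)
```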
<br>To create a noisy image, we can use the process z = x + σε, where σ ∈ [0, 1] and ε ~ 𝒩(0, 1). Here, 𝒩 is the standard normal distribution. To visualize the kind of images this process produces, below is an example of an MNIST digit with progressively more noise as σ increases from 0 to 1:
<div class="image-row">
<figure>
<img src="images/unet/00.png" alt="00.png" />
<figcaption>σ = 0.0</figcaption>
</figure>
<figure>
<img src="images/unet/02.png" alt="02.png" />
<figcaption>σ = 0.2</figcaption>
</figure>
<figure>
<img src="images/unet/04.png" alt="04.png" />
<figcaption>σ = 0.4</figcaption>
</figure>
<figure>
<img src="images/unet/05.png" alt="05.png" />
<figcaption>σ = 0.5</figcaption>
</figure>
<figure>
<img src="images/unet/06.png" alt="06.png" />
<figcaption>σ = 0.6</figcaption>
</figure>
<figure>
<img src="images/unet/08.png" alt="08.png" />
<figcaption>σ = 0.8</figcaption>
</figure>
<figure>
<img src="images/unet/10.png" alt="10.png" />
<figcaption>σ = 1.0</figcaption>
</figure>
</div>
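The noising process z = x + σε can be sketched in a few lines of numpy (the zero image below is just a stand-in for an MNIST digit in [0, 1]):

```python
import numpy as np

def add_noise(x, sigma, rng=None):
    """Apply z = x + sigma * eps, with eps ~ N(0, 1) sampled per pixel."""
    rng = np.random.default_rng() if rng is None else rng
    return x + sigma * rng.standard_normal(x.shape)

rng = np.random.default_rng(0)
x = np.zeros((28, 28))  # stand-in for an MNIST digit
for sigma in [0.0, 0.2, 0.4, 0.5, 0.6, 0.8, 1.0]:
    z = add_noise(x, sigma, rng)
    print(sigma, round(z.std(), 2))  # sample std tracks sigma
```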
To start building the model, we will use the following architecture, where <code>D</code> is the number of hidden dimensions:
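To make the shape bookkeeping concrete, here is a minimal UNet-style sketch in PyTorch with a single downsample/upsample level and one skip connection. The specific layer choices (GELU activations, strided conv downsampling, transposed-conv upsampling) are illustrative assumptions, not the exact block structure of the architecture above:

```python
import torch
import torch.nn as nn

class SimpleUNet(nn.Module):
    """Minimal UNet-style denoiser sketch with hidden dimension D."""
    def __init__(self, D=128):
        super().__init__()
        self.down = nn.Sequential(nn.Conv2d(1, D, 3, padding=1), nn.GELU())
        self.pool = nn.Conv2d(D, D, 2, stride=2)         # 28x28 -> 14x14
        self.mid = nn.Sequential(nn.Conv2d(D, D, 3, padding=1), nn.GELU())
        self.up = nn.ConvTranspose2d(D, D, 2, stride=2)  # 14x14 -> 28x28
        self.out = nn.Conv2d(2 * D, 1, 3, padding=1)     # after skip concat

    def forward(self, z):
        h1 = self.down(z)
        h2 = self.mid(self.pool(h1))
        h3 = self.up(h2)
        # UNet skip connection: concatenate encoder features with upsampled ones
        return self.out(torch.cat([h1, h3], dim=1))

x_hat = SimpleUNet(D=32)(torch.randn(4, 1, 28, 28))
print(x_hat.shape)  # torch.Size([4, 1, 28, 28])
```

The key property to check is that the output shape matches the input, so the network can predict a clean image of the same size as the noisy one.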
<h4>Training hyperparameters</h4>
For the hyperparameters, we will use a batch size of 256, a hidden dimension of 128, the Adam optimizer with a learning rate of 1e-4, and a training time of 5 epochs. A fixed noise level of σ = 0.5 will be used to noise the training images.
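The training procedure with these hyperparameters can be sketched as follows. The model and the MNIST DataLoader (batch size 256) are assumed to exist elsewhere; this only shows the per-batch optimization step, with images noised on the fly at the fixed σ:

```python
import torch

def train(model, loader, epochs=5, lr=1e-4, sigma=0.5):
    """Train a one-step denoiser; returns the per-batch loss curve."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    losses = []
    for _ in range(epochs):
        for x, _ in loader:                        # labels unused (unconditioned)
            z = x + sigma * torch.randn_like(x)    # noise at fixed sigma = 0.5
            loss = ((model(z) - x) ** 2).mean()    # L2 loss
            opt.zero_grad()
            loss.backward()
            opt.step()
            losses.append(loss.item())             # record for the loss curve
    return losses
```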
<h4>Evaluation results</h4>
After the model is trained, below is the training loss curve, with the model's loss plotted for every batch processed:
We can see that the model performs decently well. To illustrate its effectiveness at different noise levels, below is the model after the 5th epoch denoising the same image for σ ∈ [0.0, 0.2, 0.4, 0.5, 0.6, 0.8, 1.0]:
Although the model is decent at removing noise from images, our goal is to generate digits from pure noise. This proves to be an issue: with MSE loss, the model learns to predict the image that minimizes the sum of its squared distances to the training images. Because pure noise carries no information about any particular training image, the result is an average of all digits in the training set. This is illustrated in the following inputs and the outputs of the model after the 1st and 5th epochs:
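This averaging effect is a direct property of MSE: the constant prediction that minimizes mean squared error over a set of targets is their mean. A toy numpy check (with random arrays standing in for MNIST digits):

```python
import numpy as np

# The constant c minimizing mean((c - x_i)^2) over a training set is the
# mean of the x_i -- so a denoiser fed pure noise, which cannot tell the
# targets apart, tends toward an "average digit".
rng = np.random.default_rng(0)
train_imgs = rng.random((100, 28, 28))  # stand-in for MNIST training digits

candidates = np.linspace(0, 1, 101)
mse = [np.mean((c - train_imgs) ** 2) for c in candidates]
best = candidates[int(np.argmin(mse))]
print(best, train_imgs.mean())  # the best constant is ~ the dataset mean
```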