
Commit 970581a: Update proj5.html (parent 96c9864)


project-5/proj5.html

Lines changed: 19 additions & 5 deletions
@@ -919,7 +919,7 @@ <h2>Part 1.9 – Hybrid Images</h2>
 </section>

 <!-- ========================================================= -->
-<!-- Part 2.0: Noising Process Visualization -->
+<!-- Part 2.0: Flow Matching from Scratch -->
 <!-- ========================================================= -->
 <section id="part-2-1">
 <h2>Part 2 – Implementing the UNet from scratch</h2>
@@ -1003,7 +1003,7 @@ <h4>Evaluation results</h4>
 <h4>Limitations on pure noise</h4>
 Although the model is decent at removing noise from images, our goal is to generate digits from pure noise. This proves to be an issue because, with MSE loss, the model learns to predict the image that minimizes the sum of squared distances to all training images. To illustrate this issue, we will feed the model a pure noise sample <code>z</code> ~ &Nscr;(0, &#119816;) for every training input <code>x</code>, and because <code>z</code> contains no information about <code>x</code>, the result is an average of all digits in the training set.

-As a result, while the training loss curve shows not much difference:
+As a result, while the training loss curve shows nothing suspicious:
 <div align="center">
 <figure>
 <img src="images/unet/123_training_curve.png" alt="123_training_curve.png" />
@@ -1022,7 +1022,7 @@ <h4>Limitations on pure noise</h4>
 <h4>The Flow Matching Model</h4>
 Instead of trying to denoise the image in a single step, we aim to iteratively denoise the image, similar to how we do so in the sampling loops using DeepFloyd's noise coefficients. To do this, we will start by interpolating how intermediate noise samples are constructed. The simplest approach is to use linear interpolation, namely letting the intermediate sample be <code>x<sub>t</sub></code> = (1 - <code>t</code>)<code>x<sub>0</sub></code> + <code>t</code><code>x<sub>1</sub></code> for a given <code>t</code> &isin; [0, 1], where <code>x<sub>0</sub></code> is the noise and <code>x<sub>1</sub></code> is the clean image.<br>

-<br>Now that we have an equation relating a clean image with any pure noise sample, we can train our model to learn the <strong>flow</strong>, or the change with respect to <code>t</code> for any given <code>x<sub>t</sub></code>. This produces a vector field across all images, where the velocity for each is d/dt <code>x<sub>t</sub></code> = <code>x<sub>1</sub></code> - <code>x<sub>0</sub></code>. Therefore, if we can predict <code>x<sub>1</sub></code> - <code>x<sub>0</sub></code> for any given <code>t</code> and <code>x<sub>t</sub></code>, we can go along the path traced out by the vector field and arrived at somewhere near the manifold of clean images. This technique is known as a <strong>flow matching model</strong>, and with the model trained, we can numerically integrate a random noise sample <code>x<sub>0</sub></code> with a set number of iterations, and get our clean image <code>x<sub>1</sub></code>.
+<br>Now that we have an equation relating a clean image with any pure noise sample, we can train our model to learn the <strong>flow</strong>, i.e. the change with respect to <code>t</code> for any given <code>x<sub>t</sub></code>. This produces a vector field across all images, where the velocity of each is d/dt <code>x<sub>t</sub></code> = <code>x<sub>1</sub></code> - <code>x<sub>0</sub></code>. Therefore, if we can predict <code>x<sub>1</sub></code> - <code>x<sub>0</sub></code> for any given <code>t</code> and <code>x<sub>t</sub></code>, we can follow the path traced out by the vector field and arrive somewhere near the manifold of clean images. This technique is known as a <strong>flow matching model</strong>, and with the model trained, we can numerically integrate a random noise sample <code>x<sub>0</sub></code> over a set number of iterations using Euler's method, and get a clean image <code>x<sub>1</sub></code>.
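To make the construction concrete, here is a minimal PyTorch sketch of the interpolation and its velocity (the batch size and image shape are illustrative assumptions, not taken from the report):

```python
import torch

# Illustrative shapes: a batch of 64 single-channel 28x28 digits.
x1 = torch.rand(64, 1, 28, 28)      # stand-in for clean images
x0 = torch.randn_like(x1)           # pure noise, x0 ~ N(0, I)
t = torch.rand(64, 1, 1, 1)         # t ~ U([0, 1]), one value per image

x_t = (1 - t) * x0 + t * x1         # intermediate sample on the linear path
velocity = x1 - x0                  # d/dt x_t: the flow the UNet will learn to predict
```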

 <h4>Training a Time-Conditioned UNet</h4>
 To add time conditioning to our UNet, we will make the following changes to our model architecture:
@@ -1034,10 +1034,24 @@ <h4>Training a Time-Conditioned UNet</h4>
 </div>

 <h4>Flow Matching Hyperparameters</h4>
-For the hyperparameters, we will be using a batch size of 64, a learning rate of <code>1e-2</code>, a hidden dimension of 64, the Adam optimizer with the given learning rate, a exponential learning rate decay scheduler with &gamma; = 0.1<sup>(1.0 / <code>num_epochs</code>)</sup>, and a training time of 10 epochs. To advance the scheduler, we will call <code>scheduler.step()</code> at the end of each training epoch.
+For the hyperparameters, we will be using a batch size of 64, a learning rate of <code>1e-2</code>, a hidden dimension of 64, the Adam optimizer with the given learning rate, an exponential learning rate decay scheduler with &gamma; = 0.1<sup>(1.0 / <code>num_epochs</code>)</sup>, a sampling iteration count of <code>T</code> = 50, and a training time of 10 epochs. To advance the scheduler, we will call <code>scheduler.step()</code> at the end of each training epoch.
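A rough sketch of how these hyperparameters might be wired up in PyTorch (the model below is only a placeholder; the actual time-conditioned UNet is defined elsewhere in the project, and the use of torch's ExponentialLR is an assumption):

```python
import torch
import torch.nn as nn

num_epochs = 10
model = nn.Linear(28 * 28, 28 * 28)  # placeholder for the time-conditioned UNet (hidden_dim=64)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.1 ** (1.0 / num_epochs))

for epoch in range(num_epochs):
    # ... one pass over the training set with batch size 64 goes here ...
    scheduler.step()  # advance the exponential decay once per epoch
```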

 <h4>Forward and Sampling Operations</h4>
-To train our model, for each clean image <code>x<sub>1</sub></code> we will generate <code>x<sub>0</sub></code> &isin; &Nscr;(0, &#119816;) and <code>t</code> &isin; U([0, 1]), where U is the uniform distribution. After computing <code>x<sub>t</sub></code> = (1 - <code>t</code>)x<sub>0</sub> + <code>tx<sub>1</sub></code>, we will feed <code>x<sub>t</sub></code> and <code>t</code> into our UNet and compute the loss of unet(<code>x<sub>t</sub></code>, <code>t</code>) and <code>x<sub>1</sub></code> - <code>x<sub>0</sub></code>. Below is its loss curve:
+To train our model, for each clean image <code>x<sub>1</sub></code> we will generate <code>x<sub>0</sub></code> ~ &Nscr;(0, &#119816;) and <code>t</code> ~ U([0, 1]), where U is the uniform distribution. After computing <code>x<sub>t</sub></code> = (1 - <code>t</code>)<code>x<sub>0</sub></code> + <code>t</code><code>x<sub>1</sub></code>, we will feed <code>x<sub>t</sub></code> and <code>t</code> into our UNet and compute the loss between u<sub>&theta;</sub>(<code>x<sub>t</sub></code>, <code>t</code>) and <code>x<sub>1</sub></code> - <code>x<sub>0</sub></code>. Below is the new model's training loss curve:
+<div align="center">
+<figure>
+<img src="images/unet/21_training_curve.png" alt="21_training_curve.png" />
+</figure>
+</div>
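A minimal sketch of this training step, assuming a `unet(x_t, t)` callable that predicts the velocity; MSE loss is an assumption here, carried over from the one-step denoiser earlier in the project:

```python
import torch
import torch.nn.functional as F

def flow_matching_loss(unet, x1):
    # x1: a batch of clean images, shape [B, 1, 28, 28]
    x0 = torch.randn_like(x1)                                # pure noise, x0 ~ N(0, I)
    t = torch.rand(x1.shape[0], 1, 1, 1, device=x1.device)   # t ~ U([0, 1]), one per image
    x_t = (1 - t) * x0 + t * x1                              # linear interpolation
    target = x1 - x0                                         # velocity the UNet should predict
    return F.mse_loss(unet(x_t, t), target)
```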
+
+When sampling from the model, we will simply generate a random <code>x<sub>0</sub></code> ~ &Nscr;(0, &#119816;), and for every iteration <code>i</code> from 1 to <code>T</code>, we will compute <code>x<sub>0</sub></code> = <code>x<sub>0</sub></code> + (1 / <code>T</code>)u<sub>&theta;</sub>(<code>x<sub>0</sub></code>, <code>t</code>), where <code>t</code> = <code>i</code> / <code>T</code>. The following are the results of the 1st, 5th, and 10th epochs:
+<div align="center">
+<figure>
+<img src="images/unet/21_visualization.png" alt="21_visualization.png" />
+</figure>
+</div>
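A sketch of this Euler-method sampling loop, again assuming a trained `unet(x, t)` velocity predictor:

```python
import torch

@torch.no_grad()
def sample(unet, T=50, shape=(1, 1, 28, 28)):
    x = torch.randn(shape)                     # start from x0 ~ N(0, I)
    for i in range(1, T + 1):
        t = torch.full((shape[0],), i / T)     # current time for the whole batch
        x = x + (1.0 / T) * unet(x, t)         # one Euler step along the predicted flow
    return x                                   # should land near the clean-image manifold
```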
+
+Although the results are not perfect, the improvement from the 1st epoch to the 10th is already noticeable.
 </section>

 </body>
