
Commit 4d3dda8

Update proj5.html
1 parent 16e9ead commit 4d3dda8

File tree

1 file changed: +7 −7 lines

project-5/proj5.html

Lines changed: 7 additions & 7 deletions
@@ -2,7 +2,7 @@
 <html lang="en">
 <head>
 <meta charset="UTF-8" />
-<title>Project 5: Fun with Diffusion Models</title>
+<title>CS180 Project 5</title>
 <style>
 body {
 font-family: system-ui, -apple-system, BlinkMacSystemFont, "Segoe UI", sans-serif;
@@ -1003,7 +1003,7 @@ <h4>Evaluation results</h4>
 <h4>Limitations on pure noise</h4>
 Although the model is decent at removing noise from images, our goal is to generate digits from pure noise. This proves to be an issue because, with MSE loss, the model will learn to predict the image that minimizes the sum of its squared distances to all training images. To illustrate this issue, we will feed the model a pure noise sample <code>z</code> ~ &Nscr;(0, &#119816;) in place of every training input <code>x</code>, and because <code>z</code> contains no information about <code>x</code>, the result is an average of all digits in the training set.
 
-As a result, while the training loss curve shows not much suspect:
+As a result, while the training loss curve does not look suspicious:
 <div align="center">
 <figure>
 <img src="images/unet/123_training_curve.png" alt="123_training_curve.png" />
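The averaging claim can be checked numerically. Below is a minimal plain-Python sketch (the tiny 4-pixel "images" and all values are made up, standing in for MNIST digits): the constant prediction that minimizes the summed squared error over a training set is the per-pixel mean, which is why a denoiser given information-free input drifts toward the average digit.

```python
# Tiny stand-in "training set": three 4-pixel images (hypothetical values).
images = [[0.0, 1.0, 0.2, 0.8],
          [0.4, 0.6, 0.0, 1.0],
          [0.2, 0.2, 0.4, 0.6]]

def total_mse(pred, imgs):
    """Sum of squared distances from one fixed prediction to every image."""
    return sum((p - x) ** 2 for img in imgs for p, x in zip(pred, img))

# Per-pixel mean of the training set.
mean_img = [sum(px) / len(images) for px in zip(*images)]

# Any perturbation of the mean increases the total squared error, so an
# MSE-trained model with no information about x converges to this average.
perturbed = [p + 0.1 for p in mean_img]
print(total_mse(mean_img, images) < total_mse(perturbed, images))  # True
```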
@@ -1037,7 +1037,7 @@ <h4>Flow Matching Hyperparameters</h4>
 For the hyperparameters, we will be using a batch size of 64, a learning rate of <code>1e-2</code>, a hidden dimension of 64, the Adam optimizer with the given learning rate, an exponential learning rate decay scheduler with &gamma; = 0.1<sup>(1.0 / <code>num_epochs</code>)</sup>, a sampling iteration count of <code>T</code> = 300, and a training time of 10 epochs. To advance the scheduler, we will call <code>scheduler.step()</code> at the end of each training epoch.
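With this choice of &gamma;, the learning rate decays by exactly one order of magnitude over the full run, since &gamma;<sup><code>num_epochs</code></sup> = 0.1. A quick plain-Python sanity check of that arithmetic (variable names are illustrative, not from the project code):

```python
num_epochs = 10
lr = 1e-2
gamma = 0.1 ** (1.0 / num_epochs)  # per-epoch decay factor

# Simulate one scheduler.step() at the end of each training epoch.
for _ in range(num_epochs):
    lr *= gamma

# After 10 epochs the learning rate has decayed tenfold: 1e-2 -> 1e-3.
print(abs(lr - 1e-3) < 1e-12)  # True
```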

 <h4>Embedding <code>t</code> in the UNet</h4>
-To embed <code>t</code> in the UNet, we will multiply the <code>unflat</code> and <code>firstUpBlock</code> tensors (the result after applying the <strong>Unflatten</strong> and the first <strong>UpConv</strong> operations respectively) by <code>fc1_t</code> and <code>fc2_t</code>. <code>fc1_t</code> and <code>fc2_t</code> are the result of passing <code>t</code> through the first and second FCBlock, where the first produces a tensor with twice the number of hidden dimensions, while the second has the same number of hidden dimensions as the first and last ConvBlock (i.e. the first and second result each has 2D and D channels). In pesudocode:
+To embed <code>t</code> in the UNet, we will multiply the <code>unflat</code> and <code>firstUpBlock</code> tensors (the result after applying the <strong>Unflatten</strong> and the first <strong>UpConv</strong> operations respectively) by <code>fc1_t</code> and <code>fc2_t</code>. <code>fc1_t</code> and <code>fc2_t</code> are the result of passing <code>t</code> through the first and second FCBlock, where the first produces a tensor with twice the number of hidden dimensions, while the second has the same number of hidden dimensions as the first and last ConvBlock (i.e. the first and second result each has 2D and D channels). In pseudocode:
 <pre><code>unflat_cond = unflat * fc1_t
 firstUpBlock_cond = firstUpBlock * fc2_t</code></pre>
 
@@ -1059,20 +1059,20 @@ <h4>Time-Conditioned Forward and Sampling Operations</h4>
 Although the results are not perfect, the improvements starting from the 1st epoch up to the 10th are already noticeable.
 
 <h4>Adding Class-Conditioning to Time-Conditioned UNet</h4>
-To make more improvements to our image generation, we can condition our UNet on the class of digits 0-9. This require adding an additional FCBlock for each time condition, where the class vector <code>c</code> is a one-hot vector. To ensure that the UNet would still work without conditioning on the class (in order to implement CFG later), we will set a dropout rate <code>p<sub>uncond</sub></code> of 0.1, in which we set the one-hot vector of <code>c</code> to all 0s.
+To further improve our image generation, we can condition our UNet on the digit classes 0-9. This requires adding an additional FCBlock for each time condition, where the class vector <code>c</code> is a one-hot vector. To ensure that the UNet still works without conditioning on the class (in order to implement CFG later), we will use a dropout rate <code>p<sub>uncond</sub></code> of 0.1, at which we set the one-hot vector <code>c</code> to all 0s.
 
 <h4>Embedding <code>c</code> and <code>t</code> in the UNet</h4>
 To embed <code>c</code> and <code>t</code> in the UNet, we will use 2 additional FCBlocks to convert the label (<code>c</code>) into 2 tensors <code>fc1_c</code> and <code>fc2_c</code>, each with the same number of hidden dimensions as <code>fc1_t</code> and <code>fc2_t</code> respectively. Then, instead of multiplying the intermediate blocks by the time tensor, we will instead do:
 <pre><code>unflat_cond_class = unflat * fc1_c + fc1_t
 firstUpBlock_cond = firstUpBlock * fc2_c + fc2_t</code></pre>
 
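The scale-and-shift above can be illustrated channel-wise in plain Python; the three-element lists below are stand-ins for the real D-channel tensors, and every value is hypothetical:

```python
# Stand-in channel vectors (hypothetical values; real code uses D-channel tensors).
unflat = [0.5, -1.0, 2.0]   # output of the Unflatten block
fc1_c  = [1.0,  0.0, 2.0]   # class embedding from the first class FCBlock
fc1_t  = [0.1,  0.2, 0.3]   # time embedding from the first time FCBlock

# unflat_cond_class = unflat * fc1_c + fc1_t, applied channel-wise:
# the class embedding scales each channel, the time embedding shifts it.
unflat_cond_class = [u * c + t for u, c, t in zip(unflat, fc1_c, fc1_t)]
print(unflat_cond_class)  # [0.6, 0.2, 4.3]
```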

-The last step is to zero out the class one-hot vectors at the dropout rate, which we can implement efficiently by using a mask of the same length as the batch size. We can than multiply it with the batch of one-hot vectors to zero out any vector that is the <code>i</code>-th in the batch if <code>mask[i] = 0</code>.
+The last step is to zero out the class one-hot vectors at the dropout rate, which we can implement efficiently with a mask whose length equals the batch size. We can then multiply the batch of one-hot vectors by it, zeroing out the <code>i</code>-th vector in the batch whenever <code>mask[i] = 0</code>.
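A plain-Python sketch of that masking trick (the batch size, digit labels, and the drawn mask are all illustrative; real code would sample each mask entry as Bernoulli(1 &minus; <code>p<sub>uncond</sub></code>)):

```python
# Batch of one-hot class vectors for digits 3, 0, 7 (batch size 3, 10 classes).
batch_c = [[1 if j == d else 0 for j in range(10)] for d in (3, 0, 7)]

# Per-example keep/drop mask; here entry 1 was "dropped" (illustrative draw).
mask = [1, 0, 1]

# Multiply each one-hot vector by its mask entry: dropped rows become all 0s,
# which is exactly the unconditional case needed for CFG.
batch_c = [[m * v for v in row] for m, row in zip(mask, batch_c)]
print(batch_c[1])  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(batch_c[0])  # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```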

 <h4>Class Conditioning Hyperparameters</h4>
-Because class conditioning converges fast, we will use the same number of training epochs as time conditioning, which is 10. A guidance scale of &gamma; = 5 will be used in the CFG part. The same hyperparamters as the Time-Conditioned UNet will be used for the relevant parts.
+Because class conditioning converges quickly, we will use the same number of training epochs as time conditioning, which is 10. A guidance scale of &gamma; = 5 will be used in the CFG part. The same hyperparameters as the Time-Conditioned UNet will be used for the relevant parts.
 
 <h4>Class-Conditioned Forward and Sampling Operations</h4>
-The forward will be very similar to the Time-Conditioned UNet, except to compute the loss, we will also input the training image's label into the model, along with a mask of 1s and 0s with 0 probability <code>p<sub>uncond</sub></code>. The training loss curve is as follows:
+The forward pass will be very similar to the Time-Conditioned UNet's, except that to compute the loss, we will also input the training image's label into the model, along with a mask of 1s and 0s in which each entry is 0 with probability <code>p<sub>uncond</sub></code>. The training loss curve is as follows:
 <div align="center">
 <figure>
 <img src="images/unet/26_training_curve.png" alt="26_training_curve.png" />
