
Commit e08330f

Refine noise addition and denoising explanations
Updated text and code snippets for clarity and consistency in the explanation of noise addition and denoising processes. Adjusted image references and captions for better presentation.
1 parent e936c1c commit e08330f


project-5/index.html

Lines changed: 46 additions & 75 deletions
@@ -154,7 +154,20 @@ <h2>Part 1.1 – The forward process</h2>
 </figure>
 </div>
 
-For the forward function, we can use <code>alphas_cumprod[t]</code> to obtain the noise coefficient at timestamp <code>t</code>, and <code>torch.randn_like</code> to get &epsilon; &isin; [0, 1), allowing us to compute <code>im_noisy</code>. Below are examples of the Campanile at noise timestamps 250, 500, and 750:
+To add noise to an image <code>x<sub>0</sub></code>, we can use the forward process and compute
+<div class="image-row">
+<figure>
+<img src="images/forward.png" alt="forward.png" />
+</figure>
+</div>
+
+for a given timestamp <code>t</code> &isin; [0, 1, ..., 999, 1000]. The noise coefficient at timestamp <code>t</code> can be obtained using
+
+<pre><code>alphas_cumprod = stage_1.scheduler.alphas_cumprod
+alpha_cumprod_t = alphas_cumprod[t]
+</code></pre>
+
+Below are examples of the Campanile at noise timestamps 250, 500, and 750:
 
 <div class="subsection">
 <h3>Campanile at Different Noise Levels</h3>
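For reference, the full noising step can be written as a short function. This is a minimal sketch, assuming an image tensor <code>im</code> in [0, 1] and the <code>alphas_cumprod</code> schedule above; the name <code>forward</code> follows the project's later usage, everything else is illustrative:

<pre><code>import torch

def forward(im, t):
    # x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
    alpha_cumprod_t = alphas_cumprod[t]
    eps = torch.randn_like(im)  # eps ~ N(0, I)
    return alpha_cumprod_t.sqrt() * im + (1 - alpha_cumprod_t).sqrt() * eps</code></pre>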
@@ -230,12 +243,10 @@ <h4>t = 750</h4>
 <section id="part-1-3">
 <h2>Part 1.3 – Implementing One Step Denoising</h2>
 
-A more effective method is to use a pretrained diffusion model. Using <code>stage_1.unet</code>, we can estimate the amount of noise in the noisy image. With the forward equation, we can solve for <code>x<sub>0</sub></code> (the original image) given the timestamp <code>t</code>:
+A more effective method is to use a pretrained diffusion model. Using <code>stage_1.unet</code>, we can estimate the amount of noise in the noisy image. With the forward equation above, we can solve for <code>x<sub>0</sub></code> (the original image) given the timestamp <code>t</code>:
 
-<div class="subsection">
 <pre><code>at_x0 = im_noisy_cpu - (1 - alpha_cumprod).sqrt() * noise_est
 original_im = at_x0 / alpha_cumprod.sqrt()</code></pre>
-</div>
 
 Below is a comparison of the original, noisy, and estimated original images for <code>t</code> &isin; [250, 500, 750]:
 
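For context, a sketch of how <code>noise_est</code> might be produced before solving for <code>x<sub>0</sub></code>; the prompt-embedding variable and the split of the UNet output into a noise estimate and <code>predicted_variance</code> are assumptions (the variance is what Part 1.4 uses later):

<pre><code># Estimate the noise in im_noisy with the pretrained UNet
with torch.no_grad():
    model_output = stage_1.unet(im_noisy, t, encoder_hidden_states=prompt_embeds).sample
noise_est, predicted_variance = model_output.split(3, dim=1)  # 3 image channels (assumption)

# Solve the forward equation for x_0
alpha_cumprod = alphas_cumprod[t]
im_noisy_cpu = im_noisy.cpu()
at_x0 = im_noisy_cpu - (1 - alpha_cumprod).sqrt() * noise_est.cpu()
original_im = at_x0 / alpha_cumprod.sqrt()</code></pre>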
@@ -306,25 +317,14 @@ <h2>Part 1.4 – Iterative Denoising</h2>
 </figure>
 </div>
 
-to compute <code>x</code> at timestamp <code>T</code>, where <code>T</code> (or <code>prev_t</code>) is the next timestamp after the current timestamp <code>t</code> in <code>strided_timestamps</code>. First, we compute the constants:
+to compute <code>x</code> at timestamp <code>T</code>, where <code>T</code> (or <code>prev_t</code>) is the next timestamp after the current timestamp in the strided timestamps. First, we compute the constants, where <code>alpha_cumprod_t</code> denotes the barred variable &alpha;&#772;<sub>t</sub>:
 
-<div class="subsection">
-<pre><code>alpha_cumprod = alphas_cumprod[t]
+<pre><code>alpha_cumprod_t = alphas_cumprod[t]
 alpha_cumprod_prev = alphas_cumprod[prev_t]
-alpha_t = alpha_cumprod / alpha_cumprod_prev
+alpha_t = alpha_cumprod_t / alpha_cumprod_prev
 beta_t = 1 - alpha_t</code></pre>
-</div>
-
-Then, we can compute <code>x<sub>T</sub></code> by using the one-step estimate of <code>x<sub>0</sub></code> as follows:
-
-<div class="subsection">
-<pre><code>x_0 = (image - (1 - alpha_cumprod).sqrt() * noise_est) / alpha_cumprod.sqrt()
-term_1 = alpha_cumprod_prev.sqrt() * beta_t
-term_2 = alpha_t.sqrt() * (1 - alpha_cumprod_prev)
-pred_pi_nonoise = (term_1 * x_0 + term_2 * image) / (1 - alpha_cumprod)
-pred_prev_image = add_variance(predicted_variance, t, pred_pi_nonoise)</code></pre></div>
 
-Below are some visualizations for the iterative denoising process:
+Then, we can get an approximation of <code>x<sub>0</sub></code> by using the one-step estimate. The estimated variance is computed along with the noise estimate, so we can now compute <code>x<sub>T</sub></code> using the formula above and obtain the image estimate for the next step. Below are some visualizations of the iterative denoising process:
 
 <div class="subsection">
 <h3>Denoising Loop Visualizations (i_start = 10)</h3>
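For reference, the removed snippet that the paragraph above summarizes, rewritten with the renamed constant; <code>add_variance</code> and <code>predicted_variance</code> are the project's own helpers and are taken as given:

<pre><code># One-step estimate of x_0 from the current image
x_0 = (image - (1 - alpha_cumprod_t).sqrt() * noise_est) / alpha_cumprod_t.sqrt()
# Blend x_0 and the current image into the estimate for the previous timestamp
term_1 = alpha_cumprod_prev.sqrt() * beta_t
term_2 = alpha_t.sqrt() * (1 - alpha_cumprod_prev)
pred_pi_nonoise = (term_1 * x_0 + term_2 * image) / (1 - alpha_cumprod_t)
# Add the estimated variance to obtain the image estimate for the next step
pred_prev_image = add_variance(predicted_variance, t, pred_pi_nonoise)</code></pre>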
@@ -421,12 +421,10 @@ <h2>Part 1.5 – Diffusion Model Sampling</h2>
 <section id="part-1-6">
 <h2>Part 1.6 – Classifier-Free Guidance (CFG)</h2>
 
-To improve the quality of the images, we can compute both a noise estiamte conditioned on the text prompt, and the unconditional noise estimate, based on the null prompt <code>''</code>. Denoting the conditional noise estimate as &epsilon;<sub>c</sub> and the unconditional noise estimate as &epsilon;<sub>u</sub>, we let our noise estimate be &epsilon; = &epsilon;<sub>u</sub> + &gamma;(&epsilon;<sub>c</sub> - &epsilon;<sub>u</sub>). Note that we have &epsilon; = &epsilon;<sub>u</sub> and &epsilon; = &epsilon;<sub>c</sub> for &gamma; = 0 and &gamma; = 1 respectively. However, when &gamma; > 1, we can get much higher equality images for reasons still dicussed today. This technique is known as <strong>classifier-free guidance</strong>, and we can implement the noise estimate as follows:
+To improve the quality of the images, we can compute both a noise estimate conditioned on the text prompt and an unconditional noise estimate based on the null prompt <code>''</code>. Denoting the conditional noise estimate as &epsilon;<sub>c</sub> and the unconditional noise estimate as &epsilon;<sub>u</sub>, we let our noise estimate be &epsilon; = &epsilon;<sub>u</sub> + &gamma;(&epsilon;<sub>c</sub> - &epsilon;<sub>u</sub>). Note that we have &epsilon; = &epsilon;<sub>u</sub> and &epsilon; = &epsilon;<sub>c</sub> for &gamma; = 0 and &gamma; = 1 respectively. However, when &gamma; > 1, we can get much higher quality images for reasons still discussed today. This technique is known as <strong>classifier-free guidance</strong>, and we can calculate the noise estimate as follows:
 
-<div class="subsection">
 <pre><code>noise_est_cfg = uncond_noise_est + scale * (noise_est - uncond_noise_est)
 </code></pre>
-</div>
 
 By setting <code>scale = 7</code> (&gamma; = 7) and letting the conditional & unconditional prompts be <code>'a high quality photo'</code> & the null prompt <code>''</code>, we get the following sample images:
 
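For context, the two estimates come from two UNet passes; a sketch under the assumption that <code>prompt_embeds</code> and <code>null_embeds</code> hold the embeddings of the conditional and null prompts:

<pre><code># One pass per prompt; split off the variance channels as before (assumption)
noise_est, predicted_variance = stage_1.unet(image, t, encoder_hidden_states=prompt_embeds).sample.split(3, dim=1)
uncond_noise_est, _ = stage_1.unet(image, t, encoder_hidden_states=null_embeds).sample.split(3, dim=1)
noise_est_cfg = uncond_noise_est + scale * (noise_est - uncond_noise_est)</code></pre>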
@@ -679,11 +677,9 @@ <h4>Hand Drawn Image 2</h4>
 <div class="subsection">
 <h3>1.7.2 – Inpainting</h3>
 
-Using the techniques above, we can also modify our <code>iterative_denoise_cfg</code> function to edit certain sections of an image. To do so, we first define a mask the same size as the image that is 1 at the pixels where we want to edit, and 0 otherwise. For each loop of the denoising process, we replace <code>x<sub>t</sub></code> with <strong>m</strong><code>x<sub>t</sub></code> + (1 - <strong>m</strong>)forward(<code>x<sub>0</sub>, t</code>), where <strong>m</strong> is the mask and <code>x<sub>0</sub></code> is the original image. This can be accomplished by the following code:
+Using the techniques above, we can also modify our <code>iterative_denoise_cfg</code> function to edit certain sections of an image. To do so, we first define a mask the same size as the image that is 1 at the pixels we want to edit, and 0 otherwise. For each loop of the denoising process, we replace <code>x<sub>t</sub></code> with <strong>m</strong><code>x<sub>t</sub></code> + (1 - <strong>m</strong>)forward(<code>x<sub>0</sub>, t</code>), where <strong>m</strong> is the mask and <code>x<sub>0</sub></code> is the original image.<br>
 
-<pre><code>masked_image = image * mask + forward(original_image, t).to(device).half() * (1 - mask)</code></pre>
-
-Once <code>image</code> is replaced by <code>masked_image</code>, we replace all further occurrences of <code>image</code> except for the last instance, as the image at each step still needs to be updated. Finally, we let our starting noise be purely random and start with a timestamp index of 0, so that the patch we want to change can be sufficiently denoised. Below are the results on the Campanile image:
+<br>Once <code>image</code> is replaced by <code>masked_image</code>, we replace all further occurrences of <code>image</code> with it, except for the last instance, as the image at each step still needs to be updated. Finally, we let our starting noise be purely random and start at timestamp index 0, so that the patch we want to change can be sufficiently denoised. Below are the results on the Campanile image:
 
 <h4>Campanile Inpainting</h4>
 <div class="image-row">
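The per-step replacement described above was shown in the removed snippet; for reference, inside the denoising loop it looks like this, with <code>forward</code> the noising function from Part 1.1:

<pre><code># Keep the unmasked region pinned to the re-noised original at each step
masked_image = image * mask + forward(original_image, t).to(device).half() * (1 - mask)</code></pre>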
@@ -843,7 +839,7 @@ <h3>Prompts: <code>'an oil painting of an old man'</code> and <code>'an oil pain
 <div class="image-row">
 <figure>
 <img src="images/anagram/anagram1_256.png" alt="anagram1_256.png" />
-<figcaption>Original"</figcaption>
+<figcaption>Original</figcaption>
 </figure>
 <figure>
 <img src="images/anagram/anagram1_flip_256.png" alt="anagram1_flip_256.png" />
@@ -855,7 +851,7 @@ <h3>Prompts: <code>'a lithograph of waterfalls'</code> and <code>'a man wearing
 <div class="image-row">
 <figure>
 <img src="images/anagram/anagram2_256.png" alt="anagram2_256.png" />
-<figcaption>Original"</figcaption>
+<figcaption>Original</figcaption>
 </figure>
 <figure>
 <img src="images/anagram/anagram2_flip_256.png" alt="anagram2_flip_256.png" />
@@ -867,7 +863,7 @@ <h3>Prompts: <code>'an oil painting of a snowy mountain village'</code> and <cod
 <div class="image-row">
 <figure>
 <img src="images/anagram/anagram3_256.png" alt="anagram3_256.png" />
-<figcaption>Original"</figcaption>
+<figcaption>Original</figcaption>
 </figure>
 <figure>
 <img src="images/anagram/anagram3_flip_256.png" alt="anagram3_flip_256.png" />
@@ -883,69 +879,44 @@ <h3>Prompts: <code>'an oil painting of a snowy mountain village'</code> and <cod
 <section id="part-1-9">
 <h2>Part 1.9 – Hybrid Images</h2>
 
-With the technqiues above, we can now also create hybrid images, or images that look like different subjects depending on the viewing distance. The classical way to create a hybrid image is to transform the image you want to see at close range with a low-pass filter, thus keeping
+With the techniques above, we can now also create hybrid images, or images that look like different subjects depending on the viewing distance. The classical way to create a hybrid image is to transform the image you want to see at far range with a low-pass filter, transform the image you want to see at close range with a high-pass filter, and combine the two transformed images. We can use a similar algorithm in the denoising process, namely by passing the noise estimates from <code>p<sub>1</sub></code> and <code>p<sub>2</sub></code> through a low- and high-pass filter respectively. This will produce an image that, when viewed far away, shows <code>p<sub>1</sub></code>, but when viewed close up, shows <code>p<sub>2</sub></code>. Unlike the anagram images, we don't need to flip or transform the image to be denoised, as both images should be viewed under the same orientation. Below are several examples:
 
 <div class="subsection">
-<h3>1.9.1 – Code: make_hybrids</h3>
-
-
-<p class="note">
-Notes: describe your filter choice (Gaussian blur / FFT), the cutoff frequencies (sigmas),
-and how you combined low/high frequency components. <!-- TODO -->
-</p>
-</div>
-
-<div class="subsection">
-<h3>1.9.2 – Two Hybrid Images</h3>
-<p>
-Each hybrid image should look like Image A up close (high frequencies) and Image B from far away
-(low frequencies), or vice versa. Include the two source images and the resulting hybrid.
-</p>
-
-<h4>Hybrid 1</h4>
-<p>
-<strong>Image A (high freq / close):</strong> <em><!-- TODO: describe A --></em><br/>
-<strong>Image B (low freq / far):</strong> <em><!-- TODO: describe B --></em>
-</p>
+<h3>Prompts: <code>'an oil painting of an old man'</code> and <code>'an oil painting of people around a campfire'</code></h3>
 <div class="image-row">
 <figure>
-<img src="images/part1_9_hybrid1_sourceA.png" alt="Hybrid 1 source A" />
-<figcaption>Hybrid 1 – Source A</figcaption>
-</figure>
-<figure>
-<img src="images/part1_9_hybrid1_sourceB.png" alt="Hybrid 1 source B" />
-<figcaption>Hybrid 1 – Source B</figcaption>
+<img src="images/anagram/anagram1_256.png" alt="anagram1_256.png" />
+<figcaption>Original</figcaption>
 </figure>
 <figure>
-<img src="images/part1_9_hybrid1_result.png" alt="Hybrid 1 result" />
-<figcaption>Hybrid 1 – Result</figcaption>
+<img src="images/anagram/anagram1_flip_256.png" alt="anagram1_flip_256.png" />
+<figcaption>Flipped</figcaption>
 </figure>
 </div>
 
-<h4>Hybrid 2</h4>
-<p>
-<strong>Image A (high freq / close):</strong> <em><!-- TODO: describe A --></em><br/>
-<strong>Image B (low freq / far):</strong> <em><!-- TODO: describe B --></em>
-</p>
+<h3>Prompts: <code>'a lithograph of waterfalls'</code> and <code>'a man wearing a hat'</code></h3>
 <div class="image-row">
 <figure>
-<img src="images/part1_9_hybrid2_sourceA.png" alt="Hybrid 2 source A" />
-<figcaption>Hybrid 2 – Source A</figcaption>
+<img src="images/anagram/anagram2_256.png" alt="anagram2_256.png" />
+<figcaption>Original</figcaption>
 </figure>
 <figure>
-<img src="images/part1_9_hybrid2_sourceB.png" alt="Hybrid 2 source B" />
-<figcaption>Hybrid 2 – Source B</figcaption>
+<img src="images/anagram/anagram2_flip_256.png" alt="anagram2_flip_256.png" />
+<figcaption>Flipped</figcaption>
+</figure>
+</div>
+
+<h3>Prompts: <code>'an oil painting of a snowy mountain village'</code> and <code>'a photo of a dog'</code></h3>
+<div class="image-row">
+<figure>
+<img src="images/anagram/anagram3_256.png" alt="anagram3_256.png" />
+<figcaption>Original</figcaption>
 </figure>
 <figure>
-<img src="images/part1_9_hybrid2_result.png" alt="Hybrid 2 result" />
-<figcaption>Hybrid 2 – Result</figcaption>
+<img src="images/anagram/anagram3_flip_256.png" alt="anagram3_flip_256.png" />
+<figcaption>Flipped</figcaption>
 </figure>
 </div>
-
-<p class="note">
-Optional: include frequency visualizations or intermediate components (low/high pass) if you computed them.
-<!-- TODO -->
-</p>
 </div>
 </section>
 
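A minimal sketch of the hybrid noise estimate, assuming a Gaussian blur as the low-pass filter; the kernel size, sigma, and the <code>noise_est_p1</code>/<code>noise_est_p2</code> names are illustrative:

<pre><code>import torchvision.transforms.functional as TF

# Low frequencies from p1 (seen far away), high frequencies from p2 (seen close up)
low_pass = TF.gaussian_blur(noise_est_p1, kernel_size=33, sigma=2.0)
high_pass = noise_est_p2 - TF.gaussian_blur(noise_est_p2, kernel_size=33, sigma=2.0)
noise_est_hybrid = low_pass + high_pass</code></pre>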