
Commit e08330f

Refine noise addition and denoising explanations
Updated text and code snippets for clarity and consistency in the explanation of noise addition and denoising processes. Adjusted image references and captions for better presentation.
1 parent e936c1c commit e08330f


project-5/index.html

Lines changed: 46 additions & 75 deletions
@@ -154,7 +154,20 @@ <h2>Part 1.1 – The forward process</h2>
 </figure>
 </div>
 
-For the forward function, we can use <code>alphas_cumprod[t]</code> to obtain the noise coefficient at timestamp <code>t</code>, and <code>torch.randn_like</code> to get &epsilon; &isin; [0, 1), allowing us to compute <code>im_noisy</code>. Below are examples of the Campanile at noise timestamps 250, 500, and 750:
+To add noise to an image <code>x<sub>0</sub></code>, we can use the forward process and compute
+<div class="image-row">
+<figure>
+<img src="images/forward.png" alt="forward.png" />
+</figure>
+</div>
+
+for a given timestamp <code>t</code> &isin; [0, 1, ..., 999, 1000]. The noise coefficient at timestamp <code>t</code> can be obtained using
+
+<pre><code>alphas_cumprod = stage_1.scheduler.alphas_cumprod
+alpha_cumprod_t = alphas_cumprod[t]
+</code></pre>
+
+Below are examples of the Campanile at noise timestamps 250, 500, and 750:
 
 <div class="subsection">
 <h3>Campanile at Different Noise Levels</h3>
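For reference, the full noising step can be written as a short function. This is a minimal sketch, assuming an image tensor <code>im</code> in [0, 1] and the <code>alphas_cumprod</code> schedule above; the name <code>forward</code> follows the project's later usage, everything else is illustrative:

<pre><code>import torch

def forward(im, t):
    # x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
    alpha_cumprod_t = alphas_cumprod[t]
    eps = torch.randn_like(im)  # eps ~ N(0, I)
    return alpha_cumprod_t.sqrt() * im + (1 - alpha_cumprod_t).sqrt() * eps</code></pre>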
@@ -230,12 +243,10 @@ <h4>t = 750</h4>
 <section id="part-1-3">
 <h2>Part 1.3 – Implementing One Step Denoising</h2>
 
-A more effective method is to use a pretrained diffusion model. Using <code>stage_1.unet</code>, we can estimate the amount of noise in the noisy image. With the forward equation, we can solve for <code>x<sub>0</sub></code> (the original image) given the timestamp <code>t</code>:
+A more effective method is to use a pretrained diffusion model. Using <code>stage_1.unet</code>, we can estimate the amount of noise in the noisy image. With the forward equation above, we can solve for <code>x<sub>0</sub></code> (the original image) given the timestamp <code>t</code>:
 
-<div class="subsection">
 <pre><code>at_x0 = im_noisy_cpu - (1 - alpha_cumprod).sqrt() * noise_est
 original_im = at_x0 / alpha_cumprod.sqrt()</code></pre>
-</div>
 
 Below is a comparison of the original, noisy, and estimated original images for <code>t</code> &isin; [250, 500, 750]:
 
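For context, a sketch of how <code>noise_est</code> might be produced before solving for <code>x<sub>0</sub></code>; the prompt-embedding variable and the split of the UNet output into a noise estimate and <code>predicted_variance</code> are assumptions (the variance is what Part 1.4 uses later):

<pre><code># Estimate the noise in im_noisy with the pretrained UNet
with torch.no_grad():
    model_output = stage_1.unet(im_noisy, t, encoder_hidden_states=prompt_embeds).sample
noise_est, predicted_variance = model_output.split(3, dim=1)  # 3 image channels (assumption)

# Solve the forward equation for x_0
alpha_cumprod = alphas_cumprod[t]
im_noisy_cpu = im_noisy.cpu()
at_x0 = im_noisy_cpu - (1 - alpha_cumprod).sqrt() * noise_est.cpu()
original_im = at_x0 / alpha_cumprod.sqrt()</code></pre>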
@@ -306,25 +317,14 @@ <h2>Part 1.4 – Iterative Denoising</h2>
 </figure>
 </div>
 
-to compute <code>x</code> at timestamp <code>T</code>, where <code>T</code> (or <code>prev_t</code>) is the next timestamp after the current timestamp <code>t</code> in <code>strided_timestamps</code>. First, we compute the constants:
+to compute <code>x</code> at timestamp <code>T</code>, where <code>T</code> (or <code>prev_t</code>) is the next timestamp after the current timestamp in the strided timestamps. First, we compute the constants, where <code>alpha_cumprod_t</code> denotes the barred variable &alpha;&#772;<sub>t</sub>:
 
-<div class="subsection">
-<pre><code>alpha_cumprod = alphas_cumprod[t]
+<pre><code>alpha_cumprod_t = alphas_cumprod[t]
 alpha_cumprod_prev = alphas_cumprod[prev_t]
-alpha_t = alpha_cumprod / alpha_cumprod_prev
+alpha_t = alpha_cumprod_t / alpha_cumprod_prev
 beta_t = 1 - alpha_t</code></pre>
-</div>
-
-Then, we can compute <code>x<sub>T</sub></code> by using the one-step estimate of <code>x<sub>0</sub></code> as follows:
-
-<div class="subsection">
-<pre><code>x_0 = (image - (1 - alpha_cumprod).sqrt() * noise_est) / alpha_cumprod.sqrt()
-term_1 = alpha_cumprod_prev.sqrt() * beta_t
-term_2 = alpha_t.sqrt() * (1 - alpha_cumprod_prev)
-pred_pi_nonoise = (term_1 * x_0 + term_2 * image) / (1 - alpha_cumprod)
-pred_prev_image = add_variance(predicted_variance, t, pred_pi_nonoise)</code></pre></div>
 
-Below are some visualizations for the iterative denoising process:
+Then, we can get an approximation of <code>x<sub>0</sub></code> by using the one-step estimate. The estimated variance is computed along with the noise estimate, so we can now compute <code>x<sub>T</sub></code> using the formula above and obtain the image estimate for the next step. Below are some visualizations of the iterative denoising process:
 
 <div class="subsection">
 <h3>Denoising Loop Visualizations (i_start = 10)</h3>
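For reference, the removed snippet that the paragraph above summarizes, rewritten with the renamed constant; <code>add_variance</code> and <code>predicted_variance</code> are the project's own helpers and are taken as given:

<pre><code># One-step estimate of x_0 from the current image
x_0 = (image - (1 - alpha_cumprod_t).sqrt() * noise_est) / alpha_cumprod_t.sqrt()
# Blend x_0 and the current image into the estimate for the previous timestamp
term_1 = alpha_cumprod_prev.sqrt() * beta_t
term_2 = alpha_t.sqrt() * (1 - alpha_cumprod_prev)
pred_pi_nonoise = (term_1 * x_0 + term_2 * image) / (1 - alpha_cumprod_t)
# Add the estimated variance to obtain the image estimate for the next step
pred_prev_image = add_variance(predicted_variance, t, pred_pi_nonoise)</code></pre>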
@@ -421,12 +421,10 @@ <h2>Part 1.5 – Diffusion Model Sampling</h2>
 <section id="part-1-6">
 <h2>Part 1.6 – Classifier-Free Guidance (CFG)</h2>
 
-To improve the quality of the images, we can compute both a noise estiamte conditioned on the text prompt, and the unconditional noise estimate, based on the null prompt <code>''</code>. Denoting the conditional noise estimate as &epsilon;<sub>c</sub> and the unconditional noise estimate as &epsilon;<sub>u</sub>, we let our noise estimate be &epsilon; = &epsilon;<sub>u</sub> + &gamma;(&epsilon;<sub>c</sub> - &epsilon;<sub>u</sub>). Note that we have &epsilon; = &epsilon;<sub>u</sub> and &epsilon; = &epsilon;<sub>c</sub> for &gamma; = 0 and &gamma; = 1 respectively. However, when &gamma; > 1, we can get much higher equality images for reasons still dicussed today. This technique is known as <strong>classifier-free guidance</strong>, and we can implement the noise estimate as follows:
+To improve the quality of the images, we can compute both a noise estimate conditioned on the text prompt and an unconditional noise estimate based on the null prompt <code>''</code>. Denoting the conditional noise estimate as &epsilon;<sub>c</sub> and the unconditional noise estimate as &epsilon;<sub>u</sub>, we let our noise estimate be &epsilon; = &epsilon;<sub>u</sub> + &gamma;(&epsilon;<sub>c</sub> - &epsilon;<sub>u</sub>). Note that we have &epsilon; = &epsilon;<sub>u</sub> and &epsilon; = &epsilon;<sub>c</sub> for &gamma; = 0 and &gamma; = 1 respectively. However, when &gamma; > 1, we can get much higher quality images for reasons still discussed today. This technique is known as <strong>classifier-free guidance</strong>, and we can calculate the noise estimate as follows:
 
-<div class="subsection">
 <pre><code>noise_est_cfg = uncond_noise_est + scale * (noise_est - uncond_noise_est)
 </code></pre>
-</div>
 
 By setting <code>scale = 7</code> (&gamma; = 7) and letting the conditional & unconditional prompts be <code>'a high quality photo'</code> & the null prompt <code>''</code>, we get the following sample images:
 
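For context, the two estimates come from two UNet passes; a sketch under the assumption that <code>prompt_embeds</code> and <code>null_embeds</code> hold the embeddings of the conditional and null prompts:

<pre><code># One pass per prompt; split off the variance channels as before (assumption)
noise_est, predicted_variance = stage_1.unet(image, t, encoder_hidden_states=prompt_embeds).sample.split(3, dim=1)
uncond_noise_est, _ = stage_1.unet(image, t, encoder_hidden_states=null_embeds).sample.split(3, dim=1)
noise_est_cfg = uncond_noise_est + scale * (noise_est - uncond_noise_est)</code></pre>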
@@ -679,11 +677,9 @@ <h4>Hand Drawn Image 2</h4>
 <div class="subsection">
 <h3>1.7.2 – Inpainting</h3>
 
-Using the techniques above, we can also modify our <code>iterative_denoise_cfg</code> function to edit certain sections of an image. To do so, we first define a mask the same size as the image that is 1 at the pixels where we want to edit, and 0 otherwise. For each loop of the denoising process, we replace <code>x<sub>t</sub></code> with <strong>m</strong><code>x<sub>t</sub></code> + (1 - <strong>m</strong>)forward(<code>x<sub>0</sub>, t</code>), where <strong>m</strong> is the mask and <code>x<sub>0</sub></code> is the original image. This can be accomplished by the following code:
+Using the techniques above, we can also modify our <code>iterative_denoise_cfg</code> function to edit certain sections of an image. To do so, we first define a mask the same size as the image that is 1 at the pixels we want to edit, and 0 otherwise. For each loop of the denoising process, we replace <code>x<sub>t</sub></code> with <strong>m</strong><code>x<sub>t</sub></code> + (1 - <strong>m</strong>)forward(<code>x<sub>0</sub>, t</code>), where <strong>m</strong> is the mask and <code>x<sub>0</sub></code> is the original image.<br>
 
-<pre><code>masked_image = image * mask + forward(original_image, t).to(device).half() * (1 - mask)</code></pre>
-
-Once <code>image</code> is replaced by <code>masked_image</code>, we replace all further occurrences of <code>image</code> except for the last instance, as the image at each step still needs to be updated. Finally, we let our starting noise be purely random and start with a timestamp index of 0, so that the patch we want to change can be sufficiently denoised. Below are the results on the Campanile image:
+<br>Once <code>image</code> is replaced by <code>masked_image</code>, we replace all further occurrences of <code>image</code> with it, except for the last instance, as the image at each step still needs to be updated. Finally, we let our starting noise be purely random and start at timestamp index 0, so that the patch we want to change can be sufficiently denoised. Below are the results on the Campanile image:
 
 <h4>Campanile Inpainting</h4>
 <div class="image-row">
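The per-step replacement described above was shown in the removed snippet; for reference, inside the denoising loop it looks like this, with <code>forward</code> the noising function from Part 1.1:

<pre><code># Keep the unmasked region pinned to the re-noised original at each step
masked_image = image * mask + forward(original_image, t).to(device).half() * (1 - mask)</code></pre>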
@@ -843,7 +839,7 @@ <h3>Prompts: <code>'an oil painting of an old man'</code> and <code>'an oil pain
 <div class="image-row">
 <figure>
 <img src="images/anagram/anagram1_256.png" alt="anagram1_256.png" />
-<figcaption>Original"</figcaption>
+<figcaption>Original</figcaption>
 </figure>
 <figure>
 <img src="images/anagram/anagram1_flip_256.png" alt="anagram1_flip_256.png" />
@@ -855,7 +851,7 @@ <h3>Prompts: <code>'a lithograph of waterfalls'</code> and <code>'a man wearing
 <div class="image-row">
 <figure>
 <img src="images/anagram/anagram2_256.png" alt="anagram2_256.png" />
-<figcaption>Original"</figcaption>
+<figcaption>Original</figcaption>
 </figure>
 <figure>
 <img src="images/anagram/anagram2_flip_256.png" alt="anagram2_flip_256.png" />
@@ -867,7 +863,7 @@ <h3>Prompts: <code>'an oil painting of a snowy mountain village'</code> and <cod
 <div class="image-row">
 <figure>
 <img src="images/anagram/anagram3_256.png" alt="anagram3_256.png" />
-<figcaption>Original"</figcaption>
+<figcaption>Original</figcaption>
 </figure>
 <figure>
 <img src="images/anagram/anagram3_flip_256.png" alt="anagram3_flip_256.png" />
@@ -883,69 +879,44 @@ <h3>Prompts: <code>'an oil painting of a snowy mountain village'</code> and <cod
 <section id="part-1-9">
 <h2>Part 1.9 – Hybrid Images</h2>
 
-With the technqiues above, we can now also create hybrid images, or images that look like different subjects depending on the viewing distance. The classical way to create a hybrid image is to transform the image you want to see at close range with a low-pass filter, thus keeping
+With the techniques above, we can now also create hybrid images, or images that look like different subjects depending on the viewing distance. The classical way to create a hybrid image is to transform the image you want to see at far range with a low-pass filter, transform the image you want to see at close range with a high-pass filter, and combine the two transformed images. We can use a similar algorithm in the denoising process, namely by passing the noise estimates from <code>p<sub>1</sub></code> and <code>p<sub>2</sub></code> through a low- and high-pass filter respectively. This will produce an image that, when viewed far away, shows <code>p<sub>1</sub></code>, but when viewed close up, shows <code>p<sub>2</sub></code>. Unlike the anagram images, we don't need to flip or transform the image to be denoised, as both images should be viewed under the same orientation. Below are several examples:
 
 <div class="subsection">
-<h3>1.9.1 – Code: make_hybrids</h3>
-
-
-<p class="note">
-Notes: describe your filter choice (Gaussian blur / FFT), the cutoff frequencies (sigmas),
-and how you combined low/high frequency components. <!-- TODO -->
-</p>
-</div>
-
-<div class="subsection">
-<h3>1.9.2 – Two Hybrid Images</h3>
-<p>
-Each hybrid image should look like Image A up close (high frequencies) and Image B from far away
-(low frequencies), or vice versa. Include the two source images and the resulting hybrid.
-</p>
-
-<h4>Hybrid 1</h4>
-<p>
-<strong>Image A (high freq / close):</strong> <em><!-- TODO: describe A --></em><br/>
-<strong>Image B (low freq / far):</strong> <em><!-- TODO: describe B --></em>
-</p>
+<h3>Prompts: <code>'an oil painting of an old man'</code> and <code>'an oil painting of people around a campfire'</code></h3>
 <div class="image-row">
 <figure>
-<img src="images/part1_9_hybrid1_sourceA.png" alt="Hybrid 1 source A" />
-<figcaption>Hybrid 1 – Source A</figcaption>
-</figure>
-<figure>
-<img src="images/part1_9_hybrid1_sourceB.png" alt="Hybrid 1 source B" />
-<figcaption>Hybrid 1 – Source B</figcaption>
+<img src="images/anagram/anagram1_256.png" alt="anagram1_256.png" />
+<figcaption>Original</figcaption>
 </figure>
 <figure>
-<img src="images/part1_9_hybrid1_result.png" alt="Hybrid 1 result" />
-<figcaption>Hybrid 1 – Result</figcaption>
+<img src="images/anagram/anagram1_flip_256.png" alt="anagram1_flip_256.png" />
+<figcaption>Flipped</figcaption>
 </figure>
 </div>
 
-<h4>Hybrid 2</h4>
-<p>
-<strong>Image A (high freq / close):</strong> <em><!-- TODO: describe A --></em><br/>
-<strong>Image B (low freq / far):</strong> <em><!-- TODO: describe B --></em>
-</p>
+<h3>Prompts: <code>'a lithograph of waterfalls'</code> and <code>'a man wearing a hat'</code></h3>
 <div class="image-row">
 <figure>
-<img src="images/part1_9_hybrid2_sourceA.png" alt="Hybrid 2 source A" />
-<figcaption>Hybrid 2 – Source A</figcaption>
+<img src="images/anagram/anagram2_256.png" alt="anagram2_256.png" />
+<figcaption>Original</figcaption>
 </figure>
 <figure>
-<img src="images/part1_9_hybrid2_sourceB.png" alt="Hybrid 2 source B" />
-<figcaption>Hybrid 2 – Source B</figcaption>
+<img src="images/anagram/anagram2_flip_256.png" alt="anagram2_flip_256.png" />
+<figcaption>Flipped</figcaption>
+</figure>
+</div>
+
+<h3>Prompts: <code>'an oil painting of a snowy mountain village'</code> and <code>'a photo of a dog'</code></h3>
+<div class="image-row">
+<figure>
+<img src="images/anagram/anagram3_256.png" alt="anagram3_256.png" />
+<figcaption>Original</figcaption>
 </figure>
 <figure>
-<img src="images/part1_9_hybrid2_result.png" alt="Hybrid 2 result" />
-<figcaption>Hybrid 2 – Result</figcaption>
+<img src="images/anagram/anagram3_flip_256.png" alt="anagram3_flip_256.png" />
+<figcaption>Flipped</figcaption>
 </figure>
 </div>
-
-<p class="note">
-Optional: include frequency visualizations or intermediate components (low/high pass) if you computed them.
-<!-- TODO -->
-</p>
 </div>
 </section>
 
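A minimal sketch of the hybrid noise estimate, assuming a Gaussian blur as the low-pass filter; the kernel size, sigma, and the <code>noise_est_p1</code>/<code>noise_est_p2</code> names are illustrative:

<pre><code>import torchvision.transforms.functional as TF

# Low frequencies from p1 (seen far away), high frequencies from p2 (seen close up)
low_pass = TF.gaussian_blur(noise_est_p1, kernel_size=33, sigma=2.0)
high_pass = noise_est_p2 - TF.gaussian_blur(noise_est_p2, kernel_size=33, sigma=2.0)
noise_est_hybrid = low_pass + high_pass</code></pre>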