Updated text and code snippets for clarity and consistency in the explanation of noise addition and denoising processes. Adjusted image references and captions for better presentation.
<h2>Part 1.1 – The forward process</h2>
</figure>
</div>
To add noise to an image <code>x<sub>0</sub></code>, we can use the forward process and compute
<div class="image-row">
<figure>
<img src="images/forward.png" alt="forward.png" />
</figure>
</div>
for a given timestamp <code>t</code> ∈ {0, 1, ..., 999}. The noise coefficient at timestamp <code>t</code> can be obtained using <code>alphas_cumprod[t]</code>, and the noise ε ~ N(0, 1) can be sampled with <code>torch.randn_like</code>. Below are examples of the Campanile at noise timestamps 250, 500, and 750:
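As a rough sketch, the forward process can be implemented as follows. The function and argument names here are illustrative rather than the project's exact API; <code>alphas_cumprod</code> is assumed to be the scheduler's 1-D tensor of cumulative alpha products.

```python
import torch

# Sketch of the forward process (illustrative names, not the project's exact API).
# alphas_cumprod: 1-D tensor of cumulative alpha products from the scheduler.
def forward(im, t, alphas_cumprod):
    alpha_bar = alphas_cumprod[t]      # noise coefficient at timestep t
    eps = torch.randn_like(im)         # epsilon sampled from N(0, 1)
    # x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps
    return torch.sqrt(alpha_bar) * im + torch.sqrt(1 - alpha_bar) * eps
```

Larger <code>t</code> gives a smaller <code>alpha_bar</code>, so the image term shrinks and the noise term dominates.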
<div class="subsection">
<h3>Campanile at Different Noise Levels</h3>
<h4>t = 750</h4>
<sectionid="part-1-3">
<h2>Part 1.3 – Implementing One Step Denoising</h2>
A more effective method is to use a pretrained diffusion model. Using <code>stage_1.unet</code>, we can estimate the amount of noise in the noisy image. With the forward equation above, we can solve for <code>x<sub>0</sub></code> (the original image) given the timestamp <code>t</code>:
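Rearranging the forward equation gives the one-step estimate. The sketch below uses illustrative names; in the project the noise estimate <code>noise_est</code> would come from <code>stage_1.unet</code>.

```python
import torch

# Sketch of one-step denoising: invert the forward equation
#   x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps
# for x_0, given a noise estimate (in practice, from stage_1.unet).
def one_step_denoise(im_noisy, t, alphas_cumprod, noise_est):
    alpha_bar = alphas_cumprod[t]
    return (im_noisy - torch.sqrt(1 - alpha_bar) * noise_est) / torch.sqrt(alpha_bar)
```

If the noise estimate were exact, this would recover <code>x<sub>0</sub></code> perfectly; in practice the UNet's estimate is imperfect, so the result is only an approximation.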
to compute <code>x</code> at timestamp <code>T</code>, where <code>T</code> (or <code>prev_t</code>) is the next timestamp after the current timestamp in the strided timestamps. First, we define the constants, where <code>alpha_cumprod_t</code> corresponds to ᾱ<sub>t</sub>, the variable with the bar:
<pre><code>alpha_cumprod_t = alphas_cumprod[t]
alpha_cumprod_prev = alphas_cumprod[prev_t]
alpha_t = alpha_cumprod_t / alpha_cumprod_prev
beta_t = 1 - alpha_t</code></pre>
Then, we can get an approximation of <code>x<sub>0</sub></code> by using the one-step estimate. The estimated variance is computed along with the noise estimate, so we can now compute <code>x<sub>T</sub></code> using the formula above and obtain the image estimate for the next step. Below are some visualizations of the iterative denoising process:
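One step of this update can be sketched as follows, assuming it follows the standard DDPM posterior mean built from the constants defined above (the added variance term is omitted here for simplicity, and all names are illustrative):

```python
import torch

# Sketch of one iterative-denoising update (standard DDPM posterior mean;
# the variance term v_sigma is omitted for simplicity).
def iterative_step(x_t, x0_est, alpha_cumprod_t, alpha_cumprod_prev):
    alpha_t = alpha_cumprod_t / alpha_cumprod_prev
    beta_t = 1 - alpha_t
    coef_x0 = torch.sqrt(alpha_cumprod_prev) * beta_t / (1 - alpha_cumprod_t)
    coef_xt = torch.sqrt(alpha_t) * (1 - alpha_cumprod_prev) / (1 - alpha_cumprod_t)
    return coef_x0 * x0_est + coef_xt * x_t
```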
To improve the quality of the images, we can compute both a noise estimate conditioned on the text prompt and an unconditional noise estimate based on the null prompt <code>''</code>. Denoting the conditional noise estimate as ε<sub>c</sub> and the unconditional noise estimate as ε<sub>u</sub>, we let our noise estimate be ε = ε<sub>u</sub> + γ(ε<sub>c</sub> - ε<sub>u</sub>). Note that ε = ε<sub>u</sub> and ε = ε<sub>c</sub> for γ = 0 and γ = 1 respectively. However, when γ > 1, we get much higher quality images, for reasons that are still being studied. This technique is known as <strong>classifier-free guidance</strong>, and we can calculate the noise estimate as follows:
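A minimal sketch of the combination step (pure arithmetic; in practice the two estimates come from two UNet passes, one with the text prompt and one with the null prompt):

```python
# Classifier-free guidance: blend unconditional and conditional noise estimates.
# gamma = 0 recovers eps_u; gamma = 1 recovers eps_c; gamma > 1 extrapolates
# past the conditional estimate.
def cfg_noise(eps_u, eps_c, gamma):
    return eps_u + gamma * (eps_c - eps_u)
```

The same expression works elementwise on tensors.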
By setting <code>scale = 7</code> (γ = 7) and letting the conditional and unconditional prompts be <code>'a high quality photo'</code> and the null prompt <code>''</code>, we get the following sample images:
<h4>Hand Drawn Image 2</h4>
<div class="subsection">
<h3>1.7.2 – Inpainting</h3>
Using the techniques above, we can also modify our <code>iterative_denoise_cfg</code> function to edit certain sections of an image. To do so, we first define a mask the same size as the image that is 1 at the pixels where we want to edit, and 0 otherwise. For each loop of the denoising process, we replace <code>x<sub>t</sub></code> with <strong>m</strong><code>x<sub>t</sub></code> + (1 - <strong>m</strong>)forward(<code>x<sub>0</sub>, t</code>), where <strong>m</strong> is the mask and <code>x<sub>0</sub></code> is the original image.<br>
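The replacement step itself is a single blend; in this sketch the names are illustrative, and <code>x0_noised</code> stands for <code>forward(x<sub>0</sub>, t)</code>, the original image re-noised to the current timestep:

```python
# Inpainting replacement: keep the model's pixels where mask == 1 and force the
# re-noised original image everywhere else (names are illustrative).
def inpaint_replace(x_t, x0_noised, mask):
    return mask * x_t + (1 - mask) * x0_noised
```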
Once <code>image</code> is replaced by <code>masked_image</code>, we replace all further occurrences of <code>image</code> except for the last instance, as the image at each step still needs to be updated. Finally, we let our starting noise be purely random and start at timestamp index 0, so that the patch we want to change can be sufficiently denoised. Below are the results on the Campanile image:
<h4>Campanile Inpainting</h4>
<div class="image-row">
<h3>Prompts: <code>'an oil painting of an old man'</code> and <code>'an oil painting of people around a campfire'</code></h3>
<h3>Prompts: <code>'an oil painting of a snowy mountain village'</code> and …</h3>
<section id="part-1-9">
<h2>Part 1.9 – Hybrid Images</h2>
With the techniques above, we can now also create hybrid images, or images that look like different subjects depending on the viewing distance. The classical way to create a hybrid image is to transform the image you want to see at far range with a low-pass filter, transform the image you want to see at close range with a high-pass filter, and combine the two transformed images. We can use a similar algorithm in the denoising process, namely by passing the noise estimates from <code>p<sub>1</sub></code> and <code>p<sub>2</sub></code> through a low-pass and high-pass filter respectively. This will produce an image that, when viewed far away, shows <code>p<sub>1</sub></code>, but when viewed close up, shows <code>p<sub>2</sub></code>. Unlike the anagram images, we don't need to flip or transform the image being denoised, as both images should be viewed under the same orientation. Below are several examples:
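One way to sketch the filtering step (a box-filter low-pass via average pooling stands in for whatever blur is actually used, e.g. a Gaussian; all names are illustrative):

```python
import torch
import torch.nn.functional as F

def lowpass(x, k=9):
    # Simple box-filter low-pass; a Gaussian blur would work similarly.
    return F.avg_pool2d(x, kernel_size=k, stride=1, padding=k // 2)

def hybrid_noise(eps1, eps2, k=9):
    # Low frequencies from the prompt seen far away (p1),
    # high frequencies from the prompt seen up close (p2).
    return lowpass(eps1, k) + (eps2 - lowpass(eps2, k))
```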
<div class="subsection">
<h3>Prompts: <code>'an oil painting of an old man'</code> and <code>'an oil painting of people around a campfire'</code></h3>