@@ -146,29 +146,28 @@ <h3>Images generated with num_inference_steps=100</h3>
146146<!-- ========================================================= -->
147147< section id ="part-1-1 ">
148148 < h2 > Part 1.1 – Implementing the forward process</ h2 >
149-
150- < div class ="subsection ">
151- < h3 > Code: forward(im, t)</ h3 >
152- < pre > < code > # TODO
153- # def forward(im, t):
154- # ...
155- # return im_noisy</ code > </ pre >
156- </ div >
149+
150+ To start, we have the original Campanile image at 64px:
151+ < figure >
152+ < img src ="images/campanile.png " alt ="campanile.png " />
153+ </ figure >
154+
155+ For the forward function, we index <code>alphas_cumprod[t]</code> to get the cumulative noise coefficient ᾱ<sub>t</sub> at timestep <code>t</code>, and use <code>torch.randn_like</code> to sample ε ~ N(0, I) (standard normal noise, not a uniform draw from [0, 1)). The noisy image is then <code>im_noisy</code> = √ᾱ<sub>t</sub> · x<sub>0</sub> + √(1 − ᾱ<sub>t</sub>) · ε. Below are examples of the Campanile at timesteps 250, 500, and 750:
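A minimal sketch of the forward function described above, for readers who want to try it without the pretrained model. The linear beta schedule and the random placeholder image are assumptions made only for illustration; in the project, <code>alphas_cumprod</code> comes from the model's scheduler and the input is the Campanile image.

```python
import torch

# Hypothetical stand-in for the scheduler's table: a linear beta schedule
# over 1000 timesteps. The project reads alphas_cumprod from the model instead.
betas = torch.linspace(1e-4, 0.02, 1000)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)


def forward(im, t):
    """Noise a clean image to timestep t: sqrt(a_bar)*x0 + sqrt(1 - a_bar)*eps."""
    alpha_bar = alphas_cumprod[t]
    eps = torch.randn_like(im)  # standard normal noise with the same shape as im
    return alpha_bar.sqrt() * im + (1 - alpha_bar).sqrt() * eps


torch.manual_seed(0)
im = torch.rand(1, 3, 64, 64) * 2 - 1  # placeholder image scaled to [-1, 1]
noisy = {t: forward(im, t) for t in (250, 500, 750)}
```

Because ᾱ<sub>t</sub> shrinks as <code>t</code> grows, the image term fades and the noise term dominates, which matches the progression in the figures below.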
157156
158157 < div class ="subsection ">
159158 < h3 > Campanile at Different Noise Levels</ h3 >
160159 < div class ="image-row ">
161160 < figure >
162- < img src ="images/part1_1_campanile_t250 .png " alt ="Campanile t=250 " />
163- < figcaption > Campanile at noise level t = 250</ figcaption >
161+ < img src ="images/250500750/campanile_250noise .png " alt ="campanile_250noise.png " />
162+ < figcaption > Campanile at t = 250</ figcaption >
164163 </ figure >
165164 < figure >
166- < img src ="images/part1_1_campanile_t500 .png " alt ="Campanile t=500 " />
167- < figcaption > Campanile at noise level t = 500</ figcaption >
165+ < img src ="images/250500750/campanile_500noise .png " alt ="campanile_500noise.png " />
166+ < figcaption > Campanile at t = 500</ figcaption >
168167 </ figure >
169168 < figure >
170- < img src ="images/part1_1_campanile_t750 .png " alt ="Campanile t=750 " />
171- < figcaption > Campanile at noise level t = 750</ figcaption >
169+ < img src ="images/250500750/campanile_750noise .png " alt ="campanile_750noise.png " />
170+ < figcaption > Campanile at t = 750</ figcaption >
172171 </ figure >
173172 </ div >
174173 </ div >
@@ -180,44 +179,43 @@ <h3>Campanile at Different Noise Levels</h3>
180179< section id ="part-1-2 ">
181180 < h2 > Part 1.2 – Classical Denoising</ h2 >
182181
183- < div class ="subsection ">
184- < h3 > Code: Gaussian Denoising</ h3 >
185-
182+ To recover the image from its noisy version, we can first try a classical denoising method, Gaussian blur filtering. At high noise levels, however, the effect is limited:
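As a rough illustration of why the effect is limited, here is a sketch of a Gaussian blur written directly in PyTorch. The kernel size and sigma are hypothetical choices, not the values used for the figures below, and <code>torchvision.transforms.functional.gaussian_blur</code> offers a ready-made alternative. The filter only averages neighboring pixels, so it suppresses noise and fine image detail alike.

```python
import torch
import torch.nn.functional as F


def gaussian_blur(im, kernel_size=5, sigma=2.0):
    """Blur a (N, C, H, W) tensor with a separable Gaussian kernel (odd kernel_size)."""
    half = kernel_size // 2
    x = torch.arange(-half, half + 1, dtype=im.dtype)
    k1d = torch.exp(-x**2 / (2 * sigma**2))
    k1d = k1d / k1d.sum()  # normalize so overall brightness is preserved
    c = im.shape[1]
    # One copy of the kernel per channel (depthwise convolution).
    k_row = k1d.view(1, 1, 1, -1).repeat(c, 1, 1, 1)
    k_col = k1d.view(1, 1, -1, 1).repeat(c, 1, 1, 1)
    padded = F.pad(im, (half, half, half, half), mode="reflect")
    out = F.conv2d(padded, k_row, groups=c)  # horizontal pass
    return F.conv2d(out, k_col, groups=c)    # vertical pass


torch.manual_seed(0)
noise = torch.randn(1, 3, 64, 64)  # pure noise as a stand-in input
blurred = gaussian_blur(noise)
```

The blurred output has lower variance than the input, but the same averaging would smear edges in a real image, so nothing lost to heavy noise is restored.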
183+
186184 < div class ="subsection ">
187185 < h3 > Noisy vs Gaussian-Denoised Campanile</ h3 >
188186
189187 < h4 > t = 250</ h4 >
190188 < div class ="image-row ">
191189 < figure >
192- < img src ="images/part1_2_campanile_t250_noisy .png " alt ="Campanile noisy t=250 " />
190+ < img src ="images/250500750/campanile_250noise .png " alt ="campanile_250noise.png " />
193191 < figcaption > Noisy Campanile (t = 250)</ figcaption >
194192 </ figure >
195193 < figure >
196- < img src ="images/part1_2_campanile_t250_gauss .png " alt ="Campanile denoised t=250 " />
194+ < img src ="images/250500750/campanile_250denoise_gaussian .png " alt ="campanile_250denoise_gaussian.png " />
197195 < figcaption > Gaussian denoised (t = 250)</ figcaption >
198196 </ figure >
199197 </ div >
200198
201199 < h4 > t = 500</ h4 >
202200 < div class ="image-row ">
203201 < figure >
204- < img src ="images/part1_2_campanile_t500_noisy .png " alt ="Campanile noisy t=500 " />
202+ < img src ="images/250500750/campanile_500noise .png " alt ="campanile_500noise.png " />
205203 < figcaption > Noisy Campanile (t = 500)</ figcaption >
206204 </ figure >
207205 < figure >
208- < img src ="images/part1_2_campanile_t500_gauss .png " alt ="Campanile denoised t=500 " />
206+ < img src ="images/250500750/campanile_500denoise_gaussian .png " alt ="campanile_500denoise_gaussian.png " />
209207 < figcaption > Gaussian denoised (t = 500)</ figcaption >
210208 </ figure >
211209 </ div >
212210
213211 < h4 > t = 750</ h4 >
214212 < div class ="image-row ">
215213 < figure >
216- < img src ="images/part1_2_campanile_t750_noisy .png " alt ="Campanile noisy t=750 " />
214+ < img src ="images/250500750/campanile_750noise .png " alt ="campanile_750noise.png " />
217215 < figcaption > Noisy Campanile (t = 750)</ figcaption >
218216 </ figure >
219217 < figure >
220- < img src ="images/part1_2_campanile_t750_gauss .png " alt ="Campanile denoised t=750 " />
218+ < img src ="images/250500750/campanile_750denoise_gaussian .png " alt ="campanile_750denoise_gaussian.png " />
221219 < figcaption > Gaussian denoised (t = 750)</ figcaption >
222220 </ figure >
223221 </ div >
@@ -228,65 +226,64 @@ <h4>t = 750</h4>
228226<!-- Part 1.3: One-Step Denoising -->
229227<!-- ========================================================= -->
230228< section id ="part-1-3 ">
231- < h2 > Part 1.3 – One-Step Denoising with UNet</ h2 >
229+ < h2 > Part 1.3 – Implementing One Step Denoising</ h2 >
230+
231+ A much more effective method is to use a pretrained diffusion model. Using <code>stage_1.unet</code>, we can estimate the noise present in the noisy image. Rearranging the forward equation then gives x<sub>0</sub> (the original image) for a given timestep <code>t</code>:
232232
233233 < div class ="subsection ">
234- < h3 > Code: One-Step Denoise</ h3 >
235- < pre > < code > # TODO
236- # def one_step_denoise(im_noisy, t, ...):
237- # # 1) forward(...) to get noisy version
238- # # 2) stage_1.unet to estimate noise
239- # # 3) subtract noise to estimate x_0
240- # return im_estimated</ code > </ pre >
234+ < pre > < code > at_x0 = im_noisy_cpu - (1 - alpha_cumprod).sqrt() * noise_est
235+ original_im = at_x0 / alpha_cumprod.sqrt()</ code > </ pre >
241236 </ div >
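The rearrangement above can be sanity-checked without the UNet. In this sketch the true noise stands in for the model's prediction, and the value of <code>alpha_cumprod</code>, the helper name <code>estimate_x0</code>, and the placeholder image are all assumptions for illustration. With a perfect noise estimate the formula returns the original exactly; any error in the estimate is scaled by √(1 − ᾱ<sub>t</sub>) / √ᾱ<sub>t</sub>, which is one way to see why one-step estimates degrade at large <code>t</code>.

```python
import torch

torch.manual_seed(0)
alpha_cumprod = torch.tensor(0.5)      # hypothetical value of alphas_cumprod[t]
x0 = torch.rand(1, 3, 64, 64) * 2 - 1  # placeholder clean image in [-1, 1]
eps = torch.randn_like(x0)
im_noisy = alpha_cumprod.sqrt() * x0 + (1 - alpha_cumprod).sqrt() * eps


def estimate_x0(im_noisy, noise_est, alpha_cumprod):
    """Invert the forward equation for x0, given an estimate of the noise."""
    at_x0 = im_noisy - (1 - alpha_cumprod).sqrt() * noise_est
    return at_x0 / alpha_cumprod.sqrt()


# Perfect noise estimate: recovers x0 up to floating-point error.
exact = estimate_x0(im_noisy, eps, alpha_cumprod)
# Slightly wrong noise estimate: the error carries over into the result.
rough = estimate_x0(im_noisy, eps + 0.1 * torch.randn_like(eps), alpha_cumprod)
```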
237+
238+ Below is a comparison of the original image, the noisy image, and the one-step estimate of the original for <code>t</code> ∈ {250, 500, 750}:
242239
243240 < div class ="subsection ">
244241 < h3 > Original, Noisy, One-Step Estimate (t = 250, 500, 750)</ h3 >
245242
246243 < h4 > t = 250</ h4 >
247244 < div class ="image-row ">
248245 < figure >
249- < img src ="images/part1_3_t250_original .png " alt ="Original Campanile " />
246+ < img src ="images/campanile .png " alt ="campanile.png " />
250247 < figcaption > Original Campanile</ figcaption >
251248 </ figure >
252249 < figure >
253- < img src ="images/part1_3_t250_noisy .png " alt ="Noisy Campanile t=250 " />
250+ < img src ="images/250500750/campanile_250noise .png " alt ="campanile_250noise.png " />
254251 < figcaption > Noisy (t = 250)</ figcaption >
255252 </ figure >
256253 < figure >
257- < img src ="images/part1_3_t250_est .png " alt ="Estimate Campanile t=250 " />
254+ < img src ="images/250500750/campanile_250denoise_onestep .png " alt ="campanile_250denoise_onestep.png " />
258255 < figcaption > One-step estimate of original (t = 250)</ figcaption >
259256 </ figure >
260257 </ div >
261258
262259 < h4 > t = 500</ h4 >
263260 < div class ="image-row ">
264261 < figure >
265- < img src ="images/part1_3_t500_original .png " alt ="Original Campanile " />
262+ < img src ="images/campanile .png " alt ="campanile.png " />
266263 < figcaption > Original Campanile</ figcaption >
267264 </ figure >
268265 < figure >
269- < img src ="images/part1_3_t500_noisy .png " alt ="Noisy Campanile t=500 " />
266+ < img src ="images/250500750/campanile_500noise .png " alt ="campanile_500noise.png " />
270267 < figcaption > Noisy (t = 500)</ figcaption >
271268 </ figure >
272269 < figure >
273- < img src ="images/part1_3_t500_est .png " alt ="Estimate Campanile t=500 " />
270+ < img src ="images/250500750/campanile_500denoise_onestep .png " alt ="campanile_500denoise_onestep.png " />
274271 < figcaption > One-step estimate of original (t = 500)</ figcaption >
275272 </ figure >
276273 </ div >
277274
278275 < h4 > t = 750</ h4 >
279276 < div class ="image-row ">
280277 < figure >
281- < img src ="images/part1_3_t750_original .png " alt ="Original Campanile " />
278+ < img src ="images/campanile .png " alt ="campanile.png " />
282279 < figcaption > Original Campanile</ figcaption >
283280 </ figure >
284281 < figure >
285- < img src ="images/part1_3_t750_noisy .png " alt ="Noisy Campanile t=750 " />
282+ < img src ="images/250500750/campanile_750noise .png " alt ="campanile_750noise.png " />
286283 < figcaption > Noisy (t = 750)</ figcaption >
287284 </ figure >
288285 < figure >
289- < img src ="images/part1_3_t750_est .png " alt ="Estimate Campanile t=750 " />
286+ < img src ="images/250500750/campanile_750denoise_onestep .png " alt ="campanile_750denoise_onestep.png " />
290287 < figcaption > One-step estimate of original (t = 750)</ figcaption >
291288 </ figure >
292289 </ div >
@@ -299,13 +296,9 @@ <h4>t = 750</h4>
299296< section id ="part-1-4 ">
300297 < h2 > Part 1.4 – Iterative Denoising</ h2 >
301298
302- < div class ="subsection ">
303- < h3 > Code: strided_timesteps</ h3 >
304- < pre > < code > # TODO
305- # Example:
306- # strided_timesteps = list(range(990, -10, -30))
307- # stage_1.scheduler.set_timesteps(timesteps=strided_timesteps)</ code > </ pre >
308- </ div >
299+ Instead of using one step, we can obtain better results by iteratively denoising from step <code>t</code> down to step 0. However, this means running the diffusion model up to 1000 times in the worst case, which is slow and costly.
300+
301+ Fortunately, we can speed up the computation by iterating in steps. Due to
309302
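To make the idea of iterating in steps concrete, here is a small sketch of a strided schedule. The range matches the example in the placeholder that this change removes (<code>list(range(990, -10, -30))</code>); pairing consecutive entries into (current, next) jumps is an assumption about how the loop would consume the schedule, not the actual <code>iterative_denoise</code> code.

```python
# Strided schedule: start at t = 990 and step down by 30 until t = 0.
strided_timesteps = list(range(990, -10, -30))

# Each denoising update would jump from one timestep to the next entry.
steps = list(zip(strided_timesteps[:-1], strided_timesteps[1:]))

print(len(strided_timesteps))  # number of timesteps in the schedule
print(steps[0], steps[-1])     # first and last jump
```

This schedule has 34 timesteps and therefore 33 jumps, so under that assumption the model would run a few dozen times instead of up to 1000.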
310303 < div class ="subsection ">
311304 < h3 > Code: iterative_denoise</ h3 >