|
290 | 290 | " * **DownBlock**: Downsampling using stride-2 convolution.\n", |
291 | 291 | " * **UpBlock**: Upsampling using bilinear interpolation followed by convolution.\n", |
292 | 292 | "\n", |
293 | | - "3. **U-Net in `02.vae-without-encoder.ipynb`**:\n", |
| 293 | + "3. **U-Net in [`02.vae-without-encoder.ipynb`](./02.vae-without-encoder.ipynb)**:\n", |
294 | 294 | " * The `Unet` class implements this straightforward architecture.\n", |
295 | 295 | " * Crucially, for this single-step model, the U-Net **does not use time embeddings**. The corruption is fixed (a single $\\alpha$ value), so the network doesn't need to adapt to different noise levels. Its task is to denoise $x_1$, whose noise level is always the same, determined by the chosen $\\alpha$.\n", |
296 | 296 | "\n", |
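The blocks described in this cell map directly to a few lines of PyTorch. Below is a minimal sketch under stated assumptions: the class names mirror the ones mentioned above, but the channel sizes, activations, and exact layer arguments are illustrative rather than the notebook's actual definitions.

```python
# Minimal sketch (not the notebook's exact classes) of the building blocks
# described above, assuming PyTorch and illustrative channel sizes.
import torch
import torch.nn as nn

class DownBlock(nn.Module):
    """Downsample by 2x using a stride-2 convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.conv(x))

class UpBlock(nn.Module):
    """Upsample by 2x with bilinear interpolation, then refine with a convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.conv(self.up(x)))

# Because the corruption level (a single fixed alpha) never changes, a U-Net
# built from these blocks needs no time-embedding input: every forward pass
# sees the same noise level.
```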
|
4140 | 4140 | }, |
4141 | 4141 | { |
4142 | 4142 | "cell_type": "markdown", |
4143 | | - "id": "1dbe51d7", |
| 4143 | + "id": "28e0614a", |
4144 | 4144 | "metadata": {}, |
4145 | 4145 | "source": [ |
4146 | | - "As you can see, it kinda learns how to generate the digits, but not really. One step diffusion is not easy. Even with a better neural network (U-Net vs. Conv) that is often used in diffusion, we are not getting better results.\n", |
| 4146 | + "As you can see, the model learns *some* structure, but struggles to generate realistic digits — even when using a strong architecture like U-Net.\n", |
4147 | 4147 | "\n", |
4148 | | - "That's why multi step diffusion models are more common. Also the problem is that corrupting $x_{0}$ with noise to make $x_{1}$, won't make it a normal Gaussian, unless $\\alpha$ is 0. But if it's set to 0, then the decoder can't learn anything since there is no signal in $x_{1}$, but only noise." |
| 4148 | + "This highlights a fundamental challenge: **one-step denoising is hard**.\n", |
| 4149 | + "\n", |
| 4150 | + "In multi-step diffusion models like DDPM, the model solves a **sequence of easier sub-problems** — gradually denoising from a high-noise image to a clean one. But in this one-step setup, the model has to learn to **jump all the way from noise to signal in a single step**.\n", |
| 4151 | + "\n", |
| 4152 | + "Another challenge is that, for the corrupted input $x_1 = \\sqrt{\\alpha} x_0 + \\sqrt{1 - \\alpha} \\epsilon$, the distribution of $x_1$ only resembles a standard Gaussian $\\mathcal{N}(0, I)$ **when $\\alpha \\to 0$**. But when $\\alpha$ is near 0, the model sees **almost no signal** from $x_0$ — it’s all noise. On the other hand, if $\\alpha$ is too high, the latent $x_1$ carries more signal but **deviates from the Gaussian prior**, which can hurt generation quality.\n", |
| 4153 | + "\n", |
| 4154 | + "This tension makes one-step models hard to train and sample from. **Multi-step diffusion models strike a better balance**: they allow the model to progressively refine the sample, without needing to generate clean images from scratch in one step.\n" |
4149 | 4155 | ] |
4150 | 4156 | } |
4151 | 4157 | ], |
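To make the $\alpha$ trade-off discussed in that cell concrete, here is a hedged sketch of the fixed-$\alpha$ corruption and the one-step sampling it implies. The function names, the `model` argument, and the particular `alpha` value are assumptions for illustration, not names taken from the notebook.

```python
# Sketch of the fixed-alpha corruption x1 = sqrt(alpha)*x0 + sqrt(1-alpha)*eps
# and single-step sampling; names and the alpha value are illustrative.
import torch

alpha = 0.3  # hypothetical fixed corruption level

def corrupt(x0, alpha):
    """Corrupt a clean batch x0 with Gaussian noise at a fixed alpha."""
    eps = torch.randn_like(x0)
    return alpha**0.5 * x0 + (1 - alpha)**0.5 * eps

@torch.no_grad()
def sample_one_step(model, shape, device="cpu"):
    """Sampling pretends x1 ~ N(0, I), which only holds approximately unless
    alpha is close to 0 -- the tension described above."""
    x1 = torch.randn(shape, device=device)
    return model(x1)  # single jump from (approximate) noise to image
```

The sketch makes the tension visible: training sees `corrupt(x0, alpha)`, which is only Gaussian-like when `alpha` is small, while sampling starts from pure Gaussian noise; the mismatch grows with `alpha`, yet small `alpha` leaves the model almost no signal to learn from.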
|