Commit 510e713

Update 06.ipynb
1 parent d9d92d4 commit 510e713

File tree

1 file changed: 19 additions, 1 deletion

06.ipynb

Lines changed: 19 additions & 1 deletion
@@ -518,15 +518,33 @@
 "In pseudocode:\n",
 "\n",
 "```python\n",
+"# Equation 1\n",
 "x_input = [class_token, image_patch_1, image_patch_2, ..., image_patch_N] + [class_token_pos, image_patch_1_pos, image_patch_2_pos, ..., image_patch_N_pos]\n",
 "```\n",
 "---\n",
 "\n",
 "##### Equation 2&3\n",
 "The Transformer encoder (Vaswani et al., 2017) consists of alternating layers of multiheaded self-attention (MSA, see Appendix A) and MLP blocks (Eq. 2, 3). Layernorm (LN) is applied before every block, and residual connections after every block (Wang et al., 2019; Baevski \\& Auli, 2019).\n",
 "\n",
+"In pseudocode:\n",
+"\n",
+"```python\n",
+"# Equation 2\n",
+"x_output_MSA_block = MSA_layer(LN_layer(x_input)) + x_input\n",
+"\n",
+"# Equation 3\n",
+"x_output_MLP_block = MLP_layer(LN_layer(x_output_MSA_block)) + x_output_MSA_block\n",
+"```\n",
+"\n",
 "##### Equation 4\n",
-"Similar to BERT's [class] token, we prepend a learnable embedding to the sequence of embedded patches $\\left(\\mathbf{z}_0^0=\\mathbf{x}_{\\text {class }}\\right)$, whose state at the output of the Transformer encoder $\\left(\\mathbf{z}_L^0\\right)$ serves as the image representation $y$ (Eq. 4). Both during pre-training and fine-tuning, a classification head is attached to $\\mathbf{z}_L^0$. The classification head is implemented by a MLP with one hidden layer at pre-training time and by a single linear layer at fine-tuning time."
+"Similar to BERT's [class] token, we prepend a learnable embedding to the sequence of embedded patches $\\left(\\mathbf{z}_0^0=\\mathbf{x}_{\\text {class }}\\right)$, whose state at the output of the Transformer encoder $\\left(\\mathbf{z}_L^0\\right)$ serves as the image representation $y$ (Eq. 4). Both during pre-training and fine-tuning, a classification head is attached to $\\mathbf{z}_L^0$. The classification head is implemented by a MLP with one hidden layer at pre-training time and by a single linear layer at fine-tuning time.\n",
+"\n",
+"In pseudocode:\n",
+"\n",
+"```python\n",
+"# Equation 4\n",
+"y = Linear_layer(LN_layer(x_output_MLP_block))\n",
+"```"
 ]
 },
 {
