|
482 | 482 | "\n",
|
483 | 483 | "The equations for both of these losses are provided below:\n",
|
484 | 484 | "\n",
|
485 |
| - "$ L_{KL}(\\mu, \\sigma) = \\frac{1}{2}\\sum\\limits_{j=0}^{k-1}\\small{(\\sigma_j + \\mu_j^2 - 1 - \\log{\\sigma_j})} $\n", |
| 485 | + "\\begin{equation*}\n", |
| 486 | + "L_{KL}(\\mu, \\sigma) = \\frac{1}{2}\\sum\\limits_{j=0}^{k-1}\\small{(\\sigma_j + \\mu_j^2 - 1 - \\log{\\sigma_j})}\n", |
| 487 | + "\\end{equation*}\n", |
486 | 488 | "\n",
|
487 |
| - "$ L_{x}{(x,\\hat{x})} = ||x-\\hat{x}||_1 $ \n", |
| 489 | + "\\begin{equation*}\n", |
| 490 | + "L_{x}{(x,\\hat{x})} = ||x-\\hat{x}||_1\n", |
| 491 | + "\\end{equation*}\n", |
488 | 492 | "\n",
|
489 | 493 | "Thus for the VAE loss we have: \n",
|
490 | 494 | "\n",
|
491 |
| - "$ L_{VAE} = c\\cdot L_{KL} + L_{x}{(x,\\hat{x})} $\n", |
| 495 | + "\\begin{equation*}\n", |
| 496 | + "L_{VAE} = c\\cdot L_{KL} + L_{x}{(x,\\hat{x})}\n", |
| 497 | + "\\end{equation*}\n", |
492 | 498 | "\n",
|
493 | 499 | "where $c$ is a weighting coefficient used for regularization. \n",
|
494 | 500 | "\n",
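As a concrete illustration of how these two terms could be combined, here is a minimal TensorFlow sketch. This is not the lab's exact implementation: the function name `vae_loss_function`, the assumption that the encoder outputs a log-variance vector `logsigma` (so $\sigma = e^{\text{logsigma}}$), and the default `kl_weight` value are all illustrative.

```python
import tensorflow as tf

def vae_loss_function(x, x_recon, mu, logsigma, kl_weight=0.0005):
    """Sketch of the VAE loss: c * L_KL + L_x(x, x_hat)."""
    # KL term: 1/2 * sum_j(sigma_j + mu_j^2 - 1 - log(sigma_j)),
    # written in terms of logsigma (an assumed encoder output).
    latent_loss = 0.5 * tf.reduce_sum(
        tf.exp(logsigma) + tf.square(mu) - 1.0 - logsigma, axis=-1)
    # Reconstruction term: L1 norm between the input and its reconstruction,
    # assuming x is a batch of images with shape (N, H, W, C).
    reconstruction_loss = tf.reduce_mean(tf.abs(x - x_recon), axis=(1, 2, 3))
    # Weighted sum, with kl_weight playing the role of the coefficient c.
    return kl_weight * latent_loss + reconstruction_loss
```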
|
|
551 | 557 | "\n",
|
552 | 558 | "As you may recall from lecture, VAEs use a \"reparameterization trick\" for sampling learned latent variables. Instead of the VAE encoder generating a single vector of real numbers for each latent variable, it generates a vector of means and a vector of standard deviations that are constrained to roughly follow Gaussian distributions. We then sample a noise vector, scale it by the standard deviations, and add back the means to obtain our sampled latent vector. Formalizing this for a latent variable $z$, where we sample $\\epsilon \\sim \\mathcal{N}(0,I)$, we have: \n",
|
553 | 559 | "\n",
|
554 |
| - "$ z = \\mathbb{\\mu} + e^{\\left(\\frac{1}{2} \\cdot \\log{\\Sigma}\\right)}\\circ \\epsilon $\n", |
| 560 | + "\\begin{equation}\n", |
| 561 | + "z = \\mu + e^{\\left(\\frac{1}{2} \\cdot \\log{\\Sigma}\\right)}\\circ \\epsilon\n", |
| 562 | + "\\end{equation}\n", |
555 | 563 | "\n",
|
556 | 564 | "where $\\mu$ is the mean and $\\Sigma$ is the covariance matrix. This is useful because it will let us neatly define the loss function for the VAE, generate randomly sampled latent variables, achieve improved network generalization, **and** make our complete VAE network differentiable so that it can be trained via backpropagation. Quite powerful!\n",
|
557 | 565 | "\n",
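A sketch of what this sampling step might look like in TensorFlow, assuming the encoder outputs a mean vector `z_mean` and a log-variance vector `z_logsigma` (names are illustrative, not necessarily the lab's exact API):

```python
import tensorflow as tf

def sampling(z_mean, z_logsigma):
    """Reparameterization trick: z = mu + exp(0.5 * log Sigma) * epsilon."""
    # epsilon ~ N(0, I), drawn with the same shape as the latent mean.
    epsilon = tf.random.normal(shape=tf.shape(z_mean))
    # Scale the noise by the standard deviation and shift by the mean.
    # Gradients flow through z_mean and z_logsigma, keeping the network differentiable.
    return z_mean + tf.exp(0.5 * z_logsigma) * epsilon
```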
|
|
635 | 643 | "\n",
|
636 | 644 | "We can write a single expression for the loss by defining an indicator variable $\\mathcal{I}_f$ that denotes which training data are images of faces ($\\mathcal{I}_f(y) = 1$) and which are images of non-faces ($\\mathcal{I}_f(y) = 0$). Using this, we obtain:\n",
|
637 | 645 | "\n",
|
638 |
| - "$$L_{total} = L_y(y,\\hat{y}) + \\mathcal{I}_f(y)\\Big[L_{VAE}\\Big]$$\n", |
| 646 | + "\\begin{equation}\n", |
| 647 | + "L_{total} = L_y(y,\\hat{y}) + \\mathcal{I}_f(y)\\Big[L_{VAE}\\Big]\n", |
| 648 | + "\\end{equation}\n", |
639 | 649 | "\n",
|
640 | 650 | "Let's write a function to define the DB-VAE loss function:\n",
|
641 | 651 | "\n"
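One possible shape for that function, sketched under several assumptions: the classifier outputs a single face/non-face logit `y_logit`, binary cross-entropy serves as $L_y$, the labels `y` and logits `y_logit` are 1-D tensors of shape (batch,), and the VAE loss reuses the `vae_loss_function` sketched earlier. None of these names are guaranteed to match the lab's actual code.

```python
import tensorflow as tf

def debiasing_loss_function(x, x_recon, y, y_logit, mu, logsigma, kl_weight=0.0005):
    """Sketch of L_total = L_y(y, y_hat) + I_f(y) * L_VAE."""
    # Classification loss L_y for the face / non-face prediction.
    classification_loss = tf.nn.sigmoid_cross_entropy_with_logits(
        labels=y, logits=y_logit)
    # Indicator I_f(y): 1.0 for face images, 0.0 otherwise, so the VAE
    # loss only contributes when the training example is a face.
    face_indicator = tf.cast(tf.equal(y, 1.0), tf.float32)
    # VAE loss (reusing the vae_loss_function sketch from above).
    vae_loss = vae_loss_function(x, x_recon, mu, logsigma, kl_weight)
    return tf.reduce_mean(classification_loss + face_indicator * vae_loss)
```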
|
|
1067 | 1077 | "\n",
|
1068 | 1078 | "Hopefully this lab has shed some light on a few concepts, from vision-based tasks, to VAEs, to algorithmic bias. We like to think it has, but we're biased ;). \n",
|
1069 | 1079 | "\n",
|
1070 |
| - "" |
| 1080 | + "<img src=\"https://i.ibb.co/PmCSNXs/tenor.gif\" />" |
1071 | 1081 | ]
|
1072 | 1082 | }
|
1073 | 1083 | ]
|
|