36 | 36 | "cell_type": "markdown",
37 | 37 | "metadata": {},
38 | 38 | "source": [
39 | | - "A sigmoid function has been used as an activation function for neural networks for several decades, and only recently been partly replaced by ReLU. It is still used quite frequently though." |
| 39 | + "A sigmoid function has been used as an activation function for neural networks for several decades, and only recently been partly replaced by ReLU. It is still used quite frequently though.\n", |
| 40 | + "$$\n", |
| 41 | + " \\sigma(x) = \\frac{1}{1 + \\exp(-x)}\n", |
| 42 | + "$$" |
40 | 43 | ]
41 | 44 | },
42 | 45 | {
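
For reference, a minimal NumPy sketch of the sigmoid defined above (not part of this commit; the function name and the use of NumPy are my own choices, not the notebook's code):

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + exp(-x)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-x))

# Large negative inputs map toward 0, large positive inputs toward 1.
print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # approx [0.0067, 0.5, 0.9933]
```
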
98 | 101 | "cell_type": "markdown",
99 | 102 | "metadata": {},
100 | 103 | "source": [
101 | | - "Whereas the output of the sigmoid function is always positive, the hyperbolic tangent is used when negative output values are required." |
| 104 | + "Whereas the output of the sigmoid function is always positive, the hyperbolic tangent is used when negative output values are required.\n", |
| 105 | + "$$\n", |
| 106 | + " \\sigma(x) = \\tanh x\n", |
| 107 | + "$$" |
102 | 108 | ]
103 | 109 | },
104 | 110 | {
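
Again for reference only, a small sketch of the tanh activation described above (assumes NumPy; not taken from the notebook itself):

```python
import numpy as np

def tanh_activation(x):
    """Hyperbolic tangent activation; outputs lie in (-1, 1)."""
    return np.tanh(x)

# Unlike the sigmoid, negative inputs produce negative outputs.
print(tanh_activation(np.array([-2.0, 0.0, 2.0])))  # approx [-0.964, 0.0, 0.964]
```
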
143 | 149 | "cell_type": "markdown",
144 | 150 | "metadata": {},
145 | 151 | "source": [
146 | | - "An activation that is used quite often in the context of deep learning is ReLU (Rectified Linear Unit). It is an approximation for the SoftPlus function, and although it is not differentiable, it is far cheaper computationally." |
| 152 | + "An activation that is used quite often in the context of deep learning is ReLU (Rectified Linear Unit). It is an approximation for the SoftPlus function, and although it is not differentiable, it is far cheaper computationally.\n", |
| 153 | + "\n", |
| 154 | + "ReLU:\n", |
| 155 | + "$$\n", |
| 156 | + " \\sigma(x) = \\begin{cases} 0 & \\textrm{if} & x < 0 \\\\\n", |
| 157 | + " x & \\textrm{if} & x \\ge 0 \\end{cases}\n", |
| 158 | + "$$\n", |
| 159 | + "Softplus:\n", |
| 160 | + "$$\n", |
| 161 | + " \\sigma(x) = \\log(1 + \\exp x)\n", |
| 162 | + "$$" |
147 | 163 | ]
148 | 164 | },
149 | 165 | {
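
A minimal sketch of both functions from this cell, ReLU and SoftPlus (illustrative only; the names and the NumPy-based implementation are assumptions, not the notebook's code):

```python
import numpy as np

def relu(x):
    """ReLU: 0 where x < 0, x where x >= 0."""
    return np.maximum(0.0, x)

def softplus(x):
    """SoftPlus: log(1 + exp(x)); smooth, with ReLU as its cheap approximation."""
    # Direct form; adequate for moderate x, may overflow for very large x.
    return np.log1p(np.exp(x))

x = np.array([-3.0, 0.0, 3.0])
print(relu(x))      # [0. 0. 3.]
print(softplus(x))  # approx [0.049, 0.693, 3.049]
```

For large |x| the two curves nearly coincide, which is why ReLU serves as the cheaper stand-in.
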
212 | 228 | "cell_type": "markdown",
213 | 229 | "metadata": {},
214 | 230 | "source": [
215 | | - "The SoftMax function is often used for an output layer that represents categorical data. It will relatively increase high values, decrease low values. More importantly, for categorical output represented by a one-hot encoding, it will normalize the outputs such that the sum is equal to 1, and they can be interpreted as the proobability of the categories." |
| 231 | + "The SoftMax function is often used for an output layer that represents categorical data. It will relatively increase high values, decrease low values. More importantly, for categorical output represented by a one-hot encoding, it will normalize the outputs such that the sum is equal to 1, and they can be interpreted as the proobability of the categories.\n", |
| 232 | + "$$\n", |
| 233 | + " \\sigma(x_i) = \\frac{\\exp x_i}{\\sum_{i=1}^N \\exp x_i}\n", |
| 234 | + "$$" |
216 | 235 | ]
217 | 236 | },
218 | 237 | {
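
A sketch of the SoftMax as written above (not from the commit; the max-subtraction is an extra numerical-stability step of my own that does not change the result):

```python
import numpy as np

def softmax(x):
    """SoftMax: exp(x_i) / sum_j exp(x_j); outputs are positive and sum to 1."""
    e = np.exp(x - np.max(x))  # shift by the max for numerical stability
    return e / np.sum(e)

p = softmax(np.array([2.0, 1.0, 0.1]))
print(p, p.sum())  # approx [0.659, 0.242, 0.099]; the sum is 1.0
```
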
301 | 320 | "name": "python",
302 | 321 | "nbconvert_exporter": "python",
303 | 322 | "pygments_lexer": "ipython3",
304 | | - "version": "3.7.3" |
| 323 | + "version": "3.7.6" |
305 | 324 | }
306 | 325 | },
307 | 326 | "nbformat": 4,