Add Wilkinson notation for each model

drbenvincent · drbenvincent · commit 9a7729c88de0 · 2024-09-01T11:09:07.000+01:00
diff --git a/examples/generalized_linear_models/GLM-simpsons-paradox.ipynb b/examples/generalized_linear_models/GLM-simpsons-paradox.ipynb
@@ -368,6 +368,18 @@
     "$$"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    ":::{info}\n",
+    "We can also express Model 1 in Wilkinson notation as `y ~ 1 + x`, which is equivalent to `y ~ x` because the intercept is included by default.\n",
+    "\n",
+    "* The `1` term corresponds to the intercept term $\\beta_0$.\n",
+    "* The `x` term corresponds to the slope term $\\beta_1$.\n",
+    ":::"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -918,6 +930,18 @@
     "Where $g_i$ is the group index for observation $i$. So the parameters $\\beta_0$ and $\\beta_1$ are now length $g$ vectors, not scalars. And the $[g_i]$ acts as an index to look up the group for the $i^\\text{th}$ observation."
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    ":::{info}\n",
+    "We can also express Model 2 in Wilkinson notation as `y ~ 0 + g + x:g`, where the `0` removes the global intercept so that the group-specific intercepts are the only intercepts.\n",
+    "\n",
+    "* The `g` term captures the group-specific intercept $\\beta_0[g_i]$ parameters.\n",
+    "* The `x:g` term captures the group-specific slope $\\beta_1[g_i]$ parameters.\n",
+    ":::"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -1423,6 +1447,21 @@
     "This model could also be called a partial pooling model. "
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    ":::{info}\n",
+    "We can also express Model 3 in Wilkinson notation as `y ~ 1 + x + (1 + x | g)`.\n",
+    "\n",
+    "* The `1` captures the global intercept, $\\mathrm{Normal}(p_{0\\mu}, p_{0\\sigma})$.\n",
+    "* The `x` captures the global slope, $\\mathrm{Normal}(p_{1\\mu}, p_{1\\sigma})$.\n",
+    "* The `(1 + x | g)` term captures the group-specific random effects for the intercept and slope.\n",
+    "  * `1 | g` captures the group-specific intercept $\\vec{\\beta_0}[g_i]$ parameters.\n",
+    "  * `x | g` captures the group-specific slope $\\vec{\\beta_1}[g_i]$ parameters.\n",
+    ":::"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -1478,6 +1517,30 @@
     "    pm.Normal(\"y\", mu=μ, sigma=sigma, observed=data.y, dims=\"obs_id\")"
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with pm.Model(coords=coords) as model3:\n",
+    "    # Define priors\n",
+    "    intercept_mu = pm.Normal(\"intercept_mu\", 0, 1)\n",
+    "    slope_mu = pm.Normal(\"slope_mu\", 0, 1)\n",
+    "    intercept_sigma = pm.Gamma(\"intercept_sigma\", 2, 2)\n",
+    "    slope_sigma = pm.Gamma(\"slope_sigma\", 2, 2)\n",
+    "    sigma = pm.Gamma(\"sigma\", 2, 2)\n",
+    "    β0 = pm.Normal(\"β0\", intercept_mu, intercept_sigma, dims=\"group\")\n",
+    "    β1 = pm.Normal(\"β1\", slope_mu, slope_sigma, dims=\"group\")\n",
+    "    # Data\n",
+    "    x = pm.Data(\"x\", data.x, dims=\"obs_id\")\n",
+    "    g = pm.Data(\"g\", data.group_idx, dims=\"obs_id\")\n",
+    "    # Linear model\n",
+    "    μ = pm.Deterministic(\"μ\", β0[g] + β1[g] * x, dims=\"obs_id\")\n",
+    "    # Define likelihood\n",
+    "    pm.Normal(\"y\", mu=μ, sigma=sigma, observed=data.y, dims=\"obs_id\")"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
diff --git a/examples/generalized_linear_models/GLM-simpsons-paradox.myst.md b/examples/generalized_linear_models/GLM-simpsons-paradox.myst.md
@@ -119,6 +119,15 @@ $$
 
 +++
 
+:::{info}
+We can also express Model 1 in Wilkinson notation as `y ~ 1 + x`, which is equivalent to `y ~ x` because the intercept is included by default.
+
+* The `1` term corresponds to the intercept term $\beta_0$.
+* The `x` term corresponds to the slope term $\beta_1$.
+:::
+
++++
+
 ### Build model
 
 ```{code-cell} ipython3
@@ -274,6 +283,15 @@ Where $g_i$ is the group index for observation $i$. So the parameters $\beta_0$
 
 +++
 
+:::{info}
+We can also express Model 2 in Wilkinson notation as `y ~ 0 + g + x:g`, where the `0` removes the global intercept so that the group-specific intercepts are the only intercepts.
+
+* The `g` term captures the group-specific intercept $\beta_0[g_i]$ parameters.
+* The `x:g` term captures the group-specific slope $\beta_1[g_i]$ parameters.
+:::
+
++++
+
 ### Build model
 
 ```{code-cell} ipython3
@@ -432,6 +450,18 @@ This model could also be called a partial pooling model.
 
 +++
 
+:::{info}
+We can also express Model 3 in Wilkinson notation as `y ~ 1 + x + (1 + x | g)`.
+
+* The `1` captures the global intercept, $\mathrm{Normal}(p_{0\mu}, p_{0\sigma})$.
+* The `x` captures the global slope, $\mathrm{Normal}(p_{1\mu}, p_{1\sigma})$.
+* The `(1 + x | g)` term captures the group-specific random effects for the intercept and slope.
+  * `1 | g` captures the group-specific intercept $\vec{\beta_0}[g_i]$ parameters.
+  * `x | g` captures the group-specific slope $\vec{\beta_1}[g_i]$ parameters.
+:::
+
++++
+
 :::{note}
 The hierarchical model we are considering contains a simplification in that the population level slope and intercept are assumed to be independent. It is possible to relax this assumption and model any correlation between these parameters by using a multivariate normal distribution.
 
@@ -474,6 +504,25 @@ with pm.Model(coords=coords) as model3:
     pm.Normal("y", mu=μ, sigma=sigma, observed=data.y, dims="obs_id")
 ```
 
+```{code-cell} ipython3
+with pm.Model(coords=coords) as model3:
+    # Define priors
+    intercept_mu = pm.Normal("intercept_mu", 0, 1)
+    slope_mu = pm.Normal("slope_mu", 0, 1)
+    intercept_sigma = pm.Gamma("intercept_sigma", 2, 2)
+    slope_sigma = pm.Gamma("slope_sigma", 2, 2)
+    sigma = pm.Gamma("sigma", 2, 2)
+    β0 = pm.Normal("β0", intercept_mu, intercept_sigma, dims="group")
+    β1 = pm.Normal("β1", slope_mu, slope_sigma, dims="group")
+    # Data
+    x = pm.Data("x", data.x, dims="obs_id")
+    g = pm.Data("g", data.group_idx, dims="obs_id")
+    # Linear model
+    μ = pm.Deterministic("μ", β0[g] + β1[g] * x, dims="obs_id")
+    # Define likelihood
+    pm.Normal("y", mu=μ, sigma=sigma, observed=data.y, dims="obs_id")
+```
+
 Plotting the DAG now makes it clear that the group-level intercept and slope parameters are drawn from a population level distributions. That is, we have hyper-priors for the slopes and intercept parameters. This particular model does not have a hyper-prior for the measurement error - this is just left as one parameter per group, as in the previous model.
 
 ```{code-cell} ipython3