
Commit 152535a

committed: friday's updates
1 parent 988807f commit 152535a

File tree: 2 files changed (+362, -124 lines)


examples/causal_inference/moderation_analysis.ipynb

Lines changed: 256 additions & 106 deletions
Large diffs are not rendered by default.

examples/causal_inference/moderation_analysis.myst.md

Lines changed: 106 additions & 18 deletions
@@ -23,7 +23,16 @@ This notebook covers Bayesian [moderation analysis](https://en.wikipedia.org/wik
 
 This is not intended as a one-stop solution to a wide variety of data analysis problems; rather, it is intended as an educational exposition to show how moderation analysis works and how to conduct Bayesian parameter estimation in PyMC.
 
+Moderation analysis can be approached in a variety of ways:
+* Statistical approaches: It is entirely possible to treat moderation analysis from a purely statistical perspective. In this approach we might build a linear model (for example) whose aim is purely to _describe_ the data we have, making no claims about causality.
+* Path analysis: This approach asserts that the variables in the model are causally related, and is exemplified in {cite:t}`hayes2017introduction`. It cannot be considered 'fully causal', however, as it lacks many of the concepts present in the causal inference approach.
+* Causal inference: This approach builds upon path analysis in that causal relationships between the variables are claimed, but it goes further by bringing additional causal concepts to bear.
+
++++
+
+:::{attention}
 Note that this is sometimes mixed up with [mediation analysis](https://en.wikipedia.org/wiki/Mediation_(statistics)). Mediation analysis is appropriate when we believe the effect of a predictor variable upon an outcome variable is (partially, or fully) mediated through a 3rd mediating variable. Readers are referred to the textbook by {cite:t}`hayes2017introduction` as a comprehensive (albeit Frequentist) guide to moderation and related models as well as the PyMC example {ref}`mediation_analysis`.
+:::
 
 ```{code-cell} ipython3
 import arviz as az
@@ -146,17 +155,66 @@ def plot_moderation_effect(result, m, m_quantiles, ax=None):
 )
 ```
 
-# Does the effect of training upon muscularity decrease with age?
+## Does the effect of training upon muscularity decrease with age?
 
 I've taken inspiration from a blog post {cite:t}`vandenbergSPSS` which examines whether age influences (moderates) the effect of training on muscle percentage. We might speculate that more training results in higher muscle mass, at least for younger people. But it might be the case that the relationship between training and muscle mass changes with age - perhaps training is less effective at increasing muscle mass in older age?
 
-The schematic box and arrow notation often used in the _statistical_ literature to represent moderation is shown by an arrow from the moderating variable to the line between a predictor and an outcome variable.
+Let's see how we can visualise this in three different ways.
+
++++
+
+### Statistical diagram
+
+In this approach we might model the outcome variable (muscle mass) as a function of the predictor variables. In this case they would be age, training, and the interaction term between age and training. This is a purely statistical approach and does not make any claims about causality or the direction of the relationships.
+
+```{code-cell} ipython3
+:tags: [hide-input]
+
+pgm = daft.PGM(dpi=200)
+
+pgm.add_node("i", "1", 0, 2, aspect=3)
+pgm.add_node("t", "training", 0, 1, aspect=3)
+pgm.add_node("a", "age", 0, 0, aspect=3)
+pgm.add_node("m", r"age $\times$ training", 0, -1, aspect=3)
+pgm.add_node("y", "muscle mass", 4, 0.5, aspect=3)
+pgm.add_node("e", r"$\epsilon$", 4, 1.5, fixed=True)
+
+pgm.add_edge("i", "y", label=r"$\beta_0$")
+pgm.add_edge("t", "y", label=r"$\beta_1$")
+pgm.add_edge("a", "y", label=r"$\beta_2$")
+pgm.add_edge("m", "y", label=r"$\beta_3$")
+pgm.add_edge("e", "y")
+
+
+pgm.render();
+```
+
+This diagram makes it explicit that the moderation effect is the interaction term between age and training. We'll come back to why this is the case below.
+
++++
+
+We could also write this in the form of an equation:
+
+$$
+\text{muscle mass} = \beta_0 + \beta_1 \times \text{training} + \beta_2 \times \text{age} + \beta_3 \times \text{training} \times \text{age} + \epsilon.
+$$
+
++++
+
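For illustration, here is a minimal sketch (an editorial addition, not part of the committed notebook) that simulates data from this statistical model with made-up coefficient values; a negative interaction coefficient would produce the hypothesised weakening of the training effect with age.

```python
# Editorial sketch: simulate data from the purely statistical model above.
# All coefficient values are hypothetical and chosen only for illustration.
import numpy as np

rng = np.random.default_rng(42)
n = 200
training = rng.uniform(0, 10, size=n)  # weekly training hours
age = rng.uniform(18, 70, size=n)      # years

beta0, beta1, beta2, beta3, sigma = 10.0, 2.0, 0.05, -0.03, 1.0

# muscle mass = b0 + b1*training + b2*age + b3*training*age + noise
muscle_mass = (
    beta0
    + beta1 * training
    + beta2 * age
    + beta3 * training * age
    + rng.normal(0, sigma, size=n)
)
```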
+### Conceptual or path diagram
+We can also draw moderation in a more conceptual manner. This is perhaps visually simpler and easier to parse, but it is less explicit. The moderation is shown by an arrow from the moderating variable to the line between a predictor and an outcome variable.
+
+But the diagram represents exactly the same equation as shown above.
 
 ![](moderation_figure.png)
 
 +++
 
-It is useful to draw the same diagram out using the visual notation of _structural causal modeling_ (see below). This notation shows that both age and training causally influence muscle mass. The causal relationship also states that muscle mass is a function of both age and training. There is no specific visual notation in the SCM approach to represent moderation. Instead, that would be captured by the functional form of the relationship $f$. Note that the operator $:=$ is similar to the traditional $=$ operator, but it is used to denote a _causal_ or directional relationship rather than just equality.
+### Causal diagram
+
++++
+
+Finally, we could draw the same diagram from the perspective of _structural causal modeling_. This notation shows that both age and training causally influence muscle mass. There is no specific visual notation to represent moderation in this approach. Instead, that would be captured by the functional form of the relationship $f$.
 
 ```{code-cell} ipython3
 :tags: [hide-input]
@@ -175,41 +233,71 @@ pgm.add_text(-0.25, -0.75, r"muscle mass := $f$(training, age)")
 pgm.render();
 ```
 
+Note that the operator $:=$ is similar to the traditional $=$ operator, but it is used to denote a _causal_ or directional relationship rather than just equality.
+
+And we could, if we wanted to assume linearity, model this just as above:
+$$
+\text{muscle mass} := \beta_0 + \beta_1 \times \text{training} + \beta_2 \times \text{age} + \beta_3 \times \text{training} \times \text{age} + \epsilon.
+$$
+
++++
+
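To make the role of $f$ concrete, the structural assignment could be sketched as an ordinary function (an editorial illustration, assuming the linear form above and hypothetical coefficient values); the moderation lives entirely in the functional form of $f$.

```python
# Editorial sketch of the structural assignment: muscle mass := f(training, age).
# There is no special notation for moderation; it is carried by the form of f,
# here via the training * age interaction term. Coefficients are hypothetical.
def f(training, age, beta0=10.0, beta1=2.0, beta2=0.05, beta3=-0.03):
    return beta0 + beta1 * training + beta2 * age + beta3 * training * age
```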
+## The moderation model
+
 Because we want to focus on the moderation concept and not the specific example, it can be useful to use consistent and more abstract notation, so we will define:
 - $x$ as the main predictor variable. In this example it is training.
 - $y$ as the outcome variable. In this example it is muscle percentage.
 - $m$ as the moderator. In this example it is age.
 
-## The moderation model
++++
+
+### Why is the interaction term the moderation effect?
+We can see that the mean of $y$ is simply a multiple linear regression with an interaction term between the two predictors, $x$ and $m$.
 
-While the visual schematic (above) is a useful shorthand to understand complex models when you already know what moderation is, you can't derive it from the diagram alone. So let us formally specify the moderation model - it defines an outcome variable $y$ as:
+We can get some insight into why this is the case by thinking about this as a multiple linear regression with $x$ and $m$ as predictor variables, but where the value of $m$ influences the relationship between $x$ and $y$. This is achieved by making the regression coefficient for $x$ a function of $m$:
 
 $$
-y \sim \mathrm{Normal}(\beta_0 + \beta_1 \cdot x + \beta_2 \cdot x \cdot m + \beta_3 \cdot m, \sigma^2)
+y \sim \beta_0 + f(m) \cdot x + \beta_3 \cdot m
 $$
 
-where $y$, $x$, and $m$ are your observed data, and the following are the model parameters:
-- $\beta_0$ is the intercept, its value does not have that much importance in the interpretation of this model.
-- $\beta_1$ is the rate at which $y$ (muscle percentage) increases per unit of $x$ (training hours).
-- $\beta_2$ is the coefficient for the interaction term $x \cdot m$.
-- $\beta_3$ is the rate at which $y$ (muscle percentage) increases per unit of $m$ (age).
-- $\sigma$ is the standard deviation of the observation noise.
+and if we define that as a linear function, $f(m) = \beta_1 + \beta_2 \cdot m$, we get
 
-We can see that the mean $y$ is simply a multiple linear regression with an interaction term between the two predictors, $x$ and $m$.
+$$
+y \sim \beta_0 + (\beta_1 + \beta_2 \cdot m) \cdot x + \beta_3 \cdot m
+$$
 
-We can get some insight into why this is the case by thinking about this as a multiple linear regression with $x$ and $m$ as predictor variables, but where the value of $m$ influences the relationship between $x$ and $y$. This is achieved by making the regression coefficient for $x$ is a function of $m$:
+which multiplies out to
 
 $$
-y \sim \mathrm{Normal}(\beta_0 + f(m) \cdot x + \beta_3 \cdot m, \sigma^2)
+y \sim \beta_0 + \beta_1 \cdot x + \beta_2 \cdot x \cdot m + \beta_3 \cdot m
 $$
 
-and if we define that as a linear function, $f(m) = \beta_1 + \beta_2 \cdot m$, we get
+:::{note}
+We can use $f(m) = \beta_1 + \beta_2 \cdot m$ later to visualise the moderation effect.
+:::
+
++++
+
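As a quick sanity check of the algebra above (an editorial sketch, not part of the notebook), sympy confirms that the "moderated slope" form and the interaction form are identical:

```python
# Editorial sketch: verify that beta0 + (beta1 + beta2*m)*x + beta3*m
# expands to the linear model with an x*m interaction term.
import sympy as sp

x, m, b0, b1, b2, b3 = sp.symbols("x m beta0 beta1 beta2 beta3")
moderated = b0 + (b1 + b2 * m) * x + b3 * m      # slope of x depends on m
interaction = b0 + b1 * x + b2 * x * m + b3 * m  # regression with interaction
print(sp.simplify(moderated - interaction))      # prints 0
```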
+### Specifying a Bayesian moderation model
+
+OK, so let's define our moderation model in a Bayesian manner. For this example we will treat the outcome variable as normally distributed around its mean.
 
 $$
-y \sim \mathrm{Normal}(\beta_0 + (\beta_1 + \beta_2 \cdot m) \cdot x + \beta_3 \cdot m, \sigma^2)
+\begin{aligned}
+\beta_0, \ldots, \beta_3 & \sim \mathrm{Normal}(0, 10)\\
+\sigma & \sim \mathrm{HalfCauchy}(1)\\
+\mu &= \beta_0 + \beta_1 \cdot x + \beta_2 \cdot x \cdot m + \beta_3 \cdot m\\
+y &\sim \mathrm{Normal}(\mu, \sigma^2)
+\end{aligned}
 $$
 
-We can use $f(m) = \beta_1 + \beta_2 \cdot m$ later to visualise the moderation effect.
+where $y$, $x$, and $m$ are your observed data, $\mu$ is the expected outcome value, and the following are the model parameters on which we place priors:
+- $\beta_0$ is the intercept; its value is not particularly important for the interpretation of this model.
+- $\beta_1$ is the rate at which $y$ (muscle percentage) increases per unit of $x$ (training hours).
+- $\beta_2$ is the coefficient for the interaction term $x \cdot m$.
+- $\beta_3$ is the rate at which $y$ (muscle percentage) increases per unit of $m$ (age).
+- $\sigma$ is the standard deviation of the observation noise.
+
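As a minimal sketch, the specification above could be expressed in PyMC along these lines (an editorial illustration; `x`, `m`, and `y` are assumed to be observed NumPy arrays, and the notebook's own implementation may differ):

```python
# Editorial sketch of the moderation model above, assuming observed arrays x, m, y.
import pymc as pm

with pm.Model() as moderation_model:
    beta = pm.Normal("beta", mu=0, sigma=10, shape=4)  # beta_0 ... beta_3
    sigma = pm.HalfCauchy("sigma", beta=1)
    mu = beta[0] + beta[1] * x + beta[2] * x * m + beta[3] * m
    pm.Normal("y", mu=mu, sigma=sigma, observed=y)
    idata = pm.sample()
```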
 
 +++
 