Commit 08eff48

add a SCM diagram of moderation
1 parent 8162c95 commit 08eff48

2 files changed: +306 -61 lines changed


examples/causal_inference/moderation_analysis.ipynb

Lines changed: 275 additions & 53 deletions
Large diffs are not rendered by default.

examples/causal_inference/moderation_analysis.myst.md

Lines changed: 31 additions & 8 deletions
@@ -13,8 +13,8 @@ kernelspec:
 (moderation_analysis)=
 # Bayesian moderation analysis
 
-:::{post} March, 2022
-:tags: moderation, path analysis,
+:::{post} May, 2024
+:tags: moderation, path analysis, causal inference
 :category: beginner
 :author: Benjamin T. Vincent
 :::
@@ -27,6 +27,7 @@ Note that this is sometimes mixed up with [mediation analysis](https://en.wikipe
 
 ```{code-cell} ipython3
 import arviz as az
+import daft
 import matplotlib.pyplot as plt
 import numpy as np
 import pandas as pd
@@ -149,11 +150,32 @@ def plot_moderation_effect(result, m, m_quantiles, ax=None):
 
 I've taken inspiration from a blog post {cite:t}`vandenbergSPSS` which examines whether age influences (moderates) the effect of training on muscle percentage. We might speculate that more training results in higher muscle mass, at least for younger people. But it might be the case that the relationship between training and muscle mass changes with age - perhaps training is less effective at increasing muscle mass in older age?
 
-The schematic box and arrow notation often used to represent moderation is shown by an arrow from the moderating variable to the line between a predictor and an outcome variable.
+The schematic box and arrow notation often used in the _statistical_ literature to represent moderation is shown by an arrow from the moderating variable to the line between a predictor and an outcome variable.
 
 ![](moderation_figure.png)
 
-It can be useful to use consistent notation, so we will define:
++++
+
+It is useful to draw the same diagram out using the visual notation of _structural causal modeling_ (see below). This notation shows that both age and training causally influence muscle mass. The causal relationship also states that muscle mass is a function of both age and training. There is no specific visual notation in the SCM approach to represent moderation. Instead, that would be captured by the functional form of the relationship $f$. Note that the operator $:=$ is similar to the traditional $=$ operator, but it is used to denote a _causal_ or directional relationship rather than just equality.
+
+```{code-cell} ipython3
+:tags: [hide-input]
+
+pgm = daft.PGM(dpi=200)
+
+pgm.add_node("x", "training", 0, 0, aspect=2)
+pgm.add_node("m", "age", 0, 1, aspect=2)
+pgm.add_node("y", "muscle mass", 2, 0.5, aspect=3)
+
+pgm.add_edge("x", "y")
+pgm.add_edge("m", "y")
+
+pgm.add_text(-0.25, -0.75, r"muscle mass := $f$(training, age)")
+
+pgm.render();
+```
+
+Because we want to focus on the moderation concept and not the specific example, it can be useful to use consistent and more abstract notation, so we will define:
 - $x$ as the main predictor variable. In this example it is training.
 - $y$ as the outcome variable. In this example it is muscle percentage.
 - $m$ as the moderator. In this example it is age.
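
Note on the hunk above: the added paragraph says that moderation is not drawn in the SCM diagram but lives in the functional form of $f$. A minimal sketch of that idea follows, assuming a linear form with an interaction term. The function names `muscle_mass` and `training_slope` and every coefficient value are hypothetical and chosen only for illustration; they are not taken from the notebook.

```python
# Hypothetical structural function: y := f(x, m) with an interaction term.
# The coefficients are made up for illustration, not estimated from data.
def muscle_mass(training, age, b0=10.0, b1=2.0, b2=-0.03, b3=-0.05):
    """Assumed linear form: b0 + b1*x + b2*x*m + b3*m."""
    return b0 + b1 * training + b2 * training * age + b3 * age


def training_slope(age, b1=2.0, b2=-0.03):
    """Change in the outcome per unit of training, at a given age."""
    return b1 + b2 * age


for age in (20, 40, 60):
    # Finite difference in training at fixed age matches the analytic slope.
    effect = muscle_mass(1.0, age) - muscle_mass(0.0, age)
    print(f"age={age}: effect of one unit of training = {effect:.2f}, "
          f"slope = {training_slope(age):.2f}")
```

Both arrows in the diagram stay the same whatever the coefficients are. Under this assumed form, moderation appears only because the slope with respect to training, `b1 + b2 * age`, depends on age; setting `b2` to zero would remove the moderation without changing the graph.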
@@ -231,8 +253,8 @@ ax[2].set(xlabel="muscle percentage, $y$");
 ```{code-cell} ipython3
 def model_factory(x, m, y):
     with pm.Model() as model:
-        x = pm.ConstantData("x", x)
-        m = pm.ConstantData("m", m)
+        x = pm.Data("x", x)
+        m = pm.Data("m", m)
         # priors
         β0 = pm.Normal("β0", mu=0, sigma=10)
         β1 = pm.Normal("β1", mu=0, sigma=10)
@@ -257,7 +279,7 @@ pm.model_to_graphviz(model)
 
 ```{code-cell} ipython3
 with model:
-    result = pm.sample(draws=1000, tune=1000, random_seed=42, nuts={"target_accept": 0.9})
+    result = pm.sample()
 ```
 
 Visualise the trace to check for convergence.
@@ -280,7 +302,7 @@ az.plot_pair(
     marginals=True,
     point_estimate="median",
     figsize=(12, 12),
-    scatter_kwargs={"alpha": 0.01},
+    scatter_kwargs={"alpha": 0.05},
 );
 ```
 
@@ -363,6 +385,7 @@ But readers are strongly encouraged to read {cite:t}`mcclelland2017multicollinea
 - Updated by Benjamin T. Vincent in March 2022
 - Updated by Benjamin T. Vincent in February 2023 to run on PyMC v5
 - Updated to use `az.extract` by [Benjamin T. Vincent](https://github.com/drbenvincent) in February 2023 ([pymc-examples#522](https://github.com/pymc-devs/pymc-examples/pull/522))
+- Updated by [Benjamin T. Vincent](https://github.com/drbenvincent) in May 2024 to incorporate causal concepts
 
 +++
 