examples/generalized_linear_models/GLM-simpsons-paradox.myst.md (50 additions & 20 deletions)
@@ -21,7 +21,7 @@ kernelspec:

 +++

-[Simpson's Paradox](https://en.wikipedia.org/wiki/Simpson%27s_paradox) describes a situation where there might be a negative relationship between two variables within a group, but when data from multiple groups are combined, that relationship may disappear or even reverse sign. The gif below (from the [Simpson's Paradox](https://en.wikipedia.org/wiki/Simpson%27s_paradox) Wikipedia page) demonstrates this very nicely.
+[Simpson's Paradox](https://en.wikipedia.org/wiki/Simpson%27s_paradox) describes a situation where there might be a negative relationship between two variables within a group, but when data from multiple groups are combined, that relationship may disappear or even reverse sign. The gif below (from the Simpson's Paradox [Wikipedia](https://en.wikipedia.org/wiki/Simpson%27s_paradox) page) demonstrates this very nicely.
 This data generation was influenced by this [stackexchange](https://stats.stackexchange.com/questions/479201/understanding-simpsons-paradox-with-random-effects) question.
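As a rough illustration, data with this structure can be simulated by giving each group a negative within-group slope while shifting successive groups up and to the right. This is a minimal sketch with made-up parameter values, not the notebook's actual generation code:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

n_groups, n_points = 5, 10
group_slope = -0.5  # negative relationship within each group
group_centres = np.linspace(0, 10, n_groups)  # groups shift up and to the right

frames = []
for g in range(n_groups):
    x = group_centres[g] + rng.normal(0, 1, n_points)
    y = group_centres[g] + group_slope * x + rng.normal(0, 0.3, n_points)
    frames.append(pd.DataFrame({"x": x, "y": y, "group": g}))

data = pd.concat(frames, ignore_index=True)
```

Pooled across groups, the rising group centres induce a positive overall trend even though every group's own slope is negative.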
-First we'll define a handy predict function which will do out of sample predictions for us. This will be handy when it comes to visualising the model fit.
+First we'll define a handy predict function which will make out-of-sample predictions for us. This will be useful when it comes to visualising the model fits.
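A minimal sketch of what such a `predict` function might look like, assuming the model's predictors were registered via `pm.Data` so they can be swapped out with `pm.set_data` (the notebook's exact signature may differ):

```python
import pymc as pm


def predict(model: pm.Model, idata, predict_at: dict):
    """Make out-of-sample predictions at the predictor values in `predict_at`."""
    with model:
        pm.set_data(predict_at)  # swap in the new predictor values
        # sample the posterior predictive into a separate "predictions" group
        idata.extend(pm.sample_posterior_predictive(idata, predictions=True))
    return idata
```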
@@ -234,9 +234,14 @@ The plot on the right shows our posterior beliefs in **parameter space**.

 +++

-One of the clear things about this analysis is that we have credible evidence that $x$ and $y$ are _positively_ correlated. We can see this from the posterior over the slope (see right hand panel in the figure above).
+One of the clear things about this analysis is that we have credible evidence that $x$ and $y$ are _positively_ correlated. We can see this from the posterior over the slope (see the right-hand panel in the figure above), which we isolate in the plot below.
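Assuming the slope parameter in Model 1 is named `β1` and its trace lives in `idata1` (both names are assumptions here), that marginal could be isolated with something like:

```python
import arviz as az

# Plot the marginal posterior of the slope, with 0 marked as a reference value
az.plot_posterior(idata1, var_names=["β1"], ref_val=0)
```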
-Where $g_i$ is the group index for observation $i$. So the parameters $\beta_0$ and $\beta_1$ are now length $g$ vectors, not scalars. And the $[g_i]$ acts as an index to look up the group for the $i^{\th}$ observation.
+Where $g_i$ is the group index for observation $i$. So the parameters $\beta_0$ and $\beta_1$ are now vectors of length $g$, not scalars, and the $[g_i]$ acts as an index to look up the group for the $i^\text{th}$ observation.
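In PyMC, that per-observation lookup is plain integer (fancy) indexing of the group-level parameter vectors. A sketch, with placeholder data standing in for the notebook's dataframe:

```python
import numpy as np
import pymc as pm

# placeholder data standing in for the notebook's actual dataset
group_list = ["g0", "g1", "g2", "g3", "g4"]
x_obs = np.random.default_rng(0).normal(size=50)
g_obs = np.repeat(np.arange(5), 10)  # integer group index per observation

with pm.Model(coords={"group": group_list}) as sketch:
    x = pm.Data("x", x_obs)
    g = pm.Data("g", g_obs)
    β0 = pm.Normal("β0", mu=0, sigma=5, dims="group")  # one intercept per group
    β1 = pm.Normal("β1", mu=0, sigma=5, dims="group")  # one slope per group
    mu = β0[g] + β1[g] * x  # mu_i = β0[g_i] + β1[g_i] * x_i
```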
+
++++
+
+### Build model

 ```{code-cell} ipython3
 coords = {"group": group_list}
@@ -284,12 +293,14 @@ with pm.Model(coords=coords) as model2:
 # Generate the group indices array g and cast it to integers
 g = np.concatenate([[i] * n_points for i in range(n_groups)]).astype(int)
 predict_at = {"x": xi, "g": g}
 ```

 ```{code-cell} ipython3
+:tags: [hide-output]
+
 idata2 = predict(
     model=model2,
     idata=idata2,
@@ -382,7 +397,12 @@ plot(idata2);

 In contrast to the plain regression model (Model 1), when we model at the group level we can see that now the evidence points toward _negative_ relationships between $x$ and $y$.
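One way to back that reading up numerically is to check how much posterior mass each group's slope places below zero. A sketch, assuming the group slopes are stored as `β1` with a `group` dimension:

```python
import arviz as az

slopes = az.extract(idata2, var_names="β1")  # stacked draws, dims: (group, sample)
p_negative = (slopes < 0).mean(dim="sample")  # P(slope < 0) for each group
print(p_negative.to_series())
```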
@@ -418,6 +438,10 @@ The hierarchical model we are considering contains a simplification in that the

 In one sense this move from Model 2 to Model 3 can be seen as adding parameters, and therefore increasing model complexity. However, in another sense, adding this knowledge about the nested structure of the data actually provides a constraint over parameter space.
 :::
+
++++
+
+### Build model
+
 ```{code-cell} ipython3
 non_centered = False
@@ -458,6 +482,10 @@ pm.model_to_graphviz(model3)

 The nodes `pop_intercept` and `pop_slope` represent the population-level intercept and slope parameters, while the 5 $\beta_0$ and $\beta_1$ nodes represent the intercepts and slopes for each of the 5 observed groups (respectively). Equivalently, we could say that `pop_intercept` and `pop_slope` represent our beliefs about an as-yet unobserved group.

 # Generate the group indices array g and cast it to integers
 g = np.concatenate([[i] * n_points for i in range(n_groups)]).astype(int)
 predict_at = {"x": xi, "g": g}
@@ -504,13 +536,11 @@ sns.kdeplot(
     y=az.extract(idata3, var_names="pop_intercept"),
     thresh=0.1,
     levels=5,
+    color="k",
     ax=ax[2],
 )
-ax[2].set(
-    xlim=[-2, 1],
-    ylim=[-5, 5],
-)
+ax[2].set(xlim=[-2, 1], ylim=[-5, 5]);
 ```

 The panel on the right shows the group-level posterior of the slope and intercept parameters as a contour plot. We can also just plot the marginal distribution below to see how much belief we have in the slope being less than zero.
@@ -540,7 +570,7 @@ If you are interested in learning more, there are a number of other [PyMC examp
 * Updated by [Benjamin T. Vincent](https://github.com/drbenvincent) in April 2022
 * Updated by [Benjamin T. Vincent](https://github.com/drbenvincent) in February 2023 to run on PyMC v5
 * Updated to use `az.extract` by [Benjamin T. Vincent](https://github.com/drbenvincent) in February 2023 ([pymc-examples#522](https://github.com/pymc-devs/pymc-examples/pull/522))
-* Updated by [Benjamin T. Vincent](https://github.com/drbenvincent) in September 2024
+* Updated by [Benjamin T. Vincent](https://github.com/drbenvincent) in September 2024 ([pymc-examples#697](https://github.com/pymc-devs/pymc-examples/pull/697))