
Commit ae6357b: update summary section
1 parent b90a3b9

File tree: 2 files changed (+10, -6 lines)


examples/generalized_linear_models/GLM-simpsons-paradox.ipynb (5 additions, 3 deletions)

@@ -1996,11 +1996,13 @@
 "metadata": {},
 "source": [
 "## Summary\n",
-"Using Simpson's paradox, we've walked through 3 different models. The first is a simple linear regression which treats all the data as coming from one group. We saw that this lead us to believe the regression slope was positive.\n",
+"Using Simpson's paradox, we've walked through 3 different models. The first is a simple linear regression which treats all the data as coming from one group. This amounts to a causal DAG asserting that $x$ causally influences $y$, while $\\text{group}$ was ignored (i.e. assumed to be causally unrelated to $x$ or $y$). We saw that this led us to believe the regression slope was positive.\n",
 "\n",
-"While that is not necessarily wrong, it is paradoxical when we see that the regression slopes for the data _within_ a group is negative. We saw how to apply separate regressions for data in each group in the second model.\n",
+"While that is not necessarily wrong, it is paradoxical when we see that the regression slopes for the data _within_ each group are negative.\n",
 "\n",
-"The third and final model added a layer to the hierarchy, which captures our knowledge that each of these groups are sampled from an overall population. This added the ability to make inferences not only about the regression parameters at the group level, but also at the population level. The final plot shows our posterior over this population level slope parameter from which we believe the groups are sampled from.\n",
+"This paradox is resolved by updating our causal DAG to include the group variable. This is what we did in the second and third models. Model 2 was an unpooled model in which we essentially fit separate regressions for each group.\n",
+"\n",
+"Model 3 assumed the same causal DAG, but added the knowledge that each of these groups is sampled from an overall population. This added the ability to make inferences not only about the regression parameters at the group level, but also at the population level.\n",
 "\n",
 "If you are interested in learning more, there are a number of other [PyMC examples](http://docs.pymc.io/nb_examples/index.html) covering hierarchical modelling and regression topics."
 ]
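The pooled-versus-unpooled contrast the revised summary describes can be illustrated without any Bayesian machinery. The sketch below (synthetic data, not from the notebook, with made-up group offsets) constructs three groups in which $y$ decreases with $x$ within every group, yet the regression fit to the pooled data has a positive slope:

```python
# Hypothetical illustration of Simpson's paradox: every within-group
# slope is negative, but the pooled (single-group) slope is positive.
import numpy as np

rng = np.random.default_rng(42)

xs, ys, gs = [], [], []
for g, offset in enumerate([0.0, 2.0, 4.0]):
    x = offset + rng.uniform(0, 1, size=30)
    # Within each group, y *decreases* with x (slope -1), but the group
    # baseline (3 * offset) rises faster than x does across groups.
    y = -1.0 * (x - offset) + 3.0 * offset + rng.normal(0, 0.1, size=30)
    xs.append(x)
    ys.append(y)
    gs.append(np.full(30, g))

x = np.concatenate(xs)
y = np.concatenate(ys)
group = np.concatenate(gs)

# Model 1: one regression over all the data (group ignored).
pooled_slope = np.polyfit(x, y, 1)[0]

# Model 2 (unpooled): a separate regression per group.
group_slopes = [np.polyfit(x[group == g], y[group == g], 1)[0] for g in range(3)]

print(f"pooled slope: {pooled_slope:.2f}")            # positive
print("group slopes:", [f"{s:.2f}" for s in group_slopes])  # all negative
```

Model 3 in the notebook goes one step further than this unpooled fit, treating the per-group slopes as draws from a shared population-level distribution.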

examples/generalized_linear_models/GLM-simpsons-paradox.myst.md (5 additions, 3 deletions)

@@ -618,11 +618,13 @@ plt.title("Population level slope parameter");
 ```
 
 ## Summary
-Using Simpson's paradox, we've walked through 3 different models. The first is a simple linear regression which treats all the data as coming from one group. We saw that this lead us to believe the regression slope was positive.
+Using Simpson's paradox, we've walked through 3 different models. The first is a simple linear regression which treats all the data as coming from one group. This amounts to a causal DAG asserting that $x$ causally influences $y$, while $\text{group}$ was ignored (i.e. assumed to be causally unrelated to $x$ or $y$). We saw that this led us to believe the regression slope was positive.
 
-While that is not necessarily wrong, it is paradoxical when we see that the regression slopes for the data _within_ a group is negative. We saw how to apply separate regressions for data in each group in the second model.
+While that is not necessarily wrong, it is paradoxical when we see that the regression slopes for the data _within_ each group are negative.
 
-The third and final model added a layer to the hierarchy, which captures our knowledge that each of these groups are sampled from an overall population. This added the ability to make inferences not only about the regression parameters at the group level, but also at the population level. The final plot shows our posterior over this population level slope parameter from which we believe the groups are sampled from.
+This paradox is resolved by updating our causal DAG to include the group variable. This is what we did in the second and third models. Model 2 was an unpooled model in which we essentially fit separate regressions for each group.
+
+Model 3 assumed the same causal DAG, but added the knowledge that each of these groups is sampled from an overall population. This added the ability to make inferences not only about the regression parameters at the group level, but also at the population level.
 
 If you are interested in learning more, there are a number of other [PyMC examples](http://docs.pymc.io/nb_examples/index.html) covering hierarchical modelling and regression topics.