
Commit 4afe596

formatting
1 parent 4478531 commit 4afe596

23 files changed: +681 -3392 lines changed

1RHC.Rmd

Lines changed: 6 additions & 2 deletions
@@ -188,7 +188,7 @@ Table 5 in @connors1996effectiveness showed that, after propensity score pair (1
 We also conduct propensity score pair matching analysis, as follows.

 ```{block, type='rmdcomment'}
-**Note**: In this workshop, we will not cover Propensity Score Matching (PSM) in this workshop. If you want to learn more about this, feel free to check out this other workshop: [Understanding Propensity Score Matching](https://ehsanx.github.io/psw/).
+**Note**: In this workshop, we will not cover Propensity Score Matching (PSM). If you want to learn more about this, feel free to check out this other workshop: [Understanding Propensity Score Matching](https://ehsanx.github.io/psw/) and the [video recording](https://www.youtube.com/watch?v=u4Nl7gnDEAY) on youtube.
 ```

 ```{r ps16854, cache=TRUE, echo = TRUE}
@@ -236,6 +236,8 @@ The love plot suggests satisfactory propensity score matching (all SMD < 0.1).

 #### PSM results

+##### p-value
+
 ```{r ps3, cache=TRUE, echo = TRUE}
 matched.data <- match.data(match.obj)
 tab1y <- CreateTableOne(vars = c("Y"),
@@ -246,9 +248,11 @@ print(tab1y, showAllLevels = FALSE,
 ```

 ```{block, type='rmdcomment'}
-Hence, our conclusion based on propensity score pair matched data ($p \lt 0.001$) is different than Table 5 in @connors1996effectiveness ($p = 0.14$). Variability in results for 1-to-1 matching is possible, and modelling choices may be different (we used caliper option here).
+Our conclusion based on propensity score pair matched data ($p \lt 0.001$) is different than Table 5 in @connors1996effectiveness ($p = 0.14$). Variability in results for 1-to-1 matching is possible, and modelling choices may be different (we used the caliper option here).
 ```

+##### Treatment effect
+
 - We can also estimate the effect of `RHC` on `length of stay` using the propensity score-matched sample:

 ```{r ps12ryy, cache=TRUE, echo = TRUE}
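
The chunks in this hunk use `match.obj` and mention the `caliper` option without showing the matching call itself. For reference, a minimal sketch of how such an object is typically built with `MatchIt`, assuming the workshop's `ObsData` and `ps.formula` objects; the caliper value here is illustrative, not taken from the book.

```r
# Illustrative sketch only (not part of this commit): 1-to-1 nearest-neighbour
# propensity score matching with a caliper. `ObsData` and `ps.formula` are assumed
# to be the workshop's analytic data and exposure-model formula.
library(MatchIt)
match.obj <- matchit(ps.formula, data = ObsData,
                     method   = "nearest",  # greedy 1-to-1 pair matching
                     distance = "glm",      # logistic-regression propensity score
                     caliper  = 0.2,        # max distance, in SDs of the propensity score
                     ratio    = 1)
matched.data <- match.data(match.obj)       # matched sample used by CreateTableOne()
```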

2gcomp.Rmd

Lines changed: 5 additions & 0 deletions
@@ -345,6 +345,11 @@ out.formula <- as.formula(paste("Y~ A +",
                 collapse = "+")))
 fit1 <- lm(out.formula, data = ObsData)
 ```
+
+```{block, type='rmdcomment'}
+$Q(A,L)$ is often used to represent the predictions from the G-comp model.
+```
+
 #### Step 2

 Extract outcome prediction for treated $\hat{Y}_{A=1}$ by setting all $A=1$
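
The new comment introduces $Q(A,L)$ as the outcome-model predictions. A minimal sketch of how those predictions feed the remaining G-computation steps, assuming `fit1` and `ObsData` from the chunk above; the object names added below are illustrative.

```r
# Illustrative sketch (not part of this commit): using the Q(A,L) model `fit1`
# to get counterfactual predictions and the G-computation treatment-effect estimate.
data.A1 <- ObsData
data.A1$A <- 1                                          # counterfactual: everyone treated
data.A0 <- ObsData
data.A0$A <- 0                                          # counterfactual: everyone untreated
ObsData$Pred.Y1 <- predict(fit1, newdata = data.A1)     # \hat{Y}_{A=1}
ObsData$Pred.Y0 <- predict(fit1, newdata = data.A0)     # \hat{Y}_{A=0}
mean(ObsData$Pred.Y1 - ObsData$Pred.Y0)                 # G-computation estimate of the ATE
```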

2gcomp2.Rmd

Lines changed: 5 additions & 2 deletions
@@ -289,7 +289,7 @@ Notice that the mean is very similar to the parametric G-computation method.
 ## G-comp using SuperLearner

 ```{block, type='rmdcomment'}
-SuperLearner is an ensemble MLtechnique, that uses **cross-validation** to find a weighted combination of estimates provided by different **candidate learners** (that help predict).
+SuperLearner is an ensemble ML technique that uses **cross-validation** to find a weighted combination of estimates provided by different **candidate learners** (that help predict).
 ```

 - There exist many candidate learners. Here we are using a combination of
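
As a companion to this comment, a minimal sketch of what a SuperLearner outcome-model fit can look like. Here `covariates` is an assumed name for the confounder columns, and the candidate-learner library shown is illustrative rather than the book's exact list.

```r
# Illustrative sketch (not from this commit): a SuperLearner fit for the outcome model.
# `covariates` is an assumed character vector of confounder column names in ObsData.
library(SuperLearner)
fit.sl <- SuperLearner(Y = ObsData$Y,
                       X = ObsData[ , c("A", covariates)],   # exposure + confounders
                       family = gaussian(),                   # continuous outcome (length of stay)
                       SL.library = c("SL.glm", "SL.glmnet", "SL.randomForest"),
                       cvControl = list(V = 10))              # 10-fold cross-validation
fit.sl$coef          # weight given to each candidate learner
```
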
@@ -446,13 +446,16 @@ scaled.coefs <- abs(coefs)/sum(abs(coefs))
 scaled.coefs
 ```

-
 Scaled coefs

 ```{r ML12stestcoef, cache=cachex, echo = TRUE}
 fit.sl$coef
 ```

+```{r ML12stestcoef2b, cache=cachex, echo = TRUE}
+sum(fit.sl$coef)
+```
+
 Hence, in creating superlearner prediction column,

 a. Linear regression has no contribution
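
The new chunk checks `sum(fit.sl$coef)`. A small sketch of how those weights relate to the ensemble prediction, assuming `fit.sl` is the fitted SuperLearner object from this chapter.

```r
# Illustrative sketch (not from this commit): relating the learner weights to the
# ensemble ("super learner") prediction, assuming `fit.sl` is a SuperLearner fit.
fit.sl$coef                   # weight of each candidate learner
sum(fit.sl$coef)              # with the default NNLS-based method, these weights sum to 1
# The ensemble prediction is the weighted average of the candidate predictions:
manual.pred <- as.numeric(fit.sl$library.predict %*% fit.sl$coef)
all.equal(manual.pred, as.numeric(fit.sl$SL.predict))
```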

3ipw.Rmd

Lines changed: 17 additions & 4 deletions
@@ -1,7 +1,9 @@
 # IPTW

+In this chapter, we will cover the propensity score (PS) and inverse probability of treatment weighting (IPTW, or simply IPW).
+
 ```{block, type='rmdcomment'}
-In this chapter, we are primarily interested about **exposure modelling** (e.g., fixing imbalance first, before doing outcome analysis).
+We are now primarily interested in **exposure modelling** (e.g., fixing imbalance first, before doing outcome analysis).
 ```

 ```{r setup01i, include=FALSE}
@@ -66,7 +68,10 @@ require(Publish)
 publish(PS.fit, format = "[u;l]")
 ```

-- Coef of PS model fit is not of concern
+```{block, type='rmdcomment'}
+The coefficients of the PS model fit are not of concern.
+```
+
 - Model can be rich: to the extent that prediction is better
 - But look for multi-collinearity issues
 - SE too high?
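
This hunk reports `PS.fit` without showing how it was fit. A minimal sketch of the usual logistic exposure model behind it, assuming `ps.formula` (exposure ~ confounders) and `ObsData` as defined earlier in the chapter.

```r
# Illustrative sketch (not part of this commit): the logistic exposure model that
# typically produces `PS.fit`, assuming `ps.formula` and `ObsData` exist.
PS.fit <- glm(ps.formula, family = binomial(), data = ObsData)
# Optional multicollinearity check on this rich model (requires the car package):
# car::vif(PS.fit)
```
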
@@ -77,6 +82,10 @@ Obtain the propensity score (PS) values from the fit
 ObsData$PS <- predict(PS.fit, type="response")
 ```

+```{block, type='rmdcomment'}
+These propensity score predictions (`PS`) are often represented as $g(A_i=1|L_i)$.
+```
+
 Check summaries:

 - enough overlap?
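
For the overlap check mentioned above, a minimal sketch using base R, assuming `ObsData$PS` and the exposure indicator `ObsData$A`.

```r
# Illustrative sketch (not from this commit): simple propensity score overlap checks
# by exposure group, assuming ObsData$PS and ObsData$A are available.
tapply(ObsData$PS, ObsData$A, summary)          # PS summaries within A = 0 and A = 1
plot(density(ObsData$PS[ObsData$A == 1]),
     main = "PS overlap", xlab = "Propensity score")
lines(density(ObsData$PS[ObsData$A == 0]), lty = 2)
legend("topright", legend = c("A = 1", "A = 0"), lty = c(1, 2))
```
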
@@ -101,14 +110,18 @@ Convert $PS$ to $IPW$ = $\frac{A}{PS} + \frac{1-A}{1-PS}$
 ```

 - Convert PS to IPW using the formula. We are using the formula for the average treatment effect (ATE).
-- It is possible to use alternative formulas, but we are using ATE formula for our illustration.
+
+```{block, type='rmdcomment'}
+It is possible to use alternative formulas, but we are using the ATE formula for our illustration.
+```
+

 ```{r psx2c, cache=TRUE, echo = TRUE}
 ObsData$IPW <- ObsData$A/ObsData$PS + (1-ObsData$A)/(1-ObsData$PS)
 summary(ObsData$IPW)
 ```

-Also possible to use pre-packged software packages to do the same:
+It is also possible to use pre-packaged software packages to do the same:

 ```{r psx2c2, cache=TRUE, echo = TRUE}
 require(WeightIt)
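
For reference, a minimal sketch of how the same ATE weights can be obtained with `WeightIt`, assuming `ps.formula` and `ObsData` as above; the book's actual call is not shown in this hunk.

```r
# Illustrative sketch (not part of this commit): ATE weights via the WeightIt package.
library(WeightIt)
W.out <- weightit(ps.formula, data = ObsData,
                  method = "glm",      # logistic-regression PS ("ps" in older WeightIt releases)
                  estimand = "ATE")    # weights A/PS + (1 - A)/(1 - PS)
summary(W.out$weights)                 # should match summary(ObsData$IPW) above
```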

3ipw2.Rmd

Lines changed: 6 additions & 1 deletion
@@ -47,7 +47,12 @@ This is the exposure model that we decided on:
 ps.formula
 ```

-Fit SuperLearner to estimate propensity scores. We again use the same candidate learners:
+```{block, type='rmdcomment'}
+Fit SuperLearner (SL) to estimate propensity scores.
+```
+
+
+We again use the same candidate learners:

 - linear model
 - LASSO
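
A minimal sketch of a SuperLearner propensity score fit with the two learners listed above; `covariates` is an assumed name for the confounder columns that appear in `ps.formula`.

```r
# Illustrative sketch (not from this commit): estimating g(A = 1 | L) with SuperLearner.
# `covariates` is an assumed character vector of confounder column names in ObsData.
library(SuperLearner)
sl.ps <- SuperLearner(Y = ObsData$A,
                      X = ObsData[ , covariates],
                      family = binomial(),
                      SL.library = c("SL.glm", "SL.glmnet"))  # linear model + LASSO
ObsData$PS.SL <- as.numeric(sl.ps$SL.predict)                 # ensemble propensity scores
```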

5software.Rmd

Lines changed: 1 addition & 1 deletion
@@ -399,7 +399,7 @@ kable(results,digits = 2)%>%
 ```

 ```{block, type='rmdcomment'}
-@keele2021comparing used superlearner based on an ensemble of 3 different learners: (1) GLM, (2) random forests, and (3) LASSO.
+@keele2021comparing used TMLE-SL based on an ensemble of 3 different learners: (1) GLM, (2) random forests, and (3) LASSO.
 ```

 ## Other packages
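
A minimal sketch of a `tmle` call whose SuperLearner library mirrors the ensemble described in this comment (GLM, random forest, LASSO); `covariates` is an assumed name for the confounder columns, and the outcome family should be set to match the outcome type.

```r
# Illustrative sketch (not from this commit): TMLE with a GLM / random forest / LASSO
# SuperLearner library. `covariates` is an assumed vector of confounder column names.
library(tmle)
SL.lib   <- c("SL.glm", "SL.randomForest", "SL.glmnet")
tmle.fit <- tmle(Y = ObsData$Y, A = ObsData$A,
                 W = ObsData[ , covariates],
                 family = "gaussian",          # use "binomial" for a binary outcome
                 Q.SL.library = SL.lib,        # outcome-model learners
                 g.SL.library = SL.lib)        # exposure-model learners
tmle.fit$estimates$ATE
```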

6final.Rmd

Lines changed: 11 additions & 3 deletions
@@ -81,7 +81,11 @@ knitr::include_graphics("images/dagpred.png")
 - Assuming all covariates are measured, **parametric models** such as linear and logistic regressions are very efficient, but rely on strong assumptions. In real-world scenarios, it is often hard (if not impossible) to guess the correct specification of the right hand side of the regression equation.
 - Machine learning (ML) methods are very helpful for prediction goals. They are also helpful in **identifying complex functions** (non-linearities and non-additive terms) of the covariates (again, assuming they are measured).
 - There are many ML methods, but the procedures are very different, and they come with their own advantages and disadvantages. In a given real data set, it is **hard to predict a priori which is the best ML algorithm** for a given problem.
-- That's where super learner is helpful in **combining strength from various algorithms**, and producing 1 prediction column that has **optimal statistical properties**.
+
+
+```{block, type='rmdcomment'}
+Super learner is helpful in **combining strength from various algorithms**, and producing 1 prediction column that has **optimal statistical properties**.
+```

 ### Causal inference

@@ -122,8 +126,12 @@ knitr::include_graphics("images/dagci.png")
 ```

 - For causal inference goals (when we have a primary exposure of interest), machine learning methods are often misleading. This is primarily because they usually do not have an inherent mechanism for focusing on the **primary exposure** (RHC in this example), and they treat the primary exposure like any other predictor.
-- When using g-computation with ML methods, estimation of variance becomes a difficult problem. Generalized procedures such as **robust SE or bootstrap methods** are not supported by theory.
-- That's where TMLE methods shine, with the help of it's important **statistical properties (double robustness, finite sample properties)**.
+- When using g-computation with ML methods, estimation of variance (with correct coverage) becomes a difficult problem. Generalized procedures such as **robust SE or bootstrap methods** are not supported by theory.
+
+
+```{block, type='rmdcomment'}
+The TMLE method shines, with the help of its important **statistical properties (double robustness, finite sample properties)**.
+```

 ### Identifiability assumptions
