
Commit 068f690: "Update documentation"
1 parent dfb3d2c

20 files changed (+83, -55 lines)

03-Stats-Review-The-Most-Dangerous-Equation.html

Lines changed: 4 additions & 3 deletions
@@ -762,9 +762,10 @@ <h2>The Standard Error of Our Estimates<a class="headerlink" href="#the-standard
 <p>Since this is just a review of statistics, I’ll take the liberty to go a bit faster now. If you are not familiar with distributions, variance, and standard errors, please read on, but keep in mind that you might need some additional resources. I suggest you google any MIT course on introduction to statistics. They are usually quite good.</p>
 <p>In the previous section, we estimated the average treatment effect <span class="math notranslate nohighlight">\(E[Y_1-Y_0]\)</span> as the difference in the means between the treated and the untreated <span class="math notranslate nohighlight">\(E[Y|T=1]-E[Y|T=0]\)</span>. We figured out the <span class="math notranslate nohighlight">\(ATE\)</span> for online classes as our motivating example. We also saw a negative impact; online classes made students perform about 5 points worse than the students with face-to-face classes. Now, we get to see if this impact is statistically significant.</p>
 <p>To do so, we need to estimate the <span class="math notranslate nohighlight">\(SE\)</span>. We already have <span class="math notranslate nohighlight">\(n\)</span>, our sample size. To get the estimate for the standard deviation, we can do the following</p>
-<p><span class="math notranslate nohighlight">\(
-\hat{\sigma}^2=\sqrt{\frac{1}{N-1}\sum_{i=0}^N (x-\bar{x})^2}
-\)</span></p>
+<div class="math notranslate nohighlight">
+\[
+\hat{\sigma}=\sqrt{\frac{1}{N-1}\sum_{i=1}^N (x_i-\bar{x})^2}
+\]</div>
 <p>where <span class="math notranslate nohighlight">\(\bar{x}\)</span> is the mean of <span class="math notranslate nohighlight">\(x\)</span>. Fortunately for us, most programming software already implements this. In Pandas, we can use the method <a class="reference external" href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.std.html">std</a>.</p>
 <div class="cell docutils container">
 <div class="cell_input docutils container">

16-Regression-Discontinuity-Design.html

Lines changed: 1 addition & 1 deletion
@@ -794,7 +794,7 @@ <h2>RDD Estimation<a class="headerlink" href="#rdd-estimation" title="Permalink
 </tr>
 </table></div></div>
 </div>
-<p>This model is telling us that mortality increases by 7.6627 points with the consumption of alcohol. Another way of putting this is that alcohol increases the chance of death by all causes by 8% ((7.6627+93.6184)/93.6184). Notice that this also gives us standard errors for our causal effect estimate. In this case, the effect is statistically significant, since the p-value is below 0.01.</p>
+<p>This model is telling us that mortality increases by 7.6627 points with the consumption of alcohol. Another way of putting this is that alcohol increases the chance of death by all causes by 8% (100*((7.6627+93.6184)/93.6184 - 1)). Notice that this also gives us standard errors for our causal effect estimate. In this case, the effect is statistically significant, since the p-value is below 0.01.</p>
 <p>If we want to verify this model visually, we can show the predicted values on the data that we have. You can see that it is as though we had 2 regression models: one for those above the threshold and one for below it.</p>
 <div class="cell docutils container">
 <div class="cell_input docutils container">

19-Evaluating-Causal-Models.html

Lines changed: 10 additions & 10 deletions
@@ -730,7 +730,7 @@ <h1>19 - Evaluating Causal Models<a class="headerlink" href="#evaluating-causal-
 </div>
 <div class="cell_output docutils container">
 <div class="output stream highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>Train Score: 0.9251704824568053
-Test Score: 0.7711074163447711
+Test Score: 0.7711074163447716
 </pre></div>
 </div>
 </div>
@@ -751,7 +751,7 @@ <h1>19 - Evaluating Causal Models<a class="headerlink" href="#evaluating-causal-
 <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">123</span><span class="p">)</span>
 <span class="n">prices_rnd_pred</span> <span class="o">=</span> <span class="n">prices_rnd</span><span class="o">.</span><span class="n">assign</span><span class="p">(</span><span class="o">**</span><span class="p">{</span>
 <span class="s2">&quot;elast_m_pred&quot;</span><span class="p">:</span> <span class="n">predict_elast</span><span class="p">(</span><span class="n">m1</span><span class="p">,</span> <span class="n">prices_rnd</span><span class="p">),</span> <span class="c1">## elasticity model</span>
-<span class="s2">&quot;pred_m_pred&quot;</span><span class="p">:</span> <span class="n">m1</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">prices_rnd</span><span class="p">[</span><span class="n">X</span><span class="p">]),</span> <span class="c1">## predictive model</span>
+<span class="s2">&quot;pred_m_pred&quot;</span><span class="p">:</span> <span class="n">m2</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">prices_rnd</span><span class="p">[</span><span class="n">X</span><span class="p">]),</span> <span class="c1">## predictive model</span>
 <span class="s2">&quot;rand_m_pred&quot;</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="n">prices_rnd</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]),</span> <span class="c1">## random model</span>
 <span class="p">})</span>

@@ -797,7 +797,7 @@ <h1>19 - Evaluating Causal Models<a class="headerlink" href="#evaluating-causal-
 <td>7</td>
 <td>230</td>
 <td>-13.096964</td>
-<td>220.922251</td>
+<td>224.067406</td>
 <td>0.696469</td>
 </tr>
 <tr>
@@ -808,7 +808,7 @@ <h1>19 - Evaluating Causal Models<a class="headerlink" href="#evaluating-causal-
 <td>4</td>
 <td>190</td>
 <td>1.054695</td>
-<td>186.241407</td>
+<td>189.889147</td>
 <td>0.286139</td>
 </tr>
 <tr>
@@ -819,7 +819,7 @@ <h1>19 - Evaluating Causal Models<a class="headerlink" href="#evaluating-causal-
 <td>5</td>
 <td>237</td>
 <td>-17.362642</td>
-<td>269.232654</td>
+<td>237.255157</td>
 <td>0.226851</td>
 </tr>
 <tr>
@@ -830,7 +830,7 @@ <h1>19 - Evaluating Causal Models<a class="headerlink" href="#evaluating-causal-
 <td>5</td>
 <td>193</td>
 <td>0.564985</td>
-<td>188.229552</td>
+<td>186.688619</td>
 <td>0.551315</td>
 </tr>
 <tr>
@@ -841,7 +841,7 @@ <h1>19 - Evaluating Causal Models<a class="headerlink" href="#evaluating-causal-
 <td>3</td>
 <td>252</td>
 <td>-13.717946</td>
-<td>275.259570</td>
+<td>250.342203</td>
 <td>0.719469</td>
 </tr>
 </tbody>
@@ -1154,16 +1154,16 @@ <h2>Contribute<a class="headerlink" href="#contribute" title="Permalink to this
 },
 codeMirrorConfig: {
 theme: "abcdef",
-mode: "causal-glory"
+mode: "conda-root-py"
 },
 kernelOptions: {
-kernelName: "causal-glory",
+kernelName: "conda-root-py",
 path: "./."
 },
 predefinedOutput: true
 }
 </script>
-<script>kernelName = 'causal-glory'</script>
+<script>kernelName = 'conda-root-py'</script>

 </div>

22-Debiased-Orthogonal-Machine-Learning.html

Lines changed: 2 additions & 2 deletions
@@ -761,7 +761,7 @@ <h2>ML for Nuisance Parameters<a class="headerlink" href="#ml-for-nuisance-param
 <p>Thinking about how to model nuisance parameters is already boring with just a few covariates. But what if we had tens or hundreds of them? With modern datasets, this is pretty common. So, what can we do about it? The answer lies the coolest Econometric theorem ever derived.</p>
 <section id="frisch-waugh-lovell">
 <h3>Frisch-Waugh-Lovell<a class="headerlink" href="#frisch-waugh-lovell" title="Permalink to this headline">#</a></h3>
-<p>Frisch, Waugh and Lovell were 19th century econometricians who noticed the coolest thing about linear regression. This isn’t new to you, as we’ve talked about it in the context of regression residuals and when talking about fixed effects. But since this theorem is key to understanding Orthogonal-ML, it’s very much worth recapping it.</p>
+<p>Frisch, Waugh and Lovell were 20th century econometricians who noticed the coolest thing about linear regression. This isn’t new to you, as we’ve talked about it in the context of regression residuals and when talking about fixed effects. But since this theorem is key to understanding Orthogonal-ML, it’s very much worth recapping it.</p>
 <p>Suppose you have a linear regression model with a set of features <span class="math notranslate nohighlight">\(X_1\)</span> and another set of features <span class="math notranslate nohighlight">\(X_2\)</span>. You then estimate that model’s parameters.</p>
 <p><span class="math notranslate nohighlight">\(
 \hat{Y} = \hat{\beta_1} X_1 + \hat{\beta_2} X_2
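Since this hunk recaps the Frisch-Waugh-Lovell theorem, a small self-contained sketch on simulated data may help (all names and numbers below are made up for illustration): the coefficient on X1 from the full regression equals the coefficient from regressing residualized Y on residualized X1.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000
    x2 = rng.normal(size=(n, 2))                      # nuisance features X2
    x1 = 0.5 * x2[:, :1] + rng.normal(size=(n, 1))    # feature of interest X1, correlated with X2
    y = 2.0 * x1[:, 0] + x2 @ np.array([1.0, -1.0]) + rng.normal(size=n)

    # (1) long regression: Y on [1, X1, X2]
    X_long = np.column_stack([np.ones(n), x1, x2])
    beta_long = np.linalg.lstsq(X_long, y, rcond=None)[0]

    # (2) FWL: residualize Y and X1 on [1, X2], then regress residual on residual
    X2c = np.column_stack([np.ones(n), x2])
    residualize = lambda v: v - X2c @ np.linalg.lstsq(X2c, v, rcond=None)[0]
    beta_fwl = np.linalg.lstsq(residualize(x1), residualize(y), rcond=None)[0]

    print(beta_long[1], beta_fwl[0])   # both recover the X1 coefficient, ~2.0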
@@ -1001,7 +1001,7 @@ <h3>CATE Estimation with Double-ML<a class="headerlink" href="#cate-estimation-w
 <img alt="_images/22-Debiased-Orthogonal-Machine-Learning_25_0.png" src="_images/22-Debiased-Orthogonal-Machine-Learning_25_0.png" />
 </div>
 </div>
-<p>The Double/Debiased ML procedure with a final linear model is already very good, as we can see by the curve above. But, maybe we can do even better. As a matter of fact, this is a very general procedure that we can understand just like a meta-learner. Nie and Wager called it the R-Learner, as a way to recognise the work of Donald Rubin in the causal literature.</p>
+<p>The Double/Debiased ML procedure with a final linear model is already very good, as we can see by the curve above. But, maybe we can do even better. As a matter of fact, this is a very general procedure that we can understand just like a meta-learner. Nie and Wager called it the R-Learner, as a way to recognise the work of Robinson (1988) and to emphasize the role of residualization.</p>
 <p>This generalization comes from realizing that the Double/Debiased ML procedure defines a new loss function that we can minimize however we want. Next, we will see how to do that in a very similar fashion to what we’ve seen before when discussing target transformation method or F-learner.</p>
 </section>
 </section>
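The loss function mentioned in the last paragraph of this hunk is, as far as I read Nie and Wager, the residual-on-residual squared loss (here \(\hat{m}\) is the outcome model, \(\hat{e}\) the treatment model, and a possible regularizer on \(\tau\) is omitted):

    \hat{\tau}(\cdot) = \underset{\tau}{\arg\min} \; \frac{1}{n} \sum_{i=1}^{n}
        \Big[ \big(Y_i - \hat{m}(X_i)\big) - \big(T_i - \hat{e}(X_i)\big)\,\tau(X_i) \Big]^2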

24-The-Diff-in-Diff-Saga.html

Lines changed: 4 additions & 4 deletions
@@ -1388,7 +1388,7 @@ <h2>3) Enlightenment: A Flexible Functional Form<a class="headerlink" href="#enl
 </div>
 </div>
 </div>
-<p>To see if this model works, we can make counterfactual predictions for \(Y_0\) by forcing <code class="docutils literal notranslate"><span class="pre">treat</span></code> to be zero for everyone. Then, we can estimate the effect by taking the observed outcome for the treatment, which is \(Y_1\), and subtract \(\hat{Y}_0\) from it. Let’s see if that matches the true ATT.</p>
+<p>To see if this model works, we can make counterfactual predictions for <span class="math notranslate nohighlight">\(Y_0\)</span> by forcing <code class="docutils literal notranslate"><span class="pre">treat</span></code> to be zero for everyone. Then, we can estimate the effect by taking the observed outcome for the treatment, which is <span class="math notranslate nohighlight">\(Y_1\)</span>, and subtract <span class="math notranslate nohighlight">\(\hat{Y}_0\)</span> from it. Let’s see if that matches the true ATT.</p>
 <div class="cell docutils container">
 <div class="cell_input docutils container">
 <div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="n">df_pred</span> <span class="o">=</span> <span class="p">(</span><span class="n">df_heter_str</span>
@@ -1598,16 +1598,16 @@ <h2>Contribute<a class="headerlink" href="#contribute" title="Permalink to this
 },
 codeMirrorConfig: {
 theme: "abcdef",
-mode: "causal-glory"
+mode: "conda-root-py"
 },
 kernelOptions: {
-kernelName: "causal-glory",
+kernelName: "conda-root-py",
 path: "./."
 },
 predefinedOutput: true
 }
 </script>
-<script>kernelName = 'causal-glory'</script>
+<script>kernelName = 'conda-root-py'</script>

 </div>

25-Synthetic-Diff-in-Diff.html

Lines changed: 14 additions & 0 deletions
@@ -501,6 +501,11 @@ <h1 class="site-logo" id="site-title">Causal Inference for the Brave and True</h
 Placebo Variance Estimation
 </a>
 </li>
+<li class="toc-h2 nav-item toc-entry">
+<a class="reference internal nav-link" href="#key-concepts">
+Key Concepts
+</a>
+</li>
 <li class="toc-h2 nav-item toc-entry">
 <a class="reference internal nav-link" href="#references">
 References
@@ -555,6 +560,11 @@ <h2> Contents </h2>
 Placebo Variance Estimation
 </a>
 </li>
+<li class="toc-h2 nav-item toc-entry">
+<a class="reference internal nav-link" href="#key-concepts">
+Key Concepts
+</a>
+</li>
 <li class="toc-h2 nav-item toc-entry">
 <a class="reference internal nav-link" href="#references">
 References
@@ -2131,6 +2141,10 @@ <h2>Placebo Variance Estimation<a class="headerlink" href="#placebo-variance-est
 </div>
 </div>
 </section>
+<section id="key-concepts">
+<h2>Key Concepts<a class="headerlink" href="#key-concepts" title="Permalink to this headline">#</a></h2>
+<p>Synthetic-Diff-in-Diff (SDID) draws inspiration from both Diff-in-Diff and Synthetic Control, bringing advantages from both models. Like SC, SDID still works with multiple periods when pre-treatment trends are not parallel. However, unlike SC, SDID estimates unit weights that build a control which only needs to be parallel to the treated group (it doesn’t have to match its level). From DID, SDID leverages time and unit fixed effects, which explain away a lot of the variance in the outcome and, in turn, reduce the variance of the SDID estimator. Synthetic-Diff-in-Diff also introduces some new ideas of its own. First, there is an additional <span class="math notranslate nohighlight">\(L2\)</span> penalty in the optimization of the unit weights, which makes them more spread out across control units. Second, SDID allows for an intercept (and hence, extrapolation) when building such weights. Third, SDID introduces the use of time weights, which are present in neither DID nor SC. For this reason, I wouldn’t say SDID is just a merge of SC and DID. It is rather building something new, inspired by these two approaches. I also wouldn’t say that SDID is better or worse than traditional Synthetic Control. Each of them has different properties that may or may not be appropriate, depending on the situation. For example, you might find yourself in a situation where allowing the extrapolation that SDID permits is dangerous, in which case SC might be a good alternative.</p>
+</section>
 <section id="references">
 <h2>References<a class="headerlink" href="#references" title="Permalink to this headline">#</a></h2>
 <p>This chapter is essentially an explainer to the <em>Synthetic Difference in Differences</em> (2019) article, by Dmitry Arkhangelsky, Susan Athey, David A. Hirshberg, Guido W. Imbens and Stefan Wager. Additionally, I would love to recognize Masa Asami for his python implementation of SDID, pysynthdid. His code helped me make sure I didn’t have any bugs in mine, which was extremely helpful.</p>
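To tie the new Key Concepts paragraph to a formula, the SDID estimator it describes can be written (following my reading of Arkhangelsky et al., 2019) as a weighted two-way fixed-effects problem:

    \hat{\tau}^{sdid} = \underset{\tau,\,\mu,\,\alpha,\,\beta}{\arg\min}
        \sum_{i=1}^{N} \sum_{t=1}^{T}
        \big( Y_{it} - \mu - \alpha_i - \beta_t - W_{it}\,\tau \big)^2 \,
        \hat{\omega}_i \, \hat{\lambda}_t

where \(\hat{\omega}_i\) are the unit weights (fit with the \(L2\) penalty and intercept mentioned above), \(\hat{\lambda}_t\) are the time weights, and \(W_{it}\) flags treated unit-periods; setting all weights to 1 recovers the plain two-way fixed-effects DID regression.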

Conformal-Inference-for-Synthetic-Control.html

Lines changed: 1 addition & 1 deletion
@@ -868,7 +868,7 @@ <h3>Hypothesis Test and P-Values<a class="headerlink" href="#hypothesis-test-and
 <p>With our data, to get the residuals for <span class="math notranslate nohighlight">\(H_0 : 0\)</span>, meaning Proposition 99 had no effect, we can simply pass 0 as the null for our function.</p>
 <div class="cell docutils container">
 <div class="cell_input docutils container">
-<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="n">residuals_df</span> <span class="o">=</span> <span class="n">SyntheticControl</span><span class="p">()</span>
+<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="n">model</span> <span class="o">=</span> <span class="n">SyntheticControl</span><span class="p">()</span>
 
 <span class="n">residuals_df</span> <span class="o">=</span> <span class="n">residuals</span><span class="p">(</span><span class="n">data</span><span class="p">,</span>
 <span class="s2">&quot;california&quot;</span><span class="p">,</span>
[3 image files changed: -69 Bytes, 1.46 KB, 631 Bytes]
