03-Stats-Review-The-Most-Dangerous-Equation.html: 4 additions, 3 deletions
@@ -762,9 +762,10 @@ The Standard Error of Our Estimates
Since this is just a review of statistics, I'll take the liberty to go a bit faster now. If you are not familiar with distributions, variance, and standard errors, please read on, but keep in mind that you might need some additional resources. I suggest you google any MIT course on introduction to statistics. They are usually quite good.
In the previous section, we estimated the average treatment effect \(E[Y_1-Y_0]\) as the difference in the means between the treated and the untreated, \(E[Y|T=1]-E[Y|T=0]\). We figured out the \(ATE\) for online classes as our motivating example. We also saw a negative impact; online classes made students perform about 5 points worse than the students with face-to-face classes. Now, we get to see if this impact is statistically significant.
To do so, we need to estimate the \(SE\). We already have \(n\), our sample size. To get the estimate for the standard deviation, we can do the following

where \(\bar{x}\) is the mean of \(x\). Fortunately for us, most programming software already implements this. In Pandas, we can use the method std.
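As a quick sketch of that computation, on made-up scores (only the pandas std call, which uses the usual N-1 denominator, and the SE formula are the point here):

```python
import numpy as np
import pandas as pd

# made-up exam scores for one group; only the computation matters here
scores = pd.Series([78.5, 81.2, 74.0, 90.3, 68.7, 85.1, 77.9, 83.4])

n = scores.shape[0]               # sample size
sigma_hat = scores.std()          # sample standard deviation (pandas defaults to the N-1 denominator)
se = sigma_hat / np.sqrt(n)       # standard error of the mean

print(f"std: {sigma_hat:.3f}, SE: {se:.3f}")
```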
- This model is telling us that mortality increases by 7.6627 points with the consumption of alcohol. Another way of putting this is that alcohol increases the chance of death by all causes by 8% ((7.6627+93.6184)/93.6184). Notice that this also gives us standard errors for our causal effect estimate. In this case, the effect is statistically significant, since the p-value is below 0.01.
+ This model is telling us that mortality increases by 7.6627 points with the consumption of alcohol. Another way of putting this is that alcohol increases the chance of death by all causes by 8% (100*((7.6627+93.6184)/93.6184 - 1)). Notice that this also gives us standard errors for our causal effect estimate. In this case, the effect is statistically significant, since the p-value is below 0.01.
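Just to make the corrected arithmetic explicit, plugging the intercept and the treatment coefficient from the output above into the new formula:

```python
baseline = 93.6184   # intercept: mortality level just below the threshold
effect = 7.6627      # estimated jump in mortality at the threshold

pct_increase = 100 * ((effect + baseline) / baseline - 1)
print(round(pct_increase, 2))  # ~8.19, i.e. roughly an 8% increase in death by all causes
```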
If we want to verify this model visually, we can show the predicted values on the data that we have. You can see that it is as though we had 2 regression models: one for those above the threshold and one for below it.
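Here is a rough sketch of that visual check. The column names (agecell, mortality), the cutoff at 21, and the data are all placeholders for illustration, not the chapter's exact dataset:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

# simulated stand-in for the drinking-age data: age running variable, mortality outcome, cutoff at 21
np.random.seed(0)
drinking = pd.DataFrame({"agecell": np.linspace(19, 23, 50)})
drinking["threshold"] = (drinking["agecell"] > 21).astype(int)
drinking["mortality"] = 93 + 7.7 * drinking["threshold"] + np.random.normal(0, 2, 50)

# interacting the threshold dummy with the centered running variable gives
# a separate intercept and slope on each side of the cutoff
m = smf.ols("mortality ~ threshold * I(agecell - 21)", data=drinking).fit()
drinking["pred"] = m.fittedvalues

plt.scatter(drinking["agecell"], drinking["mortality"], alpha=.5)
for side in (0, 1):
    sub = drinking[drinking["threshold"] == side]
    plt.plot(sub["agecell"], sub["pred"])
plt.axvline(21, linestyle="--")
plt.show()
```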
"rand_m_pred": np.random.uniform(size=prices_rnd.shape[0]),  ## random model
22-Debiased-Orthogonal-Machine-Learning.html: 2 additions, 2 deletions
@@ -761,7 +761,7 @@ ML for Nuisance Parameters
Thinking about how to model nuisance parameters is already boring with just a few covariates. But what if we had tens or hundreds of them? With modern datasets, this is pretty common. So, what can we do about it? The answer lies in the coolest econometric theorem ever derived.
Frisch-Waugh-Lovell
- Frisch, Waugh and Lovell were 19th century econometricians who noticed the coolest thing about linear regression. This isn't new to you, as we've talked about it in the context of regression residuals and when talking about fixed effects. But since this theorem is key to understanding Orthogonal-ML, it's very much worth recapping it.
+ Frisch, Waugh and Lovell were 20th century econometricians who noticed the coolest thing about linear regression. This isn't new to you, as we've talked about it in the context of regression residuals and when talking about fixed effects. But since this theorem is key to understanding Orthogonal-ML, it's very much worth recapping it.
Suppose you have a linear regression model with a set of features \(X_1\) and another set of features \(X_2\). You then estimate that model's parameters.
\(\hat{Y} = \hat{\beta_1} X_1 + \hat{\beta_2} X_2\)
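A minimal numerical sketch of what the theorem claims, on simulated data with my own variable names (not the chapter's code): the coefficient on \(X_1\) from the full regression is identical to the one you get by regressing the residual of \(Y\) (after partialling out \(X_2\)) on the residual of \(X_1\) (after partialling out \(X_2\)).

```python
import numpy as np
import statsmodels.api as sm

np.random.seed(0)
n = 1000
x2 = np.random.normal(size=(n, 2))
x1 = x2 @ np.array([0.5, -0.3]) + np.random.normal(size=n)
y = 2.0 * x1 + x2 @ np.array([1.0, 1.0]) + np.random.normal(size=n)

# full regression: y on x1 and x2 together
full = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

# FWL: residualize y and x1 on x2, then regress residual on residual
y_res = sm.OLS(y, sm.add_constant(x2)).fit().resid
x1_res = sm.OLS(x1, sm.add_constant(x2)).fit().resid
fwl = sm.OLS(y_res, x1_res).fit()

print(full.params[1], fwl.params[0])  # the two estimates of the x1 coefficient coincide
```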
@@ -1001,7 +1001,7 @@ CATE Estimation with Double-ML
- The Double/Debiased ML procedure with a final linear model is already very good, as we can see by the curve above. But, maybe we can do even better. As a matter of fact, this is a very general procedure that we can understand just like a meta-learner. Nie and Wager called it the R-Learner, as a way to recognise the work of Donald Rubin in the causal literature.
+ The Double/Debiased ML procedure with a final linear model is already very good, as we can see by the curve above. But, maybe we can do even better. As a matter of fact, this is a very general procedure that we can understand just like a meta-learner. Nie and Wager called it the R-Learner, as a way to recognise the work of Robinson (1988) and to emphasize the role of residualization.
This generalization comes from realizing that the Double/Debiased ML procedure defines a new loss function that we can minimize however we want. Next, we will see how to do that in a very similar fashion to what we've seen before when discussing the target transformation method or the F-Learner.
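As a hedged sketch of that idea, one common way to minimize the resulting R-loss is to residualize \(Y\) and \(T\) with cross-fitted ML models and then fit any regression of \(\tilde{Y}/\tilde{T}\) on \(X\) weighted by \(\tilde{T}^2\). The data and the model choices below are illustrative, not the chapter's exact code:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_predict

np.random.seed(1)
n = 5000
X = np.random.uniform(-1, 1, size=(n, 3))
T = np.random.normal(X[:, 0], 1)          # confounded, continuous treatment
tau = 1 + X[:, 1]                         # true heterogeneous effect
Y = tau * T + X[:, 0] + np.random.normal(size=n)

# step 1: cross-fitted nuisance models for E[Y|X] and E[T|X]
y_hat = cross_val_predict(GradientBoostingRegressor(), X, Y, cv=5)
t_hat = cross_val_predict(GradientBoostingRegressor(), X, T, cv=5)
y_res, t_res = Y - y_hat, T - t_hat

# step 2: minimize the R-loss via a weighted regression of y_res / t_res on X
w = t_res ** 2
target = y_res / t_res
cate_model = GradientBoostingRegressor().fit(X, target, sample_weight=w)

print(np.corrcoef(cate_model.predict(X), tau)[0, 1])  # should be close to 1
```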
To see if this model works, we can make counterfactual predictions for \(Y_0\) by forcing treat to be zero for everyone. Then, we can estimate the effect by taking the observed outcome for the treated, which is \(Y_1\), and subtracting \(\hat{Y}_0\) from it. Let's see if that matches the true ATT.
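A minimal sketch of that counterfactual trick, on made-up data (the column names and the model are placeholders, not the chapter's):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# toy data; column names (treat, y, x1, x2) are illustrative only
np.random.seed(2)
n = 2000
df = pd.DataFrame({"x1": np.random.normal(size=n), "x2": np.random.normal(size=n)})
df["treat"] = (np.random.uniform(size=n) < 0.4).astype(int)
df["y"] = df["x1"] + 2 * df["treat"] + np.random.normal(size=n)   # true ATT = 2

model = GradientBoostingRegressor().fit(df[["x1", "x2", "treat"]], df["y"])

# counterfactual: force treat = 0 for everyone and predict Y0
df_cf = df.assign(treat=0)
y0_hat = model.predict(df_cf[["x1", "x2", "treat"]])

treated = df["treat"] == 1
att = (df.loc[treated, "y"] - y0_hat[treated]).mean()
print(att)  # should land near the true ATT of 2
```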
Key Concepts
+ Synthetic-Diff-in-Diff (SDID) draws inspiration from both Diff-in-Diff and Synthetic Control, which brings advantages from both models. Like SC, SDID still works with multiple periods when pre-treatment trends are not parallel. However, unlike SC, SDID estimates unit weights to build a control unit which is only parallel to the treated group (it doesn't have to match its level). From DID, SDID leverages time and unit fixed effects, which helps to explain away a lot of the variance in the outcome, which in turn reduces the variance of the SDID estimator. Synthetic-Diff-in-Diff also introduces some new ideas of its own. First, there is an additional \(L2\) penalty in the optimization of the unit weights, which makes them more spread out across control units. Second, SDID allows for an intercept (and hence, extrapolation) when building such weights. Third, SDID introduces the use of time weights, which are not present in either DID or SC. For this reason, I wouldn't say SDID is just merging SC and DID. It is rather building something new, inspired by these two approaches. I also wouldn't say that SDID is better or worse than traditional Synthetic Control. Each of them has different properties that might be appropriate or not, depending on the situation. For example, you might find yourself in a situation where allowing the extrapolation from SDID is dangerous, in which case SC might be a good alternative.
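To make the moving parts concrete, here is a rough sketch of the final SDID step only: a weighted two-way fixed effects regression, taking the unit and time weights as given. The weights below are placeholders; in the real method they come from their own optimization problems, which this sketch does not implement.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# toy panel: 10 units, 8 periods; unit 9 is treated in the last 2 periods
np.random.seed(3)
units, periods = 10, 8
df = pd.DataFrame([(u, t) for u in range(units) for t in range(periods)],
                  columns=["unit", "time"])
df["treated"] = ((df["unit"] == 9) & (df["time"] >= 6)).astype(int)
df["y"] = 0.5 * df["unit"] + 0.3 * df["time"] + 1.5 * df["treated"] + np.random.normal(0, .1, len(df))

# placeholder unit and time weights standing in for the SDID optimization output
unit_w = {u: 1 / 9 if u < 9 else 1.0 for u in range(units)}
time_w = {t: 1 / 6 if t < 6 else 1.0 for t in range(periods)}
df["w"] = df["unit"].map(unit_w) * df["time"].map(time_w)

# SDID point estimate: weighted two-way fixed effects regression
m = smf.wls("y ~ treated + C(unit) + C(time)", data=df, weights=df["w"]).fit()
print(m.params["treated"])  # should recover the effect of 1.5
```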
References
This chapter is essentially an explainer of the Synthetic Difference in Differences (2019) article by Dmitry Arkhangelsky, Susan Athey, David A. Hirshberg, Guido W. Imbens and Stefan Wager. Additionally, I would love to recognize Masa Asami for his Python implementation of SDID, pysynthdid. His code helped me make sure I didn't have any bugs in mine, which was extremely helpful.
Conformal-Inference-for-Synthetic-Control.html: 1 addition, 1 deletion
@@ -868,7 +868,7 @@ Hypothesis Test and P-Values
With our data, to get the residuals for \(H_0 : 0\), meaning Proposition 99 had no effect, we can simply pass 0 as the null for our function.
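For the p-value step that follows, here is a minimal sketch of the cyclic block-permutation idea, assuming u already holds the residuals from the model re-fit under the null. How those residuals are produced is the chapter's own function; here they are just an input, and the helper name is mine.

```python
import numpy as np

def conformal_p_value(u, n_post):
    """Rank-based p-value: compare the post-treatment statistic against the same
    statistic computed on every cyclic permutation of the residual sequence."""
    def stat(res):
        return np.abs(res[-n_post:]).mean()
    observed = stat(u)
    perms = [stat(np.roll(u, k)) for k in range(len(u))]
    return np.mean([s >= observed for s in perms])

# usage with made-up residuals: 30 pre-treatment periods and 5 post-treatment ones
u = np.random.normal(size=35)
u[-5:] += 2  # a visible post-treatment deviation
print(conformal_p_value(u, n_post=5))
```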