03-Stats-Review-The-Most-Dangerous-Equation.html: 4 additions, 3 deletions
@@ -762,9 +762,10 @@ The Standard Error of Our Estimates
Since this is just a review of statistics, I'll take the liberty to go a bit faster now. If you are not familiar with distributions, variance, and standard errors, please read on, but keep in mind that you might need some additional resources. I suggest you google any MIT course on introduction to statistics. They are usually quite good.
In the previous section, we estimated the average treatment effect \(E[Y_1-Y_0]\) as the difference in the means between the treated and the untreated, \(E[Y|T=1]-E[Y|T=0]\). We figured out the \(ATE\) for online classes as our motivating example. We also saw a negative impact; online classes made students perform about 5 points worse than the students with face-to-face classes. Now, we get to see if this impact is statistically significant.
To do so, we need to estimate the \(SE\). We already have \(n\), our sample size. To get the estimate for the standard deviation, we can do the following

where \(\bar{x}\) is the mean of \(x\). Fortunately for us, most programming software already implements this. In Pandas, we can use the method std.
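As a quick sketch of that computation, on made-up scores (only the pandas std call, which uses the usual N-1 denominator, and the SE formula are the point here):

```python
import numpy as np
import pandas as pd

# made-up exam scores for one group; only the computation matters here
scores = pd.Series([78.5, 81.2, 74.0, 90.3, 68.7, 85.1, 77.9, 83.4])

n = scores.shape[0]               # sample size
sigma_hat = scores.std()          # sample standard deviation (pandas defaults to the N-1 denominator)
se = sigma_hat / np.sqrt(n)       # standard error of the mean

print(f"std: {sigma_hat:.3f}, SE: {se:.3f}")
```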
- This model is telling us that mortality increases by 7.6627 points with the consumption of alcohol. Another way of putting this is that alcohol increases the chance of death by all causes by 8% ((7.6627+93.6184)/93.6184). Notice that this also gives us standard errors for our causal effect estimate. In this case, the effect is statistically significant, since the p-value is below 0.01.
+ This model is telling us that mortality increases by 7.6627 points with the consumption of alcohol. Another way of putting this is that alcohol increases the chance of death by all causes by 8% (100*((7.6627+93.6184)/93.6184 - 1)). Notice that this also gives us standard errors for our causal effect estimate. In this case, the effect is statistically significant, since the p-value is below 0.01.
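Just to make the corrected arithmetic explicit, plugging the intercept and the treatment coefficient from the output above into the new formula:

```python
baseline = 93.6184   # intercept: mortality level just below the threshold
effect = 7.6627      # estimated jump in mortality at the threshold

pct_increase = 100 * ((effect + baseline) / baseline - 1)
print(round(pct_increase, 2))  # ~8.19, i.e. roughly an 8% increase in death by all causes
```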
If we want to verify this model visually, we can show the predicted values on the data that we have. You can see that it is as though we had 2 regression models: one for those above the threshold and one for below it.
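Here is a rough sketch of that visual check. The column names (agecell, mortality), the cutoff at 21, and the data are all placeholders for illustration, not the chapter's exact dataset:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

# simulated stand-in for the drinking-age data: age running variable, mortality outcome, cutoff at 21
np.random.seed(0)
drinking = pd.DataFrame({"agecell": np.linspace(19, 23, 50)})
drinking["threshold"] = (drinking["agecell"] > 21).astype(int)
drinking["mortality"] = 93 + 7.7 * drinking["threshold"] + np.random.normal(0, 2, 50)

# interacting the threshold dummy with the centered running variable gives
# a separate intercept and slope on each side of the cutoff
m = smf.ols("mortality ~ threshold * I(agecell - 21)", data=drinking).fit()
drinking["pred"] = m.fittedvalues

plt.scatter(drinking["agecell"], drinking["mortality"], alpha=.5)
for side in (0, 1):
    sub = drinking[drinking["threshold"] == side]
    plt.plot(sub["agecell"], sub["pred"])
plt.axvline(21, linestyle="--")
plt.show()
```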
"rand_m_pred": np.random.uniform(size=prices_rnd.shape[0]),  ## random model
22-Debiased-Orthogonal-Machine-Learning.html: 2 additions, 2 deletions
@@ -761,7 +761,7 @@ ML for Nuisance Parameters
Thinking about how to model nuisance parameters is already boring with just a few covariates. But what if we had tens or hundreds of them? With modern datasets, this is pretty common. So, what can we do about it? The answer lies in the coolest econometric theorem ever derived.
Frisch-Waugh-Lovell
- Frisch, Waugh and Lovell were 19th century econometricians who noticed the coolest thing about linear regression. This isn't new to you, as we've talked about it in the context of regression residuals and when talking about fixed effects. But since this theorem is key to understanding Orthogonal-ML, it's very much worth recapping it.
+ Frisch, Waugh and Lovell were 20th century econometricians who noticed the coolest thing about linear regression. This isn't new to you, as we've talked about it in the context of regression residuals and when talking about fixed effects. But since this theorem is key to understanding Orthogonal-ML, it's very much worth recapping it.
Suppose you have a linear regression model with a set of features \(X_1\) and another set of features \(X_2\). You then estimate that model's parameters.
\(\hat{Y} = \hat{\beta_1} X_1 + \hat{\beta_2} X_2\)
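A minimal numerical sketch of what the theorem claims, on simulated data with my own variable names (not the chapter's code): the coefficient on \(X_1\) from the full regression is identical to the one you get by regressing the residual of \(Y\) (after partialling out \(X_2\)) on the residual of \(X_1\) (after partialling out \(X_2\)).

```python
import numpy as np
import statsmodels.api as sm

np.random.seed(0)
n = 1000
x2 = np.random.normal(size=(n, 2))
x1 = x2 @ np.array([0.5, -0.3]) + np.random.normal(size=n)
y = 2.0 * x1 + x2 @ np.array([1.0, 1.0]) + np.random.normal(size=n)

# full regression: y on x1 and x2 together
full = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

# FWL: residualize y and x1 on x2, then regress residual on residual
y_res = sm.OLS(y, sm.add_constant(x2)).fit().resid
x1_res = sm.OLS(x1, sm.add_constant(x2)).fit().resid
fwl = sm.OLS(y_res, x1_res).fit()

print(full.params[1], fwl.params[0])  # the two estimates of the x1 coefficient coincide
```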
@@ -1001,7 +1001,7 @@ CATE Estimation with Double-ML
- The Double/Debiased ML procedure with a final linear model is already very good, as we can see by the curve above. But, maybe we can do even better. As a matter of fact, this is a very general procedure that we can understand just like a meta-learner. Nie and Wager called it the R-Learner, as a way to recognise the work of Donald Rubin in the causal literature.
+ The Double/Debiased ML procedure with a final linear model is already very good, as we can see by the curve above. But, maybe we can do even better. As a matter of fact, this is a very general procedure that we can understand just like a meta-learner. Nie and Wager called it the R-Learner, as a way to recognise the work of Robinson (1988) and to emphasize the role of residualization.
This generalization comes from realizing that the Double/Debiased ML procedure defines a new loss function that we can minimize however we want. Next, we will see how to do that in a very similar fashion to what we've seen before when discussing the target transformation method or the F-Learner.
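As a hedged sketch of that idea, one common way to minimize the resulting R-loss is to residualize \(Y\) and \(T\) with cross-fitted ML models and then fit any regression of \(\tilde{Y}/\tilde{T}\) on \(X\) weighted by \(\tilde{T}^2\). The data and the model choices below are illustrative, not the chapter's exact code:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_predict

np.random.seed(1)
n = 5000
X = np.random.uniform(-1, 1, size=(n, 3))
T = np.random.normal(X[:, 0], 1)          # confounded, continuous treatment
tau = 1 + X[:, 1]                         # true heterogeneous effect
Y = tau * T + X[:, 0] + np.random.normal(size=n)

# step 1: cross-fitted nuisance models for E[Y|X] and E[T|X]
y_hat = cross_val_predict(GradientBoostingRegressor(), X, Y, cv=5)
t_hat = cross_val_predict(GradientBoostingRegressor(), X, T, cv=5)
y_res, t_res = Y - y_hat, T - t_hat

# step 2: minimize the R-loss via a weighted regression of y_res / t_res on X
w = t_res ** 2
target = y_res / t_res
cate_model = GradientBoostingRegressor().fit(X, target, sample_weight=w)

print(np.corrcoef(cate_model.predict(X), tau)[0, 1])  # should be close to 1
```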
To see if this model works, we can make counterfactual predictions for \(Y_0\) by forcing treat to be zero for everyone. Then, we can estimate the effect by taking the observed outcome for the treated, which is \(Y_1\), and subtracting \(\hat{Y}_0\) from it. Let's see if that matches the true ATT.
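A minimal sketch of that counterfactual trick, on made-up data (the column names and the model are placeholders, not the chapter's):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# toy data; column names (treat, y, x1, x2) are illustrative only
np.random.seed(2)
n = 2000
df = pd.DataFrame({"x1": np.random.normal(size=n), "x2": np.random.normal(size=n)})
df["treat"] = (np.random.uniform(size=n) < 0.4).astype(int)
df["y"] = df["x1"] + 2 * df["treat"] + np.random.normal(size=n)   # true ATT = 2

model = GradientBoostingRegressor().fit(df[["x1", "x2", "treat"]], df["y"])

# counterfactual: force treat = 0 for everyone and predict Y0
df_cf = df.assign(treat=0)
y0_hat = model.predict(df_cf[["x1", "x2", "treat"]])

treated = df["treat"] == 1
att = (df.loc[treated, "y"] - y0_hat[treated]).mean()
print(att)  # should land near the true ATT of 2
```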
Key Concepts
+ Synthetic-Diff-in-Diff (SDID) draws inspiration from both Diff-in-Diff and Synthetic Control, which brings advantages from both models. Like SC, SDID still works with multiple periods when pre-treatment trends are not parallel. However, unlike SC, SDID estimates unit weights to build a control unit which is only parallel to the treated group (it doesn't have to match its level). From DID, SDID leverages time and unit fixed effects, which helps to explain away a lot of the variance in the outcome, which in turn reduces the variance of the SDID estimator. Synthetic-Diff-in-Diff also introduces some new ideas of its own. First, there is an additional \(L2\) penalty in the optimization of the unit weights, which makes them more spread out across control units. Second, SDID allows for an intercept (and hence, extrapolation) when building such weights. Third, SDID introduces the use of time weights, which are not present in either DID or SC. For this reason, I wouldn't say SDID is just merging SC and DID. It is rather building something new, inspired by these two approaches. I also wouldn't say that SDID is better or worse than traditional Synthetic Control. Each of them has different properties that might be appropriate or not, depending on the situation. For example, you might find yourself in a situation where allowing the extrapolation from SDID is dangerous, in which case SC might be a good alternative.
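To make the moving parts concrete, here is a rough sketch of the final SDID step only: a weighted two-way fixed effects regression, taking the unit and time weights as given. The weights below are placeholders; in the real method they come from their own optimization problems, which this sketch does not implement.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# toy panel: 10 units, 8 periods; unit 9 is treated in the last 2 periods
np.random.seed(3)
units, periods = 10, 8
df = pd.DataFrame([(u, t) for u in range(units) for t in range(periods)],
                  columns=["unit", "time"])
df["treated"] = ((df["unit"] == 9) & (df["time"] >= 6)).astype(int)
df["y"] = 0.5 * df["unit"] + 0.3 * df["time"] + 1.5 * df["treated"] + np.random.normal(0, .1, len(df))

# placeholder unit and time weights standing in for the SDID optimization output
unit_w = {u: 1 / 9 if u < 9 else 1.0 for u in range(units)}
time_w = {t: 1 / 6 if t < 6 else 1.0 for t in range(periods)}
df["w"] = df["unit"].map(unit_w) * df["time"].map(time_w)

# SDID point estimate: weighted two-way fixed effects regression
m = smf.wls("y ~ treated + C(unit) + C(time)", data=df, weights=df["w"]).fit()
print(m.params["treated"])  # should recover the effect of 1.5
```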
References
This chapter is essentially an explainer of the Synthetic Difference in Differences (2019) article by Dmitry Arkhangelsky, Susan Athey, David A. Hirshberg, Guido W. Imbens and Stefan Wager. Additionally, I would love to recognize Masa Asami for his Python implementation of SDID, pysynthdid. His code helped me make sure I didn't have any bugs in mine, which was extremely helpful.
Conformal-Inference-for-Synthetic-Control.html: 1 addition, 1 deletion
@@ -868,7 +868,7 @@ Hypothesis Test and P-Values
With our data, to get the residuals for \(H_0 : 0\), meaning Proposition 99 had no effect, we can simply pass 0 as the null for our function.
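For the p-value step that follows, here is a minimal sketch of the cyclic block-permutation idea, assuming u already holds the residuals from the model re-fit under the null. How those residuals are produced is the chapter's own function; here they are just an input, and the helper name is mine.

```python
import numpy as np

def conformal_p_value(u, n_post):
    """Rank-based p-value: compare the post-treatment statistic against the same
    statistic computed on every cyclic permutation of the residual sequence."""
    def stat(res):
        return np.abs(res[-n_post:]).mean()
    observed = stat(u)
    perms = [stat(np.roll(u, k)) for k in range(len(u))]
    return np.mean([s >= observed for s in perms])

# usage with made-up residuals: 30 pre-treatment periods and 5 post-treatment ones
u = np.random.normal(size=35)
u[-5:] += 2  # a visible post-treatment deviation
print(conformal_p_value(u, n_post=5))
```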