01-Introduction-To-Causality.html (1 addition & 1 deletion)
@@ -680,7 +680,7 @@ <h2>When Association IS Causation
<p><span class="math notranslate nohighlight">\(Y_{1i}\)</span> is the potential outcome for <strong>the same</strong> unit i with the treatment.</p>
<p>Sometimes you might see potential outcomes represented as functions <span class="math notranslate nohighlight">\(Y_i(t)\)</span>, so beware. <span class="math notranslate nohighlight">\(Y_{0i}\)</span> could be <span class="math notranslate nohighlight">\(Y_i(0)\)</span> and <span class="math notranslate nohighlight">\(Y_{1i}\)</span> could be <span class="math notranslate nohighlight">\(Y_i(1)\)</span>. Here, we will use the subscript notation most of the time.</p>
-<p>Back to our example, <span class="math notranslate nohighlight">\(Y_{1i}\)</span> is the academic performance for student i if he or she is in a classroom with tablets. Whether this is or not the case, it doesn’t matter for <span class="math notranslate nohighlight">\(Y_{1i}\)</span>. It is the same regardless. If student i gets the tablet, we can observe <span class="math notranslate nohighlight">\(Y_{1i}\)</span>. If not, we can observe <span class="math notranslate nohighlight">\(Y_{0i}\)</span>. Notice how in this last case, <span class="math notranslate nohighlight">\(Y_{1i}\)</span> is still defined, we just can’t see it. In this case, it is a counterfactual potential outcome.</p>
+<p>Back to our example, <span class="math notranslate nohighlight">\(Y_{1i}\)</span> is the academic performance for student i if he or she is in a classroom with tablets. Whether or not this is the case, it doesn’t matter for <span class="math notranslate nohighlight">\(Y_{1i}\)</span>. It is the same regardless. If student i gets the tablet, we can observe <span class="math notranslate nohighlight">\(Y_{1i}\)</span>. If not, we can observe <span class="math notranslate nohighlight">\(Y_{0i}\)</span>. Notice how in this last case, <span class="math notranslate nohighlight">\(Y_{1i}\)</span> is still defined, we just can’t see it. In this case, it is a counterfactual potential outcome.</p>
<p>With potential outcomes, we can define the individual treatment effect:</p>
<p>Of course, due to the fundamental problem of causal inference, we can never know the individual treatment effect because we only observe one of the potential outcomes. For the time being, let’s focus on something easier than estimating the individual treatment effect. Instead, let’s focus on the <strong>average treatment effect</strong>, which is defined as follows.</p>
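To make the gap between the individual and the average treatment effect concrete, here is a small simulation sketch, with entirely made-up numbers, where we get to peek at both potential outcomes, something real data never lets us do:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

# Both potential outcomes are defined for every student;
# a real dataset would only ever show us one of them.
y0 = rng.normal(70, 5, n)   # test score without the tablet
y1 = y0 + 3                 # test score with the tablet: individual effect of 3

t = rng.integers(0, 2, n)     # random assignment to treatment
y = np.where(t == 1, y1, y0)  # the single outcome we actually observe

ate_true = np.mean(y1 - y0)                    # needs both potential outcomes
ate_est = y[t == 1].mean() - y[t == 0].mean()  # needs only observed data

print(ate_true, ate_est)
```

Because assignment is random here, the simple difference in observed means lands close to the true average effect, even though every individual effect is hidden from us.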
<h1>02 - Randomised Experiments<a class="headerlink" href="#randomised-experiments" title="Permalink to this headline">#</a></h1>
-<section id="the-golden-standard">
-<h2>The Golden Standard<a class="headerlink" href="#the-golden-standard" title="Permalink to this headline">#</a></h2>
+<section id="the-gold-standard">
+<h2>The Gold Standard<a class="headerlink" href="#the-gold-standard" title="Permalink to this headline">#</a></h2>
<p>In the previous session, we saw why and how association is different from causation. We also saw what is required to make association be causation.</p>
-<p>where <span class="math notranslate nohighlight">\(\tilde{T_i}\)</span> is the residual from a regression of all other covariates <span class="math notranslate nohighlight">\(X_{1i} + ... + X_{ki}\)</span> on <span class="math notranslate nohighlight">\(T_i\)</span>. Now, let’s appreciate how cool this is. It means that the coefficient of a multivariate regression is the bivariate coefficient of the same regressor <strong>after accounting for the effect of other variables in the model</strong>. In causal inference terms, <span class="math notranslate nohighlight">\(\kappa\)</span> is the bivariate coefficient of <span class="math notranslate nohighlight">\(T\)</span> after having used all other variables to predict it.</p>
+<p>where <span class="math notranslate nohighlight">\(\tilde{T_i}\)</span> is the residual from a regression of <span class="math notranslate nohighlight">\(T_i\)</span> on all other covariates <span class="math notranslate nohighlight">\(X_{1i}, ..., X_{ki}\)</span>. Now, let’s appreciate how cool this is. It means that the coefficient of a multivariate regression is the bivariate coefficient of the same regressor <strong>after accounting for the effect of other variables in the model</strong>. In causal inference terms, <span class="math notranslate nohighlight">\(\kappa\)</span> is the bivariate coefficient of <span class="math notranslate nohighlight">\(T\)</span> after having used all other variables to predict it.</p>
<p>This has a nice intuition behind it. If we can predict <span class="math notranslate nohighlight">\(T\)</span> using other variables, it means it’s not random. However, we can make it so that <span class="math notranslate nohighlight">\(T\)</span> is as good as random once we control for other available variables. To do so, we use linear regression to predict it from the other variables and then we take the residuals of that regression <span class="math notranslate nohighlight">\(\tilde{T}\)</span>. By definition, <span class="math notranslate nohighlight">\(\tilde{T}\)</span> cannot be predicted by the other variables <span class="math notranslate nohighlight">\(X\)</span> that we’ve already used to predict <span class="math notranslate nohighlight">\(T\)</span>. Quite elegantly, <span class="math notranslate nohighlight">\(\tilde{T}\)</span> is a version of the treatment that is not associated with any other variable in <span class="math notranslate nohighlight">\(X\)</span>.</p>
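A quick way to convince ourselves of this is to simulate some data and check that the two routes give the same coefficient. This is just a sketch with invented coefficients, using plain numpy least squares rather than any particular regression library:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=(n, 2))                              # other covariates X
t = 0.5 * x[:, 0] - 0.3 * x[:, 1] + rng.normal(size=n)   # treatment, predictable from X
y = 2.0 * t + x[:, 0] + 0.5 * x[:, 1] + rng.normal(size=n)

ones = np.ones((n, 1))

# Route 1: coefficient of t in the full multivariate regression y ~ 1 + t + X.
full = np.column_stack([ones, t.reshape(-1, 1), x])
kappa_full = np.linalg.lstsq(full, y, rcond=None)[0][1]

# Route 2 (the residual trick): regress t on 1 + X, keep the residual
# t_tilde, then regress y on t_tilde alone.
xmat = np.column_stack([ones, x])
t_tilde = t - xmat @ np.linalg.lstsq(xmat, t, rcond=None)[0]
kappa_fwl = (t_tilde @ y) / (t_tilde @ t_tilde)

print(kappa_full, kappa_fwl)  # the two coefficients match
```

The match is not approximate: it holds by the algebra of least squares (the Frisch-Waugh-Lovell theorem), up to floating-point noise.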
<p>By the way, this is also a property of linear regression. The residuals are always orthogonal to, or uncorrelated with, any of the variables in the model that created them:</p>
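We can verify this orthogonality property numerically. Again, simulated data with invented coefficients, plain numpy:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # design matrix with intercept
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta

# The residuals are orthogonal to every column of X:
# X' @ resid is zero up to floating-point error.
print(X.T @ resid)
```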
-<p>where <span class="math notranslate nohighlight">\(\delta_{A}\)</span> is the vector of coefficients from the regression of <span class="math notranslate nohighlight">\(A\)</span> on <span class="math notranslate nohighlight">\(Educ\)</span></p>
+<p>where <span class="math notranslate nohighlight">\(\delta_{A}\)</span> is the vector of coefficients from the regression of <span class="math notranslate nohighlight">\(Educ\)</span> on <span class="math notranslate nohighlight">\(A\)</span></p>
<p>The key point here is that it won’t be exactly the <span class="math notranslate nohighlight">\(\kappa\)</span> that we want. Instead, it comes with this extra annoying term <span class="math notranslate nohighlight">\(\beta'\delta_{A}\)</span>. This term is the impact of the omitted <span class="math notranslate nohighlight">\(A\)</span> on <span class="math notranslate nohighlight">\(Wage\)</span>, <span class="math notranslate nohighlight">\(\beta\)</span>, times the impact of the omitted on the included <span class="math notranslate nohighlight">\(Educ\)</span>. This is so important for economists that Joshua Angrist made a mantra out of it, so that students can recite it in meditation:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="s2">"Short equals long 
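To see the omitted variable bias formula in action, here is a toy simulation. The variable names mimic the wage-and-ability example, but every coefficient is invented; the point is only that "short" equals "long" plus the bias term, exactly, in any sample:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
a = rng.normal(size=n)               # ability: the omitted variable
educ = 2.0 * a + rng.normal(size=n)  # education, correlated with ability
wage = educ + 3.0 * a + rng.normal(size=n)

# "Long" regression: wage ~ 1 + educ + a.
X_long = np.column_stack([np.ones(n), educ, a])
kappa_long, beta = np.linalg.lstsq(X_long, wage, rcond=None)[0][1:]

# "Short" regression: wage ~ 1 + educ, omitting ability.
X_short = np.column_stack([np.ones(n), educ])
kappa_short = np.linalg.lstsq(X_short, wage, rcond=None)[0][1]

# delta_A: the slope linking the omitted a to the included educ,
# Cov(a, educ) / Var(educ).
c = np.cov(a, educ)
delta = c[0, 1] / c[1, 1]

# Short equals long plus the effect of omitted times delta.
print(kappa_short, kappa_long + beta * delta)
```

The two printed numbers agree to floating-point precision: the short coefficient absorbs the whole bias term \(\beta\delta_{A}\).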
-<p>If we have such a variable, we can recover the causal effect <span class="math notranslate nohighlight">\(\kappa\)</span> with what we will see as the IV formula. To do so, let’s think about the ideal equation we want to run. Using more general terms like <span class="math notranslate nohighlight">\(T\)</span> for the treatment and <span class="math notranslate nohighlight">\(W\)</span> for the confounders, here is want we want:</p>
+<p>If we have such a variable, we can recover the causal effect <span class="math notranslate nohighlight">\(\kappa\)</span> with what we will see as the IV formula. To do so, let’s think about the ideal equation we want to run. Using more general terms like <span class="math notranslate nohighlight">\(T\)</span> for the treatment and <span class="math notranslate nohighlight">\(W\)</span> for the confounders, here is what we want:</p>
<h2>Quarter of Birth and the Effect of Education on Wage<a class="headerlink" href="#quarter-of-birth-and-the-effect-of-education-on-wage" title="Permalink to this headline">#</a></h2>
-<p>So far, we’ve been treating these instruments as some magical variable <span class="math notranslate nohighlight">\(Z\)</span> which have the miraculous propriety of only affecting the outcome through the treatment. To be honest, good instruments are so hard to come by that we might as well consider them miracles. Let’s just say it is not for the faint of heart. Rumor has it that the cool kids at Chicago School of Economics talk about how they come up with this or that instrument at the bar.</p>
+<p>So far, we’ve been treating these instruments as some magical variable <span class="math notranslate nohighlight">\(Z\)</span> which have the miraculous property of only affecting the outcome through the treatment. To be honest, good instruments are so hard to come by that we might as well consider them miracles. Let’s just say it is not for the faint of heart. Rumor has it that the cool kids at Chicago School of Economics talk about how they come up with this or that instrument at the bar.</p>
<p><img alt="img" src="_images/good-iv.png" /></p>
<p>Still, we do have some interesting examples of instruments to make things a little more concrete. We will again try to estimate the effect of education on wage. To do so, we will use the person’s quarter of birth as the instrument Z.</p>
<p>This idea takes advantage of US compulsory attendance laws. Usually, they state that a kid must have turned 6 by January 1 of the year they enter school. For this reason, kids born at the beginning of the year enter school at an older age. Compulsory attendance laws also require students to stay in school until they turn 16, at which point they are legally allowed to drop out. The result is that people born later in the year have, on average, more years of education than those born at the beginning of the year.</p>
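To see the logic in miniature, here is a toy simulation, not the Angrist and Krueger data: a binary instrument standing in for "born late in the year", invented coefficients, and the simple Wald estimator \(Cov(Z, Y)/Cov(Z, T)\):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
ability = rng.normal(size=n)   # unobserved confounder of educ and wage
z = rng.integers(0, 2, n)      # instrument: think "born late in the year"

# The instrument nudges education but touches wage only through it.
educ = 12 + 0.5 * z + ability + rng.normal(size=n)
wage = 10 + 2.0 * educ + 3.0 * ability + rng.normal(size=n)

# Naive OLS of wage on educ is biased because ability moves both.
c_ew = np.cov(educ, wage)
ols = c_ew[0, 1] / c_ew[0, 0]

# Wald / IV estimate: Cov(Z, Y) / Cov(Z, T) lands close to the
# true effect of 2 despite the unobserved confounder.
iv = np.cov(z, wage)[0, 1] / np.cov(z, educ)[0, 1]

print(ols, iv)
```

The OLS number comes out well above 2 because ability inflates it, while the IV ratio uses only the variation in education induced by the instrument.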