review fixed effects

matheusfacure · matheusfacure · commit ecf50835e135 · 2020-11-26T06:38:57.000-03:00
diff --git a/causal-inference-for-the-brave-and-true/13-Panel-Data-and-Fixed-Effects.ipynb b/causal-inference-for-the-brave-and-true/13-Panel-Data-and-Fixed-Effects.ipynb
@@ -18,7 +18,7 @@
     "\n",
     "This puts us in a difficult situation, because if we have unmeasured confounders, we have bias. One way to deal with this is with instrumental variables, like we've seen before. But coming up with good instruments it's no easy task and requires a lot of creativity. Here, let's look at an alternative that takes advantage of time or the temporal structure of data. \n",
     "\n",
-    "The idea is to use **panel data**. Panel data is when we have **observations on the same individual for multiple periods of time**. Panel data formats are very common in the industry, where they keep records of customer behavior for the same customer and for multiple time periods. The reason we can leverage panel data is because we can compare the same unit before and after the treatment, and see how they behave with it. Before we dive in the math, let's see how this makes intuitive sense."
+    "The idea is to use **panel data**. Panel data is when we have **observations on the same individual for multiple periods of time**. Panel data formats are very common in the industry, where companies keep records of customer behavior for the same customer over multiple time periods. We can leverage panel data because it lets us compare the same unit before and after the treatment and see how it behaves with it. Before we dive into the math, let's see how this makes intuitive sense."
    ]
   },
   {
@@ -49,7 +49,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "First, let's take a look at the causal graph that we have once we include multiple observations of the same unit across time. Suppose we have a situation where marriage at the first time causes income at the same time and subsequent marital status.. This is also true for times 2 and 3. Also, suppose that beauty is the same across all time periods (a bold statement, but reasonable if time is just a few years) and it causes both marriage and income."
+    "First, let's take a look at the causal graph that we have once we include multiple observations of the same unit across time. Suppose we have a situation where marriage at the first time causes income at the same time and subsequent marital status. This is also true for times 2 and 3. Also, suppose that beauty is the same across all time periods (a bold statement, but reasonable if time is just a few years) and it causes both marriage and income."
    ]
   },
   {
@@ -379,7 +379,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Think about it. We don't know the attributes like beauty and intelligence, but we know that person is the same individual. So, we can create a dummy variable indicating that person and add that to a linear model. This is what we mean when we say we can control for the person itself. When estimating the effect of marriage on income with this person dummy in our model, what regression does is it finds the effect of marriage **while keeping the person variable fixed**. Adding this entity dummy is what we call a fixed effect model.\n",
+    "Think about it. We can't measure attributes like beauty and intelligence, but we know that the person who has them is the same individual across time. So, we can create a dummy variable indicating that person and add that to a linear model. This is what we mean when we say we can control for the person itself: we are adding a variable (a dummy in this case) that denotes that particular person. When estimating the effect of marriage on income with this person dummy in our model, regression finds the effect of marriage **while keeping the person variable fixed**. Adding this entity dummy is what we call a fixed effect model.\n",
     "\n",
     "\n",
     "## Fixed Effects\n",
@@ -510,7 +510,7 @@
     "y_{it} = \\beta X_{it} + \\gamma U_i + e_{it}\n",
     "$\n",
     "\n",
-    "where \\\\(y_{it}\\\\) is the outcome of individual \\\\(i\\\\) at time \\\\(t\\\\), \\\\(X_{it}\\\\) is the vector of variables for individual \\\\(i\\\\) at time \\\\(t\\\\), \\\\(U_i\\\\) is a set of unobservables for individual \\\\(i\\\\) that are unchanging through time, hence the lack of the time subscript, and \\\\(e_{it}\\\\) is the error term. For the education example, \\\\(y_{it}\\\\) is log wages,  \\\\(X_{it}\\\\) are the observable variables that change in time, like marriage and experience and \\\\(U_i\\\\) are the variables that are not observed but constant for each individual, like beauty and intelligence. \n",
+    "where \\\\(y_{it}\\\\) is the outcome of individual \\\\(i\\\\) at time \\\\(t\\\\) and \\\\(X_{it}\\\\) is the vector of variables for individual \\\\(i\\\\) at time \\\\(t\\\\). \\\\(U_i\\\\) is a set of unobservables for individual \\\\(i\\\\). Notice that those unobservables are unchanging through time, hence the lack of the time subscript. Finally, \\\\(e_{it}\\\\) is the error term. For the education example, \\\\(y_{it}\\\\) is log wages, \\\\(X_{it}\\\\) are the observable variables that change over time, like marriage and experience, and \\\\(U_i\\\\) are the variables that are not observed but constant for each individual, like beauty and intelligence.\n",
     "\n",
     "\n",
     "Now, remember how I've said that using panel data with a fixed effect model is as simple as adding a dummy for the entities. It's true, but in practice, we don't actually do it. Imagine a dataset where we have 1 million customers. If we add one dummy for each of them, we would end up with 1 million columns, which is probably not a good idea. Instead, we use the trick of partitioning the linear regression into 2 separate models. We've seen this before, but now is a good time to recap it. Suppose you have a linear regression model with a set of features \\\\(X_1\\\\) and another set of features \\\\(X_2\\\\).\n",
@@ -587,7 +587,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Notice how this model is predicting the average income in 1980 to be 1.3935, in 1981 to be 1.5129 (1.3935+0.1194) and so on. Now, if we compute the average by year, we get the exact same result (add the intercept to the parameters of the other years to get the mean for the year)."
+    "Notice how this model is predicting the average income in 1980 to be 1.3935, in 1981 to be 1.5129 (1.3935+0.1194) and so on. Now, if we compute the average by year, we get the exact same result. (Remember that the base year, 1980, is the intercept, so you have to add the intercept to the parameters of the other years to get the mean `lwage` for the year.)"
    ]
   },
   {
@@ -628,23 +628,28 @@
     "1. Create time-demeaned variables by subtracting the mean for the individual:   \n",
     "\\\\(\\ddot{Y}_{it} = Y_{it} -  \\bar{Y}_i\\\\)  \n",
     "\\\\(\\ddot{X}_{it} = X_{it} -  \\bar{X}_i\\\\)\n",
-    "2. Run regress \\\\(\\ddot{Y}_{it}\\\\) on \\\\(\\ddot{X}_{it}\\\\)\n",
+    "2. Regress \\\\(\\ddot{Y}_{it}\\\\) on \\\\(\\ddot{X}_{it}\\\\)\n",
     "\n",
-    "Notice that when we do so, the unobserved \\\\(U_i\\\\) vanishes, since \\\\(\\bar{U_i}=U_i\\\\):\n",
+    "Notice that when we do so, the unobserved \\\\(U_i\\\\) vanishes. Since \\\\(U_i\\\\) is constant across time, we have that \\\\(\\bar{U_i}=U_i\\\\). If we have the following system of two equations\n",
     "\n",
     "\\begin{align}\n",
     "Y_{it} & = \\beta X_{it} + \\gamma U_i + e_{it} \\\\\n",
-    "\\bar{Y}_{i} & = \\beta \\bar{X}_{it} + \\gamma U_i + \\bar{e}_{it} \\\\\n",
+    "\\bar{Y}_{i} & = \\beta \\bar{X}_{it} + \\gamma \\bar{U}_i + \\bar{e}_{it} \\\\\n",
+    "\\end{align}\n",
+    "\n",
+    "and we subtract one from the other, we get\n",
+    "\n",
+    "\\begin{align}\n",
     "(Y_{it} - \\bar{Y}_{i}) & = (\\beta X_{it} - \\beta \\bar{X}_{it}) + (\\gamma U_i - \\gamma U_i) + (e_{it}-\\bar{e}_{it}) \\\\\n",
     "(Y_{it} - \\bar{Y}_{i}) & = \\beta(X_{it} - \\bar{X}_{it}) + (e_{it}-\\bar{e}_{it}) \\\\\n",
     "\\ddot{Y}_{it} & = \\beta \\ddot{X}_{it} + \\ddot{e}_{it} \\\\\n",
     "\\end{align}\n",
     "\n",
-    "To be honest, not only do the unobserved variables that are constant in time vanish. This happens to all the variables that are constant in time. For this reason, you can't include any variables that are constant across time, as they would be a linear combination of the dummy variables and the model wouldn't run. \n",
+    "which wipes out all unobserved variables that are constant across time. To be honest, it is not only the unobserved variables that vanish. This happens to all the variables that are constant in time. For this reason, you can't include any variables that are constant across time, as they would be a linear combination of the dummy variables and the model wouldn't run. \n",
     "\n",
     "![img](./data/img/fixed-effects/demeaned.png)\n",
     "\n",
-    "To check which variables are those, we can group our data by individual and get the sum of the standard deviations. If it is zero, it means the variable isn't changing for the same individual. "
+    "To check which variables are those, we can group our data by individual and get the sum of the standard deviations. If it is zero, it means the variable isn't changing across time for any of the individuals. "
    ]
   },
   {
@@ -682,9 +687,9 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "For our data, we need to remove entinicity dummies, `black` and `hisp`, since they are constant for the individual. Also, we need to remove education. We will also not use occupation, since this is probably mediating the effect of marriage on wage (it could be that single men are able to take more time demanding positions). \n",
+    "For our data, we need to remove the ethnicity dummies, `black` and `hisp`, since they are constant for the individual. Also, we need to remove education. We will also not use occupation, since this is probably mediating the effect of marriage on wage (it could be that single men are able to take more time-demanding positions). Having selected the features we will use, it's time to estimate this model.\n",
     "\n",
-    "To run our fixed effect model, first, let's get our mean data."
+    "To run our fixed effect model, first, let's get our mean data. We can achieve this by grouping everything by individuals and taking the mean."
    ]
   },
   {
@@ -968,7 +973,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "If we were to believe that fixed effect eliminates the all omitted variable bias, this model is telling us that marriage increases a man's wage by 11%. This result is very significant. One detail here is that for fixed effect models, the standard errors need to be clustered. So, instead of doing all our estimation by hand (which is only nice for pedagogical reasons), we can use the library `linearmodels` and set the argument `cluster_entity` to True."
+    "If we believe that fixed effect eliminates all the omitted variable bias, this model is telling us that marriage increases a man's wage by 11%. This result is very significant. One detail here is that for fixed effect models, the standard errors need to be clustered. So, instead of doing all our estimation by hand (which is only nice for pedagogical reasons), we can use the library `linearmodels` and set the argument `cluster_entity` to True."
    ]
   },
   {
@@ -1079,11 +1084,11 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "This model is saying that marriage increases the man's wage by 14%. A somewhat larger effect than the one we found with the fixed effect model. This suggests some omitted variable bias due to fixed individual factors, like intelligence and beauty not being added to the model. \n",
+    "This model is saying that marriage increases the man's wage by 14%, a somewhat larger effect than the one we found with the fixed effect model. This suggests some omitted variable bias due to fixed individual factors, like intelligence and beauty, not being added to the model.\n",
     "\n",
     "## Visualizing Fixed Effects\n",
     "\n",
-    "To expand our intuition about how fixed effect models work, let's diverge a little to another example. Suppose you work for a big tech company and you want to estimate the impact of a billboard marketing campaign on in-app purchase. When you look at data from the past, you see that the marketing department tends to spend more to place billboards on cities where the purchase level is lower. This makes sense right? They wouldn't need to do lots of marketing if sales were skyrocketing. If you run a regression model on this data, it looks like higher cost in marketing leads to less in-app purchase."
+    "To expand our intuition about how fixed effect models work, let's diverge a little to another example. Suppose you work for a big tech company and you want to estimate the impact of a billboard marketing campaign on in-app purchase. When you look at data from the past, you see that the marketing department tends to spend more to place billboards in cities where the purchase level is lower. This makes sense, right? They wouldn't need to do lots of advertisement if sales were skyrocketing. If you run a regression model on this data, it looks like higher cost in marketing leads to less in-app purchase, but only because marketing investment is biased towards low-spending regions. "
    ]
   },
   {
@@ -1165,11 +1170,11 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Take a minute to appreciate the figure above and let it tell you what fixed effect is doing. Notice that fixed effect is fitting **one regression line per city**. Also notice that the lines are parallel. The slope of the line is the effect of marketing costs on in-app purchase. So the **fixed effect is assuming that the causal effect is constants across all entities**, which are cities in this case. This can be a weakness or an advantage, depending on how you see it. It is a weakness if you are interested in finding the causal effect per city. Since the FE model assumes this effect is constant across entities, you won't find any difference in the causal effect. However, if you want to find the overall impact of marketing on in-app purchase, the panel structure of the data is a very useful leverage that fixed effects can explore. \n",
+    "Take a minute to appreciate what the image above is telling you about what fixed effect is doing. Notice that fixed effect is fitting **one regression line per city**. Also notice that the lines are parallel. The slope of the line is the effect of marketing costs on in-app purchase. So the **fixed effect is assuming that the causal effect is constant across all entities**, which are cities in this case. This can be a weakness or an advantage, depending on how you see it. It is a weakness if you are interested in finding the causal effect per city. Since the FE model assumes this effect is constant across entities, you won't find any difference in the causal effect. However, if you want to find the overall impact of marketing on in-app purchase, the panel structure of the data is a very useful leverage that fixed effects can exploit. \n",
     "\n",
     "## Time Effects\n",
     "\n",
-    "Just like we did a fixed effect for the individual level, we could design a fixed effect for the time level. If adding a dummy for each individual controls for fixed individual characteristics, adding a time dummy would control for variables that are fixed across time. One example of such a variable is inflation. Prices and salary tend to go up with time. If the wage marriage proportion also changes with time, we would have time as a confounder. To give a more concrete example, suppose that marriage is increasing with time. Since inflation also makes salary increase with time, some of the positive association we see between marriage and wage would be simply because both are increasing with time. To correct for that, we can add a dummy variable for each time period. In `linear models`, this is as simple as adding `TimeEffects` to our formula and setting the `cluster_time` to ture."
+    "Just like we did a fixed effect for the individual level, we could design a fixed effect for the time level. If adding a dummy for each individual controls for fixed individual characteristics, adding a time dummy would control for variables that are the same for everyone within a time period but that change from one period to the next. One example of such a variable is inflation. Prices and salary tend to go up with time. If the wage and the marriage proportion also change with time, we would have time as a confounder. To give a more concrete example, suppose that marriage is increasing with time. Since inflation also makes salary increase with time, some of the positive association we see between marriage and wage would be simply because both are increasing with time. To correct for that, we can add a dummy variable for each time period. In `linearmodels`, this is as simple as adding `TimeEffects` to our formula and setting `cluster_time` to True."
    ]
   },
   {
@@ -1232,12 +1237,15 @@
     "\n",
     "Another less obvious case when fixed effect fails is when you have **reversed causality**. For instance, let's say that it isn't marriage that causes you to earn more. Is earning more that increases your chances of getting married. In this case, it will appear that they have a positive correlation but earnings come first. They would change in time and in the same direction, so fixed effects wouldn't be able to control for that. \n",
     "\n",
+    "\n",
     "## Key Ideas\n",
     "\n",
-    "Here, we saw how to use panel data, data where we have multiple measurements of the same individuals across multiple time periods. When that is the case, we can use a fixed effect model that controls for the entity, holding all individual, time constant attributes fixed. This is a powerful and very convincing way of controlling for confounding and it is as good as it gets with non random data. \n",
+    "Here, we saw how to use panel data, data where we have multiple measurements of the same individuals across multiple time periods. When that is the case, we can use a fixed effect model that controls for the entity, holding fixed all individual attributes that are constant over time. This is a powerful and very convincing way of controlling for confounding and it is as good as it gets with non-random data. \n",
     "\n",
     "Finally, we saw that FE is not a panacea. We understood two situations where it doesn't work: when we have reverse causality and when the unmeasured confounding is changing in time.\n",
     "\n",
+    "\n",
+    "\n",
     "## References\n",
     "\n",
     "I like to think of this entire book as a tribute to Joshua Angrist, Alberto Abadie and Christopher Walters for their amazing Econometrics class. Most of the ideas here are taken from their classes at the American Economic Association. Watching them is what is keeping me sane during this tough year of 2020.\n",
@@ -1255,7 +1263,9 @@
     "\n",
     "Finally, I'd also like to compliment Scott Cunningham and his brilliant work mingling Causal Inference and Rap quotes:\n",
     "\n",
-    "* [Causal Inference: The Mixtape](https://www.scunning.com/mixtape.html)"
+    "* [Causal Inference: The Mixtape](https://www.scunning.com/mixtape.html)\n",
+    "\n",
+    "![img](./data/img/poetry.png)"
    ]
   },
   {
@@ -1282,7 +1292,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.6.9"
+   "version": "3.8.5"
   }
  },
  "nbformat": 4,