chapters 8 to 20 - grammar, rendering, typos, phrasing, etc #456
Open: gsmafra wants to merge 5 commits into matheusfacure:master from gsmafra:chapters-08-20
```diff
@@ -163,7 +163,7 @@
 "\\hat{ATE} = \\sum^K_{k=1}(\\bar{Y}_{k1} - \\bar{Y}_{k0}) * \\dfrac{N_k}{N}\n",
 "$\n",
 "\n",
-"where the bar represent the mean of the outcome on the treated, $Y_{k1}$, and non-treated, $Y_{k0}$, at cell k and $N_{k}$ is the number of observations in that same cell. As you can see, we are computing a local ATE for each cell and combining them using a weighted average, where the weights are the sample size of the cell. In our medicine example above, this would be the first estimate, which gave us −2.6.\n",
+"where the bar represents the mean of the outcome on the treated, $Y_{k1}$, and non-treated, $Y_{k0}$, at cell k and $N_{k}$ is the number of observations in that same cell. As you can see, we are computing a local ATE for each cell and combining them using a weighted average, where the weights are the sample size of the cell. In our medicine example above, this would be the first estimate, which gave us −2.6.\n",
 "\n",
 "## Matching Estimator\n",
 "\n",
```
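The formula in this hunk, $\hat{ATE} = \sum^K_{k=1}(\bar{Y}_{k1} - \bar{Y}_{k0}) \cdot N_k/N$, translates directly into a few lines of pandas. The following is a minimal sketch, not the book's own code; the column names `y`, `t`, and `cell` are hypothetical.

```python
import pandas as pd

def subclassification_ate(df, outcome="y", treatment="t", cell="cell"):
    # Weighted average of per-cell ATEs, where each weight is the
    # cell's sample share N_k / N (column names are hypothetical).
    n = len(df)
    ate = 0.0
    for _, g in df.groupby(cell):
        local_ate = (g.loc[g[treatment] == 1, outcome].mean()
                     - g.loc[g[treatment] == 0, outcome].mean())
        ate += local_ate * len(g) / n
    return ate
```

On a toy dataset with two cells, this reproduces the weighted-average logic: each cell contributes its local treated-minus-untreated difference, scaled by its size.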
```diff
@@ -828,7 +828,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"But this was a very contrived example, just to introduce matching. In reality, we usually have more than one feature and units don't match perfectly. In this case, we have to define some measurement of proximity to compare how units are close to each other. One common metric for this is the euclidean norm $||X_i - X_j||$. This difference, however, is not invariant to the scale of the features. This means that features like age, that take values on the tenths, will be much less important when computing this norm compared to features like income, which take the order of hundreds. For this reason, before applying the norm, we need to scale the features so that they are on roughly the same scale.\n",
+"But this was a very contrived example, just to introduce matching. In reality, we usually have more than one feature and units don't match perfectly. In this case, we have to define some measurement of proximity to compare how units are close to each other. One common metric for this is the euclidean norm $||X_i - X_j||$. This difference, however, is not invariant to the scale of the features. This means that features like age, that take values in the tens, will be much less important when computing this norm compared to features like income, which take the order of thousands. For this reason, before applying the norm, we need to scale the features so that they are on roughly the same scale.\n",
 "\n",
 "Having defined a distance measure, we can now define the match as the nearest neighbour to that sample we wish to match. In math terms, we can write the matching estimator the following way\n",
 "\n",
```
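The passage above describes the two steps of nearest-neighbour matching: scale the features, then match each treated unit to its closest untreated unit under the euclidean norm. A minimal sketch of that recipe, with entirely synthetic data (the variable names and toy scales are assumptions, not the book's code):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
# Toy features on very different scales: an age-like and an income-like column.
X_treated = rng.normal(size=(50, 2)) * [10, 1000]
y_treated = rng.normal(size=50)
X_control = rng.normal(size=(200, 2)) * [10, 1000]
y_control = rng.normal(size=200)

# Standardise both groups with the control group's statistics so that
# age and income contribute comparably to the euclidean distance.
mu, sd = X_control.mean(axis=0), X_control.std(axis=0)
scale = lambda X: (X - mu) / sd

# 1-nearest-neighbour "regressor": the predicted counterfactual for each
# treated unit is simply the outcome of its closest untreated match.
match = KNeighborsRegressor(n_neighbors=1).fit(scale(X_control), y_control)
atet = (y_treated - match.predict(scale(X_treated))).mean()
```

Without the scaling step, the income column would dominate the distance and matches would effectively ignore age.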
```diff
@@ -1253,7 +1253,7 @@
 "\\sqrt{N_1}(\\hat{ATET} - ATET)\n",
 "$\n",
 "\n",
-"However, this doesn't alway happen. If we define the mean outcome for the untreated given X, $\\mu_0(x)=E[Y|X=x, T=0]$, we will have that (btw, I've omitted the proof for that because it's a little beyond the point here).\n",
+"However, this doesn't always happen. If we define the mean outcome for the untreated given X, $\\mu_0(x)=E[Y|X=x, T=0]$, we will have that (btw, I've omitted the proof for that because it's a little beyond the point here).\n",
 "\n",
 "$\n",
 "E[\\sqrt{N_1}(\\hat{ATET} - ATET)] = E[\\sqrt{N_1}(\\mu_0(X_i) - \\mu_0(X_j(i)))]\n",
```
```diff
@@ -1269,7 +1269,7 @@
 "\\hat{ATET} = \\frac{1}{N_1}\\sum \\big((Y_i - Y_{j(i)}) - (\\hat{\\mu_0}(X_i) - \\hat{\\mu_0}(X_{j(i)}))\\big)\n",
 "$\n",
 "\n",
-"where $\\hat{\\mu_0}(x)$ is some estimative of $E[Y|X, T=0]$, like a linear regression fitted only on the untreated sample."
+"where $\\hat{\\mu_0}(x)$ is some estimate of $E[Y|X, T=0]$, like a linear regression fitted only on the untreated sample."
 ]
 },
 {
```
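The bias-corrected estimator in this hunk, $\hat{ATET} = \frac{1}{N_1}\sum \big((Y_i - Y_{j(i)}) - (\hat{\mu_0}(X_i) - \hat{\mu_0}(X_{j(i)}))\big)$, can be sketched in a few lines: fit $\hat{\mu}_0$ by linear regression on the untreated sample only, match treated units to nearest untreated neighbours, then subtract the model-predicted discrepancy. The data below is synthetic and the names are hypothetical, not the book's own implementation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X_t, y_t = rng.normal(size=(30, 2)), rng.normal(size=30)    # treated sample
X_c, y_c = rng.normal(size=(100, 2)), rng.normal(size=100)  # untreated sample

# mu_0 hat: an estimate of E[Y | X, T=0], fitted only on the untreated.
mu0 = LinearRegression().fit(X_c, y_c)

# j(i): index of the nearest untreated neighbour of each treated unit.
nn = NearestNeighbors(n_neighbors=1).fit(X_c)
j = nn.kneighbors(X_t, return_distance=False).ravel()

# Debiased matching: raw matched difference minus the regression-predicted
# difference between each treated unit and its match.
atet = ((y_t - y_c[j]) - (mu0.predict(X_t) - mu0.predict(X_c[j]))).mean()
```

The correction term goes to zero when each treated unit's match has identical covariates, so it only intervenes where matching discrepancies would otherwise bias the estimate.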
```diff
@@ -1496,9 +1496,11 @@
 "\n",
 "As it turns out, the answer is quite simple and intuitive. It is easy to find people that match on a few characteristics, like sex. But if we add more characteristics, like age, income, city of birth and so on, it becomes harder and harder to find matches. In more general terms, the more features we have, the higher will be the distance between units and their matches. \n",
 "\n",
-"This is not something that hurts only the matching estimator. It ties back to the subclassification estimator we saw earlier. Early on, in that contrived medicine example where with man and woman, it was quite easy to build the subclassification estimator. That was because we only had 2 cells: man and woman. But what would happen if we had more? Let's say we have 2 continuous features like age and income and we manage to discretise them into 5 buckets each. This will give us 25 cells, or $5^2$. And what if we had 10 covariates with 3 buckets each? Doesn't seem like a lot right? Well, this would give us 59049 cells, or $3^{10}$. It's easy to see how this can blow out of proportion pretty quickly. This is a phenomena pervasive in all data science, which is called the **The Curse of Dimensionality**!!!\n",
+"This is not something that hurts only the matching estimator. It ties back to the subclassification estimator we saw earlier. Early on, in that contrived medicine example where with man and woman, it was quite easy to build the subclassification estimator. That was because we only had 2 cells: man and woman. But what would happen if we had more? Let's say we have 2 continuous features like age and income and we manage to discretise them into 5 buckets each. This will give us 25 cells, or $5^2$. And what if we had 10 covariates with 3 buckets each? Doesn't seem like a lot right? Well, this would give us 59049 cells, or $3^{10}$. It's easy to see how this can blow out of proportion pretty quickly. This is a phenomenon pervasive in all data science, which is called the **The Curse of Dimensionality**!!!\n",
 "\n",
+"",
 "\n",
 "\n",
 "\n",
 "Image Source: https://deepai.org/machine-learning-glossary-and-terms/curse-of-dimensionality\n",
 "\n",
 "Despite its scary and pretentious name, this only means that the number of data points required to fill a feature space grows exponentially with the number of features, or dimensions. So, if it takes X data points to fill the space of, say, 3 feature spaces, it takes exponentially more points to fill in the space of 4 features. \n",
```

Review comment: Github is not rendering this quite well, but the only difference is phenomena -> phenomenon.
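The cell counts quoted in this hunk ($5^2 = 25$ and $3^{10} = 59049$) follow from one rule: with $k$ buckets per feature and $d$ features, subclassification needs $k^d$ cells. A one-liner confirms the arithmetic:

```python
def n_cells(n_buckets, n_features):
    # Each feature discretised into n_buckets; the number of
    # subclassification cells is buckets raised to the feature count.
    return n_buckets ** n_features

print(n_cells(5, 2))   # 2 features, 5 buckets each  -> 25
print(n_cells(3, 10))  # 10 covariates, 3 buckets    -> 59049
```

This exponential growth in cells, each of which needs both treated and untreated observations, is exactly the curse of dimensionality the text names.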
```diff
@@ -1515,7 +1517,7 @@
 "\n",
 "From there, we've derived a very general causal inference estimator with subclassification. We saw how that estimator is not very useful in practice but it gave us some interesting insights on how to tackle the problem of causal inference estimation. That gave us the opportunity to talk about the matching estimator. \n",
 "\n",
-"Matching controls for the confounders by looking at each treated unit and finding an untreated pair that is very similar to it and similarly for the untreated units. We saw how to implement this method using the KNN algorithm and also how to debiase it using regression. Finally, we discussed the difference between matching and linear regression. We saw how matching is a non parametric estimator that doesn't rely on linearity the way linear regression does.\n",
+"Matching controls for the confounders by looking at each treated unit and finding an untreated pair that is very similar to it and similarly for the untreated units. We saw how to implement this method using the KNN algorithm and also how to debias it using regression. Finally, we discussed the difference between matching and linear regression. We saw how matching is a non-parametric estimator that doesn't rely on linearity the way linear regression does.\n",
 "\n",
 "Finally, we've delved into the problem of high dimensional datasets and we saw how causal inference methods can suffer from it.\n",
 "\n",
```

There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#445