As described in @sec-correlationCausation, correlation does not imply causation.
There are several reasons (described in @sec-correlationCausation) that, just because `X` is correlated with `Y` does not necessarily mean that `X` causes `Y`.
However, correlation can still be useful.
-In order for two processes to be causally related, they must be associated.
+In order for two processes to be causally related, they must be associated, as described in @sec-conditionsForCausality.
That is, association is necessary but insufficient for causality.
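The "necessary but insufficient" point in this hunk can be illustrated with a small simulation. The following is a minimal sketch, not taken from the book: the variables `x`, `y`, and the confounder `z` are hypothetical, and the data are simulated so that `z` drives both `x` and `y` while neither influences the other. The two end up clearly correlated, yet the association roughly vanishes once `z` is controlled for via a partial correlation.

```python
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y after controlling for z."""
    return (r_xy - r_xz * r_yz) / ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5


rng = random.Random(42)  # fixed seed so the sketch is reproducible
n = 5000

# Hypothetical confounder z; x and y each depend on z but not on each other.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

r_xy = pearson(x, y)  # sizeable, although x does not cause y
r_xy_given_z = partial_corr(r_xy, pearson(x, z), pearson(y, z))  # near zero

print(f"r(x, y) = {r_xy:.2f}")
print(f"r(x, y | z) = {r_xy_given_z:.2f}")
```

Under these assumptions the raw correlation should land near 0.5 and the partial correlation near 0, which is one way an association can exist without a causal link between the two variables.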
File: machine-learning.qmd (+4 -1; 4 additions & 1 deletion)
@@ -119,6 +119,9 @@ Machine learning is a class of algorithmic approaches that are used to identify
Machine learning takes us away from focusing on [causal inference](#sec-causalInference).
Machine learning does not care about which processes are causal—i.e., which processes influence the outcome.
Instead, machine learning cares about prediction—it cares about a predictor variable to the extent that it increases predictive accuracy regardless of whether it is causally related to the outcome.
+Nevertheless, association is necessary (despite being insufficient) for causality, as described in @sec-conditionsForCausality.
+Thus, achieving strong prediction is important (even if insufficient) for the model to be useful.
+If a model explains only a small portion of variance, it is difficult for it to be useful.
Machine learning can be useful for leveraging big data and many predictor variables to develop predictive models with greater accuracy.
However, many machine learning techniques are black boxes—it is often unclear how or why certain predictions are made, which can make it difficult to interpret the model's decisions and understand the underlying relationships between variables.
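The added lines about variance explained can be made concrete with the coefficient of determination, $R^2 = 1 - SS_{res} / SS_{tot}$. The sketch below is only illustrative: the point totals and both sets of predictions are made up, and `r_squared` is a hypothetical helper rather than a function from the chapter.

```python
def r_squared(actual, predicted):
    """Proportion of variance in `actual` explained by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up points scored by five players, and two hypothetical models' predictions.
actual = [10.0, 14.0, 8.0, 20.0, 18.0]
strong_model = [11.0, 13.0, 9.0, 19.0, 18.0]  # tracks the outcome closely
weak_model = [14.0, 14.0, 14.0, 14.0, 15.0]   # barely better than guessing the mean

r2_strong = r_squared(actual, strong_model)  # about 0.96
r2_weak = r_squared(actual, weak_model)      # about 0.07

print(f"strong model R^2 = {r2_strong:.2f}")
print(f"weak model R^2 = {r2_weak:.2f}")
```

A model like the second one explains so little variance that its predictions are of little practical use, whatever its predictors mean causally.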
@@ -179,7 +182,7 @@ This chapter discusses several key ones:
Supervised learning involves learning from data where the correct classification or outcome is known (and the classification is thus part of the data).
For instance, predicting how many points a player will score is a supervised learning task, because there is a ground truth—the actual number of points scored—that can be used to train and evaluate the model.
If the outcome variable is categorical, the approach involves classification.
-If the outcome vairable is continuous, the approach involves regression.
+If the outcome variable is continuous, the approach involves regression.
Unlike linear and logistic regression, various machine learning techniques can handle [multicollinearity](#sec-multipleRegressionMulticollinearity), including [LASSO regression](#sec-lasso), [ridge regression](#sec-ridgeRegression), and [elastic net regression](#sec-elasticNet) via regularization.
Regularization involves penalizing model complexity to avoid [overfitting](#sec-overfitting)[@Ramasubramanian2016].
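To show how a complexity penalty helps with multicollinearity, here is a minimal sketch of ridge regression in closed form, $\hat{\beta} = (X^\top X + \lambda I)^{-1} X^\top y$, for two nearly collinear predictors. Several things are assumptions made for illustration: the simulated data, the penalty value `lam = 10`, the helper name `ridge_2d`, and the omission of an intercept. It covers only the ridge penalty, not LASSO or elastic net.

```python
import random


def ridge_2d(x1, x2, y, lam):
    """Closed-form ridge for two predictors, no intercept.

    Solves (X'X + lam * I) beta = X'y using the explicit 2x2 inverse.
    With lam = 0 this reduces to ordinary least squares.
    """
    a = sum(u * u for u in x1) + lam
    b = sum(u * v for u, v in zip(x1, x2))
    d = sum(v * v for v in x2) + lam
    c1 = sum(u * t for u, t in zip(x1, y))
    c2 = sum(v * t for v, t in zip(x2, y))
    det = a * d - b * b
    return ((d * c1 - b * c2) / det, (a * c2 - b * c1) / det)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 200

# Two almost identical (highly collinear) predictors; true coefficients are 1 and 1.
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [u + 0.01 * rng.gauss(0, 1) for u in x1]
y = [u + v + rng.gauss(0, 1) for u, v in zip(x1, x2)]

ols = ridge_2d(x1, x2, y, lam=0.0)     # unpenalized: individual estimates are unstable
ridge = ridge_2d(x1, x2, y, lam=10.0)  # penalized: estimates are shrunk toward zero

print("OLS coefficients:  ", ols)
print("ridge coefficients:", ridge)
```

Because the penalty shrinks the coefficient vector, the ridge estimates have a smaller overall magnitude than the unpenalized ones. With predictors this collinear, that usually means trading a little bias for much less erratic individual coefficients.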