20260131 - prediction

isaactpetersen · isaactpetersen · commit a377e24590dd · 2026-01-31T23:21:00.000-06:00
diff --git a/correlation.qmd b/correlation.qmd
@@ -1458,7 +1458,7 @@ plotly::ggplotly(plot_scatterplotWithRangeRestriction)
 As described in @sec-correlationCausation, correlation does not imply causation.
 There are several reasons (described in @sec-correlationCausation) that, just because `X` is correlated with `Y` does not necessarily mean that `X` causes `Y`.
 However, correlation can still be useful.
-In order for two processes to be causally related, they must be associated.
+In order for two processes to be causally related, they must be associated, as described in @sec-conditionsForCausality.
 That is, association is necessary but insufficient for causality.
 
 ## Conclusion {#sec-correlationConclusion}
diff --git a/machine-learning.qmd b/machine-learning.qmd
@@ -119,6 +119,9 @@ Machine learning is a class of algorithmic approaches that are used to identify
 Machine learning takes us away from focusing on [causal inference](#sec-causalInference).
 Machine learning does not care about which processes are causal—i.e., which processes influence the outcome.
 Instead, machine learning cares about prediction—it cares about a predictor variable to the extent that it increases predictive accuracy regardless of whether it is causally related to the outcome.
+Nevertheless, association is necessary (despite being insufficient) for causality, as described in @sec-conditionsForCausality.
+Thus, achieving strong prediction is important (even if insufficient) for the model to be useful.
+If a model does explains only a small portion of variance, it is difficult for it to be useful.
 
 Machine learning can be useful for leveraging big data and many predictor variables to develop predictive models with greater accuracy.
 However, many machine learning techniques are black boxes—it is often unclear how or why certain predictions are made, which can make it difficult to interpret the model's decisions and understand the underlying relationships between variables.
@@ -179,7 +182,7 @@ This chapter discusses several key ones:
 Supervised learning involves learning from data where the correct classification or outcome is known (and the classification is thus part of the data).
 For instance, predicting how many points a player will score is a supervised learning task, because there is a ground truth—the actual number of points scored—that can be used to train and evaluate the model.
 If the outcome variable is categorical, the approach involves classification.
-If the outcome vairable is continuous, the approach involves regression.
+If the outcome variable is continuous, the approach involves regression.
 
 Unlike linear and logistic regression, various machine learning techniques can handle [multicollinearity](#sec-multipleRegressionMulticollinearity), including [LASSO regression](#sec-lasso), [ridge regression](#sec-ridgeRegression), and [elastic net regression](#sec-elasticNet) via regularization.
 Regularization involves penalizing model complexity to avoid [overfitting](#sec-overfitting) [@Ramasubramanian2016].