Commit a0b57fc ("visuals"), parent 96c954b

71 files changed: 2033 additions, 25 deletions


book/10-bias-fairness.tex

Lines changed: 38 additions & 0 deletions
@@ -53,6 +53,14 @@ \subsection{Demographic Parity}
 
 \thispagestyle{customstyle}
 
+\begin{figure*}[ht!]
+\centering
+\includegraphics[width=\textwidth]{figures/Demographic_Parity.png}
+\end{figure*}
+
+\textbf{Figure. Demographic Parity.} \underline{Left:} All groups receive positive predictions at roughly equal rates ($\sim$40\%), satisfying parity.
+\underline{Right:} Group A is selected at 65\% while Groups B and C are selected at only 30\% and 20\%, violating parity.
+
 \orangebox{Did you know that...}
 {Demographic parity goes by several other names in the literature, often referred to as statistical parity, group fairness, or even the
 independence criterion. Different research communities picked different terms, but they all describe the same idea.}
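The parity check described in the caption is a one-liner per group. A minimal sketch (invented predictions and group labels, not from the book):

```python
import numpy as np

# Invented predictions and group labels, purely illustrative
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 1, 0, 0])
group  = np.array(["A", "A", "A", "A", "B", "B", "B", "B", "B", "B"])

# Demographic parity compares selection rates P(Y_hat = 1 | group) across groups
rates = {g: y_pred[group == g].mean() for g in np.unique(group)}
gap = max(rates.values()) - min(rates.values())
print(rates, gap)  # a gap of 0 means perfect demographic parity
```

Note that only the predictions matter here; demographic parity ignores the true labels entirely.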
@@ -112,6 +120,14 @@ \subsection{Equality of Opportunity}
 
 \thispagestyle{customstyle}
 
+\begin{figure*}[ht!]
+\centering
+\includegraphics[width=\textwidth]{figures/Equality_of_Opportunity.png}
+\end{figure*}
+
+\textbf{Figure. Equality of Opportunity.} \underline{Left:} All groups have a similar TPR ($\sim$80\%), meaning qualified individuals are equally likely to be identified.
+\underline{Right:} Group A has a 90\% TPR while Group C has only 45\%, meaning qualified individuals in Group C are frequently overlooked.
+
 \orangebox{Did you know that...}
 {Equality of Opportunity was popularized in the 2016 paper ``Equality of Opportunity in Supervised Learning'' by Hardt, Price, and Srebro.
 In that paper, they introduced both Equality of Opportunity and its stricter sibling, Equalized Odds. The terms have since become standard
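The per-group TPR comparison from the caption can be sketched directly (invented labels, predictions, and groups, purely illustrative):

```python
import numpy as np

# Invented labels, predictions, and groups, purely illustrative
y_true = np.array([1, 1, 1, 1, 0, 1, 1, 1, 1, 0])
y_pred = np.array([1, 1, 1, 0, 0, 1, 0, 0, 0, 1])
group  = np.array(["A"] * 5 + ["C"] * 5)

# Equality of opportunity compares TPR = P(Y_hat = 1 | Y = 1, group) across groups
tprs = {}
for g in ("A", "C"):
    qualified = (group == g) & (y_true == 1)
    tprs[g] = y_pred[qualified].mean()
print(tprs)
```

Unlike demographic parity, the check conditions on the true label: only qualified (positive) individuals enter the comparison.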
@@ -175,6 +191,13 @@ \subsection{Equality of Odds}
 
 \thispagestyle{customstyle}
 
+\begin{figure*}[ht!]
+\centering
+\includegraphics[width=\textwidth]{figures/Equality_of_Odds.png}
+\end{figure*}
+
+\textbf{Figure. Equality of Odds.} \underline{Left:} Both TPR and FPR are consistent across groups (fair). \underline{Right:} Group A has a high TPR but low FPR, while Group C has a low TPR and high FPR: the model performs well only for Group A.
+
 \orangebox{Did you know that...}
 {Equality of Odds was popularized in the 2016 paper ``Equality of Opportunity in Supervised Learning'' by Hardt, Price, and Srebro.
 In that paper, they introduced both Equality of Odds (as ``Equalized Odds'') and its less strict sibling, Equality of Opportunity. The terms have since become standard
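Since this criterion constrains both TPR and FPR, the check computes two rates per group. A minimal sketch (invented data):

```python
import numpy as np

def tpr_fpr(y_true, y_pred):
    """TPR and FPR for binary numpy arrays."""
    return y_pred[y_true == 1].mean(), y_pred[y_true == 0].mean()

# Invented per-group data, purely illustrative
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 1, 0, 0, 1, 0, 1, 1])
group  = np.array(["A"] * 4 + ["C"] * 4)

# Equality of odds requires BOTH rates to match across groups
for g in ("A", "C"):
    m = group == g
    print(g, tpr_fpr(y_true[m], y_pred[m]))
```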
@@ -241,6 +264,14 @@ \subsection{Predictive Parity}
 
 \thispagestyle{customstyle}
 
+\begin{figure*}[ht!]
+\centering
+\includegraphics[width=\textwidth]{figures/Predictive_Parity.png}
+\end{figure*}
+
+\textbf{Figure. Predictive Parity.} \underline{Left:} All groups have a similar PPV ($\sim$74\%), meaning a positive prediction is equally trustworthy across groups.
+\underline{Right:} A positive prediction for Group A is correct 85\% of the time, but only 40\% of the time for Group C.
+
 \orangebox{Did you know that...}
 {Predictive parity gained attention during the debate around the COMPAS recidivism tool. ProPublica's 2016 investigation argued that COMPAS was
 unfair because it did not satisfy equalized odds, while its developers countered that the tool did satisfy predictive parity, illustrating how
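Predictive parity flips the conditioning of equality of opportunity: it conditions on the prediction rather than the label. A sketch with invented data:

```python
import numpy as np

# Invented data, purely illustrative
y_true = np.array([1, 1, 1, 0, 1, 0, 0, 1])
y_pred = np.ones(8, dtype=int)          # everyone receives a positive prediction
group  = np.array(["A"] * 4 + ["C"] * 4)

# Predictive parity compares PPV = P(Y = 1 | Y_hat = 1, group) across groups
ppvs = {}
for g in ("A", "C"):
    flagged = (group == g) & (y_pred == 1)
    ppvs[g] = y_true[flagged].mean()    # how often a positive prediction is correct
print(ppvs)
```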
@@ -298,6 +329,13 @@ \subsection{Calibration within Groups}
 
 \thispagestyle{customstyle}
 
+\begin{figure*}[ht!]
+\centering
+\includegraphics[width=0.6\textwidth]{figures/Calibration_within_Groups.png}
+\end{figure*}
+
+\textbf{Figure. Calibration within Groups.} Group A's calibration curve closely follows the diagonal (well calibrated), while Group B's curve deviates: a predicted score of 0.3 actually corresponds to a 40\% positive rate for Group B, meaning scores have different meanings across groups.
+
 \orangebox{Did you know that...}
 {Probability calibration first became popular in weather forecasting. In the 1950s, meteorologists asked whether a
 ``70\% chance of rain'' really meant that it rained on 7 out of 10 such days. This led to the creation of the Brier Score in 1950, one of the
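The per-group check can lean on scikit-learn's \texttt{calibration\_curve}, run separately per group. A sketch reproducing the caption's Group B situation with invented scores:

```python
import numpy as np
from sklearn.calibration import calibration_curve

# Invented scores for one group: the model always predicts 0.3,
# but 40% of those cases are actually positive, i.e. it is miscalibrated
y_true = np.array([1, 1, 0, 0, 0, 1, 1, 0, 0, 0])
y_prob = np.full(10, 0.3)

prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=5)
print(prob_pred, prob_true)  # predicted ~0.3 vs observed ~0.4
```

For a real fairness audit you would call this once per group and compare how far each curve sits from the diagonal.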

book/2-regression.tex

Lines changed: 25 additions & 25 deletions
@@ -6,8 +6,9 @@ \chapter{Regression}
 \section{MAE}
 \subsection{Mean Absolute Error}
 
-MAE is one of the most popular regression accuracy metrics. It is calculated as the sum of absolute errors divided by the sample size.
-It is a scale-dependent accuracy measure which means that it uses the same scale as the data being measured.
+MAE is one of the most intuitive regression metrics: it tells you, on average, how far off your predictions are in the same units as your target.
+If you're predicting house prices and your MAE is \$15,000, each prediction is off by \$15K on average. This directness makes MAE the go-to metric
+when you need to communicate model performance to non-technical stakeholders.
 
 % equation
 \begin{center}
@@ -25,23 +26,23 @@ \subsection{Mean Absolute Error}
 }
 \end{center}
 
-The smaller the MAE, the closer the model's predictions are to the actual targets.
-Theoretically, MAE belongs in the 0 to +infinity range. One of the aspects that makes MAE popular is that it is easy to understand and compute.
+MAE ranges from 0 (perfect) to $+\infty$. To interpret it in context, compare MAE to the standard deviation of your target:
+MAE $\ll$ std($Y$) suggests a useful model, while MAE $\approx$ std($Y$) means your model is barely better than predicting the mean.
 
 \textbf{When to use MAE?}
 
-Use MAE when you need an interpretable, robust metric that penalizes all errors equally.
-Avoid using it when larger errors need more significant penalization.
+Use MAE for demand forecasting, inventory planning, or any task where over-predicting by 5 is exactly as bad as under-predicting by 5.
+Avoid MAE when large errors are disproportionately costly (use RMSE), when you need a scale-free comparison across different targets (use MAPE),
+or when you need a differentiable loss function for training (use MSE).
 
 % strength and weakness box
 \coloredboxes{
-\item MAE provides an easy-to-understand value since it represents the average error in the same units as the data.
-\item MAE treats under-predictions and over-predictions equally. Bear in mind that this may not be desirable in all contexts.
+\item Robust to outliers. Unlike MSE, a single bad prediction doesn't dominate the score. If 99 predictions are off by 1 and one is off by 100, MAE = 1.99 while RMSE $\approx$ 10.
+\item Directly interpretable. MAE = 5.2 means ``on average, predictions miss by 5.2 units.'' No square roots or percentage conversions needed.
 }
 {
-\item MAE can be biased when the distribution of errors is skewed, as it does not account for the direction of the error.
-\item The absolute value function used in MAE is not differentiable at zero, which can pose challenges in optimization
-and gradient-based learning algorithms.
+\item All errors are weighted equally. A model with many small errors gets the same MAE as one with fewer but larger errors. If large errors are costly, use RMSE.
+\item Not differentiable at zero, so it cannot be directly used as a loss function in gradient descent. In practice, Huber loss or smooth L1 are used instead.
 }
 
 \clearpage
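The "99 off by 1, one off by 100" example in the strengths box is easy to verify numerically:

```python
import numpy as np

# The example from the strengths box: 99 errors of 1 and a single error of 100
errors = np.array([1.0] * 99 + [100.0])

mae  = np.mean(np.abs(errors))
rmse = np.sqrt(np.mean(errors ** 2))
print(mae, rmse, rmse / mae)  # the outlier dominates RMSE but barely moves MAE
```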
@@ -51,31 +52,30 @@ \subsection{Mean Absolute Error}
 \begin{figure*}[ht!]
 \centering
 \includegraphics[width=0.6\textwidth]{figures/MAE_3d_surface.png}
-% \caption{Caption}
 \end{figure*}
 
 \begin{wrapfigure}{r}{0.5\textwidth}
 \centering
-\vspace{-10pt} % Adjust vertical alignment if needed
-\includegraphics[width=0.45\textwidth]{figures/MAE_cross_section.png} % Your figure goes here
-\vspace{-10pt} % Adjust vertical alignment if needed
+\vspace{-10pt}
+\includegraphics[width=0.45\textwidth]{figures/MAE_cross_section.png}
+\vspace{-10pt}
 \end{wrapfigure}
 
-% Left text with the image on the right
-\textbf{Figure 3.1 MAE.} \underline{Top:} The rate of change of MAE is linear.
-Each error contributes proportionally to the total error.
-\underline{Right:} We can see that MAE is always non-negative, symmetrical,
-and centered around zero. By looking at this plot it is clear that MAE is not differentiable at zero.
+\textbf{Figure. MAE.} \underline{Top:} The rate of change of MAE is linear:
+each error contributes proportionally to the total.
+\underline{Right:} MAE is always non-negative, symmetrical, and centered around zero.
+Unlike MSE's parabola, the V-shape means all errors are penalized equally regardless of magnitude.
 
 \orangebox{Did you know that...}
-{A forecast method that minimizes MAE
-will lead to forecasts of the median.}
-
+{Minimizing MAE leads to predicting the \textbf{median}, while minimizing MSE leads to predicting the \textbf{mean}.
+This is why MAE is more robust to outliers: the median is less affected by extreme values than the mean. If your target distribution
+is skewed (e.g., income, claim amounts), this distinction matters: MAE and MSE will favor different models.}
 
 \textbf{Other related metrics}
 
-Other metrics commonly explored alongside MAE are Mean Squared Error (MSE), Root Mean Squared Error (RMSE),
-and Mean Absolute Percentage Error (MAPE).
+If your MAE and RMSE differ significantly, it signals that some predictions have large errors (RMSE amplifies them). A rule of thumb:
+RMSE/MAE close to 1.0 means errors are uniform; RMSE/MAE $\gg$ 1.0 means a few predictions are far off.
+For percentage-based comparison across different-scale targets, use MAPE or sMAPE.
 
 % ---------- MSE ----------
 \clearpage
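The median-vs-mean fact in the orangebox can be demonstrated with a brute-force grid search over constant predictions (toy skewed data, invented for illustration):

```python
import numpy as np

# Toy skewed target: grid-search the best constant prediction under each loss
y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
candidates = np.linspace(0.0, 100.0, 100001)

mae = np.abs(y[:, None] - candidates).mean(axis=0)
mse = ((y[:, None] - candidates) ** 2).mean(axis=0)

print(candidates[mae.argmin()])  # the median of y
print(candidates[mse.argmin()])  # the mean of y, dragged far right by the outlier
```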

book/3-classification.tex

Lines changed: 75 additions & 0 deletions
@@ -396,6 +396,13 @@ \subsection{True Negative Rate (Specificity)}
 \clearpage
 \thispagestyle{customstyle}
 
+\begin{figure*}[ht!]
+\centering
+\includegraphics[width=0.6\textwidth]{figures/TNR_3d_surface.png}
+\end{figure*}
+
+\textbf{Figure.} TNR (Specificity) as a function of TN and FP. TNR increases as more actual negatives are correctly identified.
+
 \orangebox{%
 Did you know that...}
 {
@@ -512,6 +519,13 @@ \subsection{Balanced Accuracy}
 \clearpage
 \thispagestyle{customstyle}
 
+\begin{figure*}[ht!]
+\centering
+\includegraphics[width=\textwidth]{figures/Balanced_Accuracy_comparison.png}
+\end{figure*}
+
+\textbf{Figure.} On imbalanced data (left), a naive ``always negative'' classifier gets 95\% accuracy but only 50\% balanced accuracy, exposing its failure. On balanced data (right), both metrics agree.
+
 \orangebox{Did you know that...}
 {Balanced Accuracy is mathematically equivalent to the macro-averaged recall. In scikit-learn, you can verify this:
 \texttt{balanced\_accuracy\_score(y, pred)} always equals \texttt{recall\_score(y, pred, average='macro')}.}
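Both the figure's scenario and the orangebox's equivalence claim can be checked in a few lines with scikit-learn:

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, recall_score

# "Always negative" classifier on imbalanced data: 19 negatives, 1 positive
y = np.array([0] * 19 + [1])
pred = np.zeros(20, dtype=int)

print(accuracy_score(y, pred))                 # 0.95, looks great
print(balanced_accuracy_score(y, pred))        # 0.5, exposes the failure
print(recall_score(y, pred, average="macro"))  # identical to balanced accuracy
```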
@@ -564,6 +578,13 @@ \subsection{Precision}
 \clearpage
 \thispagestyle{customstyle}
 
+\begin{figure*}[ht!]
+\centering
+\includegraphics[width=0.6\textwidth]{figures/Precision_3d_surface.png}
+\end{figure*}
+
+\textbf{Figure.} Precision as a function of TP and FP. Precision is highest when FP is low relative to TP.
+
 \orangebox{%
 Did you know that...}
 {
@@ -695,6 +716,13 @@ \subsection{F-beta}
 \clearpage
 \thispagestyle{customstyle}
 
+\begin{figure*}[ht!]
+\centering
+\includegraphics[width=0.6\textwidth]{figures/F_beta_curves.png}
+\end{figure*}
+
+\textbf{Figure.} F-beta vs Recall at a fixed Precision of 0.8. With $\beta=0.5$ (favors precision), the curve is highest. With $\beta=2$ (favors recall), the score increases more steeply with recall.
+
 \orangebox{Did you know that...}
 {Common choices for $\beta$ are: $\beta = 2$ (F2-score), which weights recall twice as much as precision, useful in medical screening where missing a disease
 is worse than a false alarm; and $\beta = 0.5$ (F0.5-score), which weights precision twice as much, useful in search engines where irrelevant results are costly.}
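The effect of $\beta$ is easy to see on a fixed confusion matrix; a toy example with precision 0.8 and recall 1.0:

```python
from sklearn.metrics import fbeta_score

# Toy example with Precision = 0.8 and Recall = 1.0
y_true = [1, 1, 1, 1, 0]
y_pred = [1, 1, 1, 1, 1]

print(fbeta_score(y_true, y_pred, beta=0.5))  # ~0.83, punishes the low precision
print(fbeta_score(y_true, y_pred, beta=1))    # ~0.89, plain F1
print(fbeta_score(y_true, y_pred, beta=2))    # ~0.95, rewards the high recall
```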
@@ -752,6 +780,13 @@ \subsection{Area Under the Receiver Operating Characteristic Curve}
 \item ROC AUC doesn't inform about precision and negative predictive value.
 }
 
+\begin{figure*}[ht!]
+\centering
+\includegraphics[width=0.6\textwidth]{figures/ROC_AUC_curves.png}
+\end{figure*}
+
+\textbf{Figure.} ROC curves for a good model (AUC $\approx$ 0.95, blue) and a weak model (AUC $\approx$ 0.62, red). The dashed diagonal represents random guessing (AUC = 0.50). The area under each curve is the AUC.
+
 \orangebox{Did you know that...}
 {The ROC curve was developed during World War II for analyzing radar signals. Radar operators needed to distinguish between enemy aircraft and noise,
 leading to the development of signal detection theory and the receiver operating characteristic.}
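ROC AUC has a ranking interpretation worth remembering: it equals the probability that a randomly chosen positive is scored above a randomly chosen negative. A tiny worked example:

```python
from sklearn.metrics import roc_auc_score

# 3 of the 4 (negative, positive) pairs are ranked correctly -> AUC = 3/4
y_true  = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

print(roc_auc_score(y_true, y_score))  # 0.75
```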
@@ -806,6 +841,13 @@ \subsection{Area Under the Precision-Recall Curve}
 \clearpage
 \thispagestyle{customstyle}
 
+\begin{figure*}[ht!]
+\centering
+\includegraphics[width=0.6\textwidth]{figures/PR_AUC_curves.png}
+\end{figure*}
+
+\textbf{Figure.} PR curves for a good model (PR AUC $\approx$ 0.85) and a weak model (PR AUC $\approx$ 0.30). A good model maintains high precision even at high recall.
+
 \orangebox{Did you know that...}
 {Unlike ROC AUC where a random classifier always scores 0.5, the baseline for PR AUC depends on the class distribution. For a dataset with 10\% positive class,
 a random classifier's PR AUC baseline is approximately 0.1, not 0.5. This makes PR AUC particularly sensitive to class imbalance.}
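The prevalence baseline from the orangebox can be checked with an uninformative classifier that gives every example the same score:

```python
import numpy as np
from sklearn.metrics import average_precision_score

# 10% positive class scored by an uninformative (constant-score) classifier
y = np.array([1] + [0] * 9)
scores = np.full(10, 0.5)

# The PR AUC baseline equals the positive prevalence (0.1 here), not 0.5
print(average_precision_score(y, scores))
```

Here \texttt{average\_precision\_score} is used as a step-wise estimate of PR AUC, which is the usual practice in scikit-learn.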
@@ -989,6 +1031,13 @@ \subsection{Jaccard Index}
 \clearpage
 \thispagestyle{customstyle}
 
+\begin{figure*}[ht!]
+\centering
+\includegraphics[width=\textwidth]{figures/Jaccard_overlap.png}
+\end{figure*}
+
+\textbf{Figure.} The Jaccard Index measures the overlap between two sets. High overlap (left) yields a high score. Low overlap (right) yields a low score.
+
 \orangebox{Did you know that...}
 {The Jaccard Index was introduced by the Swiss botanist Paul Jaccard in 1901 to compare the similarity of plant species across different regions.
 It is also known as Intersection over Union (IoU) in computer vision, where it is the standard metric for object detection and image segmentation tasks.}
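For binary labels the set-overlap picture reduces to counts: $J = TP / (TP + FP + FN)$. A minimal check:

```python
from sklearn.metrics import jaccard_score

y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]

# |intersection| / |union| = TP / (TP + FP + FN) = 2 / 4
print(jaccard_score(y_true, y_pred))  # 0.5
```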
@@ -1042,6 +1091,13 @@ \subsection{D-squared Log Loss Score}
 \clearpage
 \thispagestyle{customstyle}
 
+\begin{figure*}[ht!]
+\centering
+\includegraphics[width=0.6\textwidth]{figures/D2_Log_Loss_curve.png}
+\end{figure*}
+
+\textbf{Figure.} D-squared Log Loss Score vs model log loss. $D^2 = 1$ at zero loss (perfect), $D^2 = 0$ at the null model's log loss, and negative for worse-than-baseline models.
+
 \orangebox{Did you know that...}
 {The D-squared framework generalizes R-squared to any deviance function, not just squared error. In scikit-learn, \texttt{d2\_log\_loss\_score} uses log loss as the deviance,
 but you can also compute D-squared with other losses like absolute error or Poisson deviance.}
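The definition is short enough to compute by hand from \texttt{log\_loss}, which also makes the three regimes in the figure concrete (toy data; the helper name is ours, not scikit-learn's):

```python
import numpy as np
from sklearn.metrics import log_loss

def d2_log_loss(y_true, y_prob):
    """D^2 = 1 - log_loss(model) / log_loss(null model predicting the prevalence)."""
    null = np.full(len(y_true), np.mean(y_true))
    return 1.0 - log_loss(y_true, y_prob) / log_loss(y_true, null)

y = np.array([0, 0, 0, 1, 1])

print(d2_log_loss(y, np.full(5, 0.4)))                      # 0.0, exactly the null model
print(d2_log_loss(y, np.array([0.1, 0.1, 0.1, 0.9, 0.9])))  # ~0.84, well above baseline
print(d2_log_loss(y, np.array([0.6, 0.6, 0.6, 0.4, 0.4])))  # negative, worse than baseline
```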
@@ -1096,6 +1152,13 @@ \subsection{P4-metric}
 \clearpage
 \thispagestyle{customstyle}
 
+\begin{figure*}[ht!]
+\centering
+\includegraphics[width=0.6\textwidth]{figures/P4_vs_F1.png}
+\end{figure*}
+
+\textbf{Figure.} P4 vs F1 across scenarios. When a model ignores true negatives, F1 stays high but P4 drops significantly, revealing the imbalance that F1 misses.
+
 \orangebox{Did you know that...}
 {The P4-metric was introduced by Tharwat (2020) as a response to F1-score's blindness to true negatives. The name P4 comes from the fact that it
 considers all four probabilities: Precision, Recall (or Sensitivity), Specificity, and Negative Predictive Value.}
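The figure's contrast can be reproduced directly. P4 is the harmonic mean of the four probabilities; the product form below (algebraically equivalent) avoids division by zero when one of them is 0:

```python
def p4(tp, tn, fp, fn):
    """Product form of the harmonic mean of precision, recall, specificity, and NPV."""
    return 4 * tp * tn / (4 * tp * tn + (tp + tn) * (fp + fn))

def f1(tp, fp, fn):
    return 2 * tp / (2 * tp + fp + fn)

# Balanced confusion matrix: the two metrics agree
print(f1(40, 10, 10), p4(40, 40, 10, 10))  # 0.8 0.8

# A model that produces no true negatives: F1 stays high, P4 collapses
print(f1(90, 10, 10), p4(90, 0, 10, 10))   # 0.9 0.0
```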
@@ -1147,6 +1210,13 @@ \subsection{Cohen's Kappa}
 \clearpage
 \thispagestyle{customstyle}
 
+\begin{figure*}[ht!]
+\centering
+\includegraphics[width=0.6\textwidth]{figures/Cohens_Kappa_levels.png}
+\end{figure*}
+
+\textbf{Figure.} Interpretation scale for Cohen's Kappa, from poor agreement (negative) to almost perfect agreement (0.8--1.0).
+
 \orangebox{Did you know that...}
 {Cohen's Kappa was introduced by Jacob Cohen in 1960 in his paper \textit{A Coefficient of Agreement for Nominal Scales}. Despite its popularity, Cohen himself
 acknowledged its limitations and later introduced weighted kappa (1968) to handle ordinal categories where some disagreements are worse than others.}
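Kappa's defining feature, correcting for chance agreement, shows up clearly on a toy example where raw agreement is 50\% but entirely explained by chance:

```python
from sklearn.metrics import cohen_kappa_score

# Two raters with 50% raw agreement that is entirely explained by chance
rater1 = [0, 0, 1, 1, 0, 0, 1, 1]
rater2 = [0, 1, 0, 1, 0, 1, 0, 1]

print(cohen_kappa_score(rater1, rater2))  # 0.0, chance-level agreement
print(cohen_kappa_score(rater1, rater1))  # 1.0, perfect agreement
```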
@@ -1264,6 +1334,11 @@ \subsection{Expected Cost}
 \clearpage
 \thispagestyle{customstyle}
 
+\begin{figure*}[ht!]
+\centering
+\includegraphics[width=0.45\textwidth]{figures/EC_cost_matrix.png}
+\end{figure*}
+
 Consider a loan approval scenario with two true classes: $H_1$ (Creditworthy) and $H_2$ (Not Creditworthy), and two decisions: $D_1$ (Approve Loan) and $D_2$ (Reject Loan).
 
 Cost Matrix: Here we have defined a cost matrix $C$, where we want to heavily penalize approving loans for non-creditworthy customers (false positives).
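The expected cost is just the cost matrix weighted by the joint probabilities of each (class, decision) cell. A sketch with invented numbers (the specific costs and probabilities below are ours, not the book's):

```python
import numpy as np

# Invented numbers for the loan example: rows are true classes (H1 creditworthy,
# H2 not creditworthy), columns are decisions (D1 approve, D2 reject)
C = np.array([[0.0,  1.0],   # rejecting a good customer costs 1
              [10.0, 0.0]])  # approving a bad customer costs 10

# Joint probabilities P(H_i, D_j) from a hypothetical classifier
P = np.array([[0.60, 0.10],
              [0.05, 0.25]])

expected_cost = float(np.sum(P * C))
print(expected_cost)  # 0.10 * 1 + 0.05 * 10 = 0.6
```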
