
Commit 157ba58

ArturoAmorQ and ogrisel authored
Improve PR and ROC curves plots in metrics classification notebook (#660)
Co-authored-by: Olivier Grisel <[email protected]>
1 parent 3cfb6fc commit 157ba58

File tree

1 file changed: +25 -2 lines changed

python_scripts/metrics_classification.py

Lines changed: 25 additions & 2 deletions
@@ -304,6 +304,13 @@
     classifier, data_test, target_test, pos_label='donated',
     marker="+"
 )
+disp = PrecisionRecallDisplay.from_estimator(
+    dummy_classifier, data_test, target_test, pos_label='donated',
+    color="tab:orange", linestyle="--", ax=disp.ax_)
+plt.xlabel("Recall (also known as TPR or sensitivity)")
+plt.ylabel("Precision (also known as PPV)")
+plt.xlim(0, 1)
+plt.ylim(0, 1)
 plt.legend(bbox_to_anchor=(1.05, 0.8), loc="upper left")
 _ = disp.ax_.set_title("Precision-recall curve")
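Context for readers of this hunk: the added lines overlay the `DummyClassifier`'s precision-recall curve on the axes of the fitted model's curve, making the chance level visible, and pin down the axis labels and limits. Below is a minimal, self-contained sketch of that overlay pattern; the synthetic dataset, `LogisticRegression` model, and variable names are stand-ins for the notebook's blood transfusion data and its pre-fitted `classifier` and `dummy_classifier`, which are defined earlier in the file.

```python
# Sketch of the overlay pattern used in the hunk above (assumed stand-ins:
# synthetic data instead of the blood transfusion dataset).
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import PrecisionRecallDisplay
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, weights=[0.76], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

classifier = LogisticRegression().fit(X_train, y_train)
dummy_classifier = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)

# Plot the model's PR curve, then draw the dummy baseline on the same axes.
disp = PrecisionRecallDisplay.from_estimator(
    classifier, X_test, y_test, marker="+"
)
PrecisionRecallDisplay.from_estimator(
    dummy_classifier, X_test, y_test,
    color="tab:orange", linestyle="--", ax=disp.ax_,
)
disp.ax_.set_xlabel("Recall (also known as TPR or sensitivity)")
disp.ax_.set_ylabel("Precision (also known as PPV)")
disp.ax_.set_xlim(0, 1)
disp.ax_.set_ylim(0, 1)
disp.ax_.legend(bbox_to_anchor=(1.05, 0.8), loc="upper left")
_ = disp.ax_.set_title("Precision-recall curve")
```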

@@ -325,14 +332,26 @@
 # and is named average precision (AP). With an ideal classifier, the average
 # precision would be 1.
 #
+# Notice that the AP of a `DummyClassifier`, used as baseline to define the
+# chance level, coincides with the number of samples in the positive class
+# divided by the total number of samples (this number is called the prevalence
+# of the positive class).
+
+# %%
+prevalence = (
+    target_test.value_counts()[1] / target_test.value_counts().sum()
+)
+print(f"Prevalence of the class 'donated': {prevalence:.2f}")
+
+# %% [markdown]
 # The precision and recall metric focuses on the positive class, however, one
 # might be interested in the compromise between accurately discriminating the
 # positive class and accurately discriminating the negative classes. The
 # statistics used for this are sensitivity and specificity. Sensitivity is just
 # another name for recall. However, specificity measures the proportion of
 # correctly classified samples in the negative class defined as: TN / (TN +
 # FP). Similar to the precision-recall curve, sensitivity and specificity are
-# generally plotted as a curve called the receiver operating characteristic
+# generally plotted as a curve called the Receiver Operating Characteristic
 # (ROC) curve. Below is such a curve:

 # %%
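The prevalence claim in this hunk can be sanity-checked numerically: `average_precision_score` returns exactly the positive-class prevalence for a constant scorer, which is how the `DummyClassifier` baseline behaves. A minimal sketch with made-up labels rather than the notebook's `target_test`:

```python
# Hedged check: AP of a constant scorer equals the positive-class prevalence.
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)
y_test = rng.binomial(n=1, p=0.24, size=400)   # ~24% positives, invented data
constant_scores = np.full(y_test.shape, 0.5)   # what a dummy model outputs

ap = average_precision_score(y_test, constant_scores)
print(f"AP of a constant scorer: {ap:.2f}")    # equals the prevalence
print(f"Prevalence:              {y_test.mean():.2f}")
```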
@@ -344,8 +363,12 @@
 disp = RocCurveDisplay.from_estimator(
     dummy_classifier, data_test, target_test, pos_label='donated',
     color="tab:orange", linestyle="--", ax=disp.ax_)
+plt.xlabel("False positive rate")
+plt.ylabel("True positive rate\n(also known as sensitivity or recall)")
+plt.xlim(0, 1)
+plt.ylim(0, 1)
 plt.legend(bbox_to_anchor=(1.05, 0.8), loc="upper left")
-_ = disp.ax_.set_title("ROC AUC curve")
+_ = disp.ax_.set_title("Receiver Operating Characteristic curve")

 # %% [markdown]
 # This curve was built using the same principle as the precision-recall curve:
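To complement the renamed title: the two quantities plotted on a ROC curve can be read off a confusion matrix at a fixed threshold, and a constant scorer (the behavior of the dummy baseline) has an area under the ROC curve of exactly 0.5. An illustrative sketch with invented labels, not the notebook's 'donated' data:

```python
# Sensitivity (TPR) and specificity (TNR) from a confusion matrix, plus the
# chance-level ROC AUC of a constant scorer. All arrays here are invented.
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)   # recall / TPR, the y-axis of the ROC curve
specificity = tn / (tn + fp)   # TN / (TN + FP); FPR = 1 - specificity
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")

# A scorer that outputs the same value for every sample has AUC 0.5:
print(roc_auc_score(y_true, [0.5] * len(y_true)))
```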
