|
502 | 502 | " classifier, data_test, target_test, pos_label='donated',\n", |
503 | 503 | " marker=\"+\"\n", |
504 | 504 | ")\n", |
| 505 | + "disp = PrecisionRecallDisplay.from_estimator(\n", |
| 506 | + " dummy_classifier, data_test, target_test, pos_label='donated',\n", |
| 507 | + " color=\"tab:orange\", linestyle=\"--\", ax=disp.ax_)\n", |
| 508 | + "plt.xlabel(\"Recall (also known as TPR or sensitivity)\")\n", |
| 509 | + "plt.ylabel(\"Precision (also known as PPV)\")\n", |
| 510 | + "plt.xlim(0, 1)\n", |
| 511 | + "plt.ylim(0, 1)\n", |
505 | 512 | "plt.legend(bbox_to_anchor=(1.05, 0.8), loc=\"upper left\")\n", |
506 | 513 | "_ = disp.ax_.set_title(\"Precision-recall curve\")" |
507 | 514 | ] |
|
528 | 535 | "and is named average precision (AP). With an ideal classifier, the average\n", |
529 | 536 | "precision would be 1.\n", |
530 | 537 | "\n", |
| 538 | + "Notice that the AP of a `DummyClassifier`, used as a baseline to define the\n", |
| 539 | + "chance level, coincides with the number of samples in the positive class\n", |
| 540 | + "divided by the total number of samples (this number is called the prevalence\n", |
| 541 | + "of the positive class)." |
| 542 | + ] |
| 543 | + }, |
| 544 | + { |
| 545 | + "cell_type": "code", |
| 546 | + "execution_count": null, |
| 547 | + "metadata": {}, |
| 548 | + "outputs": [], |
| 549 | + "source": [ |
| 550 | + "prevalence = (\n", |
| 551 | + "    target_test.value_counts()['donated'] / target_test.value_counts().sum()\n", |
| 552 | + ")\n", |
| 553 | + "print(f\"Prevalence of the class 'donated': {prevalence:.2f}\")" |
| 554 | + ] |
| 555 | + }, |
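The claim above (chance-level AP equals the prevalence of the positive class) can be checked directly. A minimal sketch with synthetic 0/1 labels, since the notebook's blood-transfusion split is not reproduced here; the class names and the ~24% positive rate are assumptions for illustration:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 2))            # uninformative features
y = rng.binomial(1, 0.24, size=1_000)      # roughly 24% positive class

# A "prior" dummy gives every sample the same constant score, so its
# precision-recall curve collapses to a single point at precision = prevalence.
dummy = DummyClassifier(strategy="prior").fit(X, y)
chance_scores = dummy.predict_proba(X)[:, 1]

prevalence = y.mean()
ap = average_precision_score(y, chance_scores)
print(f"prevalence={prevalence:.3f}  AP={ap:.3f}")  # the two values coincide
```

With tied scores there is only one threshold, so the AP sum reduces to `(1 - 0) * prevalence`, which is why the match is exact rather than approximate.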
| 556 | + { |
| 557 | + "cell_type": "markdown", |
| 558 | + "metadata": {}, |
| 559 | + "source": [ |
531 | 560 | "The precision and recall metrics focus on the positive class; however, one\n", |
532 | 561 | "might be interested in the compromise between accurately discriminating the\n", |
533 | 562 | "positive class and accurately discriminating the negative class. The\n", |
534 | 563 | "statistics used for this are sensitivity and specificity. Sensitivity is just\n", |
535 | 564 | "another name for recall. However, specificity measures the proportion of\n", |
536 | 565 | "correctly classified samples in the negative class, defined as TN / (TN +\n", |
537 | 566 | "FP). Similar to the precision-recall curve, sensitivity and specificity are\n", |
538 | | - "generally plotted as a curve called the receiver operating characteristic\n", |
| 567 | + "generally plotted as a curve called the Receiver Operating Characteristic\n", |
539 | 568 | "(ROC) curve. Below is such a curve:" |
540 | 569 | ] |
541 | 570 | }, |
|
553 | 582 | "disp = RocCurveDisplay.from_estimator(\n", |
554 | 583 | " dummy_classifier, data_test, target_test, pos_label='donated',\n", |
555 | 584 | " color=\"tab:orange\", linestyle=\"--\", ax=disp.ax_)\n", |
| 585 | + "plt.xlabel(\"False positive rate\")\n", |
| 586 | + "plt.ylabel(\"True positive rate\\n(also known as sensitivity or recall)\")\n", |
| 587 | + "plt.xlim(0, 1)\n", |
| 588 | + "plt.ylim(0, 1)\n", |
556 | 589 | "plt.legend(bbox_to_anchor=(1.05, 0.8), loc=\"upper left\")\n", |
557 | | - "_ = disp.ax_.set_title(\"ROC AUC curve\")" |
| 590 | + "_ = disp.ax_.set_title(\"Receiver Operating Characteristic curve\")" |
558 | 591 | ] |
559 | 592 | }, |
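The two statistics behind the ROC curve can also be read off a confusion matrix. A small sketch with hand-made predictions (the notebook's fitted classifier and test split are not reused here; the labels below are illustrative 0/1 values):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 0, 1, 0, 0, 1, 0, 0, 0])

# For binary labels [0, 1], ravel() returns the counts in the order
# tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

sensitivity = tp / (tp + fn)   # recall, i.e. the true positive rate
specificity = tn / (tn + fp)   # the true negative rate
print(f"sensitivity={sensitivity:.2f}  specificity={specificity:.2f}")
```

The ROC curve plots the true positive rate (sensitivity) against the false positive rate, which is `1 - specificity`, as the decision threshold varies.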
560 | 593 | { |
|