Commit c3e371b

Update notebooks
1 parent 157ba58 commit c3e371b

1 file changed: +35 -2 lines changed

notebooks/metrics_classification.ipynb

Lines changed: 35 additions & 2 deletions
@@ -502,6 +502,13 @@
 "    classifier, data_test, target_test, pos_label='donated',\n",
 "    marker=\"+\"\n",
 ")\n",
+"disp = PrecisionRecallDisplay.from_estimator(\n",
+"    dummy_classifier, data_test, target_test, pos_label='donated',\n",
+"    color=\"tab:orange\", linestyle=\"--\", ax=disp.ax_)\n",
+"plt.xlabel(\"Recall (also known as TPR or sensitivity)\")\n",
+"plt.ylabel(\"Precision (also known as PPV)\")\n",
+"plt.xlim(0, 1)\n",
+"plt.ylim(0, 1)\n",
 "plt.legend(bbox_to_anchor=(1.05, 0.8), loc=\"upper left\")\n",
 "_ = disp.ax_.set_title(\"Precision-recall curve\")"
 ]
@@ -528,14 +535,36 @@
 "and is named average precision (AP). With an ideal classifier, the average\n",
 "precision would be 1.\n",
 "\n",
+"Notice that the AP of a `DummyClassifier`, used as a baseline to define the\n",
+"chance level, coincides with the number of samples in the positive class\n",
+"divided by the total number of samples (this number is called the prevalence\n",
+"of the positive class)."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"prevalence = (\n",
+"    target_test.value_counts()[\"donated\"] / target_test.value_counts().sum()\n",
+")\n",
+"print(f\"Prevalence of the class 'donated': {prevalence:.2f}\")"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
 "The precision and recall metrics focus on the positive class. However, one\n",
 "might be interested in the compromise between accurately discriminating the\n",
 "positive class and accurately discriminating the negative class. The\n",
 "statistics used for this are sensitivity and specificity. Sensitivity is just\n",
 "another name for recall. Specificity, in contrast, measures the proportion of\n",
 "correctly classified samples in the negative class, defined as TN / (TN +\n",
 "FP). Similar to the precision-recall curve, sensitivity and specificity are\n",
-"generally plotted as a curve called the receiver operating characteristic\n",
+"generally plotted as a curve called the Receiver Operating Characteristic\n",
 "(ROC) curve. Below is such a curve:"
 ]
 },
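Note on the hunk above: the claim that the baseline's AP coincides with the prevalence can be checked by hand. The sketch below is not code from the notebook. It assumes the baseline assigns the same score to every sample (as a constant-prediction dummy classifier would), uses hypothetical toy labels, and implements the usual step-wise definition AP = sum over thresholds of (R_n - R_{n-1}) * P_n in plain Python; the helper name `average_precision` is made up for illustration.

```python
def average_precision(y_true, scores):
    """Step-wise AP: sum of (R_n - R_{n-1}) * P_n over distinct score thresholds.

    y_true holds 0/1 labels (1 = positive class); scores holds one score per sample.
    """
    n_pos = sum(y_true)
    # Visit samples from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ap, tp, prev_recall = 0.0, 0, 0.0
    i = 0
    while i < len(order):
        threshold = scores[order[i]]
        # Consume every sample tied at this threshold before evaluating.
        while i < len(order) and scores[order[i]] == threshold:
            tp += y_true[order[i]]
            i += 1
        precision = tp / i      # i samples are predicted positive so far
        recall = tp / n_pos
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Hypothetical toy labels: 3 positives out of 10 samples.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
prevalence = sum(y_true) / len(y_true)

# Assumed baseline: identical score for every sample, so a single threshold.
ap_dummy = average_precision(y_true, [0.3] * len(y_true))

# An ideal ranking scores every positive above every negative.
ap_ideal = average_precision(y_true, [float(label) for label in y_true])

print(f"prevalence: {prevalence:.2f}")
print(f"AP of the constant baseline: {ap_dummy:.2f}")
print(f"AP of the ideal ranking: {ap_ideal:.2f}")
```

With a constant score there is only one threshold, where recall is 1 and precision equals the fraction of positives, so the sum collapses to the prevalence.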
@@ -553,8 +582,12 @@
 "disp = RocCurveDisplay.from_estimator(\n",
 "    dummy_classifier, data_test, target_test, pos_label='donated',\n",
 "    color=\"tab:orange\", linestyle=\"--\", ax=disp.ax_)\n",
+"plt.xlabel(\"False positive rate\")\n",
+"plt.ylabel(\"True positive rate\\n(also known as sensitivity or recall)\")\n",
+"plt.xlim(0, 1)\n",
+"plt.ylim(0, 1)\n",
 "plt.legend(bbox_to_anchor=(1.05, 0.8), loc=\"upper left\")\n",
-"_ = disp.ax_.set_title(\"ROC AUC curve\")"
+"_ = disp.ax_.set_title(\"Receiver Operating Characteristic curve\")"
 ]
 },
 {
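Note on the axis labels added above: the ROC x-axis (false positive rate) is one minus the specificity, and the y-axis (true positive rate) is the sensitivity. The following is a minimal sketch with hypothetical toy labels and predictions, not data from the notebook; the helper name `sensitivity_specificity` is made up for illustration.

```python
def sensitivity_specificity(y_true, y_pred, pos_label):
    """Return (sensitivity, specificity) computed from confusion-matrix counts."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == pos_label:
            if pred == pos_label:
                tp += 1  # positive sample correctly detected
            else:
                fn += 1  # positive sample missed
        else:
            if pred == pos_label:
                fp += 1  # negative sample wrongly flagged
            else:
                tn += 1  # negative sample correctly rejected
    sensitivity = tp / (tp + fn)  # TPR, also called recall
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical toy labels and predictions.
y_true = ["donated"] * 3 + ["not donated"] * 5
y_pred = ["donated", "donated", "not donated",
          "donated", "not donated", "not donated", "not donated", "not donated"]

sensitivity, specificity = sensitivity_specificity(y_true, y_pred, pos_label="donated")
false_positive_rate = 1 - specificity  # x-axis of the ROC curve

print(f"sensitivity (TPR): {sensitivity:.2f}")
print(f"specificity: {specificity:.2f}")
print(f"false positive rate: {false_positive_rate:.2f}")
```

Each decision threshold of a classifier yields one such (false positive rate, true positive rate) pair, and the ROC curve traces those pairs as the threshold varies.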
