|
502 | 502 | " classifier, data_test, target_test, pos_label='donated',\n", |
503 | 503 | " marker=\"+\"\n", |
504 | 504 | ")\n", |
| 505 | + "disp = PrecisionRecallDisplay.from_estimator(\n", |
| 506 | + " dummy_classifier, data_test, target_test, pos_label='donated',\n", |
| 507 | + " color=\"tab:orange\", linestyle=\"--\", ax=disp.ax_)\n", |
| 508 | + "plt.xlabel(\"Recall (also known as TPR or sensitivity)\")\n", |
| 509 | + "plt.ylabel(\"Precision (also known as PPV)\")\n", |
| 510 | + "plt.xlim(0, 1)\n", |
| 511 | + "plt.ylim(0, 1)\n", |
505 | 512 | "plt.legend(bbox_to_anchor=(1.05, 0.8), loc=\"upper left\")\n", |
506 | 513 | "_ = disp.ax_.set_title(\"Precision-recall curve\")" |
507 | 514 | ] |
|
528 | 535 | "and is named average precision (AP). With an ideal classifier, the average\n", |
529 | 536 | "precision would be 1.\n", |
530 | 537 | "\n", |
| 538 | + "Notice that the AP of a `DummyClassifier`, used as a baseline to define the\n", |
| 539 | + "chance level, coincides with the number of samples in the positive class\n", |
| 540 | + "divided by the total number of samples (this number is called the prevalence\n", |
| 541 | + "of the positive class)." |
| 542 | + ] |
| 543 | + }, |
| 544 | + { |
| 545 | + "cell_type": "code", |
| 546 | + "execution_count": null, |
| 547 | + "metadata": {}, |
| 548 | + "outputs": [], |
| 549 | + "source": [ |
| 550 | + "prevalence = (\n", |
| 551 | + "    target_test.value_counts()['donated'] / target_test.value_counts().sum()\n", |
| 552 | + ")\n", |
| 553 | + "print(f\"Prevalence of the class 'donated': {prevalence:.2f}\")" |
| 554 | + ] |
| 555 | + }, |
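The claim above (chance-level AP equals the prevalence of the positive class) can be checked directly. A minimal sketch with synthetic 0/1 labels, since the notebook's blood-transfusion split is not reproduced here; the class names and the ~24% positive rate are assumptions for illustration:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 2))            # uninformative features
y = rng.binomial(1, 0.24, size=1_000)      # roughly 24% positive class

# A "prior" dummy gives every sample the same constant score, so its
# precision-recall curve collapses to a single point at precision = prevalence.
dummy = DummyClassifier(strategy="prior").fit(X, y)
chance_scores = dummy.predict_proba(X)[:, 1]

prevalence = y.mean()
ap = average_precision_score(y, chance_scores)
print(f"prevalence={prevalence:.3f}  AP={ap:.3f}")  # the two values coincide
```

With tied scores there is only one threshold, so the AP sum reduces to `(1 - 0) * prevalence`, which is why the match is exact rather than approximate.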
| 556 | + { |
| 557 | + "cell_type": "markdown", |
| 558 | + "metadata": {}, |
| 559 | + "source": [ |
531 | 560 | "The precision and recall metrics focus on the positive class; however, one\n", |
532 | 561 | "might be interested in the compromise between accurately discriminating the\n", |
533 | 562 | "positive class and accurately discriminating the negative class. The\n", |
534 | 563 | "statistics used for this are sensitivity and specificity. Sensitivity is just\n", |
535 | 564 | "another name for recall. However, specificity measures the proportion of\n", |
536 | 565 | "correctly classified samples in the negative class, defined as TN / (TN +\n", |
537 | 566 | "FP). Similar to the precision-recall curve, sensitivity and specificity are\n", |
538 | | - "generally plotted as a curve called the receiver operating characteristic\n", |
| 567 | + "generally plotted as a curve called the Receiver Operating Characteristic\n", |
539 | 568 | "(ROC) curve. Below is such a curve:" |
540 | 569 | ] |
541 | 570 | }, |
|
553 | 582 | "disp = RocCurveDisplay.from_estimator(\n", |
554 | 583 | " dummy_classifier, data_test, target_test, pos_label='donated',\n", |
555 | 584 | " color=\"tab:orange\", linestyle=\"--\", ax=disp.ax_)\n", |
| 585 | + "plt.xlabel(\"False positive rate\")\n", |
| 586 | + "plt.ylabel(\"True positive rate\\n(also known as sensitivity or recall)\")\n", |
| 587 | + "plt.xlim(0, 1)\n", |
| 588 | + "plt.ylim(0, 1)\n", |
556 | 589 | "plt.legend(bbox_to_anchor=(1.05, 0.8), loc=\"upper left\")\n", |
557 | | - "_ = disp.ax_.set_title(\"ROC AUC curve\")" |
| 590 | + "_ = disp.ax_.set_title(\"Receiver Operating Characteristic curve\")" |
558 | 591 | ] |
559 | 592 | }, |
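The two statistics behind the ROC curve can also be read off a confusion matrix. A small sketch with hand-made predictions (the notebook's fitted classifier and test split are not reused here; the labels below are illustrative 0/1 values):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 0, 1, 0, 0, 1, 0, 0, 0])

# For binary labels [0, 1], ravel() returns the counts in the order
# tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

sensitivity = tp / (tp + fn)   # recall, i.e. the true positive rate
specificity = tn / (tn + fp)   # the true negative rate
print(f"sensitivity={sensitivity:.2f}  specificity={specificity:.2f}")
```

The ROC curve plots the true positive rate (sensitivity) against the false positive rate, which is `1 - specificity`, as the decision threshold varies.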
560 | 593 | { |
|