|
304 | 304 | classifier, data_test, target_test, pos_label='donated', |
305 | 305 | marker="+" |
306 | 306 | ) |
| 307 | +disp = PrecisionRecallDisplay.from_estimator( |
| 308 | + dummy_classifier, data_test, target_test, pos_label='donated', |
| 309 | + color="tab:orange", linestyle="--", ax=disp.ax_) |
| 310 | +plt.xlabel("Recall (also known as TPR or sensitivity)") |
| 311 | +plt.ylabel("Precision (also known as PPV)") |
| 312 | +plt.xlim(0, 1) |
| 313 | +plt.ylim(0, 1) |
307 | 314 | plt.legend(bbox_to_anchor=(1.05, 0.8), loc="upper left") |
308 | 315 | _ = disp.ax_.set_title("Precision-recall curve") |
309 | 316 |
|
|
325 | 332 | # and is named average precision (AP). With an ideal classifier, the average |
326 | 333 | # precision would be 1. |
327 | 334 | # |
 | 335 | +# Notice that the AP of a `DummyClassifier`, used as a baseline to define the
| 336 | +# chance level, coincides with the number of samples in the positive class |
| 337 | +# divided by the total number of samples (this number is called the prevalence |
| 338 | +# of the positive class). |
| 339 | + |
| 340 | +# %% |
| 341 | +prevalence = ( |
 | 342 | +    target_test.value_counts()["donated"] / target_test.value_counts().sum()
| 343 | +) |
| 344 | +print(f"Prevalence of the class 'donated': {prevalence:.2f}") |
| 345 | + |
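# %% [markdown]
# As a minimal sketch of how to check this numerically (assuming
# `average_precision_score` is available from `sklearn.metrics` and that
# `dummy_classifier` was fitted earlier in this notebook), the AP of the dummy
# baseline should roughly coincide with the prevalence computed above.

# %%
from sklearn.metrics import average_precision_score

# locate the predict_proba column corresponding to the positive class "donated"
pos_idx = list(dummy_classifier.classes_).index("donated")
ap_dummy = average_precision_score(
    target_test,
    dummy_classifier.predict_proba(data_test)[:, pos_idx],
    pos_label="donated",
)
print(f"AP of the dummy classifier: {ap_dummy:.2f}")
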
| 346 | +# %% [markdown] |
328 | 347 | # Precision and recall focus on the positive class. However, one might also
329 | 348 | # be interested in the compromise between accurately discriminating the
330 | 349 | # positive class and accurately discriminating the negative class. The
331 | 350 | # statistics used for this are sensitivity and specificity. Sensitivity is
332 | 351 | # just another name for recall, while specificity measures the proportion of
333 | 352 | # correctly classified samples in the negative class, defined as TN / (TN +
334 | 353 | # FP). Similar to the precision-recall curve, sensitivity and specificity are
335 | | -# generally plotted as a curve called the receiver operating characteristic |
| 354 | +# generally plotted as a curve called the Receiver Operating Characteristic |
336 | 355 | # (ROC) curve. Below is such a curve: |
337 | 356 |
|
338 | 357 | # %% |
|
344 | 363 | disp = RocCurveDisplay.from_estimator( |
345 | 364 | dummy_classifier, data_test, target_test, pos_label='donated', |
346 | 365 | color="tab:orange", linestyle="--", ax=disp.ax_) |
| 366 | +plt.xlabel("False positive rate") |
| 367 | +plt.ylabel("True positive rate\n(also known as sensitivity or recall)") |
| 368 | +plt.xlim(0, 1) |
| 369 | +plt.ylim(0, 1) |
347 | 370 | plt.legend(bbox_to_anchor=(1.05, 0.8), loc="upper left") |
348 | | -_ = disp.ax_.set_title("ROC AUC curve") |
| 371 | +_ = disp.ax_.set_title("Receiver Operating Characteristic curve") |
349 | 372 |
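# %% [markdown]
# As a side check (a sketch assuming `classifier` is the fitted model used
# above and that the negative class is labelled "not donated"), sensitivity
# and specificity can also be computed directly from the confusion matrix,
# taking "donated" as the positive class.

# %%
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(
    target_test,
    classifier.predict(data_test),
    labels=["not donated", "donated"],  # order: negative class, positive class
)
tn, fp, fn, tp = cm.ravel()
print(f"Sensitivity (recall): {tp / (tp + fn):.2f}")
print(f"Specificity: {tn / (tn + fp):.2f}")
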
|
350 | 373 | # %% [markdown] |
351 | 374 | # This curve was built using the same principle as the precision-recall curve: |
|