
Commit aebdb50 (0 parents)

deploy: 74f2e54

File tree

10 files changed (+206, -0 lines)

.nojekyll

Whitespace-only changes.

api.md

Lines changed: 32 additions & 0 deletions
# API Reference

## compute_operating_profile

```python
from opproplot import compute_operating_profile

profile = compute_operating_profile(y_true, y_score, bins=40, score_range=(0, 1))
```

- `y_true`: array-like of shape (n_samples,), binary labels.
- `y_score`: array-like of shape (n_samples,), predicted scores or probabilities.
- `bins`: number of score bins (default 40).
- `score_range`: tuple or None. If None, uses the min/max of the scores.
- `show_key`: display a combined legend for bars and lines (default True).
- `key_location`: `"inside"` (axis legend) or `"outside"` (figure-level, docked right).
- `show_grid`: draw a background grid on the metric axis (default False).
- `grid_kwargs`: dict passed to `ax_metric.grid`, e.g. `{"alpha": 0.2, "linestyle": "--"}`.

Returns an `OperatingProfile` dataclass with:

- `edges`, `mids`, `pos_hist`, `neg_hist`, `tpr`, `fpr`, `accuracy`.
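As an illustration of what these fields contain, here is a minimal numpy sketch that re-derives them from scratch. It is an independent reconstruction for explanation only, not the library's source; the function name `operating_profile_sketch` is made up for this example.

```python
import numpy as np

def operating_profile_sketch(y_true, y_score, bins=40, score_range=(0.0, 1.0)):
    """Illustrative re-derivation of the OperatingProfile fields (not the library source)."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    lo, hi = score_range if score_range is not None else (y_score.min(), y_score.max())
    edges = np.linspace(lo, hi, bins + 1)
    mids = 0.5 * (edges[:-1] + edges[1:])
    # Class-conditional histograms over the shared bin edges.
    pos_hist, _ = np.histogram(y_score[y_true == 1], bins=edges)
    neg_hist, _ = np.histogram(y_score[y_true == 0], bins=edges)
    # Rates from thresholding at each bin midpoint: predict positive when score >= t.
    tpr = np.array([(y_score[y_true == 1] >= t).mean() for t in mids])
    fpr = np.array([(y_score[y_true == 0] >= t).mean() for t in mids])
    n_pos, n_neg = (y_true == 1).sum(), (y_true == 0).sum()
    accuracy = (tpr * n_pos + (1 - fpr) * n_neg) / (n_pos + n_neg)
    return edges, mids, pos_hist, neg_hist, tpr, fpr, accuracy
```

Note how accuracy follows from TPR and FPR once the class counts are known, which is why the dataclass does not need to store confusion matrices per threshold.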
## operating_profile_plot

```python
from opproplot import operating_profile_plot

fig, ax_hist, ax_metric = operating_profile_plot(y_true, y_score, bins=30, show_accuracy=True)
```

- `show_accuracy`: include the dashed accuracy curve (default True).
- `ax`: optional Matplotlib axis to draw on; otherwise a new figure is created.

Returns `(fig, ax_hist, ax_metric)` for further styling or saving.

assets/opproplot_breast_cancer.png (82.2 KB)

assets/opproplot_example.png (107 KB)

assets/opproplot_hero.png (40.7 KB)

examples.md

Lines changed: 26 additions & 0 deletions
# Examples

Use these patterns to compare models and datasets.

## Breast cancer (scikit-learn)

- Load `sklearn.datasets.load_breast_cancer`.
- Train a logistic regression or gradient boosting model.
- Plot the operating profile on the test split to inspect separability.

## Fraud-like imbalance

- Simulate or load an imbalanced dataset.
- Compare a calibrated model with an overconfident one.
- Observe how class imbalance alters histogram heights and accuracy peaks.

## Good vs bad model

- Train two models on the same data.
- Plot both operating profiles side by side.
- Look for:
  - Separation of the score distributions.
  - Lower FPR at the same TPR.
  - Stability of accuracy across thresholds.

Swap in your own datasets; the plotting API stays the same.
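The good-vs-bad comparison can be simulated without training anything. The numpy-only sketch below uses made-up score distributions standing in for the two models, and checks the "lower FPR at the same TPR" criterion directly; `fpr_at_tpr` is a hypothetical helper, not part of the library.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
y = rng.integers(0, 2, size=n)

# Hypothetical "good" model: class-conditional scores are well separated.
good = np.clip(np.where(y == 1, rng.normal(0.75, 0.1, n), rng.normal(0.25, 0.1, n)), 0, 1)
# Hypothetical "bad" model: heavy overlap between the classes.
bad = np.clip(np.where(y == 1, rng.normal(0.55, 0.2, n), rng.normal(0.45, 0.2, n)), 0, 1)

def fpr_at_tpr(y_true, scores, target_tpr=0.9):
    """FPR at the loosest threshold that still achieves the target TPR."""
    pos = np.sort(scores[y_true == 1])[::-1]   # positive scores, highest first
    k = int(np.ceil(target_tpr * len(pos)))    # top-k positives must be recalled
    t = pos[k - 1]                             # threshold that captures them
    return float((scores[y_true == 0] >= t).mean())

print(fpr_at_tpr(y, good), fpr_at_tpr(y, bad))  # the good model pays far less FPR
```

The same contrast is what the side-by-side operating profile plots make visible at every threshold at once, rather than at a single target TPR.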

getting_started.md

Lines changed: 66 additions & 0 deletions
# Getting Started

This page shows how to generate an operating profile in a notebook and how to interpret it for common binary classifiers.

## Setup

```bash
pip install -e .
```

```python
import numpy as np
from opproplot import operating_profile_plot
```

## Basic example

```python
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=5000)
scores = rng.random(size=5000)

fig, ax_hist, ax_metric = operating_profile_plot(y_true, scores, bins=30)
```

- Left axis: stacked histogram of scores by class.
- Right axis: TPR, FPR, and accuracy evaluated with each bin midpoint as the threshold.
- Choose thresholds where the TPR/FPR trade-off makes sense for your application.

![Opproplot simulated example](assets/opproplot_example.png)

## Detailed example (scikit-learn)

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=0, stratify=data.target
)

clf = LogisticRegression(max_iter=500)
clf.fit(X_train, y_train)

y_score = clf.predict_proba(X_test)[:, 1]

fig, ax_hist, ax_metric = operating_profile_plot(y_test, y_score, bins=30)
ax_hist.set_title("Breast cancer classifier operating profile")
```

![Opproplot breast cancer example](assets/opproplot_breast_cancer.png)

The same pattern applies to other models:

- Random forest / gradient boosting: use `model.predict_proba(X)[:, 1]`.
- XGBoost / LightGBM: use `predict` outputs as scores.

## Interpreting the plot

- Separability: a wider gap between the class histograms indicates better discrimination.
- Threshold effects: steep TPR drops highlight sensitive regions.
- Accuracy peak: the dashed accuracy curve shows the maximizing threshold without trial and error.
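The accuracy-peak reading can be checked numerically. Here is a small numpy sketch, with simulated scores rather than library code, that scans bin-midpoint thresholds and reports the maximizer:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
y = rng.integers(0, 2, size=n)
# Simulated scores: positives skew high, negatives skew low.
scores = np.where(y == 1, rng.beta(4, 2, n), rng.beta(2, 4, n))

# Candidate thresholds at the bin midpoints, as in the plot.
edges = np.linspace(0, 1, 31)
mids = 0.5 * (edges[:-1] + edges[1:])
acc = np.array([((scores >= t).astype(int) == y).mean() for t in mids])

best_t = mids[acc.argmax()]
print(f"accuracy peaks at t={best_t:.3f} (acc={acc.max():.3f})")
```

This is exactly the point the dashed curve marks on the plot, so in practice you read it off rather than compute it.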
For deeper theory and metric formulas, see [Theory](theory.md).

index.md

Lines changed: 44 additions & 0 deletions
# Opproplot

A compact operating profile plot for binary classifiers: stacked score histograms by class plus TPR/FPR/accuracy curves at bin-midpoint thresholds. One view to understand every possible cutoff.

![Opproplot hero](assets/opproplot_hero.png)

## Why Opproplot

- See the score separation between classes directly.
- Trace how recall and false positives move as you slide the threshold.
- Spot the accuracy peak without losing visibility into the distribution.

## Install

```bash
pip install -e .
```

## Quickstart

```python
import numpy as np
from opproplot import operating_profile_plot

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=5000)
scores = rng.random(size=5000)

operating_profile_plot(y_true, scores, bins=30)
```

![Opproplot simulated example](assets/opproplot_example.png)

## Detailed example (scikit-learn)

![Opproplot breast cancer](assets/opproplot_breast_cancer.png)

## Learn more

- [Getting started](getting_started.md): notebook-friendly walkthroughs.
- [Theory](theory.md): decision rules, distributions, and threshold geometry.
- [Examples](examples.md): real datasets and comparisons.
- [API](api.md): function reference and parameters.
- [Roadmap](roadmap.md): upcoming features.

roadmap.md

Lines changed: 14 additions & 0 deletions
# Roadmap

Feature | Status
--- | ---
Base Opproplot (TPR/FPR/Accuracy) | ✅ in v0.1.0
MCC / Balanced Accuracy overlays | 🔜
Plotly interactive version | 🔜
Custom binning (score or segment axis) | 🔜
Multi-class (one-vs-rest + small multiples) | 🔜
Threshold selection heuristics (maximize a metric) | 🔜
Dash app for validation workflows | Future
Integration into sklearn-like pipelines | Future

Ideas and contributions are welcome; file issues or PRs to shape the next release.

theory.md

Lines changed: 24 additions & 0 deletions
# Theory: The Geometry of Thresholds

Opproplot treats thresholding as a geometric object over score space. For a scoring function f(x) and threshold t, the decision rule is

h_t(x) = 1{f(x) >= t}.

## Distributions

- p(s | Y=1) and p(s | Y=0) are estimated with class-conditional histograms.
- The bin midpoints act as candidate thresholds.

## Metrics as cumulative integrals

- True positive rate: TPR(t) = P(f(X) >= t | Y=1).
- False positive rate: FPR(t) = P(f(X) >= t | Y=0).
- Accuracy: Acc(t) = [TP(t) + TN(t)] / (P + N).

These are computed in a single pass over the scores by sorting once and evaluating cumulative counts at the bin midpoints.
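The single-pass computation can be sketched as follows. This is an illustrative reconstruction of the sort-once-then-count idea, not the library's implementation; `rates_single_pass` is a made-up name.

```python
import numpy as np

def rates_single_pass(y_true, y_score, thresholds):
    """TPR(t) and FPR(t) at many thresholds from one sort plus cumulative counts."""
    y_true = np.asarray(y_true)
    order = np.argsort(y_score)
    s_sorted = np.asarray(y_score)[order]
    y_sorted = y_true[order]
    cum_pos = np.cumsum(y_sorted)        # positives with score <= s_sorted[i]
    cum_neg = np.cumsum(1 - y_sorted)    # negatives with score <= s_sorted[i]
    n_pos, n_neg = cum_pos[-1], cum_neg[-1]
    # Number of samples with score strictly below each threshold.
    idx = np.searchsorted(s_sorted, thresholds, side="left")
    pos_below = np.where(idx > 0, cum_pos[np.maximum(idx - 1, 0)], 0)
    neg_below = np.where(idx > 0, cum_neg[np.maximum(idx - 1, 0)], 0)
    tpr = (n_pos - pos_below) / n_pos    # P(f(X) >= t | Y=1)
    fpr = (n_neg - neg_below) / n_neg    # P(f(X) >= t | Y=0)
    return tpr, fpr
```

The sort costs O(n log n) once; each additional threshold is then an O(log n) lookup, which is why evaluating every bin midpoint is cheap.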
## Why this view

- Links the score distribution directly to threshold outcomes.
- Shows the full family of operating points without switching plots.
- Works for imbalanced data: histogram heights reveal prevalence while the TPR/FPR curves show the trade-offs.
