Update documentation and plotting code

rollingstorms · rollingstorms · commit 67e4e70f07a0 · 2025-12-05T08:40:34.000-05:00
diff --git a/README.md b/README.md
@@ -6,6 +6,8 @@ Docs: https://rollingstorms.github.io/opproplot
 
 ![Opproplot hero](docs/assets/opproplot_hero.png)
 
+<u>OP</u>erating <u>PRO</u>file <u>PLOT</u> (the name Opproplot, spelled out).
+
 **What is an Operating Profile Plot?**
 
 An Operating Profile Plot (Opproplot) is a unified visualization for binary classifiers that shows how a model behaves across every possible decision threshold. It combines:
@@ -15,6 +17,8 @@ An Operating Profile Plot (Opproplot) is a unified visualization for binary clas
 
 This creates a complete operating profile of the model in a single view — letting you see where the model is confident, where the classes overlap, and how performance changes as you move the threshold.
 
+It is a compact readout of model behavior: the score distribution by class plus operating curves (TPR, FPR, accuracy) on a shared score axis. Comparing profiles across models or datasets shows whether a model separates the classes cleanly, where overlap drives errors, and how the choice of threshold shifts TPR, FPR, and accuracy.
+
 Rather than switching between ROC curves, PR curves, histograms, and calibration plots, Opproplot places the score distribution and the operating characteristics on the same axis, making it easy to:
 - identify thresholds with optimal trade-offs
 - diagnose where errors occur in score space
@@ -77,6 +81,7 @@ operating_profile_plot(y_test, y_score, bins=30)
 - Package code lives in `src/opproplot`.
 - Tests live in `tests/`.
 - Documentation for GitHub Pages lives in `docs/` (see below).
+- Regenerate doc images with `python scripts/generate_docs_images.py` (requires numpy, matplotlib, scikit-learn, and an importable `opproplot`).
 
 ## Documentation site
 
diff --git a/docs/api.md b/docs/api.md
@@ -11,10 +11,6 @@ profile = compute_operating_profile(y_true, y_score, bins=40, score_range=(0, 1)
 - `y_score`: array-like of shape (n_samples,), predicted scores or probabilities.
 - `bins`: number of score bins (default 40).
 - `score_range`: tuple or None. If None, uses min/max of scores.
-- `show_key`: display combined legend for bars and lines (default True).
-- `key_location`: `"inside"` (axis legend) or `"outside"` (fig-level, right dock).
-- `show_grid`: draw a background grid on the metric axis (default False).
-- `grid_kwargs`: dict passed to `ax_metric.grid`, e.g., `{"alpha": 0.2, "linestyle": "--"}`.
 
 Returns an `OperatingProfile` dataclass with:
 - `edges`, `mids`, `pos_hist`, `neg_hist`, `tpr`, `fpr`, `accuracy`.
@@ -23,10 +19,28 @@ Returns an `OperatingProfile` dataclass with:
 
 ```python
 from opproplot import operating_profile_plot
-fig, ax_hist, ax_metric = operating_profile_plot(y_true, y_score, bins=30, show_accuracy=True)
+fig, ax_hist, ax_metric = operating_profile_plot(
+    y_true,
+    y_score,
+    bins=30,
+    show_accuracy=True,
+    show_key=True,
+    key_location="inside",
+    show_grid=False,
+    title=None,
+)
 ```
 
+- `y_true`: array-like of shape (n_samples,), binary labels.
+- `y_score`: array-like of shape (n_samples,), predicted scores or probabilities.
+- `bins`: number of score bins (default 40).
+- `score_range`: tuple or None. If None, uses min/max of scores.
 - `show_accuracy`: include the dashed accuracy curve (default True).
+- `show_key`: display combined legend for bars and lines (default True).
+- `key_location`: `"inside"` (axis legend) or `"outside"` (fig-level, right dock).
+- `show_grid`: draw a background grid on the metric axis (default False).
+- `grid_kwargs`: dict passed to `ax_metric.grid`, e.g., `{"alpha": 0.2, "linestyle": "--"}`.
+- `title`: optional title string for the histogram axis (default None, which uses "Opproplot: Operating Profile").
 - `ax`: optional Matplotlib axis to draw on; otherwise creates a new figure.
 
 Returns `(fig, ax_hist, ax_metric)` for further styling or saving.
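
The `OperatingProfile` fields listed above (`edges`, `mids`, `pos_hist`, `neg_hist`, `tpr`, `fpr`, `accuracy`) can be illustrated with a small, dependency-free sketch. This is not the library's implementation: `operating_profile_sketch` is a hypothetical helper, and both the rule "predict positive when score >= threshold" and the use of bin midpoints as thresholds are assumptions made for illustration. It also assumes both classes are present and the scores are not all identical.

```python
def operating_profile_sketch(y_true, y_score, bins=10, score_range=None):
    """Hypothetical illustration; NOT opproplot's compute_operating_profile."""
    if score_range is None:
        lo, hi = min(y_score), max(y_score)
    else:
        lo, hi = score_range
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    mids = [(edges[i] + edges[i + 1]) / 2 for i in range(bins)]

    # Per-class histograms over the score bins.
    pos_hist = [0] * bins
    neg_hist = [0] * bins
    for label, score in zip(y_true, y_score):
        # Clamp the index so a score equal to `hi` lands in the last bin.
        idx = max(0, min(int((score - lo) / width), bins - 1))
        if label == 1:
            pos_hist[idx] += 1
        else:
            neg_hist[idx] += 1

    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos

    # One (TPR, FPR, accuracy) triple per bin-midpoint threshold.
    # Assumption: an example is predicted positive when score >= threshold.
    tpr, fpr, accuracy = [], [], []
    for t in mids:
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(y_true, y_score) if y != 1 and s >= t)
        tn = n_neg - fp
        tpr.append(tp / n_pos)
        fpr.append(fp / n_neg)
        accuracy.append((tp + tn) / len(y_true))

    return {
        "edges": edges, "mids": mids,
        "pos_hist": pos_hist, "neg_hist": neg_hist,
        "tpr": tpr, "fpr": fpr, "accuracy": accuracy,
    }


profile = operating_profile_sketch(
    [0, 0, 1, 1], [0.1, 0.4, 0.6, 0.9], bins=2, score_range=(0, 1)
)
print(profile["mids"], profile["tpr"], profile["fpr"], profile["accuracy"])
```

The real function may bin or threshold differently; treat the sketch only as a way to read the returned arrays.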
diff --git a/docs/examples.md b/docs/examples.md
@@ -2,25 +2,23 @@
 
 Use these patterns to compare models and datasets.
 
-## Breast cancer (scikit-learn)
+## Clear separation (breast cancer, scikit-learn)
 
 - Load `sklearn.datasets.load_breast_cancer`.
 - Train a logistic regression or gradient boosting model.
 - Plot the operating profile on the test split to inspect separability.
+- Interpretation: distributions are well separated; TPR stays high while FPR stays low across much of the threshold range.
 
-## Fraud-like imbalance
+## Ambiguous scores (overlapping normals)
 
-- Simulate or load an imbalanced dataset.
-- Compare a calibrated model vs an overconfident one.
-- Observe how class imbalance alters histogram heights and accuracy peaks.
+- Simulate scores from two overlapping normal distributions with similar means/variance.
+- Expect intertwined histograms and TPR/FPR curves that stay close together (and may cross where sampling noise dominates).
+- Interpretation: thresholds are fragile; small shifts move a lot of examples between classes.
 
-## Good vs bad model
+## Bumpy distributions (mixed pockets)
 
-- Train two models on the same data.
-- Plot both operating profiles side by side.
-- Look for:
-  - Separation of score distributions.
-  - Lower FPR for the same TPR.
-  - Stability of accuracy across thresholds.
+- Build a model that produces multi-modal scores (e.g., mixture components or segment-specific calibrations).
+- Look for “bumps” in the histogram and corresponding inflections in TPR/FPR.
+- Interpretation: localized score clusters may indicate subpopulations; thresholding there can create sharp metric changes.
 
 Swap in your own datasets; the plotting API stays the same.
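
The contrast between the "clear separation" and "ambiguous scores" patterns above can be sketched without any plotting. The snippet below is a rough illustration using only the standard library: the means, standard deviations, sample size, and the helper names `simulate_scores` and `max_tpr_fpr_gap` are all assumptions chosen for the example, not values taken from the docs or the package.

```python
import random


def simulate_scores(rng, pos_mean, neg_mean, sd, n=2000):
    """Draw n scores per class from two normal distributions (illustrative)."""
    pos = [rng.gauss(pos_mean, sd) for _ in range(n)]
    neg = [rng.gauss(neg_mean, sd) for _ in range(n)]
    return pos, neg


def max_tpr_fpr_gap(pos_scores, neg_scores, thresholds):
    """Largest TPR - FPR over the thresholds (predict positive if score >= t)."""
    best = 0.0
    for t in thresholds:
        tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        best = max(best, tpr - fpr)
    return best


rng = random.Random(0)
thresholds = [i / 20 for i in range(21)]  # 0.00, 0.05, ..., 1.00

# Well-separated classes: means far apart relative to the spread.
separated_gap = max_tpr_fpr_gap(*simulate_scores(rng, 0.8, 0.2, 0.1), thresholds)

# Overlapping classes: similar means, wider spread.
overlapping_gap = max_tpr_fpr_gap(*simulate_scores(rng, 0.55, 0.45, 0.2), thresholds)

print(f"separated: {separated_gap:.3f}  overlapping: {overlapping_gap:.3f}")
```

A large gap corresponds to TPR and FPR curves that sit far apart on the plot; a small gap corresponds to curves that hug each other, which is where threshold choices become fragile.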
diff --git a/docs/index.md b/docs/index.md
@@ -1,5 +1,7 @@
 # Opproplot
 
+<u>OP</u>erating <u>PRO</u>file <u>PLOT</u>
+
 A compact operating profile plot for binary classifiers: stacked score histograms by class plus TPR/FPR/Accuracy curves at bin-midpoint thresholds. One view to understand every possible cutoff.
 
 ![Opproplot hero](assets/opproplot_hero.png)
@@ -13,6 +15,8 @@ An Operating Profile Plot (Opproplot) is a unified visualization for binary clas
 
 This creates a complete operating profile of the model in a single view — letting you see where the model is confident, where the classes overlap, and how performance changes as you move the threshold.
 
+It is a compact readout of model behavior: the score distribution by class plus operating curves (TPR, FPR, accuracy) on a shared score axis. Comparing profiles across models or datasets shows whether a model separates the classes cleanly, where overlap drives errors, and how the choice of threshold shifts TPR, FPR, and accuracy.
+
 Rather than switching between ROC curves, PR curves, histograms, and calibration plots, Opproplot places the score distribution and the operating characteristics on the same axis, making it easy to:
 - identify thresholds with optimal trade-offs
 - diagnose where errors occur in score space
diff --git a/scripts/generate_docs_images.py b/scripts/generate_docs_images.py
@@ -0,0 +1,116 @@
+"""
+Generate documentation images for Opproplot.
+
+Creates:
+- docs/assets/opproplot_hero.png
+- docs/assets/opproplot_example.png
+- docs/assets/opproplot_breast_cancer.png
+"""
+
+from pathlib import Path
+
+import matplotlib
+
+matplotlib.use("Agg")
+import matplotlib.pyplot as plt  # noqa: E402
+import numpy as np  # noqa: E402
+from sklearn.datasets import load_breast_cancer  # noqa: E402
+from sklearn.linear_model import LogisticRegression  # noqa: E402
+from sklearn.model_selection import train_test_split  # noqa: E402
+
+from opproplot import operating_profile_plot  # noqa: E402
+
+
+ASSETS_DIR = Path("docs/assets")
+
+
+def _ensure_assets_dir() -> None:
+    ASSETS_DIR.mkdir(parents=True, exist_ok=True)
+
+
+def generate_hero() -> None:
+    rng = np.random.default_rng(2)
+    y_true = rng.integers(0, 2, size=4000)
+    scores = rng.normal(loc=y_true * 0.7 + 0.08, scale=0.3, size=4000)
+    scores = np.clip(scores, 0, 1)
+
+    fig, ax_hist, ax_metric = operating_profile_plot(
+        y_true,
+        scores,
+        bins=24,
+        show_accuracy=True,
+        show_key=True,
+        key_location="outside",
+        show_grid=False,
+        title="Operating Profile Plot",
+    )
+
+    # Minimal styling for hero
+    for ax in (ax_hist, ax_metric):
+        ax.set_xlabel("")
+        ax.set_ylabel("")
+        ax.tick_params(labelbottom=False, labelleft=False, labelright=False, length=0)
+        for spine in ax.spines.values():
+            spine.set_visible(False)
+
+    fig.set_size_inches(4.6, 2.4)
+    fig.tight_layout(pad=0.4)
+    fig.savefig(ASSETS_DIR / "opproplot_hero.png", dpi=220, transparent=True, bbox_inches="tight")
+    plt.close(fig)
+
+
+def generate_simulated_example() -> None:
+    rng = np.random.default_rng(0)
+    y_true = rng.integers(0, 2, size=5000)
+    scores = rng.random(size=5000)  # uniform scores, independent of y_true (no signal)
+
+    fig, _, _ = operating_profile_plot(
+        y_true,
+        scores,
+        bins=30,
+        show_accuracy=True,
+        show_key=True,
+        key_location="inside",
+        show_grid=False,
+        title="Opproplot: Operating Profile",
+    )
+    fig.tight_layout()
+    fig.savefig(ASSETS_DIR / "opproplot_example.png", dpi=200)
+    plt.close(fig)
+
+
+def generate_breast_cancer() -> None:
+    data = load_breast_cancer()
+    X_train, X_test, y_train, y_test = train_test_split(
+        data.data, data.target, test_size=0.25, random_state=0, stratify=data.target
+    )
+    clf = LogisticRegression(max_iter=1000)
+    clf.fit(X_train, y_train)
+    y_score = clf.predict_proba(X_test)[:, 1]
+
+    fig, ax_hist, _ = operating_profile_plot(
+        y_test,
+        y_score,
+        bins=30,
+        show_accuracy=True,
+        show_key=True,
+        key_location="inside",
+        show_grid=False,
+        title="Breast cancer classifier: operating profile",
+    )
+    ax_hist.set_title("Breast cancer classifier: operating profile", fontsize=11)  # re-set to apply a smaller font
+    fig.tight_layout()
+    fig.savefig(ASSETS_DIR / "opproplot_breast_cancer.png", dpi=200)
+    plt.close(fig)
+
+
+def main() -> None:
+    _ensure_assets_dir()
+    generate_hero()
+    generate_simulated_example()
+    generate_breast_cancer()
+    print("Generated docs images in docs/assets/")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/opproplot/plotting.py b/src/opproplot/plotting.py
@@ -17,6 +17,7 @@ def operating_profile_plot(
     key_location: str = "inside",
     show_grid: bool = False,
     grid_kwargs: Optional[dict] = None,
+    title: Optional[str] = None,
     ax: Optional[plt.Axes] = None,
 ):
     """
@@ -44,6 +45,8 @@ def operating_profile_plot(
     grid_kwargs : dict or None, default=None
         Passed to `ax_metric.grid`; useful keys include `alpha`, `color`,
         `linestyle`, and `linewidth`.
+    title : str or None, default=None
+        Title for the histogram axis. If None, uses "Opproplot: Operating Profile".
     ax : matplotlib.axes.Axes or None, default=None
         Axis to plot on. If None, a new figure and axis are created.
 
@@ -122,7 +125,9 @@ def operating_profile_plot(
     if show_grid:
         ax_metric.grid(True, **(grid_kwargs or {"alpha": 0.2, "linestyle": "--"}))
 
-    ax_hist.set_title("Opproplot: Operating Profile")
+    if title is None:
+        title = "Opproplot: Operating Profile"
+    ax_hist.set_title(title)
 
     fig.tight_layout()
     return fig, ax_hist, ax_metric
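
The hunks above resolve defaults in two different ways: `title` uses an explicit `is None` check, while `grid_kwargs` uses `or`. The two idioms behave differently for values that are falsy but not None, such as an empty dict. The sketch below isolates that difference with hypothetical helper names (`resolve_with_or`, `resolve_with_none_check`); it does not touch Matplotlib or the package itself.

```python
DEFAULT_GRID = {"alpha": 0.2, "linestyle": "--"}


def resolve_with_or(grid_kwargs=None):
    # Same shape as `grid_kwargs or {...}`: any falsy value, including {},
    # falls back to the defaults.
    return grid_kwargs or dict(DEFAULT_GRID)


def resolve_with_none_check(grid_kwargs=None):
    # Same shape as the `title is None` check: only None triggers the
    # default, so an explicit {} is passed through unchanged.
    return dict(DEFAULT_GRID) if grid_kwargs is None else grid_kwargs


print(resolve_with_or({}))            # falls back to the defaults
print(resolve_with_none_check({}))    # keeps the caller's empty dict
print(resolve_with_or({"alpha": 1}))  # non-empty dicts are used as given
```

Whether the `or` behavior is intended for `grid_kwargs` is not stated in the commit; the sketch only makes the distinction visible.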