
Conversation


@Powerscore Powerscore commented Jan 3, 2026

Summary

This PR adds dimensional explainability to the Local Outlier Factor (LOF) detector. It implements the same interpretability API pattern proposed for KNN (PR #652), providing consistent explain_outlier() visualization and get_outlier_explainability_scores() programmatic access across PyOD's core detectors.

Motivation

LOF is a density-based algorithm that excels at identifying local outliers. However, a single global LOF score does not indicate which subspace or feature exhibits the density contrast responsible for the anomaly. This PR addresses that gap by:

  1. Consistency: Implementing a method that evaluates 1D density using the original k-nearest neighbors from the full-dimensional space (ensuring the explanation reflects the anomaly that was actually detected).
  2. Visualization: Providing horizontal bar charts of 1D LOF scores with statistical significance bands.
  3. Access: Enabling programmatic access to dimensional scores via get_outlier_explainability_scores() (completing the interface started in COPOD/KNN).

Changes Made

Core Implementation (pyod/models/lof.py)

  1. Store Training Data

    • Added self.X_train_ = X to enable subspace density calculations.
    • Follows the pattern established in COPOD and KNN (PR #652: Add dimensional explainability to KNN detector).
    • Trade-off: Increases memory usage (O(N×D)) but is strictly necessary for dimensional density estimation.
  2. Lazy Neighbor Caching (_ensure_overall_neighbors)

    • Unlike KNN, sklearn's LOF implementation does not readily expose the k-NN graph. This method lazily computes and caches the global k-NN graph on the first request for an explanation (see the sketch after this list).
  3. Vectorized Subspace Calculation (_compute_lof_subspace_with_neighbors)

    • Implements the 1D LOF formula using the global neighbor set.
    • Uses a fully vectorized approach (numpy array operations) rather than loops to ensure performance.
    • Includes a caching strategy (_cached_1d_k_distances, _cached_1d_lof_scores) to prevent re-computing distances for the same dimension multiple times.
  4. Main Methods

    • explain_outlier(): Visualization with statistical cutoffs (percentiles across training data).
    • get_outlier_explainability_scores(): Returns the raw 1D LOF scores for specific dimensions.
  5. Added Imports (Lines ~7-9)

    • import warnings
    • import matplotlib.pyplot as plt
    • import numpy as np
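
The sketch below illustrates items 2 and 3 in standalone form. It is a simplified approximation rather than the code added in this PR: the helper names (overall_neighbors, lof_1d_with_global_neighbors) are illustrative, and the 1D k-distance is taken as the largest in-dimension gap to the fixed global neighbor set.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def overall_neighbors(X, k):
    """Global k-NN graph: indices of each point's k neighbors in the full feature space."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    return idx[:, 1:]  # drop the query point itself

def lof_1d_with_global_neighbors(X, neighbor_idx, d):
    """1D LOF scores in dimension d, holding the global neighbor set fixed."""
    x = X[:, d]                                                     # (N,)
    dist_to_neighbors = np.abs(x[:, None] - x[neighbor_idx])        # (N, k)
    # 1D "k-distance" of each point: largest in-dimension gap to its fixed neighbors
    k_dist_1d = dist_to_neighbors.max(axis=1)                       # (N,)
    # reach-dist(p, o) = max(k-distance(o), |p_d - o_d|)
    reach = np.maximum(k_dist_1d[neighbor_idx], dist_to_neighbors)  # (N, k)
    # Local reachability density and LOF, restricted to dimension d
    lrd = 1.0 / (reach.mean(axis=1) + 1e-10)                        # (N,)
    return lrd[neighbor_idx].mean(axis=1) / lrd                     # (N,)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    X[0, 2] += 6.0                                  # make point 0 an outlier in dimension 2
    idx = overall_neighbors(X, k=20)
    scores = [lof_1d_with_global_neighbors(X, idx, d)[0] for d in range(X.shape[1])]
    print(np.round(scores, 2))                      # dimension 2 should dominate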

Example (examples/lof_interpretability.py)

Created a clean example using cardio.mat that mirrors the KNN example:

  • Demonstrates basic usage on high-dimensional data.
  • Shows custom cutoff bands.
  • Explains the difference between global scores and dimensional contributions.

Tests (pyod/test/test_lof.py)

Added test_get_outlier_explainability_scores which validates the math on a synthetic 2D dataset where outliers are obvious in specific dimensions (e.g., verifying that an X-axis outlier has a higher X-dimension LOF score).
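
For illustration, a minimal sketch of the kind of check this test performs (the data construction and the exact assertion are simplified, and it assumes the returned scores are indexed by feature):

import numpy as np
from pyod.models.lof import LOF

# Dense 2D Gaussian cluster plus one point pushed far out along the X axis only.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))
X[0, 0] += 8.0                      # outlier in X, normal in Y

clf = LOF(n_neighbors=10)
clf.fit(X)

scores = clf.get_outlier_explainability_scores(ind=0)
assert scores[0] > scores[1]        # the X-dimension 1D LOF score should dominate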

API Design

The API mirrors the explain_outlier() interface established in COPOD and the recent KNN submission (PR #652):

Feature         KNN (PR #652)                             LOF (This PR)
Method name     explain_outlier()                         explain_outlier()
Parameters      ind, columns, cutoffs, feature_names...   Same
Metric          Avg Distance to k-NN                      1D LOF Score (using global neighbors)
Programmatic    get_outlier_explainability_scores()       get_outlier_explainability_scores()

Usage Example:

from pyod.models.lof import LOF
from pyod.utils.data import generate_data

X_train, _, _, _ = generate_data(n_train=200, n_features=5)
clf = LOF(n_neighbors=20)
clf.fit(X_train)

# Visualize explanation
clf.explain_outlier(ind=0, feature_names=['F1', 'F2', 'F3', 'F4', 'F5'])

# Get raw scores
scores = clf.get_outlier_explainability_scores(ind=0)
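
A hedged variant with custom cutoff bands, assuming the cutoffs argument follows the same percentile-list convention as COPOD's explain_outlier():

# Custom percentile bands for the significance lines (values shown are illustrative)
clf.explain_outlier(ind=0, cutoffs=[0.95, 0.99])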

Technical Details

Algorithm:

To explain an outlier $p$ in dimension $d$:

  1. Retrieve the set of $k$-nearest neighbors $\mathcal{N}_k(p)$ found in the full feature space.
  2. Calculate 1D Reachability Distance in dimension $d$ using these fixed neighbors:
    $$reach\text{-}dist_k^{(d)}(p, o) = \max( \text{k-distance}^{(d)}(o), |p_d - o_d| )$$
  3. Compute 1D Local Reachability Density (LRD) and the resulting 1D LOF score.
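    For reference, these are the standard LRD and LOF definitions restricted to dimension $d$ and evaluated over the fixed neighbor set $\mathcal{N}_k(p)$ (no symbols beyond those defined above):
    $$lrd_k^{(d)}(p) = \left( \frac{1}{|\mathcal{N}_k(p)|} \sum_{o \in \mathcal{N}_k(p)} reach\text{-}dist_k^{(d)}(p, o) \right)^{-1}, \qquad LOF_k^{(d)}(p) = \frac{1}{|\mathcal{N}_k(p)|} \sum_{o \in \mathcal{N}_k(p)} \frac{lrd_k^{(d)}(o)}{lrd_k^{(d)}(p)}$$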

Complexity:

  • Space: O(N×D) to store training data.
  • Time:
    • First call (with cutoffs): O(N×k×D) to compute statistical bands across the full training set.
    • Subsequent calls: O(k×D) per explanation. Results are cached (self._cached_1d_lof_scores), making interactive exploration nearly instant after the initial computation.
  • Safety: Emits a ResourceWarning, suggesting that compute_cutoffs be disabled, when computing cutoffs on very large datasets (>1GB estimated memory).
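
A rough sketch of how such a guard might look (the memory heuristic, threshold, and helper name are illustrative, not the exact code in this PR):

import warnings

def _warn_if_cutoffs_expensive(n_samples, n_features, limit_bytes=1 << 30):
    """Illustrative guard: warn when computing cutoff bands is likely to be heavy."""
    # Rough float64 estimate for the per-dimension distance work across the training set.
    estimated_bytes = n_samples * n_samples * n_features * 8
    if estimated_bytes > limit_bytes:
        warnings.warn(
            f"Estimated ~{estimated_bytes / 1e9:.1f} GB to compute cutoff bands; "
            "consider disabling compute_cutoffs.",
            ResourceWarning,
        )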

Testing

  • Unit Tests: Added test_get_outlier_explainability_scores in pyod/test/test_lof.py.
  • Manual Validation: Tested against synthetic 2D cases (Inlier, X-Outlier, Y-Outlier, Sparse Cluster Inlier).
  • Quantitative Evaluation: Performed a perturbation test on the Pima Indians Diabetes dataset (see below).

Quantitative Evaluation (Perturbation Test)

To validate that the features identified by this method are truly responsible for the anomaly, we conducted a perturbation test on the top 20 outliers of the Pima dataset.

  • Method: We identified the top feature via 1D LOF, removed it, and re-calculated the score, then compared this to removing a random feature (see the sketch after this list).
  • Result: Removing the explained feature reduced the LOF score by an average of 0.5737, while removing a random feature reduced it by only 0.0433.
  • Significance: This difference is statistically significant ($p < 0.001$), confirming the explanation method correctly identifies causal features.
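
A condensed sketch of the perturbation protocol referenced above (helper names and the refit-based scoring are illustrative; the evaluation script itself is not part of this PR):

import numpy as np
from pyod.models.lof import LOF

def lof_score_of(X, row_idx, n_neighbors=20):
    """Fit LOF on X and return the training decision score of one row."""
    return LOF(n_neighbors=n_neighbors).fit(X).decision_scores_[row_idx]

def score_drop_after_removal(X, row_idx, drop_dim, n_neighbors=20):
    """How much the row's LOF score falls when one feature column is removed."""
    full = lof_score_of(X, row_idx, n_neighbors)
    reduced = lof_score_of(np.delete(X, drop_dim, axis=1), row_idx, n_neighbors)
    return full - reduced

# For each top outlier, compare dropping the feature ranked first by the 1D LOF
# explanation against dropping a randomly chosen feature, then average the drops.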

Research Foundation

This implementation is based on the framework proposed in:

Krenmayr, Lucas and Goldstein, Markus (2023). "Explainable Outlier Detection Using Feature Ranking for k-Nearest Neighbors, Gaussian Mixture Model and Autoencoders." In 15th International Conference on Agents and Artificial Intelligence (ICAART).

BibTeX:

@inproceedings{Lucas2023xodknn,
  author = {Krenmayr, Lucas and Goldstein, Markus},
  year = {2023},
  month = {02},
  pages = {245-253},
  title = {Explainable Outlier Detection Using Feature Ranking for k-Nearest Neighbors, Gaussian Mixture Model and Autoencoders},
  doi = {10.5220/0011631900003411}
}

Screenshots/Examples

2D Validation Examples

Figure 1: 2D LOF Inlier
2d_xlof_inlier

Standard inlier in a dense cluster.

Figure 2: 2D LOF X-Dimension Outlier
2d_xlof_xoutlier

Point is an outlier in X (density contrast) but normal in Y.

Figure 3: 2D LOF Y-Dimension Outlier
2d_xlof_youtlier

Point is an outlier in Y (density contrast) but normal in X.

Figure 4: 2D LOF Outlier
2d_xlof_outlier

Standard outlier relative to a dense cluster.

Figure 5: 2D LOF Sparse Inlier
2d_xlof_sparse_inlier

A critical case for LOF: this point is far from others (a KNN outlier) but fits the density of its local sparse cluster (an LOF inlier). The explanation correctly shows low scores across dimensions.

Real-World Dataset (Pima Indians Diabetes)

Note: We performed Min-Max Scaling on the dataset prior to generating these examples.

Figure 6: Pima LOF Outlier 1
pima_lof_outlier1
Top outlier driven by specific density deviations.

Figure 7: Pima LOF Outlier 2
pima_lof_outlier2

Figure 8: Pima LOF Inlier
pima_lof_inlier
Normal sample showing low dimensional LOF scores.

Checklist

All Submissions Basics:

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?
  • Have you checked all Issues to tie the PR to a specific one?

All Submissions Cores:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
    • Added unit test test_get_outlier_explainability_scores in test_lof.py.
    • Visualization methods use # pragma: no cover.
  • Have you successfully run tests with your changes locally?
    • All LOF tests pass.
    • Example script runs successfully.
  • Does your submission pass tests, including CircleCI, Travis CI, and AppVeyor?
  • Does your submission have appropriate code coverage?
    • Core logic covered.
    • Visualization excluded via pragma.

Files Changed

  • pyod/models/lof.py - Added explainability methods
  • examples/lof_interpretability.py - New example file
  • pyod/test/test_lof.py - Added unit test for get_outlier_explainability_scores() method

Note to Reviewer: This PR builds upon the explainability effort started in PR #652 (KNN Explainability).
