Add dimensional explainability to LOF detector #653
Summary

This PR adds dimensional explainability to the Local Outlier Factor (LOF) detector. It implements the same interpretability API pattern proposed for KNN (PR #652), providing consistent `explain_outlier()` visualization and `get_outlier_explainability_scores()` programmatic access across PyOD's core detectors.

Motivation

LOF is a density-based algorithm that excels at identifying local outliers. However, a single global LOF score does not indicate which subspace or feature exhibits the density contrast responsible for the anomaly. This PR addresses that gap by adding `get_outlier_explainability_scores()`, completing the interface started in COPOD and KNN.

Changes Made
Core Implementation (`pyod/models/lof.py`)

- Store training data: `self.X_train_ = X`, to enable subspace density calculations.
- Lazy neighbor caching (`_ensure_overall_neighbors`).
- Vectorized subspace calculation (`_compute_lof_subspace_with_neighbors`), with per-dimension caches (`_cached_1d_k_distances`, `_cached_1d_lof_scores`) to avoid re-computing distances for the same dimension; see the sketch after this list.
- Main methods:
  - `explain_outlier()`: visualization with statistical cutoffs (percentiles across the training data).
  - `get_outlier_explainability_scores()`: returns the raw 1D LOF scores for specific dimensions.
- Added imports (lines ~7-9): `import warnings`, `import matplotlib.pyplot as plt`, `import numpy as np`.
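A minimal sketch of the lazy per-dimension caching idea. The cache attribute names come from this PR; the helper itself and its use of scikit-learn's `LocalOutlierFactor` as the 1D scorer are illustrative, not the PR's exact code:

```python
from sklearn.neighbors import LocalOutlierFactor


def _get_1d_lof_scores(self, dim):
    """Return 1D LOF scores for feature `dim`, computing them at most once."""
    if dim not in self._cached_1d_lof_scores:       # cache miss: compute once
        lof_1d = LocalOutlierFactor(n_neighbors=self.n_neighbors)
        lof_1d.fit(self.X_train_[:, [dim]])         # density along this feature only
        # sklearn stores -LOF; flip the sign so larger means more outlying.
        self._cached_1d_lof_scores[dim] = -lof_1d.negative_outlier_factor_
    return self._cached_1d_lof_scores[dim]
```

Repeated calls for the same dimension then hit the dictionary instead of re-running the neighbor search, which is what keeps interactive exploration cheap after the first pass.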
Example (`examples/lof_interpretability.py`)

Created a clean example using `cardio.mat` that mirrors the KNN example.
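The example file is not reproduced here; a minimal sketch of what such a script looks like, where the data path and the choice of point to explain are assumptions:

```python
import os

import scipy.io as sio
from pyod.models.lof import LOF

# Load the cardio dataset (path assumed; adjust to the local layout).
mat = sio.loadmat(os.path.join("data", "cardio.mat"))
X = mat["X"]

clf = LOF(n_neighbors=20)
clf.fit(X)

# Explain the highest-scoring training point.
worst = int(clf.decision_scores_.argmax())
clf.explain_outlier(ind=worst)
```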
Tests (`pyod/test/test_lof.py`)

Added `test_get_outlier_explainability_scores`, which validates the math on a synthetic 2D dataset where outliers are obvious in specific dimensions (e.g., verifying that an X-axis outlier has a higher X-dimension LOF score).
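A sketch of the kind of assertion the test makes, assuming `get_outlier_explainability_scores()` takes a sample index and returns one 1D LOF score per feature (the dataset below is illustrative):

```python
import numpy as np
from pyod.models.lof import LOF


def test_get_outlier_explainability_scores():
    rng = np.random.default_rng(42)
    X = rng.normal(size=(200, 2))        # dense 2D Gaussian cloud
    X = np.vstack([X, [8.0, 0.0]])       # last point: outlier along X only
    clf = LOF(n_neighbors=10)
    clf.fit(X)
    scores = clf.get_outlier_explainability_scores(ind=len(X) - 1)
    # The X dimension should carry a clearly higher 1D LOF score than Y.
    assert scores[0] > scores[1]
```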
API Design

The API mirrors the `explain_outlier()` interface established in COPOD and the recent KNN submission (PR #652):

- `explain_outlier(ind, columns, cutoffs, feature_names, ...)`: same signature as the KNN version.
- `get_outlier_explainability_scores()`: same programmatic accessor as the KNN version.

Usage Example:
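A sketch of the intended call pattern; the parameter names come from this PR, while the dataset, cutoff values, and defaults are assumptions:

```python
from pyod.models.lof import LOF
from pyod.utils.data import generate_data

X_train, _ = generate_data(n_train=300, n_features=5, train_only=True)

clf = LOF(n_neighbors=20)
clf.fit(X_train)

# Plot per-dimension LOF scores for sample 0 against percentile cutoffs.
clf.explain_outlier(ind=0, cutoffs=[0.95, 0.99],
                    feature_names=[f"f{i}" for i in range(5)])

# Programmatic access: raw 1D LOF scores for selected dimensions.
scores = clf.get_outlier_explainability_scores(ind=0, columns=[0, 2, 4])
print(scores)
```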
Technical Details
Algorithm: to explain an outlier $p$ in dimension $d$, the detector computes a one-dimensional LOF score for $p$ using only feature $d$, then compares it against percentile cutoffs computed over the training data (see the formulas below).
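For reference, these are the standard LOF quantities of Breunig et al. restricted to feature $d$; the superscript $(d)$ marks that all distances use only that feature. The per-dimension restriction is this PR's application; the formulas themselves are the textbook definitions:

$$\text{reach-dist}_k^{(d)}(p, o) = \max\bigl(k\text{-dist}^{(d)}(o),\, |p_d - o_d|\bigr)$$

$$\mathrm{lrd}_k^{(d)}(p) = \left( \frac{1}{|N_k^{(d)}(p)|} \sum_{o \in N_k^{(d)}(p)} \text{reach-dist}_k^{(d)}(p, o) \right)^{-1}$$

$$\mathrm{LOF}_k^{(d)}(p) = \frac{1}{|N_k^{(d)}(p)|} \sum_{o \in N_k^{(d)}(p)} \frac{\mathrm{lrd}_k^{(d)}(o)}{\mathrm{lrd}_k^{(d)}(p)}$$

A score near 1 means $p$'s density along feature $d$ matches its neighbors'; values well above 1 flag $d$ as a dimension driving the anomaly.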
Complexity: per-dimension results are cached (`self._cached_1d_lof_scores`), making interactive exploration nearly instant after the initial computation. A `ResourceWarning` raised when computing cutoffs on very large datasets (>1 GB estimated memory) suggests disabling `compute_cutoffs`.

Testing
Unit test `test_get_outlier_explainability_scores` in `pyod/test/test_lof.py`.

Quantitative Evaluation (Perturbation Test)
To validate that the features identified by this method are truly responsible for the anomaly, we conducted a perturbation test on the top 20 outliers of the Pima dataset.
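The exact protocol is not reproduced here; the sketch below shows one plausible implementation of such a perturbation test, with synthetic data standing in for Pima and `get_outlier_explainability_scores()` assumed to return one score per feature. If the flagged feature is truly responsible, neutralizing it should lower the LOF score:

```python
import numpy as np
from pyod.models.lof import LOF
from pyod.utils.data import generate_data

X, _ = generate_data(n_train=500, n_features=8, contamination=0.05,
                     train_only=True)
clf = LOF(n_neighbors=20).fit(X)

top = np.argsort(clf.decision_scores_)[-20:]   # indices of the top 20 outliers
medians = np.median(X, axis=0)
drops = []
for i in top:
    per_dim = clf.get_outlier_explainability_scores(ind=i)
    d = int(np.argmax(per_dim))                # feature flagged as most anomalous
    X_pert = X.copy()
    X_pert[i, d] = medians[d]                  # replace it with the median value
    rescored = LOF(n_neighbors=20).fit(X_pert).decision_scores_[i]
    drops.append(clf.decision_scores_[i] - rescored)
print("mean LOF score drop after perturbation:", float(np.mean(drops)))
```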
Research Foundation
This implementation is based on the framework proposed in:
Krenmayr, Lucas and Goldstein, Markus (2023). "Explainable Outlier Detection Using Feature Ranking for k-Nearest Neighbors, Gaussian Mixture Model and Autoencoders." In 15th International Conference on Agents and Artificial Intelligence (ICAART).
BibTeX:
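The entry below is assembled from the citation above; the entry key and field layout are illustrative:

```bibtex
@inproceedings{krenmayr2023explainable,
  author    = {Krenmayr, Lucas and Goldstein, Markus},
  title     = {Explainable Outlier Detection Using Feature Ranking for
               k-Nearest Neighbors, Gaussian Mixture Model and Autoencoders},
  booktitle = {Proceedings of the 15th International Conference on Agents
               and Artificial Intelligence (ICAART)},
  year      = {2023}
}
```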
Screenshots/Examples
2D Validation Examples

Figure 1: 2D LOF Inlier. Standard inlier in a dense cluster.

Figure 2: 2D LOF X-Dimension Outlier. Point is an outlier in X (density contrast) but normal in Y.

Figure 3: 2D LOF Y-Dimension Outlier. Point is an outlier in Y (density contrast) but normal in X.

Figure 4: 2D LOF Outlier. Standard outlier relative to a dense cluster.

Figure 5: 2D LOF Sparse Inlier. A critical case for LOF: this point is far from others (a KNN outlier) but fits the density of its local sparse cluster (an LOF inlier). The explanation correctly assigns low scores.

Real-World Dataset (Pima Indians Diabetes)

Figure 6: Pima LOF Outlier 1. Top outlier driven by specific density deviations.

Figure 7: Pima LOF Outlier 2.

Figure 8: Pima LOF Inlier. Normal sample showing low dimensional LOF scores.
Checklist
All Submissions Basics:

All Submissions Cores:

- Added unit test `test_get_outlier_explainability_scores` in `test_lof.py`.
- Plotting-only lines are marked with `# pragma: no cover`.

Files Changed

- `pyod/models/lof.py`: added explainability methods
- `examples/lof_interpretability.py`: new example file
- `pyod/test/test_lof.py`: added unit test for the `get_outlier_explainability_scores()` method

Note to Reviewer: This PR builds upon the explainability effort started in PR #652 (KNN Explainability).