PSOD (Pseudo-Supervised Outlier Detection) is a novel approach for detecting outliers in tabular data by treating each feature as a target variable and using the prediction errors from regression models as outlier scores.
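The idea can be illustrated with a minimal sketch. This is not PSOD's actual implementation: the helper name `pseudo_supervised_scores`, the choice of `LinearRegression`, the use of out-of-fold predictions, and the scaling by standard deviation are all assumptions made for illustration, and only numerical features are handled.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict


def pseudo_supervised_scores(df: pd.DataFrame, n_splits: int = 5) -> pd.Series:
    """Hypothetical helper: mean scaled prediction error per row across all features."""
    errors = pd.DataFrame(index=df.index, columns=df.columns, dtype=float)
    for target in df.columns:
        X = df.drop(columns=target)   # every other feature is a predictor
        y = df[target]                # the current feature is the pseudo-target
        # Out-of-fold predictions: a row is never predicted by a model trained on it.
        pred = cross_val_predict(LinearRegression(), X, y, cv=n_splits)
        # Divide by the target's standard deviation so features are comparable.
        errors[target] = np.abs(y - pred) / y.std()
    return errors.mean(axis=1)


# Synthetic data: b depends linearly on a, except in row 0.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
data = pd.DataFrame({"a": x, "b": 2 * x + rng.normal(scale=0.1, size=200)})
data.loc[0, ["a", "b"]] = [0.0, 10.0]  # break the a-b relationship for one row

scores = pseudo_supervised_scores(data)
print(scores.idxmax())  # index of the row with the largest score
```

Rows whose values cannot be predicted well from the remaining features receive high scores, which is what makes the prediction error usable as an outlier score.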
- Flexible Architecture: Supports any scikit-learn compatible regressor as base learner
- Mixed Data Types: Handles both numerical and categorical features
- Multiple Transformations: Supports logarithmic, Yeo-Johnson, and no transformation
- Customizable Detection: Configure outlier detection on low end, high end, or both
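To make the transformation and detection options more concrete, here is a rough sketch using scikit-learn's `PowerTransformer` and a quantile cut-off. The helper `flag_tails`, its `tail` and `q` parameters, and the quantile rule are hypothetical and are not taken from PSOD's API; they show only one plausible reading of "low end, high end, or both".

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import PowerTransformer


def flag_tails(values: pd.Series, tail: str = "both", q: float = 0.05) -> pd.Series:
    """Hypothetical helper: boolean mask for values in the chosen tail(s)."""
    low = values < values.quantile(q)
    high = values > values.quantile(1 - q)
    if tail == "low":
        return low
    if tail == "high":
        return high
    if tail == "both":
        return low | high
    raise ValueError("tail must be 'low', 'high', or 'both'")


# A right-skewed feature (log-normal), where raw values exaggerate the high tail.
rng = np.random.default_rng(0)
skewed = pd.Series(rng.lognormal(size=500))

# Yeo-Johnson transformation; unlike a plain log, it also accepts zero and negative values.
pt = PowerTransformer(method="yeo-johnson")
transformed = pd.Series(
    pt.fit_transform(skewed.to_numpy().reshape(-1, 1)).ravel(),
    index=skewed.index,
)

# Logarithmic alternative for non-negative data.
logged = np.log1p(skewed)

low_mask = flag_tails(transformed, tail="low")
high_mask = flag_tails(transformed, tail="high")
both_mask = flag_tails(transformed, tail="both")
print(low_mask.sum(), high_mask.sum(), both_mask.sum())
```

Reducing skew before scoring keeps a long natural tail from dominating the errors, and restricting detection to one tail is useful when only unusually small or unusually large values matter.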
# TODO: Add PyPI installation instructions once package is published
pip install outlier-pseudo-supervised

git clone https://github.com/diogoribeiro7/outlier_pseudo_supervised.git
cd outlier_pseudo_supervised
pip install -e .

from psod import PSOD
import pandas as pd
# Load your data
df = pd.DataFrame({
    'feature1': [1, 2, 3, 4, 100],
    'feature2': [10, 20, 30, 40, 1000],
    'category': ['A', 'B', 'A', 'B', 'A']
})
# Initialize PSOD
detector = PSOD(cat_columns=['category'])
# Detect outliers
outlier_scores = detector.fit_predict(df)

Full documentation is available at https://outlier-pseudo-supervised.readthedocs.io
Performance comparisons coming soon.
We welcome contributions! Please see our Contributing Guide for details.
If you use this software in your research, please cite:
@software{ribeiro2024psod,
author = {Ribeiro, Diogo},
title = {PSOD: Pseudo-Supervised Outlier Detection},
year = {2024},
publisher = {GitHub},
url = {https://github.com/diogoribeiro7/outlier_pseudo_supervised}
}

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Diogo Ribeiro - @diogoribeiro7
Project Link: https://github.com/diogoribeiro7/outlier_pseudo_supervised