🎯 PSOD: Pseudo-Supervised Outlier Detection library for tabular data. Novel approach using ensemble regression prediction errors as outlier scores. Supports mixed data types, multiple transformations, and comprehensive visualization tools.

DiogoRibeiro7/PSOD

Outlier Detection using Pseudo-Supervised Learning (PSOD)

Overview

PSOD (Pseudo-Supervised Outlier Detection) is a novel approach for detecting outliers in tabular data by treating each feature as a target variable and using the prediction errors from regression models as outlier scores.
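The idea in the paragraph above can be sketched in a few lines. This is only an illustration under stated assumptions, not the library's implementation: it swaps in an ordinary least-squares fit (via NumPy) for whatever regressors PSOD uses, fits and scores on the same rows, and the function name `pseudo_supervised_scores` is hypothetical.

```python
import numpy as np


def pseudo_supervised_scores(X: np.ndarray) -> np.ndarray:
    """Score each row by how poorly its features are predicted from the others.

    Hypothetical helper: plain least squares stands in for the regressors
    a real implementation would use.
    """
    X = np.asarray(X, dtype=float)
    n_rows, n_cols = X.shape
    errors = np.zeros((n_rows, n_cols))
    for j in range(n_cols):
        target = X[:, j]                      # feature j plays the "label"
        others = np.delete(X, j, axis=1)      # all remaining features
        design = np.column_stack([np.ones(n_rows), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residual = np.abs(target - design @ coef)
        scale = residual.std()
        # Put every feature's errors on a comparable scale before averaging.
        errors[:, j] = residual / scale if scale > 1e-12 else residual
    return errors.mean(axis=1)                # one score per row


# Two features with a linear relationship, broken in row 10.
X = np.column_stack([np.arange(20.0), 2 * np.arange(20.0) + 1])
X[10, 1] = 100.0
scores = pseudo_supervised_scores(X)
print(int(np.argmax(scores)))
```

A row that breaks the relationship between features is hard to predict, so its averaged error is large. A row that is extreme but still consistent with the other features would not stand out under this simple in-sample linear fit.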

Key Features

  • 🚀 Flexible Architecture: Supports any scikit-learn compatible regressor as base learner
  • 📊 Mixed Data Types: Handles both numerical and categorical features
  • 🔄 Multiple Transformations: Supports logarithmic, Yeo-Johnson, and no transformation
  • 🎯 Customizable Detection: Configure outlier detection on low end, high end, or both
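For readers unfamiliar with the transformations named in the list above, here is a minimal sketch of the two formulas. It assumes a hand-picked `lam` parameter; libraries such as scikit-learn's `PowerTransformer` estimate it from the data instead. How PSOD applies these transformations internally is not stated here, and the function names are hypothetical.

```python
import math


def log_transform(value: float) -> float:
    """log(1 + value); only defined for value > -1."""
    return math.log1p(value)


def yeo_johnson(value: float, lam: float) -> float:
    """Yeo-Johnson power transform for one value and a fixed lambda.

    Unlike the plain logarithm, it is defined for negative inputs too.
    """
    if value >= 0:
        if abs(lam) < 1e-12:
            return math.log1p(value)
        return ((value + 1.0) ** lam - 1.0) / lam
    if abs(lam - 2.0) < 1e-12:
        return -math.log1p(-value)
    return -(((1.0 - value) ** (2.0 - lam) - 1.0) / (2.0 - lam))


# lam = 1 leaves values unchanged; smaller lam compresses large positives.
for v in (-3.0, 0.0, 3.0, 1000.0):
    print(v, yeo_johnson(v, 1.0), yeo_johnson(v, 0.0))
```

Compressing a heavy right tail before fitting regressors keeps a few very large values from dominating the prediction errors, which is the usual motivation for offering such transformations.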

Installation

# TODO: Add PyPI installation instructions once the package is published; until then, install from source
pip install outlier-pseudo-supervised

From Source

git clone https://github.com/diogoribeiro7/outlier_pseudo_supervised.git
cd outlier_pseudo_supervised
pip install -e .

Quick Start

from psod import PSOD
import pandas as pd

# Build a small example frame; the last row is an obvious outlier
df = pd.DataFrame({
    'feature1': [1, 2, 3, 4, 100],
    'feature2': [10, 20, 30, 40, 1000],
    'category': ['A', 'B', 'A', 'B', 'A']
})

# Initialize PSOD
detector = PSOD(cat_columns=['category'])

# Detect outliers
outlier_scores = detector.fit_predict(df)
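Once scores are available, a common next step is to turn them into flags on the low end, the high end, or both. The sketch below assumes the scores form a plain sequence of numbers, one per row. The helper `flag_by_tail`, its `tail` and `n_std` parameters, and the mean-plus-standard-deviation cutoff are illustrative assumptions, not PSOD's API.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def flag_by_tail(scores: Sequence[float], tail: str = "high",
                 n_std: float = 1.5) -> List[bool]:
    """Flag scores lying more than n_std standard deviations from the mean.

    Hypothetical helper: tail selects "low", "high", or "both" ends.
    """
    if tail not in ("low", "high", "both"):
        raise ValueError("tail must be 'low', 'high', or 'both'")
    center = mean(scores)
    spread = pstdev(scores)
    low_cut = center - n_std * spread
    high_cut = center + n_std * spread
    flags = []
    for score in scores:
        is_low = score < low_cut
        is_high = score > high_cut
        if tail == "low":
            flags.append(is_low)
        elif tail == "high":
            flags.append(is_high)
        else:
            flags.append(is_low or is_high)
    return flags


# Stand-in scores; in practice these would come from the detector.
example_scores = [0.10, 0.20, 0.15, 0.12, 5.00]
print(flag_by_tail(example_scores, tail="high"))
```

With very few rows, a single extreme score inflates the standard deviation, so a rank-based or quantile-based cutoff may be more robust on small samples.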

Documentation

TODO: Add link to full documentation once Sphinx docs are generated

Full documentation available at https://outlier-pseudo-supervised.readthedocs.io

Benchmarks

TODO: Add comprehensive benchmark results comparing PSOD with other outlier detection methods

Performance comparisons coming soon.

Contributing

TODO: Create CONTRIBUTING.md with detailed contribution guidelines

We welcome contributions! Please see our Contributing Guide for details.

Citation

TODO: Add proper citation format once paper/preprint is published

If you use this software in your research, please cite:

@software{ribeiro2024psod,
  author = {Ribeiro, Diogo},
  title = {PSOD: Pseudo-Supervised Outlier Detection},
  year = {2024},
  publisher = {GitHub},
  url = {https://github.com/diogoribeiro7/outlier_pseudo_supervised}
}

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Contact

Diogo Ribeiro - @diogoribeiro7

Project Link: https://github.com/diogoribeiro7/outlier_pseudo_supervised
