Commit b95b82a

Merge pull request #530 from yzhao062/development
v1.1.1
2 parents 1e15311 + f0bfce8 commit b95b82a

15 files changed: 703 additions, 71 deletions

.github/workflows/testing-cron.yml

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ jobs:
     - name: Install dependencies
       run: |
         python -m pip install --upgrade pip
-        pip install -r requirements_ci.txt
+        pip install -r docs/requirements.txt
         pip install pytest
         pip install coverage
         pip install coveralls

.readthedocs.yaml

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+# .readthedocs.yaml
+# Read the Docs configuration file
+# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
+
+# Required
+version: 2
+
+# Set the version of Python and other tools you might need
+build:
+  os: ubuntu-22.04
+  tools:
+    python: "3.11"
+
+# Build documentation in the docs/ directory with Sphinx
+sphinx:
+  configuration: docs/conf.py
+
+# We recommend specifying your dependencies to enable reproducible builds:
+# https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
+python:
+  install:
+    - requirements: docs/requirements.txt

.travis.yml

Lines changed: 0 additions & 37 deletions
This file was deleted.

CHANGES.txt

Lines changed: 2 additions & 1 deletion
@@ -178,4 +178,5 @@ v<1.0.8>, <03/08/2023> -- Add QMCD detector (#452).
 v<1.0.8>, <03/08/2023> -- Optimized ECDF and drop Statsmodels dependency (#467).
 v<1.0.9>, <03/19/2023> -- Hot fix for errors in ECOD and COPOD due to the issue of scipy.
 v<1.1.0>, <06/19/2023> -- Further integration of PyThresh.
-v<1.1.1>, <07/03/2023> -- Bump up sklearn requirement and some hot fixes.
+v<1.1.1>, <07/03/2023> -- Bump up sklearn requirement and some hot fixes.
+v<1.1.1>, <10/24/2023> -- Add deep isolation forest (#506)
README.rst

Lines changed: 7 additions & 3 deletions
@@ -58,7 +58,7 @@ Python Outlier Detection (PyOD)
 
 -----
 
-**News**: We just released a 45-page, the most comprehensive `anomaly detection benchmark paper <https://www.andrew.cmu.edu/user/yuezhao2/papers/22-neurips-adbench.pdf>`_.
+**News**: We have a 45-page, the most comprehensive `anomaly detection benchmark paper <https://www.andrew.cmu.edu/user/yuezhao2/papers/22-neurips-adbench.pdf>`_.
 The fully `open-sourced ADBench <https://github.com/Minqi824/ADBench>`_ compares 30 anomaly detection algorithms on 57 benchmark datasets.
 
 **For time-series outlier detection**, please use `TODS <https://github.com/datamllab/tods>`_.
@@ -70,7 +70,7 @@ multivariate data. This exciting yet challenging field is commonly referred as
 or `Anomaly Detection <https://en.wikipedia.org/wiki/Anomaly_detection>`_.
 
 PyOD includes more than 40 detection algorithms, from classical LOF (SIGMOD 2000) to
-the latest ECOD (TKDE 2022). Since 2017, PyOD has been successfully used in numerous academic researches and
+the latest ECOD and DIF (TKDE 2022 and 2023). Since 2017, PyOD has been successfully used in numerous academic researches and
 commercial products with more than `10 million downloads <https://pepy.tech/project/pyod>`_.
 It is also well acknowledged by the machine learning community with various dedicated posts/tutorials, including
 `Analytics Vidhya <https://www.analyticsvidhya.com/blog/2019/02/outlier-detection-python-pyod/>`_,
@@ -199,9 +199,10 @@ Alternatively, you could clone and run setup.py file:
 * numpy>=1.19
 * numba>=0.51
 * scipy>=1.5.1
-* scikit_learn>=0.20.0
+* scikit_learn>=0.22.0
 * six
 
+
 **Optional Dependencies (see details below)**\ :
 
 * combo (optional, required for models/combination.py and FeatureBagging)
@@ -392,6 +393,7 @@ Proximity-Based SOD Subspace Outlier Detection
 Proximity-Based ROD Rotation-based Outlier Detection 2020 [#Almardeny2020A]_
 Outlier Ensembles IForest Isolation Forest 2008 [#Liu2008Isolation]_
 Outlier Ensembles INNE Isolation-based Anomaly Detection Using Nearest-Neighbor Ensembles 2018 [#Bandaragoda2018Isolation]_
+Outlier Ensembles DIF Deep Isolation Forest for Anomaly Detection 2023 [#Xu2023Deep]_
 Outlier Ensembles FB Feature Bagging 2005 [#Lazarevic2005Feature]_
 Outlier Ensembles LSCP LSCP: Locally Selective Combination of Parallel Outlier Ensembles 2019 [#Zhao2019LSCP]_
 Outlier Ensembles XGBOD Extreme Boosting Based Outlier Detection **(Supervised)** 2018 [#Zhao2018XGBOD]_
@@ -684,6 +686,8 @@ Reference
 
 .. [#Wang2020adVAE] Wang, X., Du, Y., Lin, S., Cui, P., Shen, Y. and Yang, Y., 2019. adVAE: A self-adversarial variational autoencoder with Gaussian anomaly prior knowledge for anomaly detection. *Knowledge-Based Systems*.
 
+.. [#Xu2023Deep] Xu, H., Pang, G., Wang, Y., Wang, Y., 2023. Deep isolation forest for anomaly detection. *IEEE Transactions on Knowledge and Data Engineering*.
+
 .. [#You2017Provable] You, C., Robinson, D.P. and Vidal, R., 2017. Provable self-representation based outlier detection in a union of subspaces. In Proceedings of the IEEE conference on computer vision and pattern recognition.
 
 .. [#Zenati2018Adversarially] Zenati, H., Romain, M., Foo, C.S., Lecouat, B. and Chandrasekhar, V., 2018, November. Adversarially learned anomaly detection. In 2018 IEEE International conference on data mining (ICDM) (pp. 727-736). IEEE.

docs/index.rst

Lines changed: 2 additions & 1 deletion
@@ -76,7 +76,7 @@ multivariate data. This exciting yet challenging field is commonly referred as
 or `Anomaly Detection <https://en.wikipedia.org/wiki/Anomaly_detection>`_.
 
 PyOD includes more than 40 detection algorithms, from classical LOF (SIGMOD 2000) to
-the latest ECOD (TKDE 2022). Since 2017, PyOD :cite:`a-zhao2019pyod` has been successfully used in numerous
+the latest ECOD and DIF (TKDE 2022 and 2023). Since 2017, PyOD :cite:`a-zhao2019pyod` has been successfully used in numerous
 academic researches and commercial products with more than `10 million downloads <https://pepy.tech/project/pyod>`_.
 It is also well acknowledged by the machine learning community with various dedicated posts/tutorials, including
 `Analytics Vidhya <https://www.analyticsvidhya.com/blog/2019/02/outlier-detection-python-pyod/>`_,
@@ -209,6 +209,7 @@ Proximity-Based SOD Subspace Outlier Detection
 Proximity-Based ROD Rotation-based Outlier Detection 2020 :class:`pyod.models.rod.ROD` :cite:`a-almardeny2020novel`
 Outlier Ensembles IForest Isolation Forest 2008 :class:`pyod.models.iforest.IForest` :cite:`a-liu2008isolation,a-liu2012isolation`
 Outlier Ensembles INNE Isolation-based Anomaly Detection Using Nearest-Neighbor Ensembles 2018 :class:`pyod.models.inne.INNE` :cite:`a-bandaragoda2018isolation`
+Outlier Ensembles DIF Deep Isolation Forest for Anomaly Detection 2023 :class:`pyod.models.dif.DIF` :cite:`a-Xu2023Deep`
 Outlier Ensembles FB Feature Bagging 2005 :class:`pyod.models.feature_bagging.FeatureBagging` :cite:`a-lazarevic2005feature`
 Outlier Ensembles LSCP LSCP: Locally Selective Combination of Parallel Outlier Ensembles 2019 :class:`pyod.models.lscp.LSCP` :cite:`a-zhao2019lscp`
 Outlier Ensembles XGBOD Extreme Boosting Based Outlier Detection **(Supervised)** 2018 :class:`pyod.models.xgbod.XGBOD` :cite:`a-zhao2018xgbod`

docs/pyod.models.rst

Lines changed: 11 additions & 0 deletions
@@ -105,6 +105,17 @@ pyod.models.deep\_svdd module
     :show-inheritance:
     :inherited-members:
 
+pyod.models.dif module
+-----------------------------
+
+.. automodule:: pyod.models.dif
+    :members:
+    :exclude-members:
+    :undoc-members:
+    :show-inheritance:
+    :inherited-members:
+
+
 pyod.models.ecod module
 ------------------------

docs/requirements.txt

Lines changed: 3 additions & 2 deletions
@@ -6,17 +6,18 @@ keras
 matplotlib
 nose
 numpy>=1.19
-numba==0.53 # need to lift this later see github for issue
+numba>=0.51
 pyclustering
 pytest
 pythresh>=0.3.1
 ruptures
 scipy>=1.5.1
-scikit_learn>=0.20.0
+scikit_learn>=0.22.0
 scikit-lego
 six
 sphinx-rtd-theme
 sphinxcontrib-bibtex
+statsmodels
 suod
 tensorflow
 torch
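
This hunk replaces an exact pin (`numba==0.53`) with a lower bound and raises the scikit-learn floor to 0.22.0. The sketch below shows what a `>=` bound means when versions are compared as integer tuples, not as strings. It is a simplified illustration, not how pip resolves requirements. The helpers `parse_min_pin` and `satisfies` are hypothetical and ignore pre-release tags, extras, and environment markers.

```python
import re


def parse_min_pin(requirement):
    """Split 'name>=X.Y.Z' into (name, (X, Y, Z)); return None for other forms."""
    text = requirement.split("#", 1)[0].strip()  # drop trailing comments
    match = re.fullmatch(r"([A-Za-z0-9_.\-]+)\s*>=\s*(\d+(?:\.\d+)*)", text)
    if match is None:
        return None
    minimum = tuple(int(part) for part in match.group(2).split("."))
    return match.group(1), minimum


def satisfies(installed, minimum):
    """Check a plain numeric version string against a minimum version tuple."""
    parts = tuple(int(part) for part in installed.split("."))
    width = max(len(parts), len(minimum))
    # Pad with zeros so that "0.51" and (0, 51, 0) compare as equal.
    padded_parts = parts + (0,) * (width - len(parts))
    padded_minimum = minimum + (0,) * (width - len(minimum))
    return padded_parts >= padded_minimum


if __name__ == "__main__":
    name, minimum = parse_min_pin("scikit_learn>=0.22.0")
    for candidate in ("0.21.3", "0.22.0", "1.3.0"):
        print(name, candidate, satisfies(candidate, minimum))
```

Tuple comparison matters here because a string comparison would rank "0.9" above "0.22".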

docs/zreferences.bib

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -489,4 +489,15 @@ @article{fang2001wrap
489489
pages={608--624},
490490
year={2001},
491491
publisher={Elsevier}
492+
}
493+
494+
@article{xu2023dif,
495+
author={Xu, Hongzuo and Pang, Guansong and Wang, Yijie and Wang, Yongjun},
496+
journal={IEEE Transactions on Knowledge and Data Engineering},
497+
title={Deep Isolation Forest for Anomaly Detection},
498+
year={2023},
499+
volume={},
500+
number={},
501+
pages={1-14},
502+
doi={10.1109/TKDE.2023.3270293}
492503
}
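
In the hunks shown on this page, docs/index.rst cites `a-Xu2023Deep` while the entry added to docs/zreferences.bib is keyed `xu2023dif`. Whether the two resolve to each other depends on configuration not visible here. A quick consistency check between `:cite:` roles and BibTeX keys could be sketched as follows. It assumes that `a-` is a key prefix stripped before lookup and that keys compare case-insensitively. Both are assumptions, and all function names are hypothetical.

```python
import re

CITE_RE = re.compile(r":cite:`([^`]+)`")
BIB_KEY_RE = re.compile(r"@\w+\{\s*([^,\s]+)\s*,")


def cited_keys(rst_text, prefix="a-"):
    """Collect keys used in :cite: roles, with the assumed prefix removed."""
    keys = set()
    for group in CITE_RE.findall(rst_text):
        for key in group.split(","):
            key = key.strip()
            if key.startswith(prefix):
                key = key[len(prefix):]
            keys.add(key.lower())
    return keys


def bib_keys(bib_text):
    """Collect entry keys such as 'xu2023dif' from '@article{xu2023dif,'."""
    return {key.lower() for key in BIB_KEY_RE.findall(bib_text)}


def missing_keys(rst_text, bib_text, prefix="a-"):
    """Return cited keys that have no matching BibTeX entry, sorted."""
    return sorted(cited_keys(rst_text, prefix) - bib_keys(bib_text))


if __name__ == "__main__":
    rst = "DIF :cite:`a-Xu2023Deep` and FB :cite:`a-lazarevic2005feature`"
    bib = "@article{xu2023dif,\n  year={2023}\n}\n@inproceedings{lazarevic2005feature,\n}"
    print(missing_keys(rst, bib))
```

A non-empty result only flags candidates for review, because a bibliography can also be assembled from other files.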

examples/dif_example.py

Lines changed: 53 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,53 @@
1+
# -*- coding: utf-8 -*-
2+
"""Example of using Deep Isolation Forest for
3+
outlier detection"""
4+
# Author: Hongzuo Xu <hongzuoxu@126.com>
5+
# License: BSD 2 clause
6+
7+
from __future__ import division
8+
from __future__ import print_function
9+
10+
import os
11+
import sys
12+
13+
# temporary solution for relative imports in case pyod is not installed
14+
# if pyod is installed, no need to use the following line
15+
sys.path.append(
16+
os.path.abspath(os.path.join(os.path.dirname("__file__"), '..')))
17+
18+
from pyod.models.dif import DIF
19+
from pyod.utils.data import generate_data
20+
from pyod.utils.data import evaluate_print
21+
22+
if __name__ == "__main__":
23+
contamination = 0.1 # percentage of outliers
24+
n_train = 1000 # number of training points
25+
n_test = 200 # number of testing points
26+
n_features = 30 # number of features
27+
28+
# Generate sample data
29+
X_train, X_test, y_train, y_test = \
30+
generate_data(n_train=n_train,
31+
n_test=n_test,
32+
n_features=n_features,
33+
contamination=contamination,
34+
random_state=42)
35+
36+
# train deep isolation forest detector
37+
clf_name = 'DIF'
38+
clf = DIF()
39+
clf.fit(X_train)
40+
41+
# get the prediction labels and outlier scores of the training data
42+
y_train_pred = clf.labels_ # binary labels (0: inliers, 1: outliers)
43+
y_train_scores = clf.decision_scores_ # raw outlier scores
44+
45+
# get the prediction on the test data
46+
y_test_pred = clf.predict(X_test) # outlier labels (0 or 1)
47+
y_test_scores = clf.decision_function(X_test) # outlier scores
48+
49+
# evaluate and print the results
50+
print("\nOn Training Data:")
51+
evaluate_print(clf_name, y_train, y_train_scores)
52+
print("\nOn Test Data:")
53+
evaluate_print(clf_name, y_test, y_test_scores)
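
The example above reads both raw scores (`decision_scores_`) and binary labels (`labels_`) from the fitted detector. The sketch below shows one way such labels can be derived from scores and a `contamination` rate: a percentile threshold on the training scores. It is a standalone approximation under that assumption, written without PyOD, and is not taken from this commit. The helpers `percentile` and `label_scores` are hypothetical.

```python
def percentile(values, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100) of a non-empty list."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def label_scores(scores, contamination=0.1):
    """Return (threshold, labels); scores above the threshold are labeled 1."""
    # Assumption: the cut-off is the (1 - contamination) percentile of the scores.
    threshold = percentile(scores, 100 * (1 - contamination))
    labels = [int(score > threshold) for score in scores]
    return threshold, labels


if __name__ == "__main__":
    toy_scores = [0.10, 0.20, 0.15, 0.30, 0.25, 0.12, 0.18, 0.22, 0.90, 0.11]
    threshold, labels = label_scores(toy_scores, contamination=0.1)
    print(threshold, labels)
```

With ten scores and `contamination=0.1`, only the clearly largest score is flagged, which matches the intuition that roughly 10% of the training points are treated as outliers.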
