Read Me First
^^^^^^^^^^^^^

Welcome to PyOD, a versatile Python library for detecting anomalies in multivariate data. Whether you're tackling a small-scale project or large datasets, PyOD offers a range of algorithms to suit your needs.

* **For time-series outlier detection**, please use `TODS <https://github.com/datamllab/tods>`_.

* **For graph outlier detection**, please use `PyGOD <https://pygod.org/>`_.

* **Performance Comparison & Datasets**: We have a comprehensive 45-page `anomaly detection benchmark paper <https://www.andrew.cmu.edu/user/yuezhao2/papers/22-neurips-adbench.pdf>`_. The fully `open-sourced ADBench <https://github.com/Minqi824/ADBench>`_ compares 30 anomaly detection algorithms on 57 benchmark datasets.

* **Learn more about anomaly detection** @ `Anomaly Detection Resources <https://github.com/yzhao062/anomaly-detection-resources>`_

* **PyOD on Distributed Systems**: you could also run `PyOD on Databricks <https://www.databricks.com/blog/2023/03/13/unsupervised-outlier-detection-databricks.html>`_.

----

About PyOD
^^^^^^^^^^
PyOD, established in 2017, has become a go-to **Python library** for **detecting anomalous/outlying objects** in
multivariate data. This exciting yet challenging field is commonly referred to as Outlier Detection or Anomaly Detection.
It is also well acknowledged by the machine learning community, with various dedicated posts/tutorials, including
`KDnuggets <https://www.kdnuggets.com/2019/02/outlier-detection-methods-cheat-sheet.html>`_.

**PyOD is featured for**:
* **Unified, User-Friendly Interface** across various algorithms.
* **Wide Range of Models**, from classic techniques to the latest deep learning methods.
* **High Performance & Efficiency**, leveraging `numba <https://github.com/numba/numba>`_ and `joblib <https://github.com/joblib/joblib>`_ for JIT compilation and parallel processing.
* **Fast Training & Prediction**, achieved through the SUOD framework [#Zhao2021SUOD]_.

**Outlier Detection with 5 Lines of Code**\ :

.. code-block:: python

    # Example: Training an ECOD detector
    from pyod.models.ecod import ECOD
    clf = ECOD()
    clf.fit(X_train)
    y_train_scores = clf.decision_scores_          # Outlier scores for training data
    y_test_scores = clf.decision_function(X_test)  # Outlier scores for test data

**Selecting the Right Algorithm**: Unsure where to start? Consider these robust and interpretable options:

- `ECOD <https://github.com/yzhao062/pyod/blob/master/examples/ecod_example.py>`_: Example of using ECOD for outlier detection
- `Isolation Forest <https://github.com/yzhao062/pyod/blob/master/examples/iforest_example.py>`_: Example of using Isolation Forest for outlier detection

Alternatively, explore `MetaOD <https://github.com/yzhao062/MetaOD>`_ for a data-driven approach.

**Citing PyOD**\ :

If you use PyOD in a scientific publication, we would appreciate citations to the following paper::

    Zhao, Y., Nasrullah, Z. and Li, Z., 2019. PyOD: A Python Toolbox for Scalable Outlier Detection. Journal of Machine Learning Research (JMLR), 20(96), pp.1-7.

For a broader perspective on anomaly detection, see our NeurIPS papers
`ADBench: Anomaly Detection Benchmark Paper <https://viterbi-web.usc.edu/~yzhao010/papers/22-neurips-adbench.pdf>`_ and `ADGym: Design Choices for Deep Anomaly Detection <https://viterbi-web.usc.edu/~yzhao010/papers/23-neurips-adgym.pdf>`_::

    @article{han2022adbench,
        title={Adbench: Anomaly detection benchmark},
        author={Han, Songqiao and Hu, Xiyang and Huang, Hailiang and Jiang, Minqi and Zhao, Yue},
        journal={Advances in Neural Information Processing Systems},
        volume={35},
        pages={32142--32159},
        year={2022}
    }

    @article{jiang2023adgym,
        title={ADGym: Design Choices for Deep Anomaly Detection},
        author={Jiang, Minqi and Hou, Chaochuan and Zheng, Ao and Han, Songqiao and Huang, Hailiang and Wen, Qingsong and Hu, Xiyang and Zhao, Yue},
        journal={Advances in Neural Information Processing Systems},
        volume={36},
        year={2023}
    }

Installation
^^^^^^^^^^^^

PyOD is designed for easy installation using either **pip** or **conda**.
We recommend using the latest version of PyOD due to frequent updates and enhancements:

.. code-block:: bash

   pip install pyod            # normal install
   pip install --upgrade pyod  # or update if needed

Alternatively, you could clone the repository and run the setup.py file.

**Required Dependencies**\ :

* Python 3.6 or higher
* joblib
* matplotlib
* numpy>=1.19
* combo (optional, required for models/combination.py and FeatureBagging)
* keras/tensorflow (optional, required for AutoEncoder, and other deep learning models)
* suod (optional, required for running SUOD model)
* xgboost (optional, required for XGBOD)
* pythresh (optional, required for thresholding)
**Warning**\ :
PyOD includes several neural network-based models, such as AutoEncoders, implemented in TensorFlow and PyTorch. These deep learning libraries are not automatically installed by PyOD, to avoid conflicts with existing installations. If you plan to use neural-net based models, please ensure these libraries are installed; see the `neural-net FAQ <https://github.com/yzhao062/pyod/wiki/Setting-up-Keras-and-Tensorflow-for-Neural-net-Based-models>`_ for guidance. Additionally, xgboost is not installed by default but is required for models like XGBOD.
----

API Cheatsheet & Reference
^^^^^^^^^^^^^^^^^^^^^^^^^^

The full API Reference is available at `PyOD Documentation <https://pyod.readthedocs.io/en/latest/pyod.html>`_. Below is a quick cheatsheet for all detectors:
* **fit(X)**\ : Fit the detector. The parameter y is ignored in unsupervised methods.
* **decision_function(X)**\ : Predict raw anomaly scores for X using the fitted detector.
* **predict(X)**\ : Determine whether a sample is an outlier or not as binary labels using the fitted detector.
* **predict_proba(X)**\ : Estimate the probability of a sample being an outlier using the fitted detector.
* **predict_confidence(X)**\ : Assess the model's confidence on a per-sample basis (applicable in predict and predict_proba) [#Perini2020Quantifying]_.

**Key Attributes of a fitted model**:

* **decision_scores_**\ : Outlier scores of the training data; higher scores typically indicate more abnormal behavior.
* **labels_**\ : Binary labels of the training data, where 0 indicates inliers and 1 indicates outliers/anomalies.

----

ADBench Benchmark and Datasets
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We have released the most comprehensive, 45-page benchmark study, `ADBench: Anomaly Detection Benchmark <https://arxiv.org/abs/2206.09426>`_ [#Han2022ADBench]_.
The fully `open-sourced ADBench <https://github.com/Minqi824/ADBench>`_ compares 30 anomaly detection algorithms on 57 benchmark datasets.

The organization of **ADBench** is provided below:

.. (figure omitted in this excerpt; alt text: benchmark-fig)

**The comparison of selected models** is made available below.

----

The following is from ``docs/api_cc.rst``:

API CheatSheet
==============

The full API Reference is available at `PyOD Documentation <https://pyod.readthedocs.io/en/latest/pyod.html>`_. Below is a quick cheatsheet for all detectors:

* :func:`pyod.models.base.BaseDetector.fit`: Fit the detector; the parameter y is ignored in unsupervised methods.
* :func:`pyod.models.base.BaseDetector.decision_function`: Predict raw anomaly scores for X using the fitted detector.
* :func:`pyod.models.base.BaseDetector.predict`: Determine whether a sample is an outlier or not as binary labels using the fitted detector.
* :func:`pyod.models.base.BaseDetector.predict_proba`: Estimate the probability of a sample being an outlier using the fitted detector.
* :func:`pyod.models.base.BaseDetector.predict_confidence`: Assess the model's confidence on a per-sample basis (applicable in predict and predict_proba) [#Perini2020Quantifying]_.

**Key Attributes of a fitted model**:

* :attr:`pyod.models.base.BaseDetector.decision_scores_`: Outlier scores of the training data; higher scores typically indicate more abnormal behavior.
* :attr:`pyod.models.base.BaseDetector.labels_`: Binary labels of the training data, where 0 indicates inliers and 1 indicates outliers/anomalies.

----

The following is from ``docs/benchmark.rst``:

Benchmarks
==========

Latest ADBench (2022)
---------------------

We just released the most comprehensive, 36-page `anomaly detection benchmark paper <https://arxiv.org/abs/2206.09426>`_ :cite:`a-han2022adbench`.
The fully `open-sourced ADBench <https://github.com/Minqi824/ADBench>`_ compares 30 anomaly detection algorithms on 55 benchmark datasets.

The organization of **ADBench** is provided below:

.. (figure omitted in this excerpt; alt text: benchmark)

For a simpler visualization, we make **the comparison of selected models** via