
Commit b5be01d

Merge pull request #4374 from VesnaT/outliers_docs
[DOC] outlier_detection: data-mining-library docs
2 parents ecbbe9e + ee12657 commit b5be01d

File tree

7 files changed (+73, -36 lines)


Orange/widgets/data/owoutliers.py

Lines changed: 1 addition & 1 deletion

@@ -190,7 +190,7 @@ def init_gui(self):
 
         self._init_editors()
 
-        gui.auto_send(self.controlArea, self, "auto_commit")
+        gui.auto_apply(self.controlArea, self, "auto_commit")
 
         self.info.set_input_summary(self.info.NoInput)
         self.info.set_output_summary(self.info.NoOutput)

doc/data-mining-library/source/index.rst

Lines changed: 1 addition & 0 deletions

@@ -35,6 +35,7 @@ Available classes and methods.
 
    reference/data
    reference/preprocess
+   reference/outliers
    reference/classification
    reference/regression
    reference/clustering

doc/data-mining-library/source/reference/classification.rst

Lines changed: 0 additions & 22 deletions

@@ -109,18 +109,6 @@ Nu-Support Vector Machines
    :members:
 
 
-
-.. index:: one class SVM
-   pair: classification; one class SVM
-
-One Class Support Vector Machines
----------------------------------
-
-.. autoclass:: OneClassSVMLearner
-   :members:
-
-
-
 .. index:: classification tree
    pair: classification; tree
 
@@ -158,16 +146,6 @@ Majority Classifier
    :members:
 
 
-.. index:: elliptic envelope
-   pair: classification; elliptic envelope
-
-Elliptic Envelope
------------------
-
-.. autoclass:: EllipticEnvelopeLearner
-   :members:
-
-
 .. index:: neural network
    pair: classification; neural network
 
Lines changed: 45 additions & 0 deletions

@@ -0,0 +1,45 @@
+#########################################
+Outlier detection (``classification``)
+#########################################
+
+.. automodule:: Orange.classification
+
+.. index:: one class SVM
+   pair: classification; one class SVM
+
+One Class Support Vector Machines
+---------------------------------
+
+.. autoclass:: OneClassSVMLearner
+   :members:
+
+
+
+.. index:: elliptic envelope
+   pair: classification; elliptic envelope
+
+Elliptic Envelope
+-----------------
+
+.. autoclass:: EllipticEnvelopeLearner
+   :members:
+
+
+.. index:: local outlier factor
+   pair: classification; local outlier factor
+
+Local Outlier Factor
+--------------------
+
+.. autoclass:: LocalOutlierFactorLearner
+   :members:
+
+
+.. index:: isolation forest
+   pair: classification; isolation forest
+
+Isolation Forest
+----------------
+
+.. autoclass:: IsolationForestLearner
+   :members:
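The learners documented in this new reference page wrap outlier detectors from scikit-learn (the widget documentation below links the corresponding estimators). As a rough, hedged illustration of the behaviour they expose, here is a sketch using scikit-learn's `IsolationForest` directly; the synthetic data, parameters, and planted outlier are illustrative assumptions, not taken from the commit.

```python
# Illustrative sketch (not part of the commit): the Orange learners added
# above wrap scikit-learn estimators; here IsolationForest is applied
# directly to synthetic data containing one planted outlier.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 2))           # a Gaussian cloud of inliers
X = np.vstack([X, [[6.0, 6.0]]])        # one point far from the cloud

clf = IsolationForest(n_estimators=100, contamination=0.05, random_state=0)
labels = clf.fit(X).predict(X)          # 1 = inlier, -1 = outlier

print(labels[-1])                       # the planted point should be flagged
```

With `contamination=0.05`, roughly five percent of the samples are scored as outliers; the far-away planted point is among them.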
(binary image files changed: -63.9 KB, +7.98 KB)
Lines changed: 26 additions & 13 deletions

@@ -1,36 +1,49 @@
 Outliers
 ========
 
-Simple outlier detection by comparing distances between instances.
+Outlier detection widget.
 
 **Inputs**
 
 - Data: input dataset
-- Distances: distance matrix
 
 **Outputs**
 
 - Outliers: instances scored as outliers
 - Inliers: instances not scored as outliers
+- Data: input dataset with appended *Outlier* variable
 
-The **Outliers** widget applies one of two methods for outlier detection. Both methods apply classification to the dataset, one with SVM (multiple kernels) and the other with an elliptic envelope. *One-class SVM with non-linear kernels (RBF)* performs well with non-Gaussian distributions, while *Covariance estimator* works only for data with a Gaussian distribution.
+The **Outliers** widget applies one of four methods for outlier detection. All methods apply classification to the dataset. *One-class SVM with non-linear kernels (RBF)* performs well with non-Gaussian distributions, while *Covariance estimator* works only for data with a Gaussian distribution. An efficient way to perform outlier detection on moderately high-dimensional datasets is the *Local Outlier Factor* algorithm, which computes a score reflecting the degree of abnormality of each observation by measuring the local density deviation of a data point with respect to its neighbors. Another efficient way of performing outlier detection in high-dimensional datasets is to use random forests (*Isolation Forest*).
 
 ![](images/Outliers-stamped.png)
 
-1. Information on the input data, number of inliers and outliers based on the selected model.
-2. Select the *Outlier detection method*:
+1. Method for outlier detection:
+   - [One Class SVM](http://scikit-learn.org/stable/modules/generated/sklearn.svm.OneClassSVM.html)
+   - [Covariance Estimator](http://scikit-learn.org/stable/modules/generated/sklearn.covariance.EllipticEnvelope.html)
+   - [Local Outlier Factor](http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.LocalOutlierFactor.html)
+   - [Isolation Forest](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html)
+2. Set parameters for the method:
    - **One class SVM with non-linear kernel (RBF)**: classifies data as similar or different from the core class:
-     - **Nu** is a parameter for the upper bound on the fraction of training errors and a lower bound on the fraction of support vectors
-     - **Kernel coefficient** is a gamma parameter, which specifies how much influence a single data instance has
-   - **Covariance estimator**: fits an ellipse to central points with the Mahalanobis distance metric
-     - **Contamination** is the proportion of outliers in the dataset
-     - **Support fraction** specifies the proportion of points included in the estimate
-3. Produce a report.
-4. Click *Detect outliers* to output the data.
+     - *Nu* is a parameter for the upper bound on the fraction of training errors and a lower bound on the fraction of support vectors
+     - *Kernel coefficient* is the gamma parameter, which specifies how much influence a single data instance has
+   - **Covariance estimator**: fits an ellipse to central points with the Mahalanobis distance metric:
+     - *Contamination* is the proportion of outliers in the dataset
+     - *Support fraction* specifies the proportion of points included in the estimate
+   - **Local Outlier Factor**: obtains the local density from the k-nearest neighbors:
+     - *Contamination* is the proportion of outliers in the dataset
+     - *Neighbors* is the number of neighbors considered
+     - *Metric* is the distance measure
+   - **Isolation Forest**: isolates observations by randomly selecting a feature and then randomly selecting a split value between the maximum and minimum values of the selected feature:
+     - *Contamination* is the proportion of outliers in the dataset
+     - *Replicable training* fixes the random seed
+3. If *Apply automatically* is ticked, changes are propagated automatically. Alternatively, click *Apply*.
+4. Produce a report.
+5. Number of instances on the input, followed by the number of instances scored as inliers.
 
 Example
 -------
 
-Below is a simple example of how to use this widget. We used the *Iris* dataset to detect the outliers. We chose the *one class SVM with non-linear kernel (RBF)* method, with Nu set at 20% (fewer training errors, more support vectors). Then we observed the outliers in the [Data Table](../data/datatable.md) widget, while we sent the inliers to the [Scatter Plot](../visualize/scatterplot.md).
+Below is an example of how to use this widget. We used a subset (the *versicolor* and *virginica* instances) of the *Iris* dataset to detect the outliers. We chose the *Local Outlier Factor* method with the *Euclidean* distance. Then we observed the annotated instances in the [Scatter Plot](../visualize/scatterplot.md) widget. In the next step we used the *setosa* instances to demonstrate novelty detection with the [Apply Domain](../data/applydomain.md) widget. After concatenating both outputs we examined the outliers in *Scatter Plot (1)*.
 
 ![](images/Outliers-Example.png)
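The *Local Outlier Factor* method chosen in the widget documentation above compares each point's local density with that of its neighbours; a score well above 1 marks an outlier. A minimal pure-Python sketch of that idea (illustrative only; the widget itself delegates to scikit-learn's implementation, and the tiny dataset and `k=2` are assumptions for the demo):

```python
# Minimal sketch of the Local Outlier Factor idea: a point whose local
# density is much lower than its neighbours' scores well above 1.
import math

def lof_scores(points, k=2):
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    def knn(i):
        # indices of the k nearest neighbours of point i
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: dist(points[i], points[j]))
        return others[:k]

    # k-distance: distance to the k-th nearest neighbour
    kdist = [dist(points[i], points[knn(i)[-1]]) for i in range(len(points))]

    def lrd(i):
        # local reachability density: inverse mean reachability distance
        reach = [max(kdist[j], dist(points[i], points[j])) for j in knn(i)]
        return k / sum(reach)

    # LOF: ratio of the neighbours' densities to the point's own density
    return [sum(lrd(j) for j in knn(i)) / (k * lrd(i))
            for i in range(len(points))]

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8)]  # last point is isolated
scores = lof_scores(pts)
```

On this data the four clustered points score close to 1, while the isolated point at (8, 8) scores far above 1 and would be flagged as an outlier.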
