Commit e8ac4b5

[E&A] Completes feature processors content (#1281)
## Overview

Closes #1267

This PR adds content to the [Feature processors](https://www.elastic.co/docs/explore-analyze/machine-learning/data-frame-analytics/ml-feature-processors) page that was accidentally left behind during the migration process. It also amends the links on the page.
1 parent d470d96 commit e8ac4b5

5 files changed: +56 -5 lines changed


explore-analyze/machine-learning/data-frame-analytics/ml-feature-processors.md

Lines changed: 56 additions & 5 deletions
@@ -17,8 +17,59 @@ Refer to the `feature_processors` property of the [Create {{dfanalytics-job}} AP
1717

1818
Available feature processors:
1919

20-
* [Frequency encoding](https://www.elastic.co/guide/en/machine-learning/current/frequency-encoding.html)
21-
* [Multi encoding](https://www.elastic.co/guide/en/machine-learning/current/multi-encoding.html)
22-
* [n-gram encoding](https://www.elastic.co/guide/en/machine-learning/current/ngram-encoding.html)
23-
* [One hot encoding](https://www.elastic.co/guide/en/machine-learning/current/one-hot-encoding.html)
24-
* [Target mean encoding](https://www.elastic.co/guide/en/machine-learning/current/target-mean-encoding.html)
20+
* [Frequency encoding](#frequency-encoding)
21+
* [Multi encoding](#multi-encoding)
22+
* [n-gram encoding](#ngram-encoding)
23+
* [One hot encoding](#one-hot-encoding)
24+
* [Target mean encoding](#target-mean-encoding)
25+
26+
## Frequency encoding [frequency-encoding]
27+
28+
Frequency encoding takes into account how many times a given categorical feature is present in relation to the value of the encoded field.
29+
The more frequently the feature is present, the greater the weight of the feature in the data set.
30+
With this encoding technique, it is not possible to recover the categorical values after encoding, because different categories may have the same frequency.
31+
32+
:::{image} /explore-analyze/images/frequency-encoding.jpg
33+
:alt: Frequency encoding
34+
:::
35+
36+
*The figure shows a simple frequency encoding example. The Animal_freq value of `cat` is 0.5 because `cat` accounts for half of the values in the field. The labels `dog` and `crocodile` occur only once each, so their Animal_freq value is 0.25.*
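
The arithmetic in the figure can be reproduced with a minimal Python sketch. This is an illustration only, not how {{dfanalytics}} implements the processor; the function name `frequency_encode` and the four-row sample column are assumptions chosen to match the caption.

```python
from collections import Counter


def frequency_encode(values):
    """Replace each category with its relative frequency in the column.

    Hypothetical helper for illustration; not part of any Elastic API.
    """
    counts = Counter(values)
    total = len(values)
    # Map each category to (occurrences / number of rows).
    frequency_map = {category: count / total for category, count in counts.items()}
    return [frequency_map[value] for value in values], frequency_map


# Assumed sample column: `cat` twice, `dog` and `crocodile` once each.
animals = ["cat", "dog", "cat", "crocodile"]
animal_freq, frequency_map = frequency_encode(animals)
```

Because `dog` and `crocodile` both map to 0.25 here, the encoded column alone cannot tell them apart, which is why the encoding cannot be reversed.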
37+
38+
## Multi encoding [multi-encoding]
39+
40+
Multi encoding enables you to use multiple processors in the same {{dfanalytics-job}}.
41+
You can define an ordered sequence of processors in which the output of a processor can be forwarded to the next processor as an input.
42+
For example, you can define an n-gram feature processor that creates a series of n-grams that can be encoded by a chained one hot encoding processor.
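
The chaining idea can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the {{dfanalytics}} implementation: the helpers `bigrams`, `one_hot`, and `run_chain`, and the three-entry vocabulary, are all hypothetical.

```python
def bigrams(text):
    """First processor: split a string into its n-grams of size 2."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def one_hot(tokens, vocabulary):
    """Second processor: mark which vocabulary entries occur in the tokens."""
    return {entry: int(entry in tokens) for entry in vocabulary}


def run_chain(value, processors):
    """Apply the processors in order, feeding each output to the next one."""
    for processor in processors:
        value = processor(value)
    return value


# Assumed vocabulary of n-grams seen in the training data.
vocabulary = ["at", "ca", "do"]
processors = [bigrams, lambda tokens: one_hot(tokens, vocabulary)]
encoded = run_chain("cat", processors)
```

The order of the list matters: the one hot step only makes sense once the n-gram step has produced tokens for it to encode.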
43+
44+
## n-gram encoding [ngram-encoding]
45+
46+
n-gram encoding encodes a string into a collection of n-grams (a sequence of n items) of a configured length.
47+
The output of this encoding is categorical.
48+
Consequently, additional automatic processing will be done to the resulting n-grams.
49+
50+
:::{image} /explore-analyze/images/ngram-encoding.jpg
51+
:alt: n-gram encoding
52+
:::
53+
54+
*The table shows the n-gram encoding of the Animal field. It applies unigram and bigram encoding (n-grams of size 1 and 2) to the first 3 characters of each string.*
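
A minimal Python sketch of this kind of encoding follows. It assumes the encoder looks at a fixed-length substring and emits one feature per n-gram size and position; the function name `ngram_encode`, its parameters, and the `(size, position)` keys are illustrative choices, not the processor's actual output format.

```python
def ngram_encode(text, n_grams=(1, 2), start=0, length=3):
    """Return the n-grams found in text[start:start + length].

    Keys are (n-gram size, position in the substring); values are the n-grams.
    Hypothetical helper for illustration only.
    """
    window = text[start:start + length]
    features = {}
    for n in n_grams:
        # Slide a window of size n across the substring.
        for position in range(len(window) - n + 1):
            features[(n, position)] = window[position:position + n]
    return features


cat_features = ngram_encode("cat")
crocodile_features = ngram_encode("crocodile")
```

With a length of 3, `crocodile` is truncated to `cro` before any n-grams are produced, so long and short strings yield the same number of features.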
55+
56+
## One hot encoding [one-hot-encoding]
57+
58+
One hot encoding transforms categorical values into numerical ones by assigning vectors to each category.
59+
The vector represents whether the corresponding feature is present (1) or not present (0) at the given value, so the encoding method maps the different categorical features to the numerical values.
60+
61+
:::{image} /explore-analyze/images/one-hot-encoding.jpg
62+
:alt: One hot encoding
63+
:::
64+
65+
*One hot encoding maps each category to the corresponding value. If the category is present at a given value, the assigned vector is `1`; if it is not, the vector is `0`.*
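
As a rough illustration, the mapping can be written in a few lines of Python. The function name `one_hot_encode` and the sample column are assumptions, and the sketch is not the processor's implementation.

```python
def one_hot_encode(values):
    """Turn each categorical value into a dict of 0/1 indicator columns.

    Hypothetical helper for illustration only.
    """
    categories = sorted(set(values))
    # One indicator per category: 1 if the row holds that category, else 0.
    return [
        {category: int(value == category) for category in categories}
        for value in values
    ]


# Assumed sample column.
animals = ["cat", "dog", "cat", "crocodile"]
encoded = one_hot_encode(animals)
```

Each row has exactly one indicator set to 1, so unlike frequency encoding the original category can be read back from the encoded row.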
66+
67+
## Target mean encoding [target-mean-encoding]
68+
69+
Target mean encoding replaces categorical values with the mean value of the target variable as it relates to the categorical variable itself.
70+
71+
:::{image} /explore-analyze/images/target-mean-encoding.jpg
72+
:alt: Target mean encoding
73+
:::
74+
75+
*The figure shows a simple target mean encoding example. The label `cat` has two occurrences in the data set. One of them has a corresponding target variable of `0`, the other one has a `1`. After applying the target mean encoding processor, the `Animal_target_mean` value of the `cat` label is 0.5, while the value of `dog` and `crocodile` is 1 because each of their occurrences has a corresponding target variable of `1`.*
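
The values in the figure can be checked with a short Python sketch. The function name `target_mean_encode` and the sample rows are assumptions chosen to match the caption; this is not the processor's implementation.

```python
from collections import defaultdict


def target_mean_encode(categories, targets):
    """Replace each category with the mean target value of its rows.

    Hypothetical helper for illustration only.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    # Mean target per category.
    target_map = {category: sums[category] / counts[category] for category in sums}
    return [target_map[category] for category in categories], target_map


# Assumed sample rows: `cat` appears with targets 0 and 1.
animals = ["cat", "cat", "dog", "crocodile"]
targets = [0, 1, 1, 1]
animal_target_mean, target_map = target_mean_encode(animals, targets)
```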
