src/encoders/frequency_encoding/frequency_encoding.jl
Lines changed: 9 additions & 9 deletions
@@ -3,20 +3,20 @@
 **Private method.**
 
 Fit an encoder that encodes the categorical values in the specified
-categorical columns with their (normalized or raw) frequencies of occurrence in the dataset.
+categorical features with their (normalized or raw) frequencies of occurrence in the dataset.
 
 # Arguments
 
-- `X`: A table where the elements of the categorical columns have [scitypes](https://juliaai.github.io/ScientificTypes.jl/dev/) `Multiclass` or `OrderedFactor`
-- `features=[]`: A list of names of categorical columns given as symbols to exclude or include from encoding
-- `ignore=true`: Whether to exclude or includes the columns given in `features`
+- `X`: A table where the elements of the categorical features have [scitypes](https://juliaai.github.io/ScientificTypes.jl/dev/) `Multiclass` or `OrderedFactor`
+- `features=[]`: A list of names of categorical features given as symbols to exclude or include from encoding
+- `ignore=true`: Whether to exclude or includes the features given in `features`
 - `ordered_factor=false`: Whether to encode `OrderedFactor` or ignore them
 - `normalize=false`: Whether to use normalized frequencies that sum to 1 over category values or to use raw counts.
 
 # Returns (in a dict)
 
-- `statistic_given_feat_val`: The frequency of each level of each selected categorical column
-- `encoded_features`: The subset of the categorical columns of X that were encoded
+- `statistic_given_feat_val`: The frequency of each level of each selected categorical feature
+- `encoded_features`: The subset of the categorical features of X that were encoded
statistic_given_feat_val = Dict{Any, Real}(level => frequency_map[level] for level in levels(col))
@@ -51,12 +51,12 @@ Encode the levels of a categorical variable in a given table with their (normali
 
 # Arguments
 
-- `X`: A table where the elements of the categorical columns have [scitypes](https://juliaai.github.io/ScientificTypes.jl/dev/) `Multiclass` or `OrderedFactor`
+- `X`: A table where the elements of the categorical features have [scitypes](https://juliaai.github.io/ScientificTypes.jl/dev/) `Multiclass` or `OrderedFactor`
 - `cache`: The output of `frequency_encoder_fit`
 
 # Returns
 
-- `X_tr`: The table with selected columns after the selected columns are encoded by frequency encoding.
+- `X_tr`: The table with selected features after the selected features are encoded by frequency encoding.
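The docstrings above describe frequency encoding only in prose. As a rough illustration of the fit/transform split, here is a minimal sketch in plain Julia. It is not the package's implementation: it works on a single plain vector rather than a table with `Multiclass` columns, skips the `features`/`ignore` selection, and counts only values that actually occur (a categorical column's `levels` may also include unused levels). The names `frequency_map` and `frequency_encode` are hypothetical.

```julia
# Minimal sketch (Base Julia only) of frequency encoding for one categorical vector.
# `frequency_map` and `frequency_encode` are hypothetical names, not the package API.

# "Fit": count how often each distinct value occurs. With `normalize=true`,
# divide by the number of observations so the frequencies sum to 1.
function frequency_map(col::AbstractVector{T}; normalize::Bool = false) where {T}
    counts = Dict{T, Int}()
    for v in col
        counts[v] = get(counts, v, 0) + 1
    end
    normalize || return counts
    n = length(col)
    return Dict{T, Float64}(k => c / n for (k, c) in counts)
end

# "Transform": replace each value by the statistic stored for it in the fitted map.
frequency_encode(col::AbstractVector, fmap::AbstractDict) = [fmap[v] for v in col]

col = ["a", "b", "a", "c", "a"]
raw = frequency_map(col)                       # "a" => 3, "b" => 1, "c" => 1
normed = frequency_map(col; normalize = true)  # "a" => 0.6, "b" => 0.2, "c" => 0.2
encoded = frequency_encode(col, raw)
```

The fitted dictionary plays roughly the role that `statistic_given_feat_val` plays for one feature; keeping it separate from the transform step is what lets the same statistics be reapplied to new data.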
src/encoders/ordinal_encoding/ordinal_encoding.jl
Lines changed: 8 additions & 8 deletions
@@ -6,23 +6,23 @@ Fit an encoder to encode the levels of categorical variables in a given table as
 
 # Arguments
 
-- `X`: A table where the elements of the categorical columns have [scitypes](https://juliaai.github.io/ScientificTypes.jl/dev/) `Multiclass` or `OrderedFactor`
-- `features=[]`: A list of names of categorical columns given as symbols to exclude or include from encoding
-- `ignore=true`: Whether to exclude or includes the columns given in `features`
+- `X`: A table where the elements of the categorical features have [scitypes](https://juliaai.github.io/ScientificTypes.jl/dev/) `Multiclass` or `OrderedFactor`
+- `features=[]`: A list of names of categorical features given as symbols to exclude or include from encoding
+- `ignore=true`: Whether to exclude or includes the features given in `features`
 - `ordered_factor=false`: Whether to encode `OrderedFactor` or ignore them
 
 # Returns (in a dict)
 
-- `index_given_feat_level`: Maps each level for each column in a subset of the categorical columns of X into an integer.
-- `encoded_features`: The subset of the categorical columns of X that were encoded
+- `index_given_feat_level`: Maps each level for each column in a subset of the categorical features of X into an integer.
+- `encoded_features`: The subset of the categorical features of X that were encoded
 """
 function ordinal_encoder_fit(
     X,
     features::AbstractVector{Symbol} = Symbol[];
     ignore::Bool=true,
     ordered_factor::Bool=false,
 )
-    # 1. Define column mapper
+    # 1. Define feature mapper
     function feature_mapper(col, name)
         feat_levels = levels(col)
         index_given_feat_val =
@@ -50,12 +50,12 @@ Encode the levels of a categorical variable in a given table as integers.
 
 # Arguments
 
-- `X`: A table where the elements of the categorical columns have [scitypes](https://juliaai.github.io/ScientificTypes.jl/dev/) `Multiclass` or `OrderedFactor`
+- `X`: A table where the elements of the categorical features have [scitypes](https://juliaai.github.io/ScientificTypes.jl/dev/) `Multiclass` or `OrderedFactor`
 - `cache`: The output of `ordinal_encoder_fit`
 
 # Returns
 
-- `X_tr`: The table with selected columns after the selected columns are encoded by ordinal encoding.
+- `X_tr`: The table with selected features after the selected features are encoded by ordinal encoding.
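To make the ordinal mapping concrete, here is a minimal sketch in plain Julia of what `index_given_feat_level` might contain for a single feature. It is an assumption-laden illustration, not the package code: it uses a plain vector instead of a table, and it sorts the distinct values as a stand-in for `levels(col)`. For an `OrderedFactor` the declared level order would matter instead. The names `ordinal_map` and `ordinal_encode` are hypothetical.

```julia
# Minimal sketch (Base Julia only) of ordinal encoding for one categorical vector.
# `ordinal_map` and `ordinal_encode` are hypothetical names, not the package API.

# "Fit": assign each distinct value an integer index. Sorting the unique values
# is an assumed stand-in for `levels(col)`.
function ordinal_map(col::AbstractVector{T}) where {T}
    feat_levels = sort(unique(col))
    return Dict{T, Int}(level => i for (i, level) in enumerate(feat_levels))
end

# "Transform": replace each value by its integer index.
ordinal_encode(col::AbstractVector, imap::AbstractDict) = [imap[v] for v in col]

col = ["low", "high", "medium", "low"]
imap = ordinal_map(col)              # "high" => 1, "low" => 2, "medium" => 3
encoded = ordinal_encode(col, imap)
```

Note that alphabetical sorting places "high" before "low", which is rarely the intended order; this is one reason an explicit level order (as carried by `OrderedFactor` data) is preferable when the categories have a natural ranking.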