src/encoders/contrast_encoder/contrast_encoder.jl
Lines changed: 33 additions & 23 deletions
@@ -9,13 +9,13 @@ Where `k` is the number of levels in the feature and the returned contrast matri
 """
 ### 1. Dummy Coding
 function get_dummy_contrast(k)
-    return Matrix(1.0I, k, k-1)
+    return Matrix(1.0I, k, k-1)
 end
 
 
 ### 2. Sum Coding
 function get_sum_contrast(k)
-    C = Matrix(1.0I, k, k-1)
+    C = Matrix(1.0I, k, k-1)
     C[end, :] .= -1.0
     return C
 end
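For reference, a minimal standalone sketch of what the two functions in this hunk return for `k = 3`. The function bodies follow the diff; the `using LinearAlgebra` import (needed for `I`) and the variable names `dummy` and `sumc` are assumptions added for illustration only.

```julia
using LinearAlgebra  # provides the identity operator `I`

# Dummy coding: k levels -> k-1 columns; the last level is the all-zero reference row.
get_dummy_contrast(k) = Matrix(1.0I, k, k-1)

# Sum coding: starts from the dummy matrix, then sets the last level's row to -1,
# so every column sums to zero.
function get_sum_contrast(k)
    C = Matrix(1.0I, k, k-1)
    C[end, :] .= -1.0
    return C
end

dummy = get_dummy_contrast(3)  # 3×2 matrix, rows: [1 0], [0 1], [0 0]
sumc = get_sum_contrast(3)     # 3×2 matrix, rows: [1 0], [0 1], [-1 -1]
```

Each row is the vector a level is mapped to, so a feature with `k` levels becomes `k-1` numeric columns.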
@@ -26,7 +26,7 @@ function create_backward_vector(index::Int, length::Int)
     vec = ones(length) .* index / length
 
     # [ -(k-i)/k -(k-i)/k -(k-i)/k .. i/k i/k]
-    vec[1:index] .= index/length - 1
+    vec[1:index] .= index/length - 1
     return vec
 end
 function get_backward_diff_contrast(k)
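The vector built in this hunk can be checked in isolation. The sketch below copies `create_backward_vector` from the diff (with the second argument renamed to `k`, an editorial choice to avoid shadowing `Base.length`). The body of `get_backward_diff_contrast` is not shown in the diff, so `backward_diff_sketch` is a hypothetical stand-in that assumes one such vector per column for `i = 1, ..., k-1`.

```julia
# Mirrors `create_backward_vector` from the hunk above; the second argument is
# renamed from `length` to `k` here to avoid shadowing `Base.length`.
function create_backward_vector(index::Int, k::Int)
    vec = ones(k) .* index / k
    # [ -(k-i)/k  -(k-i)/k ... i/k  i/k ]
    vec[1:index] .= index / k - 1
    return vec
end

# Hypothetical (assumption, not from the diff): one column per i = 1, ..., k-1.
backward_diff_sketch(k) = hcat([create_backward_vector(i, k) for i in 1:k-1]...)

B = backward_diff_sketch(3)  # 3×2 matrix
```

The first `index` entries equal `index/k - 1` and the remaining entries equal `index/k`, so each vector sums to zero.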
@@ -61,21 +61,21 @@ Fit a contrast encoing scheme on given data in `X`.
 
 # Arguments
 
-- `X`: A table where the elements of the categorical features have [scitypes](https://juliaai.github.io/ScientificTypes.jl/dev/) `Multiclass` or `OrderedFactor`
-- `features=[]`: A list of names of categorical features given as symbols to exclude or include from encoding
+$X_doc
+$features_doc
 - `mode=:dummy`: The type of encoding to use. Can be one of `:contrast`, `:dummy`, `:sum`, `:backward_diff`, `:forward_diff`, `:helmert` or `:hypothesis`.
-If `ignore=false` (features to be encoded are listed explictly in `features`), then this can be a vector of the same length as `features` to specify a different
-contrast encoding scheme for each feature
-- `buildmatrix=nothing`: A function or other callable with signature `buildmatrix(colname, k)`,
-where `colname` is the name of the feature levels and `k` is it's length, and which returns contrast or
-hypothesis matrix with row/column ordering consistent with the ordering of `levels(col)`. Only relevant if `mode` is `:contrast` or `:hypothesis`.
-- `ignore=true`: Whether to exclude or includes the features given in `features`
-- `ordered_factor=false`: Whether to encode `OrderedFactor` or ignore them
+If `ignore=false` (features to be encoded are listed explictly in `features`), then this can be a vector of the same length as `features` to specify a different
+contrast encoding scheme for each feature
+- `buildmatrix=nothing`: A function or other callable with signature `buildmatrix(colname, k)`,
+where `colname` is the name of the feature levels and `k` is it's length, and which returns contrast or
+hypothesis matrix with row/column ordering consistent with the ordering of `levels(col)`. Only relevant if `mode` is `:contrast` or `:hypothesis`.
+$ignore_doc
+$ordered_factor_doc
 
-# Returns (in a dict)
+# Returns as a named-tuple
 
 - `vec_given_feat_level`: Maps each level for each column in the selected categorical features to a vector
-- `encoded_features`: The subset of the categorical features of X that were encoded
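The `$X_doc`, `$features_doc`, `$ignore_doc` and `$ordered_factor_doc` lines rely on string interpolation inside Julia docstrings. The definitions of those variables are not part of this diff, so the sketch below is an assumption: it defines two hypothetical constants with wording taken from the removed lines and interpolates them into the docstring of a placeholder function `demo_fit`.

```julia
# Hypothetical shared fragments; the wording follows the lines removed in the diff.
const features_doc = "- `features=[]`: A list of names of categorical features given as symbols to exclude or include from encoding"
const ignore_doc = "- `ignore=true`: Whether to exclude or include the features given in `features`"

"""
    demo_fit(X; features=[], ignore=true)

Placeholder function used only to show docstring interpolation.

# Arguments

$features_doc
$ignore_doc
"""
demo_fit(X; features=[], ignore=true) = (features=features, ignore=ignore)

# The same text the docstring receives for its arguments section after interpolation:
arguments_section = "# Arguments\n\n$features_doc\n$ignore_doc"
```

Defining each argument description once and interpolating it keeps the docstrings of the different encoders from drifting apart, which appears to be the motivation for this change.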
src/encoders/contrast_encoder/interface_mlj.jl
Lines changed: 5 additions & 7 deletions
@@ -86,23 +86,21 @@ In MLJ (or MLJBase) bind an instance unsupervised `model` to data with
 
 Here:
 
-- `X` is any table of input features (eg, a `DataFrame`). Features to be transformed must
-have element scitype `Multiclass` or `OrderedFactor`. Use `schema(X)` to
-check scitypes.
+$X_doc_mlj
 
 Train the machine using `fit!(mach, rows=...)`.
 
 # Hyper-parameters
 
-- `features=[]`: A list of names of categorical features given as symbols to exclude or include from encoding
+$features_doc
 - `mode=:dummy`: The type of encoding to use. Can be one of `:contrast`, `:dummy`, `:sum`, `:backward_diff`, `:forward_diff`, `:helmert` or `:hypothesis`.
 If `ignore=false` (features to be encoded are listed explictly in `features`), then this can be a vector of the same length as `features` to specify a different
 contrast encoding scheme for each feature
 - `buildmatrix=nothing`: A function or other callable with signature `buildmatrix(colname, k)`,
 where `colname` is the name of the feature levels and `k` is it's length, and which returns contrast or
 hypothesis matrix with row/column ordering consistent with the ordering of `levels(col)`. Only relevant if `mode` is `:contrast` or `:hypothesis`.
-- `ignore=true`: Whether to exclude or includes the features given in `features`
-- `ordered_factor=false`: Whether to encode `OrderedFactor` or ignore them
+$ignore_doc
+$ordered_factor_doc
 
 # Operations
 
@@ -120,7 +118,7 @@ The fields of `fitted_params(mach)` are:
 
 The fields of `report(mach)` are:
 
-- `encoded_features`: The subset of the categorical features of X that were encoded
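To make the `buildmatrix(colname, k)` signature described above concrete, here is a small sketch of a callable that could be passed in that role. The name `first_level_reference` and the column name `:grade` are hypothetical, and the `k × (k-1)` output shape is an assumption based on the dummy and sum matrices elsewhere in this PR; the sketch does not call the encoder itself.

```julia
# Hypothetical callable with the documented signature `buildmatrix(colname, k)`.
# It ignores `colname` and uses the FIRST level as the all-zero reference row.
function first_level_reference(colname, k)
    C = zeros(k, k - 1)
    for j in 1:k-1
        C[j + 1, j] = 1.0  # level j+1 gets indicator column j
    end
    return C
end

C = first_level_reference(:grade, 4)  # 4×3 matrix
```

Row `i` of the result corresponds to the `i`-th entry of `levels(col)`, which is the ordering requirement stated in the docstring.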
src/encoders/frequency_encoding/frequency_encoding.jl
Lines changed: 10 additions & 10 deletions
@@ -7,16 +7,16 @@ categorical features with their (normalized or raw) frequencies of occurrence in
 
 # Arguments
 
-- `X`: A table where the elements of the categorical features have [scitypes](https://juliaai.github.io/ScientificTypes.jl/dev/) `Multiclass` or `OrderedFactor`
-- `features=[]`: A list of names of categorical features given as symbols to exclude or include from encoding
-- `ignore=true`: Whether to exclude or includes the features given in `features`
-- `ordered_factor=false`: Whether to encode `OrderedFactor` or ignore them
+$X_doc
+$features_doc
+$ignore_doc
+$ordered_factor_doc
 - `normalize=false`: Whether to use normalized frequencies that sum to 1 over category values or to use raw counts.
 
-# Returns (in a dict)
+# Returns as a named-tuple
 
 - `statistic_given_feat_val`: The frequency of each level of each selected categorical feature
-- `encoded_features`: The subset of the categorical features of X that were encoded
+$encoded_features_doc
 """
 function frequency_encoder_fit(
     X,
@@ -39,11 +39,11 @@ function frequency_encoder_fit(
src/encoders/frequency_encoding/interface_mlj.jl
Lines changed: 6 additions & 8 deletions
@@ -86,18 +86,16 @@ In MLJ (or MLJBase) bind an instance unsupervised `model` to data with
 
 Here:
 
-- `X` is any table of input features (eg, a `DataFrame`). Features to be transformed must
-have element scitype `Multiclass` or `OrderedFactor`. Use `schema(X)` to
-check scitypes.
+$X_doc_mlj
 
 Train the machine using `fit!(mach, rows=...)`.
 
 # Hyper-parameters
 
-- `features=[]`: A list of names of categorical features given as symbols to exclude or include from encoding
-- `ignore=true`: Whether to exclude or include the features given in `features`
-- `ordered_factor=false`: Whether to encode `OrderedFactor` or ignore them
-- `normalize=false`: Whether to use normalized frequencies that sum to 1 over category values or to use raw counts.
+$features_doc
+$ignore_doc
+$ordered_factor_doc
+- `normalize=false`: Whether to use normalized frequencies that sum to 1 over category values or to use raw counts.
 - `output_type=Float32`: The type of the output values. The default is `Float32`, but you can set it to `Float64` or any other type that can hold the frequency values.
 
 # Operations
@@ -116,7 +114,7 @@ The fields of `fitted_params(mach)` are:
 
 The fields of `report(mach)` are:
 
-- `encoded_features`: The subset of the categorical features of X that were encoded
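The difference between `normalize=false` and `normalize=true`, and the role of `output_type`, can be illustrated on a single column. This is a standalone sketch and not the package's implementation; `level_frequencies` and the sample data are hypothetical names introduced here.

```julia
# Hypothetical helper: frequency of each level in one column.
# `normalize` and `output_type` mirror the hyper-parameters documented above.
function level_frequencies(col; normalize=false, output_type=Float32)
    counts = Dict{eltype(col), Int}()
    for v in col
        counts[v] = get(counts, v, 0) + 1
    end
    denom = normalize ? length(col) : 1  # divide by the column length only when normalizing
    return Dict(level => output_type(c / denom) for (level, c) in counts)
end

col = ["a", "b", "a", "c", "a"]
raw = level_frequencies(col)                         # raw counts
normalized = level_frequencies(col; normalize=true)  # fractions that sum to 1
```

With raw counts the value for `"a"` is its number of occurrences; with `normalize=true` the values across all levels sum to 1.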
src/encoders/ordinal_encoding/ordinal_encoding.jl
Lines changed: 8 additions & 9 deletions
@@ -5,14 +5,13 @@
 Fit an encoder to encode the levels of categorical variables in a given table as integers (ordered arbitrarily).
 
 # Arguments
-
-- `X`: A table where the elements of the categorical features have [scitypes](https://juliaai.github.io/ScientificTypes.jl/dev/) `Multiclass` or `OrderedFactor`
-- `features=[]`: A list of names of categorical features given as symbols to exclude or include from encoding
-- `ignore=true`: Whether to exclude or includes the features given in `features`
-- `ordered_factor=false`: Whether to encode `OrderedFactor` or ignore them
+$X_doc
+$features_doc
+$ignore_doc
+$ordered_factor_doc
 - `dtype`: The numerical concrete type of the encoded features. Default is `Float32`.
 
-# Returns (in a dict)
+# Returns as a named-tuple
 
 - `index_given_feat_level`: Maps each level for each column in a subset of the categorical features of X into an integer.
 - `encoded_features`: The subset of the categorical features of X that were encoded
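As a rough illustration of the `index_given_feat_level` mapping for one column, the sketch below assigns each level its position in a given list of levels and converts it to `dtype`. The helper `ordinal_index` and the sample levels are hypothetical; the actual fit function operates on whole tables and may order levels differently.

```julia
# Hypothetical helper: map each level of one column to its position, stored as `dtype`.
function ordinal_index(levels_of_col; dtype=Float32)
    return Dict(level => dtype(i) for (i, level) in enumerate(levels_of_col))
end

idx = ordinal_index(["low", "mid", "high"])
encoded = [idx[v] for v in ["mid", "low", "high", "mid"]]
```

Transforming a column then amounts to a dictionary lookup per entry, as in the last line.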