File: src/encoders/contrast_encoder/contrast_encoder.jl
Lines changed: 38 additions & 29 deletions
@@ -9,13 +9,13 @@ Where `k` is the number of levels in the feature and the returned contrast matri
 """
 ### 1. Dummy Coding
 function get_dummy_contrast(k)
-    return Matrix(1.0I, k, k-1)
+    return Matrix(1.0I, k, k-1)
 end
 
 
 ### 2. Sum Coding
 function get_sum_contrast(k)
-    C = Matrix(1.0I, k, k-1)
+    C = Matrix(1.0I, k, k-1)
     C[end, :] .= -1.0
     return C
 end
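A standalone sketch of what the two helpers in this hunk compute, with `using LinearAlgebra` added so that the uniform-scaling `I` is in scope. For k = 3, dummy coding maps the last level to the all-zeros reference row, while sum coding maps it to a row of -1s so that every column sums to zero.

```julia
using LinearAlgebra  # provides the uniform scaling `I`

### 1. Dummy Coding: level i (i < k) gets a 1 in column i; the last level is the all-zeros reference row
function get_dummy_contrast(k)
    return Matrix(1.0I, k, k - 1)
end

### 2. Sum Coding: identical to dummy coding, except the last level's row is all -1
function get_sum_contrast(k)
    C = Matrix(1.0I, k, k - 1)
    C[end, :] .= -1.0
    return C
end

# Example with k = 3 levels
get_dummy_contrast(3)  # rows: [1 0], [0 1], [0 0]
get_sum_contrast(3)    # rows: [1 0], [0 1], [-1 -1]
```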
@@ -26,7 +26,7 @@ function create_backward_vector(index::Int, length::Int)
     vec = ones(length) .* index / length
 
     # [ -(k-i)/k -(k-i)/k -(k-i)/k .. i/k i/k]
-    vec[1:index] .= index/length - 1
+    vec[1:index] .= index/length - 1
     return vec
 end
 function get_backward_diff_contrast(k)
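`create_backward_vector` builds one column of a backward-difference contrast matrix. The body of `get_backward_diff_contrast` is cut off in this view, so the assembly below (`backward_diff_sketch`, a hypothetical name) is only an assumption about how the columns might be combined: it stacks columns `1:k-1` side by side.

```julia
# Column `index` of a backward-difference contrast matrix with `length` rows:
# the first `index` entries are -(length - index)/length, the rest are index/length.
function create_backward_vector(index::Int, length::Int)
    vec = ones(length) .* index / length
    vec[1:index] .= index / length - 1
    return vec
end

# Hypothetical assembly (not the package's code): place columns 1..k-1 side by side.
backward_diff_sketch(k) = reduce(hcat, [create_backward_vector(i, k) for i in 1:k-1])

backward_diff_sketch(3)  # ≈ [-2/3 -1/3; 1/3 -1/3; 1/3 2/3]
```

Each column sums to zero, since `index * (index/k - 1) + (k - index) * index/k == 0`.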
@@ -61,21 +61,21 @@ Fit a contrast encoing scheme on given data in `X`.
 
 # Arguments
 
-- `X`: A table where the elements of the categorical features have [scitypes](https://juliaai.github.io/ScientificTypes.jl/dev/) `Multiclass` or `OrderedFactor`
-- `features=[]`: A list of names of categorical features given as symbols to exclude or include from encoding
+$X_doc
+$features_doc
 - `mode=:dummy`: The type of encoding to use. Can be one of `:contrast`, `:dummy`, `:sum`, `:backward_diff`, `:forward_diff`, `:helmert` or `:hypothesis`.
-If `ignore=false` (features to be encoded are listed explictly in `features`), then this can be a vector of the same length as `features` to specify a different
-contrast encoding scheme for each feature
-- `buildmatrix=nothing`: A function or other callable with signature `buildmatrix(colname, k)`,
-where `colname` is the name of the feature levels and `k` is it's length, and which returns contrast or
-hypothesis matrix with row/column ordering consistent with the ordering of `levels(col)`. Only relevant if `mode` is `:contrast` or `:hypothesis`.
-- `ignore=true`: Whether to exclude or includes the features given in `features`
-- `ordered_factor=false`: Whether to encode `OrderedFactor` or ignore them
+If `ignore=false` (features to be encoded are listed explictly in `features`), then this can be a vector of the same length as `features` to specify a different
+contrast encoding scheme for each feature
+- `buildmatrix=nothing`: A function or other callable with signature `buildmatrix(colname, k)`,
+where `colname` is the name of the feature levels and `k` is it's length, and which returns contrast or
+hypothesis matrix with row/column ordering consistent with the ordering of `levels(col)`. Only relevant if `mode` is `:contrast` or `:hypothesis`.
+$ignore_doc
+$ordered_factor_doc
 
-# Returns (in a dict)
+# Returns as a named-tuple
 
 - `vec_given_feat_level`: Maps each level for each column in the selected categorical features to a vector
-- `encoded_features`: The subset of the categorical features of X that were encoded
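The `+` lines replace repeated argument descriptions with interpolated strings such as `$features_doc`. Julia docstrings are ordinary string literals, so `$name` is substituted when the docstring is defined. A minimal sketch of the pattern follows: the constants' text is copied from removed lines, `toy_encoder_fit` is a hypothetical function, and the real definitions of `X_doc`, `features_doc`, etc. are not shown in this diff.

```julia
# Shared docstring fragments, defined once (names mirror the diff; contents assumed
# to match the descriptions the diff removes).
const features_doc = """
- `features=[]`: A list of names of categorical features given as symbols to exclude or include from encoding
"""
const ignore_doc = """
- `ignore=true`: Whether to exclude or include the features given in `features`
"""

"""
    toy_encoder_fit(X; features=[], ignore=true)

Hypothetical fit function used only to show docstring interpolation.

# Arguments

$features_doc
$ignore_doc
"""
function toy_encoder_fit(X; features=[], ignore=true)
    # Returns a named tuple, echoing the "Returns as a named-tuple" change above
    return (encoded_features=Symbol[],)
end

# The interpolated fragments appear in the rendered documentation
println(@doc(toy_encoder_fit))
```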
@@ -87,23 +86,21 @@ In MLJ (or MLJBase) bind an instance unsupervised `model` to data with
 
 Here:
 
-- `X` is any table of input features (eg, a `DataFrame`). Features to be transformed must
-have element scitype `Multiclass` or `OrderedFactor`. Use `schema(X)` to
-check scitypes.
+$X_doc_mlj
 
 Train the machine using `fit!(mach, rows=...)`.
 
 # Hyper-parameters
 
-- `features=[]`: A list of names of categorical features given as symbols to exclude or include from encoding
+$features_doc
 - `mode=:dummy`: The type of encoding to use. Can be one of `:contrast`, `:dummy`, `:sum`, `:backward_diff`, `:forward_diff`, `:helmert` or `:hypothesis`.
 If `ignore=false` (features to be encoded are listed explictly in `features`), then this can be a vector of the same length as `features` to specify a different
 contrast encoding scheme for each feature
 - `buildmatrix=nothing`: A function or other callable with signature `buildmatrix(colname, k)`,
 where `colname` is the name of the feature levels and `k` is it's length, and which returns contrast or
 hypothesis matrix with row/column ordering consistent with the ordering of `levels(col)`. Only relevant if `mode` is `:contrast` or `:hypothesis`.
-- `ignore=true`: Whether to exclude or includes the features given in `features`
-- `ordered_factor=false`: Whether to encode `OrderedFactor` or ignore them
+$ignore_doc
+$ordered_factor_doc
 
 # Operations
 
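The `buildmatrix` hyper-parameter expects a callable `buildmatrix(colname, k)` that returns a matrix whose rows follow the ordering of `levels(col)`. The hypothetical `helmert_buildmatrix` below returns a k × (k-1) Helmert-style matrix and ignores `colname`; it only illustrates the expected shape and is not the package's own `:helmert` implementation. Under the documented semantics, it would be passed as `buildmatrix=helmert_buildmatrix` together with `mode=:contrast`.

```julia
# Hypothetical callable matching the documented signature `buildmatrix(colname, k)`.
# Column j: the first j rows are -1, row j+1 is j, the remaining rows are 0.
# `colname` is accepted but unused; a real callable could branch on it to choose
# a different matrix per feature.
function helmert_buildmatrix(colname, k)
    C = zeros(k, k - 1)
    for j in 1:k-1
        C[1:j, j] .= -1.0
        C[j+1, j] = j
    end
    return C
end

helmert_buildmatrix(:color, 3)  # [-1.0 -1.0; 1.0 -1.0; 0.0 2.0]
```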
@@ -121,7 +118,7 @@ The fields of `fitted_params(mach)` are:
 
 The fields of `report(mach)` are:
 
-- `encoded_features`: The subset of the categorical features of X that were encoded
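The fitted parameter `vec_given_feat_level` maps each level of an encoded feature to a vector. Assuming that vector is the level's row of the contrast matrix (rows ordered like `levels(col)`), encoding a column is one lookup per value. The helpers `level_vectors` and `encode_column` are hypothetical and show the idea for dummy coding with three levels; how the package names and lays out the resulting output columns is not shown in this diff.

```julia
using LinearAlgebra  # provides the uniform scaling `I`

# Hypothetical level => vector map: row i of C encodes the i-th level.
function level_vectors(levels_in_order, C)
    return Dict(lev => C[i, :] for (i, lev) in enumerate(levels_in_order))
end

# Replace each value by its row vector, giving an n × (k-1) matrix.
encode_column(col, vecs) = reduce(vcat, [permutedims(vecs[v]) for v in col])

C = Matrix(1.0I, 3, 2)  # dummy coding for 3 levels
vecs = level_vectors(["red", "green", "blue"], C)
encode_column(["blue", "red", "blue"], vecs)  # [0.0 0.0; 1.0 0.0; 0.0 0.0]
```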

File: src/encoders/frequency_encoding/frequency_encoding.jl
Lines changed: 13 additions & 13 deletions
@@ -7,16 +7,16 @@ categorical features with their (normalized or raw) frequencies of occurrence in
 
 # Arguments
 
-- `X`: A table where the elements of the categorical features have [scitypes](https://juliaai.github.io/ScientificTypes.jl/dev/) `Multiclass` or `OrderedFactor`
-- `features=[]`: A list of names of categorical features given as symbols to exclude or include from encoding
-- `ignore=true`: Whether to exclude or includes the features given in `features`
-- `ordered_factor=false`: Whether to encode `OrderedFactor` or ignore them
+$X_doc
+$features_doc
+$ignore_doc
+$ordered_factor_doc
 - `normalize=false`: Whether to use normalized frequencies that sum to 1 over category values or to use raw counts.
 
-# Returns (in a dict)
+# Returns as a named-tuple
 
 - `statistic_given_feat_val`: The frequency of each level of each selected categorical feature
-- `encoded_features`: The subset of the categorical features of X that were encoded
+$encoded_features_doc
 """
 function frequency_encoder_fit(
     X,
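Per the docstring, the fitted statistic for each level is either its raw count or, with `normalize=true`, a frequency normalized to sum to 1 over the feature's values. A single-column sketch using only Base Julia; `frequency_map` is a hypothetical name, not the package's internal function.

```julia
# Hypothetical per-column version of `statistic_given_feat_val`:
# raw counts by default, or counts divided by the number of rows when normalize=true.
function frequency_map(col; normalize::Bool=false)
    counts = Dict{eltype(col),Int}()
    for v in col
        counts[v] = get(counts, v, 0) + 1
    end
    n = length(col)
    return Dict(v => (normalize ? c / n : float(c)) for (v, c) in counts)
end

frequency_map(["a", "b", "a", "c"])                  # contents: "a" => 2.0, "b" => 1.0, "c" => 1.0
frequency_map(["a", "b", "a", "c"]; normalize=true)  # contents: "a" => 0.5, "b" => 0.25, "c" => 0.25
```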
@@ -39,11 +39,11 @@ function frequency_encoder_fit(
@@ -87,18 +86,16 @@ In MLJ (or MLJBase) bind an instance unsupervised `model` to data with
 
 Here:
 
-- `X` is any table of input features (eg, a `DataFrame`). Features to be transformed must
-have element scitype `Multiclass` or `OrderedFactor`. Use `schema(X)` to
-check scitypes.
+$X_doc_mlj
 
 Train the machine using `fit!(mach, rows=...)`.
 
 # Hyper-parameters
 
-- `features=[]`: A list of names of categorical features given as symbols to exclude or include from encoding
-- `ignore=true`: Whether to exclude or include the features given in `features`
-- `ordered_factor=false`: Whether to encode `OrderedFactor` or ignore them
-- `normalize=false`: Whether to use normalized frequencies that sum to 1 over category values or to use raw counts.
+$features_doc
+$ignore_doc
+$ordered_factor_doc
+- `normalize=false`: Whether to use normalized frequencies that sum to 1 over category values or to use raw counts.
 - `output_type=Float32`: The type of the output values. The default is `Float32`, but you can set it to `Float64` or any other type that can hold the frequency values.
 
 # Operations
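At transform time each category is replaced by its fitted statistic, and `output_type` sets the element type of the result. A hypothetical sketch of that step (`apply_frequency` is not a package function, and `stats` stands in for one feature's fitted statistics):

```julia
# Hypothetical transform step: look up each value's statistic and
# convert it to `output_type` (Float32 by default, as in the docstring).
function apply_frequency(col, stats::AbstractDict; output_type::Type{<:Real}=Float32)
    return output_type[stats[v] for v in col]
end

stats = Dict("a" => 0.5, "b" => 0.25, "c" => 0.25)
apply_frequency(["a", "c", "a"], stats)                       # Float32[0.5, 0.25, 0.5]
apply_frequency(["a", "c", "a"], stats; output_type=Float64)  # [0.5, 0.25, 0.5]
```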
@@ -117,7 +114,7 @@ The fields of `fitted_params(mach)` are:
 
 The fields of `report(mach)` are:
 
-- `encoded_features`: The subset of the categorical features of X that were encoded