src/encoders/contrast_encoder/interface_mlj.jl
+26 -16 (26 additions & 16 deletions)
@@ -73,10 +73,10 @@ MMI.metadata_model(
 """
 $(MMI.doc_header(ContrastEncoder))
 
-`ContrastEncoder` implements the following contrast encoding methods for
-categorical features: dummy, sum, backward/forward difference, and Helmert coding.
-More generally, users can specify a custom contrast or hypothesis matrix, and each feature
-can be encoded using a different method.
+`ContrastEncoder` implements the following contrast encoding methods for categorical
+features: dummy, sum, backward/forward difference, and Helmert coding. More generally,
+users can specify a custom contrast or hypothesis matrix, and each feature can be encoded
+using a different method.
 
 # Training data
 
@@ -93,26 +93,36 @@ Train the machine using `fit!(mach, rows=...)`.
 # Hyper-parameters
 
 $features_doc
-- `mode=:dummy`: The type of encoding to use. Can be one of `:contrast`, `:dummy`, `:sum`, `:backward_diff`, `:forward_diff`, `:helmert` or `:hypothesis`.
-If `ignore=false` (features to be encoded are listed explictly in `features`), then this can be a vector of the same length as `features` to specify a different
-contrast encoding scheme for each feature
-- `buildmatrix=nothing`: A function or other callable with signature `buildmatrix(colname, k)`,
-where `colname` is the name of the feature levels and `k` is it's length, and which returns contrast or
-hypothesis matrix with row/column ordering consistent with the ordering of `levels(col)`. Only relevant if `mode` is `:contrast` or `:hypothesis`.
+
+- `mode=:dummy`: The type of encoding to use. Can be one of `:contrast`, `:dummy`, `:sum`,
+`:backward_diff`, `:forward_diff`, `:helmert` or `:hypothesis`. If `ignore=false`
+(features to be encoded are listed explictly in `features`), then this can be a vector
+of the same length as `features` to specify a different contrast encoding scheme for
+each feature
+
+- `buildmatrix=nothing`: A function or other callable with signature
+`buildmatrix(colname,k)`, where `colname` is the name of the feature levels and `k` is
+it's length, and which returns contrast or hypothesis matrix with row/column ordering
+consistent with the ordering of `levels(col)`. Only relevant if `mode` is `:contrast` or
+`:hypothesis`.
+
 $ignore_doc
+
 $ordered_factor_doc
 
 # Operations
 
-- `transform(mach, Xnew)`: Apply contrast encoding to selected `Multiclass` or `OrderedFactor features of `Xnew` specified by hyper-parameters, and
-return the new table. Features that are neither `Multiclass` nor `OrderedFactor`
-are always left unchanged.
+- `transform(mach, Xnew)`: Apply contrast encoding to selected `Multiclass` or
+`OrderedFactor features of `Xnew` specified by hyper-parameters, and return the new
+table. Features that are neither `Multiclass` nor `OrderedFactor` are always left
+unchanged.
 
 # Fitted parameters
 
 The fields of `fitted_params(mach)` are:
 
-- `vector_given_value_given_feature`: A dictionary that maps each level for each column in a subset of the categorical features of X into its frequency.
+- `vector_given_value_given_feature`: A dictionary that maps each level for each column in
+a subset of the categorical features of X into its frequency.
'm', )`: A dictionary where the possible values for keys are the types in `Char`,
+`AbstractString`, and `Number` and where each value signifies the new level to map into
+given a column raw super type. By default, if the raw type of the column subtypes
+`AbstractString` then missing values will be replaced with `"missing"` and if the raw
+type subtypes `Char` then the new value is `'m'` and if the raw type subtypes `Number`
+then the new value is the lowest value in the column - 1.
 
 # Operations
 
-- `transform(mach, Xnew)`: Apply cardinality reduction to selected `Multiclass` or `OrderedFactor` features of `Xnew` specified by hyper-parameters, and
-return the new table. Features that are neither `Multiclass` nor `OrderedFactor`
-are always left unchanged.
+- `transform(mach, Xnew)`: Apply cardinality reduction to selected `Multiclass` or
+`OrderedFactor` features of `Xnew` specified by hyper-parameters, and return the new
+table. Features that are neither `Multiclass` nor `OrderedFactor` are always left
+unchanged.
 
 # Fitted parameters
 
 The fields of `fitted_params(mach)` are:
 
-- `label_for_missing_given_feature`: A dictionary that for each column, maps `missing` into some value according to `label_for_missing`
+- `label_for_missing_given_feature`: A dictionary that for each column, maps `missing`