    return hcat([create_helmert_vector(i, k) for i in 1:k-1]...)
end
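The `hcat` line above builds a Helmert contrast matrix one column at a time. The body of `create_helmert_vector` is not shown in this hunk, so the sketch below uses a hypothetical stand-in, `helmert_vector_sketch`, that follows the common Helmert convention (levels `1..i` get `-1`, level `i+1` gets `i`). The names and the exact sign convention are assumptions, not the PR's code.

```julia
# Hypothetical stand-in for `create_helmert_vector`; the real body is not shown here.
function helmert_vector_sketch(i::Integer, k::Integer)
    1 <= i <= k - 1 || throw(ArgumentError("need 1 <= i <= k - 1"))
    v = zeros(Float64, k)
    v[1:i] .= -1.0      # levels 1..i get weight -1
    v[i + 1] = i        # level i+1 gets weight i, so the column sums to zero
    return v
end

# Same construction as the diff line above: one column per i in 1:k-1.
helmert_contrast_sketch(k::Integer) =
    hcat([helmert_vector_sketch(i, k) for i in 1:k-1]...)

H = helmert_contrast_sketch(4)   # a 4×3 matrix: 4 levels, 3 contrast columns
```

Each column sums to zero and has one more nonzero entry than the previous one, which is what gives a `k`-level feature its `k-1` encoded columns.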

"""
**Private Method**

Fit a contrast encoding scheme on given data in `X`.

# Arguments

- `X`: A table where the elements of the categorical features have [scitypes](https://juliaai.github.io/ScientificTypes.jl/dev/) `Multiclass` or `OrderedFactor`
- `features=[]`: A list of names of categorical features, given as symbols, to exclude from or include in encoding
- `mode=:dummy`: The type of encoding to use. Can be one of `:contrast`, `:dummy`, `:sum`, `:backward_diff`, `:forward_diff`, `:helmert` or `:hypothesis`.
  If `ignore=false` (features to be encoded are listed explicitly in `features`), then this can be a vector of the same length as `features` to specify a different
  contrast encoding scheme for each feature
- `buildmatrix=nothing`: A function or other callable with signature `buildmatrix(colname, k)`,
  where `colname` is the name of the feature and `k` is the number of its levels, and which returns a contrast or
  hypothesis matrix with row/column ordering consistent with the ordering of `levels(col)`. Only relevant if `mode` is `:contrast` or `:hypothesis`.
- `ignore=true`: Whether to exclude or include the features given in `features`
- `ordered_factor=false`: Whether to encode `OrderedFactor` features or ignore them

# Returns (in a dict)

- `vec_given_feat_level`: Maps each level for each column in the selected categorical features to a vector
- `encoded_features`: The subset of the categorical features of `X` that were encoded
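
A minimal sketch of a callable that fits the `buildmatrix(colname, k)` signature described above, for `mode=:contrast`. The function name `sum_buildmatrix_sketch` is hypothetical, and the choice of sum (deviation) coding with the last level as reference is only an illustration. The one requirement taken from the text is that the result has `k` rows and `k-1` columns.

```julia
# Hypothetical example of a `buildmatrix(colname, k)` callable.
# Returns a k×(k-1) sum-coding contrast matrix; `colname` is ignored here,
# but a real callable could use it to pick a different matrix per feature.
function sum_buildmatrix_sketch(colname, k)
    M = zeros(Float64, k, k - 1)
    for j in 1:k-1
        M[j, j] = 1.0          # level j is coded by the j-th unit vector
    end
    M[k, :] .= -1.0            # the last level is coded as all -1
    return M
end

M3 = sum_buildmatrix_sketch(:color, 3)
```

Rows follow the order of the feature's levels, so row `j` is the vector assigned to the `j`-th level.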
Use a fitted contrast encoder to encode the levels of selected categorical variables with contrast encoding.

# Arguments

- `X`: A table where the elements of the categorical features have [scitypes](https://juliaai.github.io/ScientificTypes.jl/dev/) `Multiclass` or `OrderedFactor`
- `cache`: The output of `contrast_encoder_fit`

# Returns

- `X_tr`: The table with the selected features replaced by their contrast encodings.
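
To illustrate what encoding with a level-to-vector map amounts to, here is a small sketch in plain Julia. The nested `Dict` layout and the helper `encode_column_sketch` are assumptions made for illustration. The actual cache structure and transform code are not shown in this hunk.

```julia
# Hypothetical cache layout: feature name => (level => contrast vector).
vec_given_feat_level_sketch = Dict(
    :color => Dict(
        "red"   => [1.0, 0.0],
        "green" => [0.0, 1.0],
        "blue"  => [-1.0, -1.0],
    ),
)

# Replace every value in a column by its vector; the result has one row per
# observation and k-1 columns (here k = 3 levels, so 2 columns).
function encode_column_sketch(col, level_to_vec)
    rows = [level_to_vec[v] for v in col]
    return permutedims(reduce(hcat, rows))
end

encoded = encode_column_sketch(["red", "blue", "green"],
                               vec_given_feat_level_sketch[:color])
```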
MATRIX_SIZE_ERROR(k, matrix_size, feat_name) = "In ContrastEncoder, a categorical variable with $k levels should have a contrast matrix of size ($k, $(k-1)). However, the contrast matrix returned by `buildmatrix` has size $matrix_size for feature $feat_name."
MATRIX_SIZE_ERROR_HYP(k, matrix_size, feat_name) = "In ContrastEncoder, a categorical variable with $k levels should have a hypothesis matrix of size ($(k-1), $k). However, the hypothesis matrix returned by `buildmatrix` has size $matrix_size for feature $feat_name."
IGNORE_MUST_FALSE_VEC_MODE = "In ContrastEncoder with mode given as a vector of symbols, the `ignore` argument must be set to false and features must be explicitly specified in `features`."
BUILDFUNC_MUST_BE_SPECIFIED = "In ContrastEncoder with mode=:contrast or mode=:hypothesis, the `buildmatrix` argument must be specified."
LENGTH_MISMATCH_VEC_MODE(len_mode, len_feat) = "In ContrastEncoder with mode given as a vector of symbols, the length of the features argument must match the number of specified modes. However, the method received $(len_mode) modes and $(len_feat) features."
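
A sketch of how a message like `MATRIX_SIZE_ERROR` might be used to validate the output of `buildmatrix`. The helper names, the shortened message, and the choice of `ArgumentError` are all assumptions. The PR's actual validation code is not shown here.

```julia
# Shortened, hypothetical message template so the sketch runs on its own.
size_error_sketch(k, matrix_size, feat_name) =
    "expected a contrast matrix of size ($k, $(k - 1)) for feature $feat_name, got $matrix_size"

# Check that a contrast matrix for a k-level feature is k×(k-1);
# throw an ArgumentError (an assumed exception type) otherwise.
function check_contrast_size_sketch(M::AbstractMatrix, k::Integer, feat_name)
    size(M) == (k, k - 1) ||
        throw(ArgumentError(size_error_sketch(k, size(M), feat_name)))
    return M
end

ok = check_contrast_size_sketch(zeros(3, 2), 3, :color)   # passes, returns M
```

Interpolating `$(k - 1)` rather than `$k-1` makes the message show the computed dimension instead of the literal text `3-1`.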