@@ -13,19 +13,22 @@ class CatBoostEncoder(util.BaseEncoder, util.SupervisedTransformerMixin):
 
     Supported targets: binomial and continuous. For polynomial target support, see PolynomialWrapper.
 
-    This is very similar to leave-one-out encoding, but calculates the
-    values "on-the-fly". Consequently, the values naturally vary
-    during the training phase and it is not necessary to add random noise.
-
-    Beware, the training data have to be randomly permutated. E.g.:
-
-        # Random permutation
-        perm = np.random.permutation(len(X))
-        X = X.iloc[perm].reset_index(drop=True)
-        y = y.iloc[perm].reset_index(drop=True)
-
-    This is necessary because some data sets are sorted based on the target
-    value and this coder encodes the features on-the-fly in a single pass.
+    The CatBoost encoder is a variation of target encoding. It supports
+    time-aware encoding, regularization, and online learning.
+
+    This implementation is time-aware (similar to CatBoost's 'has_time=True'),
+    so no random permutations are used. This makes the encoder sensitive to
+    the ordering of the data and suitable for time series problems. If your
+    data has no time dependency, it should still work, provided that the
+    sort order of the data does not leak any information about the target.
+
+    Regularization (parameter a) is achieved by adding a to the running
+    counts (so-called pseudocounts).
+
+    NOTE: the behavior of transform and fit_transform differs depending on
+    whether y is passed. If no target is passed, the encoder maps the final
+    value of the running mean to each category. If y is passed, it keeps
+    updating the running mean and encodes the given feature columns with it.
 
     Parameters
     ----------
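The running-mean behavior described in the new docstring can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code from this commit: the function names `ordered_target_encode` and `encode_without_target` are hypothetical, and using the global target mean as the prior, weighted by `a`, is an assumption about how the pseudocount enters the formula.

```python
from collections import defaultdict


def ordered_target_encode(categories, targets, a=1.0):
    """Encode each row using only the rows that precede it (single ordered pass).

    Assumption: the prior is the global target mean, and `a` acts as a
    pseudocount added to the running count of each category.
    """
    prior = sum(targets) / len(targets)
    sums = defaultdict(float)   # running target sum per category
    counts = defaultdict(int)   # running row count per category
    encoded = []
    for cat, y in zip(categories, targets):
        # The current row's own target is not used for its own encoding.
        encoded.append((sums[cat] + prior * a) / (counts[cat] + a))
        sums[cat] += y
        counts[cat] += 1
    return encoded, dict(sums), dict(counts), prior


def encode_without_target(categories, sums, counts, prior, a=1.0):
    """Map each category to the final value of its running mean (no y passed)."""
    return [
        (sums.get(cat, 0.0) + prior * a) / (counts.get(cat, 0) + a)
        for cat in categories
    ]


# Row order matters: reordering the input changes the encoded values.
cats = ["a", "b", "a", "a", "b"]
ys = [1, 0, 0, 1, 1]
train_encoded, sums, counts, prior = ordered_target_encode(cats, ys, a=1.0)
test_encoded = encode_without_target(["a", "b", "c"], sums, counts, prior, a=1.0)
```

In this sketch a category that has not been seen yet (the first "a", or "c" at prediction time) falls back to the prior, and a larger `a` pulls every encoding closer to that prior.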