Commit 690866d

update docs
1 parent 9ea4aba commit 690866d

1 file changed: +16 -13 lines changed

category_encoders/cat_boost.py

Lines changed: 16 additions & 13 deletions
@@ -13,19 +13,22 @@ class CatBoostEncoder(util.BaseEncoder, util.SupervisedTransformerMixin):
 
 Supported targets: binomial and continuous. For polynomial target support, see PolynomialWrapper.
 
-This is very similar to leave-one-out encoding, but calculates the
-values "on-the-fly". Consequently, the values naturally vary
-during the training phase and it is not necessary to add random noise.
-
-Beware, the training data have to be randomly permutated. E.g.:
-
-# Random permutation
-perm = np.random.permutation(len(X))
-X = X.iloc[perm].reset_index(drop=True)
-y = y.iloc[perm].reset_index(drop=True)
-
-This is necessary because some data sets are sorted based on the target
-value and this coder encodes the features on-the-fly in a single pass.
+CatBoost encoder is a variation of target encoding. It supports
+time-aware encoding, regularization and online learning.
+
+This implementation is time-aware (similar to CatBoost's 'has_time=True'),
+so no random permutations are used. This makes the encoder sensitive to
+the ordering of the data and suitable for time series problems. If your data
+does not have a time dependency, it should still work just fine, assuming
+the sorting of the data won't leak any information.
+
+Regularization (parameter a) is achieved by adding it to the running counts
+(so-called pseudocounts).
+
+NOTE: the behavior of transform and fit_transform depends on whether y values
+are passed. If no target is passed, the encoder maps the last value of the
+running mean to each category. If y is passed, it continues to update the
+running mean and encodes it into the passed feature columns.
 
 Parameters
 ----------
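
For readers who want to see what the new docstring describes, here is a minimal pure-Python sketch of an order-dependent running mean with pseudocounts. It is an illustration only, not the library's code: the function name `ordered_target_encode` is hypothetical, and the use of the global target mean as the prior is an assumption that the docstring does not state.

```python
from collections import defaultdict


def ordered_target_encode(categories, targets, a=1.0):
    """Illustrative sketch (hypothetical helper, not the library implementation).

    Each row is encoded from the targets of EARLIER rows in the same category,
    so the result depends on row order. `a` acts as a pseudocount that pulls
    sparse categories toward the prior.
    """
    # Assumption: the prior is the global mean of the targets.
    prior = sum(targets) / len(targets)
    sums = defaultdict(float)   # running target sum per category
    counts = defaultdict(int)   # running row count per category
    encoded = []
    for cat, y in zip(categories, targets):
        # Encode first, using only what has been seen so far...
        encoded.append((sums[cat] + prior * a) / (counts[cat] + a))
        # ...then update the running statistics with this row's target.
        sums[cat] += y
        counts[cat] += 1
    # Last value of the running mean per category, which is what rows
    # without a target would be mapped to.
    final = {c: (sums[c] + prior * a) / (counts[c] + a) for c in counts}
    return encoded, final


encoded, final = ordered_target_encode(["a", "b", "a", "a"], [1, 0, 0, 1], a=1.0)
print(encoded)  # one value per row, in row order
print(final)    # one value per category
```

Reordering the rows changes `encoded`, which is why the docstring warns that sorted data must not leak information about the target.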
