Commit 0fd5d28

Merge pull request #381 from glevv/catboost-docs
[DOC] Catboost docs reformulation
2 parents c5dd2b7 + 56772f7 commit 0fd5d28

1 file changed: +17 -15 lines changed

category_encoders/cat_boost.py

Lines changed: 17 additions & 15 deletions
@@ -11,21 +11,23 @@
 class CatBoostEncoder(util.BaseEncoder, util.SupervisedTransformerMixin):
     """CatBoost Encoding for categorical features.
 
-    Supported targets: binomial and continuous. For polynomial target support, see PolynomialWrapper.
-
-    This is very similar to leave-one-out encoding, but calculates the
-    values "on-the-fly". Consequently, the values naturally vary
-    during the training phase and it is not necessary to add random noise.
-
-    Beware, the training data have to be randomly permutated. E.g.:
-
-    # Random permutation
-    perm = np.random.permutation(len(X))
-    X = X.iloc[perm].reset_index(drop=True)
-    y = y.iloc[perm].reset_index(drop=True)
-
-    This is necessary because some data sets are sorted based on the target
-    value and this coder encodes the features on-the-fly in a single pass.
+    Supported targets: binomial and continuous. For polynomial target support, see PolynomialWrapper.
+
+    CatBoostEncoder is the variation of target encoding. It supports
+    time-aware encoding, regularization, and online learning.
+
+    This implementation is time-aware (similar to CatBoost's parameter 'has_time=True'),
+    so no random permutations are used. It makes this encoder sensitive to
+    ordering of the data and suitable for time series problems. If your data
+    does not have time dependency, it should still work just fine, assuming
+    sorting of the data won't leak any information outside the training scope
+    (i.e., no data leakage). When data leakage is a possibility, it is wise to
+    eliminate it first (for example, shuffle or resample the data).
+
+    NOTE: behavior of the transformer would differ in transform and fit_transform
+    methods depending if y values are passed. If no target is passed, then
+    encoder will map the last value of the running mean to each category. If y is passed
+    then it will map all values of the running mean to each category's occurrences.
 
     Parameters
     ----------
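
The NOTE in the new docstring distinguishes two behaviours: with `y`, each row receives the running mean as it stood before that row; without `y`, every row of a category receives the last value of the running mean. The sketch below illustrates that difference in plain Python. It is not the library's code: the function names, the smoothing weight `a`, and the use of the global target mean as the prior are assumptions made for illustration only.

```python
from collections import defaultdict


def ordered_target_encode(categories, targets, prior, a=1.0):
    """Encode each row from the targets of earlier rows only (running mean).

    Hypothetical helper: `prior` and `a` are illustrative smoothing terms.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    encoded = []
    for cat, y in zip(categories, targets):
        # Use the statistics accumulated so far, then update them with this row.
        encoded.append((sums[cat] + prior * a) / (counts[cat] + a))
        sums[cat] += y
        counts[cat] += 1
    return encoded, sums, counts


def final_state_encode(categories, sums, counts, prior, a=1.0):
    """Encode every row with the last value of the running mean (no target)."""
    return [
        (sums.get(cat, 0.0) + prior * a) / (counts.get(cat, 0) + a)
        for cat in categories
    ]


cats = ["a", "b", "a", "a", "b"]
ys = [1, 0, 0, 1, 1]
prior = sum(ys) / len(ys)  # global target mean, assumed as the prior

with_y, sums, counts = ordered_target_encode(cats, ys, prior)
without_y = final_state_encode(cats, sums, counts, prior)

print(with_y)     # one value per occurrence; varies within a category
print(without_y)  # one value per category, repeated for each occurrence
```

Because each encoded value depends only on the rows before it, reordering `cats` and `ys` changes `with_y`. This is the order sensitivity the docstring describes, and the reason it suggests shuffling or resampling when the existing order could leak information.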
