
Commit c68684f

Update parameter for categorical feature. (dmlc#8285)
1 parent 5545c49 commit c68684f

3 files changed: 18 additions, 4 deletions

doc/parameter.rst

Lines changed: 1 addition & 1 deletion
@@ -235,7 +235,7 @@ These parameters are only used for training with categorical data. See
 
 * ``max_cat_to_onehot``
 
-  .. versionadded:: 1.6
+  .. versionadded:: 1.6.0
 
   .. note:: This parameter is experimental. ``exact`` tree method is not yet supported.
 

doc/tutorials/categorical.rst

Lines changed: 1 addition & 1 deletion
@@ -84,7 +84,7 @@ values are categories, and the measure is the output leaf value. Intuitively, w
 group the categories that output similar leaf values. During split finding, we first sort
 the gradient histogram to prepare the contiguous partitions then enumerate the splits
 according to these sorted values. One of the related parameters for XGBoost is
-``max_cat_to_one_hot``, which controls whether one-hot encoding or partitioning should be
+``max_cat_to_onehot``, which controls whether one-hot encoding or partitioning should be
 used for each feature, see :ref:`cat-param` for details.
 
 
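The partition-based search described in the tutorial text can be illustrated with a small pure-Python sketch. It is not XGBoost's implementation: it assumes the per-category gradient and hessian sums are already accumulated, uses the usual second-order leaf value -G/(H + lambda) as the sort key, and scores each contiguous cut of the sorted order with the standard gain formula (no `gamma` term). The function name `best_partition_split` and its `reg_lambda` argument are hypothetical.

```python
from typing import List, Tuple


def best_partition_split(
    grad: List[float], hess: List[float], reg_lambda: float = 1.0
) -> Tuple[List[int], float]:
    """Return (categories sent to one child, gain) for the best contiguous cut.

    Hypothetical helper: grad[c] and hess[c] are the gradient and hessian sums
    of the rows whose value is category c. With fewer than two categories no
    split exists, so ([], -inf) is returned.
    """
    n = len(grad)

    def leaf_value(g: float, h: float) -> float:
        # Optimal leaf output for a node with gradient sum g and hessian sum h.
        return -g / (h + reg_lambda)

    def structure_score(g: float, h: float) -> float:
        # Second-order structure score of a leaf; larger is better.
        return g * g / (h + reg_lambda)

    # Sort categories by the leaf value each would output on its own, so that
    # categories with similar outputs become neighbours.
    order = sorted(range(n), key=lambda c: leaf_value(grad[c], hess[c]))

    g_total, h_total = sum(grad), sum(hess)
    parent = structure_score(g_total, h_total)
    best_gain, best_k = float("-inf"), 0
    g_left = h_left = 0.0
    # Enumerate the n - 1 contiguous cuts of the sorted order.
    for k in range(1, n):
        c = order[k - 1]
        g_left += grad[c]
        h_left += hess[c]
        gain = 0.5 * (
            structure_score(g_left, h_left)
            + structure_score(g_total - g_left, h_total - h_left)
            - parent
        )
        if gain > best_gain:
            best_gain, best_k = gain, k
    return sorted(order[:best_k]), best_gain
```

For example, with gradient sums `[-4.0, 3.0, -3.5, 2.5]` and hessian sums of `2.0` each, categories 1 and 3 (the two with positive gradients) land on the same side. Sorting first is what makes the search cheap: instead of trying all 2^(n-1) - 1 ways to split n categories into two groups, only n - 1 contiguous cuts are scored.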

python-package/xgboost/sklearn.py

Lines changed: 16 additions & 2 deletions
@@ -249,8 +249,20 @@ def inner(y_score: np.ndarray, dmatrix: DMatrix) -> Tuple[str, float]:
         A threshold for deciding whether XGBoost should use one-hot encoding based split
         for categorical data. When number of categories is lesser than the threshold
         then one-hot encoding is chosen, otherwise the categories will be partitioned
-        into children nodes. Only relevant for regression and binary classification.
-        See :doc:`Categorical Data </tutorials/categorical>` for details.
+        into children nodes. Also, `enable_categorical` needs to be set to have
+        categorical feature support. See :doc:`Categorical Data
+        </tutorials/categorical>` and :ref:`cat-param` for details.
+
+    max_cat_threshold : Optional[int]
+
+        .. versionadded:: 1.7.0
+
+        .. note:: This parameter is experimental
+
+        Maximum number of categories considered for each split. Used only by
+        partition-based splits for preventing over-fitting. Also, `enable_categorical`
+        needs to be set to have categorical feature support. See :doc:`Categorical Data
+        </tutorials/categorical>` and :ref:`cat-param` for details.
 
     eval_metric : Optional[Union[str, List[str], Callable]]
 
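The two thresholds in the updated docstring play different roles: `max_cat_to_onehot` picks the split style for a feature, and `max_cat_threshold` only matters for partition-based splits. The sketch below is a hypothetical, plain-Python illustration of that dispatch, not XGBoost's code. In particular, reading `max_cat_threshold` as a cap on how many categories one side of a partition may hold is an assumption based on the docstring wording "maximum number of categories considered for each split"; the function `candidate_splits` is invented for illustration.

```python
from typing import List


def candidate_splits(
    sorted_categories: List[int], max_cat_to_onehot: int, max_cat_threshold: int
) -> List[List[int]]:
    """Return the category sets that would be tried as the left child.

    Hypothetical helper. ``sorted_categories`` is assumed to be already ordered
    as in the partition-based search (e.g. by per-category leaf value).
    """
    n = len(sorted_categories)
    if n < max_cat_to_onehot:
        # Fewer categories than the threshold: one-hot style, each candidate
        # split isolates a single category from all the others.
        return [[c] for c in sorted_categories]
    # Otherwise partition style: contiguous prefixes of the sorted order.
    # ASSUMPTION: max_cat_threshold caps how many categories the left set holds.
    limit = min(max_cat_threshold, n - 1)
    return [sorted_categories[:k] for k in range(1, limit + 1)]
```

In the scikit-learn wrapper both values are passed to the constructor alongside `enable_categorical=True`, for example `XGBClassifier(enable_categorical=True, tree_method="hist", max_cat_to_onehot=4, max_cat_threshold=32)` (values illustrative only). Per the parameter documentation, the `exact` tree method does not support these options.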
@@ -562,6 +574,7 @@ def __init__(
         enable_categorical: bool = False,
         feature_types: FeatureTypes = None,
         max_cat_to_onehot: Optional[int] = None,
+        max_cat_threshold: Optional[int] = None,
         eval_metric: Optional[Union[str, List[str], Callable]] = None,
         early_stopping_rounds: Optional[int] = None,
         callbacks: Optional[List[TrainingCallback]] = None,
@@ -607,6 +620,7 @@ def __init__(
         self.enable_categorical = enable_categorical
         self.feature_types = feature_types
         self.max_cat_to_onehot = max_cat_to_onehot
+        self.max_cat_threshold = max_cat_threshold
         self.eval_metric = eval_metric
         self.early_stopping_rounds = early_stopping_rounds
         self.callbacks = callbacks
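Adding the argument to `__init__` and storing it unchanged as `self.max_cat_threshold` matters because the wrapper follows scikit-learn's estimator conventions: `get_params` discovers parameter names from the `__init__` signature and reads the same-named attributes, and tools such as `clone` or grid search rebuild the estimator from that dictionary. The class below is a hypothetical, heavily simplified stand-in (`TinyEstimator`) that reproduces only this convention with `inspect.signature`; it is not scikit-learn's or XGBoost's code.

```python
import inspect
from typing import Any, Dict, Optional


class TinyEstimator:
    """Hypothetical stand-in for XGBModel, keeping only the categorical options."""

    def __init__(
        self,
        enable_categorical: bool = False,
        max_cat_to_onehot: Optional[int] = None,
        max_cat_threshold: Optional[int] = None,
    ) -> None:
        # Store every argument unchanged under its own name, as the diff does.
        self.enable_categorical = enable_categorical
        self.max_cat_to_onehot = max_cat_to_onehot
        self.max_cat_threshold = max_cat_threshold

    def get_params(self) -> Dict[str, Any]:
        # Simplified scikit-learn convention: parameter names come from the
        # __init__ signature, values from attributes with the same names.
        sig = inspect.signature(type(self).__init__)
        names = [name for name in sig.parameters if name != "self"]
        return {name: getattr(self, name) for name in names}

    def set_params(self, **params: Any) -> "TinyEstimator":
        # Only names that get_params knows about may be updated.
        valid = self.get_params()
        for key, value in params.items():
            if key not in valid:
                raise ValueError(f"Invalid parameter {key!r}")
            setattr(self, key, value)
        return self
```

A clone is then just `TinyEstimator(**est.get_params())`. In this sketch, omitting the `self.max_cat_threshold = max_cat_threshold` assignment would make `get_params` raise `AttributeError`, which is why the constructor argument and the attribute are added together.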
