Commit 04ccbdb

Clarify docs for handle_unknown and handle_missing
1 parent 2ac8ad6 commit 04ccbdb

1 file changed: +13 -6 lines changed

category_encoders/one_hot.py

Lines changed: 13 additions & 6 deletions
@@ -27,13 +27,20 @@ class OneHotEncoder(BaseEstimator, TransformerMixin):
 if True, category values will be included in the encoded column names. Since this can result in duplicate column names, duplicates are suffixed with '#' symbol until a unique name is generated.
 If False, category indices will be used instead of the category values.
 handle_unknown: str
-options are 'error', 'return_nan', 'value', and 'indicator'. The default is 'value'. Warning: if indicator is used,
-an extra column will be added in if the transform matrix has unknown categories. This can cause
-unexpected changes in dimension in some cases.
+options are 'error', 'return_nan', 'value', and 'indicator'. The default is 'value'.
+
+'error' will raise a `ValueError` at transform time if there are new categories.
+'return_nan' will encode a new value as `np.nan` in every dummy column.
+'value' will encode a new value as 0 in every dummy column.
+'indicator' will add an additional dummy column (in both training and test data).
 handle_missing: str
-options are 'error', 'return_nan', 'value', and 'indicator'. The default is 'value'. Warning: if indicator is used,
-an extra column will be added in if the transform matrix has nan values. This can cause
-unexpected changes in dimension in some cases.
+options are 'error', 'return_nan', 'value', and 'indicator'. The default is 'value'.
+
+'error' will raise a `ValueError` if missings are encountered.
+'return_nan' will encode a missing value as `np.nan` in every dummy column.
+'value' will encode a missing value as 0 in every dummy column.
+'indicator' will treat missingness as its own category, adding an additional dummy column
+(whether there are missing values in the training set or not).
 
 Example
 -------
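
The four `handle_unknown` options documented in this diff can be illustrated with a small standalone sketch. It is not the `category_encoders` implementation: `encode_value` is a hypothetical helper, `math.nan` stands in for `np.nan`, and placing the indicator column last is an assumption made for illustration. The same pattern would apply to `handle_missing`, with a missing value playing the role of the unseen category.

```python
import math


def encode_value(value, categories, handle_unknown="value"):
    """Hypothetical helper: one-hot encode one value against the categories
    seen at fit time, following the documented handle_unknown options."""
    known = value in categories
    dummies = [1 if value == c else 0 for c in categories]

    if handle_unknown == "indicator":
        # The extra dummy column is always present (training and test data);
        # it is 1 only for a category that was not seen at fit time.
        # Appending it as the last column is an assumption of this sketch.
        return dummies + [0 if known else 1]
    if known:
        return dummies
    if handle_unknown == "error":
        raise ValueError(f"unexpected category: {value!r}")
    if handle_unknown == "return_nan":
        # NaN in every dummy column.
        return [math.nan] * len(categories)
    if handle_unknown == "value":
        # 0 in every dummy column.
        return [0] * len(categories)
    raise ValueError(f"unsupported handle_unknown option: {handle_unknown!r}")


categories = ["red", "green"]  # categories assumed to be seen during fit
print(encode_value("green", categories))              # [0, 1]
print(encode_value("blue", categories, "value"))      # [0, 0]
print(encode_value("blue", categories, "indicator"))  # [0, 0, 1]
```

Note that only 'indicator' changes the number of output columns, and it does so consistently at fit and transform time, which is the point the reworded docstring makes.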
