category_encoders/one_hot.py
Lines changed: 37 additions & 14 deletions
@@ -27,13 +27,20 @@ class OneHotEncoder(BaseEstimator, TransformerMixin):
 if True, category values will be included in the encoded column names. Since this can result in duplicate column names, duplicates are suffixed with '#' symbol until a unique name is generated.
 If False, category indices will be used instead of the category values.
 handle_unknown: str
-options are 'error', 'return_nan', 'value', and 'indicator'. The default is 'value'. Warning: if indicator is used,
-an extra column will be added in if the transform matrix has unknown categories. This can cause
-unexpected changes in dimension in some cases.
+options are 'error', 'return_nan', 'value', and 'indicator'. The default is 'value'.
+
+'error' will raise a `ValueError` at transform time if there are new categories.
+'return_nan' will encode a new value as `np.nan` in every dummy column.
+'value' will encode a new value as 0 in every dummy column.
+'indicator' will add an additional dummy column (in both training and test data).
 handle_missing: str
-options are 'error', 'return_nan', 'value', and 'indicator'. The default is 'value'. Warning: if indicator is used,
-an extra column will be added in if the transform matrix has nan values. This can cause
-unexpected changes in dimension in some cases.
+options are 'error', 'return_nan', 'value', and 'indicator'. The default is 'value'.
+
+'error' will raise a `ValueError` if missings are encountered.
+'return_nan' will encode a missing value as `np.nan` in every dummy column.
+'value' will encode a missing value as 0 in every dummy column.
+'indicator' will treat missingness as its own category, adding an additional dummy column
+(whether there are missing values in the training set or not).
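
The four `handle_unknown` options documented in this change can be illustrated with a small standalone sketch. This is not the `OneHotEncoder` implementation. `encode_one_hot` is a hypothetical helper written in plain Python, and it uses `math.nan` in place of `np.nan` to stay dependency-free. It also assumes that under 'indicator' the extra column is placed last and set to 1 for an unseen value, which the docstring does not state explicitly.

```python
import math


def encode_one_hot(value, categories, handle_unknown="value"):
    """Hypothetical helper: encode one value against the categories seen in training."""
    options = ("error", "return_nan", "value", "indicator")
    if handle_unknown not in options:
        raise ValueError(f"handle_unknown must be one of {options}")

    # 'indicator' always reserves one extra dummy column, even for known values.
    width = len(categories) + (1 if handle_unknown == "indicator" else 0)

    if value in categories:
        row = [0] * width
        row[categories.index(value)] = 1
        return row

    # From here on, the value was not seen during training.
    if handle_unknown == "error":
        raise ValueError(f"unknown category: {value!r}")
    if handle_unknown == "return_nan":
        return [math.nan] * width
    if handle_unknown == "value":
        return [0] * width

    # 'indicator': assumed here to flag the unseen value in the extra (last) column.
    row = [0] * width
    row[-1] = 1
    return row
```

For example, with training categories `["a", "b"]`, an unseen `"z"` becomes `[0, 0]` under 'value' and `[0, 0, 1]` under 'indicator', while a known `"a"` under 'indicator' still has three columns: `[1, 0, 0]`. The `handle_missing` parameter takes the same four option names, with the missing value playing the role of the unseen category.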