docs/source/index.rst
+33 -1 (33 additions & 1 deletion)
@@ -14,7 +14,16 @@ transformers in this library all share a few useful properties:
14
14
* Can explicitly configure which columns in the data are encoded by name or index, or infer non-numeric columns regardless of input type
15
15
* Can optionally drop any columns with very low variance, based on the training set
16
16
* Portability: train a transformer on data, pickle it, reuse it later and get the same thing out.
17
-
* Full compatibility with sklearn pipelines, input an array-like dataset like any other transformer
17
+
* Full compatibility with sklearn pipelines, input an array-like dataset like any other transformer (\*)
18
+
19
+
(\*) For full compatibility with Pipelines and ColumnTransformers, and for consistent behaviour of `get_feature_names_out`, it's recommended to upgrade `sklearn` to version 1.2.0 or later and to set the transform output to pandas:
20
+
21
+
.. code-block:: python
22
+
23
+
    import sklearn
24
+
    sklearn.set_config(transform_output="pandas")
25
+
26
+
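A minimal sketch of guarding that setting behind a version check, so the call is only made where the `transform_output` option is available. The helper name `supports_pandas_output` is hypothetical and not part of `sklearn` or this library; it assumes the version string starts with numeric major and minor components.

```python
import sklearn


def supports_pandas_output(version: str) -> bool:
    """Return True if an sklearn version string is at least 1.2.

    Hypothetical helper: only the leading major and minor components
    are compared, so "1.2.0rc1" or "1.3.dev0" also count.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (1, 2)


# Only request DataFrame output where the option exists.
if supports_pandas_output(sklearn.__version__):
    sklearn.set_config(transform_output="pandas")
```

On older versions the guard simply skips the call, which avoids an error from an unknown config key but leaves the compatibility caveats described above in place.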
18
27
19
28
Usage
20
29
-----
@@ -65,7 +74,30 @@ To use:
65
74
All of these are fully compatible sklearn transformers, so they can be used in pipelines or in your existing scripts. If
66
75
the cols parameter isn't passed, every non-numeric column will be converted. See below for detailed documentation
67
76
77
+
Known issues:
78
+
-------------
79
+
80
+
`CategoryEncoders` works internally with `pandas DataFrames`, as opposed to `sklearn`, which works with `numpy arrays`. This can cause problems in `sklearn` versions prior to 1.2.0. To ensure full compatibility with `sklearn`, set `sklearn` to also output `DataFrames`. This can be done by
81
+
82
+
.. code-block:: python
83
+
84
+
    sklearn.set_config(transform_output="pandas")
85
+
86
+
for a whole project or just for a single pipeline using