Commit b3389c1

Remove extensive docstring from TargetEncoder class
1 parent 42c2184 commit b3389c1

File tree

1 file changed: +0 −167 lines

stubs/sklearn/preprocessing/_target_encoder.pyi

Lines changed: 0 additions & 167 deletions
@@ -8,173 +8,6 @@ from ..base import OneToOneFeatureMixin
 from ._encoders import _BaseEncoder

 class TargetEncoder(OneToOneFeatureMixin, _BaseEncoder):
-    """Target Encoder for regression and classification targets.
-
-    Each category is encoded based on a shrunk estimate of the average target
-    values for observations belonging to the category. The encoding scheme mixes
-    the global target mean with the target mean conditioned on the value of the
-    category (see [MIC]_).
-
-    When the target type is "multiclass", encodings are based
-    on the conditional probability estimate for each class. The target is first
-    binarized using the "one-vs-all" scheme via
-    :class:`~sklearn.preprocessing.LabelBinarizer`, then the average target
-    value for each class and each category is used for encoding, resulting in
-    `n_features` * `n_classes` encoded output features.
-
-    :class:`TargetEncoder` considers missing values, such as `np.nan` or `None`,
-    as another category and encodes them like any other category. Categories
-    that are not seen during :meth:`fit` are encoded with the target mean, i.e.
-    `target_mean_`.
-
-    For a demo on the importance of the `TargetEncoder` internal cross-fitting,
-    see
-    :ref:`sphx_glr_auto_examples_preprocessing_plot_target_encoder_cross_val.py`.
-    For a comparison of different encoders, refer to
-    :ref:`sphx_glr_auto_examples_preprocessing_plot_target_encoder.py`. Read
-    more in the :ref:`User Guide <target_encoder>`.
-
-    .. note::
-        `fit(X, y).transform(X)` does not equal `fit_transform(X, y)` because a
-        :term:`cross fitting` scheme is used in `fit_transform` for encoding.
-        See the :ref:`User Guide <target_encoder>` for details.
-
-    .. versionadded:: 1.3
-
-    Parameters
-    ----------
-    categories : "auto" or list of shape (n_features,) of array-like, default="auto"
-        Categories (unique values) per feature:
-
-        - `"auto"` : Determine categories automatically from the training data.
-        - list : `categories[i]` holds the categories expected in the i-th column. The
-          passed categories should not mix strings and numeric values within a single
-          feature, and should be sorted in case of numeric values.
-
-        The used categories are stored in the `categories_` fitted attribute.
-
-    target_type : {"auto", "continuous", "binary", "multiclass"}, default="auto"
-        Type of target.
-
-        - `"auto"` : Type of target is inferred with
-          :func:`~sklearn.utils.multiclass.type_of_target`.
-        - `"continuous"` : Continuous target
-        - `"binary"` : Binary target
-        - `"multiclass"` : Multiclass target
-
-        .. note::
-            The type of target inferred with `"auto"` may not be the desired target
-            type used for modeling. For example, if the target consisted of integers
-            between 0 and 100, then :func:`~sklearn.utils.multiclass.type_of_target`
-            will infer the target as `"multiclass"`. In this case, setting
-            `target_type="continuous"` will specify the target as a regression
-            problem. The `target_type_` attribute gives the target type used by the
-            encoder.
-
-        .. versionchanged:: 1.4
-           Added the option 'multiclass'.
-
-    smooth : "auto" or float, default="auto"
-        The amount of mixing of the target mean conditioned on the value of the
-        category with the global target mean. A larger `smooth` value will put
-        more weight on the global target mean.
-        If `"auto"`, then `smooth` is set to an empirical Bayes estimate.
-
-    cv : int, default=5
-        Determines the number of folds in the :term:`cross fitting` strategy used in
-        :meth:`fit_transform`. For classification targets, `StratifiedKFold` is used
-        and for continuous targets, `KFold` is used.
-
-    shuffle : bool, default=True
-        Whether to shuffle the data in :meth:`fit_transform` before splitting into
-        folds. Note that the samples within each split will not be shuffled.
-
-    random_state : int, RandomState instance or None, default=None
-        When `shuffle` is True, `random_state` affects the ordering of the
-        indices, which controls the randomness of each fold. Otherwise, this
-        parameter has no effect.
-        Pass an int for reproducible output across multiple function calls.
-        See :term:`Glossary <random_state>`.
-
-    Attributes
-    ----------
-    encodings_ : list of shape (n_features,) or (n_features * n_classes) of \
-        ndarray
-        Encodings learnt on all of `X`.
-        For feature `i`, `encodings_[i]` are the encodings matching the
-        categories listed in `categories_[i]`. When `target_type_` is
-        "multiclass", the encoding for feature `i` and class `j` is stored in
-        `encodings_[j + (i * len(classes_))]`. E.g., for 2 features (f) and
-        3 classes (c), encodings are ordered:
-        f0_c0, f0_c1, f0_c2, f1_c0, f1_c1, f1_c2
-
-    categories_ : list of shape (n_features,) of ndarray
-        The categories of each input feature determined during fitting or
-        specified in `categories`
-        (in order of the features in `X` and corresponding with the output
-        of :meth:`transform`).
-
-    target_type_ : str
-        Type of target.
-
-    target_mean_ : float
-        The overall mean of the target. This value is only used in :meth:`transform`
-        to encode categories.
-
-    n_features_in_ : int
-        Number of features seen during :term:`fit`.
-
-    feature_names_in_ : ndarray of shape (`n_features_in_`,)
-        Names of features seen during :term:`fit`. Defined only when `X`
-        has feature names that are all strings.
-
-    classes_ : ndarray or None
-        If `target_type_` is 'binary' or 'multiclass', holds the label for each class,
-        otherwise `None`.
-
-    See Also
-    --------
-    OrdinalEncoder : Performs an ordinal (integer) encoding of the categorical features.
-        Contrary to TargetEncoder, this encoding is not supervised. Treating the
-        resulting encoding as a numerical feature therefore leads to arbitrarily
-        ordered values and typically to lower predictive performance when used as
-        preprocessing for a classifier or regressor.
-    OneHotEncoder : Performs a one-hot encoding of categorical features. This
-        unsupervised encoding is better suited for low-cardinality categorical
-        variables as it generates one new feature per unique category.
-
-    References
-    ----------
-    .. [MIC] :doi:`Micci-Barreca, Daniele. "A preprocessing scheme for high-cardinality
-       categorical attributes in classification and prediction problems"
-       SIGKDD Explor. Newsl. 3, 1 (July 2001), 27-32. <10.1145/507533.507538>`
-
-    Examples
-    --------
-    With `smooth="auto"`, the smoothing parameter is set to an empirical Bayes estimate:
-
-    >>> import numpy as np
-    >>> from sklearn.preprocessing import TargetEncoder
-    >>> X = np.array([["dog"] * 20 + ["cat"] * 30 + ["snake"] * 38], dtype=object).T
-    >>> y = [90.3] * 5 + [80.1] * 15 + [20.4] * 5 + [20.1] * 25 + [21.2] * 8 + [49] * 30
-    >>> enc_auto = TargetEncoder(smooth="auto")
-    >>> X_trans = enc_auto.fit_transform(X, y)
-
-    >>> # A high `smooth` parameter puts more weight on the global mean in the
-    >>> # categorical encodings:
-    >>> enc_high_smooth = TargetEncoder(smooth=5000.0).fit(X, y)
-    >>> enc_high_smooth.target_mean_
-    np.float64(44.3)
-    >>> enc_high_smooth.encodings_
-    [array([44.1, 44.4, 44.3])]
-
-    >>> # On the other hand, a low `smooth` parameter puts more weight on the target
-    >>> # conditioned on the value of the categorical:
-    >>> enc_low_smooth = TargetEncoder(smooth=1.0).fit(X, y)
-    >>> enc_low_smooth.encodings_
-    [array([21, 80.8, 43.2])]
-    """
     encodings_: list[ndarray]
     categories_: list[ndarray]
     target_type_: str
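The removed docstring describes the shrinkage scheme: each category's encoding blends the target mean conditioned on that category with the global target mean, and `smooth` controls how much weight the global mean gets. As a sanity check on that description, here is a minimal NumPy sketch of the fixed-`smooth` mixing formula, using the same toy data as the docstring's example. This is an illustrative reimplementation, not sklearn's code; it covers neither `smooth="auto"` (the empirical Bayes estimate) nor the cross-fitting used in `fit_transform`.

```python
import numpy as np

def shrunk_encoding(categories, y, smooth):
    """Blend per-category target means with the global target mean.

    For a fixed smoothing value, the encoding for category c is
        (n_c * mean_c + smooth * global_mean) / (n_c + smooth),
    so a larger `smooth` pulls every category toward the global mean.
    Sketch of the scheme the docstring describes, not sklearn's internals.
    """
    categories = np.asarray(categories)
    y = np.asarray(y, dtype=float)
    global_mean = y.mean()
    encodings = {}
    for c in np.unique(categories):
        mask = categories == c
        n_c = mask.sum()
        encodings[c] = (n_c * y[mask].mean() + smooth * global_mean) / (n_c + smooth)
    return global_mean, encodings

# Toy data from the docstring's example:
X = ["dog"] * 20 + ["cat"] * 30 + ["snake"] * 38
y = [90.3] * 5 + [80.1] * 15 + [20.4] * 5 + [20.1] * 25 + [21.2] * 8 + [49] * 30

mean, enc_high = shrunk_encoding(X, y, smooth=5000.0)  # mean ~ 44.3
_, enc_low = shrunk_encoding(X, y, smooth=1.0)
# High smooth pulls categories toward the global mean: enc_high["dog"] ~ 44.4
# Low smooth stays near the per-category mean:         enc_low["dog"]  ~ 80.8
```

Rounded to one decimal, this reproduces the docstring's example outputs (`target_mean_` of 44.3, high-smooth encodings near `[44.1, 44.4, 44.3]` for cat/dog/snake), which is consistent with `smooth` acting as a pseudo-count on the global mean.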
