Commit d597b05

solegalli and glemaitre authored
DOC improve documentation of RandomUnderSampler (#1019)
Co-authored-by: Guillaume Lemaitre <[email protected]>
1 parent ed60562 commit d597b05

1 file changed: +13 -6 lines changed

doc/under_sampling.rst

Lines changed: 13 additions & 6 deletions
@@ -77,6 +77,12 @@ and are meant for cleaning the feature space.
 Controlled under-sampling techniques
 ------------------------------------
 
+Controlled under-sampling techniques reduce the number of observations from the
+targeted classes to a number specified by the user.
+
+Random under-sampling
+^^^^^^^^^^^^^^^^^^^^^
+
 :class:`RandomUnderSampler` is a fast and easy way to balance the data by
 randomly selecting a subset of data for the targeted classes::
 
@@ -91,9 +97,9 @@ randomly selecting a subset of data for the targeted classes::
    :scale: 60
    :align: center
 
-:class:`RandomUnderSampler` allows to bootstrap the data by setting
-``replacement`` to ``True``. The resampling with multiple classes is performed
-by considering independently each targeted class::
+:class:`RandomUnderSampler` allows bootstrapping the data by setting
+``replacement`` to ``True``. When there are multiple classes, each targeted class is
+under-sampled independently::
 
   >>> import numpy as np
   >>> print(np.vstack([tuple(row) for row in X_resampled]).shape)
@@ -103,8 +109,8 @@ by considering independently each targeted class::
   >>> print(np.vstack(np.unique([tuple(row) for row in X_resampled], axis=0)).shape)
   (181, 2)
 
-In addition, :class:`RandomUnderSampler` allows to sample heterogeneous data
-(e.g. containing some strings)::
+:class:`RandomUnderSampler` handles heterogeneous data types, i.e. numerical,
+categorical, dates, etc.::
 
   >>> X_hetero = np.array([['xxx', 1, 1.0], ['yyy', 2, 2.0], ['zzz', 3, 3.0]],
   ...                     dtype=object)
@@ -116,7 +122,8 @@ In addition, :class:`RandomUnderSampler` allows to sample heterogeneous data
   >>> print(y_resampled)
   [0 1]
 
-It would also work with pandas dataframe::
+:class:`RandomUnderSampler` also supports pandas dataframes as input for
+undersampling::
 
   >>> from sklearn.datasets import fetch_openml
   >>> df_adult, y_adult = fetch_openml(
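
Reviewer note: the behaviour the revised paragraphs describe (each targeted class reduced independently to a user-specified count, optionally with replacement, on rows of mixed types) can be sketched in plain Python. This is only an illustration of the idea, not imbalanced-learn's implementation; the function name `random_under_sample` and its parameters `n_per_class` and `seed` are hypothetical and do not exist in the library.

```python
import random
from collections import defaultdict


def random_under_sample(X, y, n_per_class, replacement=False, seed=0):
    """Hypothetical helper: randomly keep n_per_class[label] rows per targeted class.

    Illustrative sketch only; this is not imbalanced-learn's RandomUnderSampler.
    """
    rng = random.Random(seed)

    # Group row indices by class label so each class is handled independently.
    indices_by_class = defaultdict(list)
    for i, label in enumerate(y):
        indices_by_class[label].append(i)

    kept = []
    for label, indices in indices_by_class.items():
        target = n_per_class.get(label)
        if target is None:
            # Classes that are not targeted are left untouched.
            kept.extend(indices)
        elif replacement:
            # Bootstrap: the same row may be drawn more than once.
            kept.extend(rng.choices(indices, k=target))
        else:
            # Plain subset: every kept row is distinct.
            kept.extend(rng.sample(indices, target))

    kept.sort()
    return [X[i] for i in kept], [y[i] for i in kept]


# Rows mix strings, ints and floats; only indices are sampled, so dtype is irrelevant.
X = [["xxx", 1, 1.0], ["yyy", 2, 2.0], ["zzz", 3, 3.0], ["www", 4, 4.0]]
y = [0, 0, 0, 1]

X_res, y_res = random_under_sample(X, y, {0: 1})
print(y_res)  # [0, 1]
```

Because the sketch samples row indices rather than values, it never inspects the row contents, which is one plausible reason a sampler of this kind can accept heterogeneous data. With `replacement=True` the class sizes stay at the requested count but individual rows can repeat, which is what the unique-row count in the doctest above is checking.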
