@@ -77,6 +77,12 @@ and are meant for cleaning the feature space.
Controlled under-sampling techniques
------------------------------------

+ Controlled under-sampling techniques reduce the number of observations from the
+ targeted classes to a number specified by the user.
+
+ Random under-sampling
+ ^^^^^^^^^^^^^^^^^^^^^
+
:class:`RandomUnderSampler` is a fast and easy way to balance the data by
randomly selecting a subset of data for the targeted classes::

@@ -91,9 +97,9 @@ randomly selecting a subset of data for the targeted classes::
:scale: 60
:align: center

- :class:`RandomUnderSampler` allows to bootstrap the data by setting
- ``replacement`` to ``True``. The resampling with multiple classes is performed
- by considering independently each targeted class::
+ :class:`RandomUnderSampler` allows bootstrapping the data by setting
+ ``replacement`` to ``True``. When there are multiple classes, each targeted class is
+ under-sampled independently::

>>> import numpy as np
>>> print(np.vstack([tuple(row) for row in X_resampled]).shape)
@@ -103,8 +109,8 @@ by considering independently each targeted class::
>>> print(np.vstack(np.unique([tuple(row) for row in X_resampled], axis=0)).shape)
(181, 2)

- In addition, :class:`RandomUnderSampler` allows to sample heterogeneous data
- (e.g. containing some strings)::
+ :class:`RandomUnderSampler` handles heterogeneous data types, such as numerical,
+ categorical and date features::

>>> X_hetero = np.array([['xxx', 1, 1.0], ['yyy', 2, 2.0], ['zzz', 3, 3.0]],
... dtype=object)
@@ -116,7 +122,8 @@ In addition, :class:`RandomUnderSampler` allows to sample heterogeneous data
>>> print(y_resampled)
[0 1]

- It would also work with pandas dataframe::
+ :class:`RandomUnderSampler` also accepts pandas dataframes as input for
+ under-sampling::

>>> from sklearn.datasets import fetch_openml
>>> df_adult, y_adult = fetch_openml(
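The mechanism behind random under-sampling can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not imbalanced-learn's implementation: the helper `random_under_sample_indices` is hypothetical, and the sketch assumes the goal is to keep the smallest class intact while reducing every other class to that size. Each class is sampled independently, `replacement=True` turns the draw into a bootstrap that may repeat rows, and because only row indices are drawn, the same code works on heterogeneous object arrays.

```python
import numpy as np


def random_under_sample_indices(y, replacement=False, random_state=None):
    """Return row indices that balance ``y`` by random under-sampling.

    Hypothetical helper for illustration only (not imbalanced-learn code).
    Assumption of this sketch: the smallest class is kept as is and every
    other class is reduced, independently, to the same number of samples.
    """
    y = np.asarray(y)
    rng = np.random.default_rng(random_state)
    classes, counts = np.unique(y, return_counts=True)
    n_minority = counts.min()
    selected = []
    for cls, count in zip(classes, counts):
        cls_idx = np.flatnonzero(y == cls)
        if count == n_minority:
            # Minority class (or a class tied with it): keep every sample.
            selected.append(cls_idx)
        else:
            # replace=True draws a bootstrap sample, which may repeat rows.
            selected.append(
                rng.choice(cls_idx, size=n_minority, replace=replacement)
            )
    return np.concatenate(selected)


# Heterogeneous rows (strings, ints, floats) work because only indices are drawn.
X_hetero = np.array([['xxx', 1, 1.0], ['yyy', 2, 2.0], ['zzz', 3, 3.0]],
                    dtype=object)
y_hetero = np.array([0, 0, 1])
idx = random_under_sample_indices(y_hetero, random_state=0)
X_resampled, y_resampled = X_hetero[idx], y_hetero[idx]
print(y_resampled)  # [0 1]
```

Returning positions rather than data keeps the sketch agnostic to the container: for a pandas DataFrame, the same indices select rows with `df.iloc[idx]`.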