@@ -244,10 +244,7 @@ Edited data set using nearest neighbours
"edit" the dataset by removing samples which do not agree "enough" with their
neighbourhood :cite:`wilson1972asymptotic`. For each sample in the class to be
under-sampled, the nearest-neighbours are computed and if the selection
- criterion is not fulfilled, the sample is removed. Two selection criteria are
- currently available: (i) the majority (i.e., ``kind_sel='mode'``) or (ii) all
- (i.e., ``kind_sel='all'``) the nearest-neighbors have to belong to the same
- class than the sample inspected to keep it in the dataset::
+ criterion is not fulfilled, the sample is removed::

>>> sorted(Counter(y).items())
[(0, 64), (1, 262), (2, 4674)]
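To make the editing rule concrete, here is a minimal pure-Python sketch of Wilson's editing with both selection criteria. It is not imbalanced-learn's implementation: ``edit_nearest_neighbours`` is a hypothetical function whose parameter names only mirror ``EditedNearestNeighbours`` for readability, neighbours are found by brute force with Euclidean distance, and, by assumption, every class except the smallest one is edited.

```python
import math
from collections import Counter


def edit_nearest_neighbours(X, y, n_neighbors=3, kind_sel="all", classes_to_edit=None):
    """Hypothetical sketch of Wilson's editing (not imbalanced-learn's code).

    A sample from an edited class is kept only if its ``n_neighbors`` nearest
    neighbours (brute-force Euclidean search, the sample itself excluded)
    satisfy the selection criterion:

    * ``kind_sel="all"``: every neighbour shares the sample's label;
    * ``kind_sel="mode"``: the most common neighbour label is the sample's label.
    """
    if kind_sel not in ("all", "mode"):
        raise ValueError("kind_sel must be 'all' or 'mode'")
    if classes_to_edit is None:
        # Assumption: edit every class except the smallest one.
        counts = Counter(y)
        minority = min(counts, key=counts.get)
        classes_to_edit = set(counts) - {minority}

    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi not in classes_to_edit:
            keep.append(i)  # samples of non-edited classes are never removed
            continue
        # Sort all other samples by distance to sample i (ties broken by index).
        others = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        labels = [y[j] for _, j in others[:n_neighbors]]
        if kind_sel == "all":
            agrees = all(label == yi for label in labels)
        else:
            # most_common breaks count ties by first occurrence, i.e. the closest neighbour.
            agrees = Counter(labels).most_common(1)[0][0] == yi
        if agrees:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy 1-D data: class 1 has an intruder at 3.0 and a cluster (8 to 10) bordering class 0.
X = [(0.0,), (2.0,), (4.0,), (3.0,), (8.0,), (9.0,), (10.0,),
     (30.0,), (31.0,), (32.0,), (33.0,)]
y = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
for kind in ("all", "mode"):
    _, y_res = edit_nearest_neighbours(X, y, kind_sel=kind)
    print(kind, sorted(Counter(y_res).items()))
# all [(0, 3), (1, 4)]
# mode [(0, 3), (1, 7)]
```

On this toy data both criteria drop the intruder at 3.0, while only ``"all"`` also drops the three class-1 samples at 8 to 10, whose three nearest neighbours include one class-0 sample.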
@@ -257,6 +254,22 @@ class than the sample inspected to keep it in the dataset::
>>> print(sorted(Counter(y_resampled).items()))
[(0, 64), (1, 213), (2, 4568)]

+ Two selection criteria are currently available: (i) the majority (i.e.,
+ ``kind_sel='mode'``) or (ii) all (i.e., ``kind_sel='all'``) of the
+ nearest neighbours have to belong to the same class as the inspected sample
+ for it to be kept in the dataset. Thus, ``kind_sel='all'`` is less
+ conservative than ``kind_sel='mode'``: since every neighbour has to agree,
+ more samples are excluded with the former strategy than with the latter::
+
+ >>> enn = EditedNearestNeighbours(kind_sel="all")
+ >>> X_resampled, y_resampled = enn.fit_resample(X, y)
+ >>> print(sorted(Counter(y_resampled).items()))
+ [(0, 64), (1, 213), (2, 4568)]
+ >>> enn = EditedNearestNeighbours(kind_sel="mode")
+ >>> X_resampled, y_resampled = enn.fit_resample(X, y)
+ >>> print(sorted(Counter(y_resampled).items()))
+ [(0, 64), (1, 234), (2, 4666)]
+
The parameter ``n_neighbors`` accepts either the number of neighbours to use
or an estimator subclassed from scikit-learn's ``KNeighborsMixin``, which is
then used to find the nearest neighbours that decide whether a sample is kept.
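Accepting either a number or a neighbours estimator is a small duck-typing pattern. Below is a minimal stdlib sketch of that pattern; ``BruteForceNeighbours``, ``resolve_neighbours`` and ``neighbourhoods`` are hypothetical names, the finder only loosely mimics the ``fit``/``kneighbors`` methods of scikit-learn's ``NearestNeighbors``, and requesting one extra neighbour when an integer is given (because each training sample is its own nearest neighbour) is an assumption of this sketch, not a description of imbalanced-learn's code.

```python
import math


class BruteForceNeighbours:
    """Hypothetical finder loosely modelled on scikit-learn's NearestNeighbors.

    Only ``fit(X)`` and ``kneighbors(X)`` are provided, and ``kneighbors``
    returns neighbour indices only (Euclidean distance, ties broken by index).
    """

    def __init__(self, n_neighbors=5):
        self.n_neighbors = n_neighbors

    def fit(self, X):
        self.X_ = list(X)
        return self

    def kneighbors(self, X):
        indices = []
        for x in X:
            order = sorted(
                range(len(self.X_)),
                key=lambda j: (math.dist(x, self.X_[j]), j),
            )
            indices.append(order[: self.n_neighbors])
        return indices


def resolve_neighbours(n_neighbors):
    """Return a neighbour finder from an int or a finder-like object."""
    if isinstance(n_neighbors, int):
        # One extra neighbour: each training sample is its own nearest neighbour.
        return BruteForceNeighbours(n_neighbors=n_neighbors + 1)
    if hasattr(n_neighbors, "fit") and hasattr(n_neighbors, "kneighbors"):
        # Duck typing: any object with fit/kneighbors works (fitted in place here).
        return n_neighbors
    raise TypeError("n_neighbors must be an int or provide fit() and kneighbors()")


def neighbourhoods(X, n_neighbors=3):
    """Indices of each sample's nearest neighbours, the sample itself excluded."""
    finder = resolve_neighbours(n_neighbors).fit(X)
    # Drop column 0, the query sample (assumes no duplicated points in X).
    return [row[1:] for row in finder.kneighbors(X)]


X = [(0.0,), (2.0,), (4.0,), (3.0,), (8.0,)]
print(neighbourhoods(X, 2))
print(neighbourhoods(X, BruteForceNeighbours(n_neighbors=3)))
# both print [[1, 3], [3, 0], [3, 1], [1, 2], [2, 3]]
```

Here, passing the integer ``2`` and passing a finder configured for ``3`` neighbours yield the same neighbourhoods, which is what makes the two forms interchangeable in this sketch.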