@@ -274,6 +274,9 @@ The parameter ``n_neighbors`` allows to give a classifier subclassed from
``KNeighborsMixin`` from scikit-learn to find the nearest neighbors and make
the decision to keep a given sample or not.

+ Repeated Edited Nearest Neighbours
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
:class:`RepeatedEditedNearestNeighbours` extends
:class:`EditedNearestNeighbours` by repeating the algorithm multiple times
:cite:`tomek1976experiment`. Generally, repeating the algorithm will delete
@@ -285,9 +288,23 @@ more data::
>>> print(sorted(Counter(y_resampled).items()))
[(0, 64), (1, 208), (2, 4551)]

- :class:`AllKNN` differs from the previous
- :class:`RepeatedEditedNearestNeighbours` since the number of neighbors of the
- internal nearest neighbors algorithm is increased at each iteration
+ The user can set the number of times the edited nearest neighbours method should be
+ repeated through the parameter ``max_iter``.
+
+ The repetitions will stop when:
+
+ 1. the maximum number of iterations is reached, or
+ 2. no more observations are removed, or
+ 3. one of the majority classes becomes a minority class, or
+ 4. one of the majority classes disappears during the undersampling.
+
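The four stopping rules above can be illustrated with a minimal pure-Python sketch on a 1-D toy dataset. This is not imbalanced-learn's implementation: the helpers ``enn_pass`` and ``repeated_enn``, the toy data, the choice of ``k=1`` and the decision to keep the result of the final pass are all assumptions made for illustration.

```python
from collections import Counter


def enn_pass(points, labels, targets, k):
    """One editing pass (hypothetical helper): keep a targeted sample only if
    its k nearest neighbours (absolute distance on 1-D points) share its label."""
    kept = []
    for i, (x, y) in enumerate(zip(points, labels)):
        if y in targets:
            others = [j for j in range(len(points)) if j != i]
            nearest = sorted(others, key=lambda j: abs(points[j] - x))[:k]
            if any(labels[j] != y for j in nearest):
                continue  # a neighbour disagrees: drop this sample
        kept.append(i)
    return [points[i] for i in kept], [labels[i] for i in kept]


def repeated_enn(points, labels, max_iter=100, k=1):
    """Repeat enn_pass until one of the four stopping rules fires."""
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    targets = set(counts) - {minority}  # only non-minority classes are edited
    for n_iter in range(1, max_iter + 1):
        new_points, new_labels = enn_pass(points, labels, targets, k)
        new_counts = Counter(new_labels)
        if len(new_labels) == len(labels):
            return points, labels, n_iter, "no more observations removed"
        points, labels = new_points, new_labels
        if any(new_counts[c] == 0 for c in targets):
            return points, labels, n_iter, "a majority class disappeared"
        if any(new_counts[c] < counts[minority] for c in targets):
            return points, labels, n_iter, "a majority class became a minority"
    return points, labels, max_iter, "max_iter reached"


# Toy data: class 0 is the minority; two class-1 points sit next to it.
X = [0.0, 0.1, 0.25, 0.45, 1.0, 1.3, 1.7, 2.2]
y = [0, 0, 1, 1, 1, 1, 1, 1]

_, y_once, _, _ = repeated_enn(X, y, max_iter=1)
_, y_rep, n_iter, reason = repeated_enn(X, y)
print(sorted(Counter(y_once).items()))  # counts after a single pass
print(sorted(Counter(y_rep).items()), n_iter, reason)
```

On this toy data the second pass removes a sample that survived the first one, because its closest same-class neighbour has already been deleted. This is the sense in which repeating the algorithm deletes more data.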
+ All KNN
+ ~~~~~~~
+
+ :class:`AllKNN` is a variation of
+ :class:`RepeatedEditedNearestNeighbours` where the number of neighbours evaluated at
+ each round of :class:`EditedNearestNeighbours` increases. It starts by editing based on
+ 1-Nearest Neighbour, and it increases the neighbourhood by 1 at each iteration
:cite:`tomek1976experiment`::

>>> from imblearn.under_sampling import AllKNN
@@ -296,8 +313,13 @@ internal nearest neighbors algorithm is increased at each iteration
>>> print(sorted(Counter(y_resampled).items()))
[(0, 64), (1, 220), (2, 4601)]

- In the example below, it can be seen that the three algorithms have similar
- impact by cleaning noisy samples next to the boundaries of the classes.
+ :class:`AllKNN` stops cleaning when the maximum number of neighbours to examine, which
+ is set by the user through the parameter ``n_neighbors``, is reached, or when the
+ majority class becomes the minority class.
+
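The growing neighbourhood can be sketched in pure Python on a 1-D toy dataset. As before, this is an illustration under stated assumptions rather than the library's code: ``enn_pass`` and ``all_knn`` are hypothetical helpers, the data are made up, and the sketch keeps the result of the pass that triggers the stop.

```python
from collections import Counter


def enn_pass(points, labels, targets, k):
    """One editing pass (hypothetical helper): keep a targeted sample only if
    its k nearest neighbours (absolute distance on 1-D points) share its label."""
    kept = []
    for i, (x, y) in enumerate(zip(points, labels)):
        if y in targets:
            others = [j for j in range(len(points)) if j != i]
            nearest = sorted(others, key=lambda j: abs(points[j] - x))[:k]
            if any(labels[j] != y for j in nearest):
                continue  # a neighbour disagrees: drop this sample
        kept.append(i)
    return [points[i] for i in kept], [labels[i] for i in kept]


def all_knn(points, labels, n_neighbors=3):
    """Run one editing pass per neighbourhood size 1, 2, ..., n_neighbors."""
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    targets = set(counts) - {minority}  # only non-minority classes are edited
    for k in range(1, n_neighbors + 1):
        points, labels = enn_pass(points, labels, targets, k)
        new_counts = Counter(labels)
        if any(new_counts[c] < counts[minority] for c in targets):
            break  # a majority class is now smaller than the minority class
    return points, labels


# Toy data: class 0 is the minority; class 1 overlaps it on the left.
X = [0.0, 0.1, 0.25, 0.45, 1.0, 1.3, 1.7, 2.2]
y = [0, 0, 1, 1, 1, 1, 1, 1]

results = {}
for n in (1, 2, 3):
    _, y_res = all_knn(X, y, n_neighbors=n)
    results[n] = sorted(Counter(y_res).items())
    print(n, results[n])
```

Each larger value of ``n_neighbors`` adds one more pass with a wider neighbourhood, so on this toy data one additional class-1 sample near the minority cluster is removed at every step.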
+ In the example below, we see that :class:`EditedNearestNeighbours`,
+ :class:`RepeatedEditedNearestNeighbours` and :class:`AllKNN` have a similar impact when
+ cleaning "noisy" samples at the boundaries between classes.

.. image:: ./auto_examples/under-sampling/images/sphx_glr_plot_comparison_under_sampling_004.png
:target: ./auto_examples/under-sampling/plot_comparison_under_sampling.html