Commit ed60562

solegalli and glemaitre authored
DOC improve documentation for RENN and AllKNN (#1022)
Co-authored-by: Guillaume Lemaitre <[email protected]>
1 parent 9a59070 commit ed60562

1 file changed: +27 -5 lines changed

doc/under_sampling.rst

Lines changed: 27 additions & 5 deletions
@@ -274,6 +274,9 @@ The parameter ``n_neighbors`` allows to give a classifier subclassed from
 ``KNeighborsMixin`` from scikit-learn to find the nearest neighbors and make
 the decision to keep a given sample or not.
 
+Repeated Edited Nearest Neighbours
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 :class:`RepeatedEditedNearestNeighbours` extends
 :class:`EditedNearestNeighbours` by repeating the algorithm multiple times
 :cite:`tomek1976experiment`. Generally, repeating the algorithm will delete
@@ -285,9 +288,23 @@ more data::
 >>> print(sorted(Counter(y_resampled).items()))
 [(0, 64), (1, 208), (2, 4551)]
 
-:class:`AllKNN` differs from the previous
-:class:`RepeatedEditedNearestNeighbours` since the number of neighbors of the
-internal nearest neighbors algorithm is increased at each iteration
+The user can set up the number of times the edited nearest neighbours method should be
+repeated through the parameter `max_iter`.
+
+The repetitions will stop when:
+
+1. the maximum number of iterations is reached, or
+2. no more observations are removed, or
+3. one of the majority classes becomes a minority class, or
+4. one of the majority classes disappears during the undersampling.
+
+All KNN
+~~~~~~~
+
+:class:`AllKNN` is a variation of the
+:class:`RepeatedEditedNearestNeighbours` where the number of neighbours evaluated at
+each round of :class:`EditedNearestNeighbours` increases. It starts by editing based on
+1-Nearest Neighbour, and it increases the neighbourhood by 1 at each iteration
 :cite:`tomek1976experiment`::
 
 >>> from imblearn.under_sampling import AllKNN
@@ -296,8 +313,13 @@ internal nearest neighbors algorithm is increased at each iteration
 >>> print(sorted(Counter(y_resampled).items()))
 [(0, 64), (1, 220), (2, 4601)]
 
-In the example below, it can be seen that the three algorithms have similar
-impact by cleaning noisy samples next to the boundaries of the classes.
+:class:`AllKNN` stops cleaning when the maximum number of neighbours to examine, which
+is determined by the user through the parameter `n_neighbors` is reached, or when the
+majority class becomes the minority class.
+
+In the example below, we see that :class:`EditedNearestNeighbours`,
+:class:`RepeatedEditedNearestNeighbours` and :class:`AllKNN` have similar impact when
+cleaning "noisy" samples at the boundaries between classes.
 
 .. image:: ./auto_examples/under-sampling/images/sphx_glr_plot_comparison_under_sampling_004.png
 :target: ./auto_examples/under-sampling/plot_comparison_under_sampling.html
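
Illustrative sketch (not part of the commit): the stopping rules that the new documentation text lists for :class:`RepeatedEditedNearestNeighbours` and :class:`AllKNN` can be mimicked in plain Python on 1-D toy data. This is not imbalanced-learn's implementation. The helper names (``enn_pass``, ``majority_problem``, ``repeated_enn``, ``all_knn``) are hypothetical, the "all neighbours must agree" selection rule is an assumption, and so is the choice to discard a pass that would shrink a majority class below the minority class.

```python
from collections import Counter


def enn_pass(points, labels, k, targets):
    """One edited-nearest-neighbours pass over 1-D toy data.

    A sample from a targeted class is dropped when any of its k nearest
    neighbours carries a different label (assumed selection rule).
    """
    kept = []
    for i, label in enumerate(labels):
        if label in targets:
            # Rank every other sample by distance to sample i.
            ranked = sorted(
                (abs(points[i] - p), j) for j, p in enumerate(points) if j != i
            )
            if any(labels[j] != label for _, j in ranked[:k]):
                continue  # at least one neighbour disagrees: drop the sample
        kept.append(i)
    return [points[i] for i in kept], [labels[i] for i in kept]


def majority_problem(labels, targets, n_minority):
    """Return a reason string if a majority class vanished or shrank too far."""
    counts = Counter(labels)
    if any(counts[c] == 0 for c in targets):
        return "a majority class disappeared"
    if any(counts[c] < n_minority for c in targets):
        return "a majority class became a minority class"
    return None


def repeated_enn(points, labels, k=3, max_iter=100):
    """Repeat enn_pass until one of the four documented stopping rules fires."""
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    targets = set(counts) - {minority}
    for _ in range(max_iter):
        new_points, new_labels = enn_pass(points, labels, k, targets)
        if len(new_labels) == len(labels):
            return points, labels, "no more observations are removed"
        problem = majority_problem(new_labels, targets, counts[minority])
        if problem:
            # Sketch-level choice: discard the offending pass.
            return points, labels, problem
        points, labels = new_points, new_labels
    return points, labels, "maximum number of iterations reached"


def all_knn(points, labels, n_neighbors=3):
    """Run enn_pass with k = 1, 2, ..., n_neighbors, stopping early if needed."""
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    targets = set(counts) - {minority}
    for k in range(1, n_neighbors + 1):
        new_points, new_labels = enn_pass(points, labels, k, targets)
        if majority_problem(new_labels, targets, counts[minority]):
            break
        points, labels = new_points, new_labels
    return points, labels


# Class 0 is the minority; the class-1 sample at 15 sits inside class 0's region.
X = [0, 10, 20, 100, 110, 120, 130, 140, 15]
y = [0, 0, 0, 1, 1, 1, 1, 1, 1]

_, y_renn, reason = repeated_enn(X, y, k=3, max_iter=10)
print(sorted(Counter(y_renn).items()), reason)
# [(0, 3), (1, 5)] no more observations are removed

_, y_allknn = all_knn(X, y, n_neighbors=3)
print(sorted(Counter(y_allknn).items()))
# [(0, 3), (1, 5)]
```

On this toy data both procedures remove only the misplaced class-1 sample, and the repeated variant stops on its second pass because nothing further is removed.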
