Commit a55deec

DOC fix issues with SMOTEN doc
1 parent e3df215 commit a55deec

5 files changed: +14 -14 lines changed

README.rst

Lines changed: 2 additions & 2 deletions
@@ -167,8 +167,8 @@ Below is a list of the methods currently implemented in this module.
 * Over-sampling
 1. Random minority over-sampling with replacement
 2. SMOTE - Synthetic Minority Over-sampling Technique [8]_
-3. SMOTENC - SMOTE for Nominal Continuous [8]_
-4. SMOTEN - SMMOTE for Nominal only [8]_
+3. SMOTENC - SMOTE for Nominal and Continuous [8]_
+4. SMOTEN - SMOTE for Nominal [8]_
 5. bSMOTE(1 & 2) - Borderline SMOTE of types 1 and 2 [9]_
 6. SVM SMOTE - Support Vectors SMOTE [10]_
 7. ADASYN - Adaptive synthetic sampling approach for imbalanced learning [15]_

doc/over_sampling.rst

Lines changed: 6 additions & 6 deletions
@@ -211,16 +211,16 @@ Therefore, it can be seen that the samples generated in the first and last
 columns are belonging to the same categories originally presented without any
 other extra interpolation.

-However, :class:`SMOTENC` is working with data composed of categorical data
-only. WHen data are made of only nominal categorical data, one can use the
-:class:`SMOTEN` variant :cite:`chawla2002smote`. The algorithm changes in
-two ways:
+However, :class:`SMOTENC` is only working when data is a mixed of numerical and
+categorical features. If data are made of only nominal categorical data, one
+can use the :class:`SMOTEN` variant :cite:`chawla2002smote`. The algorithm
+changes in two ways:

 * the nearest neighbors search does not rely on the Euclidean distance. Indeed,
 the value difference metric (VDM) also implemented in the class
 :class:`~imblearn.metrics.ValueDifferenceMetric` is used.
-* the new sample generation is based on majority vote per feature to generate
-the most common category seen in the neighbors samples.
+* a new sample is generated where each feature value corresponds to the most
+common category seen in the neighbors samples belonging to the same class.

 Let's take the following example::
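
The second bullet of the revised paragraph (each feature of a new sample takes the most common category among neighbors of the same class) can be illustrated with a minimal sketch. This is not imblearn's implementation: the function name `majority_vote_sample` and the toy data are hypothetical, the neighbors are assumed to have been selected already (the VDM-based neighbor search is not shown), and ties are broken by first occurrence, which is an arbitrary choice made here.

```python
from collections import Counter


def majority_vote_sample(neighbors):
    """Build one synthetic sample from same-class neighbors.

    `neighbors` is a list of equal-length rows of categorical values.
    For each feature (column), the most common category is kept.
    Ties go to the category seen first, an arbitrary choice for this sketch.
    """
    if not neighbors:
        raise ValueError("at least one neighbor is required")
    columns = zip(*neighbors)  # iterate feature by feature
    return [Counter(column).most_common(1)[0][0] for column in columns]


# Hypothetical minority-class neighbors described by three nominal features.
neighbors = [
    ["green", "small", "round"],
    ["green", "large", "round"],
    ["blue", "small", "square"],
]
new_sample = majority_vote_sample(neighbors)
print(new_sample)  # ['green', 'small', 'round']
```

Because every feature value is copied from an existing category, the generated sample never contains a category that was absent from the neighbors, which is the property the revised wording emphasizes.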

imblearn/over_sampling/_adasyn.py

Lines changed: 1 addition & 1 deletion
@@ -52,7 +52,7 @@ class ADASYN(BaseOverSampler):

 SMOTENC : Over-sample using SMOTE for continuous and categorical features.

-SMOTEN : Over-sample using the SMOTE variable specifically for categorical
+SMOTEN : Over-sample using the SMOTE variant specifically for nominal
 features only.

 SVMSMOTE : Over-sample using SVM-SMOTE variant.

imblearn/over_sampling/_random_over_sampler.py

Lines changed: 1 addition & 1 deletion
@@ -76,7 +76,7 @@ class RandomOverSampler(BaseOverSampler):

 SMOTENC : Over-sample using SMOTE for continuous and categorical features.

-SMOTEN : Over-sample using the SMOTE variable specifically for categorical
+SMOTEN : Over-sample using the SMOTE variant specifically for nominal
 features only.

 SVMSMOTE : Over-sample using SVM-SMOTE variant.

imblearn/over_sampling/_smote.py

Lines changed: 4 additions & 4 deletions
@@ -450,7 +450,7 @@ class SVMSMOTE(BaseSMOTE):

 SMOTENC : Over-sample using SMOTE for continuous and categorical features.

-SMOTEN : Over-sample using the SMOTE variable specifically for categorical
+SMOTEN : Over-sample using the SMOTE variant specifically for nominal
 features only.

 BorderlineSMOTE : Over-sample using Borderline-SMOTE.
@@ -648,7 +648,7 @@ class SMOTE(BaseSMOTE):
 --------
 SMOTENC : Over-sample using SMOTE for continuous and categorical features.

-SMOTEN : Over-sample using the SMOTE variable specifically for categorical
+SMOTEN : Over-sample using the SMOTE variant specifically for nominal
 features only.

 BorderlineSMOTE : Over-sample using the borderline-SMOTE variant.
@@ -774,7 +774,7 @@ class SMOTENC(SMOTE):
 --------
 SMOTE : Over-sample using SMOTE.

-SMOTEN : Over-sample using the SMOTE variable specifically for categorical
+SMOTEN : Over-sample using the SMOTE variant specifically for nominal
 features only.

 SVMSMOTE : Over-sample using SVM-SMOTE variant.
@@ -1068,7 +1068,7 @@ class KMeansSMOTE(BaseSMOTE):

 SMOTENC : Over-sample using SMOTE for continuous and categorical features.

-SMOTEN : Over-sample using the SMOTE variable specifically for categorical
+SMOTEN : Over-sample using the SMOTE variant specifically for nominal
 features only.

 SVMSMOTE : Over-sample using SVM-SMOTE variant.
