@@ -9,41 +9,45 @@ Introduction
99APIs of imbalanced-learn samplers
1010----------------------------------
1111
12- The available samplers follows the scikit-learn API using the base estimator
13- and adding a sampling functionality through the ``sample `` method:
12+ The available samplers follow the
13+ `scikit-learn API <https://scikit-learn.org/stable/getting_started.html#fitting-and-predicting-estimator-basics>`_,
14+ using the base estimator
15+ and incorporating sampling functionality via the ``fit_resample`` method:
1416
1517:Estimator:
1618
17- The base object, implements a ``fit `` method to learn from data, either ::
19+ The base object implements a ``fit`` method to learn from data::
1820
1921 estimator = obj.fit(data, targets)
2022
2123:Resampler:
2224
23- To resample a data sets, each sampler implements::
25+ To resample a dataset, each sampler implements a ``fit_resample`` method::
2426
2527 data_resampled, targets_resampled = obj.fit_resample(data, targets)
2628
27- Imbalanced-learn samplers accept the same inputs that in scikit-learn:
29+ Imbalanced-learn samplers accept the same inputs as scikit-learn estimators:
2830
29- * `data `:
30- * 2-D :class: `list `,
31- * 2-D :class: `numpy.ndarray `,
32- * :class: `pandas.DataFrame `,
33- * :class: `scipy.sparse.csr_matrix ` or :class: `scipy.sparse.csc_matrix `;
34- * `targets `:
35- * 1-D :class: `numpy.ndarray `,
36- * :class: `pandas.Series `.
31+ * `data`, 2-dimensional array-like structures, such as:
32+   * Python lists of lists (:class:`list`),
33+   * NumPy arrays (:class:`numpy.ndarray`),
34+   * Pandas dataframes (:class:`pandas.DataFrame`),
35+   * SciPy sparse matrices (:class:`scipy.sparse.csr_matrix` or :class:`scipy.sparse.csc_matrix`);
36+
37+ * `targets`, 1-dimensional array-like structures, such as:
38+   * NumPy arrays (:class:`numpy.ndarray`),
39+   * Pandas series (:class:`pandas.Series`).
3740
3841The output will be of the following type:
3942
40- * `data_resampled `:
41- * 2-D :class: `numpy.ndarray `,
42- * :class: `pandas.DataFrame `,
43- * :class: `scipy.sparse.csr_matrix ` or :class: `scipy.sparse.csc_matrix `;
44- * `targets_resampled `:
45- * 1-D :class: `numpy.ndarray `,
46- * :class: `pandas.Series `.
43+ * `data_resampled`, 2-dimensional array-like structures, such as:
44+   * NumPy arrays (:class:`numpy.ndarray`),
45+   * Pandas dataframes (:class:`pandas.DataFrame`),
46+   * SciPy sparse matrices (:class:`scipy.sparse.csr_matrix` or :class:`scipy.sparse.csc_matrix`);
47+
48+ * `targets_resampled`, 1-dimensional array-like structures, such as:
49+   * NumPy arrays (:class:`numpy.ndarray`),
50+   * Pandas series (:class:`pandas.Series`).
4751
4852.. topic:: Pandas in/out
4953
@@ -62,18 +66,19 @@ The output will be of the following type:
6266Problem statement regarding imbalanced data sets
6367------------------------------------------------
6468
65- The learning phase and the subsequent prediction of machine learning algorithms
66- can be affected by the problem of imbalanced data set. The balancing issue
67- corresponds to the difference of the number of samples in the different
68- classes. We illustrate the effect of training a linear SVM classifier with
69- different levels of class balancing.
69+ The learning and prediction phases of machine learning algorithms
70+ can be impacted by the issue of **imbalanced datasets**. This imbalance
71+ refers to the difference in the number of samples across the different classes.
72+ We demonstrate the effect of training a `Logistic Regression classifier
73+ <https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html>`_
74+ with varying levels of class balancing, adjusted through the class weights.
7075
7176.. image :: ./auto_examples/over-sampling/images/sphx_glr_plot_comparison_over_sampling_001.png
7277 :target: ./auto_examples/over-sampling/plot_comparison_over_sampling.html
7378 :scale: 60
7479 :align: center
7580
76- As expected, the decision function of the linear SVM varies greatly depending
77- upon how imbalanced the data is. With a greater imbalanced ratio, the decision
78- function favors the class with the larger number of samples, usually referred
79- as the majority class.
81+ As expected, the decision function of the Logistic Regression classifier varies significantly
82+ depending on how imbalanced the data is. With a greater imbalance ratio, the decision function
83+ tends to favour the class with the larger number of samples, usually referred to as the
84+ **majority class**.
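This effect can be sketched in a few lines (a simplified stand-in for the plotted example above, not the documentation's exact code): on a strongly imbalanced synthetic dataset, a default Logistic Regression rarely predicts the minority class, while ``class_weight="balanced"`` shifts the decision function toward it.

```python
# Sketch: how class weights shift LogisticRegression's decision function
# on imbalanced data. Dataset and parameters are illustrative choices.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Two-class dataset with a strong (19:1) imbalance.
X, y = make_classification(
    n_samples=2000,
    weights=[0.95, 0.05],
    random_state=0,
)

# Default weights: the decision function favours the majority class.
clf_default = LogisticRegression(max_iter=1000).fit(X, y)
# "balanced" reweights classes inversely to their frequencies.
clf_balanced = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X, y)

# Fraction of points predicted as the minority class (label 1) by each model.
minority_default = np.mean(clf_default.predict(X) == 1)
minority_balanced = np.mean(clf_balanced.predict(X) == 1)
print(minority_default, minority_balanced)
```

With balanced weights, the boundary moves so that a noticeably larger share of points is assigned to the minority class.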