Commit d73f287

Author: Guillaume Lemaitre
Commit message: Update Readme
1 parent eede039 · commit d73f287

File tree: 1 file changed (+41, -29 lines)


README.md

Lines changed: 41 additions & 29 deletions
```diff
@@ -63,46 +63,58 @@ Below is a list of the methods currently implemented in this module.
 
 * Under-sampling
     1. Random majority under-sampling with replacement
-    2. [Extraction of majority-minority Tomek links][1]
+    2. [Extraction of majority-minority Tomek links](ref1)
     3. Under-sampling with Cluster Centroids
-    4. [NearMiss-(1 & 2 & 3)][2]
-    5. [Condensed Nearest Neighbour][3]
-    6. [One-Sided Selection][4]
-    7. [Neighbourhood Cleaning Rule][5]
-    8. [Edited Nearest Neighbours][6]
-    9. [Instance Hardness Threshold][7]
+    4. [NearMiss-(1 & 2 & 3)](ref2)
+    5. [Condensed Nearest Neighbour](ref3)
+    6. [One-Sided Selection](ref4)
+    7. [Neighbourhood Cleaning Rule](ref5)
+    8. [Edited Nearest Neighbours](ref6)
+    9. [Instance Hardness Threshold](ref7)
 
 * Over-sampling
     1. Random minority over-sampling with replacement
-    2. [SMOTE - Synthetic Minority Over-sampling Technique][8]
-    3. [bSMOTE(1 & 2) - Borderline SMOTE of types 1 and 2][9]
-    4. [SVM SMOTE - Support Vectors SMOTE][10]
+    2. [SMOTE - Synthetic Minority Over-sampling Technique](ref8)
+    3. [bSMOTE(1 & 2) - Borderline SMOTE of types 1 and 2](ref9)
+    4. [SVM SMOTE - Support Vectors SMOTE](ref10)
 
 * Over-sampling followed by under-sampling
-    1. [SMOTE + Tomek links][12]
-    2. [SMOTE + ENN][11]
+    1. [SMOTE + Tomek links](ref12)
+    2. [SMOTE + ENN](ref11)
 
 * Ensemble sampling
-    1. [EasyEnsemble][13]
-    2. [BalanceCascade][13]
+    1. [EasyEnsemble](ref13)
+    2. [BalanceCascade](ref13)
 
 The different algorithms are presented in the [following notebook](https://github.com/fmfn/UnbalancedDataset/blob/master/examples/plot_unbalanced_dataset.ipynb).
 
 This is a work in progress. Any comments, suggestions or corrections are welcome.
 
 References:
-===========
-
-[1]: I. Tomek, [“Two modifications of CNN,”](http://sci2s.ugr.es/keel/pdf/algorithm/articulo/1976-Tomek-IEEETSMC(2).pdf) In Systems, Man, and Cybernetics, IEEE Transactions on, vol. 6, pp. 769-772, 1976.
-[2]: I. Mani, I. Zhang, [“kNN approach to unbalanced data distributions: a case study involving information extraction,”](http://web0.site.uottawa.ca:4321/~nat/Workshop2003/jzhang.pdf) In Proceedings of the workshop on learning from imbalanced datasets, 2003.
-[3]: P. Hart, [“The condensed nearest neighbor rule,”](http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1054155&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1054155) In Information Theory, IEEE Transactions on, vol. 14(3), pp. 515-516, 1968.
-[4]: M. Kubat, S. Matwin, [“Addressing the curse of imbalanced training sets: one-sided selection,”](http://sci2s.ugr.es/keel/pdf/algorithm/congreso/kubat97addressing.pdf) In ICML, vol. 97, pp. 179-186, 1997.
-[5]: J. Laurikkala, [“Improving identification of difficult small classes by balancing class distribution,”](http://sci2s.ugr.es/keel/pdf/algorithm/congreso/2001-Laurikkala-LNCS.pdf) Springer Berlin Heidelberg, 2001.
-[6]: D. Wilson, [“Asymptotic Properties of Nearest Neighbor Rules Using Edited Data,”](http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=4309137&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D4309137) In IEEE Transactions on Systems, Man, and Cybernetics, vol. 2(3), pp. 408-421, 1972.
-[7]: M. R. Smith, T. Martinez, C. Giraud-Carrier, [“An instance level analysis of data complexity,”](http://axon.cs.byu.edu/papers/smith.ml2013.pdf) Machine Learning, vol. 95(2), pp. 225-256, 2014.
-[8]: N. V. Chawla, K. W. Bowyer, L. O. Hall, W. P. Kegelmeyer, [“SMOTE: synthetic minority over-sampling technique,”](https://www.jair.org/media/953/live-953-2037-jair.pdf) Journal of Artificial Intelligence Research, vol. 16, pp. 321-357, 2002.
-[9]: H. Han, W. Wen-Yuan, M. Bing-Huan, [“Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning,”](http://sci2s.ugr.es/keel/keel-dataset/pdfs/2005-Han-LNCS.pdf) Advances in Intelligent Computing, pp. 878-887, 2005.
-[10]: H. M. Nguyen, E. W. Cooper, K. Kamei, [“Borderline over-sampling for imbalanced data classification,”](https://www.google.fr/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&ved=0CDAQFjABahUKEwjH7qqamr_HAhWLthoKHUr0BIo&url=http%3A%2F%2Fousar.lib.okayama-u.ac.jp%2Ffile%2F19617%2FIWCIA2009_A1005.pdf&ei=a7zZVYeNDIvtasrok9AI&usg=AFQjCNHoQ6oC_dH1M1IncBP0ZAaKj8a8Cw&sig2=lh32CHGjs5WBqxa_l0ylbg) International Journal of Knowledge Engineering and Soft Data Paradigms, vol. 3(1), pp. 4-21, 2011.
-[11]: G. Batista, R. C. Prati, M. C. Monard, [“A study of the behavior of several methods for balancing machine learning training data,”](http://www.sigkdd.org/sites/default/files/issues/6-1-2004-06/batista.pdf) ACM SIGKDD Explorations Newsletter, vol. 6(1), pp. 20-29, 2004.
-[12]: G. Batista, B. Bazzan, M. Monard, [“Balancing Training Data for Automated Annotation of Keywords: a Case Study,”](http://www.icmc.usp.br/~gbatista/files/wob2003.pdf) In WOB, pp. 10-18, 2003.
-[13]: X. Y. Liu, J. Wu, Z. H. Zhou, [“Exploratory Undersampling for Class-Imbalance Learning,”](http://cse.seu.edu.cn/people/xyliu/publication/tsmcb09.pdf) In IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 39(2), pp. 539-550, April 2009.
+-----------
+
+<a name="ref1"></a>[1]: I. Tomek, [“Two modifications of CNN,”](http://sci2s.ugr.es/keel/pdf/algorithm/articulo/1976-Tomek-IEEETSMC(2).pdf) In Systems, Man, and Cybernetics, IEEE Transactions on, vol. 6, pp. 769-772, 1976.
+
+<a name="ref2"></a>[2]: I. Mani, I. Zhang, [“kNN approach to unbalanced data distributions: a case study involving information extraction,”](http://web0.site.uottawa.ca:4321/~nat/Workshop2003/jzhang.pdf) In Proceedings of the workshop on learning from imbalanced datasets, 2003.
+
+<a name="ref3"></a>[3]: P. Hart, [“The condensed nearest neighbor rule,”](http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1054155&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1054155) In Information Theory, IEEE Transactions on, vol. 14(3), pp. 515-516, 1968.
+
+<a name="ref4"></a>[4]: M. Kubat, S. Matwin, [“Addressing the curse of imbalanced training sets: one-sided selection,”](http://sci2s.ugr.es/keel/pdf/algorithm/congreso/kubat97addressing.pdf) In ICML, vol. 97, pp. 179-186, 1997.
+
+<a name="ref5"></a>[5]: J. Laurikkala, [“Improving identification of difficult small classes by balancing class distribution,”](http://sci2s.ugr.es/keel/pdf/algorithm/congreso/2001-Laurikkala-LNCS.pdf) Springer Berlin Heidelberg, 2001.
+
+<a name="ref6"></a>[6]: D. Wilson, [“Asymptotic Properties of Nearest Neighbor Rules Using Edited Data,”](http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=4309137&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D4309137) In IEEE Transactions on Systems, Man, and Cybernetics, vol. 2(3), pp. 408-421, 1972.
+
+<a name="ref7"></a>[7]: M. R. Smith, T. Martinez, C. Giraud-Carrier, [“An instance level analysis of data complexity,”](http://axon.cs.byu.edu/papers/smith.ml2013.pdf) Machine Learning, vol. 95(2), pp. 225-256, 2014.
+
+<a name="ref8"></a>[8]: N. V. Chawla, K. W. Bowyer, L. O. Hall, W. P. Kegelmeyer, [“SMOTE: synthetic minority over-sampling technique,”](https://www.jair.org/media/953/live-953-2037-jair.pdf) Journal of Artificial Intelligence Research, vol. 16, pp. 321-357, 2002.
+
+<a name="ref9"></a>[9]: H. Han, W. Wen-Yuan, M. Bing-Huan, [“Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning,”](http://sci2s.ugr.es/keel/keel-dataset/pdfs/2005-Han-LNCS.pdf) Advances in Intelligent Computing, pp. 878-887, 2005.
+
+<a name="ref10"></a>[10]: H. M. Nguyen, E. W. Cooper, K. Kamei, [“Borderline over-sampling for imbalanced data classification,”](https://www.google.fr/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&ved=0CDAQFjABahUKEwjH7qqamr_HAhWLthoKHUr0BIo&url=http%3A%2F%2Fousar.lib.okayama-u.ac.jp%2Ffile%2F19617%2FIWCIA2009_A1005.pdf&ei=a7zZVYeNDIvtasrok9AI&usg=AFQjCNHoQ6oC_dH1M1IncBP0ZAaKj8a8Cw&sig2=lh32CHGjs5WBqxa_l0ylbg) International Journal of Knowledge Engineering and Soft Data Paradigms, vol. 3(1), pp. 4-21, 2011.
+
+<a name="ref11"></a>[11]: G. Batista, R. C. Prati, M. C. Monard, [“A study of the behavior of several methods for balancing machine learning training data,”](http://www.sigkdd.org/sites/default/files/issues/6-1-2004-06/batista.pdf) ACM SIGKDD Explorations Newsletter, vol. 6(1), pp. 20-29, 2004.
+
+<a name="ref12"></a>[12]: G. Batista, B. Bazzan, M. Monard, [“Balancing Training Data for Automated Annotation of Keywords: a Case Study,”](http://www.icmc.usp.br/~gbatista/files/wob2003.pdf) In WOB, pp. 10-18, 2003.
+
+<a name="ref13"></a>[13]: X. Y. Liu, J. Wu, Z. H. Zhou, [“Exploratory Undersampling for Class-Imbalance Learning,”](http://cse.seu.edu.cn/people/xyliu/publication/tsmcb09.pdf) In IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 39(2), pp. 539-550, April 2009.
```
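The first method in the README's under-sampling list, random majority under-sampling, is the simplest of the family. As a minimal illustration of the idea (a pure-Python sketch, not this library's actual API; the function name and signature are invented here, and it samples without replacement for simplicity, whereas the README's variant samples with replacement):

```python
import random


def random_under_sample(X, y, majority_label, seed=0):
    """Randomly drop majority-class samples until the classes are balanced."""
    rng = random.Random(seed)
    majority = [i for i, label in enumerate(y) if label == majority_label]
    minority = [i for i, label in enumerate(y) if label != majority_label]
    # Keep only as many majority samples as there are minority samples.
    kept = rng.sample(majority, len(minority))
    idx = sorted(kept + minority)
    return [X[i] for i in idx], [y[i] for i in idx]


X = [[0.0], [0.1], [0.2], [0.3], [0.9], [1.0]]
y = [0, 0, 0, 0, 1, 1]
X_res, y_res = random_under_sample(X, y, majority_label=0)
print(y_res)  # two samples of each class remain
```

The more elaborate under-sampling methods in the list (Tomek links, NearMiss, ENN, and so on) replace the random choice of which majority samples to discard with a neighbourhood-based criterion.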
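On the over-sampling side, SMOTE [8] creates synthetic minority samples rather than duplicating existing ones: it picks a minority point, picks one of its k nearest minority neighbours, and interpolates at a random position along the segment between them. A concept sketch under those assumptions (again pure Python for illustration, not this library's API):

```python
import math
import random


def smote_one(minority, k=2, seed=0):
    """Generate one synthetic minority sample, SMOTE-style."""
    rng = random.Random(seed)
    base = rng.choice(minority)
    others = [p for p in minority if p is not base]
    # k nearest minority neighbours of the chosen point.
    neighbours = sorted(others, key=lambda p: math.dist(p, base))[:k]
    neighbour = rng.choice(neighbours)
    gap = rng.random()  # interpolation factor in [0, 1)
    return [b + gap * (n - b) for b, n in zip(base, neighbour)]


minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.3], [5.0, 5.0]]
synthetic = smote_one(minority)
print(synthetic)
```

Because each synthetic point lies on a segment between two minority samples, it always falls inside the minority class's convex hull; the borderline and SVM variants listed above differ mainly in which base points they select.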
