SMOTE sampling method for imbalanced data #11279

@exalate-issue-sync

Add [SMOTE|http://www.jair.org/papers/paper953.html] ("Synthetic Minority Over-sampling Technique") for handling imbalanced datasets. SMOTE creates synthetic minority-class rows by interpolating between existing minority rows and their nearest minority-class neighbours. This is a more sophisticated way of balancing a dataset than the straightforward over/undersampling currently implemented in H2O via the balance_classes, class_sampling_factors, and max_after_balance_size arguments.
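To make the request concrete, here is a minimal pure-Python sketch of the SMOTE interpolation step. It is not H2O code and does not reflect any existing H2O API. The function name `smote`, its parameters, and the toy data are hypothetical. It also simplifies the published algorithm by picking base rows at random rather than generating a fixed number of synthetic rows per minority row, and it assumes purely numeric features with Euclidean distance.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=None):
    """Return n_synthetic new rows interpolated between minority-class neighbours.

    Hypothetical helper for illustration only; assumes numeric features.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other rows
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority row as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Choose one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Place the new row at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two numeric features (made-up values).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_synthetic=6, k=2, seed=0)
```

Each synthetic row lies on a line segment between two real minority rows, so, unlike plain oversampling, the new rows are not exact duplicates of existing ones. A real implementation would also need to handle categorical columns and scale to distributed frames, which this sketch ignores.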

Overview (see item 4 in the linked article): http://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/

We could also consider the [ROSE|https://www.openstarts.units.it/dspace/bitstream/10077/4002/1/Menardi%20Torelli%20DEAMS%20WPS2.pdf] (Random Over-Sampling Examples) method, which is implemented in the ROSE R package and can be used through the caret R package.
