Improve Model Performance on Imbalanced Data via SMOTE and Scaling #39

@Arin-12

Description

The current machine learning models in this project (specifically the Random Forest implementation) show a large gap between accuracy (~90%) and recall (~50%). A gap like this is typical of class imbalance: the model scores well overall by getting the majority class right, but it misses about half of the minority-class samples.
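To make the gap concrete, here is a minimal sketch with hypothetical confusion-matrix counts. The counts are not taken from the project's data; they are chosen only so that accuracy and recall land near the figures quoted above.

```python
# Hypothetical counts for 1,000 samples with a 15% minority class.
# Illustrative only: these are NOT the project's actual results.
tp, fn = 75, 75    # minority class: half detected, half missed
fp, tn = 25, 825   # majority class: almost always classified correctly

total = tp + fn + fp + tn
accuracy = (tp + tn) / total
recall = tp / (tp + fn)

# A trivial model that always predicts the majority class, for comparison.
majority_only_accuracy = (fp + tn) / total

print(f"accuracy={accuracy:.2f} recall={recall:.2f}")   # accuracy=0.90 recall=0.50
print(f"majority-only accuracy={majority_only_accuracy:.2f}")  # 0.85
```

Under these assumed counts, a model that never predicts the minority class would still reach 0.85 accuracy, which is why recall and F1 are the more informative metrics here.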

Proposed Improvement

I have implemented a pipeline that uses:

  1. SMOTE (Synthetic Minority Over-sampling Technique) to balance the training set.
  2. StandardScaler to standardize each feature to zero mean and unit variance.

Results

These changes give a better balance between precision and recall:

  • Recall: Improved from 0.50 to 0.63 (a 26% relative gain)
  • F1-Score: Improved from 0.59 to 0.64
  • Precision: Maintained at a healthy 0.65
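As a quick sanity check on the arithmetic, the relative recall gain and the new F1 score can be recomputed from the numbers listed above. The helper name `f1_from` is only for illustration.

```python
def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

recall_before, recall_after = 0.50, 0.63
precision_after = 0.65

# Relative (not absolute) improvement in recall.
relative_gain = (recall_after - recall_before) / recall_before
f1_after = f1_from(precision_after, recall_after)

print(f"relative recall gain: {relative_gain:.0%}")  # 26%
print(f"F1 after: {f1_after:.2f}")                   # 0.64
```

The recomputed F1 of 0.64 agrees with the reported value, so the three "after" metrics are mutually consistent.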

Checklist

  • Code follows Python 3.x standards.
  • Descriptive comments included for all new logic.
  • MIT License added to the top of the file.

I have the code ready and would like to be assigned to this issue to submit a Pull Request!
