Categorical variables cause NaN values. #51

@JelkeW

Description


Hi,
I had an issue where the resampled data contained lots of NaN values, so SMOGN would not run.
Anyone who has run into this will recognize the error: "oops! synthetic data contains missing values".
While debugging, I found that the NaN values only occur in categorical variables.
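SMOGN decides which columns are nominal by their dtype, using the `nom_dtypes` list quoted in fix 2 below. The snippet is a hypothetical reduction of that check, not SMOGN's actual code. It assumes the comparison is made against pandas dtype names (SMOGN may compare dtype objects directly instead). It only illustrates why a `category` column is not picked up as nominal.

```python
import pandas as pd

# The list SMOGN uses for nominal dtypes, as quoted in this issue.
NOM_DTYPES = ["object", "bool", "datetime64"]

def nominal_columns(df: pd.DataFrame, nom_dtypes=NOM_DTYPES) -> list:
    """Hypothetical helper: a column counts as nominal when its
    dtype name (e.g. 'object', 'category') appears in nom_dtypes."""
    return [col for col in df.columns if df[col].dtype.name in nom_dtypes]

df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0],
    "color": pd.Categorical(["red", "blue", "red"]),
    "label": pd.Series(["a", "b", "a"], dtype="object"),
})

print(nominal_columns(df))                             # ['label']
print(nominal_columns(df, NOM_DTYPES + ["category"]))  # ['color', 'label']
```

With the original list, `color` falls through the nominal check even though it holds labels, which matches the observation that only categorical columns end up with NaN values.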

Two fixes for anyone encountering the problem:

  1. Fix on the data side
    Change every column in your DataFrame from dtype category to dtype object:
    data[column] = data[column].astype("object")
  2. Fix on the SMOGN side
    In smogn.over_sampling, change
    nom_dtypes = ["object", "bool", "datetime64"]
    to
    nom_dtypes = ["object", "bool", "datetime64", "category"]
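For fix 1, the one-liner above has to be repeated for each categorical column. Below is a minimal sketch that does it for all of them at once. The helper name `categories_to_object` is hypothetical; it only uses standard pandas calls and returns a copy, so the original DataFrame keeps its `category` dtypes.

```python
import pandas as pd

def categories_to_object(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df in which every
    column with dtype 'category' has been cast to dtype 'object'."""
    out = df.copy()
    # select_dtypes finds all columns stored as pandas categoricals
    for column in out.select_dtypes(include="category").columns:
        out[column] = out[column].astype("object")
    return out
```

Call it as `data = categories_to_object(data)` before passing `data` to SMOGN. The values themselves are unchanged; only the dtype changes, so the columns match "object" in `nom_dtypes` without patching the library.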

Took me a bit of time to figure it out. Hope it helps 😊
