Open
Description
Hey!
Any idea why the algorithm is creating a new class (value) for my target? I'm analyzing the Room_Occupancy_Dataset from Kaggle. In this dataset the target has only four occupancy values (0, 1, 2 or 3 people in the room), but the model is expected to be able to predict cases with more than 3 people in the room. SMOGN is not balancing the data correctly: the majority class (0) stays the same size, and the minority classes (1, 2, 3) are not over-sampled. It also creates an extra value (4). I don't know if this is a bug, but I hope you can help me fix it. This is my 2d array and the SMOGN call:
```python
import smogn

## df: the Room Occupancy dataset loaded as a pandas DataFrame

rg_mtrx = [
    [0, 0, 0], ## under-sample ("majority")
    [1, 1, 0], ## over-sample ("minority")
    [2, 1, 0], ## over-sample ("minority")
    [3, 1, 0], ## over-sample ("minority")
]

## conduct smogn
balanced_smogn = smogn.smoter(

    ## main arguments
    data = df,                     ## pandas dataframe
    y = 'Room_Occupancy_Count',    ## string ('header name')
    k = 5,                         ## positive integer (k < n)
    pert = 0.02,                   ## real number (0 < R < 1)
    samp_method = 'extreme',       ## string ('balance' or 'extreme')
    drop_na_col = False,           ## boolean (True or False)
    drop_na_row = False,           ## boolean (True or False)
    replace = True,                ## boolean (True or False)

    ## phi relevance arguments
    rel_thres = 0.50,              ## real number (0 < R < 1)
    rel_method = 'manual',         ## string ('auto' or 'manual')
    # rel_xtrm_type = 'both',      ## unused (rel_method = 'manual')
    # rel_coef = 1.50,             ## unused (rel_method = 'manual')
    rel_ctrl_pts_rg = rg_mtrx      ## 2d array (format: [x, y])
)
```
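To check what the relevance matrix asks SMOGN to do, and to compare the target counts before and after resampling, here is a minimal stdlib-only sketch. It does not run SMOGN. It assumes that a target value counts as "rare" (over-sampled) when its relevance is at or above `rel_thres`; with relevances of exactly 0 and 1 and a threshold of 0.50, the boundary case does not matter here. The helper names (`split_by_threshold`, `compare_counts`, `unexpected_values`) are hypothetical and are not part of the smogn API, and the toy target lists are illustrative only, not real SMOGN output.

```python
from collections import Counter

# Control points copied from the issue: [target value, relevance, third column].
RG_MTRX = [
    [0, 0, 0],
    [1, 1, 0],
    [2, 1, 0],
    [3, 1, 0],
]
REL_THRES = 0.50


def relevance_at_points(ctrl_pts):
    """Map each control-point target value to the relevance in column 2.

    An interpolating relevance function passes through its control points,
    so targets sitting exactly on a control point get this relevance.
    """
    return {row[0]: row[1] for row in ctrl_pts}


def split_by_threshold(ctrl_pts, rel_thres):
    """Return (rare, normal) target values.

    Assumption (hypothetical helper): 'rare' means relevance >= rel_thres,
    'normal' means relevance < rel_thres.
    """
    rel = relevance_at_points(ctrl_pts)
    rare = sorted(v for v, r in rel.items() if r >= rel_thres)
    normal = sorted(v for v, r in rel.items() if r < rel_thres)
    return rare, normal


def compare_counts(original, resampled):
    """Count each target value before and after resampling."""
    before, after = Counter(original), Counter(resampled)
    values = sorted(set(before) | set(after))
    return {v: (before[v], after[v]) for v in values}


def unexpected_values(original, resampled):
    """Target values present after resampling but absent before."""
    return sorted(set(resampled) - set(original))


if __name__ == "__main__":
    rare, normal = split_by_threshold(RG_MTRX, REL_THRES)
    print("rare (over-sample):", rare)        # [1, 2, 3]
    print("normal (under-sample):", normal)   # [0]

    # Hypothetical toy targets mirroring the symptom described, NOT real output.
    original = [0, 0, 0, 0, 0, 0, 1, 2, 3]
    resampled = [0, 0, 0, 0, 0, 0, 1, 2, 3, 4]
    for value, (n_before, n_after) in compare_counts(original, resampled).items():
        print(value, n_before, n_after)
    print("new values:", unexpected_values(original, resampled))  # [4]
```

With the real data, the same helpers can be pointed at the target columns, e.g. `compare_counts(df['Room_Occupancy_Count'], balanced_smogn['Room_Occupancy_Count'])`, since `Counter` iterates over the values of a pandas Series. Any non-integer target values in the output would then appear as separate keys.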