Open
Description
Question
Length mismatch error
Further Information
I am trying to use hyperimpute on my custom data. I am using the following setup:
```python
from hyperimpute.plugins.imputers import Imputers

method = "hyperimpute"
plugin = Imputers().get(
    method,
    optimizer="hyperband",
    classifier_seed=["logistic_regression", "catboost", "xgboost", "random_forest"],
    regression_seed=[
        "linear_regression",
        "catboost_regressor",
        "xgboost_regressor",
        "random_forest_regressor",
    ],
    # class_threshold: int. Maximum number of unique values a column may have to be treated as categorical.
    class_threshold=5,
    # imputation_order: int. 0 - ascending, 1 - descending, 2 - random
    imputation_order=2,
    # n_inner_iter: int. Number of imputation iterations.
    n_inner_iter=10,
    # select_model_by_column: bool. If true, select a different model for each column. Else, reuse the model chosen for the first column.
    select_model_by_column=True,
    # select_model_by_iteration: bool. If true, select new models for each iteration. Else, reuse the models chosen in the first iteration.
    select_model_by_iteration=True,
    # select_lazy: bool. If false, start the optimizer on every column unless other restrictions apply. Else, if for the current
    # iteration there is a trend (at least two columns of the same type got the same model from the optimizer), reuse the same
    # model class for all the columns without starting the optimizer.
    select_lazy=True,
    # select_patience: int. How many iterations without objective function improvement to wait.
    select_patience=5,
)

# traindataSelected is my pandas DataFrame (1000 rows x 372 columns).
# Fit the imputer on the data.
plugin.fit(traindataSelected.copy())
# Impute the missing values.
predictedval = plugin.transform(traindataSelected.copy())
```
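The `class_threshold` comment in the setup describes how HyperImpute decides for itself whether a column is categorical. The sketch below previews that split on a DataFrame before fitting, so you can see which columns would be handled by the classifier seeds and which by the regressor seeds. It assumes the rule is "at most `class_threshold` distinct non-missing values means categorical", inferred from the "max unique items" wording. The exact comparison and NaN handling inside hyperimpute are assumptions to check against its source. `preview_column_types` is a hypothetical helper, not a hyperimpute API.

```python
import numpy as np
import pandas as pd


def preview_column_types(df: pd.DataFrame, class_threshold: int = 5) -> pd.Series:
    """Label each column 'categorical' or 'continuous' (hypothetical helper).

    Assumption: a column counts as categorical when its number of distinct
    non-missing values is at most `class_threshold`. This mirrors the
    `class_threshold` comment, not necessarily hyperimpute's exact code.
    """
    n_unique = df.nunique(dropna=True)  # NaN is not counted as a distinct value
    return n_unique.le(class_threshold).map({True: "categorical", False: "continuous"})


# Toy frame: column "a" has 3 distinct values, column "b" has 6.
toy = pd.DataFrame({
    "a": [0, 1, 2, 0, np.nan, 1],
    "b": [0.1, 0.5, 2.3, 4.4, 5.0, 7.7],
})
print(preview_column_types(toy, class_threshold=5))
```

On the real data, `preview_column_types(traindataSelected, class_threshold=5).value_counts()` would summarize how many of the 372 columns fall into each group under this assumed rule.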
My training data has 1000 rows and 372 columns. When I run this, I get the following error:
```
---> [78] predictedval = plugin.transform(traindataSelected.copy())
ValueError: Length mismatch: Expected axis has 368 elements, new values have 372 elements
```
Can you please let me know whether I am missing something, or what causes this error? Also, is there a way to manually specify which columns should be treated as continuous and which as discrete?
Even when I use the mean imputer, the imputed output has 368 columns while my original data has 372.
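```python
from hyperimpute.plugins.imputers import Imputers

method = "mean"
plugin = Imputers().get(method)
# Fit the imputer on the data (X is my original DataFrame, 372 columns).
plugin.fit(X.copy())
# Impute the missing values.
predictedval = plugin.transform(X.copy())
```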
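Because the column count drops from 372 to 368 even with the mean imputer, some columns appear to be discarded during imputation. One plausible cause, not confirmed in this issue, is columns that are entirely missing. scikit-learn's `SimpleImputer` drops all-NaN columns on `transform` unless `keep_empty_features=True` (available since scikit-learn 1.2). Whether hyperimpute's `mean` plugin wraps `SimpleImputer` with those defaults is an assumption. The sketch below lists such columns and reproduces the effect with `SimpleImputer` directly on a toy frame. `all_missing_columns` is a hypothetical helper.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer


def all_missing_columns(df: pd.DataFrame) -> list:
    """Return the names of columns that contain only missing values (hypothetical helper)."""
    return df.columns[df.isna().all()].tolist()


toy = pd.DataFrame({
    "x": [1.0, np.nan, 3.0],
    "y": [np.nan, np.nan, np.nan],  # entirely missing
    "z": [4.0, 5.0, np.nan],
})

empty = all_missing_columns(toy)
print(empty)  # ['y']

# With default settings, SimpleImputer skips all-NaN columns (emitting a
# warning), so the output has fewer columns than the input.
out = SimpleImputer(strategy="mean").fit_transform(toy)
print(toy.shape[1], "->", out.shape[1])  # 3 -> 2

# A possible workaround (an assumption for the hyperimpute case): drop the
# empty columns before imputing so the column count is preserved end to end.
cleaned = toy.drop(columns=empty)
out_clean = SimpleImputer(strategy="mean").fit_transform(cleaned)
print(cleaned.shape[1], "->", out_clean.shape[1])  # 2 -> 2
```

If `all_missing_columns(X)` returns four names on the real data, that would match the gap between 372 and 368 columns.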
Thanks!