Hyperimpute length mismatch #41

@preritt

Question

Length mismatch error

Further Information

I am trying to use hyperimpute on my custom data. I am using the following setup:

from hyperimpute.plugins.imputers import Imputers

method = "hyperimpute"
plugin = Imputers().get(
    method,
    optimizer="hyperband",
    classifier_seed=["logistic_regression", "catboost", "xgboost", "random_forest"],
    regression_seed=[
        "linear_regression",
        "catboost_regressor",
        "xgboost_regressor",
        "random_forest_regressor",
    ],
    # class_threshold: int. Maximum number of unique values a column may have to be treated as categorical.
    class_threshold=5,
    # imputation_order: int. 0 - ascending, 1 - descending, 2 - random
    imputation_order=2,
    # n_inner_iter: int. Number of imputation iterations.
    n_inner_iter=10,
    # select_model_by_column: bool. If true, select a different model for each column. Else, reuse the model chosen for the first column.
    select_model_by_column=True,
    # select_model_by_iteration: bool. If true, select new models for each iteration. Else, reuse the models chosen in the first iteration.
    select_model_by_iteration=True,
    # select_lazy: bool. If false, start the optimizer on every column unless other restrictions apply. Else, if there is a trend in the current iteration (at least two columns of the same type got the same model from the optimizer), reuse the same model class for all the columns without starting the optimizer.
    select_lazy=True,
    # select_patience: int. How many iterations without objective function improvement to wait.
    select_patience=5,
)
# fit it on the data
plugin.fit(traindataSelected.copy())
# impute the missing values
predictedval = plugin.transform(traindataSelected.copy())

My training data has 1000 rows and 372 columns. When I run this, I get the following error:

---> [78] predictedval = plugin.transform(traindataSelected.copy())

ValueError: Length mismatch: Expected axis has 368 elements, new values have 372 elements

Can you please let me know whether I am missing something, or what the reason for the error is? Is there a way to manually specify which columns should be treated as continuous and which as discrete?

Even when I use the mean imputer, the imputed output has 368 columns while my original data has 372 columns.

method = "mean"
plugin = Imputers().get(method)
# fit it on the data
plugin.fit(X.copy())
# predict the missing values
predictedval = plugin.transform(X.copy())
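
One thing that may be worth checking, as a sketch rather than a confirmed diagnosis: whether four of the 372 columns contain no observed values at all. scikit-learn's SimpleImputer discards columns that were entirely missing at fit time by default, and if the mean plugin relies on it (an assumption here, not verified in this thread), that would explain an output with fewer columns than the input. The helper name `find_all_missing_columns` and the toy frame below are hypothetical and use only pandas and NumPy.

```python
import numpy as np
import pandas as pd


def find_all_missing_columns(df: pd.DataFrame) -> list:
    """Return the names of columns in which every value is missing."""
    mask = df.isna().all(axis=0)  # one boolean per column
    return mask[mask].index.tolist()


# Toy stand-in for the real data (hypothetical column names).
demo = pd.DataFrame(
    {
        "a": [1.0, np.nan, 3.0],
        "b": [np.nan, np.nan, np.nan],  # entirely missing
        "c": [0.5, 0.25, np.nan],
    }
)

empty_cols = find_all_missing_columns(demo)
print("columns with no observed values:", empty_cols)

# Dropping such columns before fitting keeps the input and output
# widths aligned, so column names can be reassigned afterwards.
demo_clean = demo.drop(columns=empty_cols)
print(demo.shape, "->", demo_clean.shape)
```

Running the same helper on `X` or `traindataSelected` and comparing the length of the result with the four missing columns would show whether this explanation applies.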

Thanks!
