Replication on Different Data #9

@sbaird91

I have been able to reproduce the results based on the vignette with the Lebron dataset. Everything works great.

However, the Logistic function doesn't seem to work on other datasets. I have attached a simulated dataset to illustrate. I used the same script as the vignette; the only change is colnum = 1 (the "outcome" column).

data_sim_logit.csv

Logistic(data = test,
         colnum = 1,
         numresamples = 25,
         remove_VIF_greater_than = 5.00,
         save_all_trained_models = "N",
         save_all_plots = "N",
         set_seed = "N",
         how_to_handle_strings = 0,
         do_you_have_new_data = "N",
         remove_ensemble_correlations_greater_than = 0.80,
         use_parallel = "Y",
         train_amount = 0.50,
         test_amount = 0.25,
         validation_amount = 0.25)

I get the following error:
Error in `[.default`(ensemble_tree_train_table, 2, 2) : subscript out of bounds
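To show one way an error like this can arise, here is a minimal sketch. It assumes (I have not checked the package source) that ensemble_tree_train_table is a confusion table built with table() and that the code then reads cell [2, 2]. If a model predicts only one class, the table has a single column and that cell does not exist. All names below are hypothetical and only for illustration.

```r
# Hypothetical illustration; this is NOT the package's internal code.
actual <- factor(c(0, 1, 0, 1, 1, 0))
predicted_one_class <- c(0, 0, 0, 0, 0, 0)  # a model that only ever predicts class 0

# The table has two rows (actual) but only one column (predicted).
tab <- table(actual, predicted_one_class)
print(dim(tab))

# Asking for cell [2, 2] fails because the second column is missing.
result <- tryCatch(tab[2, 2], error = function(e) conditionMessage(e))
print(result)

# Declaring both levels up front keeps the table 2 x 2,
# with a zero count for the class that was never predicted.
predicted_fixed <- factor(predicted_one_class, levels = c(0, 1))
tab_fixed <- table(actual, predicted_fixed)
print(dim(tab_fixed))
print(tab_fixed[2, 2])
```

If this is what is happening, the failure would depend on how the tree ensemble predicts on a given dataset rather than on the data being malformed.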

I've never received this error with the Lebron data. The basic structure of the simulated data--all variables being num or int--is similar to Lebron.

I've tried additional datasets, some based on actual observations, without success. I usually get a similar, though not identical, error.

I ran a simple logit model to make sure there wasn't anything problematic with the data. This ran just fine:
mod <- glm(outcome~., data=test, family = 'binomial')
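For anyone trying to reproduce this, the sanity checks can be sketched as below. The data frame here is a simulated stand-in, not the attached data_sim_logit.csv, and the predictor names x1 and x2 are assumptions; only the outcome column name comes from my data.

```r
# Hypothetical stand-in for the attached data (x1 and x2 are made-up names).
set.seed(1)
test <- data.frame(outcome = rbinom(200, 1, 0.5),
                   x1 = rnorm(200),
                   x2 = rnorm(200))

# Confirm column types and that the outcome is strictly 0/1 with both classes present.
str(test)
print(table(test$outcome, useNA = "ifany"))
stopifnot(all(test$outcome %in% c(0, 1)))

# Same model as above: a plain logistic regression on all predictors.
mod <- glm(outcome ~ ., data = test, family = "binomial")

# Cross-tabulate actual vs. predicted classes at a 0.5 cutoff
# to see whether both classes are ever predicted.
pred_class <- as.integer(predict(mod, type = "response") > 0.5)
print(table(actual = test$outcome, predicted = pred_class))
```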
