R: setting weights_column = NULL causes unwanted variables to be used as predictors #14936

Explicitly passing "weights_column = NULL" to h2o.gbm (or h2o.randomForest) causes columns that were not selected in x to be used as predictors. The following example reproduces the issue using the prostate data set that ships with the h2o R package.

library(h2o)
h2o.init()
pros_path <- system.file("extdata", "prostate.csv", package="h2o")
pros_data <- h2o.importFile(pros_path)
  
pros_data$CAPSULE <- as.factor(pros_data$CAPSULE)

# WITHOUT setting "weights_column = NULL" explicitly in h2o.gbm, things work fine.
model <- h2o.gbm(x = c(3:9), y = 2, pros_data, distribution = "bernoulli")
  
h2o.varimp(model)
#Variable Importances:
#  variable relative_importance scaled_importance percentage
#1  GLEASON          112.808281          1.000000   0.352302
#2      PSA           70.384674          0.623932   0.219813
#3      VOL           51.909214          0.460154   0.162113
#4    DPROS           42.125877          0.373429   0.131560
#5      AGE           36.616043          0.324586   0.114353
#6     RACE            3.215139          0.028501   0.010041
#7    DCAPS            3.143716          0.027868   0.009818

# Setting "weights_column = NULL" explicitly in h2o.gbm causes ID (not selected initially) to be used as a predictor.
model <- h2o.gbm(x = c(3:9), y = 2, pros_data, distribution = "bernoulli", weights_column = NULL)
  
h2o.varimp(model)
#Variable Importances:
#  variable relative_importance scaled_importance percentage
#1  GLEASON          114.578041          1.000000   0.331926
#2       ID           55.921295          0.488063   0.162001
#3      PSA           54.399891          0.474785   0.157593
#4      VOL           39.352005          0.343452   0.114000
#5    DPROS           37.653904          0.328631   0.109081
#6      AGE           34.698425          0.302837   0.100519
#7    DCAPS            6.719040          0.058642   0.019465
#8     RACE            1.869231          0.016314   0.005415
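
Until this is fixed, one possible workaround is to never pass the argument as an explicit NULL. The sketch below assumes the problem is triggered only when weights_column is supplied explicitly, as the two runs above suggest; it has not been verified against the h2o internals. The helper name drop_null_args is hypothetical and not part of the h2o package.

```r
# Hypothetical helper (not part of h2o): remove NULL-valued entries from an
# argument list so optional arguments are omitted rather than passed as NULL.
drop_null_args <- function(args) {
  args[!vapply(args, is.null, logical(1))]
}

# Same settings as the failing call above, including the explicit NULL.
gbm_args <- drop_null_args(list(
  x = 3:9,
  y = 2,
  distribution = "bernoulli",
  weights_column = NULL
))

# weights_column has been dropped from the argument list.
print(names(gbm_args))
# [1] "x"            "y"            "distribution"

# With an H2O cluster running and pros_data loaded as in the example above,
# the model can then be built without the explicit NULL.
if (exists("pros_data")) {
  model <- do.call(h2o::h2o.gbm, c(gbm_args, list(training_frame = pros_data)))
  print(h2o::h2o.varimp(model))
}
```

This is useful in wrapper functions that forward an optional weights_column to h2o.gbm: the wrapper can default the argument to NULL and still make a call equivalent to the first, working example.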

