Closed
Hello. On hardhat 1.4.1, mold() fails with an error when I use the x-y interface with a logical outcome column, but not in other cases. In the reprex below, mold.data.frame() works fine for a factor outcome, yet it errors on a logical outcome, even though mold.formula() handles the exact same dataset without a problem.
library(hardhat)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
# Create tibble version of iris dataset
iris <- tibble::as_tibble(iris)
iris
#> # A tibble: 150 × 5
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#> 7 4.6 3.4 1.4 0.3 setosa
#> 8 5 3.4 1.5 0.2 setosa
#> 9 4.4 2.9 1.4 0.2 setosa
#> 10 4.9 3.1 1.5 0.1 setosa
#> # ℹ 140 more rows
# Factor outcome: no problem with mold.data.frame
mold(
  x = iris |> select(-Species),
  y = iris['Species']
)
#> $predictors
#> # A tibble: 150 × 4
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> <dbl> <dbl> <dbl> <dbl>
#> 1 5.1 3.5 1.4 0.2
#> 2 4.9 3 1.4 0.2
#> 3 4.7 3.2 1.3 0.2
#> 4 4.6 3.1 1.5 0.2
#> 5 5 3.6 1.4 0.2
#> 6 5.4 3.9 1.7 0.4
#> 7 4.6 3.4 1.4 0.3
#> 8 5 3.4 1.5 0.2
#> 9 4.4 2.9 1.4 0.2
#> 10 4.9 3.1 1.5 0.1
#> # ℹ 140 more rows
#>
#> $outcomes
#> # A tibble: 150 × 1
#> Species
#> <fct>
#> 1 setosa
#> 2 setosa
#> 3 setosa
#> 4 setosa
#> 5 setosa
#> 6 setosa
#> 7 setosa
#> 8 setosa
#> 9 setosa
#> 10 setosa
#> # ℹ 140 more rows
#>
#> $blueprint
#> XY blueprint:
#> # Predictors: 4
#> # Outcomes: 1
#> Intercept: FALSE
#> Novel Levels: FALSE
#> Composition: tibble
#>
#>
#> $extras
#> NULL
# Create dataset with logical outcome: Setosa
setosa <- iris |>
  mutate(Species = Species == 'setosa') |>
  rename(Setosa = Species)
setosa
#> # A tibble: 150 × 5
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Setosa
#> <dbl> <dbl> <dbl> <dbl> <lgl>
#> 1 5.1 3.5 1.4 0.2 TRUE
#> 2 4.9 3 1.4 0.2 TRUE
#> 3 4.7 3.2 1.3 0.2 TRUE
#> 4 4.6 3.1 1.5 0.2 TRUE
#> 5 5 3.6 1.4 0.2 TRUE
#> 6 5.4 3.9 1.7 0.4 TRUE
#> 7 4.6 3.4 1.4 0.3 TRUE
#> 8 5 3.4 1.5 0.2 TRUE
#> 9 4.4 2.9 1.4 0.2 TRUE
#> 10 4.9 3.1 1.5 0.1 TRUE
#> # ℹ 140 more rows
# !!! Logical outcome crashes mold.data.frame !!!
mold(
  x = setosa |> select(-Setosa),
  y = setosa['Setosa']
)
#> Error in `standardize()`:
#> ! Not all columns of `y` are known outcome types.
#> ℹ This column has an unknown type: "Setosa".
# Yet no problem with logical outcome on same dataset with mold.formula
mold(
  formula = Setosa ~ .,
  data = setosa
)
#> $predictors
#> # A tibble: 150 × 4
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> <dbl> <dbl> <dbl> <dbl>
#> 1 5.1 3.5 1.4 0.2
#> 2 4.9 3 1.4 0.2
#> 3 4.7 3.2 1.3 0.2
#> 4 4.6 3.1 1.5 0.2
#> 5 5 3.6 1.4 0.2
#> 6 5.4 3.9 1.7 0.4
#> 7 4.6 3.4 1.4 0.3
#> 8 5 3.4 1.5 0.2
#> 9 4.4 2.9 1.4 0.2
#> 10 4.9 3.1 1.5 0.1
#> # ℹ 140 more rows
#>
#> $outcomes
#> # A tibble: 150 × 1
#> Setosa
#> <lgl>
#> 1 TRUE
#> 2 TRUE
#> 3 TRUE
#> 4 TRUE
#> 5 TRUE
#> 6 TRUE
#> 7 TRUE
#> 8 TRUE
#> 9 TRUE
#> 10 TRUE
#> # ℹ 140 more rows
#>
#> $blueprint
#> Formula blueprint:
#> # Predictors: 4
#> # Outcomes: 1
#> Intercept: FALSE
#> Novel Levels: FALSE
#> Composition: tibble
#> Indicators: traditional
#>
#>
#> $extras
#> $extras$offset
#> NULL

Created on 2025-05-23 with reprex v2.1.1
This looks like a bug. Could someone please help me with it?
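In the meantime, one possible workaround is to recode the logical outcome as a two-level factor before calling mold(), since the factor case above goes through the x-y interface without error. This is only a sketch under that assumption, not a confirmed fix: the object names `y_factor` and `molded` are made up for illustration, and it uses base R on the built-in `datasets::iris` so that it runs on its own.

```r
library(hardhat)

# Rebuild the logical-outcome data from the built-in iris (base R only).
setosa <- datasets::iris
setosa$Setosa <- setosa$Species == "setosa"
setosa$Species <- NULL

# Workaround sketch: recode the logical outcome as a two-level factor,
# because factor outcomes are accepted by the x-y interface.
y_factor <- data.frame(
  Setosa = factor(setosa$Setosa, levels = c(FALSE, TRUE))
)

# Same x-y call as in the reprex, but with the factor outcome.
molded <- mold(
  x = setosa[setdiff(names(setosa), "Setosa")],
  y = y_factor
)

levels(molded$outcomes$Setosa)
```

The catch is that the outcome is now a factor with levels "FALSE" and "TRUE", so any downstream code that expects a logical vector would have to convert it back, for example with `molded$outcomes$Setosa == "TRUE"`.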