mold.data.frame crashes with logical outcomes #289

@tripartio

Hello. On hardhat version 1.4.1, mold() throws an error when I use the x-y interface with a logical outcome variable, but not in other cases. In the example below, mold.data.frame() works fine for a factor outcome. Yet it fails on a logical outcome, even though mold.formula() works fine for the exact same dataset.

library(hardhat)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# Create tibble version of iris dataset
iris <- tibble::as_tibble(iris)
iris 
#> # A tibble: 150 × 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # ℹ 140 more rows

# Factor outcome: no problem with mold.data.frame
mold(
  x = iris |> select(-Species),
  y = iris['Species']
)
#> $predictors
#> # A tibble: 150 × 4
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width
#>           <dbl>       <dbl>        <dbl>       <dbl>
#>  1          5.1         3.5          1.4         0.2
#>  2          4.9         3            1.4         0.2
#>  3          4.7         3.2          1.3         0.2
#>  4          4.6         3.1          1.5         0.2
#>  5          5           3.6          1.4         0.2
#>  6          5.4         3.9          1.7         0.4
#>  7          4.6         3.4          1.4         0.3
#>  8          5           3.4          1.5         0.2
#>  9          4.4         2.9          1.4         0.2
#> 10          4.9         3.1          1.5         0.1
#> # ℹ 140 more rows
#> 
#> $outcomes
#> # A tibble: 150 × 1
#>    Species
#>    <fct>  
#>  1 setosa 
#>  2 setosa 
#>  3 setosa 
#>  4 setosa 
#>  5 setosa 
#>  6 setosa 
#>  7 setosa 
#>  8 setosa 
#>  9 setosa 
#> 10 setosa 
#> # ℹ 140 more rows
#> 
#> $blueprint
#> XY blueprint:
#> # Predictors: 4
#> # Outcomes: 1
#> Intercept: FALSE
#> Novel Levels: FALSE
#> Composition: tibble
#> 
#> 
#> $extras
#> NULL

# Create dataset with logical outcome: Setosa
setosa <- iris |> 
  mutate(Species = Species == 'setosa') |> 
  rename(Setosa = Species)
setosa
#> # A tibble: 150 × 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Setosa
#>           <dbl>       <dbl>        <dbl>       <dbl> <lgl> 
#>  1          5.1         3.5          1.4         0.2 TRUE  
#>  2          4.9         3            1.4         0.2 TRUE  
#>  3          4.7         3.2          1.3         0.2 TRUE  
#>  4          4.6         3.1          1.5         0.2 TRUE  
#>  5          5           3.6          1.4         0.2 TRUE  
#>  6          5.4         3.9          1.7         0.4 TRUE  
#>  7          4.6         3.4          1.4         0.3 TRUE  
#>  8          5           3.4          1.5         0.2 TRUE  
#>  9          4.4         2.9          1.4         0.2 TRUE  
#> 10          4.9         3.1          1.5         0.1 TRUE  
#> # ℹ 140 more rows

# !!! Logical outcome crashes mold.data.frame !!!
mold(
  x = setosa |> select(-Setosa),
  y = setosa['Setosa']
)
#> Error in `standardize()`:
#> ! Not all columns of `y` are known outcome types.
#> ℹ This column has an unknown type: "Setosa".

# Yet no problem with logical outcome on same dataset with mold.formula
mold(
  formula = Setosa ~ .,
  data = setosa
)
#> $predictors
#> # A tibble: 150 × 4
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width
#>           <dbl>       <dbl>        <dbl>       <dbl>
#>  1          5.1         3.5          1.4         0.2
#>  2          4.9         3            1.4         0.2
#>  3          4.7         3.2          1.3         0.2
#>  4          4.6         3.1          1.5         0.2
#>  5          5           3.6          1.4         0.2
#>  6          5.4         3.9          1.7         0.4
#>  7          4.6         3.4          1.4         0.3
#>  8          5           3.4          1.5         0.2
#>  9          4.4         2.9          1.4         0.2
#> 10          4.9         3.1          1.5         0.1
#> # ℹ 140 more rows
#> 
#> $outcomes
#> # A tibble: 150 × 1
#>    Setosa
#>    <lgl> 
#>  1 TRUE  
#>  2 TRUE  
#>  3 TRUE  
#>  4 TRUE  
#>  5 TRUE  
#>  6 TRUE  
#>  7 TRUE  
#>  8 TRUE  
#>  9 TRUE  
#> 10 TRUE  
#> # ℹ 140 more rows
#> 
#> $blueprint
#> Formula blueprint:
#> # Predictors: 4
#> # Outcomes: 1
#> Intercept: FALSE
#> Novel Levels: FALSE
#> Composition: tibble
#> Indicators: traditional
#> 
#> 
#> $extras
#> $extras$offset
#> NULL

Created on 2025-05-23 with reprex v2.1.1

This looks like a bug. Could someone please help me with it?
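
In the meantime, a possible workaround is sketched below. It assumes that the x-y interface only accepts numeric and factor outcome columns, which is my reading of the error message and not something the maintainers have confirmed. The helper `lgl_to_factor()` is hypothetical (it is not part of hardhat) and simply converts logical columns to factors before the outcome is passed to mold(). The example uses base R only to build the data, so it does not depend on dplyr.

```r
# Hypothetical helper, not part of hardhat: convert every logical column
# of a data frame to a factor with levels "FALSE" and "TRUE".
lgl_to_factor <- function(df) {
  is_lgl <- vapply(df, is.logical, logical(1))
  for (nm in names(df)[is_lgl]) {
    df[[nm]] <- factor(df[[nm]], levels = c(FALSE, TRUE))
  }
  df
}

# Rebuild the logical-outcome dataset from the built-in iris data
setosa <- datasets::iris
setosa$Setosa <- setosa$Species == "setosa"
setosa$Species <- NULL

x <- setosa[setdiff(names(setosa), "Setosa")]
y <- lgl_to_factor(setosa["Setosa"])

# Assumption: with a factor outcome, the x-y interface should behave as it
# does for Species above. Guarded so the conversion still runs without hardhat.
if (requireNamespace("hardhat", quietly = TRUE)) {
  molded <- hardhat::mold(x = x, y = y)
  str(molded$outcomes)
}
```

This changes the outcome type from logical to factor, so any downstream code that expects TRUE/FALSE values would need to convert it back, for example with `y$Setosa == "TRUE"`.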
