Feature request: allow initial in workflow_map() to take as input an entire workflow set object #172

@MatthieuStigler

A great feature of tune_bayes() and tune_sim_anneal() is the initial argument, which can be the output of a previous tune_grid() call.

Unfortunately, this feature becomes quite cumbersome for workflow_set objects. This stack post shows that passing workflow_map(initial = tune_grid) won't work, and the suggested answer is a rather verbose, manual chain of repeated option_add() calls:

workflow_set() %>% 
  ## add for one model
  option_add(
    id = "recipe_lasso",
    initial = tune_grid_res %>% extract_workflow_set_result("recipe_lasso")
  ) %>% 
  ## repeat for every model....
  option_add(..) %>% 
  option_add(..) %>% 
  workflow_map()

This is also slightly counter-intuitive, as option_add() is applied to the initial workflow_set rather than to the workflow_map() call.

Would it be possible to have an explicit initial argument in workflow_map() that takes a previous workflow_set as input and maps each of its results to the underlying tuning function for the matching workflow?

Thanks!

Below is an example, together with a manual workaround function, option_add_initial(), which is quite... untidy!

library(tidymodels)

# Load and prepare data
ames_data <- ames[, sapply(ames, class) %in% c("integer", "numeric")]

# Define a recipe
recipe <- recipe(Sale_Price ~ ., data = ames_data) %>%
  step_normalize(all_predictors())

# Define models
lasso_model <- linear_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet")

rf_model <- rand_forest(min_n = tune(), trees = 100) %>%
  set_engine("ranger") %>%
  set_mode("regression")

# Create workflows
lasso_wf <- workflow() %>%
  add_model(lasso_model) %>%
  add_recipe(recipe)

rf_wf <- workflow() %>%
  add_model(rf_model) %>%
  add_recipe(recipe)

cross_val <- vfold_cv(ames_data, v = 5)

tune_grid_res <- 
  workflow_set(
    preproc = list(recipe),
    models = list(lasso = lasso_model, rf = rf_model)) %>%
  workflow_map("tune_grid", resamples = cross_val, grid = 10)


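# Manual workaround: for each workflow id, store the matching previous
# tuning result in the `initial` option of the new workflow set
option_add_initial <- function(workflow_set, workflow_with_grid_results){

  res <- workflow_set
  # seq_len() is safe even if the results object has zero rows
  for(i in seq_len(nrow(workflow_with_grid_results))){
    id_i <- workflow_with_grid_results$wflow_id[[i]]
    res <- res %>% 
      option_add(
        id = id_i,
        initial = workflow_with_grid_results %>% extract_workflow_set_result(id_i)
      ) 
  }
  res
}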

tune_bayes_res <- workflow_set(
  preproc = list(recipe),
  models = list(lasso = lasso_model, rf = rf_model)
  ) %>% 
  option_add_initial(workflow_with_grid_results = tune_grid_res) %>% 
  workflow_map("tune_bayes", resamples = cross_val)
#> ! All of the rmse values were identical. The Gaussian process model cannot be
#>   fit to the data. Try expanding the range of the tuning parameters.
#> → A | error:   Infinite values of the Deviance Function, 
#>                            unable to find optimum parameters 
#> 
#> There were issues with some computations   A: x1
#> ✖ Optimization stopped prematurely; returning current results.
#> There were issues with some computations   A: x1

tune_bayes_res
#> # A workflow set/tibble: 2 × 4
#>   wflow_id     info             option    result   
#>   <chr>        <list>           <list>    <list>   
#> 1 recipe_lasso <tibble [1 × 4]> <opts[2]> <tune[+]>
#> 2 recipe_rf    <tibble [1 × 4]> <opts[2]> <tune[+]>

Created on 2025-03-21 with reprex v2.1.1
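
For what it's worth, the loop in option_add_initial() could probably be tidied a little with purrr::reduce(). The following is only a sketch of that idea, not an existing workflowsets feature: add_initial_from() and workflow_map_initial() are hypothetical names, and the wrapper simply approximates the interface requested above by adding the initial options before delegating to workflow_map().

```r
library(workflowsets)

# Hypothetical helper (not part of workflowsets): for every workflow id found
# in both objects, store the previous tuning result as the `initial` option.
add_initial_from <- function(wflow_set, previous_res) {
  # Only ids present in both objects can be matched
  ids <- intersect(wflow_set$wflow_id, previous_res$wflow_id)

  purrr::reduce(
    ids,
    function(acc, id) {
      option_add(
        acc,
        initial = extract_workflow_set_result(previous_res, id),
        id = id
      )
    },
    .init = wflow_set
  )
}

# Hypothetical wrapper approximating the requested interface:
# `initial` is a previously tuned workflow set, `...` goes to workflow_map()
workflow_map_initial <- function(wflow_set, fn, initial, ...) {
  workflow_map(add_initial_from(wflow_set, initial), fn, ...)
}
```

With the objects from the reprex, this would presumably be called as workflow_map_initial(wf_set, "tune_bayes", initial = tune_grid_res, resamples = cross_val), where wf_set is a fresh workflow_set() built from the same recipe and models. Matching on wflow_id means workflows without a previous result are left untouched rather than raising an error.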
