dajmcdon (Contributor) commented Sep 3, 2024

@topepo Here's a PR with edits to the quantile regression framework:

  • Added the grf engine
  • Converted the quantile predictions to a vctrs type that stores quantile_levels as an attribute. Because the output now has its own class, downstream methods are easier to create.
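To make the second bullet concrete, here is a minimal sketch of the idea. It is not the PR's implementation: the PR builds on vctrs, while this version uses plain S3 with structure() so it runs without extra packages. The constructor new_quantiles_sketch(), the "quantiles_sketch" class, and the extract_level() generic are hypothetical names chosen for illustration.

```r
# Hypothetical sketch: new_quantiles_sketch(), the "quantiles_sketch" class,
# and extract_level() are illustrative names, not parsnip or vctrs API.

# Constructor: `values` is a list with one numeric vector per row, holding that
# row's predictions at each quantile level; the levels are stored once as an attribute.
new_quantiles_sketch <- function(values = list(), quantile_levels = double()) {
  stopifnot(is.list(values), is.numeric(quantile_levels))
  # Every row needs exactly one prediction per quantile level
  stopifnot(all(lengths(values) == length(quantile_levels)))
  structure(values, quantile_levels = quantile_levels, class = "quantiles_sketch")
}

# Display each row as [smallest, largest] predicted quantile,
# similar to the <vctrs_qn> column printed in the reprex
format.quantiles_sketch <- function(x, ...) {
  vapply(unclass(x), function(v) {
    sprintf("[%s, %s]", format(min(v)), format(max(v)))
  }, character(1))
}

print.quantiles_sketch <- function(x, ...) {
  print(format(x), quote = FALSE)
  cat("Quantile levels:", attr(x, "quantile_levels"), "\n")
  invisible(x)
}

# Because the object has a class, downstream generics can dispatch on it.
# Example: pull out the predictions for one stored quantile level.
extract_level <- function(x, level, ...) UseMethod("extract_level")

extract_level.quantiles_sketch <- function(x, level, ...) {
  idx <- match(level, attr(x, "quantile_levels"))
  if (is.na(idx)) stop("`level` is not one of the stored quantile levels")
  vapply(unclass(x), function(v) v[[idx]], numeric(1))
}

# Usage: two rows, three quantile levels
preds <- new_quantiles_sketch(
  values = list(c(1, 2, 3), c(4, 5, 6)),
  quantile_levels = c(0.1, 0.5, 0.9)
)
format(preds)              # "[1, 3]" "[4, 6]"
extract_level(preds, 0.5)  # 2 5
```

Storing the levels once as an attribute, rather than repeating them in every row, keeps each element a plain numeric vector while still letting methods such as extract_level() look up which column corresponds to which level.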
library(parsnip) 
library(quantreg)
#> Loading required package: SparseM

qr_spec3 <- 
  linear_reg() %>% 
  set_engine("quantreg") %>% 
  set_mode("quantile regression", quantile_level = c(.1, .5, .9))

library(modeldata)
qr_fit3 <- 
  qr_spec3 %>% fit(price ~ sqft, data = Sacramento)
qr_fit3
#> parsnip model object
#> 
#> Call:
#> quantreg::rq(formula = price ~ sqft, tau = quantile_level, data = data)
#> 
#> Coefficients:
#>                tau= 0.1  tau= 0.5   tau= 0.9
#> (Intercept) -21815.3846 5946.2931 29181.5323
#> sqft           107.6923  136.0187   190.4032
#> 
#> Degrees of freedom: 932 total; 930 residual

predict(qr_fit3, new_data = Sacramento[1:3, ])
#> # A tibble: 3 × 1
#>             .pred_quantile
#>                 <vctrs_qn>
#> 1  [68215.385, 188358.589]
#> 2 [103861.538, 251382.041]
#> 3  [63907.692, 180742.462]

# Just 1 quantile
qr_spec1 <- linear_reg() %>% 
  set_engine("quantreg") %>% 
  set_mode("quantile regression", quantile_level = .2)

qr_fit1 <- qr_spec1 %>% fit(price ~ sqft, data = Sacramento)
qr_fit1
#> parsnip model object
#> 
#> Call:
#> quantreg::rq(formula = price ~ sqft, tau = quantile_level, data = data)
#> 
#> Coefficients:
#> (Intercept)        sqft 
#> -10794.5696    113.1899 
#> 
#> Degrees of freedom: 932 total; 930 residual
predict(qr_fit1, new_data = Sacramento[1:3, ])
#> # A tibble: 3 × 1
#>             .pred_quantile
#>                 <vctrs_qn>
#> 1   [83832.165, 83832.165]
#> 2 [121298.013, 121298.013]
#> 3     [79304.57, 79304.57]


# grf random forests engine -----------------------------------------------


library(grf)
rand_forest(mtry = integer(1), trees = integer(1), min_n = integer(1)) %>%
  set_engine("grf") %>%
  set_mode("quantile regression", quantile_level = c(.1, .5, .9)) %>%
  translate()
#> Random Forest Model Specification (quantile regression)
#> 
#> Main Arguments:
#>   mtry = integer(1)
#>   trees = integer(1)
#>   min_n = integer(1)
#> 
#> Computational engine: grf 
#> 
#> Model fit template:
#> grf::quantile_forest(x = missing_arg(), y = missing_arg(), mtry = min_cols(~integer(1), 
#>     x), num.trees = integer(1), min.node.size = min_rows(~integer(1), 
#>     x), quantiles = quantile_level, num.threads = 1L, seed = runif(1, 
#>     0, .Machine$integer.max))
#> Quantile levels: 0.1, 0.5, and 0.9.

quant_rf <- rand_forest(engine = "grf") %>%
  set_mode("quantile regression", quantile_level = c(.1, .5, .9))
out <- fit(quant_rf, formula = price ~ sqft + type, data = Sacramento)
predict(out, new_data = Sacramento[1:3, ])
#> # A tibble: 3 × 1
#>    .pred_quantile
#>        <vctrs_qn>
#> 1 [61000, 207973]
#> 2 [85000, 249862]
#> 3 [65000, 207973]


# -- standard regression with grf
reg_rf <- rand_forest(mode = "regression", engine = "grf")
out <- fit(reg_rf, formula = price ~ sqft + type, data = Sacramento)
predict(out, new_data = Sacramento[1:3, ])
#> # A tibble: 3 × 1
#>     .pred
#>     <dbl>
#> 1 124396.
#> 2 169566.
#> 3 134514.

# -- classification
class_rf <- rand_forest(mode = "classification", engine = "grf")
out <- fit(class_rf, formula = type ~ sqft + price, data = Sacramento)
predict(out, new_data = Sacramento[1:3, ], type = "class")
#> # A tibble: 3 × 1
#>   .pred_class
#>   <fct>      
#> 1 Residential
#> 2 Residential
#> 3 Residential
predict(out, new_data = Sacramento[1:3, ], type = "prob")
#> # A tibble: 3 × 3
#>   .pred_Condo .pred_Multi_Family .pred_Residential
#>         <dbl>              <dbl>             <dbl>
#> 1      0.255             0.00168             0.743
#> 2      0.0584            0.0108              0.931
#> 3      0.281             0.00163             0.718

Created on 2024-09-03 with reprex v2.1.1

topepo (Member) commented Sep 5, 2024

I've looked at this, and I'd rather have a PR that contains only the new vctrs class. There is a lot going on in this one.

Also, accepting a data.frame, a matrix, or a vector of values as input really complicates the internals. Could the constructor take a vector of predictions instead, and then use lapply(), c(), or similar to create a single vec_quantiles object?
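A minimal sketch of the constructor shape suggested above, assuming the row-level object is built from a plain numeric vector and rows are combined with lapply() and c(). It uses base R S3 rather than vctrs, and does not reproduce vec_quantiles itself; quantiles_from_vector(), c.quantiles_sketch(), and the "quantiles_sketch" class are hypothetical names.

```r
# Hypothetical sketch: quantiles_from_vector() and the "quantiles_sketch"
# class are illustrative names, not parsnip or vctrs API.

# Constructor that takes a single numeric vector: one row's predictions,
# one value per quantile level.
quantiles_from_vector <- function(x, quantile_levels) {
  stopifnot(is.numeric(x), length(x) == length(quantile_levels))
  structure(list(x), quantile_levels = quantile_levels, class = "quantiles_sketch")
}

# c() method: concatenate row-level objects into a single object,
# refusing to mix objects with different quantile levels.
c.quantiles_sketch <- function(...) {
  parts <- list(...)
  q_levels <- attr(parts[[1]], "quantile_levels")
  same <- vapply(parts, function(p) {
    inherits(p, "quantiles_sketch") && identical(attr(p, "quantile_levels"), q_levels)
  }, logical(1))
  if (!all(same)) stop("all inputs must share the same quantile levels")
  # Flatten one level: a list of one-element lists becomes one list of rows
  structure(unlist(lapply(parts, unclass), recursive = FALSE),
            quantile_levels = q_levels, class = "quantiles_sketch")
}

# Usage: an engine returns a matrix (rows = observations, columns = levels);
# build one object per row with lapply(), then combine them with c().
pred_mat <- matrix(c(1, 4, 2, 5, 3, 6), nrow = 2)
q_levels <- c(0.1, 0.5, 0.9)
rows <- lapply(seq_len(nrow(pred_mat)), function(i) {
  quantiles_from_vector(pred_mat[i, ], q_levels)
})
combined <- do.call(c, rows)
length(combined)  # 2
```

With this shape, the constructor only ever sees one input type, and any engine-specific reshaping (matrix, data frame, list) stays in the engine's prediction wrapper.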

dajmcdon (Contributor, Author) commented Sep 5, 2024

No problem. I considered two separate PRs but got lazy. I'll split these.

dajmcdon closed this Sep 9, 2024
github-actions (bot)

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

github-actions bot locked and limited conversation to collaborators Sep 26, 2024