Hello there!
I work in financial services, where monotone constraints are particularly important for the interpretability of our predictive models. Fortunately, the xgboost framework supports monotone constraints for numeric predictors. My team heavily uses tidymodels for model research. While parsnip supports passing additional arguments to the computational engine via ... in parsnip::set_engine(), the xgboost engine requires a vector of values where -1 corresponds to a negative constraint, 0 corresponds to no constraint, and 1 corresponds to a positive constraint.
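For context, here is a minimal sketch of that positional interface when the engine is used directly, outside of tidymodels (the objective and nrounds values are just illustrative assumptions):
library(xgboost)
dtrain <- xgboost::xgb.DMatrix(
  data = as.matrix(mtcars[, c("mpg", "hp", "qsec")]),
  label = mtcars$disp
)
raw_fit <- xgboost::xgb.train(
  params = list(
    objective = "reg:squarederror",
    # Positional: must match the column order of the matrix above
    monotone_constraints = c(-1, 1, 0)
  ),
  data = dtrain,
  nrounds = 50
)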
This gets a little tricky when working with recipes, because we typically think of predictors by their column names rather than their positions in the design matrix. I have found a workaround: I first prep the recipe, extract the predictors, and use pattern matching to create the constraint vector. I then pass it to parsnip::set_engine() using {{ }} so that it injects the vector and the workflow can still be used in cross-validation and other tuning exercises.
#### Setup ####
library(tidyverse)
library(tidymodels)
#### Model ####
xgb_rec <-
recipes::recipe(
disp ~ mpg + hp + qsec,
data = mtcars
)
# Create vector of monotone constraints
# First, prep the recipe and pull predictors
# to simulate design matrix. Then, use term
# names to constrain positive or negative
# and snag the vector. This is a little janky.
monotone <-
xgb_rec |>
recipes::prep() |>
purrr::pluck("term_info") |>
dplyr::filter(
role == "predictor"
) |>
dplyr::mutate(
monotone = dplyr::case_match(
variable,
"mpg" ~ -1, # Negative constraint
"hp" ~ 1, # Positive constraint
.default = 0 # No constraint
)
) |>
dplyr::pull(monotone)
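# Result: monotone is c(-1, 1, 0), matching the
# mpg, hp, qsec order of the prepped predictors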
xgb_wf <-
parsnip::boost_tree(
mode = "regression"
) |>
parsnip::set_engine(
engine = "xgboost",
monotone_constraints = {{ monotone }}
) |>
workflows::workflow(
preprocessor = xgb_rec,
spec = _
)
xgb_fit <- parsnip::fit(xgb_wf, mtcars)
#### Validate HP Constraint ####
# Create synthetic data to verify a positive,
# monotonic relationship between HP and the
# predicted output, holding different MPG
# values constant.
synthetic <-
mtcars |>
dplyr::slice_head(
n = 1
) |>
dplyr::select(
-c(hp, mpg)
) |>
tidyr::expand_grid(
mpg = c(10, 20, 30),
hp = seq(50, 235)
)
# Looks good!
synthetic |>
parsnip::augment(
x = xgb_fit
) |>
ggplot2::ggplot(
ggplot2::aes(
x = hp,
y = .pred,
color = factor(mpg)
)
) +
ggplot2::geom_line()
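As a complement to the plot, here is a quick non-visual check (a sketch reusing the same synthetic grid): within each mpg group, the rows are ordered by ascending hp, so the predictions should be non-decreasing.
synthetic |>
  parsnip::augment(
    x = xgb_fit
  ) |>
  dplyr::group_by(mpg) |>
  dplyr::summarise(
    # TRUE if predictions never decrease as hp increases
    monotone_in_hp = all(diff(.pred) >= 0)
  )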
My proposal is to provide more official support for monotone constraints via the parsnip::boost_tree() function, so that we can tune our models with and without these constraints and refer to predictors by their names in the prepped recipe rather than relying on a workaround like the one above. Maybe it would look something like this:
# Explicit
parsnip::boost_tree(
mode = "regression",
engine = "xgboost",
monotone_positive = "hp",
monotone_negative = "mpg"
)
# For use later in tune grid
parsnip::boost_tree(
mode = "regression",
engine = "xgboost",
monotone_positive = tune::tune()
)
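Under the hood, the name-to-position translation could happen at fit time, once the prepped predictor order is known. A rough sketch (make_monotone_vector and its arguments are purely hypothetical, not existing parsnip API):
make_monotone_vector <- function(predictors,
                                 positive = character(),
                                 negative = character()) {
  # Map each predictor name to -1/0/1 by membership,
  # preserving the design-matrix column order
  vapply(
    predictors,
    function(p) {
      if (p %in% positive) 1L else if (p %in% negative) -1L else 0L
    },
    integer(1)
  )
}
make_monotone_vector(
  predictors = c("mpg", "hp", "qsec"),
  positive = "hp",
  negative = "mpg"
)
#>  mpg   hp qsec
#>   -1    1    0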
I am also interested in contributing to open source! I have one package of my own on CRAN and develop R packages internally at my organization, so I am happy to take a crack at developing this change if folks are open to it.
Let me know your thoughts!