diff --git a/DEVELOPMENT.md b/DEVELOPMENT.md
index 7fb3c15b5..42a07ef06 100644
--- a/DEVELOPMENT.md
+++ b/DEVELOPMENT.md
@@ -522,3 +522,252 @@ test_that("select works lazy/eager", {
 ```
 
 _source: `tests/testthat/test-lazyframe-frame.R` of this repository_
+
+## Bumping the Polars Version
+
+R Polars pins its Polars dependency to a specific git revision hash in `src/rust/Cargo.toml`.
+When a new version of Polars is released, we update this hash (and the package version) to pick up
+upstream changes. The py-polars release tags (e.g. `py-1.38.1`) serve as the reference point
+for what changed, since they correspond to specific Polars revisions.
+
+### Step-by-step Workflow
+
+#### 1. Research upstream changes
+
+Before touching any code, compare the py-polars release notes and changelogs between
+the current and target versions. Categorize changes into:
+
+- **Rust breaking changes** — removed or renamed types, changed function signatures, removed feature flags
+- **Deprecations** — parameters or methods deprecated in py-polars that should be deprecated in R Polars
+- **New features** — new functions, methods, or options added in py-polars
+
+The upstream polars CHANGELOG can be unreliable, so base your analysis on the actual code diff
+when possible (e.g. comparing between the two py-polars release tags on GitHub or in a local
+clone of the [polars](https://github.com/pola-rs/polars) repository).
+
+#### 2. Update `src/rust/Cargo.toml`
+
+Update two things in `src/rust/Cargo.toml`:
+
+1. **The git revision hash** — Find the commit that corresponds to the target py-polars tag
+   and update all `rev = "..."` values.
+2. **The package version** — Update the `version` field (e.g. `"1.9.0-rc.1"`).
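+
+Because every `rev` value has to change together, a quick consistency check can catch a missed
+dependency. The following is a minimal base R sketch, assuming each Polars dependency keeps its
+`git` and `rev` keys on a single line; the helper name `polars_revs()` is hypothetical and not
+part of this repository:
+
+```r
+# Hypothetical helper: collect the distinct `rev` values used by Polars git dependencies.
+# Assumes `git = "...pola-rs/polars.git"` and `rev = "..."` sit on the same line.
+polars_revs <- function(path = "src/rust/Cargo.toml") {
+  lines <- readLines(path)
+  # Keep only the dependency lines that point at the Polars repository.
+  polars_lines <- grep("pola-rs/polars.git", lines, fixed = TRUE, value = TRUE)
+  # Extract the quoted revision from each of those lines.
+  unique(sub('.*rev = "([^"]*)".*', "\\1", polars_lines))
+}
+```
+
+Run from the repository root after the edit, `polars_revs()` should return exactly one hash;
+more than one value suggests that a dependency still points at the old revision.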
+
+```toml
+[package]
+name = "r-polars"
+version = "1.9.0-rc.1" # <-- bump this
+
+[dependencies]
+polars-core = { git = "https://github.com/pola-rs/polars.git", rev = "", default-features = false }
+polars-error = { git = "https://github.com/pola-rs/polars.git", rev = "", default-features = false }
+```
+
+_source: `src/rust/Cargo.toml` of this repository_
+
+> **Important:** The version bump is critical because `build-lib-sums` (run as part of `build-all`)
+> reads the version from `Cargo.toml` and updates the `Config/polars/lib-version` field in `DESCRIPTION`.
+> Additionally, `build-autogenerated-polars-version-file` regenerates `R/generated-polars-version.R`.
+> Without these updates, CI pre-built binary tests will fail due to version mismatches.
+
+#### 3. Fix Rust compilation
+
+Run:
+
+```bash
+task build-rust
+```
+
+Fix compilation errors one logical change at a time, making separate commits for each.
+Common issues include renamed types, changed function signatures, and removed feature flags.
+
+> **Tip:** Avoid Rust import shadowing. If a function has the same name as something imported
+> from the prelude, use an alias (e.g. `use crate::prelude::some_fn as some_fn_alias`).
+
+#### 4. Update R side and run tests
+
+Run `task build-all` (not just `task build-rust`) to regenerate all auto-generated files:
+
+```bash
+task build-all
+```
+
+This runs `build-lib-sums` and `build-documents` (including `build-standalone-files` and
+`build-autogenerated-files`) as dependencies, then `test-all`, and finally `build-readme`.
+Alternatively, you can run the steps individually:
+
+```bash
+task build-lib-sums   # update DESCRIPTION lib-version and checksum metadata
+task build-documents  # regenerate Rd files and auto-generated R files
+task test-all         # run the full test suite
+task build-readme     # rebuild README
+```
+
+Categorize test failures into:
+
+- **Snapshot updates** — Changed error messages or display output.
+  Review the diffs, then accept:
+  ```bash
+  task test-snapshot-accept
+  ```
+- **Behavior changes** — Tests that fail because upstream behavior changed. These need investigation
+  and manual updates to test expectations.
+
+#### 5. Handle deprecations
+
+When py-polars deprecates a parameter or method, add corresponding deprecation warnings on the R side.
+
+##### Deprecating a parameter
+
+Use `deprecated()` as the default value and `is_present()` to detect usage.
+These helpers are defined in `R/utils-deprecation.R` (wrapping the lifecycle package).
+Here is the pattern used for the `allow_missing_columns` parameter:
+
+```r
+pl__scan_parquet <- function(
+  source,
+  ...,
+  missing_columns = c("insert", "raise"),
+  allow_missing_columns = deprecated() # <-- deprecated default
+) {
+  check_dots_empty0(...)
+  # snip
+
+  if (is_present(allow_missing_columns)) { # <-- guard
+    deprecate_warn(
+      c(
+        `!` = sprintf(
+          "The argument %s of %s is deprecated.",
+          format_arg("allow_missing_columns"),
+          format_fn("scan_parquet")
+        ),
+        i = sprintf(
+          "Use the argument %s instead and pass one of %s.",
+          format_arg("missing_columns"),
+          format_code("('insert', 'raise')")
+        )
+      )
+    )
+
+    missing_columns <- if (allow_missing_columns) "insert" else "raise"
+  }
+
+  # snip
+}
+```
+
+_source: `R/input-parquet-functions.R` of this repository_
+
+Key points:
+
+- Only "leaf" functions (e.g. `pl__scan_parquet`, `lazyframe__lazy_sink_*`) need the `is_present()` check.
+  Wrapper functions (e.g. `lazyframe__sink_parquet`) just pass `deprecated()` through without checking.
+- When migrating a parameter into `storage_options`, use `storage_options[["key"]] <- value`
+  (not `c()`) to avoid creating duplicate keys.
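+
+The duplicate-key point can be seen in plain base R. This is a minimal sketch, assuming
+`storage_options` is a named list (a named character vector behaves the same way); the key
+`max_retries` and the values are hypothetical and not taken from this repository:
+
+```r
+# Hypothetical options already supplied by the caller.
+storage_options <- list(max_retries = "3", region = "us-east-1")
+
+# c() appends a second element with the same name instead of replacing the first.
+combined <- c(storage_options, list(max_retries = "5"))
+sum(names(combined) == "max_retries") # 2
+combined[["max_retries"]]             # "3" (lookup by name returns the first match)
+
+# [[<- overwrites the existing entry, so each key stays unique.
+storage_options[["max_retries"]] <- "5"
+sum(names(storage_options) == "max_retries") # 1
+storage_options[["max_retries"]]             # "5"
+```
+
+With `c()`, the migrated value is silently shadowed by the earlier entry whenever the caller
+already set the same key, which is why the assignment form is preferred.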
+
+##### Deprecating a method
+
+Add a `lifecycle::badge("deprecated")` to the roxygen documentation, emit a warning,
+and either delegate to the replacement method or return a no-op as appropriate:
+
+```r
+#' Shrink numeric columns to the minimal required datatype
+#'
+#' `r lifecycle::badge("deprecated")`
+#' Deprecated as of polars 1.3.0 and turned into a no-op.
+#' Use [`$shrink_dtype`][series__shrink_dtype] instead.
+#'
+#' @inherit as_polars_expr return
+expr__shrink_dtype <- function() {
+  deprecate_warn(
+    c(
+      `!` = sprintf("%s is deprecated and is a no-op.", format_code("$shrink_dtype()")),
+      `i` = sprintf("Use %s instead.", format_code("$shrink_dtype()"))
+    )
+  )
+  self
+}
+```
+
+_source: `R/expr-expr.R` of this repository_
+
+##### Deprecating a class
+
+When upstream replaces one class with another, use a deprecated constructor that delegates
+to the replacement class:
+
+```r
+PartitionMaxSize <- new_class(
+  "PartitionMaxSize",
+  parent = SinkDirectory,
+  constructor = function(base_path, ..., max_size, per_partition_sort_by = NULL) {
+    check_dots_empty0(...)
+    deprecate_warn(
+      c(
+        `!` = format_warning(sprintf(
+          "%s is deprecated as of %s 1.8.0.",
+          format_cls("PartitionMaxSize"),
+          format_pkg("polars")
+        )),
+        i = format_warning(sprintf("Use %s instead.", format_cls("PartitionBy")))
+      )
+    )
+
+    new_object(
+      SinkDirectory(
+        base_path = base_path,
+        max_rows_per_file = max_size,
+        per_partition_sort_by = per_partition_sort_by
+      )
+    )
+  }
+)
+```
+
+_source: `R/output-partition.R` of this repository_
+
+#### 6. Update documentation
+
+After all code changes are complete:
+
+```bash
+task build-documents
+```
+
+Then update `NEWS.md`. Use these heading conventions:
+
+- `### New features` — new user-facing functionality
+- `### Deprecations` — newly deprecated parameters, methods, or classes
+- `### Bug fixes` — corrections to existing behavior
+- `### Other changes` — internal changes, dependency updates, etc.
+
+> **Important:** Do NOT use "Breaking changes" as a heading — this is reserved for the 1.0.0 release.
+
+Link upstream pull requests like this:
+`([pola-rs/polars#12345](https://github.com/pola-rs/polars/pull/12345))`
+
+#### 7. Self-review and follow-up
+
+Before submitting, verify:
+
+- **R-side artifacts of Rust-side removals.** When upstream removes a Rust struct field,
+  check that the R side is updated too:
+  - S7 class properties (e.g. in `R/output-partition.R`)
+  - `@param` documentation in roxygen comments
+  - Deprecated class constructor arguments
+  - Rust conversion code in `src/rust/src/conversion/s7.rs`
+- **All py-polars changes are addressed.** Create follow-up tasks for items deferred to
+  separate PRs, such as new features that need their own design and testing.
+
+### CI Troubleshooting
+
+Some CI issues that commonly arise during version bumps:
+
+- **Pre-built binary failures:** These are expected until new binaries are published at release time.
+  Not actionable during the bump PR.
+- **`usethis::use_standalone()` side effects:** The `build-standalone-files` task calls
+  `usethis::use_standalone()`, which may regress the rlang version requirement in `DESCRIPTION`.
+  Always check the `DESCRIPTION` diff after running `task build-documents` or `task build-all`.
+- **Rust import shadowing:** If a function has the same name as something imported via
+  `use crate::prelude::*`, the compiler may emit confusing errors. Use an alias to resolve.
+- **WASM build:** May fail for upstream reasons unrelated to the bump. Check upstream issues
+  before spending time debugging.
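+
+For the `usethis::use_standalone()` item above, the rlang requirement can also be inspected
+from R in addition to reading the diff. This is a minimal sketch, assuming rlang is listed in
+the `Imports` field of `DESCRIPTION`; the helper name `rlang_requirement()` is hypothetical:
+
+```r
+# Hypothetical helper: return the rlang entry (with its version constraint) from DESCRIPTION.
+# Assumes rlang is declared under `Imports`; adjust `fields` if it lives elsewhere.
+rlang_requirement <- function(path = "DESCRIPTION") {
+  imports <- read.dcf(path, fields = "Imports")[1, 1]
+  # Split the comma-separated field and strip surrounding whitespace and newlines.
+  entries <- trimws(strsplit(imports, ",")[[1]])
+  grep("^rlang\\b", entries, value = TRUE)
+}
+```
+
+Calling `rlang_requirement()` before and after `task build-documents` and comparing the two
+results shows whether the minimum version was lowered.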