249 changes: 249 additions & 0 deletions DEVELOPMENT.md
@@ -522,3 +522,252 @@ test_that("select works lazy/eager", {
```

_source: `tests/testthat/test-lazyframe-frame.R` of this repository_

## Bumping the Polars Version

R Polars pins its Polars dependency to a specific git revision hash in `src/rust/Cargo.toml`.
When a new version of Polars is released, we update this hash (and the package version) to pick up
upstream changes. The py-polars release tags (e.g. `py-1.38.1`) serve as the reference point
for what changed, since they correspond to specific Polars revisions.

### Step-by-step Workflow

#### 1. Research upstream changes

Before touching any code, compare the py-polars release notes and changelogs between
the current and target versions. Categorize changes into:

- **Rust breaking changes** — removed or renamed types, changed function signatures, removed feature flags
- **Deprecations** — parameters or methods deprecated in py-polars that should be deprecated in R Polars
- **New features** — new functions, methods, or options added in py-polars

The upstream polars CHANGELOG can be unreliable, so base your analysis on the actual code diff
when possible (e.g. comparing between the two py-polars release tags on GitHub or in a local
clone of the [polars](https://github.com/pola-rs/polars) repository).
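As a minimal sketch of the local-clone comparison, the helper below lists the commits and the per-file diff summary between two tags. The function name `polars_changes_between` is illustrative and not part of this repository, and it assumes both tags have already been fetched into the clone.

```bash
# Sketch: summarize what changed between two release tags in a local clone.
# `polars_changes_between` is an illustrative helper name (an assumption).
polars_changes_between() {
  local repo="$1" from_tag="$2" to_tag="$3"
  # One line per commit reachable from the target tag but not from the current one.
  git -C "$repo" log --oneline "${from_tag}..${to_tag}"
  # Per-file summary of the code diff between the two tags.
  git -C "$repo" diff --stat "$from_tag" "$to_tag"
}

# Usage (the clone path and the current tag are placeholders):
#   polars_changes_between path/to/polars <current-tag> py-1.38.1
```

The commit list is a quick index of upstream pull requests to read, while the `--stat` output shows which files changed and is a reasonable starting point for spotting Rust breaking changes.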

#### 2. Update `src/rust/Cargo.toml`

Update two things in `src/rust/Cargo.toml`:

1. **The git revision hash** — Find the commit that corresponds to the target py-polars tag
and update all `rev = "..."` values.
2. **The package version** — Update the `version` field (e.g. `"1.9.0-rc.1"`).

```toml
[package]
name = "r-polars"
version = "1.9.0-rc.1" # <-- bump this

[dependencies]
polars-core = { git = "https://github.com/pola-rs/polars.git", rev = "<new-hash>", default-features = false }
polars-error = { git = "https://github.com/pola-rs/polars.git", rev = "<new-hash>", default-features = false }
```

_source: `src/rust/Cargo.toml` of this repository_
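The hash update can be scripted. The following is a sketch only: `set_polars_rev` is a hypothetical helper, not a task defined in this repository, and it assumes every `rev = "..."` value in the file should point at the same Polars commit.

```bash
# Sketch: point every `rev = "..."` in a Cargo.toml at a new revision.
# `set_polars_rev` is an illustrative helper name (an assumption).
set_polars_rev() {
  local cargo_toml="$1" new_rev="$2" tmp
  # Accept only lowercase hex so the value is safe to splice into the sed script.
  case "$new_rev" in
    ''|*[!0-9a-f]*) echo "not a commit hash: '$new_rev'" >&2; return 1 ;;
  esac
  tmp="$(mktemp)" || return 1
  # Replace only the quoted value after a standalone `rev =` key;
  # the rest of each line is kept as-is.
  sed -E "s/(^|[^[:alnum:]_-])(rev[[:space:]]*=[[:space:]]*\")[^\"]*\"/\1\2${new_rev}\"/g" \
    "$cargo_toml" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Write back in place so the original file permissions are preserved.
  cat "$tmp" > "$cargo_toml"
  rm -f "$tmp"
}

# Usage, assuming a local clone of pola-rs/polars with tags fetched:
#   new_rev="$(git -C path/to/polars rev-list -n 1 py-1.38.1)"
#   set_polars_rev src/rust/Cargo.toml "$new_rev"
```

`git rev-list -n 1 <tag>` resolves a tag to the commit it points at, which is the value the `rev` fields expect. Review the resulting diff before committing, since the sketch rewrites every `rev` value it finds.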

> **Important:** The version bump is critical because `build-lib-sums` (run as part of `build-all`)
> reads the version from `Cargo.toml` and updates the `Config/polars/lib-version` field in `DESCRIPTION`.
> Additionally, `build-autogenerated-polars-version-file` regenerates `R/generated-polars-version.R`.
> Without these updates, CI pre-built binary tests will fail due to version mismatches.
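A quick local check for this kind of mismatch might look like the sketch below. The helper names are illustrative, and the sketch assumes that `Config/polars/lib-version` holds exactly the same string as the `version` field in `Cargo.toml`; adjust the comparison if the two are formatted differently.

```bash
# Sketch: check that DESCRIPTION's lib-version matches the Cargo.toml version.
# All three function names are illustrative (assumptions), not repository tasks.
cargo_package_version() {
  # First line that starts with `version = "..."`; prints the quoted value.
  sed -n -E 's/^version[[:space:]]*=[[:space:]]*"([^"]*)".*/\1/p' "$1" | head -n 1
}

description_lib_version() {
  # Value of the Config/polars/lib-version field, without surrounding whitespace.
  sed -n -E 's#^Config/polars/lib-version:[[:space:]]*([^[:space:]]+).*#\1#p' "$1" | head -n 1
}

check_lib_version() {
  local cargo_version desc_version
  cargo_version="$(cargo_package_version "$1")"
  desc_version="$(description_lib_version "$2")"
  if [ -n "$cargo_version" ] && [ "$cargo_version" = "$desc_version" ]; then
    echo "ok: lib-version is $desc_version"
  else
    echo "mismatch: Cargo.toml has '$cargo_version', DESCRIPTION has '$desc_version'" >&2
    return 1
  fi
}

# Usage, from the repository root:
#   check_lib_version src/rust/Cargo.toml DESCRIPTION
```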

#### 3. Fix Rust compilation

Run:

```bash
task build-rust
```

Fix compilation errors one logical change at a time, making separate commits for each.
Common issues include renamed types, changed function signatures, and removed feature flags.

> **Tip:** Avoid Rust import shadowing. If a function has the same name as something imported
> from the prelude, use an alias (e.g. `use crate::prelude::some_fn as some_fn_alias`).

#### 4. Update R side and run tests

Run `task build-all` (not just `task build-rust`) to regenerate all auto-generated files:

```bash
task build-all
```

This runs `build-lib-sums` and `build-documents` (including `build-standalone-files` and
`build-autogenerated-files`) as dependencies, then `test-all`, and finally `build-readme`.
Alternatively, you can run the steps individually:

```bash
task build-lib-sums # update DESCRIPTION lib-version and checksum metadata
task build-documents # regenerate Rd files and auto-generated R files
task test-all # run the full test suite
task build-readme # rebuild README
```

Categorize test failures into:

- **Snapshot updates:** Changed error messages or display output. Review the diffs, then accept:

  ```bash
  task test-snapshot-accept
  ```

- **Behavior changes:** Tests that fail because upstream behavior changed. These need investigation
  and manual updates to test expectations.

#### 5. Handle deprecations

When py-polars deprecates a parameter or method, add corresponding deprecation warnings on the R side.

##### Deprecating a parameter

Use `deprecated()` as the default value and `is_present()` to detect usage.
These helpers are defined in `R/utils-deprecation.R` (wrapping the lifecycle package).
Here is the pattern used for the `allow_missing_columns` parameter:

```r
pl__scan_parquet <- function(
  source,
  ...,
  missing_columns = c("insert", "raise"),
  allow_missing_columns = deprecated() # <-- deprecated default
) {
  check_dots_empty0(...)
  # snip

  if (is_present(allow_missing_columns)) { # <-- guard
    deprecate_warn(
      c(
        `!` = sprintf(
          "The argument %s of %s is deprecated.",
          format_arg("allow_missing_columns"),
          format_fn("scan_parquet")
        ),
        i = sprintf(
          "Use the argument %s instead and pass one of %s.",
          format_arg("missing_columns"),
          format_code("('insert', 'raise')")
        )
      )
    )

    missing_columns <- if (allow_missing_columns) "insert" else "raise"
  }

  # snip
}
```

_source: `R/input-parquet-functions.R` of this repository_

Key points:

- Only "leaf" functions (e.g. `pl__scan_parquet`, `lazyframe__lazy_sink_*`) need the `is_present()` check.
Wrapper functions (e.g. `lazyframe__sink_parquet`) just pass `deprecated()` through without checking.
- When migrating a parameter into `storage_options`, use `storage_options[["key"]] <- value`
(not `c()`) to avoid creating duplicate keys.

##### Deprecating a method

Add a `lifecycle::badge("deprecated")` to the roxygen documentation, emit a warning,
and either delegate to the replacement method or turn the method into a no-op, as appropriate.
The example below is the no-op case: it warns and returns `self` unchanged.

```r
#' Shrink numeric columns to the minimal required datatype
#'
#' `r lifecycle::badge("deprecated")`
#' Deprecated as of polars 1.3.0 and turned into a no-op.
#' Use [`<series>$shrink_dtype`][series__shrink_dtype] instead.
#'
#' @inherit as_polars_expr return
expr__shrink_dtype <- function() {
  deprecate_warn(
    c(
      `!` = sprintf("%s is deprecated and is a no-op.", format_code("<expr>$shrink_dtype()")),
      `i` = sprintf("Use %s instead.", format_code("<series>$shrink_dtype()"))
    )
  )
  self
}
```

_source: `R/expr-expr.R` of this repository_

##### Deprecating a class

When upstream replaces one class with another, use a deprecated constructor that delegates
to the replacement class:

```r
PartitionMaxSize <- new_class(
  "PartitionMaxSize",
  parent = SinkDirectory,
  constructor = function(base_path, ..., max_size, per_partition_sort_by = NULL) {
    check_dots_empty0(...)
    deprecate_warn(
      c(
        `!` = format_warning(sprintf(
          "%s is deprecated as of %s 1.8.0.",
          format_cls("PartitionMaxSize"),
          format_pkg("polars")
        )),
        i = format_warning(sprintf("Use %s instead.", format_cls("PartitionBy")))
      )
    )

    new_object(
      SinkDirectory(
        base_path = base_path,
        max_rows_per_file = max_size,
        per_partition_sort_by = per_partition_sort_by
      )
    )
  }
)
```

_source: `R/output-partition.R` of this repository_

#### 6. Update documentation

After all code changes are complete:

```bash
task build-documents
```

Then update `NEWS.md`. Use these heading conventions:

- `### New features` — new user-facing functionality
- `### Deprecations` — newly deprecated parameters, methods, or classes
- `### Bug fixes` — corrections to existing behavior
- `### Other changes` — internal changes, dependency updates, etc.

> **Important:** Do NOT use "Breaking changes" as a heading — this is reserved for the 1.0.0 release.

Link upstream pull requests like this:
`([pola-rs/polars#12345](https://github.com/pola-rs/polars/pull/12345))`
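When a bump touches many upstream pull requests, generating these links avoids typos. The helper below is a small sketch; `upstream_pr_link` is an illustrative name, not something defined in this repository.

```bash
# Sketch: format a NEWS.md reference to an upstream pull request.
# `upstream_pr_link` is an illustrative helper name (an assumption).
upstream_pr_link() {
  local pr="$1"
  # Accept digits only, so a pasted URL or "#12345" is rejected early.
  case "$pr" in
    ''|*[!0-9]*) echo "expected a numeric PR number, got '$pr'" >&2; return 1 ;;
  esac
  printf '([pola-rs/polars#%s](https://github.com/pola-rs/polars/pull/%s))\n' "$pr" "$pr"
}

# Usage:
#   upstream_pr_link 12345
```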

#### 7. Self-review and follow-up

Before submitting, verify:

- **R-side artifacts of Rust-side removals.** When upstream removes a Rust struct field,
  check that the R side is updated too:
  - S7 class properties (e.g. in `R/output-partition.R`)
  - `@param` documentation in roxygen comments
  - Deprecated class constructor arguments
  - Rust conversion code in `src/rust/src/conversion/s7.rs`
- **All py-polars changes are addressed.** Create follow-up tasks for items deferred to
  separate PRs, such as new features that need their own design and testing.

### CI Troubleshooting

Some CI issues that commonly arise during version bumps:

- **Pre-built binary failures:** These are expected until new binaries are published at release time.
Not actionable during the bump PR.
- **`usethis::use_standalone()` side effects:** The `build-standalone-files` task calls
`usethis::use_standalone()`, which may regress the rlang version requirement in `DESCRIPTION`.
Always check the `DESCRIPTION` diff after running `task build-documents` or `task build-all`.
- **Rust import shadowing:** If a function has the same name as something imported via
`use crate::prelude::*`, the compiler may emit confusing errors. Use an alias to resolve.
- **WASM build:** May fail for upstream reasons unrelated to the bump. Check upstream issues
before spending time debugging.
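The `DESCRIPTION` check for the rlang requirement can be partly automated. This is a sketch under assumptions: the helper names are illustrative, and it assumes the requirement is written in the usual `rlang (>= x.y.z)` form and that the last commit holds the intended value.

```bash
# Sketch: detect whether the rlang version requirement in DESCRIPTION changed
# relative to the last commit. Both function names are illustrative (assumptions).
rlang_requirement() {
  # Print the first `rlang (>= x.y.z)` requirement read from stdin, if any.
  grep -E -o 'rlang[[:space:]]*\(>=[[:space:]]*[0-9.]+\)' | head -n 1
}

check_rlang_requirement() {
  local repo="${1:-.}" committed current
  # `HEAD:DESCRIPTION` is resolved relative to the repository root.
  committed="$(git -C "$repo" show HEAD:DESCRIPTION | rlang_requirement)"
  current="$(rlang_requirement < "$repo/DESCRIPTION")"
  if [ "$committed" != "$current" ]; then
    echo "rlang requirement changed: '$committed' -> '$current'" >&2
    return 1
  fi
}

# Usage, from the repository root after `task build-documents` or `task build-all`:
#   check_rlang_requirement .
```

A non-zero exit status only flags that the requirement differs; whether to restore the committed value is still a manual decision.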