Description
col_vals_not_null and col_vals_le on a Polars LazyFrame slowed down ~7-11× between
0.24.0 and 0.25.0. The regression is largest when the column has no failures (the
typical pass case), and scales linearly with row count.
Disclaimer: I used AI to format this issue and its associated MRE.
Reproducible example
import time

import polars as pl
import pointblank as pb

N = 2_000_000
lf = pl.DataFrame(
    {
        "pk": pl.int_range(0, N, eager=True),
        "val": pl.int_range(0, N, eager=True),
    }
).lazy()


def t(label, fn):
    fn()  # warm-up
    t0 = time.perf_counter()
    fn()
    print(f" {label:20s} {time.perf_counter() - t0:.3f}s")


print(f"pointblank={pb.__version__} polars={pl.__version__} rows={N:,}")
t("col_vals_not_null", lambda: pb.Validate(data=lf).col_vals_not_null(columns="pk").interrogate())
t("col_vals_le", lambda: pb.Validate(data=lf).col_vals_le(columns="val", value=N).interrogate())
t("rows_distinct", lambda: pb.Validate(data=lf).rows_distinct(columns_subset=["pk"]).interrogate())
Run with:
uv run --with 'pointblank==0.24.0' --with 'polars==1.41.2' mre.py
uv run --with 'pointblank==0.25.0' --with 'polars==1.41.2' mre.py
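The `t` helper in the script times a single run after one warm-up call. If a median over several iterations is preferred, something like the following could be swapped in. This is only a sketch: `median_seconds` and its `iters` and `warmup` parameters are hypothetical names, not part of pointblank or polars.

```python
import statistics
import time


def median_seconds(fn, iters=3, warmup=1):
    """Return the median wall-clock time (seconds) of fn() over `iters` runs."""
    # Untimed warm-up calls, mirroring the single warm-up in t().
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)
```

In the script, each `t(label, fn)` call could then become `print(label, median_seconds(fn))`.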
Result
Median of 3 iters on a 2M-row, ~46-column LazyFrame:
| check | 0.24.0 | 0.25.0 | slowdown |
|---|---|---|---|
| col_vals_not_null ×3 | 0.007 s | 0.077 s | 11× |
| col_vals_le (date ≤ today) | 0.004 s | 0.031 s | 8× |
| rows_distinct (composite PK) | 0.104 s | 0.142 s | 1.4× |
| col_schema_match | 0.017 s | 0.016 s | 1.0× |
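For reference, the slowdown column is the 0.25.0 time divided by the 0.24.0 time. A small sketch that recomputes it for the three affected checks from the rounded timings above (so the last digit may differ slightly from the reported column); the `slowdown` helper and the `timings` dict are illustrative names only:

```python
def slowdown(old_s, new_s):
    """Ratio of the new timing to the old timing (both in seconds)."""
    return new_s / old_s


# (0.24.0 seconds, 0.25.0 seconds), copied from the table above.
timings = {
    "col_vals_not_null": (0.007, 0.077),
    "col_vals_le": (0.004, 0.031),
    "rows_distinct": (0.104, 0.142),
}

ratios = {name: slowdown(old, new) for name, (old, new) in timings.items()}
for name, ratio in ratios.items():
    print(f"{name:20s} {ratio:.1f}x")
```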
Development environment
Tested on polars==1.41.2, Python 3.12, macOS arm64.