perf: Improve criterion benchmarks for cast string to int #3049

andygrove · 2026-01-06T20:39:36Z

Which issue does this PR close?

N/A

Rationale for this change

The existing benchmarks only covered the legacy eval mode, not the ANSI or try modes.

I am adding these benchmarks so that I can verify the optimizations in #3048.

What changes are included in this PR?

Cover all eval modes (legacy, ANSI, and try).
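For context on what the three eval modes change, here is a simplified Rust sketch of a string-to-int cast under each mode. It is not Comet's implementation: the local `EvalMode` enum and `cast_str_to_i32` function are hypothetical stand-ins. The semantics are assumptions modeled loosely on Spark: legacy truncates a fractional part such as `"12.7"` (presumably what the `legacy_decimals` benchmarks exercise) and yields null on bad input, ANSI raises an error, and try yields null. Many Spark edge cases are skipped.

```rust
/// Hypothetical local stand-in for the eval modes covered by the benchmarks.
#[derive(Clone, Copy, Debug, PartialEq)]
enum EvalMode {
    Legacy,
    Ansi,
    Try,
}

/// Simplified string-to-i32 cast (assumed semantics, not Comet's code):
/// - whitespace is trimmed in every mode
/// - Legacy: a digits-only fractional part is truncated ("12.7" -> 12),
///   any other invalid input becomes None (null)
/// - Ansi: invalid input is an error
/// - Try: invalid input becomes None (null)
fn cast_str_to_i32(s: &str, mode: EvalMode) -> Result<Option<i32>, String> {
    let trimmed = s.trim();
    // Only legacy mode drops the fractional part after the first '.'.
    let int_part = match (mode, trimmed.split_once('.')) {
        (EvalMode::Legacy, Some((whole, frac))) if frac.bytes().all(|b| b.is_ascii_digit()) => {
            whole
        }
        _ => trimmed,
    };
    match int_part.parse::<i32>() {
        Ok(v) => Ok(Some(v)),
        Err(_) => match mode {
            // Illustrative message only; Spark's real error text differs.
            EvalMode::Ansi => Err(format!("invalid input for INT: '{s}'")),
            EvalMode::Legacy | EvalMode::Try => Ok(None),
        },
    }
}
```

Under these assumptions, `"12.7"` casts to `Some(12)` in legacy mode, to `None` in try mode, and to an error in ANSI mode, so each mode takes a slightly different path through the parser and is worth benchmarking separately.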

How are these changes tested?

andygrove · 2026-01-06T20:40:26Z

cast_string_to_int/legacy/i32
                        time:   [12.296 µs 12.942 µs 13.631 µs]
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) high mild
  9 (9.00%) high severe
cast_string_to_int/legacy/i64
                        time:   [11.607 µs 11.652 µs 11.705 µs]

cast_string_to_int/ansi/i32
                        time:   [10.024 µs 10.156 µs 10.334 µs]
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild
cast_string_to_int/ansi/i64
                        time:   [10.836 µs 11.525 µs 12.240 µs]
Found 16 outliers among 100 measurements (16.00%)
  1 (1.00%) high mild
  15 (15.00%) high severe

cast_string_to_int/try/i32
                        time:   [13.877 µs 13.936 µs 13.999 µs]
cast_string_to_int/try/i64
                        time:   [10.718 µs 10.829 µs 10.967 µs]

cast_string_to_int/legacy_decimals/i32
                        time:   [15.430 µs 16.569 µs 17.804 µs]
Found 18 outliers among 100 measurements (18.00%)
  4 (4.00%) low mild
  2 (2.00%) high mild
  12 (12.00%) high severe
cast_string_to_int/legacy_decimals/i64
                        time:   [13.032 µs 13.076 µs 13.114 µs]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild
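Each criterion `time:` line reads `[lower estimate upper]`: the bounds of a confidence interval (95% by default) around the point estimate. One rough way to flag noisy results is the interval's width relative to the estimate. The sketch below assumes all three values share a unit; `parse_time_line` and `relative_spread_pct` are hypothetical helpers, not part of criterion.

```rust
/// Parse (lower, estimate, upper) from a criterion `time:` line such as
/// "time:   [12.296 µs 12.942 µs 13.631 µs]". Assumes one shared unit.
fn parse_time_line(line: &str) -> Option<(f64, f64, f64)> {
    let start = line.find('[')?;
    let end = line.find(']')?;
    if end <= start {
        return None;
    }
    // Tokens alternate value/unit, so keep every other token.
    let values = line[start + 1..end]
        .split_whitespace()
        .step_by(2)
        .map(|t| t.parse::<f64>().ok())
        .collect::<Option<Vec<f64>>>()?;
    match values.as_slice() {
        [lo, mid, hi] => Some((*lo, *mid, *hi)),
        _ => None,
    }
}

/// Confidence-interval width as a percentage of the point estimate.
fn relative_spread_pct(lo: f64, mid: f64, hi: f64) -> f64 {
    (hi - lo) / mid * 100.0
}
```

Applied to the output above, legacy/i32 spans roughly 10.3% of its estimate (together with 10 outliers), while legacy/i64 spans under 1%, so the i32 figure is the less trustworthy of the two.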

andygrove · 2026-01-06T20:43:55Z

@coderfender could you review?

andygrove · 2026-01-06T20:45:14Z

New output after re-adding the i8 and i16 benchmarks. Note that the results fluctuate between runs, which is not ideal.

cast_string_to_int/legacy/i8
                        time:   [6.6654 µs 6.6882 µs 6.7107 µs]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
cast_string_to_int/legacy/i16
                        time:   [6.8855 µs 7.0007 µs 7.1618 µs]
Found 10 outliers among 100 measurements (10.00%)
  5 (5.00%) high mild
  5 (5.00%) high severe
cast_string_to_int/legacy/i32
                        time:   [11.650 µs 11.695 µs 11.741 µs]
                        change: [−6.9011% −4.5301% −2.4419%] (p = 0.00 < 0.05)
                        Performance has improved.
cast_string_to_int/legacy/i64
                        time:   [12.295 µs 12.330 µs 12.364 µs]
                        change: [+5.2149% +5.6287% +6.0241%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild

cast_string_to_int/ansi/i8
                        time:   [5.7430 µs 5.7633 µs 5.7841 µs]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
cast_string_to_int/ansi/i16
                        time:   [5.7197 µs 5.7496 µs 5.7824 µs]
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe
cast_string_to_int/ansi/i32
                        time:   [10.469 µs 10.866 µs 11.369 µs]
                        change: [−23.119% −17.167% −10.789%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  5 (5.00%) high mild
  8 (8.00%) high severe
cast_string_to_int/ansi/i64
                        time:   [10.262 µs 10.302 µs 10.344 µs]
                        change: [−5.2336% −1.9457% +1.0536%] (p = 0.25 > 0.05)
                        No change in performance detected.

cast_string_to_int/try/i8
                        time:   [6.0008 µs 6.0224 µs 6.0429 µs]
cast_string_to_int/try/i16
                        time:   [6.1752 µs 6.4688 µs 6.8385 µs]
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) high mild
  8 (8.00%) high severe
cast_string_to_int/try/i32
                        time:   [10.460 µs 10.506 µs 10.564 µs]
                        change: [−19.766% −17.645% −15.299%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 23 outliers among 100 measurements (23.00%)
  23 (23.00%) high severe
cast_string_to_int/try/i64
                        time:   [11.172 µs 11.374 µs 11.645 µs]
                        change: [+13.774% +18.852% +24.117%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild

cast_string_to_int/legacy_decimals/i32
                        time:   [12.402 µs 12.437 µs 12.471 µs]
                        change: [−20.604% −17.705% −15.094%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe
cast_string_to_int/legacy_decimals/i64
                        time:   [12.605 µs 12.740 µs 12.928 µs]
                        change: [−3.4087% −2.5954% −1.6264%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  5 (5.00%) high severe
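The `change:` lines compare each run against the previously saved baseline. Below is a simplified sketch of how such a verdict can be derived from the change interval and p-value, assuming criterion's defaults (significance level 0.05, noise threshold 1%). It reproduces only the final decision, not criterion's bootstrap statistics, and `classify` and `Verdict` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum Verdict {
    Improved,
    Regressed,
    WithinNoise,
    NoChange,
}

/// `change_lo`/`change_hi` are the change-interval bounds as fractions
/// (e.g. -0.069 for -6.9%). Assumed defaults: significance 0.05, noise 0.01.
fn classify(change_lo: f64, change_hi: f64, p_value: f64) -> Verdict {
    const SIGNIFICANCE_LEVEL: f64 = 0.05;
    const NOISE_THRESHOLD: f64 = 0.01;
    if p_value >= SIGNIFICANCE_LEVEL {
        // Not statistically significant: "No change in performance detected."
        return Verdict::NoChange;
    }
    if change_hi < -NOISE_THRESHOLD {
        // Whole interval is faster by more than the noise threshold.
        Verdict::Improved
    } else if change_lo > NOISE_THRESHOLD {
        // Whole interval is slower by more than the noise threshold.
        Verdict::Regressed
    } else {
        Verdict::WithinNoise
    }
}
```

With the numbers above, legacy/i32 classifies as improved, legacy/i64 and try/i64 as regressed, and ansi/i64 (p = 0.25) as no change. A verdict only says the difference is significant for that pair of runs; given the run-to-run fluctuation noted in the comment, a single "Performance has regressed." line is weak evidence on its own.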

coderfender · 2026-01-06T20:56:56Z

native/spark-expr/benches/cast_from_string.rs

-// Create UTF8 batch with strings representing ints, floats, nulls
-fn create_utf8_batch() -> RecordBatch {
+/// Create batch with small integer strings that fit in i8 range (for i8/i16 benchmarks)
+fn create_small_int_string_batch() -> RecordBatch {


Minor: is it worth creating an i16 batch as well?
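On the i16 question: values in the i8 range are at most three digits plus a sign, so an i16-range batch might exercise more of the parsing loop. A rough sketch of generating such values, assuming a simple deterministic generator so no external crates are needed. `int_strings_in_range` is a hypothetical helper; the real benchmark builds an Arrow `RecordBatch`, and this only shows the value-generation side.

```rust
/// Hypothetical helper: `n` optional integer strings in [min, max], with every
/// `null_every`-th entry set to None. Intended for narrow ranges (i8/i16);
/// `max - min` would overflow for the full i64 range.
fn int_strings_in_range(
    min: i64,
    max: i64,
    n: usize,
    null_every: usize,
    seed: u64,
) -> Vec<Option<String>> {
    assert!(min <= max && null_every > 0);
    let span = (max - min) as u64 + 1;
    let mut state = seed;
    (0..n)
        .map(|i| {
            // Linear congruential generator (Knuth's MMIX constants).
            state = state
                .wrapping_mul(6364136223846793005)
                .wrapping_add(1442695040888963407);
            if i % null_every == 0 {
                None
            } else {
                // Use the high bits, which are better distributed in an LCG.
                let offset = ((state >> 33) % span) as i64;
                Some((min + offset).to_string())
            }
        })
        .collect()
}
```

For example, `int_strings_in_range(i16::MIN as i64, i16::MAX as i64, 1000, 10, 42)` yields 1000 entries, 100 of them null, all of which parse as `i16`.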

coderfender

Thank you @andygrove

comphead

Thanks @andygrove, LGTM.

improve criterion benchmarks for cast string to int

fec6217

andygrove marked this pull request as ready for review January 6, 2026 20:39

add i8/i16 benchmarks

b20850b

andygrove mentioned this pull request Jan 6, 2026

perf: Additional optimizations for cast from string to int #3048

Merged

coderfender reviewed Jan 6, 2026

coderfender approved these changes Jan 6, 2026

andygrove requested review from comphead, mbutrovich and parthchandra January 7, 2026 15:49

comphead approved these changes Jan 7, 2026

andygrove merged commit af3bd81 into apache:main Jan 7, 2026
3 checks passed

andygrove deleted the cast-string-int-criterion branch January 7, 2026 16:10

3 participants