@andygrove
Member

@andygrove andygrove commented Jan 6, 2026

Which issue does this PR close?

N/A

Rationale for this change

The existing benchmarks cover only the legacy eval mode, not ansi or try.

I am adding these benchmarks so that I can verify the optimizations in #3048.

What changes are included in this PR?

Extend the cast_string_to_int benchmarks to cover all eval modes: legacy, ansi, and try.
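
For readers unfamiliar with the three eval modes, here is a minimal sketch of how they differ on malformed input, assuming the usual Spark behavior: legacy and try yield NULL, while ansi raises an error. The names `EvalMode`, `CastError`, and `cast_str_to_i32` are hypothetical and are not taken from this PR. The parsing is deliberately simplified (it only trims whitespace and ignores the decimal handling exercised by the legacy_decimals benchmarks).

```rust
/// Hypothetical stand-in for the three eval modes named in this PR.
#[derive(Clone, Copy, Debug, PartialEq)]
enum EvalMode {
    Legacy,
    Ansi,
    Try,
}

/// Hypothetical error type used only on the ANSI path.
#[derive(Debug, PartialEq)]
struct CastError(String);

/// Simplified string-to-i32 cast. `Ok(None)` models a SQL NULL result.
fn cast_str_to_i32(input: &str, mode: EvalMode) -> Result<Option<i32>, CastError> {
    match input.trim().parse::<i32>() {
        Ok(value) => Ok(Some(value)),
        Err(_) => match mode {
            // Assumption: legacy and try both return NULL for malformed input.
            EvalMode::Legacy | EvalMode::Try => Ok(None),
            // Assumption: ansi surfaces malformed input as an error.
            EvalMode::Ansi => Err(CastError(format!("cannot cast '{}' to INT", input))),
        },
    }
}

fn main() {
    for mode in [EvalMode::Legacy, EvalMode::Ansi, EvalMode::Try] {
        println!("{:?}: {:?}", mode, cast_str_to_i32("abc", mode));
    }
}
```

Because the error path differs per mode, benchmarking only legacy can miss costs or savings that are specific to ansi or try.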

How are these changes tested?

@andygrove andygrove marked this pull request as ready for review January 6, 2026 20:39
@andygrove
Member Author

cast_string_to_int/legacy/i32
                        time:   [12.296 µs 12.942 µs 13.631 µs]
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) high mild
  9 (9.00%) high severe
cast_string_to_int/legacy/i64
                        time:   [11.607 µs 11.652 µs 11.705 µs]

cast_string_to_int/ansi/i32
                        time:   [10.024 µs 10.156 µs 10.334 µs]
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild
cast_string_to_int/ansi/i64
                        time:   [10.836 µs 11.525 µs 12.240 µs]
Found 16 outliers among 100 measurements (16.00%)
  1 (1.00%) high mild
  15 (15.00%) high severe

cast_string_to_int/try/i32
                        time:   [13.877 µs 13.936 µs 13.999 µs]
cast_string_to_int/try/i64
                        time:   [10.718 µs 10.829 µs 10.967 µs]

cast_string_to_int/legacy_decimals/i32
                        time:   [15.430 µs 16.569 µs 17.804 µs]
Found 18 outliers among 100 measurements (18.00%)
  4 (4.00%) low mild
  2 (2.00%) high mild
  12 (12.00%) high severe
cast_string_to_int/legacy_decimals/i64
                        time:   [13.032 µs 13.076 µs 13.114 µs]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild

@andygrove
Member Author

@coderfender could you review?

@andygrove
Member Author

New output after re-adding i8 and i16 benchmarks. Note that the benchmark results do fluctuate between runs, which is not ideal.

cast_string_to_int/legacy/i8
                        time:   [6.6654 µs 6.6882 µs 6.7107 µs]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
cast_string_to_int/legacy/i16
                        time:   [6.8855 µs 7.0007 µs 7.1618 µs]
Found 10 outliers among 100 measurements (10.00%)
  5 (5.00%) high mild
  5 (5.00%) high severe
cast_string_to_int/legacy/i32
                        time:   [11.650 µs 11.695 µs 11.741 µs]
                        change: [−6.9011% −4.5301% −2.4419%] (p = 0.00 < 0.05)
                        Performance has improved.
cast_string_to_int/legacy/i64
                        time:   [12.295 µs 12.330 µs 12.364 µs]
                        change: [+5.2149% +5.6287% +6.0241%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild

cast_string_to_int/ansi/i8
                        time:   [5.7430 µs 5.7633 µs 5.7841 µs]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
cast_string_to_int/ansi/i16
                        time:   [5.7197 µs 5.7496 µs 5.7824 µs]
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe
cast_string_to_int/ansi/i32
                        time:   [10.469 µs 10.866 µs 11.369 µs]
                        change: [−23.119% −17.167% −10.789%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  5 (5.00%) high mild
  8 (8.00%) high severe
cast_string_to_int/ansi/i64
                        time:   [10.262 µs 10.302 µs 10.344 µs]
                        change: [−5.2336% −1.9457% +1.0536%] (p = 0.25 > 0.05)
                        No change in performance detected.

cast_string_to_int/try/i8
                        time:   [6.0008 µs 6.0224 µs 6.0429 µs]
cast_string_to_int/try/i16
                        time:   [6.1752 µs 6.4688 µs 6.8385 µs]
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) high mild
  8 (8.00%) high severe
cast_string_to_int/try/i32
                        time:   [10.460 µs 10.506 µs 10.564 µs]
                        change: [−19.766% −17.645% −15.299%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 23 outliers among 100 measurements (23.00%)
  23 (23.00%) high severe
cast_string_to_int/try/i64
                        time:   [11.172 µs 11.374 µs 11.645 µs]
                        change: [+13.774% +18.852% +24.117%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild

cast_string_to_int/legacy_decimals/i32
                        time:   [12.402 µs 12.437 µs 12.471 µs]
                        change: [−20.604% −17.705% −15.094%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe
cast_string_to_int/legacy_decimals/i64
                        time:   [12.605 µs 12.740 µs 12.928 µs]
                        change: [−3.4087% −2.5954% −1.6264%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  5 (5.00%) high severe
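
To help read the `change:` lines above, here is a rough sketch of how the verdict appears to be derived from the confidence interval and p-value. It assumes Criterion's default significance level of 0.05 and noise threshold of 1%; `Change`, `Verdict`, and `classify` are hypothetical names, and this is a simplification rather than Criterion's actual code.

```rust
/// Verdicts loosely mirroring the messages Criterion prints.
#[derive(Debug, PartialEq)]
enum Verdict {
    Improved,
    Regressed,
    WithinNoise,
    NoChange,
}

/// Relative change interval as fractions (e.g. -0.045 for -4.5%) plus the p-value.
struct Change {
    lower: f64,
    upper: f64,
    p_value: f64,
}

/// Simplified classification: first the significance test, then the noise threshold.
fn classify(change: &Change, significance_level: f64, noise_threshold: f64) -> Verdict {
    if change.p_value >= significance_level {
        return Verdict::NoChange;
    }
    if change.upper < -noise_threshold {
        Verdict::Improved
    } else if change.lower > noise_threshold {
        Verdict::Regressed
    } else {
        Verdict::WithinNoise
    }
}

fn main() {
    // Intervals copied from the results above, converted from percent to fractions.
    let samples = [
        ("legacy/i32", Change { lower: -0.069011, upper: -0.024419, p_value: 0.00 }),
        ("try/i64", Change { lower: 0.13774, upper: 0.24117, p_value: 0.00 }),
        ("ansi/i64", Change { lower: -0.052336, upper: 0.010536, p_value: 0.25 }),
    ];
    for (name, change) in samples.iter() {
        println!("{}: {:?}", name, classify(change, 0.05, 0.01));
    }
}
```

A verdict of "improved" or "regressed" only says the two runs differ statistically. It does not rule out environmental noise between runs, which fits the fluctuation noted above.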

// Create UTF8 batch with strings representing ints, floats, nulls
fn create_utf8_batch() -> RecordBatch {
/// Create batch with small integer strings that fit in i8 range (for i8/i16 benchmarks)
fn create_small_int_string_batch() -> RecordBatch {
Contributor

Minor: is it worth creating an i16 batch as well?
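
To make the question concrete, here is a dependency-free sketch of a generator parameterized by integer range, so the same helper could produce i8-range or i16-range input strings. Everything in it is an assumption for illustration: `int_strings_in_range`, the `Lcg` helper, and the one-in-ten null ratio are hypothetical, and the PR's real helper builds an Arrow `RecordBatch` rather than a `Vec`.

```rust
/// Tiny deterministic LCG so the sketch needs no external crates.
struct Lcg(u64);

impl Lcg {
    fn next_u32(&mut self) -> u32 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 32) as u32
    }
}

/// Generate `len` optional integer strings with values in `min..=max`.
/// Every tenth entry is `None` to model a null (an arbitrary choice for this sketch).
fn int_strings_in_range(min: i64, max: i64, len: usize, seed: u64) -> Vec<Option<String>> {
    let span = (max - min + 1) as u64;
    let mut rng = Lcg(seed);
    (0..len)
        .map(|i| {
            if i % 10 == 0 {
                None
            } else {
                let value = min + (rng.next_u32() as u64 % span) as i64;
                Some(value.to_string())
            }
        })
        .collect()
}

fn main() {
    // An i16-range input set; swapping in i8::MIN/i8::MAX gives the i8-range variant.
    let batch = int_strings_in_range(i16::MIN as i64, i16::MAX as i64, 8, 42);
    println!("{:?}", batch);
}
```

An i8-range batch is also valid input for the i16 cast, but it never exercises the longer digit strings that only an i16-range batch would contain.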

Contributor

@coderfender coderfender left a comment

Thank you @andygrove

Contributor

@comphead comphead left a comment

Thanks @andygrove, LGTM.

@andygrove andygrove merged commit af3bd81 into apache:main Jan 7, 2026
3 checks passed
@andygrove andygrove deleted the cast-string-int-criterion branch January 7, 2026 16:10