@andygrove
Member

Which issue does this PR close?

Closes #.

Rationale for this change

| Benchmark | Before | After | Improvement |
|---|---|---|---|
| spark_lpad: default padding | 441 µs | 111 µs | -74.7% |
| spark_lpad: custom padding | 437 µs | 107 µs | -75.7% |
| spark_rpad: default padding | 443 µs | 108 µs | -75.4% |
| spark_rpad: custom padding | 445 µs | 107 µs | -75.8% |
| spark_lpad: multi-char padding | 336 µs | 158 µs | -52.3% |
| spark_rpad: multi-char padding | 337 µs | 145 µs | -56.1% |
| spark_lpad: with truncation | 178 µs | 82 µs | -54.7% |
| spark_rpad: with truncation | 173 µs | 79 µs | -54.0% |

What changes are included in this PR?

  1. Reusable buffer: Added a String buffer that is cleared and reused for each element instead of allocating a new String per element
  2. Changed function signature: Replaced add_padding_string(String) -> Result with write_padded_string(&mut String, &str) that writes directly to a buffer
  3. Pre-computed pad characters: The pad_string.chars() iterator is now collected once into a Vec before the loop
  4. Direct padding writes: New write_padding_chars() function writes padding directly to the buffer without intermediate allocations, with a fast path for single-character padding
  5. Eliminated unnecessary allocations: Removed string.parse().unwrap() which was converting &str to String unnecessarily
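
The changes listed above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the names `write_padded_string` and `write_padding_chars` come from the description, but the extra parameters (`length`, `pad_chars`, `pad_left`), the truncation behavior, and the `pad_all` driver are assumptions made so the example is self-contained.

```rust
/// Appends `count` padding characters to `buffer`, cycling through `pad_chars`.
/// Signature is an assumption for illustration.
fn write_padding_chars(buffer: &mut String, pad_chars: &[char], count: usize) {
    match pad_chars {
        // Nothing to pad with.
        [] => {}
        // Fast path: a single pad character needs no cycling.
        [single] => buffer.extend(std::iter::repeat(*single).take(count)),
        // General path: cycle through the pre-collected pad characters.
        _ => buffer.extend(pad_chars.iter().cycle().take(count)),
    }
}

/// Clears `buffer` and writes `s` padded (or truncated) to `length` characters.
/// The extra parameters beyond `(&mut String, &str)` are assumptions.
fn write_padded_string(
    buffer: &mut String,
    s: &str,
    length: usize,
    pad_chars: &[char],
    pad_left: bool,
) {
    buffer.clear(); // keeps the existing allocation
    let char_len = s.chars().count();
    if char_len >= length {
        // Assumed behavior: keep only the first `length` characters.
        buffer.extend(s.chars().take(length));
        return;
    }
    let pad = length - char_len;
    if pad_left {
        write_padding_chars(buffer, pad_chars, pad);
        buffer.push_str(s);
    } else {
        buffer.push_str(s);
        write_padding_chars(buffer, pad_chars, pad);
    }
}

/// Hypothetical driver: pads every input while reusing one buffer.
fn pad_all(inputs: &[&str], length: usize, pad_string: &str, pad_left: bool) -> Vec<String> {
    // Collect the pad characters once, before the loop.
    let pad_chars: Vec<char> = pad_string.chars().collect();
    let mut buffer = String::new();
    let mut out = Vec::with_capacity(inputs.len());
    for s in inputs {
        write_padded_string(&mut buffer, s, length, &pad_chars, pad_left);
        // Cloning is only for this demo; a columnar builder could take `&buffer` directly.
        out.push(buffer.clone());
    }
    out
}
```

With these assumed signatures, `pad_all(&["hi", "hello world"], 5, "ab", true)` yields `["abahi", "hello"]`.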

How are these changes tested?

@andygrove andygrove marked this pull request as ready for review December 23, 2025 00:02
@andygrove
Member Author

@coderfender fyi

@coderfender
Contributor

Thank you very much for the optimization @andygrove

@codecov-commenter

codecov-commenter commented Dec 23, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 59.62%. Comparing base (f09f8af) to head (a8e7848).
⚠️ Report is 797 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2963      +/-   ##
============================================
+ Coverage     56.12%   59.62%   +3.49%     
- Complexity      976     1375     +399     
============================================
  Files           119      167      +48     
  Lines         11743    15488    +3745     
  Branches       2251     2567     +316     
============================================
+ Hits           6591     9234    +2643     
- Misses         4012     4956     +944     
- Partials       1140     1298     +158     


let string_array = as_generic_string_array::<T>(array)?;

// Pre-compute pad characters once to avoid repeated iteration
let pad_chars: Vec<char> = pad_string.chars().collect();
Contributor

👍

);

// Reusable buffer to avoid per-element allocations
let mut buffer = String::new();
Contributor

@comphead comphead Dec 23, 2025

We could probably give the buffer a fixed initial capacity?

String::with_capacity(rpad_length);

?

Member Author

Updated both of these. Thanks!
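
For context on the suggestion in this thread, here is a small sketch of how a pre-sized `String` behaves when it is cleared and reused. The helper `capacity_after_reuse` and its `target_len` parameter are hypothetical and are not part of the PR; `target_len` merely stands in for something like `rpad_length`.

```rust
/// Hypothetical helper: returns the buffer's capacity right after allocation
/// and again after one fill-and-clear cycle.
fn capacity_after_reuse(target_len: usize) -> (usize, usize) {
    // Reserve room up front so the first write does not need to grow the buffer.
    let mut buffer = String::with_capacity(target_len);
    let initial = buffer.capacity();

    // Fill with exactly `target_len` single-byte characters; this fits in the reservation.
    buffer.extend(std::iter::repeat('x').take(target_len));

    // `clear` resets the length to zero but keeps the allocation for the next element.
    buffer.clear();
    (initial, buffer.capacity())
}
```

Note that `String::with_capacity` reserves bytes, while a padding length typically counts characters, so the reservation is only a starting hint: padded strings that contain multi-byte characters can still cause the buffer to grow once.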

);

// Reusable buffer to avoid per-element allocations
let mut buffer = String::new();
Contributor

Same here?

Contributor

@comphead comphead left a comment

Thanks @andygrove, good benchmarks.

@andygrove andygrove merged commit b5e4290 into apache:main Dec 23, 2025
196 of 200 checks passed
@andygrove andygrove deleted the padding-perf branch December 23, 2025 21:20
4 participants