slice/ascii: Optimize `eq_ignore_ascii_case` with auto-vectorization #147436
base: master
- Refactor the current functionality into a helper function
- Use `as_chunks` to encourage auto-vectorization in the optimized chunk processing function
- Add a codegen test
- Add benches for `eq_ignore_ascii_case`

The optimized function is initially only enabled for x86_64, which has `sse2` as part of its baseline, but none of the code is platform specific. Other platforms with SIMD instructions may also benefit from this implementation.

Performance improvements only manifest for slices of 16 bytes or longer, so the optimized path is gated behind a length check for greater than or equal to 16.
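The approach in the commit message above can be sketched roughly as follows. This is an illustration, not the code from the PR: the function names are hypothetical, the chunk width of 16 is taken from the length gate described above, and `chunks_exact` stands in for `as_chunks` so that the sketch also builds on older stable toolchains.

```rust
use std::convert::TryFrom;

// Chunk width assumed for this sketch: 16 bytes, matching the length gate.
const N: usize = 16;

// Hypothetical helper: compares one fixed-size chunk without early exits,
// accumulating the result so the loop body has no data-dependent branches.
#[inline]
fn chunk_eq_ignore_ascii_case(lhs: &[u8; N], rhs: &[u8; N]) -> bool {
    let mut equal = true;
    for i in 0..N {
        equal &= lhs[i].eq_ignore_ascii_case(&rhs[i]);
    }
    equal
}

// Simple byte-at-a-time comparison, used for short inputs and the remainder.
fn scalar_eq_ignore_ascii_case(a: &[u8], b: &[u8]) -> bool {
    a.len() == b.len() && a.iter().zip(b).all(|(x, y)| x.eq_ignore_ascii_case(y))
}

// Hypothetical entry point: gate the chunked path behind `len >= N`.
fn eq_ignore_ascii_case_sketch(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    if a.len() < N {
        return scalar_eq_ignore_ascii_case(a, b);
    }

    let a_chunks = a.chunks_exact(N);
    let b_chunks = b.chunks_exact(N);
    // The remainders borrow from the original slices, not from the iterators.
    let a_rem = a_chunks.remainder();
    let b_rem = b_chunks.remainder();

    for (x, y) in a_chunks.zip(b_chunks) {
        // `chunks_exact` guarantees each chunk has exactly N elements.
        let x = <&[u8; N]>::try_from(x).unwrap();
        let y = <&[u8; N]>::try_from(y).unwrap();
        if !chunk_eq_ignore_ascii_case(x, y) {
            return false;
        }
    }

    // In this first version the tail falls back to the scalar comparison.
    scalar_eq_ignore_ascii_case(a_rem, b_rem)
}
```

Whether the inner loop is actually vectorized depends on the compiler version and target, which is presumably why the PR adds a codegen test rather than relying on the source shape alone.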
Refactor the eq check into an inner function for reuse in tail checking

Rather than fall back to the simple implementation for tail handling, load the last 16 bytes to take advantage of vectorization. This doesn't seem to negatively impact check time even when the remainder count is low.
I've pushed a commit to avoid falling back to the scalar checking for the remainder handling. We reload the last 16 bytes of the slices if there's a remainder, which improves the 31 byte case and doesn't seem to regress the 17 byte case.
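The tail handling described here could look something like the sketch below. It is an assumption-laden illustration rather than the PR's code: the names are hypothetical, and `chunks_exact` is used in place of `as_chunks`. When the length is not a multiple of 16, the last 16 bytes of each slice are compared as one more chunk. That window overlaps bytes already checked, which is harmless because those bytes are known to match.

```rust
use std::convert::TryFrom;

// Chunk width assumed for this sketch.
const N: usize = 16;

// Hypothetical inner function, reused for both full chunks and the tail.
#[inline(always)]
fn eq_ignore_ascii_inner_sketch(lhs: &[u8; N], rhs: &[u8; N]) -> bool {
    let mut equal = true;
    for i in 0..N {
        equal &= lhs[i].eq_ignore_ascii_case(&rhs[i]);
    }
    equal
}

// Hypothetical entry point with an overlapping tail check.
fn eq_ignore_ascii_case_overlapping_tail(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    if a.len() < N {
        // Too short for a full chunk: compare byte by byte.
        return a.iter().zip(b).all(|(x, y)| x.eq_ignore_ascii_case(y));
    }

    for (x, y) in a.chunks_exact(N).zip(b.chunks_exact(N)) {
        let x = <&[u8; N]>::try_from(x).unwrap();
        let y = <&[u8; N]>::try_from(y).unwrap();
        if !eq_ignore_ascii_inner_sketch(x, y) {
            return false;
        }
    }

    if a.len() % N != 0 {
        // Reload the last N bytes of each slice instead of looping over
        // the remainder; `a.len() >= N` makes the subtraction safe.
        let start = a.len() - N;
        let x = <&[u8; N]>::try_from(&a[start..]).unwrap();
        let y = <&[u8; N]>::try_from(&b[start..]).unwrap();
        return eq_ignore_ascii_inner_sketch(x, y);
    }
    true
}
```

For a 17-byte input the tail window covers bytes 1 through 16, so 15 bytes are compared twice; for a 31-byte input only one byte is. In both cases the remainder costs one extra chunk comparison instead of a scalar loop.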
    let (other_chunks, _) = other.as_chunks::<N>();

    // Branchless check to encourage auto-vectorization
    const fn eq_ignore_ascii_inner(lhs: &[u8; N], rhs: &[u8; N]) -> bool {
When I copied this code into Compiler Explorer with `-C opt-level=3`, the call to `eq_ignore_ascii_inner` did not get inlined. I would suggest marking this function `#[inline(always)]` and adding a `CHECK-NOT: call` to the codegen test.
I added the annotation and the FileCheck adaptation in a5ba248.
Add `#[inline(always)]` to inner function and `CHECK-NOT` to the FileCheck test
Benchmarks: cases below 16 bytes are unaffected; cases of 16 bytes or longer all show sizeable improvements.