@hazzlim (Contributor) commented Feb 5, 2026

This PR adds a Neon implementation of mismatch 🚫 🚀
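For readers unfamiliar with the approach, here is a minimal sketch of what a Neon-accelerated mismatch over bytes might look like. It is not the code from this PR: the helper name `mismatch_bytes_sketch` and the macro `SKETCH_HAS_NEON` are hypothetical, and the real implementation presumably locates the differing lane without falling back to a scalar scan. The sketch compares 16 bytes per iteration on AArch64 and uses a plain loop elsewhere, so it builds on any target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__aarch64__) || defined(_M_ARM64)
#include <arm_neon.h>
#define SKETCH_HAS_NEON 1
#else
#define SKETCH_HAS_NEON 0
#endif

// Hypothetical helper (not taken from the PR): returns the index of the first
// position where a[i] != b[i], or n if the two ranges are equal.
inline std::size_t mismatch_bytes_sketch(const std::uint8_t* a,
                                         const std::uint8_t* b,
                                         std::size_t n) {
    // Debug-only precondition: null pointers are acceptable only for empty ranges.
    assert((a != nullptr && b != nullptr) || n == 0);

    std::size_t i = 0;
#if SKETCH_HAS_NEON
    // Compare 16 bytes at a time while a full block remains.
    for (; i + 16 <= n; i += 16) {
        const uint8x16_t va = vld1q_u8(a + i);
        const uint8x16_t vb = vld1q_u8(b + i);
        const uint8x16_t eq = vceqq_u8(va, vb); // 0xFF in every equal lane
        if (vminvq_u8(eq) != 0xFF) {
            break; // at least one lane differs; the scalar loop pinpoints it
        }
    }
#endif
    // Scalar loop: handles the tail, the block that contains the mismatch,
    // and every element on targets without Neon.
    for (; i < n && a[i] == b[i]; ++i) {
    }
    return i;
}
```

Because each iteration carries a fixed vector setup cost, a sketch like this would be expected to pay off mainly on longer inputs, which is consistent with the small `/8/3` cases in the numbers below showing little or no gain.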

Performance numbers (values are speedup figures relative to existing code; values greater than 1 indicate that the new code is faster):

| Benchmark | MSVC Speedup | Clang Speedup |
| --- | --- | --- |
| `bm<uint8_t, op::mismatch>/8/3` | 1.143 | 0.996 |
| `bm<uint8_t, op::mismatch>/24/22` | 4.083 | 4.898 |
| `bm<uint8_t, op::mismatch>/105/-1` | 9.333 | 10.714 |
| `bm<uint8_t, op::mismatch>/4021/3056` | 14.959 | 15.674 |
| `bm<uint16_t, op::mismatch>/8/3` | 0.918 | 0.828 |
| `bm<uint16_t, op::mismatch>/24/22` | 3.111 | 3.683 |
| `bm<uint16_t, op::mismatch>/105/-1` | 5.855 | 6.6 |
| `bm<uint16_t, op::mismatch>/4021/3056` | 7.438 | 7.934 |
| `bm<uint32_t, op::mismatch>/8/3` | 0.922 | 0.715 |
| `bm<uint32_t, op::mismatch>/24/22` | 2.155 | 2.257 |
| `bm<uint32_t, op::mismatch>/105/-1` | 3.181 | 3.529 |
| `bm<uint32_t, op::mismatch>/4021/3056` | 3.802 | 3.967 |
| `bm<uint64_t, op::mismatch>/8/3` | 0.784 | 0.654 |
| `bm<uint64_t, op::mismatch>/24/22` | 1.434 | 1.385 |
| `bm<uint64_t, op::mismatch>/105/-1` | 1.789 | 1.895 |
| `bm<uint64_t, op::mismatch>/4021/3056` | 1.8 | 1.969 |
| `bm<color, op::mismatch, c1, c2>/8/3` | 0.996 | 1.071 |
| `bm<color, op::mismatch, c1, c2>/24/22` | 1.021 | 3.843 |
| `bm<color, op::mismatch, c1, c2>/105/-1` | 1.023 | 4.906 |
| `bm<color, op::mismatch, c1, c2>/4021/3056` | 1.023 | 5.06 |
| `bm<uint8_t, op::lexi>/8/3` | 1.128 | 1.286 |
| `bm<uint8_t, op::lexi>/24/22` | 3.333 | 4.25 |
| `bm<uint8_t, op::lexi>/105/-1` | 8.609 | 9.429 |
| `bm<uint8_t, op::lexi>/4021/3056` | 13.659 | 13.659 |
| `bm<int8_t, op::lexi>/8/3` | 1.08 | 1.2 |
| `bm<int8_t, op::lexi>/24/22` | 4.838 | 6.582 |
| `bm<int8_t, op::lexi>/105/-1` | 14.222 | 17.633 |
| `bm<int8_t, op::lexi>/4021/3056` | 22.36 | 26.748 |
| `bm<uint16_t, op::lexi>/8/3` | 0.95 | 1.162 |
| `bm<uint16_t, op::lexi>/24/22` | 4.019 | 5.378 |
| `bm<uint16_t, op::lexi>/105/-1` | 8.93 | 11.25 |
| `bm<uint16_t, op::lexi>/4021/3056` | 11.55 | 13.429 |
| `bm<uint32_t, op::lexi>/8/3` | 0.864 | 1.056 |
| `bm<uint32_t, op::lexi>/24/22` | 2.689 | 3.223 |
| `bm<uint32_t, op::lexi>/105/-1` | 5.065 | 6.098 |
| `bm<uint32_t, op::lexi>/4021/3056` | 5.775 | 6.508 |
| `bm<uint64_t, op::lexi>/8/3` | 0.736 | 0.79 |
| `bm<uint64_t, op::lexi>/24/22` | 1.748 | 2.091 |
| `bm<uint64_t, op::lexi>/105/-1` | 2.784 | 3.05 |
| `bm<uint64_t, op::lexi>/4021/3056` | 2.795 | 3.015 |
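
As a reading aid for the figures above, the sketch below shows one common way such a ratio is derived. This is an assumption, since the PR does not state the exact formula or publish raw timings. The helper name `speedup` is hypothetical.

```cpp
// Hypothetical helper: speedup of the new code relative to the existing code,
// assuming it is defined as (time of existing code) / (time of new code).
// A result above 1 means the new code is faster; below 1 means it is slower.
constexpr double speedup(double existing_time, double new_time) {
    return existing_time / new_time;
}
```

With made-up timings of 150 ns for the existing code and 10 ns for the new code, `speedup(150.0, 10.0)` evaluates to 15.0. A case where the new code takes 10 ns against an existing 8 ns gives 0.8, a regression.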

@hazzlim hazzlim requested a review from a team as a code owner February 5, 2026 13:58
@github-project-automation github-project-automation bot moved this to Initial Review in STL Code Reviews Feb 5, 2026
@StephanTLavavej StephanTLavavej added the performance (Must go faster) and ARM64 (Related to the ARM64 architecture) labels Feb 5, 2026
@StephanTLavavej StephanTLavavej moved this from Initial Review to Ready To Merge in STL Code Reviews Feb 5, 2026
@StephanTLavavej StephanTLavavej moved this from Ready To Merge to Merging in STL Code Reviews Feb 9, 2026
@StephanTLavavej (Member) commented:

I'm mirroring this to the MSVC-internal repo. Please notify me if any further changes are pushed; otherwise, no action is required.
