Skip to content

Optional SIMD memcmp #603

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 15, 2025
Merged

Optional SIMD memcmp #603

merged 1 commit into from
Aug 15, 2025

Conversation

ncruces
Copy link
Contributor

@ncruces ncruces commented Jul 28, 2025

This the last PR I'm putting forward for #580.

memcmp has the advantage that we know the lengths and can access the entire buffers (no undefined behavior). It has the difficulty that the buffers may not share an alingment, so it uses wasm_v128_load.

It uses SIMD if there are 16 or more bytes to read, otherwise it fallbacks to scalar. If the number of bytes is larger than 16, but not a multiple of 16, the second iteration retests some already tested bytes, to "align" the remaining length to a multiple of 16. Making the first (rather than the last) iteration special unnecessarily "wastes" these comparisons, but helps the compiler partially unroll the loop.

@abrown
Copy link
Collaborator

abrown commented Aug 15, 2025

Ok, I'm puzzling over a fuzzing issue with this one. Here is the scenario:

  • the memory length of the WebAssembly program is 131072 bytes
  • I write a 16-byte s1 string (e.g., aaa...) to any old place in the memory
  • I write another 16-byte s2 string two bytes from the end, at offset 131070
  • I compile the scalar and SIMD versions of this function
  • I run memcmp(s1, s2, 16) to trigger the SIMD memcmp in this PR for both scalar and SIMD

If s2 is aa, both the scalar and the SIMD versions of this function fail due to an OOB access. If s2 is ab, then the scalar version does not fail, but returns as expected (-1); but the SIMD version fails.

Now, this is a quite fundamental question for this kind of PR: should we match the scalar behavior (i.e., what users might expect)? Or should we punt because the inputs were malformed (i.e., indexed past the end of memory)? @sunfishcode, @alexcrichton, @sbc100: any thoughts on this?

@alexcrichton
Copy link
Collaborator

I write another 16-byte s2 string two bytes from the end, at offset 131070

Do you mean you write a 2-byte string at this location?

If so, I would consider this UB, so anything goes, meaning that either faulting or returning error or return success are all valid. I would consider the only constraint being that when memcmp is given valid regions of memory that the behavior must be exactly the same between simd/scalar versions.

@ncruces
Copy link
Contributor Author

ncruces commented Aug 15, 2025

Right. memcmp (and memrchr) are allowed to assume they can access the whole buffer.

This is not the case for memchr where the C standard specifically says (since C11):

The implementation shall behave as if it reads the characters sequentially and stops as soon as a matching character is found.

It's possible that a future standard might introduce a similar restriction to memcmp (stop as soon as it finds a difference), but so far that has not been the case. That's the reason for making the requirement explicit in the top comment.

@ncruces
Copy link
Contributor Author

ncruces commented Aug 15, 2025

Incidently, that's why I gave up on optimizing the strcmp family: it's much harder, particularly when the strings don't share an alignment.

@abrown
Copy link
Collaborator

abrown commented Aug 15, 2025

If so, I would consider this UB, so anything goes, meaning that either faulting or returning error or return success are all valid.

I guess I was getting to this, cautiously. There are several functions in this SIMD vein that will have a similar caveat: for any future readers, note that using the SIMD versions of these functions in wasi-libc could have different behavior than their scalar version when using data close to the end of memory. But this is not new to WebAssembly, since SIMD versions compiled to native machine code would also deal with protection boundaries that causing similar differences in behavior.

Other than that, my fuzzing shows no other differences and I propose we merge this.

@abrown abrown merged commit 41edcb9 into WebAssembly:main Aug 15, 2025
17 checks passed
@ncruces ncruces deleted the cmp branch August 17, 2025 21:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants