Ok, I'm puzzling over a fuzzing issue with this one. Here is the scenario:
If Now, this is a quite fundamental question for this kind of PR: should we match the scalar behavior (i.e., what users might expect)? Or should we punt because the inputs were malformed (i.e., indexed past the end of memory)? @sunfishcode, @alexcrichton, @sbc100: any thoughts on this? |
Do you mean you write a 2-byte string at this location? If so, I would consider this UB, so anything goes: faulting, returning an error, and returning success are all valid. The only constraint I would consider is that when memcmp is given valid regions of memory, the behavior must be exactly the same between the SIMD and scalar versions. |
|
Right. This is not the case for
It's possible that a future standard might introduce a similar restriction to |
|
Incidentally, that's why I gave up on optimizing the |
I guess I was getting to this, cautiously. There are several functions in this SIMD vein that will have a similar caveat: for any future readers, note that the SIMD versions of these functions in wasi-libc could behave differently from their scalar versions when using data close to the end of memory. But this is not new to WebAssembly, since SIMD versions compiled to native machine code would also have to deal with protection boundaries that cause similar differences in behavior. Other than that, my fuzzing shows no other differences and I propose we merge this. |
This is the last PR I'm putting forward for #580.
memcmp has the advantage that we know the lengths and can access the entire buffers (no undefined behavior). It has the difficulty that the buffers may not share an alignment, so it uses wasm_v128_load. It uses SIMD if there are 16 or more bytes to read; otherwise it falls back to scalar. If the number of bytes is larger than 16 but not a multiple of 16, the second iteration retests some already-tested bytes, to "align" the remaining length to a multiple of 16. Making the first (rather than the last) iteration special unnecessarily "wastes" these comparisons, but helps the compiler partially unroll the loop.
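
For readers who want to see the shape of the loop described above, here is a minimal sketch. It is not the code from this PR: the names `sketch_memcmp`, `block16_equal`, and `scalar_cmp` are hypothetical, and the way a mismatching block is resolved (a scalar rescan of that block) is an assumption made to keep the example short. Outside a wasm SIMD build, the 16-byte comparison falls back to a plain byte loop so the sketch can be compiled and checked anywhere.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __wasm_simd128__
#include <wasm_simd128.h>
#endif

/* Hypothetical helper: nonzero if the 16 bytes at a and b are identical. */
static int block16_equal(const unsigned char *a, const unsigned char *b) {
#ifdef __wasm_simd128__
    /* wasm_v128_load has no alignment requirement, so a and b may be
       aligned differently. */
    return wasm_i8x16_all_true(
        wasm_i8x16_eq(wasm_v128_load(a), wasm_v128_load(b)));
#else
    /* Portable stand-in for the SIMD comparison. */
    unsigned char diff = 0;
    for (int i = 0; i < 16; i++) diff |= (unsigned char)(a[i] ^ b[i]);
    return diff == 0;
#endif
}

/* Hypothetical helper: byte-by-byte comparison with memcmp semantics. */
static int scalar_cmp(const unsigned char *a, const unsigned char *b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i]) return (int)a[i] - (int)b[i];
    }
    return 0;
}

int sketch_memcmp(const void *s1, const void *s2, size_t n) {
    const unsigned char *a = s1;
    const unsigned char *b = s2;

    /* Fewer than 16 bytes: no full block to load, use the scalar path. */
    if (n < 16) return scalar_cmp(a, b, n);

    /* The first advance is 1..16 bytes, chosen so that the remaining
       length becomes a multiple of 16. When it is less than 16, the
       second block overlaps (retests) part of the first one. */
    size_t step = ((n - 1) % 16) + 1;
    for (;;) {
        if (!block16_equal(a, b)) {
            /* All earlier bytes matched, so the first difference in
               this block is the first difference overall. */
            return scalar_cmp(a, b, 16);
        }
        a += step;
        b += step;
        n -= step;
        if (n == 0) return 0;
        assert(n % 16 == 0); /* invariant after the first iteration */
        step = 16;
    }
}
```

With `n = 40`, for example, the blocks compared are bytes 0..15, 8..23, and 24..39: only the first advance is short, and every later iteration moves by a full 16 bytes, which is the property that lets the compiler treat the rest of the loop uniformly.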