Conversation
```c
// Bitmask is slow on AArch64, any_true is much faster.
if (wasm_v128_any_true(cmp)) {
```
Not sure about this: using `wasm_i8x16_bitmask` directly here is a better lowering for x64 than `wasm_v128_any_true`.
Yes. This is a mitigation for AArch64. We don't have the luxury of knowing the final architecture, much less the CPU, or how good the runtime is at optimizing (e.g., peephole optimizations of) the final generated assembly.
But I'm pretty sure I measured, and at least for large buffers (and wazero) this was a significant improvement on AArch64, for a pretty insignificant cost on 3 different x86-64 CPUs.
I tried this again with bench.c with wasmtime on my Xeon W-2135, and... it's hard to measure. I'm not saying it's not slower (it may be), but it's close enough that between processors, VMs, and lengths, I'm not sure which is better.
So, where do I put something like bench.c, and how do we settle the matter?
It seems like this kind of thing could fit in sightglass, even though this is a bit of a micro-benchmark. You could take a look at this example of how the blake3 benchmark was added: benchmark.c. In the Dockerfile that builds a benchmark you could add the special "build wasi-libc with SIMD enabled" logic.
But you don't have to put it there. I think we could probably settle this using the bench.c you provided. I'd probably be comfortable merging this without the special aarch64 optimization now and then submitting that as a second PR once I have a chance to measure a few things. Let me make sure I understand what you're saying precisely: (a) you can't detect a difference using bitmask or any_true with the x64 CPUs you tested but (b) it still makes a very big difference for aarch64?
On hold due to #593. I don't understand the implications of the UB claim for the pointer arithmetic needed to implement it. Do feel free to update the PR as required.
Updated with the inline assembly and the suggestions in #593 (comment); @alexcrichton, please validate. Also tweaked naming and formatting.
Continuing #580, this implements `strspn` and `strcspn`. It follows the same general structure as #586, #592 and #594, but uses a somewhat more complicated algorithm, described [here](http://0x80.pl/notesen/2018-10-18-simd-byte-lookup.html). I used the Geoff Langdale alternative implementation (the tweet has since disappeared), which is correctly described there but has a subtle bug in the implementation: WojciechMula/simd-byte-lookup#2. Since the complexity needed for `__wasm_v128_bitmap256_t` is shared by both `strspn` and `strcspn`, I moved the implementation to a common file used when SIMD is enabled. The tests follow a similar structure to the previous ones, and cover the bug, which was found through fuzzing.
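The shared idea behind `__wasm_v128_bitmap256_t` can be sketched in scalar C (names here are illustrative, not the actual types from the PR; the SIMD version tests 16 bytes at a time via the nibble-lookup trick from the linked article): one bit per possible byte value, set for every byte in the set, then `strspn`/`strcspn` just test bits while scanning.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t bits[32]; } bitmap256_t; /* 256 bits, one per byte value */

static void bitmap256_build(bitmap256_t *bm, const char *set) {
  memset(bm->bits, 0, sizeof bm->bits);
  for (const unsigned char *p = (const unsigned char *)set; *p; p++)
    bm->bits[*p >> 3] |= (uint8_t)(1u << (*p & 7));
}

static int bitmap256_test(const bitmap256_t *bm, unsigned char c) {
  return (bm->bits[c >> 3] >> (c & 7)) & 1;
}

/* strspn in terms of the bitmap; strcspn is the same loop with the
 * bitmap test negated. */
static size_t my_strspn(const char *s, const char *set) {
  bitmap256_t bm;
  bitmap256_build(&bm, set);
  size_t n = 0;
  while (s[n] && bitmap256_test(&bm, (unsigned char)s[n]))
    n++;
  return n;
}
```

Building the bitmap once is why sharing the type between the two functions pays off: only the per-byte test differs.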
Continuing #580, followup to #586.
Chose `memchr` because it's somewhat similar to `strlen`, but also because it is the basis for `strnlen` (and, in that capacity, for `strndup` and `strlcat`) and is also used by `strstr` and `fnmatch`.
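The "basis for `strnlen`" relationship is direct: `strnlen(s, n)` is a bounded search for the NUL byte, so it reduces to one `memchr` call (a sketch with an illustrative name, not the wasi-libc source):

```c
#include <stddef.h>
#include <string.h>

/* strnlen expressed in terms of memchr: find the NUL within maxlen
 * bytes, or report maxlen if none is found. */
static size_t my_strnlen(const char *s, size_t maxlen) {
  const char *nul = memchr(s, '\0', maxlen);
  return nul ? (size_t)(nul - s) : maxlen;
}
```

So a faster `memchr` immediately speeds up `strnlen` and, through it, the functions built on top.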