Commit 98cfe64

Optimize a scan of non-state-changing bytes with SSE2 instructions
This commit optimizes the scan of non-state-changing bytes using SSE2 instructions.

A [_mm_cmpestri](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_cmpestri) operation appears to be quite slow compared to an alternative approach that applies [_mm_shuffle_epi8](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_shuffle_epi8) to the low and high nibbles of the input and combines the two results with a bitwise AND to look up 16 bytes of the LUT in one go (it also involves a bunch of other SSE2 operations, all of which have nice latency/throughput properties).

The resulting 16 LUT bytes can be analyzed (also vectorized) to get the index of the first byte, if any, that changes the state. That is done by finding the first byte whose LUT value is zero.

The tricky part here is the following:

```
Find A, B arrays (uint8_t[16]) such that
 * `A[i] & B[j] == 0` if `LUT[i | (j <<4)] == 0`
 * `A[i] & B[j] != 0` if `LUT[i | (j <<4)] != 0` // Note we don't need any specific non-zero value
for all i,j = 0..15.
```

To find `A` and `B` satisfying the above conditions, the [Z3](https://github.com/Z3Prover/z3) library is used. The npm package that wraps z3 for use in TypeScript is not particularly friendly to the author of this change, so another package (synckit) was required to handle the async API of the z3 wrapper.

Using llhttp as a benchmark framework, this change yields the following improvements:

```
Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz

http: "seanmonstar/httparse" (C)
BEFORE: 8192.00 mb | 1456.72 mb/s | 2172811.81 ops/sec | 5.62 s
AFTER:  8192.00 mb | 1752.90 mb/s | 2614577.82 ops/sec | 4.67 s
~20% improvement

http: "nodejs/http-parser" (C)
BEFORE: 8192.00 mb | 1050.60 mb/s | 2118535.14 ops/sec | 7.80 s
AFTER:  8192.00 mb | 1167.42 mb/s | 2354101.76 ops/sec | 7.02 s
~11% improvement
```

For more header-fields-heavy messages the numbers might be even more convincing.
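The `A`/`B` condition can be illustrated with a small scalar model in C. This is only a sketch under stated assumptions, not the code from this commit: the byte class (everything except `\r`, `\n` and `:` keeps the state) is hypothetical, the tables are hand-picked rather than produced by Z3, and the names `lut_entry`, `check_tables` and `first_state_change` are invented for illustration. The loop processes one byte at a time; a vectorized version would perform the same two nibble lookups and the AND on 16 bytes at once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical LUT: zero for bytes that change the state, non-zero otherwise. */
uint8_t lut_entry(unsigned c) {
    return (c == '\r' || c == '\n' || c == ':') ? 0 : 1;
}

/* Hand-picked tables (an assumption; the commit derives its tables with Z3).
 * B assigns bit 0 to high nibble 0x0, bit 1 to high nibble 0x3 and bit 2 to
 * every other high nibble. A always carries bit 2, drops bit 0 for low
 * nibbles 0xA and 0xD, and drops bit 1 for low nibble 0xA. */
const uint8_t A[16] = {7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 4, 7, 7, 6, 7, 7};
const uint8_t B[16] = {1, 4, 4, 2, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4};

/* Verify the stated condition for all i, j = 0..15:
 * (A[i] & B[j]) is zero exactly when LUT[i | (j << 4)] is zero. */
void check_tables(void) {
    for (unsigned i = 0; i < 16; i++) {
        for (unsigned j = 0; j < 16; j++) {
            int want = lut_entry(i | (j << 4)) != 0;
            int got = (A[i] & B[j]) != 0;
            assert(want == got);
        }
    }
}

/* Return the index of the first byte that maps to zero, or len if none does. */
size_t first_state_change(const uint8_t *p, size_t len) {
    for (size_t k = 0; k < len; k++) {
        uint8_t lo = p[k] & 0x0F; /* index into A */
        uint8_t hi = p[k] >> 4;   /* index into B */
        if ((A[lo] & B[hi]) == 0) {
            return k;
        }
    }
    return len;
}
```

With these tables, `first_state_change` on the 19-byte input `"Host: example.com\r\n"` returns 4, the position of the `:`. The hand-picked tables work here only because the hypothetical class has three zero entries spread over two high nibbles; a real LUT can need a solver to fit all 256 constraints into 8 bits.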
1 parent 4d7e352 commit 98cfe64

5 files changed, +3077 -28 lines changed
