CRC32 version of CombineContiguous for length <= 32.
For length in [17, 32] we compute two chains of dependent CRC32 operations to get good entropy in the resulting two 32-bit numbers.
1. x := CRC32(CRC32(state, A), D)
2. y := CRC32(CRC32(bswap(state), C), B)
On ARM:
CRC32 has a 2-cycle latency and a throughput of 1.
Computations will be pipelined without any wait.
On x86:
CRC32 has a 3-cycle latency and a throughput of 1.
There will be 1 extra cycle of waiting, but we can do `cmp` in parallel.
At the end we multiply `(mul - x) * (y - mul)`. `mul` is added to fill the upper 32 bits of each CRC result with good entropy bits, where `mul = rotr(kMul, len)`.
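The [17, 32] path described above can be sketched in portable C++ as follows. This is only an illustration under stated assumptions, not the code from this change: `Crc32cStep` is a software stand-in for the hardware CRC32 instruction, `kSketchMul` is an arbitrary constant rather than the real `kMul`, the assignment of A, B, C, D to the first and last 16 bytes is a guess, and `Mix128` assumes that `Mix` folds the 128-bit product by XORing its halves (it relies on the GCC/Clang `__int128` extension).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Arbitrary odd 64-bit constant used only in this sketch (NOT the real kMul).
constexpr uint64_t kSketchMul = 0x9E3779B97F4A7C15ULL;

// Software model of one CRC32C step over a 64-bit word (reflected Castagnoli
// polynomial, no pre- or post-inversion). Hardware does this in one instruction.
inline uint32_t Crc32cStep(uint32_t crc, uint64_t word) {
  for (int i = 0; i < 8; ++i) {
    crc ^= static_cast<uint8_t>(word >> (8 * i));
    for (int bit = 0; bit < 8; ++bit) {
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
  }
  return crc;
}

// Unaligned 8-byte load.
inline uint64_t Load64(const unsigned char* p) {
  uint64_t v;
  std::memcpy(&v, p, sizeof(v));
  return v;
}

// Rotate right; masking keeps the shift counts in range for any n.
inline uint64_t Rotr64(uint64_t v, unsigned n) {
  n &= 63;
  return (v >> n) | (v << ((64 - n) & 63));
}

// Reverse the byte order of a 64-bit value (what `bswap` does).
inline uint64_t ByteSwap64(uint64_t v) {
  uint64_t r = 0;
  for (int i = 0; i < 8; ++i) {
    r = (r << 8) | (v & 0xFF);
    v >>= 8;
  }
  return r;
}

// Assumed shape of Mix: 64x64 -> 128-bit multiply, then XOR the two halves.
inline uint64_t Mix128(uint64_t lhs, uint64_t rhs) {
  __extension__ typedef unsigned __int128 u128;
  const u128 m = static_cast<u128>(lhs) * rhs;
  return static_cast<uint64_t>(m) ^ static_cast<uint64_t>(m >> 64);
}

// Hypothetical helper: hashes 17..32 bytes with two independent CRC chains.
inline uint64_t HashLen17To32Sketch(uint64_t state, const unsigned char* p,
                                    std::size_t len) {
  assert(len >= 17 && len <= 32);
  // Assumed layout: A, B cover the first 16 bytes; C, D cover the last 16.
  const uint64_t a = Load64(p);
  const uint64_t b = Load64(p + 8);
  const uint64_t c = Load64(p + len - 16);
  const uint64_t d = Load64(p + len - 8);
  state += 8 * len;  // length mixed into the low bits of the state
  const uint64_t mul = Rotr64(kSketchMul, static_cast<unsigned>(len));
  // Two chains with no data dependency between them, so they can pipeline.
  const uint64_t x = Crc32cStep(Crc32cStep(static_cast<uint32_t>(state), a), d);
  const uint64_t y =
      Crc32cStep(Crc32cStep(static_cast<uint32_t>(ByteSwap64(state)), c), b);
  // x and y only occupy the low 32 bits; mul supplies the upper 32 bits.
  return Mix128(mul - x, y - mul);
}
```

Because `x` and `y` never depend on each other, a CPU with a pipelined CRC unit can overlap the two chains, which is the property the latency notes above rely on.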
We also mix the length differently:
1. `state + 8 * len` (`lea` instruction); the one or two CRC operations that follow shuffle these bits well into the low 32 bits.
2. `rotr(kMul, len)` is used to fill the high 32 bits before the multiplication in `Mix`. This avoids reading from `kStaticRandomData`.
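A minimal sketch of the two length-mixing steps, assuming a placeholder constant `kDemoMul` in place of the real `kMul` and hypothetical helper names (`SeedWithLength`, `LengthMul`, `RotateRight64`) that do not come from this change:

```cpp
#include <cstddef>
#include <cstdint>

// Placeholder constant for illustration only (NOT the real kMul).
constexpr uint64_t kDemoMul = 0x9E3779B97F4A7C15ULL;

// Rotate right by n bits; the masks keep both shift counts below 64.
constexpr uint64_t RotateRight64(uint64_t v, unsigned n) {
  return (v >> (n & 63)) | (v << ((64 - (n & 63)) & 63));
}

// Step 1: base + 8 * index is an addressing form that a single x86 `lea`
// can compute, so adding the scaled length costs one cheap instruction.
constexpr uint64_t SeedWithLength(uint64_t state, std::size_t len) {
  return state + 8 * len;
}

// Step 2: a length-dependent multiplier derived from a constant by rotation,
// so no table lookup (memory read) is needed.
constexpr uint64_t LengthMul(std::size_t len) {
  return RotateRight64(kDemoMul, static_cast<unsigned>(len));
}
```

With a constant whose two 32-bit halves differ, every rotation amount in [0, 63] yields a distinct value, so each length up to 32 gets its own multiplier without touching memory.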
For smaller strings we aggressively minimize binary size and register pressure.
A CRC instruction fused with the memory read is used. llvm-mca reports 1 cycle lower latency compared to a separate `mov` + `crc`.
ASM analysis https://godbolt.org/z/e1xrKzhdc:
1. 100+ bytes of binary size saved (per inline instance).
2. 25+ instructions saved.
3. 2 registers are no longer used (r8 and r9).
Latency results in isolation, without accounting for the comparison, are mixed:
1. latency for 8 bytes in isolation is 1 cycle better: https://godbolt.org/z/zc39eM3K9
2. latency for 1-3 bytes in isolation is 2 cycles better: https://godbolt.org/z/qMKfbv438
3. latency for 16 bytes in isolation is 3 cycles worse: https://godbolt.org/z/vcqr8oGv3
4. latency for 32 bytes in isolation is 5 cycles worse: https://godbolt.org/z/nEPP5jP58
PiperOrigin-RevId: 850659551
Change-Id: I02a2434f2d98473b099c171ef1c56adffa821c60