use AVX-512F setzero instead of set1_epi8 by NexusXe · Pull Request #1101 · Cyan4973/xxHash

NexusXe · 2026-03-17T19:00:32Z

For zero-initializing a 512-bit vector, it is better to use _mm512_setzero_si512(). This shouldn't change anything in compiled binaries, but could avoid issues under unique optimization scenarios.

gzm55 · 2026-03-18T00:33:10Z

from intel docs: _mm512_set1_epi8 and _mm512_setzero_si512 are both AVX512F.

But the latency and throughput of _mm512_setzero_si512 are better.

NexusXe · 2026-03-18T03:10:07Z

> from intel docs: _mm512_set1_epi8 and _mm512_setzero_si512 are both AVX512F.
>
> But the latency and throughput of _mm512_setzero_si512 are better.

Note how it says "Sequence" - if only compiled targeting AVX-512F (which it is in this case) it may generate a functionally equivalent but strictly inferior sequence of instructions. vpbroadcastb (which _mm512_set1_epi8 prefers to compile to) is only available with AVX-512BW.

gzm55 · 2026-03-18T06:09:15Z

> > from intel docs: _mm512_set1_epi8 and _mm512_setzero_si512 are both AVX512F.
> > But the latency and throughput of _mm512_setzero_si512 are better.
>
> Note how it says "Sequence" - if only compiled targeting AVX-512F (which it is in this case) it may generate a functionally equivalent but strictly inferior sequence of instructions. vpbroadcastb (which _mm512_set1_epi8 prefers to compile to) is only available with AVX-512BW.

A quick test: https://godbolt.org/z/de4aEdjas

Because _mm512_set1_epi8 is an intrinsic function rather than a single instruction, it should compile to instructions that do not require AVX-512BW when AVX-512BW is not enabled.

See these functions from the test above:

__attribute__((target("avx512f,no-avx512bw"), optimize("O0")))
__m512i test_f_set1_epi8_O0() {
    // NOTE: vpbroadcastb ymm0  (AVX2)
    return _mm512_set1_epi8(0xAB);
}
__attribute__((target("avx512f,avx512bw"), optimize("O0")))
__m512i test_bw_set1_epi8_O0() {
    // NOTE: vpbroadcastb zmm0  (512BW)
    return _mm512_set1_epi8(0xAB);
}

vpbroadcastb ymm should be part of AVX2.

The previous document only describes the masked version of vpbroadcastb.

In another document, https://www.felixcloutier.com/x86/vpbroadcast, we can find that VPBROADCASTB ymm without a mask is from AVX2.
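As a run-time complement to the Compiler Explorer test, here is a minimal sketch that checks whether `_mm512_set1_epi8` still fills all 64 bytes when compiled for AVX-512F with AVX-512BW disabled. The helper names `broadcast_ab_impl` and `broadcast_ab_ok` and the return convention are hypothetical, not part of xxHash, and the sketch assumes GCC or Clang on x86-64.

```c
#include <immintrin.h>
#include <string.h>

/* Hypothetical helper: built for AVX-512F only, with AVX-512BW disabled,
 * mirroring the target attribute used in the test functions above. */
__attribute__((target("avx512f,no-avx512bw")))
static int broadcast_ab_impl(void)
{
    unsigned char got[64];
    unsigned char want[64];

    /* Broadcast the byte 0xAB to all 64 lanes and store it to memory. */
    __m512i v = _mm512_set1_epi8((char)0xAB);
    _mm512_storeu_si512(got, v);

    /* Reference pattern produced without any SIMD intrinsics. */
    memset(want, 0xAB, sizeof want);
    return memcmp(got, want, sizeof got) == 0;
}

/* Returns 1 if every byte matches, 0 on mismatch,
 * and -1 if the running CPU does not report AVX-512F. */
int broadcast_ab_ok(void)
{
    if (!__builtin_cpu_supports("avx512f"))
        return -1;
    return broadcast_ab_impl();
}
```

Calling `broadcast_ab_ok()` from a small `main` only confirms the semantics; which instructions the compiler picked still has to be read from the generated assembly.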

NexusXe · 2026-03-18T17:44:08Z

> vpbroadcastb ymm should be part of AVX2.
>
> The previous document only describes the masked version of vpbroadcastb.
>
> In another document, https://www.felixcloutier.com/x86/vpbroadcast, we can find that VPBROADCASTB ymm without a mask is from AVX2.

But vpbroadcastb zmm is only available with AVX-512BW, which _mm512_set1_epi8 is equivalent to. Either way, using the setzero intrinsic is the correct way to do it.

gzm55 · 2026-03-19T00:58:50Z

> But vpbroadcastb zmm is only available with AVX-512BW, which _mm512_set1_epi8 is equivalent to. Either way, using the setzero intrinsic is the correct way to do it.

I think _mm512_setzero_si512 is most likely the right choice for performance. (It would be better to also post some perf test results.)

Both intrinsics are correct in terms of semantics and the generated instruction sequences. An intrinsic does not map 1-to-1 to an instruction; it is implemented with different instruction sequences under different constraints.

In particular, `_mm512_set1_epi8(0)` could be lowered to `vpbroadcastb ymm` (AVX2) --> `vinserti64x4 zmm` (512F), or to `vpxor xmm0, xmm0, xmm0` (AVX), depending on the -O? and -m??? constraints.

And `_mm512_setzero_si512()` always generates `vpxor xmm0, xmm0, xmm0`, which is an AVX instruction and is shorter and faster than `vpxor zmm0` (AVX512F).
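To make the "same semantics, possibly different instruction sequence" point concrete, here is a minimal sketch that stores the result of both intrinsics and compares each against 64 zero bytes. The function names `zero_forms_match_impl` and `zero_forms_match` are hypothetical and do not come from xxHash. The sketch assumes GCC or Clang on x86-64 and says nothing about which instructions are emitted; for that, compile with `-S` at the optimization level of interest and compare the output.

```c
#include <immintrin.h>
#include <string.h>

/* Hypothetical helper: built for AVX-512F without AVX-512BW, which is the
 * configuration discussed in this thread. */
__attribute__((target("avx512f,no-avx512bw")))
static int zero_forms_match_impl(void)
{
    unsigned char via_set1[64];
    unsigned char via_setzero[64];
    static const unsigned char zeros[64] = {0};

    /* Two ways of producing an all-zero 512-bit vector. */
    _mm512_storeu_si512(via_set1, _mm512_set1_epi8(0));
    _mm512_storeu_si512(via_setzero, _mm512_setzero_si512());

    /* Both results must be 64 zero bytes. */
    return memcmp(via_set1, zeros, sizeof zeros) == 0
        && memcmp(via_setzero, zeros, sizeof zeros) == 0;
}

/* Returns 1 if both forms yield all-zero vectors, 0 otherwise,
 * and -1 if the running CPU does not report AVX-512F. */
int zero_forms_match(void)
{
    if (!__builtin_cpu_supports("avx512f"))
        return -1;
    return zero_forms_match_impl();
}
```

A check like this only covers correctness; the performance claim would still need the perf test results requested above.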

use AVX-512F setzero instead of AVX-512BW set1_epi8

20429ed

NexusXe changed the title ~~use AVX-512F setzero instead of AVX-512BW set1_epi8~~ use AVX-512F setzero instead of set1_epi8 Mar 18, 2026

NexusXe wants to merge 1 commit into Cyan4973:dev from NexusXe:dev
