I have a double-precision workload that seems to benefit significantly from the lower register pressure of the i64x4 avx2 target. An equivalent target that keeps the same 256-bit vector width but gives access to the full set of 32 vector registers (as opposed to just the 16 that AVX2 allows) and to the AVX512 mask registers seems like it would be very useful. I can't see one among the existing targets, though. Is there any way for me to get what I want, or should I turn this into a feature request? I could try the avx512skx-i32x8 target and use a full 512-bit vector, but I expect the frequency penalty paid on SKX will rule it out.
Yep. I'm planning to do that, along with allowing the base type to be dropped from target names (at least for the targets that don't care about it, i.e. AVX512 and GPU targets).
This is a very reasonable request, and I've already heard it from other developers.

One caveat: do not take `i32` or `i64` in AVX512 target names seriously. We originally had this "base type" in the target naming scheme because it affects what the default mask looks like on targets that don't have mask registers, i.e. for SSE2/4 and AVX2 it matters a lot. But for AVX512 we have mask registers, and `avx512skx-i32x4` and `avx512skx-i64x4` would be the same thing. Just keep that in mind.

As for how to proceed with adding the `avx512skx-i32x4` target, let's convert this question to a feature request and I'll implement it.
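
To make the base-type caveat concrete, here is a rough sketch in Python of why the base type matters for the mask on AVX2 but not on AVX512. It is only a toy model, not ISPC code: the `Target` class and `mask_layout` function are hypothetical names made up for illustration, and the model assumes that a target without mask registers stores one mask element per lane at the base-type width, while a target with mask registers needs one bit per lane.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """Toy description of a target (hypothetical, for illustration only)."""
    name: str
    base_bits: int            # width implied by the i32/i64 part of the name
    lanes: int                # the xN part of the name
    has_mask_registers: bool  # True for AVX512, False for SSE2/4 and AVX2


def mask_layout(t: Target) -> tuple[int, int]:
    """Return (bits per mask element, number of elements) under the assumed model."""
    if t.has_mask_registers:
        # Assumption: a dedicated mask register holds one bit per lane,
        # so the base type in the name plays no role.
        return (1, t.lanes)
    # Assumption: without mask registers the mask lives in a regular vector
    # register, one base-type-wide element per lane.
    return (t.base_bits, t.lanes)


targets = [
    Target("avx2-i32x8", 32, 8, False),
    Target("avx2-i64x4", 64, 4, False),
    Target("avx512skx-i32x4", 32, 4, True),
    Target("avx512skx-i64x4", 64, 4, True),
]

for t in targets:
    element_bits, count = mask_layout(t)
    print(f"{t.name}: {count} mask elements of {element_bits} bit(s)")
```

Under these assumptions the two AVX2 entries get different mask layouts, while the two AVX512 entries come out identical, which is the point of the caveat.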