[X86][AVX-VNNI] Fix VNNI intrinsics argument types #122347

BaiXilin · 2025-01-09T19:28:57Z

Fixed the mismatched VNNI intrinsics argument types to align with the ISA.

VNNI intrinsics affected are:
VPDPBUSD[,S]_128/256/512, VPDPWSSD[,S]_128/256/512,
VPDPB[SS,SU,UU]D[,S]_128/256, VPDPW[SU,US,UU]D[,S]_128/256.
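For context on why the operand types matter, here is a minimal scalar sketch of one 32-bit lane of the non-saturating VPDPBUSD operation as the ISA describes it: four unsigned bytes from the first source are multiplied by four signed bytes from the second source, and the products are added to the accumulator dword. The function name `vpdpbusd_lane` and the packed-`uint32_t` representation are assumptions made for illustration; this is not the LLVM intrinsic definition or the code changed by this PR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of one 32-bit lane of VPDPBUSD (no saturation).
// `a` packs four unsigned bytes and `b` packs four signed bytes,
// least significant byte first.
inline int32_t vpdpbusd_lane(int32_t acc, uint32_t a, uint32_t b) {
  // Accumulate in unsigned arithmetic so that wraparound is well defined.
  uint32_t sum = static_cast<uint32_t>(acc);
  for (int i = 0; i < 4; ++i) {
    // Byte i of the first source is zero-extended.
    uint8_t ua = static_cast<uint8_t>(a >> (8 * i));
    // Byte i of the second source is sign-extended.
    int8_t sb = static_cast<int8_t>(static_cast<uint8_t>(b >> (8 * i)));
    int32_t product = static_cast<int32_t>(ua) * static_cast<int32_t>(sb);
    sum += static_cast<uint32_t>(product);
  }
  return static_cast<int32_t>(sum);
}
```

Because the multiplication happens per byte, the two sources are naturally byte vectors while the accumulator is a dword vector.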

github-actions · 2025-01-09T19:29:19Z

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

…lvm#120916) Also use getPointerAlignment when trying to use alignment and dereferenceable assumptions. This catches cases where dereferenceable is known via the assumption but alignment is known via getPointerAlignment (e.g. via argument attribute or align of 1). PR: llvm#120916

llvm#122151 added this test with an invalid SEW. Use a valid SEW here.

OpenACC data clause operations previously required that the variable operand implement the `PointerLikeType` interface. This was a reasonable constraint because the dialects currently mixed with `acc` do use pointers to represent variables. However, this forces the "pointer" abstraction to be exposed too early, and some cases are not cleanly representable through this approach (more specifically FIR's `fir.box` abstraction). Thus, relax this by allowing a variable to be a type which implements either the `PointerLikeType` interface or the `MappableType` interface.

@vzakhari

…uilds (llvm#120914) The changes in llvm#87822 introduced a regression where Flang could no longer be built standalone without explicitly specifying all of LLVM_DIR, CLANG_DIR and MLIR_DIR. Restore the earlier logic that used these paths as hints, and supported finding system-wide LLVM install via default paths. Instead, make paths absolute after locating the packages, using the paths CMake determined. ----- @vzakhari, could you confirm that this doesn't break your use case?

…lvm#122316) This is a NFC. Duplicate mc test file for gfx12 vop3c/vop3cx to true16/fake16 mode and update it with +real-true16/-real-true16 flag. This is for the upcoming true16 changes

…#121992) true16 codegen pattern for v_med3_f16


The system call `__CELQTBCK()` is used to build a backtrace like on other systems. The collected information is the address of the PC, the address of the entry point (EP), the difference between both addresses (+EP), the dynamic storage area (DSA, aka the stack pointer), and the function name. The system call is described here: https://www.ibm.com/docs/en/zos/3.1.0?topic=cwicsa6a-celqtbck-also-known-as-celqtbck-64-bit-traceback-service

…late offsets in bytes (llvm#121989) There will be more changes coming in to `SemaHLSL::ActOnFinishBuffer` so it would be good to move the packoffset validation out to a separate function. This change also unifies the units for cbuffer offset calculations to bytes.

In C++20, constexpr virtual functions are allowed. In C++17, although a non-pure virtual function is not allowed to be constexpr, a pure virtual function is allowed to be constexpr and may be overridden by a non-constexpr virtual function in the derived class. The following code compiles as C++:

```
class A {
public:
  constexpr virtual int f() = 0;
};

class B : public A {
public:
  int f() override {
    return 42;
  }
};
```

However, it fails to compile as CUDA or HIP code. The reason: A::f() is implicitly a host device function whereas B::f() is a host function. Since they have different targets, clang does not treat B::f() as an override of A::f(). Instead, it treats B::f() as a name-hiding non-virtual function for A::f(), and diagnoses it.

This causes any CUDA/HIP program using the C++ standard header file `<format>` from g++-13 to fail to compile, since this usage pattern shows up there:

```
/usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/format:3564:34: error: non-virtual member function marked 'override' hides virtual member function
 3564 |       _M_format_arg(size_t __id) override
      |                                  ^
/usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/format:3538:30: note: hidden overloaded virtual function 'std::__format::_Scanner<char>::_M_format_arg' declared here
 3538 |       constexpr virtual void _M_format_arg(size_t __id) = 0;
      |                              ^
```

This is a serious issue and there is no workaround.

This patch allows a non-constexpr function to override a constexpr virtual function for CUDA and HIP. This should be OK since a non-constexpr function without an explicit host or device attribute can only be called in host functions.

Fixes: SWDEV-507350

…21611)" This reverts commit a6b7181. Breaks Clang :: CodeGenHLSL/builtins/length.hlsl, see llvm#121611 (comment)

…vm#122029) Move the common case of FieldDecl::getFieldIndex() inline to mitigate the cost of removing the extra `FieldNo` induction variable. Also rename isNoUniqueAddress parameter to isNonVirtualBaseType, which appears to be more accurate. I think the current name is just a consequence of autocomplete gone wrong.

Need to sync the mask between cost and actual emission to avoid bugs in mask calculation. Fixes llvm#122324

I’m seeing a series of errors when trying to run the cmake configure step on macOS when the cmake generator is set to Xcode. All is well if I use the Ninja or Unix Makefile generators.

Messages are all of the form:

~~~
CMake Error at …llvm-project/clang/cmake/modules/AddClang.cmake:120 (target_compile_definitions):
  Cannot specify compile definitions for target "obj.clangBasic" which is not
  built by this project.
Call Stack (most recent call first):
  …llvm-project/clang/lib/Basic/CMakeLists.txt:57 (add_clang_library)
~~~

The remaining errors are similar but mention targets obj.clangAPINotes, obj.clangLex, obj.clangParse, and so on.

The regression appears to have been introduced by commit 09fa2f0 (Oct 14 2024), which added the code in this area.

My proposed solution is simply to add a test to ensure that the obj.x target exists before setting its compile definitions. There is precedent for doing just this in both clang/cmake/modules/AddClang.cmake and clang/lib/support/CMakeLists.txt as well as in the “MSVC AND NOT CLANG_LINK_CLANG_DYLIB” path immediately above the offending line.

I’ve also made a couple of grammatical tweaks in the comments surrounding this code.

In case it's relevant, the cmake settings and definitions I've used to trigger these errors are:

~~~bash
GENERATOR="Xcode"
OUTDIR=build_macos
cmake \
  -S "$SCRIPT_DIR/llvm" \
  -B "$SCRIPT_DIR/$OUTDIR" \
  -G "$GENERATOR" \
  -D CMAKE_BUILD_TYPE=Release \
  -D CMAKE_OSX_ARCHITECTURES=arm64 \
  -D LLVM_PARALLEL_LINK_JOBS=1 \
  -D LLVM_ENABLE_PROJECTS="clang;lld" \
  -D LLVM_TARGETS_TO_BUILD=RISCV \
  -D LLVM_DEFAULT_TARGET_TRIPLE=riscv32-unknown-elf \
  -D LLVM_OPTIMIZED_TABLEGEN=Yes
~~~

(cmake v3.31.1, Xcode 16.1. I know that not all of these variables are useful for the Xcode generator!)

Co-authored-by: Paul Bowen-Huggett <[email protected]>

…llvm#122332) The SEW operand for these instructions should have a value of 0. This matches what was done for vcpop/vfirst.

…2286) Don't suggest to comment-out the parameter name if the parameter has an attribute that's spelled after the parameter name. This prevents the parameter's attributes from being wrongly applied to the parameter's type. This fixes llvm#122191.

…lvm#122190) The GPU ID operations already implement InferIntRangeInterface, which gives constant lower and upper bounds on those IDs when appropriate metadata is present on the operations or in the surrounding context. This commit uses that existing code to implement the ValueBoundsOpInterface, which is used when analyzing affine operations (unlike the integer range interface, which is used for arithmetic optimization). It also implements the interface for gpu.launch, where we can use it to express the constraint that block/grid sizes are equal to their value from outside the launch op and that the corresponding IDs are bounded above by that size. As a consequence, the test pass for this inference is updated to work on a FunctionOpInterface and not a func.func, creating minor churn in other tests.

) With this patch we switch from the temporary dummy seeds to actual seeds provided by the seed collector. The seeds get sliced and each slice is used as the starting point for vectorization.

The test runs asynchronous kernels and depending on the timing the output is slightly different. We now only check for the common parts of the output.

Summary: Previously we had some indirection here, this patch updates these utilities to just be normal template functions. We use SFINAE to manage the special case handling for floats. Also this strips address spaces so it can be used more generally.

Summary: Use a normal bitcast, remove from the shared utils since it's not available in GCC 7.4

…`` when operand is integer literal for readability-use-std-min-max (llvm#122296)

When comparing with an integer literal, integer promotion converts any type narrower than int to int or unsigned int. This makes the auto-fix produce a correct but unexpected result. For example,

```c++
short a;
if ( a > 10 )
  a = 10;
```

will be

```c++
short a;
if ( (int)a > 10 )
  a = (short)10;
```

which will be fixed as

```c++
short a;
a = std::min<int>(a, 10);
```

but actually it can be

```c++
short a;
a = std::min<short>(a, 10);
```

Fixed: llvm#121676
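The effect of integer promotion on this rewrite can be sketched in a few lines. The clamp `if (a > 10) a = 10;` is an upper bound, so the sketch uses `std::min`; the helper name `clamp_to_ten` is hypothetical and only illustrates the shape of the preferred fix, not the check's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <type_traits>

// A short operand is promoted to int in arithmetic and comparisons against
// an int literal, which is why a naive fix-it ends up with std::min<int>.
static_assert(std::is_same<decltype(short{} + 0), int>::value,
              "short is promoted to int");

// Hypothetical helper: same behavior as `if (a > 10) a = 10;`, written with
// the narrower template argument so no widening cast is needed.
inline short clamp_to_ten(short a) {
  return std::min<short>(a, 10);
}
```

The literal `10` fits in `short`, so instantiating with `short` keeps the result type equal to the variable's type.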

Mold prefers the suffix '$' for symbols like PLT and GOT entries, so exclude these symbols as well. Otherwise, this test will fail for developers using mold-linked Clang. Closes llvm#76982

Skip function declarations for instrumentation. Fixes llvm#122467

Add Matcher `m_Undef` Fixes: llvm#122439

…lvm#122507) Internal testing shows improvements in some SPEC HPC benchmarks with this change.

… KnownBits Under certain circumstances, lowering of other instructions can result in computeKnownBits being able to detect a constant that it couldn't previously. Fixes llvm#122580

The inlining code for llvm funcs seems to have needlessly forbidden inlining of private (e.g. non-cloning) symbols.

Co-authored-by: Oleksandr "Alex" Zinenko <[email protected]>

Fixes the test introduced in llvm#111145. It would also make sense to throw an error when the user attempts to use a move-from-sr on an unsupported architecture. Currently the encoder generates garbage instructions for a 68000 because the AsmMatcher is able to match the move against a MOV16rr.

…e paths contain `..` (llvm#121323) `makeAbsolute` will not normalize path. When getting parent folder, `..` will go into the subfolder instead of the parent folder.
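The pitfall can be reproduced with the C++ standard library. This is only an analogy under stated assumptions: the sketch uses `std::filesystem` rather than LLVM's path utilities, and the helper `grandparent` and the example path are made up for illustration.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: the directory one level above the file's directory,
// computed purely lexically.
inline std::string grandparent(const fs::path &p) {
  return p.parent_path().parent_path().generic_string();
}

// Same computation after `..` components have been folded away.
inline std::string normalized_grandparent(const fs::path &p) {
  return grandparent(p.lexically_normal());
}
```

For `root/a/b/../c.cpp` the file lives in `root/a`. Walking up lexically without normalizing yields `root/a/b`, a subfolder, whereas normalizing first yields `root`.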


…vm#121350) If we have a CSEL instruction that depends on the flags set by a (SUBS x c) instruction and the true and/or false expression is (add (add x y) -c), we can reassociate the latter expression to (add (SUBS x c) y) and save one instruction. Proof for the basic transformation: https://alive2.llvm.org/ce/z/-337Pb We can extend this transformation for slightly different constants. For example, if we have (add (add x y) -(c-1)) and the comparison x <u c, we can transform the comparison to x <=u c-1 to eliminate the comparison instruction, too. Similarly, we can transform (x == 0) to (x <u 1). Proofs for the transformations that alter the constants: https://alive2.llvm.org/ce/z/3nVqgR Fixes llvm#119606.
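The arithmetic identities behind this reassociation can be checked with plain wrapping unsigned arithmetic. This is a minimal sketch of the algebra only, not of the AArch64 lowering; all function names are hypothetical, and the comparison rewrite assumes `c != 0`.

```cpp
#include <cassert>
#include <cstdint>

// Original expression: (add (add x y) -c).
inline uint64_t original_form(uint64_t x, uint64_t y, uint64_t c) {
  return (x + y) - c;
}

// Reassociated expression: (add (sub x c) y). The subtraction is the value
// that SUBS already computes while setting the flags.
inline uint64_t reassociated_form(uint64_t x, uint64_t y, uint64_t c) {
  return (x - c) + y;
}

// x <u c, the comparison as originally written.
inline bool ult(uint64_t x, uint64_t c) { return x < c; }

// x <=u c-1, the adjusted comparison; only equivalent when c != 0.
inline bool ule_minus_one(uint64_t x, uint64_t c) { return x <= c - 1; }

// (x == 0) expressed as (x <u 1).
inline bool is_zero_via_ult(uint64_t x) { return x < 1; }
```

Unsigned arithmetic wraps modulo 2^64, so the two forms agree even when `x - c` underflows.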

With range and undef metadata on a call we can have vector AssertZExt generated on a target with no vector operations. The AssertZExt needs to scalarize to a normal `AssertZext tin, ValueType`. I have added AssertSext too, although I do not have a test case. Fixes llvm#110374

) This adds a test line and updates a comment.

We want special handling for IGLP instructions in the scheduler but they should still be treated like they have side effects by other passes. Add a target hook to the ScheduleDAGInstrs DAG builder so that we have more control over this.

Providing the character that we failed on is helpful for figuring out what's going wrong in the tzdb.

The body of the loop only applies to wide induction recipes, skip any other header phi recipes up-front.

This patch fixes: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:255:18: error: private field 'DAG' is not used [-Werror,-Wunused-private-field]

…m#122552)

- **[InstSimplify] Add tests for simplifying `(xor (sub C_Mask, X), C_Mask)`; NFC**
- **[InstSimplify] Simplifying `(xor (sub C_Mask, X), C_Mask)` -> `X`**

Helps address regressions with folding `clz(Pow2)`. Proof: https://alive2.llvm.org/ce/z/zGwUBp
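The fold can be illustrated with a brute-force check. The sketch assumes that `C_Mask` is a low-bit mask of the form 2^n - 1 and that `X` has no bits set outside the mask; the exact preconditions used by the patch are the ones in the linked proof. Function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// True when m has the form 2^n - 1 (all set bits are contiguous from bit 0).
inline bool is_low_bit_mask(uint32_t m) { return (m & (m + 1)) == 0; }

// The unsimplified expression: (xor (sub mask, x), mask).
inline uint32_t xor_sub_mask(uint32_t mask, uint32_t x) {
  return (mask - x) ^ mask;
}

// Exhaustively checks that the expression equals x for every x <= mask.
inline bool fold_holds_for_mask(uint32_t mask) {
  for (uint64_t x = 0; x <= mask; ++x)
    if (xor_sub_mask(mask, static_cast<uint32_t>(x)) != x)
      return false;
  return true;
}
```

Subtracting a value contained in an all-ones mask never borrows, so `mask - x` equals `mask ^ x`, and the second xor cancels the first. With a constant that is not a low-bit mask, for example 5, the identity fails.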

…ng CR for `ct{t,l}z` (llvm#122548)

Note that PointerUnion::{is,get} have been soft deprecated in PointerUnion.h: // FIXME: Replace the uses of is(), get() and dyn_cast() with // isa<T>, cast<T> and the llvm::dyn_cast<T> I'm not touching PointerUnion::dyn_cast for now because it's a bit complicated; we could blindly migrate it to dyn_cast_if_present, but we should probably use dyn_cast when the operand is known to be non-null.

fhahn and others added 29 commits January 11, 2025 19:14

[RISCV][VLOPT] Fix VCFCT incompatible EEW test (llvm#122327)

a593b41

llvm#122151 added this test with an invalid SEW. Use a valid SEW here.

[RISCV][VLOPT] Add vfirst and vcpop to getOperandInfo (llvm#122295)

25a6c8e

[AMDGPU][True16][MC][NFC] add true16/fake16 test for gfx12 vop3c/3cx (l…

981db64

…lvm#122316) This is a NFC. Duplicate mc test file for gfx12 vop3c/vop3cx to true16/fake16 mode and update it with +real-true16/-real-true16 flag. This is for the upcoming true16 changes

[AMDGPU][True16][CodeGen] Update codegen pattern for v_med3_f16 (llvm…

d75d9e4

…#121992) true16 codegen pattern for v_med3_f16

[RISCV] Add tests for legalization of <N x i128> and <N x i256> shuffles

b5e910b

[RISCV][VLOPT] Add vector single width floating point add subtract in…

b27eab7

…structions to isSupportedInstr

[RISCV][VLOPT] Add vector widening floating point add subtract instru…

5a213d0

…ctions to isSupportedInstr

[RISCV][VLOPT] Add floating point multiply divide instructions to get…

701be2e

…SupportedInstr

[RISCV][VLOPT] Add widening floating point multiply to isSupportedInstr

e17e957

[RISCV][VLOPT] Add Vector Floating-Point Compare Instructions to getS…

c31dddb

…upportedInstr

Revert "[HLSL] Move length support out of the DirectX Backend (llvm#1…

566aaa5

…21611)" This reverts commit a6b7181. Breaks Clang :: CodeGenHLSL/builtins/length.hlsl, see llvm#121611 (comment)

[SLP]Fix mask processing for reused gathered scalars

550f6c1

Need to sync the mask between cost and actual emission to avoid bugs in mask calculation. Fixes llvm#122324

[libc++][NFC] Remove trailing whitespace from release notes

b38f038

[RISCV] Return MILog2SEW for mask instructions getOperandLog2EEW. NFC (…

f812be4

…llvm#122332) The SEW operand for these instructions should have a value of 0. This matches what was done for vcpop/vfirst.

[WebAssembly] Format WebAssembly ReleaseNote entries (llvm#122203)

8fe6536

[SandboxVec][BottomUpVec] Use SeedCollector and slice seeds (llvm#120826

4a87b7a

) With this patch we switch from the temporary dummy seeds to actual seeds provided by the seed collector. The seeds get sliced and each slice is used as the starting point for vectorization.

[OpenMP][FIX] Adjust test to be non-flaky (llvm#122331)

4e55ddc

The test runs asynchronous kernels and depending on the timing the output is slightly different. We now only check for the common parts of the output.

[OpenMP] Use __builtin_bit_cast instead of UB type punning (llvm#122325)

94b9b70

Summary: Use a normal bitcast, remove from the shared utils since it's not available in GCC 7.4
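For readers unfamiliar with the problem, the sketch below shows a well-defined way to reinterpret an object's bytes without pointer-cast type punning. It uses `std::memcpy`, which is a swapped-in portable technique and not the `__builtin_bit_cast` builtin named in the commit title; the helper name `bit_cast_via_memcpy` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: copies the object representation of `from` into a
// value of type To. Unlike *reinterpret_cast<To *>(&from), this does not
// violate strict aliasing.
template <typename To, typename From>
inline To bit_cast_via_memcpy(const From &from) {
  static_assert(sizeof(To) == sizeof(From), "sizes must match");
  static_assert(std::is_trivially_copyable<To>::value, "To must be trivially copyable");
  static_assert(std::is_trivially_copyable<From>::value, "From must be trivially copyable");
  To to;
  std::memcpy(&to, &from, sizeof(To));
  return to;
}
```

Compilers typically lower the `memcpy` to a plain register move, so the defined-behavior version costs nothing at run time. The test values below assume IEEE-754 single-precision `float`.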

HerrCai0907 and others added 29 commits January 11, 2025 19:14

[Clang][NFC] Fix a test failure with mold linker (llvm#122587)

3d919e1

Mold prefers the suffix '$' for symbols like PLT and GOT entries, so exclude these symbols as well. Otherwise, this test will fail for developers using mold-linked Clang. Closes llvm#76982

[TySan] Skip instrumentation for function declarations (llvm#122488)

0244d58

Skip function declarations for instrumentation. Fixes llvm#122467

[SDPatternMatch] Add Matcher m_Undef (llvm#122521)

c80ad4a

Add Matcher `m_Undef` Fixes: llvm#122439

[flang] Teach omp-map-info-finalization to reuse descriptor allocas (l…

d1f02bd

…lvm#122507) Internal testing shows improvements in some SPEC HPC benchmarks with this change.

[X86] LowerCTPOP - check if the operand is a constant when collecting…

4d07254

… KnownBits Under certain circumstances, lowering of other instructions can result in computeKnownBits being able to detect a constant that it couldn't previously. Fixes llvm#122580

[MLIR] Enable inlining for private symbols (llvm#122572)

f3bc476

The inlining code for llvm funcs seems to have needlessly forbidden inlining of private (e.g. non-cloning) symbols.

[MLIR] Import LLVM add flag to disable loadalldialects (llvm#122574)

3c140b4

Co-authored-by: Oleksandr "Alex" Zinenko <[email protected]>

[clang-tidy] remove never used IgnoreCase in option (llvm#122573)

56c7135

[clang-tidy] fix incorrect configuration file path resolving when fil…

ca49bef

…e paths contain `..` (llvm#121323) `makeAbsolute` will not normalize path. When getting parent folder, `..` will go into the subfolder instead of the parent folder.

[X86] vector popcnt tests - regenerate VPTERNLOG comments

d7a05f3

[X86] vselect-avx.ll - regenerate VPTERNLOG comments

d427e62

[X86] avx512-mask-op.ll - regenerate VPTERNLOG comments

f6f2454

[X86] avx512-build-vector.ll - regenerate VPTERNLOG comments

eb65c36

[clang-tidy][doc] fix incorrectly code snippet in release note (llvm#…

6b6f370

…122595)

[win/asan] GetInstructionSize: Add test for 8D A4 24 .... (llvm#119794

0772c2d

) This adds a test line and updates a comment.

[libc++] Improve diagnostic when failing to parse the tzdb (llvm#122125)

3ea7162

Providing the character that we failed on is helpful for figuring out what's going wrong in the tzdb.

[VPlan] Skip non-induction phi recipes in legalizeAndOptimizeInductions.

50316be

The body of the loop only applies to wide induction recipes, skip any other header phi recipes up-front.

[AMDGPU] Fix a warning

5f93a44

This patch fixes: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:255:18: error: private field 'DAG' is not used [-Werror,-Wunused-private-field]

[ValueTracking] Take into account whether zero is poison when computi…

7148dee

…ng CR for `ct{t,l}z` (llvm#122548)

[TableGen] Avoid repeated hash lookups (NFC) (llvm#122586)

e048908

[Sema] Avoid repeated hash lookups (NFC) (llvm#122588)

2add8a4

Fix two failed clang tests

9c44615

BaiXilin closed this Jan 12, 2025
