[X86][AVX-VNNI] Fix VNNI intrinsics argument types #122347
…lvm#120916) Also use getPointerAlignment when trying to use alignment and dereferenceable assumptions. This catches cases where dereferenceable is known via the assumption but alignment is known via getPointerAlignment (e.g. via argument attribute or align of 1). PR: llvm#120916
llvm#122151 added this test with an invalid SEW. Use a valid SEW here.
OpenACC data clause operations previously required that the variable operand implement the `PointerLikeType` interface. This was a reasonable constraint because the dialects currently mixed with `acc` do use pointers to represent variables. However, this forces the "pointer" abstraction to be exposed too early, and some cases are not cleanly representable through this approach (more specifically FIR's `fir.box` abstraction). Thus, relax this by allowing a variable to be a type which implements either the `PointerLikeType` interface or the `MappableType` interface.
…uilds (llvm#120914) The changes in llvm#87822 introduced a regression where Flang could no longer be built standalone without explicitly specifying all of LLVM_DIR, CLANG_DIR and MLIR_DIR. Restore the earlier logic that used these paths as hints, and supported finding system-wide LLVM install via default paths. Instead, make paths absolute after locating the packages, using the paths CMake determined. ----- @vzakhari, could you confirm that this doesn't break your use case?
…lvm#122316) This is a NFC. Duplicate mc test file for gfx12 vop3c/vop3cx to true16/fake16 mode and update it with +real-true16/-real-true16 flag. This is for the upcoming true16 changes
…#121992) true16 codegen pattern for v_med3_f16
…structions to isSupportedInstr
…ctions to isSupportedInstr
The system call `__CELQTBCK()` is used to build a backtrace like on other systems. The collected information is the address of the PC, the address of the entry point (EP), the difference between both addresses (+EP), the dynamic storage area (DSA aka the stack pointer), and the function name. The system call is described here: https://www.ibm.com/docs/en/zos/3.1.0?topic=cwicsa6a-celqtbck-also-known-as-celqtbck-64-bit-traceback-service
…late offsets in bytes (llvm#121989) There will be more changes coming in to `SemaHLSL::ActOnFinishBuffer` so it would be good to move the packoffset validation out to a separate function. This change also unifies the units for cbuffer offset calculations to bytes.
In C++20, constexpr virtual functions are allowed. In C++17, although a
non-pure virtual function is not allowed to be constexpr, a pure virtual
function is allowed to be constexpr and may be overridden by a
non-constexpr virtual function in the derived class.
The following code compiles as C++:
```
class A
{
public:
  constexpr virtual int f() = 0;
};
class B : public A
{
public:
  int f() override
  {
    return 42;
  }
};
```
However, it fails to compile as CUDA or HIP code. The reason: A::f() is
implicitly a host device function whereas B::f() is a host function. Since
they have different targets, clang does not treat B::f() as an override
of A::f(). Instead, it treats B::f() as a non-virtual function that hides
A::f(), and diagnoses it.
This causes any CUDA/HIP program using the C++ standard header file
`<format>` from g++-13 to fail to compile, since this usage pattern shows
up there:
```
/usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/format:3564:34: error: non-virtual member function marked 'override' hides virtual member function
3564 | _M_format_arg(size_t __id) override
| ^
/usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/format:3538:30: note: hidden overloaded virtual function 'std::__format::_Scanner<char>::_M_format_arg' declared here
3538 | constexpr virtual void _M_format_arg(size_t __id) = 0;
| ^
```
This is a serious issue and there is no workaround.
This patch allows a non-constexpr function to override a constexpr virtual
function for CUDA and HIP. This should be OK since a non-constexpr
function without an explicit host or device attribute can only be called in
host functions.
Fixes: SWDEV-507350
…21611)" This reverts commit a6b7181. Breaks Clang :: CodeGenHLSL/builtins/length.hlsl, see llvm#121611 (comment)
…vm#122029) Move the common case of FieldDecl::getFieldIndex() inline to mitigate the cost of removing the extra `FieldNo` induction variable. Also rename isNoUniqueAddress parameter to isNonVirtualBaseType, which appears to be more accurate. I think the current name is just a consequence of autocomplete gone wrong.
Need to sync the mask between cost and actual emission to avoid bugs in mask calculation.
Fixes llvm#122324
I’m seeing a series of errors when trying to run the cmake configure step on macOS when the cmake generator is set to Xcode. All is well if I use the Ninja or Unix Makefile generators.

Messages are all of the form:
~~~
CMake Error at …llvm-project/clang/cmake/modules/AddClang.cmake:120 (target_compile_definitions):
  Cannot specify compile definitions for target "obj.clangBasic" which is not built by this project.
Call Stack (most recent call first):
  …llvm-project/clang/lib/Basic/CMakeLists.txt:57 (add_clang_library)
~~~
The remaining errors are similar but mention targets obj.clangAPINotes, obj.clangLex, obj.clangParse, and so on.

The regression appears to have been introduced by commit 09fa2f0 (Oct 14 2024), which added the code in this area.

My proposed solution is simply to add a test to ensure that the obj.x target exists before setting its compile definitions. There is precedent for doing just this in both clang/cmake/modules/AddClang.cmake and clang/lib/support/CMakeLists.txt, as well as in the “MSVC AND NOT CLANG_LINK_CLANG_DYLIB” path immediately above the offending line.

I’ve also made a couple of grammatical tweaks in the comments surrounding this code.

In case it's relevant, the cmake settings and definitions I've used to trigger these errors are:
~~~bash
GENERATOR="Xcode"
OUTDIR=build_macos
cmake \
  -S "$SCRIPT_DIR/llvm" \
  -B "$SCRIPT_DIR/$OUTDIR" \
  -G "$GENERATOR" \
  -D CMAKE_BUILD_TYPE=Release \
  -D CMAKE_OSX_ARCHITECTURES=arm64 \
  -D LLVM_PARALLEL_LINK_JOBS=1 \
  -D LLVM_ENABLE_PROJECTS="clang;lld" \
  -D LLVM_TARGETS_TO_BUILD=RISCV \
  -D LLVM_DEFAULT_TARGET_TRIPLE=riscv32-unknown-elf \
  -D LLVM_OPTIMIZED_TABLEGEN=Yes
~~~
(cmake v3.31.1, Xcode 16.1. I know that not all of these variables are useful for the Xcode generator!)

Co-authored-by: Paul Bowen-Huggett <[email protected]>
…llvm#122332) The SEW operand for these instructions should have a value of 0. This matches what was done for vcpop/vfirst.
…2286) Don't suggest to comment-out the parameter name if the parameter has an attribute that's spelled after the parameter name. This prevents the parameter's attributes from being wrongly applied to the parameter's type. This fixes llvm#122191.
…lvm#122190) The GPU ID operations already implement InferIntRangeInterface, which gives constant lower and upper bounds on those IDs when appropriate metadata is present on the operations or in the surrounding context. This commit uses that existing code to implement the ValueBoundsOpInterface, which is used when analyzing affine operations (unlike the integer range interface, which is used for arithmetic optimization). It also implements the interface for gpu.launch, where we can use it to express the constraint that block/grid sizes are equal to their value from outside the launch op and that the corresponding IDs are bounded above by that size. As a consequence, the test pass for this inference is updated to work on a FunctionOpInterface and not a func.func, creating minor churn in other tests.
The test runs asynchronous kernels and depending on the timing the output is slightly different. We now only check for the common parts of the output.
Summary: Previously we had some indirection here, this patch updates these utilities to just be normal template functions. We use SFINAE to manage the special case handling for floats. Also this strips address spaces so it can be used more generally.
Summary: Use a normal bitcast, remove from the shared utils since it's not available in GCC 7.4
…`` when operand is integer literal for readability-use-std-min-max (llvm#122296)

When comparing against an integer literal, integer promotion converts any type narrower than int to int or unsigned int. As a result, the auto-fix is correct but not the one a reader would expect. For example,
```c++
short a;
if ( a > 10 )
  a = 10;
```
is treated as
```c++
short a;
if ( (int)a > 10 )
  a = (short)10;
```
which is fixed as
```c++
short a;
a = std::min<int>(a, 10);
```
but it can actually be
```c++
short a;
a = std::min<short>(a, 10);
```
Fixed: llvm#121676
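A minimal sketch of why the narrower template argument is safe for this example. It assumes the clamp shape `if (a > 10) a = 10;`, which corresponds to `std::min`; the function names `clamp_with_if` and `clamp_with_min` are hypothetical and only illustrate the equivalence, not the check's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <type_traits>

// The hand-written clamp: `a` is promoted to int for the comparison only.
constexpr short clamp_with_if(short a) {
    if (a > 10)
        a = 10;
    return a;
}

// The suggested replacement with the narrower template argument.
// The literal 10 is converted to short, which is lossless here.
constexpr short clamp_with_min(short a) {
    return std::min<short>(a, 10);
}

// std::min<short> works on (and returns a reference to) short, not int.
static_assert(std::is_same<decltype(std::min<short>(short{}, 10)),
                           const short&>::value,
              "std::min<short> yields const short&");
static_assert(clamp_with_min(42) == 10, "values above 10 are clamped");
static_assert(clamp_with_min(-3) == -3, "values below 10 are unchanged");

// Exhaustively compare both forms over the whole range of short.
bool forms_agree() {
    for (int v = std::numeric_limits<short>::min();
         v <= std::numeric_limits<short>::max(); ++v) {
        const short s = static_cast<short>(v);
        if (clamp_with_if(s) != clamp_with_min(s))
            return false;
    }
    return true;
}
```

Because the literal fits in `short`, both forms produce the same value for every input, so `<short>` loses nothing compared with `<int>`.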
Mold prefers the suffix '$' for symbols like PLT and GOT entries, so exclude these symbols as well. Otherwise, this test will fail for developers using mold-linked Clang. Closes llvm#76982
Skip function declarations for instrumentation. Fixes llvm#122467
Add Matcher `m_Undef` Fixes: llvm#122439
…lvm#122507) Internal testing shows improvements in some SPEC HPC benchmarks with this change.
… KnownBits Under certain circumstances, lowering of other instructions can result in computeKnownBits being able to detect a constant that it couldn't previously. Fixes llvm#122580
The inlining code for llvm funcs seems to have needlessly forbidden inlining of private (e.g. non-cloning) symbols.
Co-authored-by: Oleksandr "Alex" Zinenko <[email protected]>
Fixes the test introduced in llvm#111145. It would also make sense to throw an error when the user attempts to use a move-from-sr on an unsupported architecture. Currently the encoder generates garbage instructions for a 68000 because the AsmMatcher is able to match the move against a MOV16rr
…e paths contain `..` (llvm#121323) `makeAbsolute` does not normalize the path. When getting the parent folder of an unnormalized path, `..` resolves into a subfolder instead of the parent folder.
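The effect described above can be reproduced with `std::filesystem`, used here only as a stand-in for LLVM's own path utilities (an assumption, since the commit message does not show the affected code). The helper `ancestor` and the sample path are hypothetical.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Walk `levels` directories up by repeatedly taking the textual parent.
// This is purely lexical: it never resolves ".." components.
fs::path ancestor(fs::path p, int levels) {
    for (int i = 0; i < levels; ++i)
        p = p.parent_path();
    return p;
}

void self_check() {
    const fs::path raw = "/proj/src/../include/foo.h";

    // Unnormalized: "/proj/src/../include" -> "/proj/src/.." -> "/proj/src".
    // The last step strips the ".." and lands in a subfolder of "/proj".
    assert(ancestor(raw, 3) == fs::path("/proj/src"));

    // Normalized first: "/proj/include/foo.h", then
    // "/proj/include" -> "/proj" -> "/", the real parent of "/proj".
    assert(raw.lexically_normal() == fs::path("/proj/include/foo.h"));
    assert(ancestor(raw.lexically_normal(), 3) == fs::path("/"));
}
```

Normalizing before walking upwards is what makes the textual parent match the directory the path actually denotes.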
…vm#121350) If we have a CSEL instruction that depends on the flags set by a (SUBS x c) instruction and the true and/or false expression is (add (add x y) -c), we can reassociate the latter expression to (add (SUBS x c) y) and save one instruction. Proof for the basic transformation: https://alive2.llvm.org/ce/z/-337Pb We can extend this transformation for slightly different constants. For example, if we have (add (add x y) -(c-1)) and the comparison x <u c, we can transform the comparison to x <=u c-1 to eliminate the comparison instruction, too. Similarly, we can transform (x == 0) to (x <u 1). Proofs for the transformations that alter the constants: https://alive2.llvm.org/ce/z/3nVqgR Fixes llvm#119606.
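The arithmetic identities behind this transformation can be spot-checked in plain C++ with wrapping unsigned arithmetic. This is only an illustrative sketch, not a substitute for the linked Alive2 proofs; the function names are hypothetical, and the comparison rewrite is checked under the assumption that c is nonzero.

```cpp
#include <cassert>
#include <cstdint>

// Original expression: (add (add x y) -c), modulo 2^32.
constexpr std::uint32_t before(std::uint32_t x, std::uint32_t y, std::uint32_t c) {
    return (x + y) + (0u - c);
}

// Reassociated expression: (add (sub x c) y), reusing the result of x - c.
constexpr std::uint32_t after(std::uint32_t x, std::uint32_t y, std::uint32_t c) {
    return (x - c) + y;
}

static_assert(before(7u, 5u, 3u) == after(7u, 5u, 3u), "no wraparound");
static_assert(before(1u, 2u, 9u) == after(1u, 2u, 9u), "x - c wraps");
static_assert(before(0xFFFFFFFFu, 2u, 1u) == after(0xFFFFFFFFu, 2u, 1u),
              "x + y wraps");

// Exhaustive check of the comparison rewrites on all 8-bit values:
//   x <u c   is the same as  x <=u c-1   (for c != 0)
//   x == 0   is the same as  x <u 1
bool comparisons_agree() {
    for (unsigned x = 0; x < 256; ++x) {
        if ((x == 0u) != (x < 1u))
            return false;
        for (unsigned c = 1; c < 256; ++c)
            if ((x < c) != (x <= c - 1u))
                return false;
    }
    return true;
}
```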
With range and undef metadata on a call we can have vector AssertZExt generated on a target with no vector operations. The AssertZExt needs to scalarize to a normal `AssertZext tin, ValueType`. I have added AssertSext too, although I do not have a test case. Fixes llvm#110374
We want special handling for IGLP instructions in the scheduler but they should still be treated like they have side effects by other passes. Add a target hook to the ScheduleDAGInstrs DAG builder so that we have more control over this.
Providing the character that we failed on is helpful for figuring out what's going wrong in the tzdb.
The body of the loop only applies to wide induction recipes, so skip any other header phi recipes up front.
This patch fixes: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:255:18: error: private field 'DAG' is not used [-Werror,-Wunused-private-field]
…m#122552)
- **[InstSimplify] Add tests for simplifying `(xor (sub C_Mask, X), C_Mask)`; NFC**
- **[InstSimplify] Simplifying `(xor (sub C_Mask, X), C_Mask)` -> `X`**

Helps address regressions with folding `clz(Pow2)`. Proof: https://alive2.llvm.org/ce/z/zGwUBp
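A small brute-force check of the fold, under the assumption (not spelled out in the commit message) that `C_Mask` is a low-bit mask of the form 2^n - 1 and that `X` does not exceed it; the linked Alive2 proof states the exact preconditions. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// The pattern being simplified: (xor (sub C_Mask, X), C_Mask).
constexpr std::uint32_t fold_input(std::uint32_t mask, std::uint32_t x) {
    return (mask - x) ^ mask;
}

// With mask = 31 and x = 5: 31 - 5 = 26 = 0b11010, and 26 ^ 31 = 0b00101 = 5.
static_assert(fold_input(31u, 5u) == 5u, "pattern folds to X");

// Outside the assumed precondition (x > mask) the fold does not hold.
static_assert(fold_input(31u, 32u) != 32u, "x > mask breaks the fold");

// Check every low-bit mask up to 8 bits and every x <= mask.
bool fold_holds_for_small_masks() {
    for (unsigned bits = 0; bits <= 8; ++bits) {
        const std::uint32_t mask = (1u << bits) - 1u;
        for (std::uint32_t x = 0; x <= mask; ++x)
            if (fold_input(mask, x) != x)
                return false;
    }
    return true;
}
```

The fold works because subtracting `x` from an all-ones mask never borrows, so `mask - x` equals `mask ^ x`, and the second xor cancels the first.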
…ng CR for `ct{t,l}z` (llvm#122548)
Note that PointerUnion::{is,get} have been soft deprecated in
PointerUnion.h:
// FIXME: Replace the uses of is(), get() and dyn_cast() with
// isa<T>, cast<T> and the llvm::dyn_cast<T>
I'm not touching PointerUnion::dyn_cast for now because it's a bit
complicated; we could blindly migrate it to dyn_cast_if_present, but
we should probably use dyn_cast when the operand is known to be
non-null.
Fixed the mismatched VNNI intrinsics argument types to align with the ISA.
VNNI intrinsics affected are:
VPDPBUSD[,S]_128/256/512, VPDPWSSD[,S]_128/256/512,
VPDPB[SS,SU,UU]D[,S]_128/256, VPDPW[SU,US,UU]D[,S]_128/256.
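For context, a scalar reference model of one 32-bit lane of the non-saturating VPDPBUSD instruction shows why the element types of the operands matter: one source is read as unsigned bytes and the other as signed bytes. This is a sketch based on the instruction's documented behavior, with a hypothetical function name; it does not reproduce the type changes made by this patch.

```cpp
#include <cassert>
#include <cstdint>

// One 32-bit lane of non-saturating VPDPBUSD:
// acc + sum over 4 bytes of (unsigned byte of a) * (signed byte of b).
constexpr std::int32_t vpdpbusd_lane(std::int32_t acc, std::uint32_t a,
                                     std::uint32_t b) {
    std::int64_t sum = acc;
    for (int i = 0; i < 4; ++i) {
        // Byte i of `a`, interpreted as unsigned (0..255).
        const std::int32_t ua = static_cast<std::int32_t>((a >> (8 * i)) & 0xFFu);
        // Byte i of `b`, sign-extended to -128..127.
        const std::int32_t sb =
            static_cast<std::int32_t>(((b >> (8 * i)) & 0xFFu) ^ 0x80u) - 0x80;
        sum += ua * sb;
    }
    // The non-saturating form wraps modulo 2^32 (this narrowing is guaranteed
    // to wrap since C++20 and does so on mainstream compilers before that).
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(sum));
}

// a bytes (low to high): 1, 2, 3, 4; b bytes: 3, 2, 1, -1.
// 10 + (1*3 + 2*2 + 3*1 + 4*(-1)) = 16.
static_assert(vpdpbusd_lane(10, 0x04030201u, 0xFF010203u) == 16, "mixed signs");

// The same bit pattern 0xFF means 255 in `a` but -1 in `b`.
static_assert(vpdpbusd_lane(0, 0x000000FFu, 0x000000FFu) == -255,
              "unsigned times signed");
```

Swapping the two byte operands changes the result whenever a byte has its top bit set, which is why the intrinsic signatures need to reflect the element types the ISA defines.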