
@BaiXilin
Contributor

@BaiXilin BaiXilin commented Jan 9, 2025

Fixed the mismatched VNNI intrinsics argument types to align with the ISA.

VNNI intrinsics affected are:
VPDPBUSD[,S]_128/256/512, VPDPWSSD[,S]_128/256/512,
VPDPB[SS,SU,UU]D[,S]_128/256, VPDPW[SU,US,UU]D[,S]_128/256.
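
For context, here is a minimal scalar sketch of what one 32-bit lane of the non-saturating VPDPBUSD computes according to the ISA: four unsigned bytes from the first source are multiplied by four signed bytes from the second source, and the products are added to the accumulator. This is an illustration only, not code from this PR; the function name `vpdpbusdLane` is hypothetical.

```cpp
#include <cstdint>

// Scalar model of one 32-bit lane of VPDPBUSD (non-saturating form).
// `a` holds four unsigned bytes, `b` holds four signed bytes.
// Hypothetical helper for illustration; not part of LLVM.
std::int32_t vpdpbusdLane(std::int32_t acc, std::uint32_t a, std::uint32_t b) {
  // Accumulate in unsigned arithmetic so overflow wraps instead of being UB.
  std::uint32_t sum = static_cast<std::uint32_t>(acc);
  for (int i = 0; i < 4; ++i) {
    // Byte i of `a`, zero-extended.
    std::int32_t ua = static_cast<std::uint8_t>(a >> (8 * i));
    // Byte i of `b`, sign-extended.
    std::int32_t sb =
        static_cast<std::int8_t>(static_cast<std::uint8_t>(b >> (8 * i)));
    sum += static_cast<std::uint32_t>(ua * sb);
  }
  return static_cast<std::int32_t>(sum);
}
```

Swapping the two byte operands changes the result, because the same bit pattern is read as unsigned in one position and as signed in the other, so the operands are not interchangeable.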

@github-actions

github-actions bot commented Jan 9, 2025

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

fhahn and others added 29 commits January 11, 2025 19:14
…lvm#120916)

Also use getPointerAlignment when trying to use alignment and
dereferenceable assumptions. This catches cases where dereferenceable is
known via the assumption but alignment is known via getPointerAlignment
(e.g. via argument attribute or align of 1)

PR: llvm#120916
llvm#122151 added this test with an
invalid SEW. Use a valid SEW here.
OpenACC data clause operations previously required that the variable
operand implemented PointerLikeType interface. This was a reasonable
constraint because the dialects currently mixed with `acc` do use
pointers to represent variables. However, this forces the "pointer"
abstraction to be exposed too early and some cases are not cleanly
representable through this approach (more specifically FIR's `fir.box`
abstraction).

Thus, relax this by allowing a variable to be a type which implements
either `PointerLikeType` interface or `MappableType` interface.
…uilds (llvm#120914)

The changes in llvm#87822 introduced a regression where Flang could no
longer be built standalone without explicitly specifying all of
LLVM_DIR, CLANG_DIR and MLIR_DIR. Restore the earlier logic that used
these paths as hints, and supported finding system-wide LLVM install via
default paths. Instead, make paths absolute after locating the packages,
using the paths CMake determined.

-----

@vzakhari, could you confirm that this doesn't break your use case?
…lvm#122316)

This is an NFC change. Duplicate the MC test file for gfx12 vop3c/vop3cx for
true16/fake16 mode and update it with the +real-true16/-real-true16 flags.

This is in preparation for the upcoming true16 changes.
The system call `__CELQTBCK()` is used to build a backtrace like
on other systems. The collected information is the address of the PC,
the address of the entry point (EP), the difference between both
addresses (+EP), the dynamic storage area (DSA aka the stack
pointer), and the function name.
The system call is described here:

https://www.ibm.com/docs/en/zos/3.1.0?topic=cwicsa6a-celqtbck-also-known-as-celqtbck-64-bit-traceback-service
…late offsets in bytes (llvm#121989)

There will be more changes coming in to `SemaHLSL::ActOnFinishBuffer` so
it would be good to move the packoffset validation out to a separate
function. This change also unifies the units for cbuffer offset
calculations to bytes.
In C++20, constexpr virtual functions are allowed. In C++17, although a
non-pure virtual function is not allowed to be constexpr, a pure virtual
function is allowed to be constexpr and may be overridden by a
non-constexpr virtual function in the derived class.

The following code compiles as C++:

```
class A
{
public:
    constexpr virtual int f() = 0;
};

class B : public A
{
public:
    int f() override
    {
        return 42;
    }
};
```

However, it fails to compile as CUDA or HIP code. The reason: A::f() is
implicitly a host device function whereas B::f() is a host function. Since
they have different targets, clang does not treat B::f() as an override
of A::f(). Instead, it treats B::f() as a name-hiding non-virtual
function for A::f(), and diagnoses it.

This causes any CUDA/HIP program using the C++ standard header
`<format>` from g++-13 to fail to compile, since this usage pattern shows
up there:

```
/usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/format:3564:34: error: non-virtual member function marked 'override' hides virtual member function
 3564 |       _M_format_arg(size_t __id) override
      |                                  ^
/usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/format:3538:30: note: hidden overloaded virtual function 'std::__format::_Scanner<char>::_M_format_arg' declared here
 3538 |       constexpr virtual void _M_format_arg(size_t __id) = 0;
      |                              ^
```

This is a serious issue and there is no workaround.

This patch allows a non-constexpr function to override a constexpr virtual
function for CUDA and HIP. This should be OK since a non-constexpr
function without an explicit host or device attribute can only be called in
host functions.

Fixes: SWDEV-507350
…21611)"

This reverts commit a6b7181.
Breaks Clang :: CodeGenHLSL/builtins/length.hlsl, see
llvm#121611 (comment)
…vm#122029)

Move the common case of FieldDecl::getFieldIndex() inline to mitigate
the cost of removing the extra `FieldNo` induction variable.

Also rename isNoUniqueAddress parameter to isNonVirtualBaseType, which
appears to be more accurate. I think the current name is just a
consequence of autocomplete gone wrong.
Need to sync the mask between cost and actual emission to avoid bugs in
mask calculation

Fixes llvm#122324
I’m seeing a series of errors when trying to run the cmake configure
step on macOS when the cmake generator is set to Xcode. All is well if I
use the Ninja or Unix Makefile generators. Messages are all of the form:
~~~
CMake Error at …llvm-project/clang/cmake/modules/AddClang.cmake:120
(target_compile_definitions):
  Cannot specify compile definitions for target "obj.clangBasic" which
  is not built by this project.
Call Stack (most recent call first):
  …llvm-project/clang/lib/Basic/CMakeLists.txt:57 (add_clang_library)
~~~
The remaining errors are similar but mention targets obj.clangAPINotes,
obj.clangLex, obj.clangParse, and so on.

The regression appears to have been introduced by commit 09fa2f0
(Oct 14 2024) which added the code in this area.

My proposed solution is simply to add a test to ensure that the obj.x
target exists before setting its compile definitions. There is precedent
doing just this in both clang/cmake/modules/AddClang.cmake and
clang/lib/support/CMakeLists.txt as well as in the “MSVC AND NOT
CLANG_LINK_CLANG_DYLIB” path immediately above the offending line.

I’ve also made a couple of grammatical tweaks in the comments
surrounding this code.

In case it's relevant, the cmake settings and definitions I've used to
trigger these errors are:
~~~bash
GENERATOR="Xcode"
OUTDIR=build_macos
cmake \
-S "$SCRIPT_DIR/llvm" \
-B "$SCRIPT_DIR/$OUTDIR" \
-G "$GENERATOR" \
-D CMAKE_BUILD_TYPE=Release \
-D CMAKE_OSX_ARCHITECTURES=arm64 \
-D LLVM_PARALLEL_LINK_JOBS=1 \
-D LLVM_ENABLE_PROJECTS="clang;lld" \
-D LLVM_TARGETS_TO_BUILD=RISCV \
-D LLVM_DEFAULT_TARGET_TRIPLE=riscv32-unknown-elf \
-D LLVM_OPTIMIZED_TABLEGEN=Yes
~~~
(cmake v3.31.1, Xcode 16.1. I know that not all of these variables are
useful for the Xcode generator!)

Co-authored-by: Paul Bowen-Huggett <[email protected]>
…llvm#122332)

The SEW operand for these instructions should have a value of 0. This
matches what was done for vcpop/vfirst.
…2286)

Don't suggest to comment-out the parameter name if the parameter has an
attribute that's spelled after the parameter name.

This prevents the parameter's attributes from being wrongly applied to
the parameter's type.

This fixes llvm#122191.
…lvm#122190)

The GPU ID operations already implement InferIntRangeInterface, which
gives constant lower and upper bounds on those IDs when appropriate
metadata is present on the operations or in the surrounding context.

This commit uses that existing code to implement the
ValueBoundsOpInterface, which is used when analyzing affine operations
(unlike the integer range interface, which is used for arithmetic
optimization).

It also implements the interface for gpu.launch, where we can use it to
express the constraint that block/grid sizes are equal to their value
from outside the launch op and that the corresponding IDs are bounded
above by that size.

As a consequence, the test pass for this inference is updated to work on
a FunctionOpInterface and not a func.func, creating minor churn in other
tests.
)

With this patch we switch from the temporary dummy seeds to actual seeds
provided by the seed collector.
The seeds get sliced and each slice is used as the starting point for
vectorization.
The test runs asynchronous kernels and depending on the timing the
output is slightly different. We now only check for the common parts of
the output.
Summary:
Previously we had some indirection here, this patch updates these
utilities to just be normal template functions. We use SFINAE to manage
the special case handling for floats. Also this strips address spaces so
it can be used more generally.
Summary:
Use a normal bitcast, remove from the shared utils since it's not
available in GCC 7.4
HerrCai0907 and others added 29 commits January 11, 2025 19:14
…`` when operand is integer literal for readability-use-std-min-max (llvm#122296)

When comparing with an integer literal, integer promotion converts an
operand whose type is narrower than int to int or unsigned int.
As a result, the auto-fix is correct but not what the user expects.

e.g.
```c++
short a;
if ( a > 10 )
  a = 10;
```
will be
```c++
short a;
if ( (int)a > 10 )
  a = (short)10;
```

which will be fixed as
```c++
short a;
a = std::min<int>(a, 10);
```

but actually it can be
```c++
short a;
a = std::min<short>(a, 10);
```

Fixed: llvm#121676
Mold prefers the suffix '$' for symbols like PLT and GOT entries, so
exclude these symbols as well. Otherwise, this test will fail for
developers using mold-linked Clang.

Closes llvm#76982
Skip function declarations for instrumentation.

Fixes llvm#122467
…lvm#122507)

Internal testing shows improvements in some SPEC HPC benchmarks with
this change.
… KnownBits

Under certain circumstances, lowering of other instructions can result in computeKnownBits being able to detect a constant that it couldn't previously.

Fixes llvm#122580
The inlining code for llvm funcs seems to have needlessly forbidden
inlining of private (e.g. non-cloning) symbols.
Fixes the test introduced in llvm#111145.

It would also make sense to throw an error when the user attempts to use
a move-from-sr on an unsupported architecture. Currently the encoder
generates garbage instructions for a 68000 because the AsmMatcher is
able to match the move against a MOV16rr
…e paths contain `..` (llvm#121323)

`makeAbsolute` does not normalize the path. When getting the parent folder,
a `..` component then resolves to the subfolder instead of the parent folder.
…vm#121350)

If we have a CSEL instruction that depends on the flags set by a
(SUBS x c) instruction and the true and/or false expression is
(add (add x y) -c), we can reassociate the latter expression to
(add (SUBS x c) y) and save one instruction.

Proof for the basic transformation: https://alive2.llvm.org/ce/z/-337Pb

We can extend this transformation for slightly different constants. For
example, if we have (add (add x y) -(c-1)) and the comparison x <u c,
we can transform the comparison to x <=u c-1 to eliminate the comparison
instruction, too. Similarly, we can transform (x == 0) to (x <u 1).

Proofs for the transformations that alter the constants:
https://alive2.llvm.org/ce/z/3nVqgR
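
The arithmetic behind these rewrites can be spot-checked with a small sketch in wrapping 32-bit unsigned arithmetic. This is only an illustration of the identities described above, not the backend implementation; all function names are hypothetical, and the comparison rewrite is assumed to apply only when c is nonzero.

```cpp
#include <cstdint>

// Hypothetical helpers; unsigned arithmetic wraps like the machine adds.

// (add (add x y) -c): the expression as originally written.
constexpr std::uint32_t addThenSubtract(std::uint32_t x, std::uint32_t y,
                                        std::uint32_t c) {
  return (x + y) + (0u - c);
}

// (add (sub x c) y): reassociated so the subtraction can be shared with
// the flag-setting SUBS.
constexpr std::uint32_t subtractThenAdd(std::uint32_t x, std::uint32_t y,
                                        std::uint32_t c) {
  return (x - c) + y;
}

// x <u c and its rewritten form x <=u c-1 (assumes c != 0).
constexpr bool lessThan(std::uint32_t x, std::uint32_t c) { return x < c; }
constexpr bool lessEqualPred(std::uint32_t x, std::uint32_t c) {
  return x <= c - 1u;
}

// Both forms agree even when the intermediate subtraction wraps.
static_assert(addThenSubtract(7u, 3u, 10u) == subtractThenAdd(7u, 3u, 10u),
              "reassociation preserves the wrapped result");
```

With c = 1, the comparison pair reduces to the (x == 0) versus (x <u 1) case mentioned above.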

Fixes llvm#119606.
With range and undef metadata on a call we can have vector AssertZext
generated on a target with no vector operations. The AssertZext needs to
scalarize to a normal `AssertZext tin, ValueType`. I have added
AssertSext too, although I do not have a test case.

Fixes llvm#110374
)

This adds a test line and updates a comment.
We want special handling for IGLP instructions in the scheduler but they
should still be treated like they have side effects by other passes. Add
a target hook to the ScheduleDAGInstrs DAG builder so that we have more
control over this.
Providing the character that we failed on is helpful for figuring out
what's going wrong in the tzdb.
The body of the loop only applies to wide induction recipes, so skip any other
header phi recipes up-front.
This patch fixes:

  llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:255:18: error: private
  field 'DAG' is not used [-Werror,-Wunused-private-field]
…m#122552)

- **[InstSimplify] Add tests for simplifying `(xor (sub C_Mask, X),
C_Mask)`; NFC**
- **[InstSimplify] Simplifying `(xor (sub C_Mask, X), C_Mask)` -> `X`**

Helps address regressions with folding `clz(Pow2)`.

Proof: https://alive2.llvm.org/ce/z/zGwUBp
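
As a rough sanity check of the fold, the sketch below verifies it exhaustively on 8-bit values. It assumes that `C_Mask` is a low-bit mask (of the form 2^k - 1) and that `X` has no bits set outside the mask; the exact preconditions the patch checks are not stated in this message, and the helper names are hypothetical.

```cpp
#include <cstdint>

// Computes (xor (sub C_Mask, X), C_Mask) on 8-bit values.
// Hypothetical helper for illustration only.
constexpr std::uint8_t xorSub(std::uint8_t cMask, std::uint8_t x) {
  return static_cast<std::uint8_t>((cMask - x) ^ cMask);
}

// Returns true if the expression equals X for every X in [0, cMask].
// Assumes cMask is a low-bit mask such as 0x0F or 0xFF.
constexpr bool foldHoldsForMask(std::uint8_t cMask) {
  for (unsigned x = 0; x <= cMask; ++x) {
    if (xorSub(cMask, static_cast<std::uint8_t>(x)) != x) {
      return false;
    }
  }
  return true;
}
```

Under these assumptions the subtraction never borrows, so `C_Mask - X` equals `C_Mask ^ X`, and the second xor recovers `X`. When `X` has a bit outside the mask, the identity no longer holds.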
Note that PointerUnion::{is,get} have been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

I'm not touching PointerUnion::dyn_cast for now because it's a bit
complicated; we could blindly migrate it to dyn_cast_if_present, but
we should probably use dyn_cast when the operand is known to be
non-null.
@BaiXilin BaiXilin closed this Jan 12, 2025