[AMDGPU]: Add support to unpack V_PK_MOV_B32. #163463
Closed
akadutta wants to merge 706 commits into llvm:main from akadutta:users/akadutta/amdgpu/unpack_v_pk_mov_b32
Conversation
…FrameRecord (llvm#156452) Previously, we would pop `FixedObject`-bytes after deallocating the SVE area, then again as part of the "AfterCSRPopSize". This could be seen in the tests `@f6` and `@f9`. This patch removes the erroneous pop, and refactors `CalleeSavesAboveFrameRecord` to reuse more of the existing GPR deallocation logic, which allows for post-decrements.
So far the translator only inlined expressions having no side effects, as rescheduling their evaluation doesn't break semantics. This patch adds inlining of expressions containing side effects if defined just before their use, e.g.,

```mlir
%c = emitc.expression %a, %b : (i32, !emitc.ptr<i32>) -> i32 {
  %e = emitc.sub %a, %b : (!emitc.ptr<i32>, i32) -> !emitc.ptr<i32>
  %d = emitc.apply "*"(%e) : (!emitc.ptr<i32>) -> i32
  emitc.yield %d : i32
}
emitc.return %c : i32
```

This restriction is meant to keep the translator as simple as possible, leaving it to transformations to analyze and reorder ops as needed in more complicated cases. The patch handles inlining into `emitc.return`, `emitc.if`, `emitc.switch` and (to some extent) `emitc.assign`.
…162804) Converting a vector float op into a vector int op may be non-profitable, especially for targets where the float op for a given type is legal, but the integer op is not. We could of course also try to address this via a reverse transform in the backend, but I don't think it's worth the bother, given that vectors were never the intended use case for this transform in the first place. Fixes llvm#162749.
…llvm#163060) Allow LLVMGetVolatile() to work with any kind of Instruction, rather than only memory instructions that accept a volatile flag. For instructions that can never be volatile, the function now returns false instead of asserting. This matches the behavior of `Instruction::isVolatile()` in the C++ API.
Add a `-print-debug-counter-queries` option which prints the current value of the counter and whether it is executed/skipped each time it is queried. This is useful when interleaving the output with the usual transform debug output, in order to find the correct counter value to use to hit a specific point in the transform.
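The mechanism can be illustrated with a generic sketch (hypothetical names; this is not LLVM's DebugCounter implementation): each query advances the counter and, when query printing is enabled, reports whether the guarded transform executes, so the printed values can be correlated with the transform's debug output.

```python
class DebugCounter:
    """Sketch of a debug counter: queries in [skip, skip + count) execute."""

    def __init__(self, skip=0, count=None, print_queries=False):
        self.skip = skip
        self.count = count
        self.print_queries = print_queries
        self.value = 0

    def should_execute(self):
        v = self.value
        self.value += 1
        executed = v >= self.skip and (
            self.count is None or v < self.skip + self.count)
        if self.print_queries:
            # Interleaves with the transform's own debug output, making it
            # easy to read off the counter value for a specific transform.
            print(f"counter query {v}: {'execute' if executed else 'skip'}")
        return executed
```

With `skip=1, count=1`, only the second query executes, which is the kind of pinpointing the option is meant to help with.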
…k) (llvm#162666) Similarly to llvm#152960, this PR fixes `getTiledOuterDims` for `linalg.pack` by ensuring that the `outer_dims_perm` attribute is properly taken into account. This enables the main change in this PR: relaxing the constraints in `DecomposeOuterUnitDimsPackOpPattern`. Specifically, the pattern is extended to allow non-unit untiled outer dimensions. For example:

```mlir
func.func @example(
    %src: tensor<2x32x16x8xf32>,
    %dest: tensor<2x1x16x8x32xf32>) -> tensor<2x1x16x8x32xf32> {
  %pack = linalg.pack %src
    inner_dims_pos = [1] inner_tiles = [32]
    into %dest : tensor<2x32x16x8xf32> -> tensor<2x1x16x8x32xf32>
  return %pack : tensor<2x1x16x8x32xf32>
}
```

decomposes as:

```mlir
func.func @example(
    %src: tensor<2x32x16x8xf32>,
    %dest: tensor<2x1x16x8x32xf32>) -> tensor<2x1x16x8x32xf32> {
  %0 = tensor.empty() : tensor<2x16x8x32xf32>
  %transposed = linalg.transpose
    ins(%src : tensor<2x32x16x8xf32>)
    outs(%0 : tensor<2x16x8x32xf32>)
    permutation = [0, 2, 3, 1]
  %inserted_slice = tensor.insert_slice %transposed into %dest
    [0, 0, 0, 0, 0] [2, 1, 16, 8, 32] [1, 1, 1, 1, 1]
    : tensor<2x16x8x32xf32> into tensor<2x1x16x8x32xf32>
  return %inserted_slice : tensor<2x1x16x8x32xf32>
}
```

Importantly, this change makes `DecomposeOuterUnitDimsPackOpPattern` (the decomposition pattern for `linalg.pack`) consistent with the corresponding pattern for `linalg.unpack`: `DecomposeOuterUnitDimsUnPackOpPattern`. One notable assumption remains: untiled outer dimensions are not permuted. This was already the case but is now explicitly documented.

Co-authored-by: Max Bartel <[email protected]>
…faces (llvm#162840) The current benchmarks test a very specific case, which makes them rather misleading. This adds new benchmarks so we have better coverage.
…9541) Cygwin builds are currently broken after llvm#157312, which effectively reverted llvm#138117. The root cause is that Cygwin defines `DL_info::dli_fname` as `char[N]`, which is not a valid parameter type for `llvm::format`. This patch allows `llvm::format` to accept `char[N]` by decaying it to `const char *`. As a result, string literals are also accepted without an explicit cast. Other array types remain rejected:

- Wide/unicode character arrays (e.g., `wchar_t[N]`) are not supported, as LLVM does not use them and they are less compatible with platforms' `printf` implementations.
- Non-character arrays (e.g., `int[N]`) are also rejected, since passing such arrays to `printf` is meaningless.
For ptrtoint(inttoptr) and ptrtoaddr(inttoptr), handle the case where the source and destination size do not match and convert to either zext or trunc. We can't do this if the middle size is smaller than both src/dest, because we'd have to perform an additional masking operation in that case. Most of these cases are handled by dint of ptrtoint/inttoptr size canonicalization (so I added some unit tests instead). However, the ptrtoaddr(inttoptr) case where the pointer size and address size differ is relevant, as in that case the mismatch in integer sizes is canonical.
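The size reasoning above can be sanity-checked with a small integer model (a sketch with illustrative names, not LLVM code): modeling each cast as masking to its target width, the chain folds to a plain zext/trunc exactly when the middle (pointer) width is not smaller than both the source and destination widths.

```python
def mask(v, bits):
    # Keep the low 'bits' bits, modeling truncation/zero-extension.
    return v & ((1 << bits) - 1)

def chain(v, src, mid, dst):
    # ptrtoint(inttoptr(v)): the value passes through the pointer width
    # 'mid' before reaching the destination width 'dst'.
    return mask(mask(mask(v, src), mid), dst)

def zext_or_trunc(v, src, dst):
    # The proposed fold: a single zero-extend or truncate, no middle step.
    return mask(mask(v, src), dst)

v = 0xDEADBEEFDEADBEEF
# Valid: the middle width is at least one of src/dst, so the fold is exact.
assert chain(v, 64, 32, 16) == zext_or_trunc(v, 64, 16)
assert chain(v, 16, 32, 64) == zext_or_trunc(v, 16, 64)
# Invalid: middle width smaller than both src and dst — the intermediate
# masking is observable, so an extra mask would be required.
assert chain(v, 64, 16, 32) != zext_or_trunc(v, 64, 32)
```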
7381558 renamed the FinalizeRequest type to InitializeRequest. This commit updates InitializeRequest variable names to follow suit ("FR"s become "IR"s).
Lower v4f32 and v2f64 fmuladd calls to relaxed_madd instructions. If we have FP16, then lower v8f16 fmuladds to FMA. I've introduced an ISD node for fmuladd to maintain the rounding ambiguity through legalization / combine / isel.
…56250) Before this PR, the native PDB plugin would create the following LLDB `Type` for `using SomeTypedef = long`:

```
Type{0x00002e03} , name = "SomeTypedef", size = 4, compiler_type = 0x000002becd8d8620 long
```

With this PR, the following is created:

```
Type{0x00002e03} , name = "SomeTypedef", size = 4, compiler_type = 0x0000024d6a7e3c90 typedef SomeTypedef
```

This matches the behavior of the DIA PDB plugin and works towards making [`Shell/SymbolFile/PDB/typedefs.test`](https://github.com/llvm/llvm-project/blob/main/lldb/test/Shell/SymbolFile/PDB/typedefs.test) pass with the native plugin. I added a similar test to the `NativePDB` shell tests to capture the current state, which doesn't quite match that of DIA yet. I'll add some comments on what's missing on this PR, because I'm not fully sure what the correct output would be.
…add SSE/AVX VPTEST/VTESTPD/VTESTPS intrinsics to be used in constexpr (llvm#160428) Fixes llvm#158653. Add handling for:

```
ptestz128 / ptestz256       → (a & b) == 0
ptestc128 / ptestc256       → (~a & b) == 0
ptestnzc128 / ptestnzc256   → (a & b) != 0 AND (~a & b) != 0
vtestzps / vtestzps256      → (S(a) & S(b)) == 0
vtestcps / vtestcps256      → (~S(a) & S(b)) == 0
vtestnzcps / vtestnzcps256  → (S(a) & S(b)) != 0 AND (~S(a) & S(b)) != 0
vtestzpd / vtestzpd256      → (S(a) & S(b)) == 0
vtestcpd / vtestcpd256      → (~S(a) & S(b)) == 0
vtestnzcpd / vtestnzcpd256  → (S(a) & S(b)) != 0 AND (~S(a) & S(b)) != 0
```

Add corresponding test cases for:

```
int _mm_test_all_ones (__m128i a)
int _mm_test_all_zeros (__m128i mask, __m128i a)
int _mm_test_mix_ones_zeros (__m128i mask, __m128i a)
int _mm_testc_pd (__m128d a, __m128d b)
int _mm256_testc_pd (__m256d a, __m256d b)
int _mm_testc_ps (__m128 a, __m128 b)
int _mm256_testc_ps (__m256 a, __m256 b)
int _mm_testc_si128 (__m128i a, __m128i b)
int _mm256_testc_si256 (__m256i a, __m256i b)
int _mm_testnzc_pd (__m128d a, __m128d b)
int _mm256_testnzc_pd (__m256d a, __m256d b)
int _mm_testnzc_ps (__m128 a, __m128 b)
int _mm256_testnzc_ps (__m256 a, __m256 b)
int _mm_testnzc_si128 (__m128i a, __m128i b)
int _mm256_testnzc_si256 (__m256i a, __m256i b)
int _mm_testz_pd (__m128d a, __m128d b)
int _mm256_testz_pd (__m256d a, __m256d b)
int _mm_testz_ps (__m128 a, __m128 b)
int _mm256_testz_ps (__m256 a, __m256 b)
int _mm_testz_si128 (__m128i a, __m128i b)
int _mm256_testz_si256 (__m256i a, __m256i b)
```
The compiler is missing cases where it checks for mips64r6 but not i6400/i6500, causing wrong defines to be generated.
Add a test with a non-uniform load of an argument (SCEVUnknown), showing that SCEVUnknown cannot always be considered uniform.
…lvm#162267) We can perform CSE on recipes that do not directly map to Instruction opcodes. One example is VPVectorPointerRecipe. Currently this is handled by supporting them in ::canHandle, but that means we return std::nullopt from getOpcodeOrIntrinsicID() for it. This currently only works because the only case where we return std::nullopt and perform CSE is VPVectorPointerRecipe. It breaks down if we support more such recipes, like VPPredInstPHIRecipe (llvm#162110). To fix this, return a custom opcode from getOpcodeOrIntrinsicID for recipes like VPVectorPointerRecipe, using the VPDefID after all regular instruction opcodes. PR: llvm#162267
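The failure mode is that two opcode-less recipe kinds would produce identical CSE keys. A generic sketch (hypothetical names, not the VPlan API) of reserving synthetic opcodes past the real opcode range:

```python
MAX_REAL_OPCODE = 1000  # illustrative: highest regular instruction opcode

# One synthetic opcode per opcode-less recipe kind, allocated after the
# regular opcode range so they can never collide with a real opcode.
SYNTHETIC_OPCODES = {
    "VectorPointer": MAX_REAL_OPCODE + 1,
    "PredInstPHI": MAX_REAL_OPCODE + 2,
}

table = {}

def cse(kind, opcode, operands, recipe):
    # If every opcode-less kind mapped to the same None sentinel, the keys
    # for distinct kinds would collide; the synthetic opcode keeps them apart.
    if opcode is None:
        opcode = SYNTHETIC_OPCODES[kind]
    key = (opcode, tuple(operands))
    return table.setdefault(key, recipe)
```

Two different opcode-less kinds with identical operands now get distinct keys, while repeated instances of the same kind still CSE to the first recipe.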
Additional test coverage for llvm#160500.
…vm#160330) Previously, using min_fetch/max_fetch atomics with floating-point types would crash LLVM. This patch updates the EmitPostAtomicMinMax function in CGAtomic.cpp to handle floating-point types. Included are a clang CodeGen test (atomic-ops-float-check-minmax.c) and a Sema test (atomic-ops-fp-minmax.c).
…ument (llvm#161883) Changes to the linalg `structured.fuse` transform op:

* Adds an optional `use_forall` boolean argument which generates a tiled `scf.forall` loop instead of `scf.for` loops.
* `tile_sizes` can now be any parameter or handle.
* `tile_interchange` can now be any parameter or handle.
* IR formatting changes from `transform.structured.fuse %0 [4, 8] ...` to `transform.structured.fuse %0 tile_sizes [4, 8] ...`
* Boolean arguments are now `UnitAttrs` and should be set via the op attr-dict: `{apply_cleanup, use_forall}`
This patch ports the ISD::SUB handling from SelectionDAG’s ComputeNumSignBits to GlobalISel. Related to llvm#150515. --------- Co-authored-by: Matt Arsenault <[email protected]> Co-authored-by: Simon Pilgrim <[email protected]>
As discussed in:

* https://discourse.llvm.org/t/rfc-linalg-forms/87994/
* https://discourse.llvm.org/t/rfc-extend-linalg-elemwise-named-ops-semantics/83927
* https://discourse.llvm.org/t/rfc-op-explosion-in-linalg/82863/1
* https://discourse.llvm.org/t/rfc-mlir-linalg-operation-tree/83586
* llvm#148424

Co-designed by Javed Absar

Co-authored-by: Andrzej Warzyński <[email protected]>
Reverts llvm#161355 Looks like I've broken some intrinsic code generation.
The purpose of this commit is to observe the effects of PR llvm#154069.
Only `ninf` should be used.
…lvm#163129) In the previous PR llvm#163123 I made a mistake that unexpectedly moved the "other functionality" section from the "[Providing Python bindings for a dialect](https://mlir.llvm.org/docs/Bindings/Python/#providing-python-bindings-for-a-dialect)" section to the newly-added section ([Extending MLIR in Python](https://mlir.llvm.org/docs/Bindings/Python/#extending-mlir-in-python)). This PR is to fix it.
Closing this PR. Missed a few things.
This is an extension of #157968. Adds support for V_PK_MOV_B32, removes a few redundant checks, and reduces MF scans for archs that do not support this optimization.