[AMDGPU]: Add support to unpack V_PK_MOV_B32. #163463
Closed
akadutta wants to merge 706 commits into llvm:main from akadutta:users/akadutta/amdgpu/unpack_v_pk_mov_b32
Conversation
…FrameRecord (llvm#156452) Previously, we would pop `FixedObject`-bytes after deallocating the SVE area, then again as part of the "AfterCSRPopSize". This could be seen in the tests `@f6` and `@f9`. This patch removes the erroneous pop, and refactors `CalleeSavesAboveFrameRecord` to reuse more of the existing GPR deallocation logic, which allows for post-decrements.
So far the translator only inlined expressions having no side effects, as rescheduling their evaluation doesn't break semantics. This patch adds inlining of expressions containing side effects if defined just before their use, e.g.,

```mlir
%c = emitc.expression %a, %b : (i32, !emitc.ptr<i32>) -> i32 {
  %e = emitc.sub %a, %b : (!emitc.ptr<i32>, i32) -> !emitc.ptr<i32>
  %d = emitc.apply "*"(%e) : (!emitc.ptr<i32>) -> i32
  emitc.yield %d : i32
}
emitc.return %c : i32
```

This restriction is meant to keep the translator as simple as possible, leaving it to transformations to analyze and reorder ops as needed in more complicated cases. The patch handles inlining into `emitc.return`, `emitc.if`, `emitc.switch` and (to some extent) `emitc.assign`.
…162804) Converting a vector float op into a vector int op may be non-profitable, especially for targets where the float op for a given type is legal, but the integer op is not. We could of course also try to address this via a reverse transform in the backend, but I don't think it's worth the bother, given that vectors were never the intended use case for this transform in the first place. Fixes llvm#162749.
…llvm#163060) Allow LLVMGetVolatile() to work with any kind of Instruction, rather than only memory instructions that accept a volatile flag. For instructions that can never be volatile, the function now returns false instead of asserting. This matches the behavior of `Instruction::isVolatile()` in the C++ API.
Add a `-print-debug-counter-queries` option which prints the current value of the counter and whether it is executed/skipped each time it is queried. This is useful when interleaving the output with the usual transform debug output, in order to find the correct counter value to use to hit a specific point in the transform.
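The mechanism can be illustrated with a generic sketch (hypothetical names; this is not LLVM's DebugCounter implementation): each query advances the counter and, when query printing is enabled, reports whether the guarded transform executes, so the printed values can be correlated with the transform's debug output.

```python
class DebugCounter:
    """Sketch of a debug counter: queries in [skip, skip + count) execute."""

    def __init__(self, skip=0, count=None, print_queries=False):
        self.skip = skip
        self.count = count
        self.print_queries = print_queries
        self.value = 0

    def should_execute(self):
        v = self.value
        self.value += 1
        executed = v >= self.skip and (
            self.count is None or v < self.skip + self.count)
        if self.print_queries:
            # Interleaves with the transform's own debug output, making it
            # easy to read off the counter value for a specific transform.
            print(f"counter query {v}: {'execute' if executed else 'skip'}")
        return executed
```

With `skip=1, count=1`, only the second query executes, which is the kind of pinpointing the option is meant to help with.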
…k) (llvm#162666) Similarly to llvm#152960, this PR fixes `getTiledOuterDims` for `linalg.pack` by ensuring that the `outer_dims_perm` attribute is properly taken into account. This enables the main change in this PR: relaxing the constraints in `DecomposeOuterUnitDimsPackOpPattern`. Specifically, the pattern is extended to allow non-unit untiled outer dimensions. For example:

```mlir
func.func @example(
    %src: tensor<2x32x16x8xf32>,
    %dest: tensor<2x1x16x8x32xf32>) -> tensor<2x1x16x8x32xf32> {
  %pack = linalg.pack %src
    inner_dims_pos = [1] inner_tiles = [32]
    into %dest : tensor<2x32x16x8xf32> -> tensor<2x1x16x8x32xf32>
  return %pack : tensor<2x1x16x8x32xf32>
}
```

decomposes as:

```mlir
func.func @example(
    %src: tensor<2x32x16x8xf32>,
    %dest: tensor<2x1x16x8x32xf32>) -> tensor<2x1x16x8x32xf32> {
  %0 = tensor.empty() : tensor<2x16x8x32xf32>
  %transposed = linalg.transpose
    ins(%src : tensor<2x32x16x8xf32>)
    outs(%0 : tensor<2x16x8x32xf32>)
    permutation = [0, 2, 3, 1]
  %inserted_slice = tensor.insert_slice %transposed into %dest
    [0, 0, 0, 0, 0] [2, 1, 16, 8, 32] [1, 1, 1, 1, 1]
    : tensor<2x16x8x32xf32> into tensor<2x1x16x8x32xf32>
  return %inserted_slice : tensor<2x1x16x8x32xf32>
}
```

Importantly, this change makes `DecomposeOuterUnitDimsPackOpPattern` (the decomposition pattern for `linalg.pack`) consistent with the corresponding pattern for `linalg.unpack`: `DecomposeOuterUnitDimsUnPackOpPattern`. One notable assumption remains: untiled outer dimensions are not permuted. This was already the case but is now explicitly documented.

Co-authored-by: Max Bartel <[email protected]>
…faces (llvm#162840) The current benchmarks test a very specific case, which makes them rather misleading. This adds new benchmarks so we have better coverage.
…9541) Cygwin builds are currently broken after llvm#157312, which effectively reverted llvm#138117. The root cause is that Cygwin defines `DL_info::dli_fname` as `char[N]`, which is not a valid parameter type for `llvm::format`. This patch allows `llvm::format` to accept `char[N]` by decaying it to `const char *`. As a result, string literals are also accepted without an explicit cast. Other array types remain rejected:

- Wide/unicode character arrays (e.g., `wchar_t[N]`) are not supported, as LLVM does not use them and they are less compatible with platforms' `printf` implementations.
- Non-character arrays (e.g., `int[N]`) are also rejected, since passing such arrays to `printf` is meaningless.
For ptrtoint(inttoptr) and ptrtoaddr(inttoptr), handle the case where the source and destination size do not match and convert to either zext or trunc. We can't do this if the middle size is smaller than both src/dest, because we'd have to perform an additional masking operation in that case. Most of these cases are handled by dint of ptrtoint/inttoptr size canonicalization (so I added some unit tests instead). However, the ptrtoaddr(inttoptr) case where the pointer size and address size differ is relevant, as in that case the mismatch in integer sizes is canonical.
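The size reasoning above can be sanity-checked with a small integer model (a sketch with illustrative names, not LLVM code): modeling each cast as masking to its target width, the chain folds to a plain zext/trunc exactly when the middle (pointer) width is not smaller than both the source and destination widths.

```python
def mask(v, bits):
    # Keep the low 'bits' bits, modeling truncation/zero-extension.
    return v & ((1 << bits) - 1)

def chain(v, src, mid, dst):
    # ptrtoint(inttoptr(v)): the value passes through the pointer width
    # 'mid' before reaching the destination width 'dst'.
    return mask(mask(mask(v, src), mid), dst)

def zext_or_trunc(v, src, dst):
    # The proposed fold: a single zero-extend or truncate, no middle step.
    return mask(mask(v, src), dst)

v = 0xDEADBEEFDEADBEEF
# Valid: the middle width is at least one of src/dst, so the fold is exact.
assert chain(v, 64, 32, 16) == zext_or_trunc(v, 64, 16)
assert chain(v, 16, 32, 64) == zext_or_trunc(v, 16, 64)
# Invalid: middle width smaller than both src and dst — the intermediate
# masking is observable, so an extra mask would be required.
assert chain(v, 64, 16, 32) != zext_or_trunc(v, 64, 32)
```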
7381558 renamed the FinalizeRequest type to InitializeRequest. This commit updates InitializeRequest variable names to follow suit ("FR"s become "IR"s).
Lower v4f32 and v2f64 fmuladd calls to relaxed_madd instructions. If we have FP16, then lower v8f16 fmuladds to FMA. I've introduced an ISD node for fmuladd to maintain the rounding ambiguity through legalization / combine / isel.
…56250) Before this PR, the native PDB plugin would create the following LLDB `Type` for `using SomeTypedef = long`:

```
Type{0x00002e03} , name = "SomeTypedef", size = 4, compiler_type = 0x000002becd8d8620 long
```

With this PR, the following is created:

```
Type{0x00002e03} , name = "SomeTypedef", size = 4, compiler_type = 0x0000024d6a7e3c90 typedef SomeTypedef
```

This matches the behavior of the DIA PDB plugin and works towards making [`Shell/SymbolFile/PDB/typedefs.test`](https://github.com/llvm/llvm-project/blob/main/lldb/test/Shell/SymbolFile/PDB/typedefs.test) pass with the native plugin. I added a similar test to the `NativePDB` shell tests to capture the current state, which doesn't quite match that of DIA yet. I'll add some comments on what's missing on this PR, because I'm not fully sure what the correct output would be.
…add SSE/AVX VPTEST/VTESTPD/VTESTPS intrinsics to be used in constexpr (llvm#160428) Fixes llvm#158653. Add handling for:

```
ptestz128 / ptestz256       → (a & b) == 0
ptestc128 / ptestc256       → (~a & b) == 0
ptestnzc128 / ptestnzc256   → (a & b) != 0 AND (~a & b) != 0
vtestzps / vtestzps256      → (S(a) & S(b)) == 0
vtestcps / vtestcps256      → (~S(a) & S(b)) == 0
vtestnzcps / vtestnzcps256  → (S(a) & S(b)) != 0 AND (~S(a) & S(b)) != 0
vtestzpd / vtestzpd256      → (S(a) & S(b)) == 0
vtestcpd / vtestcpd256      → (~S(a) & S(b)) == 0
vtestnzcpd / vtestnzcpd256  → (S(a) & S(b)) != 0 AND (~S(a) & S(b)) != 0
```

Add corresponding test cases for:

```
int _mm_test_all_ones (__m128i a)
int _mm_test_all_zeros (__m128i mask, __m128i a)
int _mm_test_mix_ones_zeros (__m128i mask, __m128i a)
int _mm_testc_pd (__m128d a, __m128d b)
int _mm256_testc_pd (__m256d a, __m256d b)
int _mm_testc_ps (__m128 a, __m128 b)
int _mm256_testc_ps (__m256 a, __m256 b)
int _mm_testc_si128 (__m128i a, __m128i b)
int _mm256_testc_si256 (__m256i a, __m256i b)
int _mm_testnzc_pd (__m128d a, __m128d b)
int _mm256_testnzc_pd (__m256d a, __m256d b)
int _mm_testnzc_ps (__m128 a, __m128 b)
int _mm256_testnzc_ps (__m256 a, __m256 b)
int _mm_testnzc_si128 (__m128i a, __m128i b)
int _mm256_testnzc_si256 (__m256i a, __m256i b)
int _mm_testz_pd (__m128d a, __m128d b)
int _mm256_testz_pd (__m256d a, __m256d b)
int _mm_testz_ps (__m128 a, __m128 b)
int _mm256_testz_ps (__m256 a, __m256 b)
int _mm_testz_si128 (__m128i a, __m128i b)
int _mm256_testz_si256 (__m256i a, __m256i b)
```
The compiler is missing cases where it checks for mips64r6 but not i6400/i6500, causing wrong defines to be generated.
Add a test with a non-uniform load of an argument (SCEVUnknown), showing that SCEVUnknown cannot always be considered uniform.
…lvm#162267) We can perform CSE on recipes that do not directly map to Instruction opcodes. One example is VPVectorPointerRecipe. Currently this is handled by supporting them in ::canHandle, but that means we return std::nullopt from getOpcodeOrIntrinsicID() for it. This currently only works because the only case where we return std::nullopt and perform CSE is VPVectorPointerRecipe. It breaks down if we support more such recipes, like VPPredInstPHIRecipe (llvm#162110). To fix this, return a custom opcode from getOpcodeOrIntrinsicID for recipes like VPVectorPointerRecipe, using the VPDefID after all regular instruction opcodes. PR: llvm#162267
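The failure mode is that two opcode-less recipe kinds would produce identical CSE keys. A generic sketch (hypothetical names, not the VPlan API) of reserving synthetic opcodes past the real opcode range:

```python
MAX_REAL_OPCODE = 1000  # illustrative: highest regular instruction opcode

# One synthetic opcode per opcode-less recipe kind, allocated after the
# regular opcode range so they can never collide with a real opcode.
SYNTHETIC_OPCODES = {
    "VectorPointer": MAX_REAL_OPCODE + 1,
    "PredInstPHI": MAX_REAL_OPCODE + 2,
}

table = {}

def cse(kind, opcode, operands, recipe):
    # If every opcode-less kind mapped to the same None sentinel, the keys
    # for distinct kinds would collide; the synthetic opcode keeps them apart.
    if opcode is None:
        opcode = SYNTHETIC_OPCODES[kind]
    key = (opcode, tuple(operands))
    return table.setdefault(key, recipe)
```

Two different opcode-less kinds with identical operands now get distinct keys, while repeated instances of the same kind still CSE to the first recipe.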
Additional test coverage for llvm#160500.
…vm#160330) Previously, using min_fetch/max_fetch atomics with floating-point types would crash LLVM. This patch updates the EmitPostAtomicMinMax function in CGAtomic.cpp to handle floating-point types. Included are a clang CodeGen test (atomic-ops-float-check-minmax.c) and a Sema test (atomic-ops-fp-minmax.c).
…ument (llvm#161883) Changes to the linalg `structured.fuse` transform op:

* Adds an optional `use_forall` boolean argument which generates a tiled `scf.forall` loop instead of `scf.for` loops.
* `tile_sizes` can now be any parameter or handle.
* `tile_interchange` can now be any parameter or handle.
* IR formatting changes from `transform.structured.fuse %0 [4, 8] ...` to `transform.structured.fuse %0 tile_sizes [4, 8] ...`
* Boolean arguments are now `UnitAttrs` and should be set via the op attr-dict: `{apply_cleanup, use_forall}`
This patch ports the ISD::SUB handling from SelectionDAG’s ComputeNumSignBits to GlobalISel. Related to llvm#150515. --------- Co-authored-by: Matt Arsenault <[email protected]> Co-authored-by: Simon Pilgrim <[email protected]>
As discussed in:

* https://discourse.llvm.org/t/rfc-linalg-forms/87994/
* https://discourse.llvm.org/t/rfc-extend-linalg-elemwise-named-ops-semantics/83927
* https://discourse.llvm.org/t/rfc-op-explosion-in-linalg/82863/1
* https://discourse.llvm.org/t/rfc-mlir-linalg-operation-tree/83586
* llvm#148424

Co-designed by Javed Absar

Co-authored-by: Andrzej Warzyński <[email protected]>
Reverts llvm#161355 Looks like I've broken some intrinsic code generation.
The purpose of this commit is to observe the effects of PR llvm#154069.
Only `ninf` should be used.
…lvm#163129) In the previous PR llvm#163123 I made a mistake that unexpectedly moved the "other functionality" section from the "[Providing Python bindings for a dialect](https://mlir.llvm.org/docs/Bindings/Python/#providing-python-bindings-for-a-dialect)" section to the newly-added section ([Extending MLIR in Python](https://mlir.llvm.org/docs/Bindings/Python/#extending-mlir-in-python)). This PR is to fix it.
Closing this PR. Missed a few things.
This is an extension of #157968. Adds support for V_PK_MOV_B32, removes a few redundant checks, and reduces MF scans for archs that do not support this optimization.