Skip to content

Conversation

akadutta
Copy link
Contributor

This is an extension of #157968 . Adds support for V_PK_MOV_B32. Removes a few redundant checks, and reduces MF scans for archs that do not support this optimization.

d0k and others added 30 commits October 14, 2025 09:49
…FrameRecord (llvm#156452)

Previously, we would pop `FixedObject`-bytes after deallocating the SVE
area, then again as part of the "AfterCSRPopSize". This could be seen in
the tests `@f6` and `@f9`.

This patch removes the erroneous pop, and refactors
`CalleeSavesAboveFrameRecord` to reuse more of the existing GPR
deallocation logic, which allows for post-decrements.
So far the translator only inlined expressions having no side effects,
as rescheduling their evaluation doesn't break semantics. This patch
adds inlining of expressions containing side effects if defined just
before their use, e.g.,

```mlir
    %c = emitc.expression %a, %b : (i32, !emitc.ptr<i32>) -> i32 {
    %e = emitc.sub %a, %b : (!emitc.ptr<i32>, i32) -> !emitc.ptr<i32>
    %d = emitc.apply "*"(%e) : (!emitc.ptr<i32>) -> i32
    emitc.yield %d : i32
  }
  emitc.return %c : i32
```

This restriction is meant to keep the translator as simple as possible,
leaving it to transformations to analyze and reorder ops as needed in
more complicated cases.
The patch handles inlining into `emitc.return`, `emitc.if`,
`emitc.switch` and (to some extent) `emitc.assign`.
…162804)

Converting a vector float op into a vector int op may be non-profitable,
especially for targets where the float op for a given type is legal, but
the integer op is not.

We could of course also try to address this via a reverse transform in
the backend, but I don't think it's worth the bother, given that vectors
were never the intended use case for this transform in the first place.

Fixes llvm#162749.
…llvm#163060)

Allow LLVMGetVolatile() to work with any kind of Instruction, rather
than only memory instructions that accept a volatile flag. For
instructions that can never be volatile, the function now return false
instead of asserting. This matches the behavior of
`Instruction::isVolatile()` in the C++ API.
Add a `-print-debug-counter-queries` option which prints the current
value of the counter and whether it is executed/skipped each time it is
queried. This is useful when interleaving the output with the usual
transform debug output, in order to find the correct counter value to
use to hit a specific point in the transform.
…k) (llvm#162666)

Similarly to llvm#152960, this PR fixes `getTiledOuterDims` for `linalg.pack` by
ensuring that the `outer_dims_perm` attributeis properly taken into account.

This enables the main change in this PR: relaxing the constraints in
  * `DecomposeOuterUnitDimsPackOpPattern`.

Specifically, the pattern is extended to allow non-unit untiled outer
dimensions. For example:
```mlir
func.func @example(
    %src: tensor<2x32x16x8xf32>,
    %dest: tensor<2x1x16x8x32xf32>) ->  tensor<2x1x16x8x32xf32> {

  %pack = linalg.pack %src
    inner_dims_pos = [1]
    inner_tiles = [32] into %dest
    : tensor<2x32x16x8xf32> -> tensor<2x1x16x8x32xf32>

  return %pack : tensor<2x1x16x8x32xf32>
}
```
decomposes as:
```mlir
func.func @example(
      %src: tensor<2x32x16x8xf32>,
      %dest: tensor<2x1x16x8x32xf32>) -> tensor<2x1x16x8x32xf32> {

  %0 = tensor.empty() : tensor<2x16x8x32xf32>
  %transposed = linalg.transpose
    ins(%src : tensor<2x32x16x8xf32>)
    outs(%init : tensor<2x16x8x32xf32>)
    permutation = [0, 2, 3, 1]
  %inserted_slice = tensor.insert_slice %transposed
    into %dest[0, 0, 0, 0, 0] [2, 1, 16, 8, 32] [1, 1, 1, 1, 1]
    : tensor<2x16x8x32xf32> into tensor<2x1x16x8x32xf32>

  return %inserted_slice : tensor<2x1x16x8x32xf32>
  }
```
Importantly, this change makes `DecomposeOuterUnitDimsPackOpPattern` (the
decomposition pattern for `linalg.pack`) consistent with the corresponding
pattern for `linalg.unpack`:
  * `DecomposeOuterUnitDimsUnPackOpPattern`.

One notable assumption remains: untiled outer dimensions are not permuted. This
was already the case but is now explicitly documented.

Co-authored by: Max Bartel <[email protected]>
…faces (llvm#162840)

The current benchmarks test a very specific case, which makes them
rather misleading. This adds new benchmarks so we have better coverage.
…9541)

Cygwin builds are currently broken after llvm#157312, which effectively
reverted llvm#138117. The root cause is that Cygwin defines
`DL_info::dli_fname` as `char[N]`, which is not a valid parameter type
for `llvm::format`.

This patch allows `llvm::format` to accept `char[N]` by decaying it to
`const char *`. As a result, string literals are also accepted without
an explicit cast.

Other array types remain rejected:
- Wide/unicode character arrays (e.g., `wchar_t[N]`) are not supported,
as LLVM does not use them and they are less compatible with platform's
`printf` implementations.
- Non-character arrays (e.g., `int[N]`) are also rejected, since passing
such arrays to `printf` is meaningless.
For ptrtoint(inttoptr) and ptrtoaddr(inttoptr), handle the case where
the source and destination size do not match and convert to either zext
or trunc. We can't do this if the middle size is smaller than both
src/dest, because we'd have to perform an additional masking operation
in that case.

Most of these cases are handled by dint of ptrtoint/inttoptr size
canonicalization (so I added some unit tests instead). However, the
ptrtoaddr(inttoptr) case where the pointer size and address size differ
is relevant, as in that case the mismatch in integer sizes is canonical.
7381558 renamed the FinalizeRequest type to InitializeRequest. This commit
updates InitializeRequest variable names to follow suit ("FR"s become "IR"s).
Lower v4f32 and v2f64 fmuladd calls to relaxed_madd instructions.
If we have FP16, then lower v8f16 fmuladds to FMA.

I've introduced an ISD node for fmuladd to maintain the rounding
ambiguity through legalization / combine / isel.
…56250)

Before this PR, the native PDB plugin would create the following LLDB
`Type` for `using SomeTypedef = long`:
```
Type{0x00002e03} , name = "SomeTypedef", size = 4, compiler_type = 0x000002becd8d8620 long
```
with this PR, the following is created:
```
Type{0x00002e03} , name = "SomeTypedef", size = 4, compiler_type = 0x0000024d6a7e3c90 typedef SomeTypedef
```

This matches the behavior of the DIA PDB plugin and works towards making
[`Shell/SymbolFile/PDB/typedefs.test`](https://github.com/llvm/llvm-project/blob/main/lldb/test/Shell/SymbolFile/PDB/typedefs.test)
pass with the native plugin.

I added a similar test to the `NativePDB` shell tests to capture the
current state, which doesn't quite match that of DIA yet. I'll add some
comments on what's missing on this PR, because I'm not fully sure what
the correct output would be.
…add SSE/AVX VPTEST/VTESTPD/VTESTPS intrinsics to be used in constexpr (llvm#160428)

Fix llvm#158653

Add handling for:
```
ptestz128 / ptestz256 → (a & b) == 0.
ptestc128 / ptestc256 → (~a & b) == 0
ptestnzc128 / ptestnzc256 → (a & b) != 0  AND  (~a & b) != 0.
vtestzps / vtestzps256 → (S(a) & S(b)) == 0.
vtestcps / vtestcps256 → (~S(a) & S(b)) == 0.
vtestnzcps / vtestnzcps256 → (S(a) & S(b)) != 0  AND  (~S(a) & S(b)) != 0.
vtestzpd / vtestzpd256 → (S(a) & S(b)) == 0.
vtestcpd / vtestcpd256 → (~S(a) & S(b)) == 0.
vtestnzcpd / vtestnzcpd256 → (S(a) & S(b)) != 0  AND  (~S(a) & S(b)) != 0.
```
Add corresponding test cases for:
```
int _mm_test_all_ones (__m128i a)
int _mm_test_all_zeros (__m128i mask, __m128i a)
int _mm_test_mix_ones_zeros (__m128i mask, __m128i a)
int _mm_testc_pd (__m128d a, __m128d b)
int _mm256_testc_pd (__m256d a, __m256d b)
int _mm_testc_ps (__m128 a, __m128 b)
int _mm256_testc_ps (__m256 a, __m256 b)
int _mm_testc_si128 (__m128i a, __m128i b)
int _mm256_testc_si256 (__m256i a, __m256i b)
int _mm_testnzc_pd (__m128d a, __m128d b)
int _mm256_testnzc_pd (__m256d a, __m256d b)
int _mm_testnzc_ps (__m128 a, __m128 b)
int _mm256_testnzc_ps (__m256 a, __m256 b)
int _mm_testnzc_si128 (__m128i a, __m128i b)
int _mm256_testnzc_si256 (__m256i a, __m256i b)
int _mm_testz_pd (__m128d a, __m128d b)
int _mm256_testz_pd (__m256d a, __m256d b)
int _mm_testz_ps (__m128 a, __m128 b)
int _mm256_testz_ps (__m256 a, __m256 b)
int _mm_testz_si128 (__m128i a, __m128i b)
int _mm256_testz_si256 (__m256i a, __m256i b)
```
The compiler is missing cases where it checks mips64r6 but not
i6400/i6500 causing wrong defines to be generated
Add a test with a non-uniform load of an argument (SCEVUnknown), showing
that SCEVUnknown cannot always be considered uniform.
…lvm#162267)

We can perform CSE on recipes that do not directly map to Instruction
opcodes. One example is VPVectorPointerRecipe. Currently this is handled
by supporting them in ::canHandle, but currently that means that we
return std::nullopt from getOpcodeOrIntrinsicID() for it. This currently
only works, because the only case we return std::nullopt and perform CSE
is VPVectorPointerRecipe. But that does not work if we support more such
recipes, like VPPredInstPHIRecipe
(llvm#162110).

To fix this, return a custom opcode from getOpcodeOrIntrinsicID for
recipes like VPVectorPointerRecipe, using the VPDefID after all regular
instruction opcodes.

PR: llvm#162267
…vm#160330)

Previously when using min_fetch/max_fetch atomics with floating point
types, LLVM would emit a crash. This patch updates the
EmitPostAtomicMinMax function in CGAtomic.cpp to take floating point
types. Included is a clang CodeGen test atomic-ops-float-check-minmax.c
and Sema test atomic-ops-fp-minmax.c.
…ument (llvm#161883)

Changes to linalg `structured.fuse` transform op:

* Adds an optional `use_forall` boolean argument which generates a tiled
  `scf.forall` loop instead of `scf.for` loops.
* `tile_sizes` can now be any parameter or handle.
* `tile_interchange` can now be any parameter or handle.
* IR formatting changes from `transform.structured.fuse %0 [4, 8] ...`
  to `transform.structured.fuse %0 tile_sizes [4, 8] ...`
- boolean arguments are now `UnitAttrs` and should be set via the op
  attr-dict: `{apply_cleanup, use_forall}`
This patch ports the ISD::SUB handling from SelectionDAG’s ComputeNumSignBits to GlobalISel.

Related to llvm#150515.

---------

Co-authored-by: Matt Arsenault <[email protected]>
Co-authored-by: Simon Pilgrim <[email protected]>
Reverts llvm#161355

Looks like I've broken some intrinsic code generation.
The purpose of this commit is to observe the effects of PR llvm#154069.
…lvm#163129)

In the previous PR llvm#163123 I made a mistake that unexpectedly moved the
"other functionality" section from the "[Providing Python bindings for a
dialect](https://mlir.llvm.org/docs/Bindings/Python/#providing-python-bindings-for-a-dialect)"
section to the newly-added section ([Extending MLIR in
Python](https://mlir.llvm.org/docs/Bindings/Python/#extending-mlir-in-python)).

This PR is to fix it.
@akadutta
Copy link
Contributor Author

Closing this PR. Missed a few things.

@akadutta akadutta closed this Oct 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.