[TTI] Introduce getInstructionUniformity API for flexible uniformity analysis #137639
base: main
Conversation
@llvm/pr-subscribers-backend-nvptx @llvm/pr-subscribers-backend-amdgpu

Author: Pankaj Dwivedi (PankajDwivedi-25)

Changes

This patch introduces a new target hook, getInstructionUniformity. Currently, UniformityAnalysis categorizes instructions into a fixed set of InstructionUniformity values (Default, AlwaysUniform, NeverUniform). This hook allows targets to override and implement custom uniformity-propagation rules for such cases.

Full diff: https://github.com/llvm/llvm-project/pull/137639.diff

7 Files Affected:
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index 022530dc846ea..9af5006ce9c6d 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -23,6 +23,7 @@
#include "llvm/ADT/APInt.h"
#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/Uniformity.h"
#include "llvm/Analysis/IVDescriptors.h"
#include "llvm/IR/FMF.h"
#include "llvm/IR/InstrTypes.h"
@@ -1916,6 +1917,13 @@ class TargetTransformInfo {
const Function &F,
SmallVectorImpl<std::pair<StringRef, int64_t>> &LB) const;
+ /// Target can implement more complex patterns for getting Uniformity of an
+  /// instruction. Currently Uniformity analysis categorizes instructions with a
+ /// fixed set of InstructionUniformity values: Default, AlwaysUniform and
+ /// NeverUniform.
+ std::optional<InstructionUniformity>
+ getInstructionUniformity(const Instruction &I) const;
+
private:
std::unique_ptr<const TargetTransformInfoImplBase> TTIImpl;
};
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
index 990252b1e5743..5bee462575181 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -1147,6 +1147,11 @@ class TargetTransformInfoImplBase {
const Function &F,
SmallVectorImpl<std::pair<StringRef, int64_t>> &LB) const {}
+ virtual std::optional<InstructionUniformity>
+ getInstructionUniformity(const Instruction &I) const {
+ return std::nullopt;
+ }
+
protected:
// Obtain the minimum required size to hold the value (without the sign)
// In case of a vector it returns the min required size for one element.
diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp
index 8548afea72964..50157a7714bf7 100644
--- a/llvm/lib/Analysis/TargetTransformInfo.cpp
+++ b/llvm/lib/Analysis/TargetTransformInfo.cpp
@@ -1476,6 +1476,11 @@ void TargetTransformInfo::collectKernelLaunchBounds(
return TTIImpl->collectKernelLaunchBounds(F, LB);
}
+std::optional<InstructionUniformity>
+TargetTransformInfo::getInstructionUniformity(const Instruction &I) const {
+ return TTIImpl->getInstructionUniformity(I);
+}
+
TargetTransformInfoImplBase::~TargetTransformInfoImplBase() = default;
TargetIRAnalysis::TargetIRAnalysis() : TTICallback(&getDefaultTTI) {}
diff --git a/llvm/lib/Analysis/UniformityAnalysis.cpp b/llvm/lib/Analysis/UniformityAnalysis.cpp
index 2101fdfacfc8f..2fc6f523139a7 100644
--- a/llvm/lib/Analysis/UniformityAnalysis.cpp
+++ b/llvm/lib/Analysis/UniformityAnalysis.cpp
@@ -35,7 +35,20 @@ template <> void llvm::GenericUniformityAnalysisImpl<SSAContext>::initialize() {
markDivergent(I);
else if (TTI->isAlwaysUniform(&I))
addUniformOverride(I);
+ else if (auto Uniformity = TTI->getInstructionUniformity(I)) {
+ switch (*Uniformity) {
+ case InstructionUniformity::AlwaysUniform:
+ addUniformOverride(I);
+ break;
+ case InstructionUniformity::NeverUniform:
+ markDivergent(I);
+ break;
+ case InstructionUniformity::Default:
+ break;
+ }
+ }
}
+
for (auto &Arg : F.args()) {
if (TTI->isSourceOfDivergence(&Arg)) {
markDivergent(&Arg);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
index 204d3df546bbf..5c59847dfeb62 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
@@ -1422,3 +1422,24 @@ void GCNTTIImpl::collectKernelLaunchBounds(
LB.push_back({"amdgpu-waves-per-eu[0]", WavesPerEU.first});
LB.push_back({"amdgpu-waves-per-eu[1]", WavesPerEU.second});
}
+
+std::optional<InstructionUniformity>
+GCNTTIImpl::getInstructionUniformity(const Instruction &I) const {
+ if (const auto *II = dyn_cast<IntrinsicInst>(&I)) {
+    // We can define custom rules for the intrinsic's uniformity, depending
+    // on its arguments.
+ switch (II->getIntrinsicID()) {
+ case Intrinsic::amdgcn_permlane64:
+ // If either operand is uniform, the result is uniform.
+ for (unsigned Arg_i = 0, NumArg = II->arg_size(); Arg_i < NumArg;
+ Arg_i++) {
+ if (!isSourceOfDivergence(II->getArgOperand(Arg_i)))
+ return InstructionUniformity::AlwaysUniform;
+ }
+ return InstructionUniformity::Default;
+ default:
+ break;
+ }
+ }
+ return std::nullopt;
+}
\ No newline at end of file
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
index f6f7bd4bfcf5b..bea0b024d745b 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
@@ -290,6 +290,8 @@ class GCNTTIImpl final : public BasicTTIImplBase<GCNTTIImpl> {
void collectKernelLaunchBounds(
const Function &F,
SmallVectorImpl<std::pair<StringRef, int64_t>> &LB) const override;
+ std::optional<InstructionUniformity>
+ getInstructionUniformity(const Instruction &I) const override;
};
} // end namespace llvm
diff --git a/llvm/test/Analysis/UniformityAnalysis/AMDGPU/uniform_intrinsic.ll b/llvm/test/Analysis/UniformityAnalysis/AMDGPU/uniform_intrinsic.ll
new file mode 100644
index 0000000000000..4bb89516b2e81
--- /dev/null
+++ b/llvm/test/Analysis/UniformityAnalysis/AMDGPU/uniform_intrinsic.ll
@@ -0,0 +1,25 @@
+; RUN: opt -mtriple amdgcn-unknown-amdhsa -passes='print<uniformity>' -disable-output %s 2>&1 | FileCheck %s
+
+; CHECK: ALL VALUES UNIFORM
+define amdgpu_kernel void @permlane64_constant(ptr addrspace(1) %out) {
+ %v = call i32 @llvm.amdgcn.permlane64(i32 7)
+ store i32 %v, ptr addrspace(1) %out
+ ret void
+}
+
+; CHECK: ALL VALUES UNIFORM
+define amdgpu_kernel void @permlane64_uniform(ptr addrspace(1) %out, i32 %src) {
+ %v = call i32 @llvm.amdgcn.permlane64(i32 %src)
+ store i32 %v, ptr addrspace(1) %out
+ ret void
+}
+
+; CHECK: DIVERGENT: %tid = call i32 @llvm.amdgcn.workitem.id.x()
+; CHECK: DIVERGENT: %v = call i32 @llvm.amdgcn.permlane64.i32(i32 %tid)
+define amdgpu_kernel void @permlane64_nonuniform(i32 addrspace(1)* %out) {
+ %tid = call i32 @llvm.amdgcn.workitem.id.x()
+ %v = call i32 @llvm.amdgcn.permlane64(i32 %tid)
+ %out_ptr = getelementptr i32, i32 addrspace(1)* %out, i32 %tid
+ store i32 %v, i32 addrspace(1)* %out_ptr
+ ret void
+}
arsenm left a comment
I don't know what this new hook gives that the existing ones do not handle already
I think this is supposed to address #131779 (but in its current form it does not).
Right, this is an initial patch. Right now it doesn't do anything more than what is already there. I was looking for two things here.
I can see that the target intrinsic handled by this hook is currently classified as divergent. For example, I have removed one such intrinsic from the list in the .td, and now the hook can classify it as uniform if one or more of its operands are uniform.
The way you have implemented this seems too restrictive. For example, llvm.amdgcn.permlane16 has extra operands. I really think we need a more flexible target hook that can implement any arbitrary logic for the uniformity of the result of an operation, based on the uniformity of each of its operands.
arsenm left a comment
Don't we already have special context dependent uniformity with the existing API? e.g. identifying math based on the work item ID
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
; RUN: opt -mtriple amdgcn-- -passes='print<uniformity>' -disable-output %s 2>&1 | FileCheck %s

; CHECK: ALL VALUES UNIFORM
These aren't always uniform, so we need test coverage for all the conditions that would make this divergent.
I don't get this. You mean pattern optimizations based on the work item ID? I don't see any such API existing at the IR level.
The only way I see to achieve this is by passing the uniformity of the operands to targets and letting them define the custom rule. My concern is that it would be a bit expensive, and the uniformity decision would be taken outside of UA. Let me know if you have any thoughts on how best this can be implemented.
Yes, we have some special known-uniform cases in GCNTTIImpl::isAlwaysUniform, but that code does not have access to the uniformity of the operands, so it cannot implement rules like "result is uniform if any operand is uniform".
Exactly, you would need some kind of target hook where the uniformity of the operands is passed in, or where the target hook can call back into UA to query the uniformity of operands.
I don't see why it would be too expensive. If it is, then we should just give up on trying to implement this feature.
We believe it will be expensive because there's a virtual function call happening every time the UA (re)visits an intrinsic call. The approach we are exploring now is more static: the target hook can return an enum encoding the "uniformity policy" of the intrinsic, which will be cached. Then the UA can interpret the policy every time it visits an intrinsic. That way, all the queries and decisions for uniformity stay within the UA implementation, and the virtual function is called only once when initializing the UA.
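To illustrate the caching idea, here is a minimal sketch; the CachedInstructionUniformity struct and its method names are hypothetical, not code from this PR, though it uses the getInstructionUniformity hook the patch adds:

#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/Uniformity.h"
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/InstIterator.h"

// Hypothetical cache: pay the virtual TTI call once per instruction during
// initialization, then interpret the cached policy on every later (re)visit
// without calling back into the target.
struct CachedInstructionUniformity {
  llvm::DenseMap<const llvm::Instruction *, llvm::InstructionUniformity> Map;

  void initialize(const llvm::Function &F,
                  const llvm::TargetTransformInfo &TTI) {
    for (const llvm::Instruction &I : llvm::instructions(F))
      if (auto U = TTI.getInstructionUniformity(I)) // single virtual call
        Map[&I] = *U;
  }

  // Visits during propagation only do a map lookup.
  llvm::InstructionUniformity lookup(const llvm::Instruction &I) const {
    auto It = Map.find(&I);
    return It == Map.end() ? llvm::InstructionUniformity::Default
                           : It->second;
  }
};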
Well, OK, but I think it needs to be at least flexible enough to handle cases like "result is uniform if either of the first two operands is uniform", for intrinsics like llvm.amdgcn.permlane16.
case InstructionUniformity::EitherOfFirstTwoOp:
  return !isDivergentUse(I.getOperand(0)) || !isDivergentUse(I.getOperand(1));
This API will work poorly for machine instrs. Operand 0 will typically be the def.
Yeah, right, I think Op_1 and Op_2 should be checked here.
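To make the operand-numbering concern concrete, here is a tiny illustrative helper; it is not part of the patch, and the IsDivergent callable stands in for the analysis' own divergence query:

#include "llvm/CodeGen/MachineInstr.h"

// Illustration only: for a MachineInstr the explicit def(s) come first, so an
// "either of the first two use operands" rule must skip past them rather than
// starting at operand 0 as the IR version does.
template <typename IsDivergentFn>
bool eitherOfFirstTwoUsesUniform(const llvm::MachineInstr &MI,
                                 IsDivergentFn IsDivergent) {
  unsigned FirstUse = MI.getDesc().getNumDefs(); // typically 1, not 0
  if (MI.getNumOperands() < FirstUse + 2)
    return false; // sketch: bail out if there are not enough operands
  return !IsDivergent(MI.getOperand(FirstUse)) ||
         !IsDivergent(MI.getOperand(FirstUse + 1));
}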
If you're adding this new hook, you should aim to remove the existing hooks isSourceOfDivergence and isAlwaysUniform, since the new one is strictly more capable. This could be done as a separate NFC refactoring.
I would strongly prefer removing the existing hooks as part of this patch. It just might expose rough corners that we need to consider in this design itself.
Got it, you guys are saying that instead of using those two, I should use the newly added getInstructionUniformity.
I would strongly prefer removing the existing hooks as part of this patch.
How about doing it as an NFC refactoring before this patch?
Removing those APIs will require their uses to be replaced with the new API, so the best approach would be to do it either in the current patch or after it.
Thank you for working on this. Sorry I just noticed this. Let me share my thinking. Currently, uniformity analysis works by identifying divergent sources (through isSourceOfDivergence) and propagating divergence from them. It would be easy to switch ... Does this make sense?
Thanks for your review. As per our design discussion, we want the uniformity decision to be made within the analysis pass itself rather than querying the target about it. For that purpose, during the initialization phase, we query the target and record the target-dependent uniformity for later divergence propagation. This approach will be flexible enough to handle any complex target-dependent uniformity, for both IR and MIR.
Yes, it can be seen that way as well; currently we try to prove it uniform. The new approach you are suggesting will look similar, except that in all those places we would be trying to prove divergent instead of uniform as in the current patch.
Sorry, I just noticed the target hook I suggested can only solve part of the problem... Let's skip it. Back to permlane16. The intrinsic is defined as ...
It’s the other way around: if any of the first two operands (old, src0) is uniform, the result is uniform. The result is divergent only if both operands are divergent.
You have chosen a very complicated example! permlane16 is like a shuffle where ... However, ...
Since the permlane16-like intrinsics only shuffle within/across a 16-lane-wide row ... Even the ...
Yes, agreed on the complexity of the uniformity of the intrinsic. Maybe this is less important in real cases.
Yes, you are right of course.
Co-authored-by: Matt Arsenault <[email protected]>
✅ With the latest revision this PR passed the C/C++ code formatter.
🐧 Linux x64 Test Results
nhaehnle left a comment
Unrelated to the generic uniformity analysis change, there seems to be a misunderstanding by everybody in this thread about how permlane16 works. Please read up on the instruction definition, y'all! Trying to rely on the uniformity of src1 and src2 is confused.
As for the uniformity analysis change itself, there are a number of ways this could go, but I think either way what we'd want is the ability to define more complex rules in the TTI itself, not via some ad-hoc enum. This could take the form of a callback like:
virtual bool isUniform(const Instruction &I, const SmallBitVector &UniformArgs);
... where UniformArgs has a bit per instruction operand indicating whether the method may assume it to be uniform or not.
Since this is an uncommon case, we may want to guard using this method by a fourth InstructionUniformity enum value called something like InstructionUniformity::Complex:
// If all operands are uniform, the result values are uniform. Otherwise, the result
// values may be divergent, and a custom check may be used to determine uniformity.
Custom,
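A hedged sketch of how the analysis side might drive such a callback; isCustomResultUniform, the IsDivergentUseFn parameter, and the TTI.isUniform(...) call all refer to the proposal above and are not existing LLVM APIs:

#include "llvm/ADT/SmallBitVector.h"
#include "llvm/IR/Instruction.h"

// Sketch: only instructions classified as InstructionUniformity::Custom would
// reach this path. One bit per operand tells the target which operands the
// analysis has already proven uniform.
template <typename TargetTTI, typename IsDivergentUseFn>
bool isCustomResultUniform(const TargetTTI &TTI, const llvm::Instruction &I,
                           IsDivergentUseFn IsDivergentUse) {
  llvm::SmallBitVector UniformArgs(I.getNumOperands());
  for (unsigned OpIdx = 0; OpIdx != I.getNumOperands(); ++OpIdx)
    if (!IsDivergentUse(I.getOperandUse(OpIdx)))
      UniformArgs.set(OpIdx); // set => target may assume this operand is uniform
  return TTI.isUniform(I, UniformArgs); // the proposed virtual hook
}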
llvm/include/llvm/ADT/Uniformity.h (Outdated)
/// Result value can be uniform if any of the first two use operands are
/// uniform.
AnyOfFirstTwoUseOp
That enum value seems like a really bad precedent. It's so arbitrary.
Thanks @nhaehnle for your feedback, I have addressed the changes. Please let me know if any other changes are required.
jayfoad left a comment
What happened to the idea of a preliminary NFC patch to introduce TTI::getInstructionUniformity as a replacement for isAlwaysUniform/isSourceOfDivergence?
// Operand 2: src1 (lane select within 16-lane group)
// Operand 3: src2 (which 16-lane group)
// Result is uniform if either src0 (op 1) or src1 (op 2) is uniform
As Nicolai pointed out, this is completely wrong, and permlane16 is the wrong example to use to demonstrate the new functionality.
As per our previous discussion, wouldn't it be better to make that part of this patch, so it catches any edge cases?
I don't understand that argument. I'd much prefer to see it as a separate patch.
Let me do it as a separate patch, then. I would prefer keeping it as a wrapper on top of isAlwaysUniform and isSourceOfDivergence.
See: #131779
This patch introduces a new getInstructionUniformity() API to TargetTransformInfo that provides more flexibility for targets to classify instruction uniformity. The new API uses an InstructionUniformity enum that can express values such as Default, AlwaysUniform, NeverUniform and AnyOfFirstTwoUseOp.

The old isSourceOfDivergence() and isAlwaysUniform() APIs are kept unchanged for backward compatibility. The new API delegates to these old APIs, providing an incremental migration path. UniformityAnalysis is updated to exclusively use the new API.

This change enables more precise divergence analysis for AMDGPU's permlane16 and permlanex16 intrinsics, which can produce uniform results when their operands are uniform, despite previously being marked as unconditionally divergent. These intrinsics are removed from the divergence table and now return AnyOfFirstTwoUseOp, allowing the uniformity analysis to track their actual uniformity based on their operands.

For NVPTX, the new API simply wraps the existing isSourceOfDivergence behavior. All existing lit tests pass with this change.
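For reference, a minimal sketch of what such a delegating NVPTX override might look like; the body below is illustrative only and may differ from the exact code in the PR:

// Sketch of the incremental migration path described above: a target override
// that simply maps the existing divergent-source classification onto the new
// enum, so behavior is unchanged until the target adds more precise rules.
std::optional<InstructionUniformity>
NVPTXTTIImpl::getInstructionUniformity(const Instruction &I) const {
  if (isSourceOfDivergence(&I))
    return InstructionUniformity::NeverUniform;
  return std::nullopt; // let the analysis apply its default rules
}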