[AMDGPU][TTI] Add target hook for the custom instruction uniformity #137639

PankajDwivedi-25 · 2025-04-28T14:29:01Z

This patch introduces a new target hook isSpecialUniformIntrinsic() in TargetTransformInfo, enabling targets to describe more complex relationships between operand uniformity and instruction uniformity.

Currently, all users of divergent values are conservatively marked as divergent; instead, a target can override the hook isSpecialUniformIntrinsic() to capture operand-dependent uniformity.

llvmbot · 2025-04-28T14:29:36Z

@llvm/pr-subscribers-llvm-adt
@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-backend-amdgpu

Author: Pankaj Dwivedi (PankajDwivedi-25)

Changes

This patch introduces a new target hook getInstructionUniformity(const Instruction &I) in TargetTransformInfo, enabling targets to describe more complex relationships between operand uniformity and instruction uniformity.

Currently, UniformityAnalysis categorizes instructions into a fixed set of InstructionUniformity values (Default, AlwaysUniform, NeverUniform).
However, some instructions, particularly intrinsics, have operand-dependent uniformity behaviors that are not easily captured within this framework.

This hook allows targets to override and implement custom uniformity-propagation rules for such cases.

Full diff: https://github.com/llvm/llvm-project/pull/137639.diff

7 Files Affected:

(modified) llvm/include/llvm/Analysis/TargetTransformInfo.h (+8)
(modified) llvm/include/llvm/Analysis/TargetTransformInfoImpl.h (+5)
(modified) llvm/lib/Analysis/TargetTransformInfo.cpp (+5)
(modified) llvm/lib/Analysis/UniformityAnalysis.cpp (+13)
(modified) llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp (+21)
(modified) llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h (+2)
(added) llvm/test/Analysis/UniformityAnalysis/AMDGPU/uniform_intrinsic.ll (+25)

diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index 022530dc846ea..9af5006ce9c6d 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -23,6 +23,7 @@
 
 #include "llvm/ADT/APInt.h"
 #include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/Uniformity.h"
 #include "llvm/Analysis/IVDescriptors.h"
 #include "llvm/IR/FMF.h"
 #include "llvm/IR/InstrTypes.h"
@@ -1916,6 +1917,13 @@ class TargetTransformInfo {
       const Function &F,
       SmallVectorImpl<std::pair<StringRef, int64_t>> &LB) const;
 
+  /// Target can implement more complex patterns for getting Uniformity of an
+  /// instruction.Currently Uniformity analysis catagorises instructions with a
+  /// fixed set of InstructionUniformity values: Default, AlwaysUniform and
+  /// NeverUniform.
+  std::optional<InstructionUniformity>
+  getInstructionUniformity(const Instruction &I) const;
+
 private:
   std::unique_ptr<const TargetTransformInfoImplBase> TTIImpl;
 };
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
index 990252b1e5743..5bee462575181 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -1147,6 +1147,11 @@ class TargetTransformInfoImplBase {
       const Function &F,
       SmallVectorImpl<std::pair<StringRef, int64_t>> &LB) const {}
 
+  virtual std::optional<InstructionUniformity>
+  getInstructionUniformity(const Instruction &I) const {
+    return std::nullopt;
+  }
+
 protected:
   // Obtain the minimum required size to hold the value (without the sign)
   // In case of a vector it returns the min required size for one element.
diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp
index 8548afea72964..50157a7714bf7 100644
--- a/llvm/lib/Analysis/TargetTransformInfo.cpp
+++ b/llvm/lib/Analysis/TargetTransformInfo.cpp
@@ -1476,6 +1476,11 @@ void TargetTransformInfo::collectKernelLaunchBounds(
   return TTIImpl->collectKernelLaunchBounds(F, LB);
 }
 
+std::optional<InstructionUniformity>
+TargetTransformInfo::getInstructionUniformity(const Instruction &I) const {
+  return TTIImpl->getInstructionUniformity(I);
+}
+
 TargetTransformInfoImplBase::~TargetTransformInfoImplBase() = default;
 
 TargetIRAnalysis::TargetIRAnalysis() : TTICallback(&getDefaultTTI) {}
diff --git a/llvm/lib/Analysis/UniformityAnalysis.cpp b/llvm/lib/Analysis/UniformityAnalysis.cpp
index 2101fdfacfc8f..2fc6f523139a7 100644
--- a/llvm/lib/Analysis/UniformityAnalysis.cpp
+++ b/llvm/lib/Analysis/UniformityAnalysis.cpp
@@ -35,7 +35,20 @@ template <> void llvm::GenericUniformityAnalysisImpl<SSAContext>::initialize() {
       markDivergent(I);
     else if (TTI->isAlwaysUniform(&I))
       addUniformOverride(I);
+    else if (auto Uniformity = TTI->getInstructionUniformity(I)) {
+      switch (*Uniformity) {
+      case InstructionUniformity::AlwaysUniform:
+        addUniformOverride(I);
+        break;
+      case InstructionUniformity::NeverUniform:
+        markDivergent(I);
+        break;
+      case InstructionUniformity::Default:
+        break;
+      }
+    }
   }
+
   for (auto &Arg : F.args()) {
     if (TTI->isSourceOfDivergence(&Arg)) {
       markDivergent(&Arg);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
index 204d3df546bbf..5c59847dfeb62 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
@@ -1422,3 +1422,24 @@ void GCNTTIImpl::collectKernelLaunchBounds(
   LB.push_back({"amdgpu-waves-per-eu[0]", WavesPerEU.first});
   LB.push_back({"amdgpu-waves-per-eu[1]", WavesPerEU.second});
 }
+
+std::optional<InstructionUniformity>
+GCNTTIImpl::getInstructionUniformity(const Instruction &I) const {
+  if (const auto *II = dyn_cast<IntrinsicInst>(&I)) {
+    // We can define the custom rules for the intrinsics uniformity, depending
+    // on argument.
+    switch (II->getIntrinsicID()) {
+    case Intrinsic::amdgcn_permlane64:
+      // If either operand is uniform, the result is uniform.
+      for (unsigned Arg_i = 0, NumArg = II->arg_size(); Arg_i < NumArg;
+           Arg_i++) {
+        if (!isSourceOfDivergence(II->getArgOperand(Arg_i)))
+          return InstructionUniformity::AlwaysUniform;
+      }
+      return InstructionUniformity::Default;
+    default:
+      break;
+    }
+  }
+  return std::nullopt;
+}
\ No newline at end of file
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
index f6f7bd4bfcf5b..bea0b024d745b 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
@@ -290,6 +290,8 @@ class GCNTTIImpl final : public BasicTTIImplBase<GCNTTIImpl> {
   void collectKernelLaunchBounds(
       const Function &F,
       SmallVectorImpl<std::pair<StringRef, int64_t>> &LB) const override;
+  std::optional<InstructionUniformity>
+  getInstructionUniformity(const Instruction &I) const override;
 };
 
 } // end namespace llvm
diff --git a/llvm/test/Analysis/UniformityAnalysis/AMDGPU/uniform_intrinsic.ll b/llvm/test/Analysis/UniformityAnalysis/AMDGPU/uniform_intrinsic.ll
new file mode 100644
index 0000000000000..4bb89516b2e81
--- /dev/null
+++ b/llvm/test/Analysis/UniformityAnalysis/AMDGPU/uniform_intrinsic.ll
@@ -0,0 +1,25 @@
+; RUN: opt -mtriple amdgcn-unknown-amdhsa -passes='print<uniformity>' -disable-output %s 2>&1 | FileCheck %s
+
+; CHECK: ALL VALUES UNIFORM
+define amdgpu_kernel void @permlane64_constant(ptr addrspace(1) %out) {
+  %v = call i32 @llvm.amdgcn.permlane64(i32 7)
+  store i32 %v, ptr addrspace(1) %out
+  ret void
+}
+
+; CHECK: ALL VALUES UNIFORM
+define amdgpu_kernel void @permlane64_uniform(ptr addrspace(1) %out, i32 %src) {
+  %v = call i32 @llvm.amdgcn.permlane64(i32 %src)
+  store i32 %v, ptr addrspace(1) %out
+  ret void
+}
+
+; CHECK: DIVERGENT: %tid = call i32 @llvm.amdgcn.workitem.id.x()
+; CHECK: DIVERGENT: %v = call i32 @llvm.amdgcn.permlane64.i32(i32 %tid)
+define amdgpu_kernel void @permlane64_nonuniform(i32 addrspace(1)* %out) {
+  %tid = call i32 @llvm.amdgcn.workitem.id.x()
+  %v = call i32 @llvm.amdgcn.permlane64(i32 %tid)
+  %out_ptr = getelementptr i32, i32 addrspace(1)* %out, i32 %tid
+  store i32 %v, i32 addrspace(1)* %out_ptr
+  ret void
+}

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp

arsenm

I don't know what this new hook gives that the existing ones do not handle already

jayfoad · 2025-04-29T14:00:59Z

> I don't know what this new hook gives that the existing ones do not handle already

I think this is supposed to address #131779 (but in its current form it does not).

PankajDwivedi-25 · 2025-04-30T04:18:20Z

Right, this is an initial patch.

Right now it doesn't do anything more than what is already there.

I was looking for two things here:

- The list of intrinsics whose uniformity depends on their operands.
- How to get the operand uniformity at this point.

PankajDwivedi-25 · 2025-09-19T11:57:38Z

The intrinsics targeted by this hook are currently classified as sourceOfDivergence in the .td file, so they are marked divergent during the initialization phase of the uniformity analysis itself.

As an example, I have removed one such intrinsic from that list in the .td file, and the hook can now classify it as uniform if one or more of its operands are uniform.

ssahasra · 2025-09-22T06:07:06Z

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp

+      return InstructionUniformity::Default;
+
+    if (CalledFunc->getIntrinsicID() == Intrinsic::amdgcn_permlane16) {
+      // Check if any operand is uniform.


But this code does not seem to check any operand. It only checks the new InstructionUniformity.

ssahasra · 2025-09-22T06:10:05Z

llvm/lib/Analysis/UniformityAnalysis.cpp

+    if (!Uniformity || *Uniformity == InstructionUniformity::Default)
+      markDivergent(*UserInstr); // fallback: conservative
+    else if (*Uniformity == InstructionUniformity::Uniform)
+      addUniformOverride(*UserInstr);


No one checked if the relevant operand is uniform. How did we conclude that the result is uniform?

ssahasra · 2025-09-22T06:11:24Z

llvm/include/llvm/ADT/Uniformity.h

-  NeverUniform
+  NeverUniform,
+
+  /// The result value is uniform because one or more of its operand are


This seems to be saying: "The result is uniform if any operand is uniform". That is unlikely to be true for all intrinsics. We need statements of the form "The result is uniform if operand N is uniform".

jayfoad · 2025-09-24T14:34:04Z

> This patch introduces a new target hook isSpecialUniformIntrinsic() in TargetTransformInfo, enabling targets to describe more complex relationships between operand uniformity and instruction uniformity.
>
> Currently, all users of divergent values are conservatively marked as divergent; instead, a target can override the hook isSpecialUniformIntrinsic() to capture operand-dependent uniformity.

The way you have implemented this, isSpecialUniformIntrinsic just identifies intrinsics that have yet another fixed way of calculating their uniformity: they are uniform if any of their operands are uniform. This is useful for some basic permute-style operations but really not flexible enough to be more generally useful.

For example, llvm.amdgcn.permlane16 has extra fetch_invalid and bound_control operands, so the "result is uniform if any operand is uniform" rule does not work for it! It needs to be something like "result is uniform if either of the first two operands are uniform (possibly with some more special cases depending on the values of the extra operands)".

I really think we need a more flexible target hook that can implement any arbitrary logic for the uniformity of the result of an operation, based on the uniformity of each of its operands.
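As a rough illustration of the kind of hook described in this comment, here is a minimal standalone C++ sketch in which a rule receives the uniformity of each operand and decides the uniformity of the result. Every name in it (`SketchIntrinsic`, `SketchVerdict`, `anyUniformInPrefix`, `sketchResultUniformity`) is hypothetical and does not come from the patch or from LLVM. The permlane16 case only encodes the "either of the first two operands" rule quoted above and ignores the possible special cases for the extra operands; the operand lists are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-ins; none of these names exist in LLVM or in this patch.
enum class SketchIntrinsic { Permlane64, Permlane16, Other };

// true = result is uniform, false = result is divergent,
// std::nullopt = no custom rule, the analysis should use its default handling.
using SketchVerdict = std::optional<bool>;

// Returns true if any of the first Count operands is marked uniform.
inline bool anyUniformInPrefix(const std::vector<bool> &OperandIsUniform,
                               std::size_t Count) {
  assert(Count <= OperandIsUniform.size() && "prefix longer than operand list");
  for (std::size_t I = 0; I < Count; ++I)
    if (OperandIsUniform[I])
      return true;
  return false;
}

// Per-intrinsic rule: the caller passes in the uniformity of every operand.
inline SketchVerdict
sketchResultUniformity(SketchIntrinsic ID,
                       const std::vector<bool> &OperandIsUniform) {
  switch (ID) {
  case SketchIntrinsic::Permlane64:
    // Assumption: a single data operand, as in the test file of this patch.
    return anyUniformInPrefix(OperandIsUniform, 1);
  case SketchIntrinsic::Permlane16:
    // Rule quoted in the review: only the first two operands are considered.
    return anyUniformInPrefix(OperandIsUniform, 2);
  case SketchIntrinsic::Other:
    return std::nullopt;
  }
  return std::nullopt;
}
```

With a rule of this shape, a uniform value in a trailing control operand alone does not make the permlane16 result uniform, which is the case the blanket "any operand" rule gets wrong.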

arsenm

Don't we already have special context dependent uniformity with the existing API? e.g. identifying math based on the work item ID

arsenm · 2025-09-25T06:47:39Z

llvm/include/llvm/ADT/GenericUniformityImpl.h

+  /// \brief keep track of special target intrinsics that can be proven uniform.
+  void addSpecialUniformIntrinsic(const InstructionT &Instr);


It's not clear to me what 'special' means. Is there a better name?

Here, "special" means that the intrinsic can be uniform even though not all of its operands are uniform.

Right, I am also not convinced by this name. I will rename it later, based on the refined functionality.

arsenm · 2025-09-25T06:48:47Z

llvm/test/Analysis/UniformityAnalysis/AMDGPU/uniform_intrinsic.ll

+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -mtriple amdgcn-- -passes='print<uniformity>' -disable-output %s 2>&1 | FileCheck %s
+
+; CHECK: ALL VALUES UNIFORM


These aren't always uniform, so this needs test coverage for all the conditions that would make the result divergent.

arsenm · 2025-09-25T06:50:10Z

llvm/lib/Analysis/UniformityAnalysis.cpp

+      if (SpecialUniformIntrinsics.count(UserInstr) &&
+          isAnyOperandUniform(*UserInstr)) {


This is too specific of a "specially uniform" check. I'd expect this operand validation to be an operation dependent property

Yeah, I think it should be as @jayfoad suggested: I should probably let the target decide this. The only concern is that it would be costlier, and the uniformity decision would be made outside of UA, which I am not sure is acceptable.

arsenm · 2025-09-25T06:50:26Z

llvm/lib/Analysis/UniformityAnalysis.cpp

+template <>
+bool llvm::GenericUniformityAnalysisImpl<SSAContext>::isDivergentUse(
+    const Use &U) const {
+  const auto *V = U.get();


PankajDwivedi-25 · 2025-09-25T08:09:36Z

> Don't we already have special context dependent uniformity with the existing API? e.g. identifying math based on the work item ID

I don't get this. Do you mean pattern optimizations based on the work item ID?

I don't see any such API at the IR level.

PankajDwivedi-25 · 2025-09-25T08:33:28Z

> I really think we need a more flexible target hook that can implement any arbitrary logic for the uniformity of the result of an operation, based on the uniformity of each of its operands.

The only way I see to achieve this is by passing the Uniformity of the operands to Targets and letting them define the custom rule.

My concern is that it would be a bit expensive, also the decision of uniformity will be taken outside of UA.

Let me know if you have any thoughts on how best to implement this.

jayfoad · 2025-09-25T09:21:21Z

> Don't we already have special context dependent uniformity with the existing API? e.g. identifying math based on the work item ID

Yes we have some special known-uniform cases in GCNTTIImpl::isAlwaysUniform, but that code does not have access to the uniformity of the operands, so it cannot implement rules like "result is uniform if any operand is uniform".

jayfoad · 2025-09-25T09:22:55Z

> The only way I see to achieve this is by passing the Uniformity of the operands to Targets and letting them define the custom rule.

Exactly, you would need some kind of target hook where the uniformity of the operands is passed in, or where the target hook can call back into UA to query the uniformity of operands.

> My concern is that it would be a bit expensive, also the decision of uniformity will be taken outside of UA.

I don't see why it would be too expensive. If it is then we should just give up on trying to implement this feature.

ssahasra · 2025-10-06T02:24:42Z

> > The only way I see to achieve this is by passing the Uniformity of the operands to Targets and letting them define the custom rule.
>
> Exactly, you would need some kind of target hook where the uniformity of the operands is passed in, or where the target hook can call back into UA to query the uniformity of operands.
>
> > My concern is that it would be a bit expensive, also the decision of uniformity will be taken outside of UA.
>
> I don't see why it would be too expensive. If it is then we should just give up on trying to implement this feature.

We believe it will be expensive because there's a virtual function call happening every time the UA (re)visits an intrinsic call. The approach we are exploring now is more static ... the target hook can return an enum encoding the "uniformity policy" of the intrinsic, which will be cached. Then the UA can interpret the policy every time it visits an intrinsic. That way, all the queries and decisions taken for uniformity stay within the UA implementation, and the virtual function is called only once when initializing the UA.
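The "static" approach described in this comment could look roughly like the standalone C++ sketch below: the target hook is queried once per instruction during initialization, the returned policy is cached, and every later visit only interprets the cached value. All names (`SketchPolicy`, `SketchOpcode`, `SketchTarget`, `SketchAnalysis`, `interpretPolicy`, `hookCallsAfterRevisits`) are hypothetical, and the set of policies is an assumption made for illustration rather than the design that was eventually adopted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// All names below are hypothetical stand-ins, not LLVM or patch APIs.
enum class SketchPolicy {
  Default,                    // uniform only if every operand is uniform
  AlwaysUniform,
  NeverUniform,
  UniformIfAnyOperandUniform, // e.g. a basic permute-style operation
};

enum class SketchOpcode { PermuteLike, Plain };

// Stand-in for the target; counts how often the (conceptually virtual) hook runs.
struct SketchTarget {
  int HookCalls = 0;
  SketchPolicy getUniformityPolicy(SketchOpcode Op) {
    ++HookCalls;
    return Op == SketchOpcode::PermuteLike
               ? SketchPolicy::UniformIfAnyOperandUniform
               : SketchPolicy::Default;
  }
};

// Pure interpretation step: this is all the analysis runs on each visit.
inline bool interpretPolicy(SketchPolicy P,
                            const std::vector<bool> &OperandIsUniform) {
  bool Any = false, All = true;
  for (bool U : OperandIsUniform) {
    Any = Any || U;
    All = All && U;
  }
  switch (P) {
  case SketchPolicy::AlwaysUniform:
    return true;
  case SketchPolicy::NeverUniform:
    return false;
  case SketchPolicy::UniformIfAnyOperandUniform:
    return Any;
  case SketchPolicy::Default:
    return All;
  }
  return false;
}

class SketchAnalysis {
  std::vector<SketchPolicy> Cached; // indexed by instruction number

public:
  // The hook is called exactly once per instruction, here.
  void initialize(SketchTarget &T, const std::vector<SketchOpcode> &Insts) {
    Cached.clear();
    for (SketchOpcode Op : Insts)
      Cached.push_back(T.getUniformityPolicy(Op));
  }

  // Revisits never reach the target; they only read the cache.
  bool visit(std::size_t InstIndex,
             const std::vector<bool> &OperandIsUniform) const {
    assert(InstIndex < Cached.size() && "instruction not seen by initialize()");
    return interpretPolicy(Cached[InstIndex], OperandIsUniform);
  }
};

// Usage sketch: two instructions, each revisited Revisits times.
// Returns the number of hook calls that were made in total.
inline int hookCallsAfterRevisits(int Revisits) {
  SketchTarget T;
  const std::vector<SketchOpcode> Insts = {SketchOpcode::PermuteLike,
                                           SketchOpcode::Plain};
  SketchAnalysis UA;
  UA.initialize(T, Insts);
  for (int R = 0; R < Revisits; ++R)
    for (std::size_t I = 0; I < Insts.size(); ++I)
      (void)UA.visit(I, {true, false});
  return T.HookCalls;
}
```

In this sketch the hook call count depends only on the number of instructions, not on how many times the analysis revisits them, which is the cost argument made above.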

jayfoad · 2025-10-06T09:45:45Z

> We believe it will be expensive because there's a virtual function call happening every time the UA (re)visits an intrinsic call. The approach we are exploring now is more static ... the target hook can return an enum encoding the "uniformity policy" of the intrinsic, which will be cached. Then the UA can interpret the policy every time it visits an intrinsic. That way, all the queries and decisions taken for uniformity stay within the UA implementation, and the virtual function is called only once when initializing the UA.

Well, OK, but I think it needs to be at least flexible enough to handle cases like "result is uniform if either of the first two operands are uniform", for intrinsics like llvm.amdgcn.permlane16.
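One way a cached policy could still express "either of the first two operands" is to carry an operand bitmask instead of a bare enum value. The following C++ sketch is only an illustration under that assumption; `MaskPolicy`, `uniformIfAnyOfFirst`, and `resultIsUniform` are hypothetical names and nothing in the thread confirms this encoding.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical encoding, not part of the patch: bit N set means
// "the result is uniform if operand N is uniform".
struct MaskPolicy {
  std::uint32_t UniformIfAnyOf;
};

// Builds the policy "uniform if any of the first N operands is uniform".
constexpr MaskPolicy uniformIfAnyOfFirst(unsigned N) {
  return MaskPolicy{N >= 32 ? 0xFFFFFFFFu : ((1u << N) - 1u)};
}

// UniformOperands has bit N set when operand N is currently known uniform.
// An empty mask never proves uniformity, so the caller would fall back to
// its default handling in that case.
constexpr bool resultIsUniform(MaskPolicy P, std::uint32_t UniformOperands) {
  return (P.UniformIfAnyOf & UniformOperands) != 0;
}
```

Here `uniformIfAnyOfFirst(2)` would stand for the permlane16-style rule from the comment, and the check on each visit is a single bitwise AND, so the policy stays cheap to interpret inside the analysis.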

PankajDwivedi-25 requested review from jayfoad and ssahasra April 28, 2025 14:29

llvmbot added the backend:AMDGPU and llvm:analysis labels Apr 28, 2025

jayfoad requested changes Apr 28, 2025

arsenm reviewed Apr 29, 2025

llvmbot added the llvm:adt label May 6, 2025

PankajDwivedi-25 force-pushed the users/pkd-25/make-ui-operand-aware branch from 8cca1b2 to e4cc773 Compare September 19, 2025 11:50

ssahasra reviewed Sep 22, 2025

[NFC] move isDivergentUse so later dependent function in pushUsers can safely use it

860b485

PankajDwivedi-25 force-pushed the users/pkd-25/make-ui-operand-aware branch from e4cc773 to 860b485 Compare September 23, 2025 11:56

PankajDwivedi-25 added 2 commits September 23, 2025 17:43

add target hook to capture special operand uniformity and update UA to use it

939c410

add intrinsic permlanex16

caa7354

arsenm reviewed Sep 25, 2025

PankajDwivedi-25 changed the title ~~[AMDGPU][TTI] Add Target Hook for Instruction Uniformity (getInstructionUniformity)~~ [AMDGPU][TTI] Add Target Hook for Instruction Uniformity Sep 25, 2025

PankajDwivedi-25 changed the title ~~[AMDGPU][TTI] Add Target Hook for Instruction Uniformity~~ [AMDGPU][TTI] Add Target Hook for the custom Instruction Uniformity Sep 25, 2025

PankajDwivedi-25 changed the title ~~[AMDGPU][TTI] Add Target Hook for the custom Instruction Uniformity~~ [AMDGPU][TTI] Add target hook for the custom instruction uniformity Sep 25, 2025

5 participants
