@jwanggit86
Contributor

Commit 5927c67 made the instructions buffer_wbinvl1, buffer_wbinvl1_vol, etc. obsolete for GFX940+. To avoid breaking existing applications that use the related intrinsics, such as llvm.amdgcn.buffer.wbinvl1, this patch allows those intrinsics on GFX940+ by selecting them to buffer_inv.

@llvmbot
Member

llvmbot commented Oct 3, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Jun Wang (jwanggit86)


Full diff: https://github.com/llvm/llvm-project/pull/111078.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIInstructions.td (+12)
  • (added) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.buffer.wbinvl1.ll (+72)
diff --git a/llvm/lib/Target/AMDGPU/SIInstructions.td b/llvm/lib/Target/AMDGPU/SIInstructions.td
index 8073aca7f197fb..aa6aafd87f506c 100644
--- a/llvm/lib/Target/AMDGPU/SIInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SIInstructions.td
@@ -2348,6 +2348,18 @@ def : GCNPat <
                    (V_RCP_IFLAG_F32_e32 (V_CVT_F32_U32_e32 $src0))))
 >;
 
+let SubtargetPredicate = isGFX940Plus in {
+def : GCNPat <
+  (int_amdgcn_buffer_wbinvl1),
+  (BUFFER_INV)
+>;
+
+def : GCNPat <
+  (int_amdgcn_buffer_wbinvl1_vol),
+  (BUFFER_INV)
+>;
+}
+
 //===----------------------------------------------------------------------===//
 // VOP3 Patterns
 //===----------------------------------------------------------------------===//
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.buffer.wbinvl1.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.buffer.wbinvl1.ll
new file mode 100644
index 00000000000000..87990f46a8fc42
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.buffer.wbinvl1.ll
@@ -0,0 +1,72 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs < %s | FileCheck %s
+
+; Allow intrinsics llvm.amdgcn.buffer.wbinvl1 and llvm.amdgcn.buffer.wbinvl1.vol to be
+; lowered to the instruction buffer_inv for GFX940+.
+; 
+declare void @llvm.amdgcn.buffer.wbinvl1()
+declare void @llvm.amdgcn.buffer.wbinvl1.vol()
+
+define void @test_wbinvl1_gfx908() #0 {
+; CHECK-LABEL: test_wbinvl1_gfx908:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    buffer_wbinvl1
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  call void @llvm.amdgcn.buffer.wbinvl1()
+  ret void
+}
+
+define void @test_wbinvl1_vol_gfx908() #0 {
+; CHECK-LABEL: test_wbinvl1_vol_gfx908:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    buffer_wbinvl1_vol
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  call void @llvm.amdgcn.buffer.wbinvl1.vol()
+  ret void
+}
+
+define void @test_wbinvl1_gfx90a() #1 {
+; CHECK-LABEL: test_wbinvl1_gfx90a:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    buffer_wbinvl1
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  call void @llvm.amdgcn.buffer.wbinvl1()
+  ret void
+}
+
+define void @test_wbinvl1_vol_gfx90a() #1 {
+; CHECK-LABEL: test_wbinvl1_vol_gfx90a:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    buffer_wbinvl1_vol
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  call void @llvm.amdgcn.buffer.wbinvl1.vol()
+  ret void
+}
+
+define void @test_wbinvl1_gfx940() #2 {
+; CHECK-LABEL: test_wbinvl1_gfx940:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    buffer_inv
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  call void @llvm.amdgcn.buffer.wbinvl1()
+  ret void
+}
+
+define void @test_wbinvl1_vol_gfx940() #2 {
+; CHECK-LABEL: test_wbinvl1_vol_gfx940:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    buffer_inv
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  call void @llvm.amdgcn.buffer.wbinvl1.vol()
+  ret void
+}
+
+attributes #0 = { nounwind "target-cpu"="gfx908" }
+attributes #1 = { nounwind "target-cpu"="gfx90a" }
+attributes #2 = { nounwind "target-cpu"="gfx940" }

Contributor

@arsenm left a comment


This is the opposite of what we should do. The intrinsics should fail to select. Clang should enforce that the corresponding builtin is available for the subtarget.

@jwanggit86
Contributor Author

Are you suggesting fixing clang to disallow those intrinsics for gfx940?

@arsenm
Contributor

arsenm commented Oct 4, 2024

> Are you suggesting fixing clang to disallow those intrinsics for gfx940?

Yes
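
If Clang gated the builtin on the subtarget as suggested in this thread, user code could check for it at compile time instead of relying on the backend to pick a substitute instruction. The following is only a rough sketch of that idea in C, not part of this patch. It assumes that `__builtin_amdgcn_buffer_wbinvl1` is the Clang builtin corresponding to llvm.amdgcn.buffer.wbinvl1, and that `__has_builtin` would report it as unavailable on gfx940 once the gating exists, which this thread only proposes. The helper name `try_invalidate_l1` is hypothetical.

```c
#include <stdbool.h>

/* Compilers without __has_builtin: treat every builtin as unavailable. */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

/*
 * Hypothetical helper (not from the patch): emit the L1 writeback/invalidate
 * builtin when the compiler offers it for the current target, and report
 * whether it was emitted so the caller can choose another strategy.
 * Assumption: __builtin_amdgcn_buffer_wbinvl1 is the builtin behind
 * llvm.amdgcn.buffer.wbinvl1.
 */
static inline bool try_invalidate_l1(void) {
#if __has_builtin(__builtin_amdgcn_buffer_wbinvl1)
  __builtin_amdgcn_buffer_wbinvl1();
  return true;
#else
  /* Builtin not available for this target (e.g. a host compile). */
  return false;
#endif
}
```

When compiled for a target where the builtin is not offered, such as an ordinary host build, the helper compiles to a function that just returns false, so callers need no target-specific preprocessor checks of their own.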
