[Clang][AMDGPU] Expose buffer load lds as a clang builtin #132048

jmmartinez · 2025-03-19T15:19:05Z

CK is using either inline assembly or inline LLVM-IR builtins to generate buffer_load_dword lds instructions.

This patch exposes this instruction as a Clang builtin.

Limitations in this patch: these instructions are not available for gfx11/gfx12. I wasn't able to identify a target feature that we could use to predicate the builtin availability, in the same way as it is done for global_load_lds.

Related to SWDEV-519702 and SWDEV-518861

llvmbot · 2025-03-19T15:19:42Z

@llvm/pr-subscribers-flang-openmp
@llvm/pr-subscribers-flang-fir-hlfir
@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-llvm-ir

Author: Juan Manuel Martinez Caamaño (jmmartinez)

Changes

CK is using either inline assembly or inline LLVM-IR builtins to generate buffer_load_dword lds instructions.

This patch exposes this instruction as a Clang builtin.

Limitations in this patch: these instructions are not available for gfx11/gfx12. I wasn't able to identify a target feature that we could use to predicate the builtin availability, in the same way as it is done for global_load_lds.

Related to SWDEV-519702 and SWDEV-518861

Full diff: https://github.com/llvm/llvm-project/pull/132048.diff

6 Files Affected:

(modified) clang/include/clang/Basic/BuiltinsAMDGPU.def (+2)
(modified) clang/include/clang/Basic/DiagnosticSemaKinds.td (+2-2)
(modified) clang/lib/Sema/SemaAMDGPU.cpp (+3-4)
(modified) clang/test/CodeGenOpenCL/builtins-amdgcn-raw-buffer-load.cl (+9)
(added) clang/test/SemaOpenCL/builtins-amdgcn-raw-ptr-buffer-load-lds-error.cl (+15)
(modified) llvm/include/llvm/IR/IntrinsicsAMDGPU.td (+3-1)

diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 44ef404aee72f..4117e8d67330b 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -162,6 +162,8 @@ BUILTIN(__builtin_amdgcn_raw_buffer_load_b64, "V2UiQbiiIi", "n")
 BUILTIN(__builtin_amdgcn_raw_buffer_load_b96, "V3UiQbiiIi", "n")
 BUILTIN(__builtin_amdgcn_raw_buffer_load_b128, "V4UiQbiiIi", "n")
 
+BUILTIN(__builtin_amdgcn_raw_ptr_buffer_load_lds, "vQbv*3IUiiiIiIi", "t")
+
 //===----------------------------------------------------------------------===//
 // Ballot builtins.
 //===----------------------------------------------------------------------===//
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 1536a3b8c920a..e17d6b530141d 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -13132,6 +13132,6 @@ def err_acc_duplicate_bind
             "permitted to refer to the same function">;
 
 // AMDGCN builtins diagnostics
-def err_amdgcn_global_load_lds_size_invalid_value : Error<"invalid size value">;
-def note_amdgcn_global_load_lds_size_valid_value : Note<"size must be %select{1, 2, or 4|1, 2, 4, 12 or 16}0">;
+def err_amdgcn_load_lds_size_invalid_value : Error<"invalid size value">;
+def note_amdgcn_load_lds_size_valid_value : Note<"size must be %select{1, 2, or 4|1, 2, 4, 12 or 16}0">;
 } // end of sema component.
diff --git a/clang/lib/Sema/SemaAMDGPU.cpp b/clang/lib/Sema/SemaAMDGPU.cpp
index a4d075dfd0768..7fec099374152 100644
--- a/clang/lib/Sema/SemaAMDGPU.cpp
+++ b/clang/lib/Sema/SemaAMDGPU.cpp
@@ -35,6 +35,7 @@ bool SemaAMDGPU::CheckAMDGCNBuiltinFunctionCall(unsigned BuiltinID,
       Builtin::evaluateRequiredTargetFeatures("gfx950-insts", CallerFeatureMap);
 
   switch (BuiltinID) {
+  case AMDGPU::BI__builtin_amdgcn_raw_ptr_buffer_load_lds:
   case AMDGPU::BI__builtin_amdgcn_global_load_lds: {
     constexpr const int SizeIdx = 2;
     llvm::APSInt Size;
@@ -54,11 +55,9 @@ bool SemaAMDGPU::CheckAMDGCNBuiltinFunctionCall(unsigned BuiltinID,
       [[fallthrough]];
     }
     default:
-      Diag(ArgExpr->getExprLoc(),
-           diag::err_amdgcn_global_load_lds_size_invalid_value)
+      Diag(ArgExpr->getExprLoc(), diag::err_amdgcn_load_lds_size_invalid_value)
           << ArgExpr->getSourceRange();
-      Diag(ArgExpr->getExprLoc(),
-           diag::note_amdgcn_global_load_lds_size_valid_value)
+      Diag(ArgExpr->getExprLoc(), diag::note_amdgcn_load_lds_size_valid_value)
           << HasGFX950Insts << ArgExpr->getSourceRange();
       return true;
     }
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-raw-buffer-load.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-raw-buffer-load.cl
index 3403b69e07e4b..5e3ed9027c17a 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn-raw-buffer-load.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-raw-buffer-load.cl
@@ -170,3 +170,12 @@ v3u32 test_amdgcn_raw_ptr_buffer_load_b96_non_const_soffset(__amdgpu_buffer_rsrc
 v4u32 test_amdgcn_raw_ptr_buffer_load_b128_non_const_soffset(__amdgpu_buffer_rsrc_t rsrc, int offset, int soffset) {
   return __builtin_amdgcn_raw_buffer_load_b128(rsrc, /*offset=*/0, soffset, /*aux=*/0);
 }
+
+// CHECK-LABEL: @test_amdgcn_raw_ptr_buffer_load_lds(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    tail call void @llvm.amdgcn.raw.ptr.buffer.load.lds(ptr addrspace(8) [[RSRC:%.*]], ptr addrspace(3) [[LDS:%.*]], i32 1, i32 [[OFFSET:%.*]], i32 [[SOFFSET:%.*]], i32 2, i32 3)
+// CHECK-NEXT:    ret void
+//
+void test_amdgcn_raw_ptr_buffer_load_lds(__amdgpu_buffer_rsrc_t rsrc, __local void * lds, int offset, int soffset) {
+    __builtin_amdgcn_raw_ptr_buffer_load_lds(rsrc, lds, 1, offset, soffset, 2, 3);
+}
diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-raw-ptr-buffer-load-lds-error.cl b/clang/test/SemaOpenCL/builtins-amdgcn-raw-ptr-buffer-load-lds-error.cl
new file mode 100644
index 0000000000000..b7065ea2b6655
--- /dev/null
+++ b/clang/test/SemaOpenCL/builtins-amdgcn-raw-ptr-buffer-load-lds-error.cl
@@ -0,0 +1,15 @@
+// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx90a -S -verify -o - %s
+// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx950 -S -verify -o - %s -DGFX950
+// REQUIRES: amdgpu-registered-target
+
+void test_amdgcn_raw_ptr_buffer_load_lds(__amdgpu_buffer_rsrc_t rsrc, __local void* lds, int offset, int soffset, int x) {
+  __builtin_amdgcn_raw_ptr_buffer_load_lds(rsrc, lds, x, offset, soffset, 0, 0); //expected-error{{argument to '__builtin_amdgcn_raw_ptr_buffer_load_lds' must be a constant integer}}
+  __builtin_amdgcn_raw_ptr_buffer_load_lds(rsrc, lds, 4, offset, soffset, x, 0); //expected-error{{argument to '__builtin_amdgcn_raw_ptr_buffer_load_lds' must be a constant integer}}
+  __builtin_amdgcn_raw_ptr_buffer_load_lds(rsrc, lds, 4, offset, soffset, 0, x); //expected-error{{argument to '__builtin_amdgcn_raw_ptr_buffer_load_lds' must be a constant integer}}
+
+#ifdef GFX950
+  __builtin_amdgcn_raw_ptr_buffer_load_lds(rsrc, lds, 3, offset, soffset, 0, 0); //expected-error{{invalid size value}} expected-note{{size must be 1, 2, 4, 12 or 16}}
+#else
+  __builtin_amdgcn_raw_ptr_buffer_load_lds(rsrc, lds, 3, offset, soffset, 0, 0); //expected-error{{invalid size value}} expected-note{{size must be 1, 2, or 4}}
+#endif
+}
diff --git a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
index 307e4b8a01e5c..efec2ef8325c9 100644
--- a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+++ b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
@@ -1863,7 +1863,9 @@ class AMDGPURawBufferLoadLDS : Intrinsic <
    ImmArg<ArgIndex<6>>, IntrNoCallback, IntrNoFree], "", [SDNPMemOperand]>, AMDGPURsrcIntrinsic<0>;
 def int_amdgcn_raw_buffer_load_lds : AMDGPURawBufferLoadLDS;
 
-class AMDGPURawPtrBufferLoadLDS : Intrinsic <
+class AMDGPURawPtrBufferLoadLDS :
+  ClangBuiltin<"__builtin_amdgcn_raw_ptr_buffer_load_lds">,
+  Intrinsic <
   [],
   [AMDGPUBufferRsrcTy,        // rsrc(SGPR)
    LLVMQualPointerType<3>,    // LDS base offset

krzysz00

I can't speak to the Clang side of this change, but I don't see any issues here

clang/include/clang/Basic/BuiltinsAMDGPU.def

jmmartinez · 2025-03-20T15:51:25Z

I've also seen that gfx11 seem to have some kind of BUFFER_LOAD_LDS_(SIZE) instruction (different from the BUFFER_LOAD_(SIZE)_LDS instructions associated with this builtin).

However, I'm not sure they actually exists: we do not have any test where we generate them and the gfx11 shader manual explicitly says "Remove "LOAD_LDS" instructions: buffer_load_lds, scratch_load_lds, global_load_lds"

shiltian · 2025-03-20T16:41:30Z

I've also seen that gfx11 seem to have some kind of BUFFER_LOAD_LDS_(SIZE) instruction (different from the BUFFER_LOAD_(SIZE)_LDS instructions associated with this builtin).

However, I'm not sure they actually exists: we do not have any test where we generate them and the gfx11 shader manual explicitly says "Remove "LOAD_LDS" instructions: buffer_load_lds, scratch_load_lds, global_load_lds"

Sounds like we need to get information from HW folks. :-)

clang/test/CodeGen/link-builtin-bitcode.c

clang/include/clang/Basic/BuiltinsAMDGPU.def

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp

jmmartinez · 2025-03-25T10:29:29Z

I've also seen that gfx11 seem to have some kind of BUFFER_LOAD_LDS_(SIZE) instruction (different from the BUFFER_LOAD_(SIZE)_LDS instructions associated with this builtin).
However, I'm not sure they actually exists: we do not have any test where we generate them and the gfx11 shader manual explicitly says "Remove "LOAD_LDS" instructions: buffer_load_lds, scratch_load_lds, global_load_lds"

Sounds like we need to get information from HW folks. :-)

I've asked and there is no buffer_load_lds instruction on gfx11 nor gfx12. I guess we can remove them in a separate patch.

arsenm · 2025-03-25T15:48:40Z

clang/test/CodeGen/link-builtin-bitcode.c

-// CHECK: attributes #[[ATTR_COMPATIBLE]] = { {{.*}} "target-cpu"="gfx90a" "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-fadd-rtn-insts,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gws,+image-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64" }
-// CHECK: attributes #[[ATTR_EXTEND]] = { {{.*}} "target-cpu"="gfx90a" "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-fadd-rtn-insts,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+extended-image-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gws,+image-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64" }
-// CHECK: attributes #[[ATTR_INCOMPATIBLE]] = { {{.*}} "target-cpu"="gfx90a" "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-fadd-rtn-insts,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx8-insts,+gfx90a-insts,+gws,+image-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64,-gfx9-insts" }
+// CHECK: attributes #[[ATTR_COMPATIBLE]] = { {{.*}} "target-cpu"="gfx90a" "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-fadd-rtn-insts,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gws,+image-insts,+lds-buffer-load-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64" }


ReadOnlyFeatures seems to not have helped with this...

jmmartinez · 2025-03-26T09:08:42Z

Squashed and split in 2: #133055

This are used to restrict the availability of buffer_load_lds intrinsics to targets that actually have this instructions.

…lds on gfx9 and gfx10

CK is using either inline assembly or inline llvm-ir builtins to generate buffer_load_dword lds instructions. This patch exposes this instruction as a builtin. Related to SWDEV-519702 and SWDEV-518861

krzysz00 · 2025-04-02T17:53:14Z

Hold on, isn't there an lds bit on buffer_load_* in gfx11? Or am I misremembering

jmmartinez · 2025-04-02T18:04:16Z

Hold on, isn't there an lds bit on buffer_load_* in gfx11? Or am I misremembering

No there is not. Neither gfx11 nor 12 have LDS DMA.

CK is using either inline assembly or inline LLVM-IR builtins to generate buffer_load_dword lds instructions. This patch exposes this instruction as a Clang builtin available on gfx9 and gfx10. Related to SWDEV-519702 and SWDEV-518861

… (llvm#1539)

CK is using either inline assembly or inline LLVM-IR builtins to generate buffer_load_dword lds instructions. This patch exposes this instruction as a Clang builtin available on gfx9 and gfx10. Related to SWDEV-519702 and SWDEV-518861

jmmartinez requested review from arsenm and krzysz00 March 19, 2025 15:19

jmmartinez self-assigned this Mar 19, 2025

llvmbot added clang Clang issues not falling into any other category backend:AMDGPU clang:frontend Language frontend issues, e.g. anything involving "Sema" llvm:ir labels Mar 19, 2025

krzysz00 reviewed Mar 19, 2025

View reviewed changes

shiltian reviewed Mar 19, 2025

View reviewed changes

clang/include/clang/Basic/BuiltinsAMDGPU.def Outdated Show resolved Hide resolved

jmmartinez force-pushed the buffer_load_lds_builtin branch from c89ffa4 to a9ff844 Compare March 20, 2025 15:44

llvmbot added the clang:openmp OpenMP related changes to Clang label Mar 20, 2025

jmmartinez force-pushed the buffer_load_lds_builtin branch from a9ff844 to cb63d67 Compare March 24, 2025 09:24

llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir flang:openmp labels Mar 24, 2025

arsenm reviewed Mar 24, 2025

View reviewed changes

clang/test/CodeGen/link-builtin-bitcode.c Outdated Show resolved Hide resolved

arsenm reviewed Mar 24, 2025

View reviewed changes

clang/include/clang/Basic/BuiltinsAMDGPU.def Outdated Show resolved Hide resolved

arsenm reviewed Mar 24, 2025

View reviewed changes

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp Outdated Show resolved Hide resolved

jmmartinez force-pushed the buffer_load_lds_builtin branch from cb63d67 to 2b8923f Compare March 25, 2025 10:19

arsenm approved these changes Mar 25, 2025

View reviewed changes

jmmartinez force-pushed the buffer_load_lds_builtin branch from 2b8923f to 6aa2a17 Compare March 26, 2025 09:06

jmmartinez mentioned this pull request Mar 26, 2025

[AMDGPU] Use a target feature to enable __builtin_amdgcn_global_load_lds on gfx9/10 #133055

Merged

jmmartinez added 3 commits April 2, 2025 17:37

[AMDGPU] Add "lds-buffer-load-insts" attribute for all targets < gfx11

5f039ef

This are used to restrict the availability of buffer_load_lds intrinsics to targets that actually have this instructions.

[Review] Add failure test

4bed119

[AMDGPU] Use a target feature to enable __builtin_amdgcn_global_load_…

b03c118

…lds on gfx9 and gfx10

jmmartinez added 3 commits April 2, 2025 17:38

[Review] remove declare

ee2123f

[Review] mem-to-lds -> vmem-to-lds

96d7a93

[Clang][AMDGPU] Expose buffer load lds as a clang builtin

b625c81

CK is using either inline assembly or inline llvm-ir builtins to generate buffer_load_dword lds instructions. This patch exposes this instruction as a builtin. Related to SWDEV-519702 and SWDEV-518861

jmmartinez force-pushed the buffer_load_lds_builtin branch from 6aa2a17 to b625c81 Compare April 2, 2025 16:05

Merge branch 'main' into buffer_load_lds_builtin

e18cc34

jmmartinez merged commit 041e842 into llvm:main Apr 3, 2025
11 checks passed

searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Apr 11, 2025

[Clang][AMDGPU] Expose buffer load lds as a clang builtin (llvm#132048)…

08bba99

… (llvm#1539)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Clang][AMDGPU] Expose buffer load lds as a clang builtin #132048

[Clang][AMDGPU] Expose buffer load lds as a clang builtin #132048

Uh oh!

jmmartinez commented Mar 19, 2025

Uh oh!

llvmbot commented Mar 19, 2025 •

edited

Loading

Uh oh!

krzysz00 left a comment

Uh oh!

Uh oh!

jmmartinez commented Mar 20, 2025

Uh oh!

shiltian commented Mar 20, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jmmartinez commented Mar 25, 2025

Uh oh!

arsenm Mar 25, 2025

Uh oh!

jmmartinez commented Mar 26, 2025

Uh oh!

krzysz00 commented Apr 2, 2025

Uh oh!

jmmartinez commented Apr 2, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

[Clang][AMDGPU] Expose buffer load lds as a clang builtin #132048

[Clang][AMDGPU] Expose buffer load lds as a clang builtin #132048

Uh oh!

Conversation

jmmartinez commented Mar 19, 2025

Uh oh!

llvmbot commented Mar 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

krzysz00 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jmmartinez commented Mar 20, 2025

Uh oh!

shiltian commented Mar 20, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jmmartinez commented Mar 25, 2025

Uh oh!

arsenm Mar 25, 2025

Choose a reason for hiding this comment

Uh oh!

jmmartinez commented Mar 26, 2025

Uh oh!

krzysz00 commented Apr 2, 2025

Uh oh!

jmmartinez commented Apr 2, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

llvmbot commented Mar 19, 2025 •

edited

Loading