[AMDGPU] update LDS block size for gfx1250 #167614

changpeng · 2025-11-12T01:23:45Z

LDS block size should be 2048 bytes (512 dwords) based on current spec.

Should be 2056 bytes (512 dwords) based on current spec.

llvmbot · 2025-11-12T01:24:24Z

@llvm/pr-subscribers-backend-amdgpu

Author: Changpeng Fang (changpeng)

Changes

LDS block size should be 2056 bytes (512 dwords) based on current spec.

Full diff: https://github.com/llvm/llvm-project/pull/167614.diff

5 Files Affected:

(modified) llvm/docs/AMDGPUUsage.rst (+1-1)
(modified) llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp (+3-6)
(modified) llvm/test/CodeGen/AMDGPU/extra-lds-size.ll (+2-2)
(modified) llvm/test/CodeGen/AMDGPU/lds-size-hsa-gfx1250.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/pal-metadata-3.0.gfx1250.ll (+1-1)

diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index ba0e53bceade8..98aaead96b766 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -5855,7 +5855,7 @@ The fields used by CP for code objects before V3 also match those specified in
                                                      GFX950
                                                        roundup(lds-size / (320 * 4))
                                                      GFX125*
-                                                       roundup(lds-size / (256 * 4))
+                                                       roundup(lds-size / (512 * 4))
 
      24      1 bit   ENABLE_EXCEPTION_IEEE_754_FP    Wavefront starts execution
                      _INVALID_OPERATION              with specified exceptions
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
index 29f8f9bc8b54c..f47eaea9b136c 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
@@ -1161,12 +1161,9 @@ void AMDGPUAsmPrinter::getSIProgramInfo(SIProgramInfo &ProgInfo,
   ProgInfo.DX10Clamp = Mode.DX10Clamp;
 
   unsigned LDSAlignShift;
-  if (STM.getFeatureBits().test(FeatureAddressableLocalMemorySize327680)) {
-    // LDS is allocated in 256 dword blocks.
-    LDSAlignShift = 10;
-  } else if (STM.getFeatureBits().test(
-                 FeatureAddressableLocalMemorySize163840)) {
-    // LDS is allocated in 320 dword blocks.
+  if (STM.getFeatureBits().test(FeatureAddressableLocalMemorySize327680) ||
+      STM.getFeatureBits().test(FeatureAddressableLocalMemorySize163840)) {
+    // LDS is allocated in 512 or 320 dword blocks.
     LDSAlignShift = 11;
   } else if (STM.getFeatureBits().test(
                  FeatureAddressableLocalMemorySize65536)) {
diff --git a/llvm/test/CodeGen/AMDGPU/extra-lds-size.ll b/llvm/test/CodeGen/AMDGPU/extra-lds-size.ll
index 4349b18fd394c..b31e87c54b563 100644
--- a/llvm/test/CodeGen/AMDGPU/extra-lds-size.ll
+++ b/llvm/test/CodeGen/AMDGPU/extra-lds-size.ll
@@ -31,10 +31,10 @@
 ; GFX1200-MESA: .long 45100
 ; GFX1200-MESA-NEXT: .long 1024
 
-; GFX1250-PAL: '0x2c0b (SPI_SHADER_PGM_RSRC2_PS)': 0x200
+; GFX1250-PAL: '0x2c0b (SPI_SHADER_PGM_RSRC2_PS)': 0x100
 
 ; GFX1250-MESA: .long 45100
-; GFX1250-MESA-NEXT: .long 512
+; GFX1250-MESA-NEXT: .long 256
 
 @lds = internal addrspace(3) global [4096 x i8] poison
 
diff --git a/llvm/test/CodeGen/AMDGPU/lds-size-hsa-gfx1250.ll b/llvm/test/CodeGen/AMDGPU/lds-size-hsa-gfx1250.ll
index 3db0fa8f21759..7e8d5e0f30b9e 100644
--- a/llvm/test/CodeGen/AMDGPU/lds-size-hsa-gfx1250.ll
+++ b/llvm/test/CodeGen/AMDGPU/lds-size-hsa-gfx1250.ll
@@ -41,7 +41,7 @@ define amdgpu_kernel void @test_lds_i32(i32 %val) {
 ; GCN-LABEL: test_lds_array_i8:
 ; GCN: .amdhsa_group_segment_fixed_size 327680
 ; GCN: ; LDSByteSize: 327680 bytes/workgroup
-; MESA: granulated_lds_size = 320
+; MESA: granulated_lds_size = 160
 define amdgpu_kernel void @test_lds_array_i8() {
   %gep = getelementptr inbounds [327679 x i8], ptr addrspace(3) @lds.array.i8, i32 0, i32 5
   %val = load i8, ptr addrspace(3) %gep
@@ -52,7 +52,7 @@ define amdgpu_kernel void @test_lds_array_i8() {
 ; GCN-LABEL: test_lds_array_i16:
 ; GCN: .amdhsa_group_segment_fixed_size 327680
 ; GCN: ; LDSByteSize: 327680 bytes/workgroup
-; MESA: granulated_lds_size = 320
+; MESA: granulated_lds_size = 160
 define amdgpu_kernel void @test_lds_array_i16() {
   %gep = getelementptr inbounds [163839 x i16], ptr addrspace(3) @lds.array.i16, i32 0, i32 10
   %val = load i16, ptr addrspace(3) %gep
@@ -63,7 +63,7 @@ define amdgpu_kernel void @test_lds_array_i16() {
 ; GCN-LABEL: test_lds_array_i32:
 ; GCN: .amdhsa_group_segment_fixed_size 327680
 ; GCN: ; LDSByteSize: 327680 bytes/workgroup
-; MESA: granulated_lds_size = 320
+; MESA: granulated_lds_size = 160
 define amdgpu_kernel void @test_lds_array_i32() {
   %gep = getelementptr inbounds [81919 x i32], ptr addrspace(3) @lds.array.i32, i32 0, i32 20
   %val = load i32, ptr addrspace(3) %gep
diff --git a/llvm/test/CodeGen/AMDGPU/pal-metadata-3.0.gfx1250.ll b/llvm/test/CodeGen/AMDGPU/pal-metadata-3.0.gfx1250.ll
index f934c85f68e0f..0df20f6d349f7 100644
--- a/llvm/test/CodeGen/AMDGPU/pal-metadata-3.0.gfx1250.ll
+++ b/llvm/test/CodeGen/AMDGPU/pal-metadata-3.0.gfx1250.ll
@@ -126,7 +126,7 @@
 ; CHECK-NEXT:        .entry_point:    _amdgpu_hs
 ; CHECK-NEXT:        .entry_point_symbol:    hs_shader
 ; CHECK-NEXT:        .forward_progress: true
-; CHECK-NEXT:        .lds_size:       0x1000
+; CHECK-NEXT:        .lds_size:       0x800
 ; CHECK-NEXT:        .mem_ordered:    true
 ; CHECK-NEXT:        .scratch_en:     false
 ; CHECK-NEXT:        .scratch_memory_size: 0

arsenm · 2025-11-12T01:41:04Z

llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp

+  if (STM.getFeatureBits().test(FeatureAddressableLocalMemorySize327680) ||
+      STM.getFeatureBits().test(FeatureAddressableLocalMemorySize163840)) {


Seems like this should be modeled more directly rather than inferring the allocation size by the target that supports this allocation size

Seems like this should be modeled more directly rather than inferring the allocation size by the target that supports this allocation size

Do you mean something directly like?:
if (gfx1250)
LDSAlignShift = 11;

@arsenm : should we get the lds block size from the target directly?

arsenm · 2025-11-12T01:42:20Z

llvm/test/CodeGen/AMDGPU/extra-lds-size.ll

 ; GFX1200-MESA-NEXT: .long 1024

-; GFX1250-PAL: '0x2c0b (SPI_SHADER_PGM_RSRC2_PS)': 0x200
+; GFX1250-PAL: '0x2c0b (SPI_SHADER_PGM_RSRC2_PS)': 0x100


Not sure how this test ends up showing the allocation granularity

rampitec

Should not it also affect llvm/test/MC/AMDGPU/hsa-gfx1250-v4.s? That is where the size is really recorded, not just printed.

changpeng · 2025-11-12T22:57:39Z

Should not it also affect llvm/test/MC/AMDGPU/hsa-gfx1250-v4.s? That is where the size is really recorded, not just printed.
Not sure. The granularity is not checked at all in the file. Is there a directive for it that we can check?

By the way, it seems LDS size is not appropriate in the file (should be 320K)
// ASM: .amdhsa_group_segment_fixed_size 393216

rampitec · 2025-11-12T23:08:23Z

Should not it also affect llvm/test/MC/AMDGPU/hsa-gfx1250-v4.s? That is where the size is really recorded, not just printed.
Not sure. The granularity is not checked at all in the file. Is there a directive for it that we can check?

It shall change the hex dump of KD.

By the way, it seems LDS size is not appropriate in the file (should be 320K) // ASM: .amdhsa_group_segment_fixed_size 393216

Must be leftover from the moment that was the maximum.

rampitec · 2025-11-13T00:02:26Z

llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp


 unsigned getLdsDwGranularity(const MCSubtargetInfo &ST) {
-  return ST.hasFeature(AMDGPU::FeatureAddressableLocalMemorySize327680) ? 256
+  return ST.hasFeature(AMDGPU::FeatureAddressableLocalMemorySize327680) ? 512


That becomes more and more strange, doc lists other sizes, but here we have only 2.

That becomes more and more strange, doc lists other sizes, but here we have only 2.
Maybe it is missing for gfx950 (320 byte block),
And R600 was possibly missed intentionally.
We should add them, also call AMDGPU::getLdsDwGranularity in AMDGPUAsmPrinter::getSIProgramInfo too.

But this needs additional investigation and LIT tests, and should fall into separate PRs, I think.

Perhaps. But I'd still expect hsa-gfx1250-v4.s to show some changes.

Perhaps. But I'd still expect hsa-gfx1250-v4.s to show some changes.

Do you mean we need to add additional tests to expose the change, or something is missed updating when we change the LDS block size in this PR? Thanks; ( I will change the maximum lds size in the test though)

Probably some other place is not changed. Try changing max size in the test, I'd assume hex dump will change.

Probably some other place is not changed. Try changing max size in the test, I'd assume hex dump will change.

maximum lds size updated, and the hex dump changes accordingly. I am going to double check what we possibly missed. Thanks.

[AMDGPU] update LDS block size for gfx1250

97f6ccd

Should be 2056 bytes (512 dwords) based on current spec.

llvmbot added the backend:AMDGPU label Nov 12, 2025

changpeng requested a review from rampitec November 12, 2025 01:24

changpeng requested review from arsenm, cdevadas and jayfoad November 12, 2025 01:24

arsenm reviewed Nov 12, 2025

View reviewed changes

rampitec reviewed Nov 12, 2025

View reviewed changes

Update AMDGPU::getLdsDwGranularity

b9844d9

rampitec reviewed Nov 13, 2025

View reviewed changes

Update maximum lds size for hsa-gfx125*-v4.s

c17c39c

changpeng requested review from arsenm and rampitec November 13, 2025 19:04

		if (STM.getFeatureBits().test(FeatureAddressableLocalMemorySize327680) \|\|
		STM.getFeatureBits().test(FeatureAddressableLocalMemorySize163840)) {

[AMDGPU] update LDS block size for gfx1250 #167614

Are you sure you want to change the base?

[AMDGPU] update LDS block size for gfx1250 #167614

Uh oh!

Conversation

changpeng commented Nov 12, 2025 • edited by jayfoad Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Nov 12, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

changpeng Nov 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rampitec left a comment

Choose a reason for hiding this comment

Uh oh!

changpeng commented Nov 12, 2025

Uh oh!

rampitec commented Nov 12, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

changpeng commented Nov 12, 2025 •

edited by jayfoad

Loading

changpeng Nov 12, 2025 •

edited

Loading