-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[AMDGPU] update LDS block size for gfx1250 #167614
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Should be 2056 bytes (512 dwords) based on current spec.
|
@llvm/pr-subscribers-backend-amdgpu Author: Changpeng Fang (changpeng) ChangesLDS block size should be 2056 bytes (512 dwords) based on current spec. Full diff: https://github.com/llvm/llvm-project/pull/167614.diff 5 Files Affected:
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index ba0e53bceade8..98aaead96b766 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -5855,7 +5855,7 @@ The fields used by CP for code objects before V3 also match those specified in
GFX950
roundup(lds-size / (320 * 4))
GFX125*
- roundup(lds-size / (256 * 4))
+ roundup(lds-size / (512 * 4))
24 1 bit ENABLE_EXCEPTION_IEEE_754_FP Wavefront starts execution
_INVALID_OPERATION with specified exceptions
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
index 29f8f9bc8b54c..f47eaea9b136c 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
@@ -1161,12 +1161,9 @@ void AMDGPUAsmPrinter::getSIProgramInfo(SIProgramInfo &ProgInfo,
ProgInfo.DX10Clamp = Mode.DX10Clamp;
unsigned LDSAlignShift;
- if (STM.getFeatureBits().test(FeatureAddressableLocalMemorySize327680)) {
- // LDS is allocated in 256 dword blocks.
- LDSAlignShift = 10;
- } else if (STM.getFeatureBits().test(
- FeatureAddressableLocalMemorySize163840)) {
- // LDS is allocated in 320 dword blocks.
+ if (STM.getFeatureBits().test(FeatureAddressableLocalMemorySize327680) ||
+ STM.getFeatureBits().test(FeatureAddressableLocalMemorySize163840)) {
+ // LDS is allocated in 512 or 320 dword blocks.
LDSAlignShift = 11;
} else if (STM.getFeatureBits().test(
FeatureAddressableLocalMemorySize65536)) {
diff --git a/llvm/test/CodeGen/AMDGPU/extra-lds-size.ll b/llvm/test/CodeGen/AMDGPU/extra-lds-size.ll
index 4349b18fd394c..b31e87c54b563 100644
--- a/llvm/test/CodeGen/AMDGPU/extra-lds-size.ll
+++ b/llvm/test/CodeGen/AMDGPU/extra-lds-size.ll
@@ -31,10 +31,10 @@
; GFX1200-MESA: .long 45100
; GFX1200-MESA-NEXT: .long 1024
-; GFX1250-PAL: '0x2c0b (SPI_SHADER_PGM_RSRC2_PS)': 0x200
+; GFX1250-PAL: '0x2c0b (SPI_SHADER_PGM_RSRC2_PS)': 0x100
; GFX1250-MESA: .long 45100
-; GFX1250-MESA-NEXT: .long 512
+; GFX1250-MESA-NEXT: .long 256
@lds = internal addrspace(3) global [4096 x i8] poison
diff --git a/llvm/test/CodeGen/AMDGPU/lds-size-hsa-gfx1250.ll b/llvm/test/CodeGen/AMDGPU/lds-size-hsa-gfx1250.ll
index 3db0fa8f21759..7e8d5e0f30b9e 100644
--- a/llvm/test/CodeGen/AMDGPU/lds-size-hsa-gfx1250.ll
+++ b/llvm/test/CodeGen/AMDGPU/lds-size-hsa-gfx1250.ll
@@ -41,7 +41,7 @@ define amdgpu_kernel void @test_lds_i32(i32 %val) {
; GCN-LABEL: test_lds_array_i8:
; GCN: .amdhsa_group_segment_fixed_size 327680
; GCN: ; LDSByteSize: 327680 bytes/workgroup
-; MESA: granulated_lds_size = 320
+; MESA: granulated_lds_size = 160
define amdgpu_kernel void @test_lds_array_i8() {
%gep = getelementptr inbounds [327679 x i8], ptr addrspace(3) @lds.array.i8, i32 0, i32 5
%val = load i8, ptr addrspace(3) %gep
@@ -52,7 +52,7 @@ define amdgpu_kernel void @test_lds_array_i8() {
; GCN-LABEL: test_lds_array_i16:
; GCN: .amdhsa_group_segment_fixed_size 327680
; GCN: ; LDSByteSize: 327680 bytes/workgroup
-; MESA: granulated_lds_size = 320
+; MESA: granulated_lds_size = 160
define amdgpu_kernel void @test_lds_array_i16() {
%gep = getelementptr inbounds [163839 x i16], ptr addrspace(3) @lds.array.i16, i32 0, i32 10
%val = load i16, ptr addrspace(3) %gep
@@ -63,7 +63,7 @@ define amdgpu_kernel void @test_lds_array_i16() {
; GCN-LABEL: test_lds_array_i32:
; GCN: .amdhsa_group_segment_fixed_size 327680
; GCN: ; LDSByteSize: 327680 bytes/workgroup
-; MESA: granulated_lds_size = 320
+; MESA: granulated_lds_size = 160
define amdgpu_kernel void @test_lds_array_i32() {
%gep = getelementptr inbounds [81919 x i32], ptr addrspace(3) @lds.array.i32, i32 0, i32 20
%val = load i32, ptr addrspace(3) %gep
diff --git a/llvm/test/CodeGen/AMDGPU/pal-metadata-3.0.gfx1250.ll b/llvm/test/CodeGen/AMDGPU/pal-metadata-3.0.gfx1250.ll
index f934c85f68e0f..0df20f6d349f7 100644
--- a/llvm/test/CodeGen/AMDGPU/pal-metadata-3.0.gfx1250.ll
+++ b/llvm/test/CodeGen/AMDGPU/pal-metadata-3.0.gfx1250.ll
@@ -126,7 +126,7 @@
; CHECK-NEXT: .entry_point: _amdgpu_hs
; CHECK-NEXT: .entry_point_symbol: hs_shader
; CHECK-NEXT: .forward_progress: true
-; CHECK-NEXT: .lds_size: 0x1000
+; CHECK-NEXT: .lds_size: 0x800
; CHECK-NEXT: .mem_ordered: true
; CHECK-NEXT: .scratch_en: false
; CHECK-NEXT: .scratch_memory_size: 0
|
| if (STM.getFeatureBits().test(FeatureAddressableLocalMemorySize327680) || | ||
| STM.getFeatureBits().test(FeatureAddressableLocalMemorySize163840)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this should be modeled more directly rather than inferring the allocation size by the target that supports this allocation size
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this should be modeled more directly rather than inferring the allocation size by the target that supports this allocation size
Do you mean something directly like?:
if (gfx1250)
LDSAlignShift = 11;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@arsenm : should we get the lds block size from the target directly?
| ; GFX1200-MESA-NEXT: .long 1024 | ||
|
|
||
| ; GFX1250-PAL: '0x2c0b (SPI_SHADER_PGM_RSRC2_PS)': 0x200 | ||
| ; GFX1250-PAL: '0x2c0b (SPI_SHADER_PGM_RSRC2_PS)': 0x100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how this test ends up showing the allocation granularity
rampitec
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should not it also affect llvm/test/MC/AMDGPU/hsa-gfx1250-v4.s? That is where the size is really recorded, not just printed.
By the way, it seems LDS size is not appropriate in the file (should be 320K) |
It shall change the hex dump of KD.
Must be leftover from the moment that was the maximum. |
|
|
||
| unsigned getLdsDwGranularity(const MCSubtargetInfo &ST) { | ||
| return ST.hasFeature(AMDGPU::FeatureAddressableLocalMemorySize327680) ? 256 | ||
| return ST.hasFeature(AMDGPU::FeatureAddressableLocalMemorySize327680) ? 512 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That becomes more and more strange, doc lists other sizes, but here we have only 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That becomes more and more strange, doc lists other sizes, but here we have only 2.
Maybe it is missing for gfx950 (320 byte block),
And R600 was possibly missed intentionally.
We should add them, also call AMDGPU::getLdsDwGranularity in AMDGPUAsmPrinter::getSIProgramInfo too.
But this needs additional investigation and LIT tests, and should fall into separate PRs, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps. But I'd still expect hsa-gfx1250-v4.s to show some changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps. But I'd still expect hsa-gfx1250-v4.s to show some changes.
Do you mean we need to add additional tests to expose the change, or something is missed updating when we change the LDS block size in this PR? Thanks; ( I will change the maximum lds size in the test though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably some other place is not changed. Try changing max size in the test, I'd assume hex dump will change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably some other place is not changed. Try changing max size in the test, I'd assume hex dump will change.
maximum lds size updated, and the hex dump changes accordingly. I am going to double check what we possibly missed. Thanks.
LDS block size should be 2048 bytes (512 dwords) based on current spec.