2 changes: 1 addition & 1 deletion llvm/docs/AMDGPUUsage.rst
@@ -5855,7 +5855,7 @@ The fields used by CP for code objects before V3 also match those specified in
GFX950
roundup(lds-size / (320 * 4))
GFX125*
- roundup(lds-size / (256 * 4))
+ roundup(lds-size / (512 * 4))

24 1 bit ENABLE_EXCEPTION_IEEE_754_FP Wavefront starts execution
_INVALID_OPERATION with specified exceptions
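
The changed GFX125* row can be checked with a little arithmetic. The following is a minimal sketch, assuming `roundup` means ceiling division and that the `4` is the size of a dword in bytes. `divideCeil` and `granulatedLdsSize` are hypothetical helper names written for this illustration, not code from the patch.

```cpp
#include <cstdint>

// Ceiling division: number of whole blocks needed to cover Num bytes.
constexpr std::uint64_t divideCeil(std::uint64_t Num, std::uint64_t Den) {
  return (Num + Den - 1) / Den;
}

// Granulated LDS size following the documented formula
// roundup(lds-size / (granule-dwords * 4)).
// Assumption: the factor 4 converts dwords to bytes.
constexpr std::uint64_t granulatedLdsSize(std::uint64_t LdsBytes,
                                          std::uint64_t GranuleDwords) {
  return divideCeil(LdsBytes, GranuleDwords * 4);
}

// 327680 bytes is the LDS size used in lds-size-hsa-gfx1250.ll.
static_assert(granulatedLdsSize(327680, 256) == 320, "old 256-dword blocks");
static_assert(granulatedLdsSize(327680, 512) == 160, "new 512-dword blocks");
```

For the 327680-byte kernels in lds-size-hsa-gfx1250.ll this gives 320 blocks at the old granularity and 160 at the new one, which matches the updated `granulated_lds_size` check lines.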
9 changes: 3 additions & 6 deletions llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
@@ -1161,12 +1161,9 @@ void AMDGPUAsmPrinter::getSIProgramInfo(SIProgramInfo &ProgInfo,
ProgInfo.DX10Clamp = Mode.DX10Clamp;

unsigned LDSAlignShift;
- if (STM.getFeatureBits().test(FeatureAddressableLocalMemorySize327680)) {
- // LDS is allocated in 256 dword blocks.
- LDSAlignShift = 10;
- } else if (STM.getFeatureBits().test(
- FeatureAddressableLocalMemorySize163840)) {
- // LDS is allocated in 320 dword blocks.
+ if (STM.getFeatureBits().test(FeatureAddressableLocalMemorySize327680) ||
+ STM.getFeatureBits().test(FeatureAddressableLocalMemorySize163840)) {
Comment on lines +1164 to +1165
Contributor

Seems like this should be modeled more directly rather than inferring the allocation size by the target that supports this allocation size

Contributor Author

> Seems like this should be modeled more directly rather than inferring the allocation size by the target that supports this allocation size

Do you mean something more direct, like the following?
if (gfx1250)
  LDSAlignShift = 11;

Contributor Author
@changpeng, Nov 12, 2025

@arsenm: should we get the LDS block size from the target directly?
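
One way to read the suggestion in this thread is to record the allocation granularity as an explicit per-target property instead of inferring it from the addressable-size feature. A rough sketch under that assumption follows. `Target`, `LdsInfo`, and `getLdsInfo` are hypothetical names that do not exist in LLVM, and the values are only those quoted in this diff and its doc table.

```cpp
// Hypothetical sketch: none of these names are part of LLVM.
struct LdsInfo {
  unsigned AddressableBytes;   // maximum LDS addressable by a workgroup
  unsigned AllocGranuleDwords; // allocation block size, in dwords
};

enum class Target { GFX950, GFX1250 };

// Each target states its own block size, so no caller has to guess it
// from the addressable size.
constexpr LdsInfo getLdsInfo(Target T) {
  switch (T) {
  case Target::GFX950:
    return {163840, 320};
  case Target::GFX1250:
    return {327680, 512};
  }
  return {0, 0}; // not reached for the enumerators above
}

static_assert(getLdsInfo(Target::GFX1250).AllocGranuleDwords == 512,
              "block size proposed by this PR");
```

With a table like this, a new target that shares an addressable size with an older one but uses a different block size would not need another feature test.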

+ // LDS is allocated in 512 or 320 dword blocks.
LDSAlignShift = 11;
} else if (STM.getFeatureBits().test(
FeatureAddressableLocalMemorySize65536)) {
2 changes: 1 addition & 1 deletion llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
@@ -3546,7 +3546,7 @@ bool isDPALU_DPP(const MCInstrDesc &OpDesc, const MCInstrInfo &MII,
}

unsigned getLdsDwGranularity(const MCSubtargetInfo &ST) {
- return ST.hasFeature(AMDGPU::FeatureAddressableLocalMemorySize327680) ? 256
+ return ST.hasFeature(AMDGPU::FeatureAddressableLocalMemorySize327680) ? 512
Collaborator

That becomes more and more strange, doc lists other sizes, but here we have only 2.

Contributor Author

> That becomes more and more strange, doc lists other sizes, but here we have only 2.

Maybe the gfx950 case (320-dword blocks) is missing here, and R600 was possibly left out intentionally.
We should add them, and also call AMDGPU::getLdsDwGranularity in AMDGPUAsmPrinter::getSIProgramInfo.

But this needs additional investigation and LIT tests, so I think it should go into separate PRs.
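
If the AsmPrinter did consume a dword granularity, the shift it needs could be derived rather than hard-coded. The sketch below assumes the shift is meant to be the smallest power of two that covers one block. `alignShiftForGranule` is a hypothetical helper, not an LLVM function.

```cpp
// Hypothetical helper: smallest Shift such that (1 << Shift) bytes
// covers one block of GranuleDwords dwords (4 bytes each).
constexpr unsigned alignShiftForGranule(unsigned GranuleDwords) {
  unsigned Bytes = GranuleDwords * 4;
  unsigned Shift = 0;
  while ((1u << Shift) < Bytes)
    ++Shift;
  return Shift;
}

static_assert(alignShiftForGranule(256) == 10, "1024-byte blocks");
static_assert(alignShiftForGranule(512) == 11, "2048-byte blocks");
static_assert(alignShiftForGranule(320) == 11, "1280 bytes rounds up to 2048");
```

A shift can only express power-of-two block sizes, so in this sketch a 320-dword (1280-byte) block rounds up to a shift of 11, the same value the patch uses for both the 512- and 320-dword cases.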

Collaborator

Perhaps. But I'd still expect hsa-gfx1250-v4.s to show some changes.

Contributor Author

> Perhaps. But I'd still expect hsa-gfx1250-v4.s to show some changes.

Do you mean we need to add tests that expose the change, or that something was missed when changing the LDS block size in this PR? Thanks. (I will change the maximum LDS size in the test, though.)

Collaborator

Probably some other place is not changed. Try changing max size in the test, I'd assume hex dump will change.

Contributor Author

> Probably some other place is not changed. Try changing max size in the test, I'd assume hex dump will change.

The maximum LDS size is updated, and the hex dump changes accordingly. I am going to double-check what we may have missed. Thanks.

: 128;
}

4 changes: 2 additions & 2 deletions llvm/test/CodeGen/AMDGPU/extra-lds-size.ll
@@ -31,10 +31,10 @@
; GFX1200-MESA: .long 45100
; GFX1200-MESA-NEXT: .long 1024

- ; GFX1250-PAL: '0x2c0b (SPI_SHADER_PGM_RSRC2_PS)': 0x200
+ ; GFX1250-PAL: '0x2c0b (SPI_SHADER_PGM_RSRC2_PS)': 0x100
Contributor

Not sure how this test ends up showing the allocation granularity


; GFX1250-MESA: .long 45100
- ; GFX1250-MESA-NEXT: .long 512
+ ; GFX1250-MESA-NEXT: .long 256

@lds = internal addrspace(3) global [4096 x i8] poison

6 changes: 3 additions & 3 deletions llvm/test/CodeGen/AMDGPU/lds-size-hsa-gfx1250.ll
@@ -41,7 +41,7 @@ define amdgpu_kernel void @test_lds_i32(i32 %val) {
; GCN-LABEL: test_lds_array_i8:
; GCN: .amdhsa_group_segment_fixed_size 327680
; GCN: ; LDSByteSize: 327680 bytes/workgroup
- ; MESA: granulated_lds_size = 320
+ ; MESA: granulated_lds_size = 160
define amdgpu_kernel void @test_lds_array_i8() {
%gep = getelementptr inbounds [327679 x i8], ptr addrspace(3) @lds.array.i8, i32 0, i32 5
%val = load i8, ptr addrspace(3) %gep
@@ -52,7 +52,7 @@ define amdgpu_kernel void @test_lds_array_i8() {
; GCN-LABEL: test_lds_array_i16:
; GCN: .amdhsa_group_segment_fixed_size 327680
; GCN: ; LDSByteSize: 327680 bytes/workgroup
- ; MESA: granulated_lds_size = 320
+ ; MESA: granulated_lds_size = 160
define amdgpu_kernel void @test_lds_array_i16() {
%gep = getelementptr inbounds [163839 x i16], ptr addrspace(3) @lds.array.i16, i32 0, i32 10
%val = load i16, ptr addrspace(3) %gep
@@ -63,7 +63,7 @@ define amdgpu_kernel void @test_lds_array_i16() {
; GCN-LABEL: test_lds_array_i32:
; GCN: .amdhsa_group_segment_fixed_size 327680
; GCN: ; LDSByteSize: 327680 bytes/workgroup
- ; MESA: granulated_lds_size = 320
+ ; MESA: granulated_lds_size = 160
define amdgpu_kernel void @test_lds_array_i32() {
%gep = getelementptr inbounds [81919 x i32], ptr addrspace(3) @lds.array.i32, i32 0, i32 20
%val = load i32, ptr addrspace(3) %gep
2 changes: 1 addition & 1 deletion llvm/test/CodeGen/AMDGPU/pal-metadata-3.0.gfx1250.ll
@@ -114,7 +114,7 @@
; CHECK-NEXT: .entry_point: _amdgpu_gs
; CHECK-NEXT: .entry_point_symbol: gs_shader
; CHECK-NEXT: .forward_progress: true
- ; CHECK-NEXT: .lds_size: 0x400
+ ; CHECK-NEXT: .lds_size: 0x800
; CHECK-NEXT: .mem_ordered: true
; CHECK-NEXT: .scratch_en: false
; CHECK-NEXT: .scratch_memory_size: 0