Commit d26625d

vprosyak authored and alexdeucher committed
drm/amdgpu/gfx10: Refine Cleaner Shader for GFX10.1.10
This patch updates the cleaner shader, which is responsible for initializing GPU resources such as Local Data Share (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs). Changes include adjustments to register clearing and shader configuration.

- Updated GPU resource initialization addresses in the cleaner shader from `be803080` to `be803000`.
- Simplified the logic in the SGPR clearing section, ensuring all SGPRs are set to zero.

Fixes: 25961ba ("drm/amdgpu/gfx10: Add cleaner shader for GFX10.1.10")
Cc: Christian König <[email protected]>
Cc: Alex Deucher <[email protected]>
Signed-off-by: Manu Rastogi <[email protected]>
Signed-off-by: Vitaly Prosyak <[email protected]>
Signed-off-by: Srinivasan Shanmugam <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
1 parent 719d84f commit d26625d

2 files changed: +9 -10 lines changed


drivers/gpu/drm/amd/amdgpu/gfx_v10_0_cleaner_shader.h

Lines changed: 3 additions & 3 deletions
@@ -43,9 +43,9 @@ static const u32 gfx_10_1_10_cleaner_shader_hex[] = {
 0xd70f6a01, 0x000202ff,
 0x00000400, 0x80828102,
 0xbf84fff7, 0xbefc03ff,
-0x00000068, 0xbe803080,
-0xbe813080, 0xbe823080,
-0xbe833080, 0x80fc847c,
+0x00000068, 0xbe803000,
+0xbe813000, 0xbe823000,
+0xbe833000, 0x80fc847c,
 0xbf84fffa, 0xbeea0480,
 0xbeec0480, 0xbeee0480,
 0xbef00480, 0xbef20480,
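
The four changed dwords in this hunk differ only in their low byte (`0x80` becomes `0x00`). The sketch below splits such a dword into fields to make that visible. It assumes the GFX10 SOP1 layout (a 9-bit fixed prefix in bits 31:23, SDST in 22:16, OP in 15:8, SSRC0 in 7:0) and assumes that source operand `0x80` denotes the inline constant 0 while `0x00` denotes `s0`. The names `struct sop1`, `sop1_decode`, and `sop1_print` are hypothetical helpers and are not part of the amdgpu driver.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Assumed SOP1 field layout (not taken from the commit itself):
 *   [31:23] fixed encoding prefix, [22:16] SDST, [15:8] OP, [7:0] SSRC0.
 */
struct sop1 {
	unsigned prefix; /* expected to be 0x17d for a SOP1 instruction */
	unsigned sdst;   /* destination scalar register index */
	unsigned op;     /* opcode within the SOP1 format */
	unsigned ssrc0;  /* source operand selector */
};

/* Hypothetical helper: split one shader dword into SOP1 fields. */
static struct sop1 sop1_decode(uint32_t word)
{
	struct sop1 f;

	f.prefix = (word >> 23) & 0x1ff;
	f.sdst = (word >> 16) & 0x7f;
	f.op = (word >> 8) & 0xff;
	f.ssrc0 = word & 0xff;
	return f;
}

/* Hypothetical helper: print the decoded fields of one dword. */
static void sop1_print(uint32_t word)
{
	struct sop1 f = sop1_decode(word);

	printf("%08x: prefix=0x%03x sdst=%u op=0x%02x ssrc0=0x%02x\n",
	       (unsigned)word, f.prefix, f.sdst, f.op, f.ssrc0);
}
```

Calling `sop1_print(0xbe803080)` and `sop1_print(0xbe803000)` from a small `main` shows the same prefix, destination, and opcode for both words, with only `ssrc0` changing. Under the stated assumptions this matches the assembly change from `s_movreld_b32 s0, 0` to `s_movreld_b32 s0, s0`, which suggests the values described as "addresses" in the commit message are instruction encodings.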

drivers/gpu/drm/amd/amdgpu/gfx_v10_1_10_cleaner_shader.asm

Lines changed: 6 additions & 7 deletions
@@ -40,7 +40,6 @@ shader main
 type(CS)
 wave_size(32)
 // Note: original source code from SQ team
-
 //
 // Create 32 waves in a threadgroup (CS waves)
 // Each allocates 64 VGPRs
@@ -71,8 +70,8 @@ label_0005:
 s_sub_u32 s2, s2, 8
 s_cbranch_scc0 label_0005
 //
-s_mov_b32 s2, 0x80000000 // Bit31 is first_wave
-s_and_b32 s2, s2, s0 // sgpr0 has tg_size (first_wave) term as in ucode only COMPUTE_PGM_RSRC2.tg_size_en is set
+s_mov_b32 s2, 0x80000000 // Bit31 is first_wave
+s_and_b32 s2, s2, s1 // sgpr0 has tg_size (first_wave) term as in ucode only COMPUTE_PGM_RSRC2.tg_size_en is set
 s_cbranch_scc0 label_0023 // Clean LDS if its first wave of ThreadGroup/WorkGroup
 // CLEAR LDS
 //
@@ -99,10 +98,10 @@ label_001F:
 label_0023:
 s_mov_b32 m0, 0x00000068 // Loop 108/4=27 times (loop unrolled for performance)
 label_sgpr_loop:
-s_movreld_b32 s0, 0
-s_movreld_b32 s1, 0
-s_movreld_b32 s2, 0
-s_movreld_b32 s3, 0
+s_movreld_b32 s0, s0
+s_movreld_b32 s1, s0
+s_movreld_b32 s2, s0
+s_movreld_b32 s3, s0
 s_sub_u32 m0, m0, 4
 s_cbranch_scc0 label_sgpr_loop
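
The patched loop can be modeled on the host to check the "27 times" comment and to see what the new source operand implies. This is a minimal sketch under stated assumptions: `s_movreld_b32 sN, s0` is taken to write `SGPR[N + m0] = s0`, and `s_sub_u32` is taken to set SCC on unsigned borrow. The register count of 108 comes from the source comment. `sgpr_loop_model` and `sgpr_loop_clears_all` are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SGPRS 108 /* from the "108/4=27" comment in the shader source */

/* Model of the patched label_sgpr_loop; returns the iteration count. */
static int sgpr_loop_model(uint32_t sgpr[NUM_SGPRS])
{
	uint32_t m0 = 0x68; /* s_mov_b32 m0, 0x00000068 */
	int iterations = 0;
	int scc;

	do {
		/* s_movreld_b32 s0..s3, s0 (assumed: SGPR[d + m0] = s0) */
		for (unsigned d = 0; d < 4; d++)
			sgpr[d + m0] = sgpr[0];
		scc = m0 < 4; /* assumed: s_sub_u32 sets SCC on borrow */
		m0 -= 4;      /* s_sub_u32 m0, m0, 4 */
		iterations++;
	} while (!scc);       /* s_cbranch_scc0 label_sgpr_loop */

	return iterations;
}

/*
 * Fill every register with a junk pattern, set s0 to s0_value, run the
 * model, and report whether all registers now hold s0_value.
 */
static int sgpr_loop_clears_all(uint32_t s0_value)
{
	uint32_t sgpr[NUM_SGPRS];

	for (unsigned i = 0; i < NUM_SGPRS; i++)
		sgpr[i] = 0xdeadbeefu;
	sgpr[0] = s0_value;

	sgpr_loop_model(sgpr);

	for (unsigned i = 0; i < NUM_SGPRS; i++)
		if (sgpr[i] != s0_value)
			return 0;
	return 1;
}
```

In this model the loop copies whatever `s0` holds into every other register, so the result is all zeros only if `s0` is already zero when the loop starts. The hunks shown here do not include the instruction that establishes this, so treat it as an assumption about the surrounding shader code.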
