Skip to content

[AMDGPU] generates v_cndmask/lshlrev for uniform select between 0 and a power of 2 #87938

@Engininja2

Description

@Engininja2

Here is my test kernel:

static __global__ void test(const int32_t input, int32_t * result) {
    const int data = input ? 32 : 0;
    int output;

    asm volatile("s_mov_b32 %0, %1" : "=s"(output) : "s"(data));

    result[threadIdx.x] = output;
}

and here's the asm it compiles to for gfx900 on rocm 6.0.0, where it fails to assemble because of an invalid operand.

	s_load_dword s2, s[4:5], 0x0
	s_load_dwordx2 s[0:1], s[4:5], 0x8
	v_lshlrev_b32_e32 v0, 2, v0
	s_waitcnt lgkmcnt(0)
	s_cmp_lg_u32 s2, 0
	s_cselect_b64 s[2:3], -1, 0
	v_cndmask_b32_e64 v1, 0, 1, s[2:3]
	v_lshlrev_b32_e32 v1, 5, v1
	;;#ASMSTART
	s_mov_b32 s2, v1
	;;#ASMEND
	v_mov_b32_e32 v1, s2
	global_store_dword v0, v1, s[0:1]

If either of the values being selected is changed so that either one isn't 0, or the other isn't a power of 2, instead it compiles to this, which works.

	s_load_dword s2, s[4:5], 0x0
	s_load_dwordx2 s[0:1], s[4:5], 0x8
	v_lshlrev_b32_e32 v0, 2, v0
	s_waitcnt lgkmcnt(0)
	s_cmp_eq_u32 s2, 0
	s_cselect_b32 s2, 0, 33
	;;#ASMSTART
	s_mov_b32 s2, s2
	;;#ASMEND
	v_mov_b32_e32 v1, s2
	global_store_dword v0, v1, s[0:1]

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions