Consider this example:
typedef uint uint32;
typedef ulong uint64;
typedef struct { uint64 s0; uint32 s1; } uint96;
uint96 uint96_add_64(const uint96 x, const uint64 y)
{
    uint96 r;
    const uint64 s0 = x.s0 + y;
    r.s0 = s0;
    r.s1 = x.s1 + (s0 < y);
    return r;
}
It gets compiled to:
v_add_co_u32 v0, vcc_lo, v3, v0
v_add_co_ci_u32_e32 v1, vcc_lo, v4, v1, vcc_lo
v_cmp_lt_u64_e32 vcc_lo, v[0:1], v[3:4]
v_add_co_ci_u32_e32 v2, vcc_lo, 0, v2, vcc_lo
The v_cmp_lt_u64_e32 shouldn’t be needed, since vcc_lo already contains the carry out of the preceding 64-bit add.
Expected:
v_add_co_u32 v0, vcc_lo, v3, v0
v_add_co_ci_u32_e32 v1, vcc_lo, v4, v1, vcc_lo
v_add_co_ci_u32_e32 v2, vcc_lo, 0, v2, vcc_lo
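As a sanity check on the claim that the comparison is redundant, here is a minimal host-side C sketch. It models the three-instruction sequence as a chain of 32-bit adds with carry and compares it against the source-level idiom. The helper `add_carry_u32` and the names `uint96_add_64_ref`, `uint96_add_64_chain`, and `chain_matches_reference` are hypothetical and exist only for this illustration; this is not the compiler's implementation.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint64_t s0; uint32_t s1; } uint96;

/* Reference: the source-level idiom from the example (carry = s0 < y). */
static uint96 uint96_add_64_ref(uint96 x, uint64_t y)
{
    uint96 r;
    const uint64_t s0 = x.s0 + y;
    r.s0 = s0;
    r.s1 = x.s1 + (uint32_t)(s0 < y);
    return r;
}

/* Hypothetical model of a 32-bit add with carry-in and carry-out. */
static uint32_t add_carry_u32(uint32_t a, uint32_t b, bool cin, bool *cout)
{
    const uint64_t wide = (uint64_t)a + b + (cin ? 1u : 0u);
    *cout = (wide >> 32) != 0;
    return (uint32_t)wide;
}

/* Three chained adds, mirroring the expected instruction sequence:
   low half, high half with carry-in, then carry into s1. */
static uint96 uint96_add_64_chain(uint96 x, uint64_t y)
{
    bool c;
    const uint32_t lo  = add_carry_u32((uint32_t)x.s0, (uint32_t)y, false, &c);
    const uint32_t hi  = add_carry_u32((uint32_t)(x.s0 >> 32),
                                       (uint32_t)(y >> 32), c, &c);
    const uint32_t top = add_carry_u32(x.s1, 0u, c, &c);

    uint96 r;
    r.s0 = ((uint64_t)hi << 32) | lo;
    r.s1 = top;
    return r;
}

/* True when both formulations agree for the given inputs. */
static bool chain_matches_reference(uint96 x, uint64_t y)
{
    const uint96 a = uint96_add_64_ref(x, y);
    const uint96 b = uint96_add_64_chain(x, y);
    return a.s0 == b.s0 && a.s1 == b.s1;
}
```

The carry out of the second add is exactly the value of `s0 < y`, which is why the explicit 64-bit compare adds no information.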
For more examples, see:
ROCm/ROCm#4717
ROCm/ROCm#477 (comment)
There's already an optimization for uint (32-bit + 32-bit), which generates optimal code:
v_add_co_u32_e32 v2, vcc, v3, v2
v_addc_co_u32_e32 v2, vcc, 0, v2, vcc
However, the code is not optimized for ulong combinations (32-bit + 64-bit or 64-bit + 64-bit). See 1c9a93a ... Could the optimization be extended to those cases? Thanks!
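This report does not include the source for the 32-bit case. A plausible counterpart, assumed here purely for illustration, is the same carry idiom with both halves narrowed to 32 bits. The type name `u64_pair` and the function name `u64_pair_add_32` are hypothetical, and the sketch is plain C rather than OpenCL C so it can be checked on the host.

```c
#include <stdint.h>

/* Hypothetical two-limb value: s0 is the low 32 bits, s1 the high 32 bits. */
typedef struct { uint32_t s0; uint32_t s1; } u64_pair;

/* Same idiom as uint96_add_64, one size down: the carry out of the
   low-limb add is recovered with an unsigned comparison (s0 < y). */
static u64_pair u64_pair_add_32(u64_pair x, uint32_t y)
{
    u64_pair r;
    const uint32_t s0 = x.s0 + y;      /* wraps modulo 2^32 */
    r.s0 = s0;
    r.s1 = x.s1 + (uint32_t)(s0 < y);  /* add the carry to the high limb */
    return r;
}
```

The request above amounts to having the wider variants of this pattern recognized the same way, so that the comparison folds into the carry already produced by the add.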