I'm seeing instructions like
%i12 = sext i32 %i11 to i64
%i62 = shl i64 %i12, 1
%i64 = add i64 0, %i62
%i65 = add i64 %i64, 1
resulting in a sequence of 32-bit adds with GlobalISel, whereas SelectionDAG (-global-isel=false) produces a 64-bit shl + add. I assume the GlobalISel sequence performs worse. Is there a reason to prefer it, or to expect the same performance?
I'm comparing the two with llc -O3 -march=amdgcn -mcpu=gfx942 -mtriple amdgcn-amd-amdhsa ./reduced.ll -global-isel={true,false} -o -
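For readers without the attachments, the pattern can be wrapped in a standalone function roughly as follows. This is only a sketch: the function name `@shl_add_sext` and the choice to return the result are my assumptions, it is not the attached reduced.ll, and I have not checked that it reproduces the same codegen difference.

```llvm
; Hypothetical minimal wrapper around the four instructions above.
; Computes (sext(%i11) << 1) + 1 as a 64-bit value.
define i64 @shl_add_sext(i32 %i11) {
  %i12 = sext i32 %i11 to i64   ; widen the 32-bit input to 64 bits
  %i62 = shl i64 %i12, 1        ; multiply by 2
  %i64 = add i64 0, %i62        ; add of zero, kept as in the original snippet
  %i65 = add i64 %i64, 1        ; add the constant offset
  ret i64 %i65
}
```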
reduced.gisel.txt
reduced.sdisel.txt
reduced.txt