Noticed while reviewing constexpr handling of the predicated arithmetic:
```llvm
define <16 x i32> @add(<16 x i32> %x, <16 x i32> %y) {
  %add = add <16 x i32> %y, %x
  %res = shufflevector <16 x i32> %add, <16 x i32> zeroinitializer, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
  ret <16 x i32> %res
}
```

```asm
add:                                    # @add
        vpaddd    %zmm0, %zmm1, %zmm0
        movw      $255, %ax
        kmovd     %eax, %k1
        vpexpandd %zmm0, %zmm0 {%k1} {z}
        retq
```

Lots of things going wrong here:
- Lowering the shuffle as an expansion instead of a select (which would fold into a predicated instruction)
- Use of movw/kmovd instead of kxnorb to rematerialize the 0xFF predicate mask directly
- The shuffle only zeroes the upper 256 bits of the vector, so this could have been done as `vpaddd %ymm0, %ymm1, %ymm0`, relying on the implicit zeroing of the upper bits.
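To make the equivalences in the list concrete, here is a small lane-level model in Python. It is only a sketch of the semantics, not of any LLVM code: the helper names (`shufflevector`, `select`, `vpexpandd_z`, `kxnorb`) are hypothetical and model each operation on plain lists. It checks that the shuffle mask from the reproducer (lanes 0-7 from `%add`, indices 24-31 from the zero vector) gives the same result as a select with mask `0x00FF` and as the zero-masked `vpexpandd` with `k1 = 0xFF`. It also checks that `kxnorb` of a mask register with itself yields `0xFF` no matter what the register held before.

```python
# Lane-level model of the 16 x i32 operations in the reproducer (illustrative only).
LANES = 16


def shufflevector(a, b, mask):
    """LLVM shufflevector: index i < len(a) picks a[i], otherwise b[i - len(a)]."""
    both = a + b
    return [both[i] for i in mask]


def select(mask_bits, a, b):
    """Lane-wise select: bit i of mask_bits set -> a[i], clear -> b[i]."""
    return [a[i] if (mask_bits >> i) & 1 else b[i] for i in range(len(a))]


def vpexpandd_z(src, k):
    """Zero-masked VPEXPANDD: consecutive low elements of src are written to
    the lanes whose k bit is set; every other lane is zeroed ({z})."""
    out, j = [], 0
    for i in range(LANES):
        if (k >> i) & 1:
            out.append(src[j])
            j += 1
        else:
            out.append(0)
    return out


def kxnorb(a, b):
    """KXNORB on 8-bit mask values: ~(a ^ b), truncated to 8 bits."""
    return ~(a ^ b) & 0xFF


x = list(range(100, 116))
y = list(range(200, 216))
add = [(xi + yi) & 0xFFFFFFFF for xi, yi in zip(x, y)]  # wrapping i32 add
zeros = [0] * LANES
mask = list(range(0, 8)) + list(range(24, 32))  # shuffle mask from the IR

shuf = shufflevector(add, zeros, mask)
sel = select(0x00FF, add, zeros)
exp = vpexpandd_z(add, 0xFF)
k1 = kxnorb(0x5A, 0x5A)  # same register as both sources: result is 0xFF

assert shuf == sel == exp
assert k1 == 0xFF
print(shuf[:8], shuf[8:])
```

All three forms keep lanes 0-7 and zero lanes 8-15. That is the same result a VEX-encoded 256-bit `vpaddd` on `ymm` registers gives, because it implicitly clears bits 511:256 of the destination `zmm`.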