Skip to content

[AArch64] Replace expensive move from wzr by two moves via floating point immediate #146538

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jul 16, 2025
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 12 additions & 6 deletions llvm/lib/Target/AArch64/AArch64InstrInfo.td
Original file line number Diff line number Diff line change
Expand Up @@ -7356,16 +7356,10 @@ def : Pat<(v4f16 (vector_insert (v4f16 V64:$Rn),
(i64 0)),
dsub)>;

def : Pat<(vector_insert (v8f16 V128:$Rn), (f16 fpimm0), (i64 VectorIndexH:$imm)),
(INSvi16gpr V128:$Rn, VectorIndexH:$imm, WZR)>;
def : Pat<(vector_insert (v4f16 V64:$Rn), (f16 fpimm0), (i64 VectorIndexH:$imm)),
(EXTRACT_SUBREG (INSvi16gpr (v8f16 (INSERT_SUBREG (v8f16 (IMPLICIT_DEF)), V64:$Rn, dsub)), VectorIndexH:$imm, WZR), dsub)>;
def : Pat<(vector_insert (v4f32 V128:$Rn), (f32 fpimm0), (i64 VectorIndexS:$imm)),
(INSvi32gpr V128:$Rn, VectorIndexS:$imm, WZR)>;
def : Pat<(vector_insert (v2f32 V64:$Rn), (f32 fpimm0), (i64 VectorIndexS:$imm)),
(EXTRACT_SUBREG (INSvi32gpr (v4f32 (INSERT_SUBREG (v4f32 (IMPLICIT_DEF)), V64:$Rn, dsub)), VectorIndexS:$imm, WZR), dsub)>;
def : Pat<(vector_insert v2f64:$Rn, (f64 fpimm0), (i64 VectorIndexD:$imm)),
(INSvi64gpr V128:$Rn, VectorIndexS:$imm, XZR)>;

def : Pat<(v8f16 (vector_insert (v8f16 V128:$Rn),
(f16 FPR16:$Rm), (i64 VectorIndexH:$imm))),
Expand Down Expand Up @@ -8035,6 +8029,18 @@ def MOVIv2d_ns : SIMDModifiedImmVectorNoShift<1, 1, 0, 0b1110, V128,
"movi", ".2d",
[(set (v2i64 V128:$Rd), (AArch64movi_edit imm0_255:$imm8))]>;

def : Pat<(vector_insert (v8f16 V128:$Rn), (f16 fpimm0), (i64 VectorIndexH:$imm)),
(INSvi16lane V128:$Rn, VectorIndexH:$imm,
(v8f16 (MOVIv2d_ns (i32 0))), (i64 0))>;

def : Pat<(vector_insert (v4f32 V128:$Rn), (f32 fpimm0), (i64 VectorIndexS:$imm)),
(INSvi32lane V128:$Rn, VectorIndexS:$imm,
(v4f32 (MOVIv2d_ns (i32 0))), (i64 0))>;

def : Pat<(vector_insert (v2f64 V128:$Rn), (f64 fpimm0), (i64 VectorIndexD:$imm)),
(INSvi64lane V128:$Rn, VectorIndexD:$imm,
(v2f64 (MOVIv2d_ns (i32 0))), (i64 0))>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it need these new patterns? Or is the codegen without any pattern already OK?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, sorry not sure I understand, without those patterns we get the move from wzr (which we want to avoid).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean - If we remove the INS wzr patterns above, do we need the new patterns or is that already handled by the existing "insert" and "zero a register" patterns?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh, I see, it is, yes :)


let Predicates = [HasNEON] in {
def : Pat<(v2i64 immAllZerosV), (MOVIv2d_ns (i32 0))>;
def : Pat<(v4i32 immAllZerosV), (MOVIv2d_ns (i32 0))>;
Expand Down
15 changes: 10 additions & 5 deletions llvm/test/CodeGen/AArch64/arm64-vector-insertion.ll
Original file line number Diff line number Diff line change
Expand Up @@ -172,8 +172,9 @@ define <8 x half> @test_insert_v8f16_insert_1(half %a) {
; CHECK-LABEL: test_insert_v8f16_insert_1:
; CHECK: // %bb.0:
; CHECK-NEXT: // kill: def $h0 killed $h0 def $q0
; CHECK-NEXT: movi.2d v1, #0000000000000000
; CHECK-NEXT: dup.8h v0, v0[0]
; CHECK-NEXT: mov.h v0[7], wzr
; CHECK-NEXT: mov.h v0[7], v1[0]
; CHECK-NEXT: ret
%v.0 = insertelement <8 x half> <half undef, half undef, half undef, half undef, half undef, half undef, half undef, half 0.0>, half %a, i32 0
%v.1 = insertelement <8 x half> %v.0, half %a, i32 1
Expand Down Expand Up @@ -278,8 +279,9 @@ define <4 x float> @test_insert_3_f32_undef_zero_vector(float %a) {
; CHECK-LABEL: test_insert_3_f32_undef_zero_vector:
; CHECK: // %bb.0:
; CHECK-NEXT: // kill: def $s0 killed $s0 def $q0
; CHECK-NEXT: movi.2d v1, #0000000000000000
; CHECK-NEXT: dup.4s v0, v0[0]
; CHECK-NEXT: mov.s v0[3], wzr
; CHECK-NEXT: mov.s v0[3], v1[0]
; CHECK-NEXT: ret
%v.0 = insertelement <4 x float> <float undef, float undef, float undef, float 0.000000e+00>, float %a, i32 0
%v.1 = insertelement <4 x float> %v.0, float %a, i32 1
Expand Down Expand Up @@ -362,7 +364,8 @@ define <4 x half> @test_insert_v4f16_f16_zero(<4 x half> %a) {
define <8 x half> @test_insert_v8f16_f16_zero(<8 x half> %a) {
; CHECK-LABEL: test_insert_v8f16_f16_zero:
; CHECK: // %bb.0:
; CHECK-NEXT: mov.h v0[6], wzr
; CHECK-NEXT: movi.2d v1, #0000000000000000
; CHECK-NEXT: mov.h v0[6], v1[0]
; CHECK-NEXT: ret
%v.0 = insertelement <8 x half> %a, half 0.000000e+00, i32 6
ret <8 x half> %v.0
Expand All @@ -382,7 +385,8 @@ define <2 x float> @test_insert_v2f32_f32_zero(<2 x float> %a) {
define <4 x float> @test_insert_v4f32_f32_zero(<4 x float> %a) {
; CHECK-LABEL: test_insert_v4f32_f32_zero:
; CHECK: // %bb.0:
; CHECK-NEXT: mov.s v0[3], wzr
; CHECK-NEXT: movi.2d v1, #0000000000000000
; CHECK-NEXT: mov.s v0[3], v1[0]
; CHECK-NEXT: ret
%v.0 = insertelement <4 x float> %a, float 0.000000e+00, i32 3
ret <4 x float> %v.0
Expand All @@ -391,7 +395,8 @@ define <4 x float> @test_insert_v4f32_f32_zero(<4 x float> %a) {
define <2 x double> @test_insert_v2f64_f64_zero(<2 x double> %a) {
; CHECK-LABEL: test_insert_v2f64_f64_zero:
; CHECK: // %bb.0:
; CHECK-NEXT: mov.d v0[1], xzr
; CHECK-NEXT: movi.2d v1, #0000000000000000
; CHECK-NEXT: mov.d v0[1], v1[0]
; CHECK-NEXT: ret
%v.0 = insertelement <2 x double> %a, double 0.000000e+00, i32 1
ret <2 x double> %v.0
Expand Down
Loading