[AArch64] Correct SCVTF/UCVTF instructions for vector input #152974
@llvm/pr-subscribers-backend-aarch64

Author: Amina Chabane (Amichaxx)

Changes

Improve the pattern matching for scalar floating-point conversion instructions on AArch64, ensuring that the more efficient scalar instructions are generated from vector inputs. Add a new pattern to the instruction definitions and update the test cases to reflect the improved code generation.
Full diff: https://github.com/llvm/llvm-project/pull/152974.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64InstrFormats.td b/llvm/lib/Target/AArch64/AArch64InstrFormats.td
index ba7cbccc0bcd6..4e2c30a7aa7ee 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrFormats.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrFormats.td
@@ -5520,6 +5520,8 @@ multiclass IntegerToFPSIMDScalar<bits<2> rmode, bits<3> opcode, string asm, SDPa
let Inst{31} = 1; // 64-bit FPR flag
let Inst{23-22} = 0b00; // 32-bit FPR flag
}
+ def : Pat<(v1f64 (extract_subvector (v2f64 (node (v2i64 (sext (v2i32 V64:$Rn))))), (i64 0))),
+ (!cast<Instruction>(NAME # DSr) (EXTRACT_SUBREG V64:$Rn, ssub))>;
def : Pat<(f16 (node (i32 (extractelt (v4i32 V128:$Rn), (i64 0))))),
(!cast<Instruction>(NAME # HSr) (EXTRACT_SUBREG V128:$Rn, ssub))>;
diff --git a/llvm/test/CodeGen/AArch64/fprcvt-cvtf.ll b/llvm/test/CodeGen/AArch64/fprcvt-cvtf.ll
index 9da6f583cec01..254510abacdde 100644
--- a/llvm/test/CodeGen/AArch64/fprcvt-cvtf.ll
+++ b/llvm/test/CodeGen/AArch64/fprcvt-cvtf.ll
@@ -101,9 +101,7 @@ define double @scvtf_f64i32_neg(<4 x i32> %x) {
define <1 x double> @scvtf_f64i32_simple(<1 x i32> %x) {
; CHECK-LABEL: scvtf_f64i32_simple:
; CHECK: // %bb.0:
-; CHECK-NEXT: sshll v0.2d, v0.2s, #0
-; CHECK-NEXT: scvtf v0.2d, v0.2d
-; CHECK-NEXT: // kill: def $d0 killed $d0 killed $q0
+; CHECK-NEXT: scvtf d0, s0
; CHECK-NEXT: ret
;
; CHECK-NO-FPRCVT-LABEL: scvtf_f64i32_simple:
@@ -202,10 +200,6 @@ define float @scvtf_f32i64_neg(<2 x i64> %x) {
ret float %conv
}
-; This test does not give the indended result of scvtf s0, d0
-; This is due to the input being loaded as a 2 item vector and
-; therefore using vector inputs that do not match the pattern
-; This test will be fixed in a future revision
define <1 x float> @scvtf_f32i64_simple(<1 x i64> %x) {
; CHECK-LABEL: scvtf_f32i64_simple:
; CHECK: // %bb.0:
15dd684 to e82a221
Thank you Amina,
Align before merging
Amended patterns to match the signed path (sitofp(sext(...))) to SCVTFDSr and the unsigned path (uitofp(zext(...))) to UCVTFDSr. Moved to AArch64InstrInfo.td. Added test in fprcvt-cvtf.ll to verify.
b2bb043 to e3cced3
@efriedma-quic Thanks for pointing out the errors with the patterns. I've since updated the patterns and added a test which I believe covers the issue you pointed out.
514eff4 to 675d79d
LGTM
Thank you for reviewing @efriedma-quic and @CarolineConcatto!
This pull request improves support for scalar floating-point conversions from integer vectors on AArch64, specifically for the scvtf and ucvtf instructions. It fixes pattern matching so that single-element conversions from vectors now generate the expected scalar instructions, and it adds a new test to verify correct behavior when extracting a lane from a widened vector.

Pattern matching and code generation improvements:
Patterns in AArch64InstrInfo.td correctly match conversions from v2i32 to v1f64 using scvtf and ucvtf, ensuring the scalar instructions (scvtf d0, s0 and ucvtf d0, s0) are generated when extracting a single lane.

Test updates and additions:
The scvtf_f64i32_simple and ucvtf_f64i32_simple tests in fprcvt-cvtf.ll reflect the correct generation of scalar instructions, removing previous comments about incorrect codegen and showing the expected output.
The new uitofp_sext_v2i32_extract_lane0 test verifies correct code generation when extracting a lane from a widened vector and converting to double.
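
To illustrate why the signed path (sitofp of sext) and the unsigned path (uitofp of zext) have to be matched separately, here is a minimal Python sketch of the lane-0 arithmetic. It is not LLVM code: all helper names below are hypothetical and only model the integer and conversion semantics, assuming a 32-bit input lane widened to 64 bits and converted to double.

```python
# Hypothetical model of the lane-0 arithmetic; none of these helpers are LLVM APIs.
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def to_signed(bits: int, width: int) -> int:
    """Reinterpret a `width`-bit pattern as a two's-complement signed integer."""
    bits &= (1 << width) - 1
    return bits - (1 << width) if bits >> (width - 1) else bits


def sext_i32_to_i64(bits32: int) -> int:
    """Sign-extend a 32-bit pattern to a 64-bit pattern."""
    return to_signed(bits32, 32) & MASK64


def zext_i32_to_i64(bits32: int) -> int:
    """Zero-extend a 32-bit pattern to a 64-bit pattern."""
    return bits32 & MASK32


def sitofp_i64(bits64: int) -> float:
    """Signed 64-bit integer to double."""
    return float(to_signed(bits64, 64))


def uitofp_i64(bits64: int) -> float:
    """Unsigned 64-bit integer to double."""
    return float(bits64 & MASK64)


def scvtf_d_s(bits32: int) -> float:
    """Model of a scalar signed convert: 32-bit integer lane to double."""
    return float(to_signed(bits32, 32))


def ucvtf_d_s(bits32: int) -> float:
    """Model of a scalar unsigned convert: 32-bit integer lane to double."""
    return float(bits32 & MASK32)


SAMPLES = (0, 1, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF)

# Matching signedness: widening then converting equals converting directly.
signed_ok = all(sitofp_i64(sext_i32_to_i64(x)) == scvtf_d_s(x) for x in SAMPLES)
unsigned_ok = all(uitofp_i64(zext_i32_to_i64(x)) == ucvtf_d_s(x) for x in SAMPLES)

# Mismatched signedness: uitofp(sext(x)) agrees with neither scalar form
# once the top bit of the 32-bit lane is set.
x = 0xFFFFFFFF
mixed = uitofp_i64(sext_i32_to_i64(x))
mixed_differs = mixed != ucvtf_d_s(x) and mixed != scvtf_d_s(x)

print(signed_ok, unsigned_ok, mixed_differs)
```

In this model, a mixed chain such as uitofp of sext cannot be folded into either scalar instruction, which is presumably the situation the uitofp_sext_v2i32_extract_lane0 test guards against.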