@paulwalker-arm
Collaborator

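There are no dedicated bfloat MOV instructions, but we can use the half-precision variants when the encoding allows, because some half-precision immediates share their bit pattern with a bfloat value (e.g. f16(1.875) == bf16(1.0); both are 0x3F80).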
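To see the aliasing concretely, the sketch below (Python, standard library only) takes the bfloat16 pattern of a value by keeping the upper 16 bits of its float32 encoding, then reads those same 16 bits back as an IEEE half-precision number. The helper names are illustrative only and are not LLVM APIs.

```python
import struct


def bf16_bits(value: float) -> int:
    """bfloat16 bit pattern of `value`: the top 16 bits of its float32 encoding.

    Truncation is exact only for values representable in bfloat16 (e.g. 1.0, -1.0).
    """
    (f32_bits,) = struct.unpack(">I", struct.pack(">f", value))
    return f32_bits >> 16


def as_f16(bits: int) -> float:
    """Reinterpret a 16-bit pattern as an IEEE 754 half-precision value."""
    (half,) = struct.unpack(">e", bits.to_bytes(2, "big"))
    return half


for v in (1.0, -1.0):
    b = bf16_bits(v)
    # prints "bf16(1.0) = 0x3F80 = f16(1.875)" and "bf16(-1.0) = 0xBF80 = f16(-1.875)"
    print(f"bf16({v}) = 0x{b:04X} = f16({as_f16(b)})")
```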

@llvmbot
Member

llvmbot commented Mar 3, 2025

@llvm/pr-subscribers-backend-aarch64

Author: Paul Walker (paulwalker-arm)

Changes

There are no dedicated bfloat MOV instructions, but we can use the half variants when the encoding allows (e.g. f16(1.875) == bf16(1.0)).


Full diff: https://github.com/llvm/llvm-project/pull/129550.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td (+7)
  • (modified) llvm/test/CodeGen/AArch64/sve-vector-splat.ll (+29-2)
diff --git a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
index 4365e573d8b16..fd38bc22a4987 100644
--- a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
@@ -931,6 +931,13 @@ let Predicates = [HasSVE_or_SME] in {
               (FDUP_ZI_S fpimm32:$imm8)>;
     def : Pat<(nxv2f64 (splat_vector fpimm64:$imm8)),
               (FDUP_ZI_D fpimm64:$imm8)>;
+    // Some half precision immediates alias with bfloat (e.g. f16(1.875) == bf16(1.0)).
+    def : Pat<(nxv8bf16 (splat_vector fpimmbf16:$imm8)),
+              (FDUP_ZI_H (fpimm16XForm bf16:$imm8))>;
+    def : Pat<(nxv4bf16 (splat_vector fpimmbf16:$imm8)),
+              (FDUP_ZI_H (fpimm16XForm bf16:$imm8))>;
+    def : Pat<(nxv2bf16 (splat_vector fpimmbf16:$imm8)),
+              (FDUP_ZI_H (fpimm16XForm bf16:$imm8))>;
   }
 
   // Select elements from either vector (predicated)
diff --git a/llvm/test/CodeGen/AArch64/sve-vector-splat.ll b/llvm/test/CodeGen/AArch64/sve-vector-splat.ll
index 4534e8f6de05e..4a75242848343 100644
--- a/llvm/test/CodeGen/AArch64/sve-vector-splat.ll
+++ b/llvm/test/CodeGen/AArch64/sve-vector-splat.ll
@@ -482,6 +482,33 @@ define <vscale x 2 x double> @splat_nxv2f64_imm() {
   ret <vscale x 2 x double> splat(double 1.0)
 }
 
+; NOTE: f16(1.875) == bf16(1.0)
+define <vscale x 8 x bfloat> @splat_nxv8bf16_imm() {
+; CHECK-LABEL: splat_nxv8bf16_imm:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    fmov z0.h, #1.87500000
+; CHECK-NEXT:    ret
+  ret <vscale x 8 x bfloat> splat(bfloat 1.0)
+}
+
+; NOTE: f16(-1.875) == bf16(-1.0)
+define <vscale x 4 x bfloat> @splat_nxv4bf16_imm() {
+; CHECK-LABEL: splat_nxv4bf16_imm:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    fmov z0.h, #-1.87500000
+; CHECK-NEXT:    ret
+  ret <vscale x 4 x bfloat> splat(bfloat -1.0)
+}
+
+; NOTE: f16(1.875) == bf16(1.0)
+define <vscale x 2 x bfloat> @splat_nxv2bf16_imm() {
+; CHECK-LABEL: splat_nxv2bf16_imm:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    fmov z0.h, #1.87500000
+; CHECK-NEXT:    ret
+  ret <vscale x 2 x bfloat> splat(bfloat 1.0)
+}
+
 define <vscale x 4 x i32> @splat_nxv4i32_fold(<vscale x 4 x i32> %x) {
 ; CHECK-LABEL: splat_nxv4i32_fold:
 ; CHECK:       // %bb.0:
@@ -554,8 +581,8 @@ define <vscale x 2 x double> @splat_nxv2f64_imm_out_of_range() {
 ; CHECK-LABEL: splat_nxv2f64_imm_out_of_range:
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    ptrue p0.d
-; CHECK-NEXT:    adrp x8, .LCPI57_0
-; CHECK-NEXT:    add x8, x8, :lo12:.LCPI57_0
+; CHECK-NEXT:    adrp x8, .LCPI60_0
+; CHECK-NEXT:    add x8, x8, :lo12:.LCPI60_0
 ; CHECK-NEXT:    ld1rd { z0.d }, p0/z, [x8]
 ; CHECK-NEXT:    ret
   ret <vscale x 2 x double> splat(double 3.33)
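The new patterns only help when a bfloat immediate's bits also form an encodable FDUP/FMOV half-precision immediate. As a sketch of which constants qualify, the code below inverts the Arm VFPExpandImm rule for 16-bit elements (sign = a, exponent = NOT(b):b:b:c:d, fraction = efgh followed by six zero bits) and applies it to the raw bfloat16 bits. It assumes that `fpimmbf16` accepts exactly those bit patterns; that is inferred from the patch comment and tests, not from the predicate's source. The helper names (`fp16_imm8`, `bf16_splat_uses_fdup`) are hypothetical.

```python
import struct


def bf16_bits(value: float) -> int:
    """Top 16 bits of the float32 encoding (exact for bfloat16-representable values)."""
    (f32_bits,) = struct.unpack(">I", struct.pack(">f", value))
    return f32_bits >> 16


def fp16_imm8(bits: int):
    """Return the 8-bit FMOV immediate (abcdefgh) that expands to this half-precision
    bit pattern, or None if no such immediate exists.

    Inverts VFPExpandImm for N=16: sign=a, exp=NOT(b):b:b:c:d, frac=efgh:000000.
    """
    sign = (bits >> 15) & 1
    exp = (bits >> 10) & 0x1F
    frac = bits & 0x3FF
    if frac & 0x3F:             # low six fraction bits must be zero
        return None
    b = (exp >> 3) & 1
    if (exp >> 4) & 1 == b:     # exponent MSB must be NOT(b)
        return None
    if (exp >> 2) & 1 != b:     # next exponent bit must repeat b
        return None
    cd = exp & 0x3
    efgh = frac >> 6
    return (sign << 7) | (b << 6) | (cd << 4) | efgh


def bf16_splat_uses_fdup(value: float) -> bool:
    """Hypothetical check: could a bf16 splat of `value` reuse the half-precision FDUP?"""
    return fp16_imm8(bf16_bits(value)) is not None


for v in (1.0, -1.0, 100.0):
    imm = fp16_imm8(bf16_bits(v))
    print(v, "->", "not encodable" if imm is None else f"imm8=0x{imm:02X}")

# Each imm8 expands to a distinct 16-bit pattern, so exactly 256 patterns qualify.
print(sum(fp16_imm8(b) is not None for b in range(1 << 16)))
```

For the two values exercised in sve-vector-splat.ll this yields 0x7E and 0xFE, the immediates the assembler prints as `#1.87500000` and `#-1.87500000`, while a constant such as 100.0 does not qualify under this check.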

Contributor

@david-arm left a comment

LGTM provided tests pass!

@paulwalker-arm merged commit 323112a into llvm:main Mar 4, 2025
13 checks passed
@paulwalker-arm deleted the sve-splat-bf16-imm branch March 4, 2025 11:45