Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions llvm/include/llvm/CodeGen/TargetLowering.h
Original file line number Diff line number Diff line change
Expand Up @@ -489,6 +489,10 @@ class TargetLoweringBase {
return true;
}

/// Returns true if a floating-point load or store can be replaced with an
/// equivalent integer load or store without negatively affecting performance.
virtual bool canUseIntLoadStoreForFloatValues() const { return true; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you avoid adding a super specific I-want-this-specific-thing-in-one-place style of hook? I would expect this is already expressible in terms of legal operations

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it can be, the operations are legal (and the codegen is no different from regular AArch64). The issue is the possible performance impact from accessing the same memory region with FP/vector loads/stores and GPR loads/stores in a streaming SVE function.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to suggest a better name, but I'm struggling to come up with one. Perhaps you should replace can by should in the name to make the distinction more clear?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure... 😅 To me, "can" means you can do something if it's beneficial, which using integer stores for FP constants may be. And "should" means you should always do something, but it's not the case that you should always use integer loads for FP values.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about creating a virtual EVT TargetLowering::getPreferredStoreTypeForConstant(SDNode *C) const; that returns an appropriate type? This could then also do the check to see if the integer type is legal, for example.


/// Return true if it is profitable to convert a select of FP constants into
/// a constant pool load whose address depends on the select condition. The
/// parameter may be used to differentiate a select with FP compare from
Expand Down
3 changes: 3 additions & 0 deletions llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -21587,6 +21587,9 @@ SDValue DAGCombiner::replaceStoreOfFPConstant(StoreSDNode *ST) {
// processor operation but an i64 (which is not legal) requires two. So the
// transform should not be done in this case.

if (!TLI.canUseIntLoadStoreForFloatValues())
return SDValue();

SDValue Tmp;
switch (CFP->getSimpleValueType(0).SimpleTy) {
default:
Expand Down
3 changes: 3 additions & 0 deletions llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -430,6 +430,9 @@ SDValue SelectionDAGLegalize::OptimizeFloatStore(StoreSDNode* ST) {
if (Value.getOpcode() == ISD::TargetConstantFP)
return SDValue();

if (!TLI.canUseIntLoadStoreForFloatValues())
return SDValue();

if (ConstantFPSDNode *CFP = dyn_cast<ConstantFPSDNode>(Value)) {
if (CFP->getValueType(0) == MVT::f32 &&
TLI.isTypeLegal(MVT::i32)) {
Expand Down
13 changes: 13 additions & 0 deletions llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -22522,6 +22522,12 @@ static SDValue replaceZeroVectorStore(SelectionDAG &DAG, StoreSDNode &St) {
if (VT.isScalableVector())
return SDValue();

// Do not replace the vector store when it could result in a streaming memory
// hazard.
const TargetLowering &TLI = DAG.getTargetLoweringInfo();
if (!TLI.canUseIntLoadStoreForFloatValues())
return SDValue();

// It is beneficial to scalarize a zero splat store for 2 or 3 i64 elements or
// 2, 3 or 4 i32 elements.
int NumVecElts = VT.getVectorNumElements();
Expand Down Expand Up @@ -27706,6 +27712,13 @@ bool AArch64TargetLowering::canMergeStoresTo(unsigned AddressSpace, EVT MemVT,
return !NoFloat || MemVT.getSizeInBits() <= 64;
}

bool AArch64TargetLowering::canUseIntLoadStoreForFloatValues() const {
// Avoid replacing FP loads/stores with integer ones when it could result in a
// streaming memory hazard.
return !Subtarget->getStreamingHazardSize() ||
(!Subtarget->isStreaming() && !Subtarget->isStreamingCompatible());
}

bool AArch64TargetLowering::preferIncOfAddToSubOfNot(EVT VT) const {
// We want inc-of-add for scalars and sub-of-not for vectors.
return VT.isScalarInteger();
Expand Down
2 changes: 2 additions & 0 deletions llvm/lib/Target/AArch64/AArch64ISelLowering.h
Original file line number Diff line number Diff line change
Expand Up @@ -840,6 +840,8 @@ class AArch64TargetLowering : public TargetLowering {
bool canMergeStoresTo(unsigned AddressSpace, EVT MemVT,
const MachineFunction &MF) const override;

bool canUseIntLoadStoreForFloatValues() const override;

bool isCheapToSpeculateCttz(Type *) const override {
return true;
}
Expand Down
186 changes: 186 additions & 0 deletions llvm/test/CodeGen/AArch64/sve-streaming-mode-fp-constant-stores.ll
Original file line number Diff line number Diff line change
@@ -0,0 +1,186 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -aarch64-streaming-hazard-size=64 -force-streaming-compatible -mattr=+sve < %s | FileCheck %s
; RUN: llc -aarch64-streaming-hazard-size=64 -force-streaming -mattr=+sme < %s | FileCheck %s
; RUN: llc -force-streaming -mattr=+sme < %s | FileCheck %s --check-prefix=NOHAZARD

target triple = "aarch64-unknown-linux-gnu"

; This test checks that in streaming[-compatible] functions if there could be
; a hazard between GPR and FPR memory operations, then integer stores are not
; used for floating-point constants.

define void @"store_f64_0.0"(ptr %num) {
; CHECK-LABEL: store_f64_0.0:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: fmov d0, xzr
; CHECK-NEXT: str d0, [x0]
; CHECK-NEXT: ret
;
; NOHAZARD-LABEL: store_f64_0.0:
; NOHAZARD: // %bb.0: // %entry
; NOHAZARD-NEXT: str xzr, [x0]
; NOHAZARD-NEXT: ret
entry:
store double 0.000000e+00, ptr %num, align 8
ret void
}

; (Integer types still use GPR stores)
define void @store_i64_0(ptr %num) {
; CHECK-LABEL: store_i64_0:
; CHECK: // %bb.0:
; CHECK-NEXT: str xzr, [x0]
; CHECK-NEXT: ret
;
; NOHAZARD-LABEL: store_i64_0:
; NOHAZARD: // %bb.0:
; NOHAZARD-NEXT: str xzr, [x0]
; NOHAZARD-NEXT: ret
store i64 0, ptr %num, align 8
ret void
}

define void @"store_f64_1.0"(ptr %num) {
; CHECK-LABEL: store_f64_1.0:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: fmov d0, #1.00000000
; CHECK-NEXT: str d0, [x0]
; CHECK-NEXT: ret
;
; NOHAZARD-LABEL: store_f64_1.0:
; NOHAZARD: // %bb.0: // %entry
; NOHAZARD-NEXT: mov x8, #4607182418800017408 // =0x3ff0000000000000
; NOHAZARD-NEXT: str x8, [x0]
; NOHAZARD-NEXT: ret
entry:
store double 1.000000e+00, ptr %num, align 8
ret void
}

define void @"store_f64_1.23456789"(ptr %num) {
; CHECK-LABEL: store_f64_1.23456789:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: adrp x8, .LCPI3_0
; CHECK-NEXT: ldr d0, [x8, :lo12:.LCPI3_0]
; CHECK-NEXT: str d0, [x0]
; CHECK-NEXT: ret
;
; NOHAZARD-LABEL: store_f64_1.23456789:
; NOHAZARD: // %bb.0: // %entry
; NOHAZARD-NEXT: mov x8, #56859 // =0xde1b
; NOHAZARD-NEXT: movk x8, #17027, lsl #16
; NOHAZARD-NEXT: movk x8, #49354, lsl #32
; NOHAZARD-NEXT: movk x8, #16371, lsl #48
; NOHAZARD-NEXT: str x8, [x0]
; NOHAZARD-NEXT: ret
entry:
store double 0x3FF3C0CA4283DE1B, ptr %num, align 8
ret void
}

define void @"store_f32_0.0"(ptr %num) {
; CHECK-LABEL: store_f32_0.0:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: fmov s0, wzr
; CHECK-NEXT: str s0, [x0]
; CHECK-NEXT: ret
;
; NOHAZARD-LABEL: store_f32_0.0:
; NOHAZARD: // %bb.0: // %entry
; NOHAZARD-NEXT: str wzr, [x0]
; NOHAZARD-NEXT: ret
entry:
store float 0.000000e+00, ptr %num, align 4
ret void
}

define void @"store_f32_1.0"(ptr %num) {
; CHECK-LABEL: store_f32_1.0:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: fmov s0, #1.00000000
; CHECK-NEXT: str s0, [x0]
; CHECK-NEXT: ret
;
; NOHAZARD-LABEL: store_f32_1.0:
; NOHAZARD: // %bb.0: // %entry
; NOHAZARD-NEXT: mov w8, #1065353216 // =0x3f800000
; NOHAZARD-NEXT: str w8, [x0]
; NOHAZARD-NEXT: ret
entry:
store float 1.000000e+00, ptr %num, align 4
ret void
}

define void @"store_f32_1.23456789"(ptr %num) {
; CHECK-LABEL: store_f32_1.23456789:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: mov w8, #1618 // =0x652
; CHECK-NEXT: movk w8, #16286, lsl #16
; CHECK-NEXT: fmov s0, w8
; CHECK-NEXT: str s0, [x0]
; CHECK-NEXT: ret
;
; NOHAZARD-LABEL: store_f32_1.23456789:
; NOHAZARD: // %bb.0: // %entry
; NOHAZARD-NEXT: mov w8, #1618 // =0x652
; NOHAZARD-NEXT: movk w8, #16286, lsl #16
; NOHAZARD-NEXT: str w8, [x0]
; NOHAZARD-NEXT: ret
entry:
store float 0x3FF3C0CA40000000, ptr %num, align 4
ret void
}

define void @"store_v4f32_0.0"(ptr %num) {
; CHECK-LABEL: store_v4f32_0.0:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: mov z0.s, #0 // =0x0
; CHECK-NEXT: str q0, [x0]
; CHECK-NEXT: ret
;
; NOHAZARD-LABEL: store_v4f32_0.0:
; NOHAZARD: // %bb.0: // %entry
; NOHAZARD-NEXT: stp xzr, xzr, [x0]
; NOHAZARD-NEXT: ret
entry:
store <4 x float> zeroinitializer, ptr %num, align 16
ret void
}

define void @"store_v4f32_1.0"(ptr %num) {
; CHECK-LABEL: store_v4f32_1.0:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: fmov z0.s, #1.00000000
; CHECK-NEXT: str q0, [x0]
; CHECK-NEXT: ret
;
; NOHAZARD-LABEL: store_v4f32_1.0:
; NOHAZARD: // %bb.0: // %entry
; NOHAZARD-NEXT: fmov z0.s, #1.00000000
; NOHAZARD-NEXT: str q0, [x0]
; NOHAZARD-NEXT: ret
entry:
store <4 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>, ptr %num, align 16
ret void
}

define void @"store_v4f32_1.23456789"(ptr %num) {
; CHECK-LABEL: store_v4f32_1.23456789:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: mov w8, #1618 // =0x652
; CHECK-NEXT: movk w8, #16286, lsl #16
; CHECK-NEXT: mov z0.s, w8
; CHECK-NEXT: str q0, [x0]
; CHECK-NEXT: ret
;
; NOHAZARD-LABEL: store_v4f32_1.23456789:
; NOHAZARD: // %bb.0: // %entry
; NOHAZARD-NEXT: mov w8, #1618 // =0x652
; NOHAZARD-NEXT: movk w8, #16286, lsl #16
; NOHAZARD-NEXT: mov z0.s, w8
; NOHAZARD-NEXT: str q0, [x0]
; NOHAZARD-NEXT: ret
entry:
store <4 x float> <float 0x3FF3C0CA40000000, float 0x3FF3C0CA40000000, float 0x3FF3C0CA40000000, float 0x3FF3C0CA40000000>, ptr %num, align 16
ret void
}
Loading