11 changes: 11 additions & 0 deletions llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -2569,6 +2569,17 @@ MVT AArch64TargetLowering::getScalarShiftAmountTy(const DataLayout &DL,
bool AArch64TargetLowering::allowsMisalignedMemoryAccesses(
EVT VT, unsigned AddrSpace, Align Alignment, MachineMemOperand::Flags Flags,
unsigned *Fast) const {

// Allow SVE loads/stores where the alignment >= the size of the element type,
// even with +strict-align. SVE loads/stores do not require the memory to be
// aligned beyond the element type, even without unaligned access support.
// Without this, already-aligned loads and stores are forced to require 16-byte
// alignment, which is unnecessary and also fails to compile, since
// TLI.expandUnalignedLoad() and TLI.expandUnalignedStore() don't yet support
// scalable vectors.

Collaborator:
Nit on the comment: this is true for SVE's ld1/st1 instructions, but not for SVE str/ldr, as those require the address to be 16-byte aligned for data vectors (and 2-byte aligned for predicate vectors). So there is an assumption here that a store of <vscale x 4 x i32> ends up using st1, which is true in practice if the store comes from the IR.

Member Author:
I've rewritten the comment. Is it safe to assume that str z* is only used for spills/fills (and possibly via an intrinsic, which I don't think goes through these alignment checks)?

Collaborator:
For scalable data vectors, ldr/str is only ever used for spills and fills. For predicate registers there is only str/ldr, so their addresses must be at least 2-byte aligned.

paulwalker-arm (Collaborator), Dec 13, 2024:
We should avoid doing anything that would prevent an expanded use of ldr/str, though. I know I've toyed with it in the past as a way to remove predicate data dependencies.

Not sure if this is relevant but they could also come in via inline asm.

Member Author:
I don't think this prevents an expanded use of ldr/str for operations that are known to be 16-byte aligned. For operations with < 16-byte alignment, any expansion in expandUnalignedLoad/Store would negate any benefit of using ldr/str over predicated loads/stores anyway.

I don't think inline asm is relevant here; these hooks are mainly used for things like StoreSDNode/LoadSDNode.

Collaborator:
Great news. Thanks for the update.

if (VT.isScalableVector() && Alignment >= Align(VT.getScalarSizeInBits() / 8))

Collaborator:
Align(VT.getScalarSizeInBits() / 8) will fail an assert when VT < MVT::i8 (like a predicate MVT::i1), so this would fail even when the +strict-align feature is not set. Could you add a test for this case?

Not caused by your patch, but I am surprised to see LLVM actually generate regular loads/stores when the alignment is smaller than the element size, e.g. load <4 x i32>, ptr %ptr, align 1 when the +strict-align flag is not set at all. Is this a bug?
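
For illustration, the requested predicate test could look roughly like the minimal sketch below. The function name is made up, and the expectations in the comments are assumptions based on this discussion rather than output from the patch, which as written would hit the Align(0) assert for i1 elements.

define void @nxv16i1(ptr %ldptr, ptr %stptr) {
  ; Predicate registers only have ldr/str, so at least 2-byte alignment is
  ; assumed here; with the check as written, Align(VT.getScalarSizeInBits() / 8)
  ; becomes Align(0) for i1 elements and asserts.
  %l = load <vscale x 16 x i1>, ptr %ldptr, align 2
  store <vscale x 16 x i1> %l, ptr %stptr, align 2
  ret void
}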

Member Author:
I don't think it's a bug. Looking at AArch64.UnalignedAccessFaults() (used as part of the FP loads, https://developer.arm.com/documentation/ddi0602/2023-09/Shared-Pseudocode/aarch64-functions-memory?lang=en#AArch64.UnalignedAccessFaults.3), it looks like unaligned accesses are supported (depending on the configuration).

Also the langref states:

The optional constant align argument specifies the alignment of the operation (that is, the alignment of the memory address). It is the responsibility of the code emitter to ensure that the alignment information is correct.
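
As a small illustration of that point (a hypothetical snippet, not taken from this patch): without +strict-align, the backend may emit an ordinary vector load here even though the IR only promises 1-byte alignment; whoever produced the IR is responsible for the address actually being accessible.

define <4 x i32> @underaligned_load(ptr %p) {
  ; The align 1 is a claim made by the IR producer; the backend does not
  ; re-check it, and on cores configured to allow unaligned accesses a plain
  ; vector load is acceptable.
  %v = load <4 x i32>, ptr %p, align 1
  ret <4 x i32> %v
}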

return true;

if (Subtarget->requiresStrictAlign())
return false;

58 changes: 58 additions & 0 deletions llvm/test/CodeGen/AArch64/sve-load-store-strict-align.ll
@@ -0,0 +1,58 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,+strict-align < %s | FileCheck %s

define void @nxv16i8(ptr %ldptr, ptr %stptr) {
; CHECK-LABEL: nxv16i8:
; CHECK: // %bb.0:
; CHECK-NEXT: ptrue p0.b
; CHECK-NEXT: ld1b { z0.b }, p0/z, [x0]
; CHECK-NEXT: st1b { z0.b }, p0, [x1]
; CHECK-NEXT: ret
%l3 = load <vscale x 16 x i8>, ptr %ldptr, align 1
store <vscale x 16 x i8> %l3, ptr %stptr, align 1
ret void
}

define void @nxv8i16(ptr %ldptr, ptr %stptr) {
; CHECK-LABEL: nxv8i16:
; CHECK: // %bb.0:
; CHECK-NEXT: ptrue p0.h
; CHECK-NEXT: ld1h { z0.h }, p0/z, [x0]
; CHECK-NEXT: st1h { z0.h }, p0, [x1]
; CHECK-NEXT: ret
%l3 = load <vscale x 8 x i16>, ptr %ldptr, align 2
store <vscale x 8 x i16> %l3, ptr %stptr, align 2
ret void
}

define void @nxv4i32(ptr %ldptr, ptr %stptr) {
; CHECK-LABEL: nxv4i32:
; CHECK: // %bb.0:
; CHECK-NEXT: ptrue p0.s
; CHECK-NEXT: ld1w { z0.s }, p0/z, [x0]
; CHECK-NEXT: st1w { z0.s }, p0, [x1]
; CHECK-NEXT: ret
%l3 = load <vscale x 4 x i32>, ptr %ldptr, align 4
store <vscale x 4 x i32> %l3, ptr %stptr, align 4
ret void
}

define void @nxv2i64(ptr %ldptr, ptr %stptr) {
; CHECK-LABEL: nxv2i64:
; CHECK: // %bb.0:
; CHECK-NEXT: ptrue p0.d
; CHECK-NEXT: ld1d { z0.d }, p0/z, [x0]
; CHECK-NEXT: st1d { z0.d }, p0, [x1]
; CHECK-NEXT: ret
%l3 = load <vscale x 2 x i64>, ptr %ldptr, align 8
store <vscale x 2 x i64> %l3, ptr %stptr, align 8
ret void
}

; FIXME: Support TLI.expandUnalignedLoad()/TLI.expandUnalignedStore() for SVE.
; define void @unaligned_nxv2i64(ptr %ldptr, ptr %stptr) {
; %l3 = load <vscale x 2 x i64>, ptr %ldptr, align 4
; store <vscale x 2 x i64> %l3, ptr %stptr, align 4
; ret void
; }