[AArch64][SVE] Don't require 16-byte aligned SVE loads/stores with +strict-align #119732
@@ -2569,6 +2569,17 @@ MVT AArch64TargetLowering::getScalarShiftAmountTy(const DataLayout &DL,
 bool AArch64TargetLowering::allowsMisalignedMemoryAccesses(
     EVT VT, unsigned AddrSpace, Align Alignment, MachineMemOperand::Flags Flags,
     unsigned *Fast) const {
+  // Allow SVE loads/stores where the alignment >= the size of the element type,
+  // even with +strict-align. The SVE loads/stores do not require memory to be
+  // aligned more than the element type even without unaligned accesses.
+  // Without this, already aligned loads and stores are forced to have 16-byte
+  // alignment, which is unnecessary and fails to build as
+  // TLI.expandUnalignedLoad() and TLI.expandUnalignedStore() don't yet support
+  // scalable vectors.
+  if (VT.isScalableVector() && Alignment >= Align(VT.getScalarSizeInBits() / 8))
+    return true;
+
   if (Subtarget->requiresStrictAlign())
     return false;
@@ -0,0 +1,58 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,+strict-align < %s | FileCheck %s
+
+define void @nxv16i8(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: nxv16i8:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ptrue p0.b
+; CHECK-NEXT:    ld1b { z0.b }, p0/z, [x0]
+; CHECK-NEXT:    st1b { z0.b }, p0, [x1]
+; CHECK-NEXT:    ret
+  %l3 = load <vscale x 16 x i8>, ptr %ldptr, align 1
+  store <vscale x 16 x i8> %l3, ptr %stptr, align 1
+  ret void
+}
+
+define void @nxv8i16(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: nxv8i16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ptrue p0.h
+; CHECK-NEXT:    ld1h { z0.h }, p0/z, [x0]
+; CHECK-NEXT:    st1h { z0.h }, p0, [x1]
+; CHECK-NEXT:    ret
+  %l3 = load <vscale x 8 x i16>, ptr %ldptr, align 2
+  store <vscale x 8 x i16> %l3, ptr %stptr, align 2
+  ret void
+}
+
+define void @nxv4i32(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: nxv4i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ptrue p0.s
+; CHECK-NEXT:    ld1w { z0.s }, p0/z, [x0]
+; CHECK-NEXT:    st1w { z0.s }, p0, [x1]
+; CHECK-NEXT:    ret
+  %l3 = load <vscale x 4 x i32>, ptr %ldptr, align 4
+  store <vscale x 4 x i32> %l3, ptr %stptr, align 4
+  ret void
+}
+
+define void @nxv2i64(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: nxv2i64:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ptrue p0.d
+; CHECK-NEXT:    ld1d { z0.d }, p0/z, [x0]
+; CHECK-NEXT:    st1d { z0.d }, p0, [x1]
+; CHECK-NEXT:    ret
+  %l3 = load <vscale x 2 x i64>, ptr %ldptr, align 8
+  store <vscale x 2 x i64> %l3, ptr %stptr, align 8
+  ret void
+}
+
+; FIXME: Support TLI.expandUnalignedLoad()/TLI.expandUnalignedStore() for SVE.
+; define void @unaligned_nxv2i64(ptr %ldptr, ptr %stptr) {
+;   %l3 = load <vscale x 2 x i64>, ptr %ldptr, align 4
+;   store <vscale x 2 x i64> %l3, ptr %stptr, align 4
+;   ret void
+; }
MacDue marked this conversation as resolved.
nit on the comment: this is true for SVE's ld1/st1 instructions, but not for SVE str/ldr, as those require the address to be 16-byte aligned (for data vectors; 2-byte aligned for predicate vectors). So there is an assumption here that a store of <vscale x 4 x i32> ends up using st1, which is true in practice if the store comes from the IR.
I've rewritten the comment. Is it safe to assume that str z* is only used for spills/fills (and possibly via an intrinsic, which I don't think uses these alignment checks)?
For scalable data vectors, ldr/str is only ever used for spills and fills. For predicate registers, there is only str/ldr, so their addresses must be at least 2-byte aligned.
We should avoid doing anything that prevents an expanded use of ldr/str, though. I've toyed with that in the past as a way to remove predicate data dependencies.

Not sure if this is relevant, but they could also come in via inline asm.
I don't think this prevents an expanded use of ldr/str for operations that are known to be 16-byte aligned. For operations with < 16-byte alignment, any expansion in expandUnalignedLoad/Store would negate any benefit of using ldr/str over predicated loads/stores anyway.

I don't think inline asm is relevant here; these hooks are mainly used for things like StoreSDNode/LoadSDNode.
Great news. Thanks for the update.