[Intrinsics][AArch64] Add intrinsics for masking off aliasing vector lanes #117007
@@ -24128,8 +24128,7 @@ Overview:
 Given a vector load from %ptrA followed by a vector store to %ptrB, this
 instruction generates a mask where an active lane indicates that the
 write-after-read sequence can be performed safely for that lane, without the
-danger of it turning into a read-after-write sequence and introducing a
-store-to-load forwarding hazard.
+danger of a write-after-read hazard occurring.

 A write-after-read hazard occurs when a write-after-read sequence for a given
 lane in a vector ends up being executed as a read-after-write sequence due to
@@ -24149,8 +24148,7 @@ The intrinsic returns ``poison`` if the distance between ``%prtA`` and ``%ptrB``
 is smaller than ``VF * %elementsize`` and either ``%ptrA + VF * %elementSize``
 or ``%ptrB + VF * %elementSize`` wrap.
 The element of the result mask is active when loading from %ptrA then storing to
-%ptrB is safe and doesn't result in a write-after-read sequence turning into a
-read-after-write sequence, meaning that:
+%ptrB is safe and doesn't result in a write-after-read hazard:

 * (ptrB - ptrA) <= 0 (guarantees that all lanes are loaded before any stores), or
 * (ptrB - ptrA) >= elementSize * lane (guarantees that this lane is loaded
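As a sanity check on the two bullet conditions above, here is a small reference model. This is a hedged sketch of the documented lane conditions only: the function name and plain-integer addresses are illustrative, and the ``poison``/pointer-wrap cases from the paragraph above are deliberately ignored. It is not the LLVM implementation.

```python
def loop_dependence_war_mask(ptr_a: int, ptr_b: int, element_size: int, vf: int) -> list[bool]:
    """Model of the documented lane conditions for llvm.loop.dependence.war.mask.

    A lane is active when (ptrB - ptrA) <= 0 (all lanes are loaded before any
    stores) or when (ptrB - ptrA) >= elementSize * lane (this lane is loaded
    before the store to the same address can overwrite it).
    """
    diff = ptr_b - ptr_a
    return [diff <= 0 or diff >= element_size * lane for lane in range(vf)]

# ptrB 4 bytes past ptrA with 2-byte elements: lanes 0..2 are safe, lane 3 is not.
print(loop_dependence_war_mask(0, 4, element_size=2, vf=4))  # [True, True, True, False]

# ptrB at or before ptrA: every lane is safe.
print(loop_dependence_war_mask(8, 0, element_size=2, vf=4))  # [True, True, True, True]
```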
@@ -24188,13 +24186,19 @@ Overview:

 Given a vector store to %ptrA followed by a vector load from %ptrB, this
 instruction generates a mask where an active lane indicates that the
-read-after-write sequence can be performed safely for that lane, without the
-danger of it turning into a write-after-read sequence.
+read-after-write sequence can be performed safely for that lane, without a
+read-after-write hazard occurring or a new store-to-load forwarding hazard
+being introduced.
+
+A read-after-write hazard occurs when a read-after-write sequence for a given
+lane in a vector ends up being executed as a write-after-read sequence due to
+the aliasing of pointers.
+
+A store-to-load forwarding hazard occurs when a vector store writes to an
+address that partially overlaps with the address of a subsequent vector load.
+Only the overlapping addresses can be forwarded to the load if the data hasn't
+been written to memory yet.

 Arguments:
 """"""""""

Review discussion on the store-to-load forwarding wording:

  Reviewer: store-to-load forwarding hazard is not defined. Do we need this
  wording here?
  Author: Removed.
  Reviewer: The wording for the store-to-load forwarding (hazard) behaviour
  cannot be removed, because it is the only distinction between this intrinsic
  and the ``war.mask`` intrinsic.
  Author: I've re-added the hazard wording, thanks.

  Reviewer: The issue is that the load can't be performed until the write has
  completed, resulting in a stall that did not exist when executing as scalars.
  Author: Cheers.
@@ -24212,8 +24216,8 @@ The element of the result mask is active when storing to %ptrA then loading from
 %ptrB is safe and doesn't result in aliasing, meaning that:

 * abs(ptrB - ptrA) >= elementSize * lane (guarantees that the store of this lane
-  occurs before loading from this address)
-* ptrA == ptrB doesn't introduce any new hazards and is safe
+  occurs before loading from this address), or
+* ptrA == ptrB, doesn't introduce any new hazards

 Examples:
 """""""""
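The two bullet conditions above for the ``raw.mask`` intrinsic can also be modelled directly. As before, this is a hedged sketch of the documented semantics only (illustrative names, poison/wrap cases ignored), not LLVM's lowering.

```python
def loop_dependence_raw_mask(ptr_a: int, ptr_b: int, element_size: int, vf: int) -> list[bool]:
    """Model of the documented lane conditions for llvm.loop.dependence.raw.mask.

    A lane is active when abs(ptrB - ptrA) >= elementSize * lane (the store of
    this lane occurs before loading from this address), or when ptrA == ptrB
    (identical pointers introduce no new hazards).
    """
    if ptr_a == ptr_b:
        return [True] * vf
    diff = abs(ptr_b - ptr_a)
    return [diff >= element_size * lane for lane in range(vf)]

# Pointers 3 bytes apart with 2-byte elements: only lanes 0 and 1 are safe.
print(loop_dependence_raw_mask(0, 3, element_size=2, vf=4))  # [True, True, False, False]

# Identical pointers: all lanes are safe.
print(loop_dependence_raw_mask(16, 16, element_size=2, vf=4))  # [True, True, True, True]
```

Note the use of ``abs``: unlike the ``war.mask`` case, the distance check here is symmetric in the two pointers, matching the first bullet above.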
@@ -784,3 +784,115 @@ entry:
   %0 = call <16 x i1> @llvm.loop.dependence.war.mask.v16i1(ptr %a, ptr %b, i64 3)
   ret <16 x i1> %0
 }

; Tests for scalarisation of the loop dependence masks to a single lane.

define <1 x i1> @whilewr_8_scalarize(ptr %a, ptr %b) {
; CHECK-LABEL: whilewr_8_scalarize:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: subs x8, x1, x0
; CHECK-NEXT: cmp x8, #0
; CHECK-NEXT: cset w8, gt
; CHECK-NEXT: cmp x1, x0
; CHECK-NEXT: csinc w0, w8, wzr, ne
; CHECK-NEXT: ret
entry:
  %0 = call <1 x i1> @llvm.loop.dependence.war.mask.v1i1(ptr %a, ptr %b, i64 1)
  ret <1 x i1> %0
}
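For reference, here is my reading of the CHECK sequence above as a small Python model. This is an unofficial sketch (64-bit wraparound and NZCV flag details are elided): `subs`/`cmp #0`/`cset gt` computes `(b - a) > 0`, and `csinc ..., wzr, ne` selects that result when `b != a`, otherwise returns `wzr + 1 == 1`.

```python
def whilewr_8_scalarize_model(a: int, b: int) -> bool:
    """Sketch of the scalarised single-lane war.mask lowering above."""
    # subs x8, x1, x0            -> diff = b - a (signed; 64-bit wrap ignored)
    diff = b - a
    # cmp x8, #0 ; cset w8, gt   -> gt = diff > 0
    gt = diff > 0
    # cmp x1, x0 ; csinc w0, w8, wzr, ne
    #                            -> result = gt if b != a, else wzr + 1 == 1
    return gt if b != a else True
```

For the 8-bit (1-byte element) case this collapses to `(b - a) >= 0`; the wider variants below compare `diff` against `elementSize - 1` instead of 0.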

define <1 x i1> @whilewr_16_scalarize(ptr %a, ptr %b) {
; CHECK-LABEL: whilewr_16_scalarize:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: subs x8, x1, x0
; CHECK-NEXT: cmp x8, #1
; CHECK-NEXT: cset w8, gt
; CHECK-NEXT: cmp x1, x0
; CHECK-NEXT: csinc w0, w8, wzr, ne
; CHECK-NEXT: ret
entry:
  %0 = call <1 x i1> @llvm.loop.dependence.war.mask.v1i1(ptr %a, ptr %b, i64 2)
  ret <1 x i1> %0
}

define <1 x i1> @whilewr_32_scalarize(ptr %a, ptr %b) {
; CHECK-LABEL: whilewr_32_scalarize:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: subs x8, x1, x0
; CHECK-NEXT: cmp x8, #3
; CHECK-NEXT: cset w8, gt
; CHECK-NEXT: cmp x1, x0
; CHECK-NEXT: csinc w0, w8, wzr, ne
; CHECK-NEXT: ret
entry:
  %0 = call <1 x i1> @llvm.loop.dependence.war.mask.v1i1(ptr %a, ptr %b, i64 4)
  ret <1 x i1> %0
}

define <1 x i1> @whilewr_64_scalarize(ptr %a, ptr %b) {
; CHECK-LABEL: whilewr_64_scalarize:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: subs x8, x1, x0
; CHECK-NEXT: cmp x8, #7
; CHECK-NEXT: cset w8, gt
; CHECK-NEXT: cmp x1, x0
; CHECK-NEXT: csinc w0, w8, wzr, ne
; CHECK-NEXT: ret
entry:
  %0 = call <1 x i1> @llvm.loop.dependence.war.mask.v1i1(ptr %a, ptr %b, i64 8)
  ret <1 x i1> %0
}

define <1 x i1> @whilerw_8_scalarize(ptr %a, ptr %b) {
; CHECK-LABEL: whilerw_8_scalarize:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: subs x8, x1, x0
; CHECK-NEXT: cmp x8, #0
; CHECK-NEXT: cset w8, gt
; CHECK-NEXT: cmp x1, x0
; CHECK-NEXT: csinc w0, w8, wzr, ne
; CHECK-NEXT: ret
entry:
  %0 = call <1 x i1> @llvm.loop.dependence.raw.mask.v1i1(ptr %a, ptr %b, i64 1)
  ret <1 x i1> %0
}

define <1 x i1> @whilerw_16_scalarize(ptr %a, ptr %b) {
; CHECK-LABEL: whilerw_16_scalarize:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: subs x8, x1, x0
; CHECK-NEXT: cmp x8, #1
; CHECK-NEXT: cset w8, gt
; CHECK-NEXT: cmp x1, x0
; CHECK-NEXT: csinc w0, w8, wzr, ne
; CHECK-NEXT: ret
entry:
  %0 = call <1 x i1> @llvm.loop.dependence.raw.mask.v1i1(ptr %a, ptr %b, i64 2)
  ret <1 x i1> %0
}

define <1 x i1> @whilerw_32_scalarize(ptr %a, ptr %b) {
; CHECK-LABEL: whilerw_32_scalarize:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: subs x8, x1, x0
; CHECK-NEXT: cmp x8, #3
; CHECK-NEXT: cset w8, gt
; CHECK-NEXT: cmp x1, x0
; CHECK-NEXT: csinc w0, w8, wzr, ne
; CHECK-NEXT: ret
entry:
  %0 = call <1 x i1> @llvm.loop.dependence.raw.mask.v1i1(ptr %a, ptr %b, i64 4)
  ret <1 x i1> %0
}

define <1 x i1> @whilerw_64_scalarize(ptr %a, ptr %b) {
; CHECK-LABEL: whilerw_64_scalarize:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: subs x8, x1, x0
; CHECK-NEXT: cmp x8, #7
; CHECK-NEXT: cset w8, gt
; CHECK-NEXT: cmp x1, x0
; CHECK-NEXT: csinc w0, w8, wzr, ne
; CHECK-NEXT: ret
entry:
  %0 = call <1 x i1> @llvm.loop.dependence.raw.mask.v1i1(ptr %a, ptr %b, i64 8)
  ret <1 x i1> %0
}