[PowerPC] fold i128 equality/inequality compares of two loads into a vectorized compare using vcmpequb.p when Altivec is available #158657
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -35,18 +35,13 @@ define signext i32 @zeroEqualityTest02(ptr %x, ptr %y) { | |
| define signext i32 @zeroEqualityTest01(ptr %x, ptr %y) { | ||
| ; CHECK-LABEL: zeroEqualityTest01: | ||
| ; CHECK: # %bb.0: | ||
| ; CHECK-NEXT: ld 5, 0(3) | ||
| ; CHECK-NEXT: ld 6, 0(4) | ||
| ; CHECK-NEXT: cmpld 5, 6 | ||
| ; CHECK-NEXT: bne 0, .LBB1_2 | ||
| ; CHECK-NEXT: # %bb.1: # %loadbb1 | ||
| ; CHECK-NEXT: ld 5, 8(3) | ||
| ; CHECK-NEXT: ld 4, 8(4) | ||
| ; CHECK-NEXT: li 3, 0 | ||
| ; CHECK-NEXT: cmpld 5, 4 | ||
| ; CHECK-NEXT: beqlr 0 | ||
| ; CHECK-NEXT: .LBB1_2: # %res_block | ||
| ; CHECK-NEXT: li 3, 1 | ||
| ; CHECK-NEXT: lxvd2x 34, 0, 4 | ||
| ; CHECK-NEXT: lxvd2x 35, 0, 3 | ||
| ; CHECK-NEXT: vcmpequb. 2, 3, 2 | ||
| ; CHECK-NEXT: mfocrf 3, 2 | ||
| ; CHECK-NEXT: rlwinm 3, 3, 25, 31, 31 | ||
| ; CHECK-NEXT: cntlzw 3, 3 | ||
| ; CHECK-NEXT: srwi 3, 3, 5 | ||
|
Collaborator
Extra instruction? I think isolating and flipping the bit can just be rlwinm/xori.
Contributor
Author
In the patch, we just make the equal to that is, the following code transforms the DAG ----> I think we can have a follow-up patch to convert to your instructions.
Collaborator
I am OK with addressing this in a follow-up patch.
Contributor
Author
I considered this again. I do not think we can transform into since check whether only bit 31 of r3 is 1. I do not think we have a single xori instruction, or any other single instruction, to achieve it.
Collaborator
Count leading zeros / shift right by 5 is a test for zero. Because of the prior rlwinm, this is comparing a 0/1 value. Compared to zero, 0/1 => 1/0. You can use xori with 1.
Contributor
Author
That means we have to check a long list of IR and convert to not sure whether it is worth checking for the long list of IR for this optimization; it will increase compile time.
Collaborator
You don't need to look for that whole sequence. The and / setcc is enough. But also, in general I don't think looking for a sequence of 5 instructions is anything to worry about.
Contributor
Author
Agree with "and / setcc is enough" since |
||
| ; CHECK-NEXT: blr | ||
| %call = tail call signext i32 @memcmp(ptr %x, ptr %y, i64 16) | ||
| %not.tobool = icmp ne i32 %call, 0 | ||
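The review thread on `zeroEqualityTest01` turns on whether `cntlzw` / `srwi 5` after the `rlwinm` can be replaced by a single `xori` with 1. The following is a minimal Python sketch of that tail sequence, not code from the patch. The helper names are hypothetical, and it assumes that `mfocrf 3, 2` leaves every bit outside CR6 as zero (the ISA leaves those bits undefined, which is why the `rlwinm` mask matters).

```python
MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32

def rlwinm(rs, sh, mb, me):
    """Rotate left by sh, then AND with the mask over IBM bits mb..me (bit 0 = MSB)."""
    mask, i = 0, mb
    while True:
        mask |= 1 << (31 - i)
        if i == me:
            break
        i = (i + 1) % 32
    return rotl32(rs, sh) & mask

def cntlzw(x):
    """Count leading zeros of a 32-bit value; cntlzw(0) == 32."""
    return 32 - (x & MASK32).bit_length()

def cr6_from_vcmpequb(a, b):
    """CR6 field written by vcmpequb.: 0b1000 if all bytes match, 0b0010 if none do."""
    eq = [x == y for x, y in zip(a, b)]
    return (8 if all(eq) else 0) | (2 if not any(eq) else 0)

def extract_all_equal(a, b):
    """Models mfocrf 3, 2 ; rlwinm 3, 3, 25, 31, 31: 1 if all 16 bytes match, else 0."""
    # CR6 occupies IBM bits 24..27; bits outside CR6 are modeled as 0 (assumption).
    r3 = cr6_from_vcmpequb(a, b) << 4
    return rlwinm(r3, 25, 31, 31)

def ne_with_cntlzw_srwi(a, b):
    """Tail currently emitted for icmp ne: cntlzw 3, 3 ; srwi 3, 3, 5."""
    return cntlzw(extract_all_equal(a, b)) >> 5

def ne_with_xori(a, b):
    """Tail suggested in review: xori 3, 3, 1."""
    return extract_all_equal(a, b) ^ 1
```

Because the `rlwinm` result is always 0 or 1, both tails compute the same logical negation. In this model, calling the two `ne_*` functions on any pair of 16-byte buffers gives identical results.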
|
|
@@ -85,7 +80,7 @@ define signext i32 @zeroEqualityTest03(ptr %x, ptr %y) { | |
| ; Validate with > 0 | ||
| define signext i32 @zeroEqualityTest04() { | ||
| ; CHECK-LABEL: zeroEqualityTest04: | ||
| ; CHECK: # %bb.0: # %loadbb | ||
| ; CHECK: # %bb.0: | ||
| ; CHECK-NEXT: li 3, 0 | ||
| ; CHECK-NEXT: blr | ||
| %call = tail call signext i32 @memcmp(ptr @zeroEqualityTest02.buffer1, ptr @zeroEqualityTest02.buffer2, i64 16) | ||
|
|
@@ -97,7 +92,7 @@ define signext i32 @zeroEqualityTest04() { | |
| ; Validate with < 0 | ||
| define signext i32 @zeroEqualityTest05() { | ||
| ; CHECK-LABEL: zeroEqualityTest05: | ||
| ; CHECK: # %bb.0: # %loadbb | ||
| ; CHECK: # %bb.0: | ||
| ; CHECK-NEXT: li 3, 0 | ||
| ; CHECK-NEXT: blr | ||
| %call = tail call signext i32 @memcmp(ptr @zeroEqualityTest03.buffer1, ptr @zeroEqualityTest03.buffer2, i64 16) | ||
|
|
@@ -109,7 +104,7 @@ define signext i32 @zeroEqualityTest05() { | |
| ; Validate with memcmp()?: | ||
| define signext i32 @equalityFoldTwoConstants() { | ||
| ; CHECK-LABEL: equalityFoldTwoConstants: | ||
| ; CHECK: # %bb.0: # %loadbb | ||
| ; CHECK: # %bb.0: | ||
| ; CHECK-NEXT: li 3, 1 | ||
| ; CHECK-NEXT: blr | ||
| %call = tail call signext i32 @memcmp(ptr @zeroEqualityTest04.buffer1, ptr @zeroEqualityTest04.buffer2, i64 16) | ||
|
|
@@ -121,24 +116,13 @@ define signext i32 @equalityFoldTwoConstants() { | |
| define signext i32 @equalityFoldOneConstant(ptr %X) { | ||
| ; CHECK-LABEL: equalityFoldOneConstant: | ||
| ; CHECK: # %bb.0: | ||
| ; CHECK-NEXT: li 5, 1 | ||
| ; CHECK-NEXT: ld 4, 0(3) | ||
| ; CHECK-NEXT: rldic 5, 5, 32, 31 | ||
| ; CHECK-NEXT: cmpld 4, 5 | ||
| ; CHECK-NEXT: bne 0, .LBB6_2 | ||
| ; CHECK-NEXT: # %bb.1: # %loadbb1 | ||
| ; CHECK-NEXT: lis 5, -32768 | ||
| ; CHECK-NEXT: ld 4, 8(3) | ||
| ; CHECK-NEXT: li 3, 0 | ||
| ; CHECK-NEXT: ori 5, 5, 1 | ||
| ; CHECK-NEXT: rldic 5, 5, 1, 30 | ||
| ; CHECK-NEXT: cmpld 4, 5 | ||
| ; CHECK-NEXT: beq 0, .LBB6_3 | ||
| ; CHECK-NEXT: .LBB6_2: # %res_block | ||
| ; CHECK-NEXT: li 3, 1 | ||
| ; CHECK-NEXT: .LBB6_3: # %endblock | ||
| ; CHECK-NEXT: cntlzw 3, 3 | ||
| ; CHECK-NEXT: srwi 3, 3, 5 | ||
| ; CHECK-NEXT: lxvd2x 34, 0, 3 | ||
| ; CHECK-NEXT: addis 3, 2, .LCPI6_0@toc@ha | ||
| ; CHECK-NEXT: addi 3, 3, .LCPI6_0@toc@l | ||
| ; CHECK-NEXT: lxvd2x 35, 0, 3 | ||
| ; CHECK-NEXT: vcmpequb. 2, 2, 3 | ||
| ; CHECK-NEXT: mfocrf 3, 2 | ||
| ; CHECK-NEXT: rlwinm 3, 3, 25, 31, 31 | ||
| ; CHECK-NEXT: blr | ||
| %call = tail call signext i32 @memcmp(ptr @zeroEqualityTest04.buffer1, ptr %X, i64 16) | ||
| %not.tobool = icmp eq i32 %call, 0 | ||
|
|
||
Since you already checked for opcode ISD::LOAD, this can be an assert.
I keep the code and remove the checking of