-
Notifications
You must be signed in to change notification settings - Fork 0
fsh(l/r) | sh(l/r) -> fsh(l/r) Combiner Port From DAGCombiner to GlobalISel #3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@mshockwave I wrote some initial code here without any tests. That's what I'm figuring out how to do do right now. |
|
I tried configuring the tablegen LSP but it seems the yaml file that's generated doesn't include any of the .td files in the Target folder, which includes the Combine.td file - so I don't have access to the "goto definition" feature. I'm not sure if it's just because the LSP is limited in scope or if I'm missing something. Additionally, I'm not sure if it's possible to compile the LSP without having to compile ALL of mlir as well (of course I could just call "ninja tblgen-lsp", but I'm curious if there's a CMAKE flag). |
|
@mshockwave I was wondering if this'd be another potential contribution that someone of my skill level could tackle: It's a missing optimization. I could try implementing it in SelectionDAG and maybe also in GlobalISel as well? Edit: |
I think that was intended to implement as InstCombine rule -- middle-end's optimizations. In both SelectionDAG and GlobalISel I think this rule might not always be profitable, because (type) legalization might eliminate the sext anyway.
That is a really good question. And frankly I don't know. But you know what, you can try to ask on LLVM's Discourse (not Discord), which is the official forum. I think Project Infrastructure might be a good place to post. See it as a practice to engage with the community! |
I guess you mean tablegen_compile_commands.yml. The reason you didn't see Combine.td is because TableGen files in LLVM backends are organized in a relatively...ugly way: EVERY single .td file used by a backend will ultimately be recursively included into I actually raised this issue a few years ago: https://discourse.llvm.org/t/rfc-tablegen-include-what-you-use-iwyu/73168 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to adding to funnel_shift_combines as you did here, each target also selects a set of combine rules they want to use. For instance, RISCV does this in lib/Target/RISCV/RISCVCombine.td and funnel_shift_combines is not one of them. So before writing a test, you need to import funnel_shift_combines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was writing my tests and realized "wait, they're passing and it was mentioned that RISCV doesn't use these combines, what gives?"
Upon further investigation, I found that RISCV does use the funnel shifts:
https://github.com/axelcool1234/llvm-project/blob/f1f8463c457e4b2e08a53aea62b9a47490815882/llvm/lib/Target/RISCV/RISCVCombine.td#L13C1-L16C2
This includes all_combines, which includes funnel_shift_combines:
https://github.com/axelcool1234/llvm-project/blob/f1f8463c457e4b2e08a53aea62b9a47490815882/llvm/include/llvm/Target/GlobalISel/Combine.td#L2042C1-L2042C79
It seems all of these combines are BEFORE the legalize phase (RISCVPreLegalizerCombiner) - I assume you meant that I should include the funnel shift combines during the legalization phase? Should I include the funnel shift combines in RISCVPostLegalizerCombiner (it does say // TODO: Add more combines.)? And if I do, I'm not sure how I'd test it since my tests are already passing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume you meant that I should include the funnel shift combines during the legalization phase? Should I include the funnel shift combines in RISCVPostLegalizerCombiner
Oh, I forgot that RISC-V runs all the combiner rules pre-legalizations. And I think your combiner rule is only useful pre-legalized since RISC-V doesn't really have native support for funnel shifts, so they'll be expanded by the legalizer: https://github.com/axelcool1234/llvm-project/blob/f1f8463c457e4b2e08a53aea62b9a47490815882/llvm/lib/Target/RISCV/GISel/RISCVLegalizerInfo.cpp#L193
Bottom line: I think you're good, no need to explicitly add funnel_shift_combines into RISCVCombine.td
for GISel's combiner tests it's pretty flexible, you can either do the normal codegen test (.ll -> .s) like |
06c327f to
f1f8463
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was writing my tests and realized "wait, they're passing and it was mentioned that RISCV doesn't use these combines, what gives?"
Upon further investigation, I found that RISCV does use the funnel shifts:
https://github.com/axelcool1234/llvm-project/blob/f1f8463c457e4b2e08a53aea62b9a47490815882/llvm/lib/Target/RISCV/RISCVCombine.td#L13C1-L16C2
This includes all_combines, which includes funnel_shift_combines:
https://github.com/axelcool1234/llvm-project/blob/f1f8463c457e4b2e08a53aea62b9a47490815882/llvm/include/llvm/Target/GlobalISel/Combine.td#L2042C1-L2042C79
It seems all of these combines are BEFORE the legalize phase (RISCVPreLegalizerCombiner) - I assume you meant that I should include the funnel shift combines during the legalization phase? Should I include the funnel shift combines in RISCVPostLegalizerCombiner (it does say // TODO: Add more combines.)? And if I do, I'm not sure how I'd test it since my tests are already passing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mshockwave I actually wrote this comment days ago, but Nick (PhD student in Professor Franz's lab) told me that code review comments are invisible to everyone except the code owner. Woops! EDIT: I just figured it out, I was in "review" mode and I had several pending comments. I just submitted the review.
This is the first time I've written a comment on code in Github, not sure if this'll work (I hope it shows my two defs here).
I have a question: What if there's a shl | fshl rather than a fshl | shl? Will this tablegen code generate a combine rule that still simplifies that? Because I specifically wrote tablegen that ORs the FSHL as the first operand and then SHL as the second operand. Not sure if this is generalized when Tablegen compiles down to C++.
Additionally, what exactly is this (defs ...) meant for (I know it's a DAG type in TableGen)? I can tell $root is for the resultant "register" that's outputted from the specified pattern in match. Are there any other interesting "definitions" one could have in this (defs ...) DAG?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was in "review" mode and I had several pending comments. I just submitted the review.
Yeah, I forgot to submit review comments all the time too.
What if there's a shl | fshl rather than a fshl | shl? Will this tablegen code generate a combine rule that still simplifies that? Because I specifically wrote tablegen that ORs the FSHL as the first operand and then SHL as the second operand.
GISelCombiner sees code in a basic block as a DAG. In other words it follows the data flows built by these virtual reigsters rather than following the instruction order. So it doesn't matter whether it's
G_FSHL $out1, $x, $_, $y
G_SHL $out2, $x, $y
G_OR $root, $out1, $out2
or
G_SHL $out2, $x, $y
G_FSHL $out1, $x, $_, $y
G_OR $root, $out1, $out2
Also for cases like this:
G_FSHL $out1, $x, $_, $y
G_SHL $out2, $x, $y
G_OR $root, $out2, $out1
That is, the order of G_OR's operands is swapped. GISel should be able to detect that G_OR is commutative and thus generate all possible pattern matching C++ code for this combiner rule. And that answers this question:
Not sure if this is generalized when Tablegen compiles down to C++.
Yes, these combiner rules are eventually generated into C++ code that does the pattern matching through a state-machine-like algorithm. For example, RISCV's generated pre-legalization combiner code is put in <build folder>/lib/Target/RISCV/RISCVGenPreLegalizeGICombiner.inc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, what exactly is this (defs ...) meant for (I know it's a DAG type in TableGen)? I can tell $root is for the resultant "register" that's outputted from the specified pattern in match. Are there any other interesting "definitions"
TableGen constructions, especially DAG types shown here, are really context-dependent (which is IMHO one of the factors that makes TableGen code really difficult to understand for outsiders). In this context, the root record (no dollar sign) in (defs root:$root) means "the root value of this pattern". As I mentioned in my previous comment, GISelCombiner sees code as a DAG where each operation is a node. And because patterns are describing the shape of a code, there is of course a root value defined by an operation (a node) in the pattern (a DAG).
What the entire statement (defs root:$root) means -- again, in this context -- is that it declares token "$root" to represent the root value of this pattern. So every references of "$root" in the pattern (e.g. that in (G_OR $root, $out2, $out1)) effectively reference the root value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mshockwave These tests pass, but I have some concerns - although this could just be my weaknesses showing. I don't have as much experience reasoning about bit shifting as I'd wish. It's probably where I struggled the most when I took UCI's assembly course, so I'm glad I'm being challenged here as it's helping me strengthen my intuition!
Let's talk about test_fshl_i32 in RV32: we test transforming FSHL OR SHL into FSHL. Essentially, call i32 @llvm.fshl.i32(i32 %x, i32 %_, i32 %y) is turned into the equivalent code to emulate that in RISCV:
not a3, a2
sll a0, a0, a2
srli a1, a1, 1
srl a1, a1, a3
or a0, a0, a1
ret
My concern lies in the ordering of these instructions. You have four parts at play here:
- bitwise negating the shift amount
a2and storing that ina3 - Shifting the upper value
a0to the left by the shift amounta2 - Shifting the lower value
a1to the right by the negated shift valuea3 + 1(the +1 is due to two's complement) - OR the shifted values together and return
In this RISCV generated code, the order is:
- bitwise negating
- Shift upper value by the original shift amount
- Shift lower value by the negated shift amount + 1
- OR the shifted values and return
Now let's look at test_fshr_i32 in RV32:
not a3, a2
slli a0, a0, 1
sll a0, a0, a3
srl a1, a1, a2
or a0, a0, a1
ret
In this RISCV generated code, the order is:
- bitwise negating
- Shift upper value by the negated shift amount + 1
- Shift lower value by the original shift amount
- OR the shifted values and return
I see that the order seems to be the same, however whether or not the upper value is shifted by the original amount or the negated shift amount + 1 depends on whether we're doing fshr or fshl. My concern is that we could have easily had this (valid) ordering instead for fshr:
- bitwise negating
- Shift lower value by the original shift amount
- Shift upper value by the negated shift amount + 1
- OR the shifted values and return
Steps 2 and 3 are interchangeable both for fshl and fshr. LLVM seems to stay consistent, but it doesn't necessarily have to be. If this were to change, my tests would outright fail, despite the translation to RISCV being correct. What should I do, and is this a concern I should be having?
I can expand this further with another concern I have when we get into RV64. Let's see test_fshl_i32 for RV64:
not a3, a2
sllw a0, a0, a2
srliw a1, a1, 1
srlw a1, a1, a3
or a0, a0, a1
ret
Notice how all the shifts are for the lower 32-bits, which explain why the shifts are appended with w. Now let's see test_fshr_i32 for RV64:
not a3, a2
slli a0, a0, 1
sllw a0, a0, a3
srlw a1, a1, a2
or a0, a0, a1
ret
Notice how slli does not have w appended. However, I believe if it were to be slliw, it'd still be correct, I think. Perhaps I'm incorrect about this. If it'd still be correct, it seems to support my assumption that the RISCV code generation is a little bit arbitrary, which would mean my tests would fail in the future if LLVM were to generate correct, but different, RISCV code. Assuming I have to worry about this, how should I write more generalized tests that wouldn't fail if LLVM decides to generate slliw in the future (assuming that's correct, which I'm not entirely sure about)? And how do I generalize my tests such that steps 2 and 3 can be interchangeable like I mentioned above?
As always, I really appreciate your guidance as I slowly trek towards understanding the inner workings of LLVM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this were to change, my tests would outright fail, despite the translation to RISCV being correct. What should I do, and is this a concern I should be having?
So there are two ways: (1) use CHECK-DAG (2) whenever the codegen changes and fails your test, update it with utils/update_llc_tests.py.
For (1): CHECK-DAG is basically matching a subset of lines without specific ordering, as long as they fulfill certain constraints. You can check out https://llvm.org/docs/CommandGuide/FileCheck.html#the-check-dag-directive for more concrete examples.
For (2): Instead of writing those CHECK manually, once you have the desired RUN lines, generate CHECK and CHECK-NEXT using utils/update_llc_tests.py, for example:
./utils/update_llc_test_checks.py --llc-binary=/path/to/llc test/CodeGen/RISCV/GlobalISel/shift.ll
This script runs your RUN lines and generates CHECK-NEXT for each line of assembly code verbatimly. It doesn't use any regular expression placeholders (check out this if you're not familiar with them).
In fact, your test file, test/CodeGen/RISCV/GlobalISel/shift.ll was originally generated by this script (also known as UTC script -- update test checks) because it has the following comment at the very beginning of this file:
# Assertions have been autogenerated by utils/update_llc_test_checks.py...
The idea is that although we have tools like CHECK-DAG and regular expression placeholders to write tests that are less sensitive to changes in the codegen infrastructure...there are always corner cases where the changes are too big to an extent that we have to update the test manually. So rather than fiddling with CHECK-DAG and regular expressions, it's a lot easier to generate checks automatically in the first place.
Whenever a test is affected -- indicated by test failure, we can just run UTC scripts on those failed tests and check the diff to see if the changes are expected or not.
And that's why nowadays using UTC scripts are actually the recommended way to generate codegen tests, including those of GlobalISel.
17c74e0 to
733135e
Compare
|
@mshockwave Per the request of one of the reviewers of my LLVM pull request, I need to ensure that the intermediate instructions in this combiner passes a The following is my attempt at calling
With the Also, I'm having an issue with one of the AArch64 tests. The |
The funnel shift combiner rule from 4a3708c is currently missing from GlobalISel. The following is a port of that combiner to GlobalISel.
275c274 to
75ad0e0
Compare
No description provided.