Skip to content

Conversation

@axelcool1234
Copy link
Owner

No description provided.

@axelcool1234
Copy link
Owner Author

@mshockwave I wrote some initial code here without any tests. That's what I'm figuring out how to do do right now.

@axelcool1234
Copy link
Owner Author

axelcool1234 commented Feb 28, 2025

I tried configuring the tablegen LSP but it seems the yaml file that's generated doesn't include any of the .td files in the Target folder, which includes the Combine.td file - so I don't have access to the "goto definition" feature. I'm not sure if it's just because the LSP is limited in scope or if I'm missing something. Additionally, I'm not sure if it's possible to compile the LSP without having to compile ALL of mlir as well (of course I could just call "ninja tblgen-lsp", but I'm curious if there's a CMAKE flag).

@axelcool1234
Copy link
Owner Author

axelcool1234 commented Feb 28, 2025

@mshockwave I was wondering if this'd be another potential contribution that someone of my skill level could tackle:
llvm#85268

It's a missing optimization. I could try implementing it in SelectionDAG and maybe also in GlobalISel as well?
It seems the assigned contributor has been radio silent for months. What are the rules in regards to solving Github issues in the LLVM Project? Would it be considered rude to try solving this issue (only after I finish adding this funnel shift combiner rule of course) since there's already an assigned contributor?

Edit:
Woops! I didn't realize this'd link my pull request to that thread. I apologize if this pinged someone from there!

@mshockwave
Copy link

It's a missing optimization. I could try implementing it in SelectionDAG and maybe also in GlobalISel as well?

I think that was intended to implement as InstCombine rule -- middle-end's optimizations. In both SelectionDAG and GlobalISel I think this rule might not always be profitable, because (type) legalization might eliminate the sext anyway.

It seems the assigned contributor has been radio silent for months. What are the rules in regards to solving Github issues in the LLVM Project? Would it be considered rude to try solving this issue (only after I finish adding this funnel shift combiner rule of course) since there's already an assigned contributor?

That is a really good question. And frankly I don't know. But you know what, you can try to ask on LLVM's Discourse (not Discord), which is the official forum. I think Project Infrastructure might be a good place to post. See it as a practice to engage with the community!

@mshockwave
Copy link

yaml file that's generated doesn't include any of the .td files in the Target folder

I guess you mean tablegen_compile_commands.yml. The reason you didn't see Combine.td is because TableGen files in LLVM backends are organized in a relatively...ugly way: EVERY single .td file used by a backend will ultimately be recursively included into <Target name>.td (e.g. RISCV.td). For instance, Combine.td is included by RISCVCombine.td, which is included by RISCVGISel.td, which is a file included by RISCV.td.
That's why you only see, say RISCV.td, in the yaml file.

I actually raised this issue a few years ago: https://discourse.llvm.org/t/rfc-tablegen-include-what-you-use-iwyu/73168
But unfortunately it didn't go anywhere.
You're more than welcome to pick that up! I remember people reached a consensus on the viable approach in that post.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to adding to funnel_shift_combines as you did here, each target also selects a set of combine rules they want to use. For instance, RISCV does this in lib/Target/RISCV/RISCVCombine.td and funnel_shift_combines is not one of them. So before writing a test, you need to import funnel_shift_combines.

Copy link
Owner Author

@axelcool1234 axelcool1234 Mar 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was writing my tests and realized "wait, they're passing and it was mentioned that RISCV doesn't use these combines, what gives?"

Upon further investigation, I found that RISCV does use the funnel shifts:
https://github.com/axelcool1234/llvm-project/blob/f1f8463c457e4b2e08a53aea62b9a47490815882/llvm/lib/Target/RISCV/RISCVCombine.td#L13C1-L16C2

This includes all_combines, which includes funnel_shift_combines:
https://github.com/axelcool1234/llvm-project/blob/f1f8463c457e4b2e08a53aea62b9a47490815882/llvm/include/llvm/Target/GlobalISel/Combine.td#L2042C1-L2042C79

It seems all of these combines are BEFORE the legalize phase (RISCVPreLegalizerCombiner) - I assume you meant that I should include the funnel shift combines during the legalization phase? Should I include the funnel shift combines in RISCVPostLegalizerCombiner (it does say // TODO: Add more combines.)? And if I do, I'm not sure how I'd test it since my tests are already passing.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you meant that I should include the funnel shift combines during the legalization phase? Should I include the funnel shift combines in RISCVPostLegalizerCombiner

Oh, I forgot that RISC-V runs all the combiner rules pre-legalizations. And I think your combiner rule is only useful pre-legalized since RISC-V doesn't really have native support for funnel shifts, so they'll be expanded by the legalizer: https://github.com/axelcool1234/llvm-project/blob/f1f8463c457e4b2e08a53aea62b9a47490815882/llvm/lib/Target/RISCV/GISel/RISCVLegalizerInfo.cpp#L193

Bottom line: I think you're good, no need to explicitly add funnel_shift_combines into RISCVCombine.td

@mshockwave
Copy link

mshockwave commented Mar 4, 2025

@mshockwave I wrote some initial code here without any tests. That's what I'm figuring out how to do do right now.

for GISel's combiner tests it's pretty flexible, you can either do the normal codegen test (.ll -> .s) like test/CodeGen/RISCV/GlobalISel/combine.ll or MIR test like test/CodeGen/RISCV/GlobalISel/combine.mir. MIR tests, which uses -run-pass to run only a single phase in the pipeline, usually shows the differences most cleanly. But if your optimization doesn't interact a lot with other phases in the pipeline, then it doesn't really matter.

Copy link
Owner Author

@axelcool1234 axelcool1234 Mar 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was writing my tests and realized "wait, they're passing and it was mentioned that RISCV doesn't use these combines, what gives?"

Upon further investigation, I found that RISCV does use the funnel shifts:
https://github.com/axelcool1234/llvm-project/blob/f1f8463c457e4b2e08a53aea62b9a47490815882/llvm/lib/Target/RISCV/RISCVCombine.td#L13C1-L16C2

This includes all_combines, which includes funnel_shift_combines:
https://github.com/axelcool1234/llvm-project/blob/f1f8463c457e4b2e08a53aea62b9a47490815882/llvm/include/llvm/Target/GlobalISel/Combine.td#L2042C1-L2042C79

It seems all of these combines are BEFORE the legalize phase (RISCVPreLegalizerCombiner) - I assume you meant that I should include the funnel shift combines during the legalization phase? Should I include the funnel shift combines in RISCVPostLegalizerCombiner (it does say // TODO: Add more combines.)? And if I do, I'm not sure how I'd test it since my tests are already passing.

Comment on lines 1025 to 1069
Copy link
Owner Author

@axelcool1234 axelcool1234 Mar 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mshockwave I actually wrote this comment days ago, but Nick (PhD student in Professor Franz's lab) told me that code review comments are invisible to everyone except the code owner. Woops! EDIT: I just figured it out, I was in "review" mode and I had several pending comments. I just submitted the review.

This is the first time I've written a comment on code in Github, not sure if this'll work (I hope it shows my two defs here).

I have a question: What if there's a shl | fshl rather than a fshl | shl? Will this tablegen code generate a combine rule that still simplifies that? Because I specifically wrote tablegen that ORs the FSHL as the first operand and then SHL as the second operand. Not sure if this is generalized when Tablegen compiles down to C++.

Additionally, what exactly is this (defs ...) meant for (I know it's a DAG type in TableGen)? I can tell $root is for the resultant "register" that's outputted from the specified pattern in match. Are there any other interesting "definitions" one could have in this (defs ...) DAG?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was in "review" mode and I had several pending comments. I just submitted the review.

Yeah, I forgot to submit review comments all the time too.

What if there's a shl | fshl rather than a fshl | shl? Will this tablegen code generate a combine rule that still simplifies that? Because I specifically wrote tablegen that ORs the FSHL as the first operand and then SHL as the second operand.

GISelCombiner sees code in a basic block as a DAG. In other words it follows the data flows built by these virtual reigsters rather than following the instruction order. So it doesn't matter whether it's

G_FSHL $out1, $x, $_, $y
G_SHL $out2, $x, $y
G_OR $root, $out1, $out2

or

G_SHL $out2, $x, $y
G_FSHL $out1, $x, $_, $y
G_OR $root, $out1, $out2

Also for cases like this:

G_FSHL $out1, $x, $_, $y
G_SHL $out2, $x, $y
G_OR $root, $out2, $out1

That is, the order of G_OR's operands is swapped. GISel should be able to detect that G_OR is commutative and thus generate all possible pattern matching C++ code for this combiner rule. And that answers this question:

Not sure if this is generalized when Tablegen compiles down to C++.

Yes, these combiner rules are eventually generated into C++ code that does the pattern matching through a state-machine-like algorithm. For example, RISCV's generated pre-legalization combiner code is put in <build folder>/lib/Target/RISCV/RISCVGenPreLegalizeGICombiner.inc

Copy link

@mshockwave mshockwave Mar 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, what exactly is this (defs ...) meant for (I know it's a DAG type in TableGen)? I can tell $root is for the resultant "register" that's outputted from the specified pattern in match. Are there any other interesting "definitions"

TableGen constructions, especially DAG types shown here, are really context-dependent (which is IMHO one of the factors that makes TableGen code really difficult to understand for outsiders). In this context, the root record (no dollar sign) in (defs root:$root) means "the root value of this pattern". As I mentioned in my previous comment, GISelCombiner sees code as a DAG where each operation is a node. And because patterns are describing the shape of a code, there is of course a root value defined by an operation (a node) in the pattern (a DAG).

What the entire statement (defs root:$root) means -- again, in this context -- is that it declares token "$root" to represent the root value of this pattern. So every references of "$root" in the pattern (e.g. that in (G_OR $root, $out2, $out1)) effectively reference the root value.

Comment on lines 109 to 155
Copy link
Owner Author

@axelcool1234 axelcool1234 Mar 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mshockwave These tests pass, but I have some concerns - although this could just be my weaknesses showing. I don't have as much experience reasoning about bit shifting as I'd wish. It's probably where I struggled the most when I took UCI's assembly course, so I'm glad I'm being challenged here as it's helping me strengthen my intuition!

Let's talk about test_fshl_i32 in RV32: we test transforming FSHL OR SHL into FSHL. Essentially, call i32 @llvm.fshl.i32(i32 %x, i32 %_, i32 %y) is turned into the equivalent code to emulate that in RISCV:

not a3, a2
sll a0, a0, a2
srli a1, a1, 1
srl a1, a1, a3
or a0, a0, a1
ret

My concern lies in the ordering of these instructions. You have four parts at play here:

  • bitwise negating the shift amount a2 and storing that in a3
  • Shifting the upper value a0 to the left by the shift amount a2
  • Shifting the lower value a1 to the right by the negated shift value a3 + 1 (the +1 is due to two's complement)
  • OR the shifted values together and return

In this RISCV generated code, the order is:

  1. bitwise negating
  2. Shift upper value by the original shift amount
  3. Shift lower value by the negated shift amount + 1
  4. OR the shifted values and return

Now let's look at test_fshr_i32 in RV32:

not a3, a2 
slli a0, a0, 1
sll a0, a0, a3
srl a1, a1, a2
or a0, a0, a1
ret

In this RISCV generated code, the order is:

  1. bitwise negating
  2. Shift upper value by the negated shift amount + 1
  3. Shift lower value by the original shift amount
  4. OR the shifted values and return

I see that the order seems to be the same, however whether or not the upper value is shifted by the original amount or the negated shift amount + 1 depends on whether we're doing fshr or fshl. My concern is that we could have easily had this (valid) ordering instead for fshr:

  1. bitwise negating
  2. Shift lower value by the original shift amount
  3. Shift upper value by the negated shift amount + 1
  4. OR the shifted values and return

Steps 2 and 3 are interchangeable both for fshl and fshr. LLVM seems to stay consistent, but it doesn't necessarily have to be. If this were to change, my tests would outright fail, despite the translation to RISCV being correct. What should I do, and is this a concern I should be having?

I can expand this further with another concern I have when we get into RV64. Let's see test_fshl_i32 for RV64:

not a3, a2
sllw a0, a0, a2
srliw a1, a1, 1
srlw a1, a1, a3
or a0, a0, a1
ret

Notice how all the shifts are for the lower 32-bits, which explain why the shifts are appended with w. Now let's see test_fshr_i32 for RV64:

not a3, a2 
slli a0, a0, 1
sllw a0, a0, a3
srlw a1, a1, a2
or a0, a0, a1
ret

Notice how slli does not have w appended. However, I believe if it were to be slliw, it'd still be correct, I think. Perhaps I'm incorrect about this. If it'd still be correct, it seems to support my assumption that the RISCV code generation is a little bit arbitrary, which would mean my tests would fail in the future if LLVM were to generate correct, but different, RISCV code. Assuming I have to worry about this, how should I write more generalized tests that wouldn't fail if LLVM decides to generate slliw in the future (assuming that's correct, which I'm not entirely sure about)? And how do I generalize my tests such that steps 2 and 3 can be interchangeable like I mentioned above?

As always, I really appreciate your guidance as I slowly trek towards understanding the inner workings of LLVM.

Copy link

@mshockwave mshockwave Mar 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this were to change, my tests would outright fail, despite the translation to RISCV being correct. What should I do, and is this a concern I should be having?

So there are two ways: (1) use CHECK-DAG (2) whenever the codegen changes and fails your test, update it with utils/update_llc_tests.py.

For (1): CHECK-DAG is basically matching a subset of lines without specific ordering, as long as they fulfill certain constraints. You can check out https://llvm.org/docs/CommandGuide/FileCheck.html#the-check-dag-directive for more concrete examples.

For (2): Instead of writing those CHECK manually, once you have the desired RUN lines, generate CHECK and CHECK-NEXT using utils/update_llc_tests.py, for example:

./utils/update_llc_test_checks.py --llc-binary=/path/to/llc test/CodeGen/RISCV/GlobalISel/shift.ll

This script runs your RUN lines and generates CHECK-NEXT for each line of assembly code verbatimly. It doesn't use any regular expression placeholders (check out this if you're not familiar with them).

In fact, your test file, test/CodeGen/RISCV/GlobalISel/shift.ll was originally generated by this script (also known as UTC script -- update test checks) because it has the following comment at the very beginning of this file:

# Assertions have been autogenerated by utils/update_llc_test_checks.py...

The idea is that although we have tools like CHECK-DAG and regular expression placeholders to write tests that are less sensitive to changes in the codegen infrastructure...there are always corner cases where the changes are too big to an extent that we have to update the test manually. So rather than fiddling with CHECK-DAG and regular expressions, it's a lot easier to generate checks automatically in the first place.
Whenever a test is affected -- indicated by test failure, we can just run UTC scripts on those failed tests and check the diff to see if the changes are expected or not.

And that's why nowadays using UTC scripts are actually the recommended way to generate codegen tests, including those of GlobalISel.

@axelcool1234 axelcool1234 force-pushed the fsh-Combiner-Port branch 6 times, most recently from 17c74e0 to 733135e Compare April 12, 2025 07:19
@axelcool1234
Copy link
Owner Author

axelcool1234 commented Apr 16, 2025

@mshockwave Per the request of one of the reviewers of my LLVM pull request, I need to ensure that the intermediate instructions in this combiner passes a hasOneUse function call (otherwise it's a bug because other code may rely on it being there and we would've not emitted it).

The following is my attempt at calling hasOneUse which required me to lower to C++ code.

// Transform: fshr z, x, y | srl x, y -> fshr z, x, y
def funnel_shift_or_shift_to_funnel_shift_right: GICombineRule<
  (defs root:$root), 
  (match (G_FSHR $out1, $z, $x, $y),
         (G_LSHR $out2, $x, $y),
         (G_OR $root, $out1, $out2):$or,
         [{ return ${or}->getOperand(0).getBlockAddress()->hasOneUse() && 
                   ${or}->getOperand(2).getBlockAddress()->hasOneUse(); }]),
  (apply (G_FSHR $root, $z, $x, $y))
>;

${or} is a MachineInstr*. What I'm trying to get is the Value of the OR instruction and the Value of the LSHR instruction in order to call hasOneUse on them. I don't believe I need to call hasOneUse on the FSHR instruction though, since it's still emitted at the end (although I'm unsure if the resulting FSHR instruction that is emitted is considered "different" from the original FSHR instruction. If that's the case, how do I ensure its dependencies are transferred to the newly emitted FHSR?).

With the MachineOperand retrieved from the getOperand call, there seems to be two ways to get its Value: getBlockAddress and getGlobal, depending on what the MachineOperand is. However, this doesn't seem to be working. I'm probably missing a far more simple way to approach this. Can you help?

Also, I'm having an issue with one of the AArch64 tests. The or_lshr_fshr_simplify test is not being simplified, while or_shl_fshl_simplify is. I did some messing around and determined that it is triggered if this instruction: %or = or i32 %shy, %fun matches the same operand order as (G_OR $root, $out1, $out2) in my combiner (which it originally wasn't, which is why it wasn't being triggered). If $out1 and $out2 are flipped, the combiner is not triggered. One of the reviewers of my code rightfully pointed out that OR instructions have commuted operands but this example seems to defy that statement. Perhaps I wrote something incorrectly somewhere?

The funnel shift combiner rule from 4a3708c is currently missing from
GlobalISel. The following is a port of that combiner to GlobalISel.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants