-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[DecoderEmitter] Support for DecodeOrder and resolve-conflicts-try-all
#157948
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
For the most part RISC-V doesn't need an "order". We just need to be able to continue to check a "conflicting" encoding when a predicate fails. Most of the predicates on the conflicts are mutually exclusive. For example, the RV32Only namespace exists because the RV64 instructions with the same opcode are in the main namespace. |
Thanks. Does that mean that I can just set the DecodeOrder to be 0 in all of them and things should still work? If that's that case, why were some of these namespaces used in the first place? PS: I'll try that. |
Probably not all of them. DecoderTableZicfiss16 does need to be prioritized before DecoderTable16, DecoderTableRV32Only16, and DecoderTableZcOverlap16. DecoderTable16, DecoderTableRV32Only16, and DecoderTableZcOverlap16 should be able to be resolved solely by predicates. DecoderTable32, DecoderTableRV32Only32, DecoderTableZfinx32, DecoderTableZdinxRV32Only32 should be resolved by predicates. The DecoderTableX* namespaces need to be before the other namespaces. These are the vendor namespaces. Different vendors can overlap encodings with each other and with the standard encodings. I think conflicts between vendor tables can be resolved by predicates. Though we don't currently enforce that predicates from two different vendors can't be turned on at once. The order of the vendor tables between each other is currently arbitrary. So they can probably all have the same order. |
Ok, thanks. I guess basically for the 16 bit tables, DecoderTableZicfiss16 can get DecodeOrder = -1, so its prioritized over the others and DecoderTable16, DecoderTableRV32Only16, and DecoderTableZcOverlap16 can share the same order. I am assuming they might run into decoding conflicts that can be resolved using the |
I tried the RISCV changes you suggested and they don't work exactly. For example, I see a CM_JT vs CM_JALT failure (for pattern [0x06,0xa0]). The reason is that in the current case, we end up with a filter with just these 2 instructions:
and then create a variable ID filter for JALT and a fixed one for JT, essentially attempting JT before JALT and that decodes to a CM_JT. If I add other instructions to the mix, the logic for creating best filter does not kick in anymore, so we fall back to trying these in sequential order (which happens to be the enum order). So we attempt to decode this as CM_JALT and fail. The predicates for these 2 instructions are same, and the ordering of fixed before variable is lost when we add more instructions to the mix. When we have a mix with different decoder namespaces, we attempt a split and then it results in creating a smaller filter which then restores this behavior. So I suggest (if we want to adopt this for RISCV) we first go with simple replacement of decoder namespace with ordering and then potentially work on improving the heuristics here. As an example, if we cannot find the best filter, can we bucket instructions based on predicates for these insts and then filter (assuming that can disambiguate enough). |
Basically, we are just replacing existing namespaces that serve to resolve conflicts (hard or "soft" as in case of CM_JT/CM_JALT) with DecodeOrder. If we can to improve this further, we can explore two things (1) predicate based "splitting" if that works. (2) Currently when OPC_Decode is reached, it always returns. The assumption is that earlier CHECK_predicate will bail out of the scope if it fails. We can enhance this to have OPC_Decode have a mode where it behaves the same way. Essentially, OPC_Scope can tag whether it wants to unconditionally return or return only if successful, else pop the scope. I can potentially look into these as future enhancements that can help the "attempt all" cases be robust. |
So now we'll need to manage DecoderOrder when we add new instructions? How will we know when we need to set the order? Will we just disassemble wrong and need to debug it or will we an error message from tablegen? |
I guess the process is similar to how we add decoder namespaces today, just expressed differently so that we end up with a single decoder table. For "hard" conflicts, we will get an error message unless we have the resolve-conflicts-try-all enabled. For "soft" conflicts, we today do not get any error message today and I guess folks need to debug (so no changes there). |
From the CM_JALT/CM_JT example, it seems like we can get more "soft" failures with this change? As an experiment I removed the ZcOverlap namespace in my local repo and I got
If I move CM_MVA01S/CM_MVSA01 back to to ZcOverlap. Everything else worked and passed the tests. The encoding for C_FSDSP is documented as being reused by the other instructions. Predicate checks enforce that they are mutually exclusive. |
CM_JALT/CM_JT example was with all decoding order removed and with resolve-conflicts-try-all. As you can see above from your experiment, if we don't use resolve-conflicts-try-all, we get a decoding conflict (so decoding order or namespace removed has similar effect) because with more instructions in the mix, the decoder emitter is unable to figure out what to do. With CM_MVA01S/CM_MVSA01 moved back, I guess with the remaining insts, it can make forward progress. In that sense, replacing additional conflict resolving namespaces with DecodeOrder should have the same effect and we can follow the same procedure as before when deciding how to assign DecodeOrder for both hard and soft conflicts. I am assuming soft conflicts are resolved by debugging since tablegen does not issue any error/warnings for these. That being said, the |
With the latest version, the AArch64 issue is fixed (SoftFail handling subtleties, it seems for the purpose of decoding SoftFail is like a Success and not a failure). Also, I had to introduce 2 scope stacks. |
// If we were unable to find a useful filter and there are multiple decode | ||
// orders involved, split the candidates by decode order and create per decode | ||
// order choosers. | ||
if (hasMultipleDecodeOrders()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
findBestFilter has three calls to the same-named overload. If we want to resolve conflicts automatically, only the first call should remain. The other two are broken.
Consider these two encodings:
xy0 InstA
xyz InstB
The second call to findBestFilter will put InstA in FilteredIDs and InstB in VariableIDs. The VariableIDs only have a chance to match if FilteredIDs fail.
InstA could allow patterns 000 and 110,
InstB could allow patterns 010 and 100.
If the actual encoding is 010 or 100, the generated decoder will attempt to decode it as InstA and fail, not trying to decode it as InstB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. The OpScopeNoFail that I added was meant to address that, but maybe the current scope needs to implement that behavior. Essentially, of the ops that can fail and return from the decode function, ops like FilterValue and CheckField are scope-aware, that is they are terminal (i.e., return from decodeInstruction
) only if there is no scope, else they return to the outer scope. OPC_Decode and TryDecode (when DecodeComplete is true) and not like that, they are always terminal and return from decodeInstruction. In the case above, if they had the ScopeNoFail similar to FilterValue, when decoding the "FilteredIDs" InstA Decode/TryDecode will fail, but because of the scope, it will return and continue to the variable part, where it will succeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is currently represented as:
{
check Inst{0} == 0
decode InstA
}
decode InstB
If there was "try decode InstA", we would fail to decode InstA, then leave the scope and decode InstB.
I'm not sure I understand what is the semantic of OpScopeNoFail and how it could help here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we would fail to decode InstA, then leave the scope
For DecodeOp, that's not true. It asserts DecodeComplete and always return, irrespective of whether the status was Success or not. For TryDecode, it will fail and leave scope only if DecodeComplete = false, else it will behave similar to Decode.
My proposal is essentially to make them behave as you say, by looking at the stack depth. When DecodeComplete = true and S == Fail, we will return only if the scope stack is empty, else leave the scope contained and continue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For DecodeOp, that's not true. It asserts DecodeComplete and always return, irrespective of whether the status was Success or not.
This is the intended behavior, which is controlled by hasCompleteDecoder
flag. Are you suggesting to ignore this flag and always emit TryDecode?
If not, what's the difference between Decode and TryDecode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. Taking a step back, I looked into what hasCompleteDecoder means in the decoder emitter, hasCompleteDecoder seems to set DecodeComplete = false whenever decodeToMCInst
return Fail. So the condition (DecodeComplete == true and S != Fail) will never happen (except in the bits<0>
case, which might be an oversight?). So, we really do not need 2 outputs from decodeToMCInst
(S and DecodeComplete). That's one NFC simplification that can be implemented.
Next, I agree that in my downstream cases, encodings that have the same static bit pattern but encode different register classes in the variable portion need to be marked with hasCompleteDecoder = 0. This correctly indicates the fact that even if we match all the statically know bits, there are > 1 encoding that can match, so they need to have hasCompleteDecoder = 0.
Now just doing that does not resolve the conflict, in that, I still get the decoding conflict. To address that, it seems we need to handle the case where in a filter all remaining IDs are variable IDs and there are no filtered IDs. Or something like that. Maybe I'll first implement the NFC change above to drop the DecodeComplete
output and then go from there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hasCompleteDecoder seems to set DecodeComplete = false whenever
decodeToMCInst
return Fail.
This only happens with TryDecode
; Decode
never changes DecodeComplete
and so it remains true
, AFAICT
the condition (DecodeComplete == true and S != Fail) will never happen
It can happen if Decode
succeeds (or returns SoftFail
).
except in the
bits<0>
case, which might be an oversight
It might be. I didn't add a check for hasCompleteDecoder
because I assumed that the no-bits decoder function can never fail (can it?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Next, I agree that in my downstream cases, encodings that have the same static bit pattern but encode different register classes in the variable portion need to be marked with hasCompleteDecoder = 0.
Except that you can't set it on a generated decoder, it will be ignored. You can set it to false on a register class operand (in which case the generated decoder will inherit this property), but that will affect decoders of all instructions that use that register class. FWIW in my opinion hasCompleteDecoder
is deficient for these reasons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now just doing that does not resolve the conflict, in that, I still get the decoding conflict.
Right, because hasCompleteDecoder = false doesn't change the encoding, which is currently the only factor that guides conflict resolution process.
Sometimes this can be worked around. If you know that one of your register classes always has 0 or 1 in the bitpattern at a fixed position, you can set in in the operand encoding, and together with hasCompleteDecoder = false on that register class, this can help achieve the goal.
To address that, it seems we need to handle the case where in a filter all remaining IDs are variable IDs and there are no filtered IDs.
This case is currently impossible, usefulness()
returns 0 in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I am working on the first step here, which is to eliminate DecodeComplete
. Essentially, for TryDecode, a return status of Fail can be interpreted as incomplete decoding. This keeps the interpretation as simple as if decoding is not successful and the decoder is not complete; we need to try more. Testing with check-llvm-mc-aarch64
(the only target that uses this feature) shows all tests pass. See #159130.
step 2 then could be to lift the restriction that hasCompleteDecoder
cannot apply to generated decoders. It just means that the decoding failure for this op (from either generated or custom decoders) is not a terminal failure but a signal to attempt more (i.e., how to interpret the result of decodeToMCInst). [This is not strictly necessary but seems reasonable to me].
In general, we would expect that an instruction with hasCompleteDecoder = 0 is in a scope so that it can be popped to try another inst, unless this instruction is the last in the list of attempted ones. When we have a filter with both FilteredIDs and VariableIDs, we first insert a scope around FilteredIDs, and that seems correct, since based on the filtered bits, only of the encodings in FilteredIDs is expected to match, else fail. But if VariableFC
has multiple entries and some have hasCompleteDecoder = 0, we need to try them all I'd think, but today we don't (need to concretize this more).
Next is to then address case #3. For example, before declaring a conflict and failing, we check if among the set of instructions remaining, if they all have hasCompleteDecoder = false, we can attempt them one by one.
Anyway, just some thoughts that probably need more validation.
// enabled. | ||
enum DecoderOps { | ||
OPC_Scope = 1, // OPC_Scope(nts_t NumToSkip) | ||
OPC_ScopeNoFail, // OPC_ScopeNoFail(nts_t NumToSkip) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC it shouldn't exist. If the scope can fall through, it is equivalent to having no scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above, maybe what needs to happen in TryDecode/Decode need to be non-terminal when under a scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@s-barannikov WDTY?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am thinking that itself will help eliminate the AArch64 Fallback and AMDGPU Fake16 namespaces.
Here is an interesting decoding conflict (I replaced some '.' with a/d/m to give names to fields):
1111 is the encoding of the PC register. One could argue that it is wrong to accept 1111 for What I'm trying to say is that "decoding order" will not always work and we need a more sophisticated solution. |
I am under the (maybe mistaken) impression that the target is responsible that for a given bit pattern and features, only a single decoding succeeds for a given bit pattern, because that's what happens on the actual HW (each bit pattern has a deterministic interpretation). So an "O0" decoder can be built by essentially segregating each instruction into its own namespace, building N decoder tables and iterating through them till one succeeds (similar to resolve-conflicts-try-all). And in such a scenario, exactly one succeeds or none (i.e., return MCDisassembler::Success or SoftFail) and that's the final result of decoding. And then what we do on top of this O0 is a CSE/commoning of checks across these N instructions. In that case, the result should be independent of the decode order attempted. But maybe we are not there yet. |
I totally agree. The reality, however, is imperfect. For example, when I tried to hoist predicate check before singleton field checks, AArch64 tests started to fail. This is because the predicated encoding was more specific than the unpredicated one, and early predicate check failure made the decoder choose the second encoding. The expected behavior is to choose the first one, fail the predicate check, and report an invalid encoding. This is my greatest concern. Any big steps in the direction of improving conflict resolution will inevitably lead to change in the check/decode order that most, if not all, targets rely on (consciously or not). I would suggest doing small steps instead. For instance, start by removing the third call to findBestFilter and develop a simple solution that fixes ARM/SystemZ conflicts (there are few). |
I also want to prioritize the downstream issue with too many namespaces becoming unmanageable and address that first. So I am thinking this: Change decode ops to not return when decoding succeed but status is fail when the stack is non-empty, but instead continue. This allows to encode attempting multiple decodings without prematurely returning without attempting all. Next, add the resolve-conflicts-try-all which helps our downstream example. Next, drop fallback and fake16 namespaces for aarch64/amdgpu. And then go from there. |
@s-barannikov I am attempting another heuristic as that seems will help our downstream cases. See https://github.com/llvm/llvm-project/pull/159201/files specifically:
For our downstream use case, where we have conflicts due to un-encoded register classes, I expect to see > 1 value in "AllUnfilteredUnknown" and we will mark then as having incomplete decoders. |
I think the quoted comment basically describes what the third call to findBestFilter does, but it only does this when there are exactly three instructions in the set. It is a hack and I'd like it very much to be removed (replaced with other, more reliable heuristic).
I think I'll call it out that helping with downstream use cases is not a good enough reason to upstream the change. We should make sure the solution is enabled and works on all upstream targets and the transition is not disruptive. Marking all encodings in "AllUnfilteredUnknown" as having incomplete decoders sounds reasonable to me. I couldn't think of any other solution that doesn't require targets to provide tie breakers. I have one question though. How these two encodings will be decoded? Neither of them is more specific than another, but both have some constant bits.
|
Reading the quoted comment more carefully, I think those will end up in sibling scopes. Did I understand correctly? |
Thanks, yeah, the code in question seems to help AArch64 in eliminating the fallback namesapce, so in that sense its useful upstream. And as it happens, I think it will/can be made to work for my downstream case. And since it kicks in at the point we previously declared conflict, it should not regress any existing code and can then be gradually adopted. Note that currently the encodings in AllUnfilteredUnknown are not forced to be incomplete, just that they can be (i.e., decoder emitter will not force anything, targets need to to get the right behavior). For the 0x/x0 case, I think we will still call it a conflict since there is no single instruction with all unfiltered bits unknown, so the new code will not kick in. |
So you want to add a heuristic and not replace the broken ones. I guess it is the least disruptive way forward... |
That's my current thinking. I'll look into the 3rd call tomorrow in more detail and see if we can generalize it, but what I am proposing seems safe to do first to avoid a large fallout and then try to unify. |
Here's an example of the conflict resolution in the 3rd call:
It seems targeted to do 2 things really: Remove one of the 2 instructions with the known bit pattern into a separate filter (say in this case tADDrSP). Then we will create a filter with a single FilteredID = tADDrSP and create a sub-chooser for the remaining two, which will then lead to another choose with a single FilteredID = tADDspr and then tADDhirr in the variable part. Now if I just disable case 3 and enable my heuristic, we will split out |
LMK if this current plan of adding a new heuristic seems reasonable and I work on cleaning up the code (maybe after your latest refactor goes in). |
I think that won't work in SystemZ case, which wants opcodes with more bits matched first. I'm not entirely sure those two opcodes with more bits cannot be InstAlias, but this is a precedent we have to deal with.
Sounds good, let me know when it is ready for review. |
Yes I 0plan to work on this after your refactor goes in |
Add support for 2 features in DecoderEmitter to help resolve conflicts without resorting to decoder namespaces.
DecodeOrder:
Add the ability to specify a
DecodeOrder
on an instruction encoding. When the decoder emitter has to attempt decoding multiple candidate instructions, it it attempt them in theirDecodeOrder
, and if theDecoderOrder
is same, but their Enum order (current behavior). This is useful in cases where filters created with both varying and fixed encoding currently always attempt the fixed encoding and then variable ones. If the set of candidates involved have differentDecodeOrder
, these attempts will now happen within eachDecodeOrder
. This helps eliminate, for example, AMDGPU's _FAKE16 decoder namesapces. Additionally, when we are unable to find a best filter and the set of instructions involved have multipleDecodeOrder
, we will first bucket them according to their decode order and then try to process each bucket. This emulates the current mode of trying the decoder namespaces in a fixed order to resolve hard conflicts.resolve-conflicts-try-all:
This option helps cases where currently we declare a decoding conflict and fail. Under this option, we will instead try all candidate encoding (in their DecoderOrder/Enum order).
Together, these 2 features help eliminate the use of decoder namespaces as a conflict resolution tool and allows coalescing multiple decode tables for such decoder namespace usage into a single one.
Adopt
DecodeOrder
feature for a few targets:_FAKE16
variants of decoder namespaces.