Skip to content

Conversation

jurahul
Copy link
Contributor

@jurahul jurahul commented Sep 10, 2025

Add support for 2 features in DecoderEmitter to help resolve conflicts without resorting to decoder namespaces.

DecodeOrder:
Add the ability to specify a DecodeOrder on an instruction encoding. When the decoder emitter has to attempt decoding multiple candidate instructions, it it attempt them in their DecodeOrder, and if the DecoderOrder is same, but their Enum order (current behavior). This is useful in cases where filters created with both varying and fixed encoding currently always attempt the fixed encoding and then variable ones. If the set of candidates involved have different DecodeOrder, these attempts will now happen within each DecodeOrder. This helps eliminate, for example, AMDGPU's _FAKE16 decoder namesapces. Additionally, when we are unable to find a best filter and the set of instructions involved have multiple DecodeOrder, we will first bucket them according to their decode order and then try to process each bucket. This emulates the current mode of trying the decoder namespaces in a fixed order to resolve hard conflicts.

resolve-conflicts-try-all:
This option helps cases where currently we declare a decoding conflict and fail. Under this option, we will instead try all candidate encoding (in their DecoderOrder/Enum order).

Together, these 2 features help eliminate the use of decoder namespaces as a conflict resolution tool and allows coalescing multiple decode tables for such decoder namespace usage into a single one.

Adopt DecodeOrder feature for a few targets:

  1. Replace AArch64 "Fallback" decoder namespace with DecodeOrder = 1.
  2. Eliminate AMDGPU's usafe of _FAKE16 variants of decoder namespaces.
  3. Eliminate some RISCV 16-bit decoder namespaces.

@topperc
Copy link
Collaborator

topperc commented Sep 10, 2025

For the most part RISC-V doesn't need an "order". We just need to be able to continue to check a "conflicting" encoding when a predicate fails. Most of the predicates on the conflicts are mutually exclusive. For example, the RV32Only namespace exists because the RV64 instructions with the same opcode are in the main namespace.

@jurahul
Copy link
Contributor Author

jurahul commented Sep 10, 2025

For the most part RISC-V doesn't need an "order". We just need to be able to continue to check a "conflicting" encoding when a predicate fails. Most of the predicates on the conflicts are mutually exclusive. For example, the RV32Only namespace exists because the RV64 instructions with the same opcode are in the main namespace.

Thanks. Does that mean that I can just set the DecodeOrder to be 0 in all of them and things should still work? If that's that case, why were some of these namespaces used in the first place?

PS: I'll try that.

@topperc
Copy link
Collaborator

topperc commented Sep 10, 2025

Thanks. Does that mean that I can just set the DecodeOrder to be 0 in all of them and things should still work?

Probably not all of them. DecoderTableZicfiss16 does need to be prioritized before DecoderTable16, DecoderTableRV32Only16, and DecoderTableZcOverlap16.

DecoderTable16, DecoderTableRV32Only16, and DecoderTableZcOverlap16 should be able to be resolved solely by predicates.

DecoderTable32, DecoderTableRV32Only32, DecoderTableZfinx32, DecoderTableZdinxRV32Only32 should be resolved by predicates.

The DecoderTableX* namespaces need to be before the other namespaces. These are the vendor namespaces. Different vendors can overlap encodings with each other and with the standard encodings. I think conflicts between vendor tables can be resolved by predicates. Though we don't currently enforce that predicates from two different vendors can't be turned on at once. The order of the vendor tables between each other is currently arbitrary. So they can probably all have the same order.

@jurahul
Copy link
Contributor Author

jurahul commented Sep 10, 2025

Ok, thanks. I guess basically for the 16 bit tables, DecoderTableZicfiss16 can get DecodeOrder = -1, so its prioritized over the others and DecoderTable16, DecoderTableRV32Only16, and DecoderTableZcOverlap16 can share the same order. I am assuming they might run into decoding conflicts that can be resolved using the resolve-conflicts-try-all assuming their predicates are mutually exclusive, so only one of them will succeed.

@jurahul
Copy link
Contributor Author

jurahul commented Sep 11, 2025

I tried the RISCV changes you suggested and they don't work exactly. For example, I see a CM_JT vs CM_JALT failure (for pattern [0x06,0xa0]). The reason is that in the current case, we end up with a filter with just these 2 instructions:

    101000........10
    101000________10  CM_JALT
    101000000_____10  CM_JT

and then create a variable ID filter for JALT and a fixed one for JT, essentially attempting JT before JALT and that decodes to a CM_JT. If I add other instructions to the mix, the logic for creating best filter does not kick in anymore, so we fall back to trying these in sequential order (which happens to be the enum order). So we attempt to decode this as CM_JALT and fail. The predicates for these 2 instructions are same, and the ordering of fixed before variable is lost when we add more instructions to the mix.

When we have a mix with different decoder namespaces, we attempt a split and then it results in creating a smaller filter which then restores this behavior. So I suggest (if we want to adopt this for RISCV) we first go with simple replacement of decoder namespace with ordering and then potentially work on improving the heuristics here. As an example, if we cannot find the best filter, can we bucket instructions based on predicates for these insts and then filter (assuming that can disambiguate enough).

@jurahul
Copy link
Contributor Author

jurahul commented Sep 11, 2025

Basically, we are just replacing existing namespaces that serve to resolve conflicts (hard or "soft" as in case of CM_JT/CM_JALT) with DecodeOrder. If we can to improve this further, we can explore two things (1) predicate based "splitting" if that works. (2) Currently when OPC_Decode is reached, it always returns. The assumption is that earlier CHECK_predicate will bail out of the scope if it fails. We can enhance this to have OPC_Decode have a mode where it behaves the same way. Essentially, OPC_Scope can tag whether it wants to unconditionally return or return only if successful, else pop the scope. I can potentially look into these as future enhancements that can help the "attempt all" cases be robust.

@topperc
Copy link
Collaborator

topperc commented Sep 11, 2025

I tried the RISCV changes you suggested and they don't work exactly. For example, I see a CM_JT vs CM_JALT failure (for pattern [0x06,0xa0]). The reason is that in the current case, we end up with a filter with just these 2 instructions:

    101000........10
    101000________10  CM_JALT
    101000000_____10  CM_JT

and then create a variable ID filter for JALT and a fixed one for JT, essentially attempting JT before JALT and that decodes to a CM_JT. If I add other instructions to the mix, the logic for creating best filter does not kick in anymore, so we fall back to trying these in sequential order (which happens to be the enum order). So we attempt to decode this as CM_JALT and fail. The predicates for these 2 instructions are same, and the ordering of fixed before variable is lost when we add more instructions to the mix.

When we have a mix with different decoder namespaces, we attempt a split and then it results in creating a smaller filter which then restores this behavior. So I suggest (if we want to adopt this for RISCV) we first go with simple replacement of decoder namespace with ordering and then potentially work on improving the heuristics here. As an example, if we cannot find the best filter, can we bucket instructions based on predicates for these insts and then filter (assuming that can disambiguate enough).

So now we'll need to manage DecoderOrder when we add new instructions? How will we know when we need to set the order? Will we just disassemble wrong and need to debug it or will we an error message from tablegen?

@jurahul
Copy link
Contributor Author

jurahul commented Sep 11, 2025

I guess the process is similar to how we add decoder namespaces today, just expressed differently so that we end up with a single decoder table. For "hard" conflicts, we will get an error message unless we have the resolve-conflicts-try-all enabled. For "soft" conflicts, we today do not get any error message today and I guess folks need to debug (so no changes there).

@topperc
Copy link
Collaborator

topperc commented Sep 11, 2025

I guess the process is similar to how we add decoder namespaces today, just expressed differently so that we end up with a single decoder table. For "hard" conflicts, we will get an error message unless we have the resolve-conflicts-try-all enabled. For "soft" conflicts, we today do not get any error message today and I guess folks need to debug (so no changes there).

From the CM_JALT/CM_JT example, it seems like we can get more "soft" failures with this change?

As an experiment I removed the ZcOverlap namespace in my local repo and I got

Decoding Conflict:
    ................
    101.............
    101...........10
    101000________10  CM_JALT
    101000000_____10  CM_JT
    101011___11___10  CM_MVA01S
    101011___01___10  CM_MVSA01
    10111010______10  CM_POP
    10111110______10  CM_POPRET
    10111100______10  CM_POPRETZ
    10111000______10  CM_PUSH
    101___________10  C_FSDSP

If I move CM_MVA01S/CM_MVSA01 back to to ZcOverlap. Everything else worked and passed the tests.

The encoding for C_FSDSP is documented as being reused by the other instructions. Predicate checks enforce that they are mutually exclusive.

@jurahul
Copy link
Contributor Author

jurahul commented Sep 11, 2025

CM_JALT/CM_JT example was with all decoding order removed and with resolve-conflicts-try-all. As you can see above from your experiment, if we don't use resolve-conflicts-try-all, we get a decoding conflict (so decoding order or namespace removed has similar effect) because with more instructions in the mix, the decoder emitter is unable to figure out what to do. With CM_MVA01S/CM_MVSA01 moved back, I guess with the remaining insts, it can make forward progress. In that sense, replacing additional conflict resolving namespaces with DecodeOrder should have the same effect and we can follow the same procedure as before when deciding how to assign DecodeOrder for both hard and soft conflicts. I am assuming soft conflicts are resolved by debugging since tablegen does not issue any error/warnings for these.

That being said, the resolve-conflicts-try-all option as implemented here is broken (unspringing as it was not exercised anywhere). The issue is that the current OPC_Decode/TryDecode return as soon as decoding is completed even if it failed. So we need a mode where we return from the decoding code only if decoding is completed and its successful, else keep trying. I have now added support for this as well with a "non-failing" scope which is useful for this linear search. With that, I can get all the RISCV MC tests to pass.

@jurahul
Copy link
Contributor Author

jurahul commented Sep 11, 2025

With the latest version, the AArch64 issue is fixed (SoftFail handling subtleties, it seems for the purpose of decoding SoftFail is like a Success and not a failure). Also, I had to introduce 2 scope stacks.

// If we were unable to find a useful filter and there are multiple decode
// orders involved, split the candidates by decode order and create per decode
// order choosers.
if (hasMultipleDecodeOrders()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

findBestFilter has three calls to the same-named overload. If we want to resolve conflicts automatically, only the first call should remain. The other two are broken.
Consider these two encodings:

xy0 InstA
xyz InstB

The second call to findBestFilter will put InstA in FilteredIDs and InstB in VariableIDs. The VariableIDs only have a chance to match if FilteredIDs fail.

InstA could allow patterns 000 and 110,
InstB could allow patterns 010 and 100.

If the actual encoding is 010 or 100, the generated decoder will attempt to decode it as InstA and fail, not trying to decode it as InstB.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. The OpScopeNoFail that I added was meant to address that, but maybe the current scope needs to implement that behavior. Essentially, of the ops that can fail and return from the decode function, ops like FilterValue and CheckField are scope-aware, that is they are terminal (i.e., return from decodeInstruction) only if there is no scope, else they return to the outer scope. OPC_Decode and TryDecode (when DecodeComplete is true) and not like that, they are always terminal and return from decodeInstruction. In the case above, if they had the ScopeNoFail similar to FilterValue, when decoding the "FilteredIDs" InstA Decode/TryDecode will fail, but because of the scope, it will return and continue to the variable part, where it will succeed.

Copy link
Contributor

@s-barannikov s-barannikov Sep 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is currently represented as:

{
  check Inst{0} == 0
  decode InstA
}
decode InstB

If there was "try decode InstA", we would fail to decode InstA, then leave the scope and decode InstB.
I'm not sure I understand what is the semantic of OpScopeNoFail and how it could help here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we would fail to decode InstA, then leave the scope

For DecodeOp, that's not true. It asserts DecodeComplete and always return, irrespective of whether the status was Success or not. For TryDecode, it will fail and leave scope only if DecodeComplete = false, else it will behave similar to Decode.

My proposal is essentially to make them behave as you say, by looking at the stack depth. When DecodeComplete = true and S == Fail, we will return only if the scope stack is empty, else leave the scope contained and continue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For DecodeOp, that's not true. It asserts DecodeComplete and always return, irrespective of whether the status was Success or not.

This is the intended behavior, which is controlled by hasCompleteDecoder flag. Are you suggesting to ignore this flag and always emit TryDecode?
If not, what's the difference between Decode and TryDecode?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. Taking a step back, I looked into what hasCompleteDecoder means in the decoder emitter, hasCompleteDecoder seems to set DecodeComplete = false whenever decodeToMCInst return Fail. So the condition (DecodeComplete == true and S != Fail) will never happen (except in the bits<0> case, which might be an oversight?). So, we really do not need 2 outputs from decodeToMCInst (S and DecodeComplete). That's one NFC simplification that can be implemented.

Next, I agree that in my downstream cases, encodings that have the same static bit pattern but encode different register classes in the variable portion need to be marked with hasCompleteDecoder = 0. This correctly indicates the fact that even if we match all the statically know bits, there are > 1 encoding that can match, so they need to have hasCompleteDecoder = 0.

Now just doing that does not resolve the conflict, in that, I still get the decoding conflict. To address that, it seems we need to handle the case where in a filter all remaining IDs are variable IDs and there are no filtered IDs. Or something like that. Maybe I'll first implement the NFC change above to drop the DecodeComplete output and then go from there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hasCompleteDecoder seems to set DecodeComplete = false whenever decodeToMCInst return Fail.

This only happens with TryDecode; Decode never changes DecodeComplete and so it remains true, AFAICT

the condition (DecodeComplete == true and S != Fail) will never happen

It can happen if Decode succeeds (or returns SoftFail).

except in the bits<0> case, which might be an oversight

It might be. I didn't add a check for hasCompleteDecoder because I assumed that the no-bits decoder function can never fail (can it?).

Copy link
Contributor

@s-barannikov s-barannikov Sep 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Next, I agree that in my downstream cases, encodings that have the same static bit pattern but encode different register classes in the variable portion need to be marked with hasCompleteDecoder = 0.

Except that you can't set it on a generated decoder, it will be ignored. You can set it to false on a register class operand (in which case the generated decoder will inherit this property), but that will affect decoders of all instructions that use that register class. FWIW in my opinion hasCompleteDecoder is deficient for these reasons.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now just doing that does not resolve the conflict, in that, I still get the decoding conflict.

Right, because hasCompleteDecoder = false doesn't change the encoding, which is currently the only factor that guides conflict resolution process.

Sometimes this can be worked around. If you know that one of your register classes always has 0 or 1 in the bitpattern at a fixed position, you can set in in the operand encoding, and together with hasCompleteDecoder = false on that register class, this can help achieve the goal.

To address that, it seems we need to handle the case where in a filter all remaining IDs are variable IDs and there are no filtered IDs.

This case is currently impossible, usefulness() returns 0 in this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I am working on the first step here, which is to eliminate DecodeComplete. Essentially, for TryDecode, a return status of Fail can be interpreted as incomplete decoding. This keeps the interpretation as simple as if decoding is not successful and the decoder is not complete; we need to try more. Testing with check-llvm-mc-aarch64 (the only target that uses this feature) shows all tests pass. See #159130.

step 2 then could be to lift the restriction that hasCompleteDecoder cannot apply to generated decoders. It just means that the decoding failure for this op (from either generated or custom decoders) is not a terminal failure but a signal to attempt more (i.e., how to interpret the result of decodeToMCInst). [This is not strictly necessary but seems reasonable to me].

In general, we would expect that an instruction with hasCompleteDecoder = 0 is in a scope so that it can be popped to try another inst, unless this instruction is the last in the list of attempted ones. When we have a filter with both FilteredIDs and VariableIDs, we first insert a scope around FilteredIDs, and that seems correct, since based on the filtered bits, only of the encodings in FilteredIDs is expected to match, else fail. But if VariableFC has multiple entries and some have hasCompleteDecoder = 0, we need to try them all I'd think, but today we don't (need to concretize this more).

Next is to then address case #3. For example, before declaring a conflict and failing, we check if among the set of instructions remaining, if they all have hasCompleteDecoder = false, we can attempt them one by one.

Anyway, just some thoughts that probably need more validation.

// enabled.
enum DecoderOps {
OPC_Scope = 1, // OPC_Scope(nts_t NumToSkip)
OPC_ScopeNoFail, // OPC_ScopeNoFail(nts_t NumToSkip)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC it shouldn't exist. If the scope can fall through, it is equivalent to having no scope.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above, maybe what needs to happen in TryDecode/Decode need to be non-terminal when under a scope.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@jurahul jurahul Sep 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking that itself will help eliminate the AArch64 Fallback and AMDGPU Fake16 namespaces.

@s-barannikov
Copy link
Contributor

s-barannikov commented Sep 12, 2025

Here is an interesting decoding conflict (I replaced some '.' with a/d/m to give names to fields):

    111110110101....aaaadddd0000mmmm
    111110110101________11110000____  t2AUTG
    111110110101____1111____0000____  t2SMMUL
    111110110101____________0000____  t2SMMLA

t2AUTG can only match if aaaa != 1111, (otherwise it is t2SMMUL)
t2SMMUL can only match if dddd != 1111, (otherwise it is t2AUTG)
t2SMMLA can only match if aaaa != 1111 && dddd != 1111 (otherwise it is either t2SMMUL or t2AUTG)

1111 is the encoding of the PC register.
Now if we have aaaa == 1111 or dddd == 1111, no matter which order we try, we will always successfully decode the first attempted instruction. The issue is that rGPR decoder accepts 1111 as a valid (SoftFail) bitpattern, even though rGPR class does not contain PC.

One could argue that it is wrong to accept 1111 for rGPR, but it is not wrong for other instructions -- they should indeed be decoded and marked as SoftFail.

What I'm trying to say is that "decoding order" will not always work and we need a more sophisticated solution.
I was thinking about adding a "decoder predicate" -- small function or inline code that can disambiguate encodings, but I couldn't figure out what that function/code should take as input.

@jurahul
Copy link
Contributor Author

jurahul commented Sep 15, 2025

Here is an interesting decoding conflict (I replaced some '.' with a/d/m to give names to fields):

    111110110101....aaaadddd0000mmmm
    111110110101________11110000____  t2AUTG
    111110110101____1111____0000____  t2SMMUL
    111110110101____________0000____  t2SMMLA

t2AUTG can only match if aaaa != 1111, (otherwise it is t2SMMUL) t2SMMUL can only match if dddd != 1111, (otherwise it is t2AUTG) t2SMMLA can only match if aaaa != 1111 && dddd != 1111 (otherwise it is either t2SMMUL or t2AUTG)

1111 is the encoding of the PC register. Now if we have aaaa == 1111 or dddd == 1111, no matter which order we try, we will always successfully decode the first attempted instruction. The issue is that rGPR decoder accepts 1111 as a valid (SoftFail) bitpattern, even though rGPR class does not contain PC.

One could argue that it is wrong to accept 1111 for rGPR, but it is not wrong for other instructions -- they should indeed be decoded and marked as SoftFail.

What I'm trying to say is that "decoding order" will not always work and we need a more sophisticated solution. I was thinking about adding a "decoder predicate" -- small function or inline code that can disambiguate encodings, but I couldn't figure out what that function/code should take as input.

I am under the (maybe mistaken) impression that the target is responsible that for a given bit pattern and features, only a single decoding succeeds for a given bit pattern, because that's what happens on the actual HW (each bit pattern has a deterministic interpretation). So an "O0" decoder can be built by essentially segregating each instruction into its own namespace, building N decoder tables and iterating through them till one succeeds (similar to resolve-conflicts-try-all). And in such a scenario, exactly one succeeds or none (i.e., return MCDisassembler::Success or SoftFail) and that's the final result of decoding. And then what we do on top of this O0 is a CSE/commoning of checks across these N instructions. In that case, the result should be independent of the decode order attempted. But maybe we are not there yet.

@s-barannikov
Copy link
Contributor

I am under the (maybe mistaken) impression that the target is responsible that for a given bit pattern and features, only a single decoding succeeds for a given bit pattern, because that's what happens on the actual HW (each bit pattern has a deterministic interpretation). So an "O0" decoder can be built by essentially segregating each instruction into its own namespace, building N decoder tables and iterating through them till one succeeds (similar to resolve-conflicts-try-all). And in such a scenario, exactly one succeeds or none (i.e., return MCDisassembler::Success or SoftFail) and that's the final result of decoding. And then what we do on top of this O0 is a CSE/commoning of checks across these N instructions. In that case, the result should be independent of the decode order attempted. But maybe we are not there yet.

I totally agree. The reality, however, is imperfect. For example, when I tried to hoist predicate check before singleton field checks, AArch64 tests started to fail. This is because the predicated encoding was more specific than the unpredicated one, and early predicate check failure made the decoder choose the second encoding. The expected behavior is to choose the first one, fail the predicate check, and report an invalid encoding.

This is my greatest concern. Any big steps in the direction of improving conflict resolution will inevitably lead to change in the check/decode order that most, if not all, targets rely on (consciously or not).

I would suggest doing small steps instead. For instance, start by removing the third call to findBestFilter and develop a simple solution that fixes ARM/SystemZ conflicts (there are few).

@jurahul
Copy link
Contributor Author

jurahul commented Sep 15, 2025

I also want to prioritize the downstream issue with too many namespaces becoming unmanageable and address that first. So I am thinking this:

Change decode ops to not return when decoding succeed but status is fail when the stack is non-empty, but instead continue. This allows to encode attempting multiple decodings without prematurely returning without attempting all. Next, add the resolve-conflicts-try-all which helps our downstream example. Next, drop fallback and fake16 namespaces for aarch64/amdgpu. And then go from there.

@jurahul
Copy link
Contributor Author

jurahul commented Sep 16, 2025

@s-barannikov I am attempting another heuristic as that seems will help our downstream cases. See https://github.com/llvm/llvm-project/pull/159201/files specifically:

 // If we are unable to find the best filter, it generally means for all
  // unfiltered bit positions, we have atleast one instruction that has that
  // bit position unknown. Implement a heuristic to see if we can find
  // instructions that have all unfiltered bits unknown. In such cases, we
  // separate such insts and create a sub-chooser with the remaining
  // instructions with atleast one unfiltered bit known. The idea is to
  // give the remaining insts a first chance to decode since they assert more
  // known bits than the ones with all unknown ones. If the sub-chooser with the
  // remaining instructions is not able to decode, we decode the segregated
  // instructions 1-by-1. That is, we generate:
  //
  //   OPC_SCope {
  //     sub-chooser for unfiltered known bits;
  //   }
  //   OPC_Scope {
  //     decode unfiltered unknown 0
  //   }
  //   OPC_Scope {
  //     decode unfiltered unknown 1
  //   }
  //   decode unfiltered unknown (last)

For our downstream use case, where we have conflicts due to un-encoded register classes, I expect to see > 1 value in "AllUnfilteredUnknown" and we will mark then as having incomplete decoders.

@s-barannikov
Copy link
Contributor

s-barannikov commented Sep 16, 2025

I think the quoted comment basically describes what the third call to findBestFilter does, but it only does this when there are exactly three instructions in the set. It is a hack and I'd like it very much to be removed (replaced with other, more reliable heuristic).

For our downstream use case, where we have conflicts due to un-encoded register classes, I expect to see > 1 value in "AllUnfilteredUnknown" and we will mark then as having incomplete decoders.

I think I'll call it out that helping with downstream use cases is not a good enough reason to upstream the change. We should make sure the solution is enabled and works on all upstream targets and the transition is not disruptive.

Marking all encodings in "AllUnfilteredUnknown" as having incomplete decoders sounds reasonable to me. I couldn't think of any other solution that doesn't require targets to provide tie breakers.

I have one question though.

How these two encodings will be decoded? Neither of them is more specific than another, but both have some constant bits.

0x
x0

@s-barannikov
Copy link
Contributor

How these two encodings will be decoded? Neither of them is more specific than another, but both have some constant bits.

Reading the quoted comment more carefully, I think those will end up in sibling scopes. Did I understand correctly?

@jurahul
Copy link
Contributor Author

jurahul commented Sep 17, 2025

Thanks, yeah, the code in question seems to help AArch64 in eliminating the fallback namesapce, so in that sense its useful upstream. And as it happens, I think it will/can be made to work for my downstream case. And since it kicks in at the point we previously declared conflict, it should not regress any existing code and can then be gradually adopted.

Note that currently the encodings in AllUnfilteredUnknown are not forced to be incomplete, just that they can be (i.e., decoder emitter will not force anything, targets need to to get the right behavior).

For the 0x/x0 case, I think we will still call it a conflict since there is no single instruction with all unfiltered bits unknown, so the new code will not kick in.

@s-barannikov
Copy link
Contributor

s-barannikov commented Sep 17, 2025

So you want to add a heuristic and not replace the broken ones. I guess it is the least disruptive way forward...

@jurahul
Copy link
Contributor Author

jurahul commented Sep 17, 2025

That's my current thinking. I'll look into the 3rd call tomorrow in more detail and see if we can generalize it, but what I am proposing seems safe to do first to avoid a large fallout and then try to unify.

@jurahul
Copy link
Contributor Author

jurahul commented Sep 17, 2025

Here's an example of the conflict resolution in the 3rd call:

3rd call, bits 3,4
    ................
    0100............
    01000...........
    01000100........
    01000100________  tADDhirr
    01000100_1101___  tADDrSP
    010001001____101  tADDspr

It seems targeted to do 2 things really: Remove one of the 2 instructions with the known bit pattern into a separate filter (say in this case tADDrSP). Then we will create a filter with a single FilteredID = tADDrSP and create a sub-chooser for the remaining two, which will then lead to another choose with a single FilteredID = tADDspr and then tADDhirr in the variable part.

Now if I just disable case 3 and enable my heuristic, we will split out tADDhirr into its own entry and try to create a filter using tADDrSP and tADDspr and these 2 by themselves still conflict. So, we could subsume part of case 3 here and change the condition to size() == 2, but not sure if that's worth it (in terms of entangling the two heuristics).

@jurahul
Copy link
Contributor Author

jurahul commented Sep 17, 2025

LMK if this current plan of adding a new heuristic seems reasonable and I work on cleaning up the code (maybe after your latest refactor goes in).

@s-barannikov
Copy link
Contributor

Now if I just disable case 3 and enable my heuristic, we will split out tADDhirr into its own entry and try to create a filter using tADDrSP and tADDspr and these 2 by themselves still conflict. So, we could subsume part of case 3 here and change the condition to size() == 2, but not sure if that's worth it (in terms of entangling the two heuristics).

I think that won't work in SystemZ case, which wants opcodes with more bits matched first. I'm not entirely sure those two opcodes with more bits cannot be InstAlias, but this is a precedent we have to deal with.

LMK if this current plan of adding a new heuristic seems reasonable and I work on cleaning up the code (maybe after your latest refactor goes in).

Sounds good, let me know when it is ready for review.
I would humbly suggest that my refactoring goes in first because (1) it will hopefully simplify modification and (2) to not lose history of your changes after FilterChooser and DecoderTreeBuilder are extracted to a separate file (and then FilterChooser merged into DecoderTreeBuilder).

@jurahul
Copy link
Contributor Author

jurahul commented Sep 18, 2025

Yes I 0plan to work on this after your refactor goes in

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants