Skip to content

Conversation

@statham-arm
Copy link
Collaborator

@statham-arm statham-arm commented Nov 5, 2024

If the assembler sees this instruction, understanding foo to be an external symbol, there's no relocation it can write that will put the whole value of foo into the 8-bit immediate field of the 16-bit Thumb add instruction. So it should report an error message pointing at the source line, and in LLVM 18, it did exactly that. But now the error is not reported, due to an indexing error in the operand list in validateInstruction, and instead the code continues to attempt assembly, ending up aborting at the llvm_unreachable at the end of getHiLoImmOpValue.

In this commit I've fixed the index in the ARM::tMOVi8 case of validateInstruction, and also the one for tADDi8 which must cope with either the 2- or 3-operand form in the input assembly source. But also, while writing the test, I found that if you assemble for Armv7-M instead of Armv6-M, the instruction has opcode t2ADDri when it goes through validateInstruction, and only turns into tMOVi8 later in processInstruction. Then it's too late for validateInstruction to report that error. So I've adjusted processInstruction to spot that case and inhibit the conversion.

If the assembler sees this instruction, understanding 'foo' to be an
external symbol, there's no relocation it can write that will put the
whole value of 'foo' into the 8-bit immediate field of the 16-bit
Thumb add instruction. So it should report an error message pointing
at the source line, and in LLVM 18, it did exactly that. But now the
error is not reported, due to an indexing error in the operand list in
`validateInstruction`, and instead the code continues to attempt
assembly, ending up aborting at the `llvm_unreachable` at the end of
`getHiLoImmOpValue`.

In this commit I've fixed the index in the `ARM::tMOVi8` case of
`validateInstruction`, and also the one for `tADDi8` which must cope
with either the 2- or 3-operand form in the input assembly source. But
also, while writing the test, I found that if you assemble for Armv7-M
instead of Armv6-M, the instruction has opcode `t2ADDri` when it goes
through `validateInstruction`, and only turns into `tMOVi8` later in
`processInstruction`. Then it's too late for `validateInstruction` to
report that error. So I've introduced a new `revalidateInstruction`
which is run _after_ modifying the instruction, and repeated the same
check there.
@llvmbot
Copy link
Member

llvmbot commented Nov 5, 2024

@llvm/pr-subscribers-backend-arm

@llvm/pr-subscribers-mc

Author: Simon Tatham (statham-arm)

Changes

If the assembler sees this instruction, understanding foo to be an external symbol, there's no relocation it can write that will put the whole value of foo into the 8-bit immediate field of the 16-bit Thumb add instruction. So it should report an error message pointing at the source line, and in LLVM 18, it did exactly that. But now the error is not reported, due to an indexing error in the operand list in validateInstruction, and instead the code continues to attempt assembly, ending up aborting at the llvm_unreachable at the end of getHiLoImmOpValue.

In this commit I've fixed the index in the ARM::tMOVi8 case of validateInstruction, and also the one for tADDi8 which must cope with either the 2- or 3-operand form in the input assembly source. But also, while writing the test, I found that if you assemble for Armv7-M instead of Armv6-M, the instruction has opcode t2ADDri when it goes through validateInstruction, and only turns into tMOVi8 later in processInstruction. Then it's too late for validateInstruction to report that error. So I've introduced a new revalidateInstruction which is run after modifying the instruction, and repeated the same check there.


Full diff: https://github.com/llvm/llvm-project/pull/115026.diff

2 Files Affected:

  • (modified) llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp (+65-9)
  • (added) llvm/test/MC/ARM/lower-upper-errors.s (+20)
diff --git a/llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp b/llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp
index 0df1c336a22146..8e3e27dec76ff1 100644
--- a/llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp
+++ b/llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp
@@ -659,6 +659,8 @@ class ARMAsmParser : public MCTargetAsmParser {
 
   bool validateInstruction(MCInst &Inst, const OperandVector &Ops,
                            unsigned MnemonicOpsEndInd);
+  bool revalidateInstruction(MCInst &Inst, const OperandVector &Ops,
+                             unsigned MnemonicOpsEndInd, unsigned OrigOpcode);
   bool processInstruction(MCInst &Inst, const OperandVector &Ops,
                           unsigned MnemonicOpsEndInd, MCStreamer &Out);
   bool shouldOmitVectorPredicateOperand(StringRef Mnemonic,
@@ -8294,7 +8296,9 @@ bool ARMAsmParser::validateInstruction(MCInst &Inst,
     break;
   }
   case ARM::tADDi8: {
-    MCParsedAsmOperand &Op = *Operands[MnemonicOpsEndInd + 1];
+    int i = (Operands[MnemonicOpsEndInd + 1]->isImm()) ? MnemonicOpsEndInd + 1
+                                                       : MnemonicOpsEndInd + 2;
+    MCParsedAsmOperand &Op = *Operands[i];
     if (isARMMCExpr(Op) && !isThumbI8Relocation(Op))
       return Error(Op.getStartLoc(),
                    "Immediate expression for Thumb adds requires :lower0_7:,"
@@ -8302,7 +8306,7 @@ bool ARMAsmParser::validateInstruction(MCInst &Inst,
     break;
   }
   case ARM::tMOVi8: {
-    MCParsedAsmOperand &Op = *Operands[MnemonicOpsEndInd];
+    MCParsedAsmOperand &Op = *Operands[MnemonicOpsEndInd + 1];
     if (isARMMCExpr(Op) && !isThumbI8Relocation(Op))
       return Error(Op.getStartLoc(),
                    "Immediate expression for Thumb movs requires :lower0_7:,"
@@ -8653,6 +8657,41 @@ bool ARMAsmParser::validateInstruction(MCInst &Inst,
   return false;
 }
 
+// After processInstruction has transformed an instruction being assembled into
+// a different opcode, do any further validity checks that the new opcode
+// depends on.
+//
+// `Inst` contains the final modified form of the instruction, but `Operands`
+// contains the parsed operands from the _original_ instruction, because
+// nothing has updated them (`processInstruction` received them as const).
+// `OrigOpcode` contains the original value of `Inst.getOpcode()`, which should
+// give enough context to know how to understand the original operands list.
+bool ARMAsmParser::revalidateInstruction(MCInst &Inst,
+                                         const OperandVector &Operands,
+                                         unsigned MnemonicOpsEndInd,
+                                         unsigned OrigOpcode) {
+  const unsigned NewOpcode = Inst.getOpcode();
+
+  if (OrigOpcode == ARM::t2ADDri && NewOpcode == ARM::tADDi8) {
+    // t2ADDri is the Thumb 32-bit immediate add instruction, for example
+    // 'add[s] r0,r1,#0xff00'. If its immediate argument isn't a constant
+    // requiring shifting, then processInstruction can turn it into tADDi8, the
+    // simpler 16-bit Thumb immediate add (provided all the other conditions
+    // for that transformation are met). That makes it too late for
+    // validateInstruction to do this check, which it would have done if it had
+    // known from the start that the instruction was tADDi8.
+    int i = (Operands[MnemonicOpsEndInd + 1]->isImm()) ? MnemonicOpsEndInd + 1
+                                                       : MnemonicOpsEndInd + 2;
+    MCParsedAsmOperand &Op = *Operands[i];
+    if (isARMMCExpr(Op) && !isThumbI8Relocation(Op))
+      return Error(Op.getStartLoc(),
+                   "Immediate expression for Thumb adds requires :lower0_7:,"
+                   " :lower8_15:, :upper0_7: or :upper8_15:");
+  }
+
+  return false;
+}
+
 static unsigned getRealVSTOpcode(unsigned Opc, unsigned &Spacing) {
   switch(Opc) {
   default: llvm_unreachable("unexpected opcode!");
@@ -11399,7 +11438,7 @@ bool ARMAsmParser::matchAndEmitInstruction(SMLoc IDLoc, unsigned &Opcode,
   unsigned MnemonicOpsEndInd = getMnemonicOpsEndInd(Operands);
 
   switch (MatchResult) {
-  case Match_Success:
+  case Match_Success: {
     LLVM_DEBUG(dbgs() << "Parsed as: ";
                Inst.dump_pretty(dbgs(), MII.getName(Inst.getOpcode()));
                dbgs() << "\n");
@@ -11414,15 +11453,31 @@ bool ARMAsmParser::matchAndEmitInstruction(SMLoc IDLoc, unsigned &Opcode,
       return true;
     }
 
-    {
-      // Some instructions need post-processing to, for example, tweak which
-      // encoding is selected. Loop on it while changes happen so the
-      // individual transformations can chain off each other. E.g.,
-      // tPOP(r8)->t2LDMIA_UPD(sp,r8)->t2STR_POST(sp,r8)
-      while (processInstruction(Inst, Operands, MnemonicOpsEndInd, Out))
+    // Some instructions need post-processing to, for example, tweak which
+    // encoding is selected. Loop on it while changes happen so the individual
+    // transformations can chain off each other. E.g.,
+    // tPOP(r8)->t2LDMIA_UPD(sp,r8)->t2STR_POST(sp,r8)
+    //
+    // This is written as a do-while inside an if, instead of the more obvious
+    // while loop, so that after postprocessing has completed the revised
+    // instruction can be revalidated, but (to save time) only if any changes
+    // had to be made at all.
+    unsigned OrigOpcode = Inst.getOpcode();
+    if (processInstruction(Inst, Operands, MnemonicOpsEndInd, Out)) {
+      do {
         LLVM_DEBUG(dbgs() << "Changed to: ";
                    Inst.dump_pretty(dbgs(), MII.getName(Inst.getOpcode()));
                    dbgs() << "\n");
+      } while (processInstruction(Inst, Operands, MnemonicOpsEndInd, Out));
+
+      MnemonicOpsEndInd = getMnemonicOpsEndInd(Operands);
+      if (revalidateInstruction(Inst, Operands, MnemonicOpsEndInd,
+                                OrigOpcode)) {
+        // As above, advance IT/VPT positions if we're exiting early.
+        forwardITPosition();
+        forwardVPTPosition();
+        return true;
+      }
     }
 
     // Only move forward at the very end so that everything in validate
@@ -11445,6 +11500,7 @@ bool ARMAsmParser::matchAndEmitInstruction(SMLoc IDLoc, unsigned &Opcode,
       Out.emitInstruction(Inst, getSTI());
     }
     return false;
+  }
   case Match_NearMisses:
     ReportNearMisses(NearMisses, IDLoc, Operands);
     return true;
diff --git a/llvm/test/MC/ARM/lower-upper-errors.s b/llvm/test/MC/ARM/lower-upper-errors.s
new file mode 100644
index 00000000000000..34fa3c7155ba1e
--- /dev/null
+++ b/llvm/test/MC/ARM/lower-upper-errors.s
@@ -0,0 +1,20 @@
+// RUN: not llvm-mc --triple thumbv6m -show-encoding %s 2>&1 | FileCheck %s
+// RUN: not llvm-mc --triple thumbv7m -show-encoding %s 2>&1 | FileCheck %s --check-prefixes=CHECK,THUMB2
+
+// Check reporting of errors of the form "you should have used
+// :lower16: in this immediate field".
+
+// CHECK: :[[@LINE+1]]:10: error: Immediate expression for Thumb movs requires :lower0_7:, :lower8_15:, :upper0_7: or :upper8_15:
+movs r0, #foo
+
+// CHECK: :[[@LINE+1]]:10: error: Immediate expression for Thumb adds requires :lower0_7:, :lower8_15:, :upper0_7: or :upper8_15:
+adds r0, #foo
+
+// CHECK: :[[@LINE+1]]:14: error: Immediate expression for Thumb adds requires :lower0_7:, :lower8_15:, :upper0_7: or :upper8_15:
+adds r0, r0, #foo
+
+// THUMB2: :[[@LINE+1]]:10: error: immediate expression for mov requires :lower16: or :upper16
+movw r0, #foo
+
+// THUMB2: :[[@LINE+1]]:10: error: immediate expression for mov requires :lower16: or :upper16
+movt r0, #foo

Copy link
Contributor

@AlfieRichardsArm AlfieRichardsArm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! The revalidate seems like a good pattern as there's more I can imagine could use this pattern

// simpler 16-bit Thumb immediate add (provided all the other conditions
// for that transformation are met). That makes it too late for
// validateInstruction to do this check, which it would have done if it had
// known from the start that the instruction was tADDi8.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the handling for t2ADDri just skip transforming the instruction if we know the end result isn't a valid Thumb1 instruction? Seems weird to transform to a Thumb1 instruction we know won't work, then produce an error referencing the Thumb1 instruction instead of the original Thumb2 instruction.

Some additional testcases:

add r9, r0, #foo
movs r0, :lower16:#foo
movs r11, :upper8_15:#foo

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not quite as weird, because the original instruction was only Thumb2 because of LLVM's decision – the user asked for something much more general-purpose. It doesn't seem totally unreasonable to me that "adds r0, #extern_symbol" should give the same error whether or not Thumb2 is available. It looks like the same instruction, after all!

I did wonder about trying to do a more complete fix. The real problem case would be something like this (targeting v7-M or v8-M Mainline):

adds r0, r0, #foo
adds r1, r1, #bar
.equ foo, :lower0_7:some_extern_symbol
.equ bar, 0xff00

If this were ever to assemble successfully, it would have to be done by making the adds to r0 tADDi8 which can accept the necessary relocation, but the one to r1 t2ADDri which can accept the shifted immediate. But the assembler must commit to each instruction's opcode before it sees the necessary data further down the file.

So I don't think a really complete fix is possible without changing the entire structure of how the assembler works. But I can look into the alternative approach of trying to make the t2ADDritADDi8 transformation more discriminating, in the hope that we can at least fix cases that don't forward-refer to an equ.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I also think this test for :lower0_7: and friends is done in the wrong place, because it prevents even a fixup being generated, whereas it should only worry about relocations. Fixups relative to symbols defined later in the same source file, so they can be sorted out by the time assembly finishes, could be treated more leniently. But again that involves a big rework that I didn't want to block fixing this one bug.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, here's the revised version which does it that way, and also has your expanded tests.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The important things here are:

  • We don't crash
  • If an instruction is encodable in Thumb1 mode, it also has the same encoding in Thumb2 mode.

If we have to reject certain cases because they would introduce some ambiguity, that's fine, especially if there's some easy way for the user to rewrite their code.

Given that, current version seems okay. Maybe we should do relaxation as part of processing fixups, but that's obviously a much bigger change.

I've reverted the part of the previous commit that introduced and used
`revalidateInstruction`, replacing it with a more careful check in
`processInstruction` of whether to transform `t2ADDri` → `tADDi8` in
the first place.

I've kept the index fixes in the existing `validateInstruction`.

Also expanded the testing considerably, including making two test
files to deal with the fact that some of these unemittable
instructions aren't detected until a later phase of assembly, which
won't be reached if there are any errors in the first phase.
@github-actions
Copy link

github-actions bot commented Nov 8, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Collaborator

@efriedma-quic efriedma-quic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@statham-arm statham-arm merged commit d97f17a into llvm:main Nov 14, 2024
8 checks passed
@statham-arm statham-arm deleted the lower0_7-error-fix branch November 14, 2024 09:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend:ARM llvm:mc Machine (object) code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants