Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
74 changes: 65 additions & 9 deletions llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -659,6 +659,8 @@ class ARMAsmParser : public MCTargetAsmParser {

bool validateInstruction(MCInst &Inst, const OperandVector &Ops,
unsigned MnemonicOpsEndInd);
bool revalidateInstruction(MCInst &Inst, const OperandVector &Ops,
unsigned MnemonicOpsEndInd, unsigned OrigOpcode);
bool processInstruction(MCInst &Inst, const OperandVector &Ops,
unsigned MnemonicOpsEndInd, MCStreamer &Out);
bool shouldOmitVectorPredicateOperand(StringRef Mnemonic,
Expand Down Expand Up @@ -8294,15 +8296,17 @@ bool ARMAsmParser::validateInstruction(MCInst &Inst,
break;
}
case ARM::tADDi8: {
MCParsedAsmOperand &Op = *Operands[MnemonicOpsEndInd + 1];
int i = (Operands[MnemonicOpsEndInd + 1]->isImm()) ? MnemonicOpsEndInd + 1
: MnemonicOpsEndInd + 2;
MCParsedAsmOperand &Op = *Operands[i];
if (isARMMCExpr(Op) && !isThumbI8Relocation(Op))
return Error(Op.getStartLoc(),
"Immediate expression for Thumb adds requires :lower0_7:,"
" :lower8_15:, :upper0_7: or :upper8_15:");
break;
}
case ARM::tMOVi8: {
MCParsedAsmOperand &Op = *Operands[MnemonicOpsEndInd];
MCParsedAsmOperand &Op = *Operands[MnemonicOpsEndInd + 1];
if (isARMMCExpr(Op) && !isThumbI8Relocation(Op))
return Error(Op.getStartLoc(),
"Immediate expression for Thumb movs requires :lower0_7:,"
Expand Down Expand Up @@ -8653,6 +8657,41 @@ bool ARMAsmParser::validateInstruction(MCInst &Inst,
return false;
}

// After processInstruction has transformed an instruction being assembled into
// a different opcode, do any further validity checks that the new opcode
// depends on.
//
// `Inst` contains the final modified form of the instruction, but `Operands`
// contains the parsed operands from the _original_ instruction, because
// nothing has updated them (`processInstruction` received them as const).
// `OrigOpcode` contains the original value of `Inst.getOpcode()`, which should
// give enough context to know how to understand the original operands list.
bool ARMAsmParser::revalidateInstruction(MCInst &Inst,
const OperandVector &Operands,
unsigned MnemonicOpsEndInd,
unsigned OrigOpcode) {
const unsigned NewOpcode = Inst.getOpcode();

if (OrigOpcode == ARM::t2ADDri && NewOpcode == ARM::tADDi8) {
// t2ADDri is the Thumb 32-bit immediate add instruction, for example
// 'add[s] r0,r1,#0xff00'. If its immediate argument isn't a constant
// requiring shifting, then processInstruction can turn it into tADDi8, the
// simpler 16-bit Thumb immediate add (provided all the other conditions
// for that transformation are met). That makes it too late for
// validateInstruction to do this check, which it would have done if it had
// known from the start that the instruction was tADDi8.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the handling for t2ADDri just skip transforming the instruction if we know the end result isn't a valid Thumb1 instruction? Seems weird to transform to a Thumb1 instruction we know won't work, then produce an error referencing the Thumb1 instruction instead of the original Thumb2 instruction.

Some additional testcases:

add r9, r0, #foo
movs r0, :lower16:#foo
movs r11, :upper8_15:#foo

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not quite as weird, because the original instruction was only Thumb2 because of LLVM's decision – the user asked for something much more general-purpose. It doesn't seem totally unreasonable to me that "adds r0, #extern_symbol" should give the same error whether or not Thumb2 is available. It looks like the same instruction, after all!

I did wonder about trying to do a more complete fix. The real problem case would be something like this (targeting v7-M or v8-M Mainline):

adds r0, r0, #foo
adds r1, r1, #bar
.equ foo, :lower0_7:some_extern_symbol
.equ bar, 0xff00

If this were ever to assemble successfully, it would have to be done by making the adds to r0 tADDi8 which can accept the necessary relocation, but the one to r1 t2ADDri which can accept the shifted immediate. But the assembler must commit to each instruction's opcode before it sees the necessary data further down the file.

So I don't think a really complete fix is possible without changing the entire structure of how the assembler works. But I can look into the alternative approach of trying to make the t2ADDritADDi8 transformation more discriminating, in the hope that we can at least fix cases that don't forward-refer to an equ.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I also think this test for :lower0_7: and friends is done in the wrong place, because it prevents even a fixup being generated, whereas it should only worry about relocations. Fixups relative to symbols defined later in the same source file, so they can be sorted out by the time assembly finishes, could be treated more leniently. But again that involves a big rework that I didn't want to block fixing this one bug.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, here's the revised version which does it that way, and also has your expanded tests.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The important things here are:

  • We don't crash
  • If an instruction is encodable in Thumb1 mode, it also has the same encoding in Thumb2 mode.

If we have to reject certain cases because they would introduce some ambiguity, that's fine, especially if there's some easy way for the user to rewrite their code.

Given that, current version seems okay. Maybe we should do relaxation as part of processing fixups, but that's obviously a much bigger change.

int i = (Operands[MnemonicOpsEndInd + 1]->isImm()) ? MnemonicOpsEndInd + 1
: MnemonicOpsEndInd + 2;
MCParsedAsmOperand &Op = *Operands[i];
if (isARMMCExpr(Op) && !isThumbI8Relocation(Op))
return Error(Op.getStartLoc(),
"Immediate expression for Thumb adds requires :lower0_7:,"
" :lower8_15:, :upper0_7: or :upper8_15:");
}

return false;
}

static unsigned getRealVSTOpcode(unsigned Opc, unsigned &Spacing) {
switch(Opc) {
default: llvm_unreachable("unexpected opcode!");
Expand Down Expand Up @@ -11399,7 +11438,7 @@ bool ARMAsmParser::matchAndEmitInstruction(SMLoc IDLoc, unsigned &Opcode,
unsigned MnemonicOpsEndInd = getMnemonicOpsEndInd(Operands);

switch (MatchResult) {
case Match_Success:
case Match_Success: {
LLVM_DEBUG(dbgs() << "Parsed as: ";
Inst.dump_pretty(dbgs(), MII.getName(Inst.getOpcode()));
dbgs() << "\n");
Expand All @@ -11414,15 +11453,31 @@ bool ARMAsmParser::matchAndEmitInstruction(SMLoc IDLoc, unsigned &Opcode,
return true;
}

{
// Some instructions need post-processing to, for example, tweak which
// encoding is selected. Loop on it while changes happen so the
// individual transformations can chain off each other. E.g.,
// tPOP(r8)->t2LDMIA_UPD(sp,r8)->t2STR_POST(sp,r8)
while (processInstruction(Inst, Operands, MnemonicOpsEndInd, Out))
// Some instructions need post-processing to, for example, tweak which
// encoding is selected. Loop on it while changes happen so the individual
// transformations can chain off each other. E.g.,
// tPOP(r8)->t2LDMIA_UPD(sp,r8)->t2STR_POST(sp,r8)
//
// This is written as a do-while inside an if, instead of the more obvious
// while loop, so that after postprocessing has completed the revised
// instruction can be revalidated, but (to save time) only if any changes
// had to be made at all.
unsigned OrigOpcode = Inst.getOpcode();
if (processInstruction(Inst, Operands, MnemonicOpsEndInd, Out)) {
do {
LLVM_DEBUG(dbgs() << "Changed to: ";
Inst.dump_pretty(dbgs(), MII.getName(Inst.getOpcode()));
dbgs() << "\n");
} while (processInstruction(Inst, Operands, MnemonicOpsEndInd, Out));

MnemonicOpsEndInd = getMnemonicOpsEndInd(Operands);
if (revalidateInstruction(Inst, Operands, MnemonicOpsEndInd,
OrigOpcode)) {
// As above, advance IT/VPT positions if we're exiting early.
forwardITPosition();
forwardVPTPosition();
return true;
}
}

// Only move forward at the very end so that everything in validate
Expand All @@ -11445,6 +11500,7 @@ bool ARMAsmParser::matchAndEmitInstruction(SMLoc IDLoc, unsigned &Opcode,
Out.emitInstruction(Inst, getSTI());
}
return false;
}
case Match_NearMisses:
ReportNearMisses(NearMisses, IDLoc, Operands);
return true;
Expand Down
20 changes: 20 additions & 0 deletions llvm/test/MC/ARM/lower-upper-errors.s
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
// RUN: not llvm-mc --triple thumbv6m -show-encoding %s 2>&1 | FileCheck %s
// RUN: not llvm-mc --triple thumbv7m -show-encoding %s 2>&1 | FileCheck %s --check-prefixes=CHECK,THUMB2

// Check reporting of errors of the form "you should have used
// :lower16: in this immediate field".

// CHECK: :[[@LINE+1]]:10: error: Immediate expression for Thumb movs requires :lower0_7:, :lower8_15:, :upper0_7: or :upper8_15:
movs r0, #foo

// CHECK: :[[@LINE+1]]:10: error: Immediate expression for Thumb adds requires :lower0_7:, :lower8_15:, :upper0_7: or :upper8_15:
adds r0, #foo

// CHECK: :[[@LINE+1]]:14: error: Immediate expression for Thumb adds requires :lower0_7:, :lower8_15:, :upper0_7: or :upper8_15:
adds r0, r0, #foo

// THUMB2: :[[@LINE+1]]:10: error: immediate expression for mov requires :lower16: or :upper16
movw r0, #foo

// THUMB2: :[[@LINE+1]]:10: error: immediate expression for mov requires :lower16: or :upper16
movt r0, #foo
Loading