Skip to content

Conversation

@MaskRay
Copy link
Member

@MaskRay MaskRay commented Jul 10, 2025

'$' indicates immediate values in AT&T syntax, so symbol names starting
with '$' need to be quoted or wrapped in parentheses. Parentheses are
preferred to support expressions with relocation specifiers without
modifying MCExpr internals, aligning with GCC (https://gcc.gnu.org/PR91298).

Add -output-asm-variant to llc for testing Intel syntax, avoiding
-x86-asm-syntax which affects MCAsmInfo used by input assembly
(-x86-asm-syntax=intel doesn't work with AT&T module asm)

Note: In these positions the symbol name cannot be quoted: $var:
.globl $var .type $var, @object

Close #147587

Created using spr 1.3.5-bogner
@llvmbot
Copy link
Member

llvmbot commented Jul 10, 2025

@llvm/pr-subscribers-backend-x86

Author: Fangrui Song (MaskRay)

Changes

'$' indicates immediate values in AT&T syntax, so symbol names starting
with '$' need to be quoted or wrapped in parentheses. Parentheses are
preferred to support expressions with relocation specifiers without
modifying MCExpr internals, aligning with GCC (https://gcc.gnu.org/PR91298).

Add -output-asm-variant to llc for testing Intel syntax, avoiding
-x86-asm-syntax which affects MCAsmInfo used by input assembly.

Note: In these positions the symbol name cannot be quoted: $var:
.globl $var .type $var, @<!-- -->object

Close #147587


Full diff: https://github.com/llvm/llvm-project/pull/147876.diff

6 Files Affected:

  • (modified) llvm/lib/Target/X86/MCTargetDesc/X86ATTInstPrinter.cpp (+18-2)
  • (modified) llvm/lib/Target/X86/MCTargetDesc/X86ATTInstPrinter.h (+1)
  • (modified) llvm/lib/Target/X86/MCTargetDesc/X86InstPrinterCommon.cpp (+5-1)
  • (modified) llvm/lib/Target/X86/MCTargetDesc/X86InstPrinterCommon.h (+2)
  • (modified) llvm/test/CodeGen/X86/dollar-name.ll (+70-10)
  • (modified) llvm/tools/llc/llc.cpp (+6)
diff --git a/llvm/lib/Target/X86/MCTargetDesc/X86ATTInstPrinter.cpp b/llvm/lib/Target/X86/MCTargetDesc/X86ATTInstPrinter.cpp
index 6614eea3901bd..564636959f00f 100644
--- a/llvm/lib/Target/X86/MCTargetDesc/X86ATTInstPrinter.cpp
+++ b/llvm/lib/Target/X86/MCTargetDesc/X86ATTInstPrinter.cpp
@@ -14,6 +14,7 @@
 #include "X86ATTInstPrinter.h"
 #include "X86BaseInfo.h"
 #include "X86InstComments.h"
+#include "llvm/ADT/SmallString.h"
 #include "llvm/MC/MCAsmInfo.h"
 #include "llvm/MC/MCExpr.h"
 #include "llvm/MC/MCInst.h"
@@ -35,6 +36,21 @@ using namespace llvm;
 #define PRINT_ALIAS_INSTR
 #include "X86GenAsmWriter.inc"
 
+// Print an MCExpr as an operand. Similar to GCC, wrap the output in parentheses
+// if it begins with '$', as '$' in an operand position indicates an immediate
+// value in the AT&T syntax.
+void X86ATTInstPrinter::printExprOperand(raw_ostream &OS, const MCExpr &E) {
+  SmallString<128> S;
+  {
+    raw_svector_ostream SOS(S);
+    MAI.printExpr(SOS, E);
+  }
+  if (S.starts_with("$"))
+    OS << '(' << S << ')';
+  else
+    OS << S;
+}
+
 void X86ATTInstPrinter::printRegName(raw_ostream &OS, MCRegister Reg) {
   markup(OS, Markup::Register) << '%' << getRegisterName(Reg);
 }
@@ -446,7 +462,7 @@ void X86ATTInstPrinter::printMemReference(const MCInst *MI, unsigned Op,
       O << formatImm(DispVal);
   } else {
     assert(DispSpec.isExpr() && "non-immediate displacement for LEA?");
-    MAI.printExpr(O, *DispSpec.getExpr());
+    printExprOperand(O, *DispSpec.getExpr());
   }
 
   if (IndexReg.getReg() || BaseReg.getReg()) {
@@ -501,7 +517,7 @@ void X86ATTInstPrinter::printMemOffset(const MCInst *MI, unsigned Op,
     O << formatImm(DispSpec.getImm());
   } else {
     assert(DispSpec.isExpr() && "non-immediate displacement?");
-    MAI.printExpr(O, *DispSpec.getExpr());
+    printExprOperand(O, *DispSpec.getExpr());
   }
 }
 
diff --git a/llvm/lib/Target/X86/MCTargetDesc/X86ATTInstPrinter.h b/llvm/lib/Target/X86/MCTargetDesc/X86ATTInstPrinter.h
index f49f09c5dcf3e..1452622ebcea8 100644
--- a/llvm/lib/Target/X86/MCTargetDesc/X86ATTInstPrinter.h
+++ b/llvm/lib/Target/X86/MCTargetDesc/X86ATTInstPrinter.h
@@ -23,6 +23,7 @@ class X86ATTInstPrinter final : public X86InstPrinterCommon {
                     const MCRegisterInfo &MRI)
       : X86InstPrinterCommon(MAI, MII, MRI), HasCustomInstComment(false) {}
 
+  void printExprOperand(raw_ostream &OS, const MCExpr &E) override;
   void printRegName(raw_ostream &OS, MCRegister Reg) override;
   void printInst(const MCInst *MI, uint64_t Address, StringRef Annot,
                  const MCSubtargetInfo &STI, raw_ostream &OS) override;
diff --git a/llvm/lib/Target/X86/MCTargetDesc/X86InstPrinterCommon.cpp b/llvm/lib/Target/X86/MCTargetDesc/X86InstPrinterCommon.cpp
index 7523d2aedcced..1c5f1663d4f52 100644
--- a/llvm/lib/Target/X86/MCTargetDesc/X86InstPrinterCommon.cpp
+++ b/llvm/lib/Target/X86/MCTargetDesc/X86InstPrinterCommon.cpp
@@ -26,6 +26,10 @@
 
 using namespace llvm;
 
+void X86InstPrinterCommon::printExprOperand(raw_ostream &OS, const MCExpr &E) {
+  MAI.printExpr(OS, E);
+}
+
 void X86InstPrinterCommon::printCondCode(const MCInst *MI, unsigned Op,
                                          raw_ostream &O) {
   int64_t Imm = MI->getOperand(Op).getImm();
@@ -374,7 +378,7 @@ void X86InstPrinterCommon::printPCRelImm(const MCInst *MI, uint64_t Address,
       markup(O, Markup::Immediate) << formatHex((uint64_t)Address);
     } else {
       // Otherwise, just print the expression.
-      MAI.printExpr(O, *Op.getExpr());
+      printExprOperand(O, *Op.getExpr());
     }
   }
 }
diff --git a/llvm/lib/Target/X86/MCTargetDesc/X86InstPrinterCommon.h b/llvm/lib/Target/X86/MCTargetDesc/X86InstPrinterCommon.h
index 2a7b750bd6752..2c9467ca7c615 100644
--- a/llvm/lib/Target/X86/MCTargetDesc/X86InstPrinterCommon.h
+++ b/llvm/lib/Target/X86/MCTargetDesc/X86InstPrinterCommon.h
@@ -17,11 +17,13 @@
 #include "llvm/MC/MCInstPrinter.h"
 
 namespace llvm {
+class MCExpr;
 
 class X86InstPrinterCommon : public MCInstPrinter {
 public:
   using MCInstPrinter::MCInstPrinter;
 
+  virtual void printExprOperand(raw_ostream &OS, const MCExpr &E);
   virtual void printOperand(const MCInst *MI, unsigned OpNo, raw_ostream &O) = 0;
   void printCondCode(const MCInst *MI, unsigned Op, raw_ostream &OS);
   void printCondFlags(const MCInst *MI, unsigned Op, raw_ostream &OS);
diff --git a/llvm/test/CodeGen/X86/dollar-name.ll b/llvm/test/CodeGen/X86/dollar-name.ll
index fc9d6a77f66e5..ffa03fc1f6f9a 100644
--- a/llvm/test/CodeGen/X86/dollar-name.ll
+++ b/llvm/test/CodeGen/X86/dollar-name.ll
@@ -1,18 +1,78 @@
-; RUN: llc < %s -mtriple=i386-linux | FileCheck %s
-; PR1339
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple=i686 -relocation-model=static | FileCheck %s --check-prefix=STATIC
+; RUN: llc < %s -mtriple=x86_64 -relocation-model=pic | FileCheck %s --check-prefix=PIC
+; RUN: llc < %s -mtriple=x86_64 -relocation-model=pic -output-asm-variant=1 | FileCheck %s --check-prefix=INTEL-PIC
 
-@"$bar" = global i32 zeroinitializer
+@"$arr" = global [2 x i32] zeroinitializer
+@"$arr_h" = hidden global [2 x i32] zeroinitializer
 @"$qux" = external dso_local global i32
+@"$tls" = hidden thread_local global i32 0
 
 define i32 @"$foo"() nounwind {
-; CHECK: movl	$bar,
-; CHECK: addl	$qux,
-; CHECK: calll	$hen
-  %m = load i32, ptr @"$bar"
+; STATIC-LABEL: $foo:
+; STATIC:       # %bb.0:
+; STATIC-NEXT:    movl ($arr), %eax
+; STATIC-NEXT:    movl %gs:0, %ecx
+; STATIC-NEXT:    addl ($arr+4), %eax
+; STATIC-NEXT:    addl ($qux), %eax
+; STATIC-NEXT:    addl ($tls@NTPOFF)(%ecx), %eax
+; STATIC-NEXT:    pushl ($arr_h)
+; STATIC-NEXT:    pushl %eax
+; STATIC-NEXT:    calll ($hen@PLT)
+; STATIC-NEXT:    addl $8, %esp
+; STATIC-NEXT:    retl
+;
+; PIC-LABEL: $foo:
+; PIC:       # %bb.0:
+; PIC-NEXT:    pushq %rbp
+; PIC-NEXT:    pushq %r14
+; PIC-NEXT:    pushq %rbx
+; PIC-NEXT:    movq ($arr@GOTPCREL)(%rip), %r14
+; PIC-NEXT:    movl (%r14), %ebx
+; PIC-NEXT:    movl ($arr_h)(%rip), %ebp
+; PIC-NEXT:    leaq ($tls@TLSLD)(%rip), %rdi
+; PIC-NEXT:    callq __tls_get_addr@PLT
+; PIC-NEXT:    addl 4(%r14), %ebx
+; PIC-NEXT:    addl ($qux)(%rip), %ebx
+; PIC-NEXT:    addl ($tls@DTPOFF)(%rax), %ebx
+; PIC-NEXT:    movl %ebx, %edi
+; PIC-NEXT:    movl %ebp, %esi
+; PIC-NEXT:    callq ($hen@PLT)
+; PIC-NEXT:    popq %rbx
+; PIC-NEXT:    popq %r14
+; PIC-NEXT:    popq %rbp
+; PIC-NEXT:    retq
+;
+; INTEL-PIC-LABEL: $foo:
+; INTEL-PIC:       # %bb.0:
+; INTEL-PIC-NEXT:    push rbp
+; INTEL-PIC-NEXT:    push r14
+; INTEL-PIC-NEXT:    push rbx
+; INTEL-PIC-NEXT:    mov r14, qword ptr [rip + $arr@GOTPCREL]
+; INTEL-PIC-NEXT:    mov ebx, dword ptr [r14]
+; INTEL-PIC-NEXT:    mov ebp, dword ptr [rip + $arr_h]
+; INTEL-PIC-NEXT:    lea rdi, [rip + $tls@TLSLD]
+; INTEL-PIC-NEXT:    call __tls_get_addr@PLT
+; INTEL-PIC-NEXT:    add ebx, dword ptr [r14 + 4]
+; INTEL-PIC-NEXT:    add ebx, dword ptr [rip + $qux]
+; INTEL-PIC-NEXT:    add ebx, dword ptr [rax + $tls@DTPOFF]
+; INTEL-PIC-NEXT:    mov edi, ebx
+; INTEL-PIC-NEXT:    mov esi, ebp
+; INTEL-PIC-NEXT:    call $hen@PLT
+; INTEL-PIC-NEXT:    pop rbx
+; INTEL-PIC-NEXT:    pop r14
+; INTEL-PIC-NEXT:    pop rbp
+; INTEL-PIC-NEXT:    ret
+  %m = load i32, ptr @"$arr"
+  %m1 = load i32, ptr getelementptr inbounds nuw (i32, ptr @"$arr", i23 1)
+  %m2 = load i32, ptr @"$arr_h"
   %n = load i32, ptr @"$qux"
-  %t = add i32 %m, %n
-  %u = call i32 @"$hen"(i32 %t)
+  %tls_v = load i32, ptr @"$tls"
+  %t0 = add i32 %m, %m1
+  %t1 = add i32 %t0, %n
+  %t2 = add i32 %t1, %tls_v
+  %u = call i32 @"$hen"(i32 %t2, i32 %m2)
   ret i32 %u
 }
 
-declare i32 @"$hen"(i32 %a)
+declare i32 @"$hen"(i32 %a, i32 %b)
diff --git a/llvm/tools/llc/llc.cpp b/llvm/tools/llc/llc.cpp
index 140459ba2de21..93b4a50ef86d7 100644
--- a/llvm/tools/llc/llc.cpp
+++ b/llvm/tools/llc/llc.cpp
@@ -143,6 +143,10 @@ static cl::opt<bool>
 static cl::opt<bool> ShowMCEncoding("show-mc-encoding", cl::Hidden,
                                     cl::desc("Show encoding in .s output"));
 
+static cl::opt<unsigned>
+    OutputAsmVariant("output-asm-variant",
+                     cl::desc("Syntax variant to use for output printing"));
+
 static cl::opt<bool>
     DwarfDirectory("dwarf-directory", cl::Hidden,
                    cl::desc("Use .file directives with an explicit directory"),
@@ -500,6 +504,8 @@ static int compileModule(char **argv, LLVMContext &Context) {
     Options.MCOptions.ShowMCEncoding = ShowMCEncoding;
     Options.MCOptions.AsmVerbose = AsmVerbose;
     Options.MCOptions.PreserveAsmComments = PreserveComments;
+    if (OutputAsmVariant.getNumOccurrences())
+      Options.MCOptions.OutputAsmVariant = OutputAsmVariant;
     Options.MCOptions.IASSearchPaths = IncludeDirs;
     Options.MCOptions.InstPrinterOptions = InstPrinterOptions;
     Options.MCOptions.SplitDwarfFile = SplitDwarfFile;

@MaskRay
Copy link
Member Author

MaskRay commented Jul 10, 2025

@lhmouse

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
; RUN: llc < %s -mtriple=i686 -relocation-model=static | FileCheck %s --check-prefix=STATIC
; RUN: llc < %s -mtriple=x86_64 -relocation-model=pic | FileCheck %s --check-prefix=PIC
; RUN: llc < %s -mtriple=x86_64 -relocation-model=pic -output-asm-variant=1 | FileCheck %s --check-prefix=INTEL-PIC
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use --x86-asm-syntax=intel instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I saw the reason in description now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--x86-asm-syntax=intel modifies X86MCAsmInfo, affecting both assembly input and output. I think we should migrate the assembly output part to -output-asm-variant #109360

Copy link
Contributor

@phoebewang phoebewang Jul 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we don't have inline assemble in the test. Does it matter here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added module asm "mov ($foo), %eax" to dollar-name-asm.ll. It demonstrates that -x86-asm-syntax=intel cannot be used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You didn't add that file :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, added now.

@phoebewang
Copy link
Contributor

Add -output-asm-variant to llc for testing Intel syntax, avoiding
-x86-asm-syntax which affects MCAsmInfo used by input assembly.

I think that doesn't matter since we only care about the $ before the symbol?

@MaskRay
Copy link
Member Author

MaskRay commented Jul 10, 2025

Add -output-asm-variant to llc for testing Intel syntax, avoiding
-x86-asm-syntax which affects MCAsmInfo used by input assembly.

I think that doesn't matter since we only care about the $ before the symbol?

In the Intel syntax, $ doesn't have the immediate value semantics, so $ doesn't have to be special cased.
This is really a quirk of the AT&T (aka GNU) syntax... While a few other targets also inspect $ in their AsmParser, they don't have problems using $var even without a patch like this..

@lhmouse
Copy link
Contributor

lhmouse commented Jul 10, 2025

In your commit I suspect this will not be correct: calll ($hen@PLT)
It should be calll ($hen)@PLT. The modifier is not to be mistaken as part of the symbol.

As said earlier the at&t syntax is just nonsense like in movl ($a)+16(%rip), %eax. Only the symbol itself should be quoted.

@MaskRay
Copy link
Member Author

MaskRay commented Jul 10, 2025

In your commit I suspect this will not be correct: calll ($hen@PLT) It should be calll ($hen)@PLT. The modifier is not to be mistaken as part of the symbol.

Both calll ($hen@PLT) and calll ($hen)@PLT are assembled correctly in GAS and LLVM integrated assembler.
This patch prints the first form as it minimizes changes to the generic MCAsmInfo::printExpr MCExpr::print.

As said earlier the at&t syntax is just nonsense like in movl ($a)+16(%rip), %eax. Only the symbol itself should be quoted.

Both movl ($a)+16(%rip), %eax and movl ($a+16)(%rip), %eax are acceptable.

In MC, $a+16 is encoded as:

MCBinaryExpr +
  MCSymbolRefExpr $a
  MCConstantExpr 16

To handle this and deeper nesting, the PR uses a straightforward approach that checks if the output begins with $, avoiding the complexity of examining the outermost MCExpr's structure (MCSymbolRefExpr, MCBinaryExpr, or MCSpecifierExpr).

@lhmouse
Copy link
Contributor

lhmouse commented Jul 10, 2025

In your commit I suspect this will not be correct: calll ($hen@PLT) It should be calll ($hen)@PLT. The modifier is not to be mistaken as part of the symbol.

Both calll ($hen@PLT) and calll ($hen)@PLT are assembled correctly in GAS and LLVM integrated assembler. This patch prints the first form as it minimizes changes to the generic MCAsmInfo::printExpr MCExpr::print.

Have you tried the FreeBSD assembler, or the Solaris one?

@MaskRay
Copy link
Member Author

MaskRay commented Jul 10, 2025

In your commit I suspect this will not be correct: calll ($hen@PLT) It should be calll ($hen)@PLT. The modifier is not to be mistaken as part of the symbol.

Both calll ($hen@PLT) and calll ($hen)@PLT are assembled correctly in GAS and LLVM integrated assembler. This patch prints the first form as it minimizes changes to the generic MCAsmInfo::printExpr MCExpr::print.

Have you tried the FreeBSD assembler, or the Solaris one?

No. I vaguely recall that older versions of FreeBSD used an ancient GPL 2 binutils. I believe this $ parsing has been stable in binutils for a long time. LLVM integrated assembler doesn't intend to be compatible with Solaris.
At any rate, this is about a special case $-leading symbol.

Created using spr 1.3.5-bogner
@MaskRay MaskRay requested a review from phoebewang July 11, 2025 06:44
Created using spr 1.3.5-bogner
Copy link
Contributor

@phoebewang phoebewang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@MaskRay MaskRay merged commit 1d57587 into main Jul 12, 2025
8 of 9 checks passed
@MaskRay MaskRay deleted the users/MaskRay/spr/x86-support-dollar-symbol-for-att-syntax-and-add-output-asm-variant-to-llc branch July 12, 2025 03:51
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jul 12, 2025
…asm-variant to llc

'$' indicates immediate values in AT&T syntax, so symbol names starting
with '$' need to be quoted or wrapped in parentheses. Parentheses are
preferred to support expressions with relocation specifiers without
modifying MCExpr internals, aligning with GCC (https://gcc.gnu.org/PR91298).

Add `-output-asm-variant` to llc for testing Intel syntax, avoiding
`-x86-asm-syntax` which affects MCAsmInfo used by input assembly
(-x86-asm-syntax=intel doesn't work with AT&T module asm)

Note: In these positions the symbol name cannot be quoted: `$var:`
`.globl $var` `.type $var, @object`

Close #147587

Pull Request: llvm/llvm-project#147876
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

LLVM 21 X86 backend produces invalid asm syntax for $ dollar symbol names

5 participants