Skip to content

Conversation

ssijaric-nv
Copy link
Contributor

@ssijaric-nv ssijaric-nv commented Oct 1, 2025

For the medium and large code models on x86-64, the bss is split into two parts,
.bss for regular sized data, and .lbss for data larger than that specified by
-large-data-threshold (for the medium code model). This change tries to mimick
what gcc does for the medium and large code models for Fortran COMMON blocks:

program test
  implicit integer  (i-n)
  implicit real*8   (a-h, o-z)
  parameter (n1=77777, n2=77777)
  common  v2(n1,n2)
  common /ccc/ v6
  v6 = 1
  v2(6777,6777) = 2
  write (*,*) v6, v2(6777, 6777)
end program test

Currently, we generate:

0000000000000008       O *COM*  0000000000000008 ccc_
0000000b44834508       O *COM*  0000000000000008 __BLNK

and ld and lld both fail to link with -mcmodel=medium.

/usr/bin/ld: failed to convert GOTPCREL relocation; relink with --no-relax

or

ld.lld: error: /usr/lib/gcc/x86_64-redhat-linux/11/crtbeginS.o:(.text+0x76): relocation R_X86_64_PC32 out of range: 48394103061 is not in [-2147483648, 2147483647]; references section '.bss'

With this change, __BLNK is marked as LARGE_COMMON (SHN_AMD64_LCOMMON),
and ends up in .lbss.

*.o:
Sections:
...
1 .lbss 00000000 0000000000000000 0000000000000000 000000fb 2**0
ALLOC
...
SYMBOL TABLE:
0000000000000008 O COM 0000000000000008 ccc_
0000000b44834508 O LARGE_COMMON 0000000000000008 BLNK

a.out:
Sections:
...
25 .bss 00000009 0000000000003cf0 0000000000003cf0 00000cf0 23
ALLOC
26 .lbss b44834508 0000000000003d00 0000000000003d00 00000cf0 2
3
ALLOC

SYMBOL TABLE:
0000000000003cf0 g O .bss 0000000000000008 ccc_
0000000000003d00 g O .lbss 0000000b44834508 BLNK

In the assembly, this will appear as:

.section .lbss,"awl",@nobits
.type BLNK,@object
.largecomm BLNK,48394093832,8
.type ccc_,@object
.comm ccc_,8,8

For the large code model, all common blocks will end up in lbss.

Previously, the above example wouldn't link statically with GNU ld.

The change also includes support for parsing the largecomm directive.
There will be a PR for the chages to lld to handle the SHN_AMD64_LCOMMON
section.

For the medium and large code models on x86-64, the bss is split into two parts,
.bss for regular sized data, and .lbss for data larger than that specified by
-large-data-threshold (for the medium code model).  This change tries to mimick
what gcc does for the medium and large code models for Fortran COMMON blocks:

program test
  implicit integer  (i-n)
  implicit real*8   (a-h, o-z)
  parameter (n1=77777, n2=77777)
  common  v2(n1,n2)
  common /ccc/ v6
  v6 = 1
  v2(6777,6777) = 2
  write (*,*) v6, v2(6777, 6777)
end program test

Currently, we generate:

0000000000000008       O *COM*  0000000000000008 ccc_
0000000b44834508       O *COM*  0000000000000008 __BLNK

and ld and lld both fail to link with -mcmodel=medium.

/usr/bin/ld: failed to convert GOTPCREL relocation; relink with --no-relax

or

ld.lld: error: /usr/lib/gcc/x86_64-redhat-linux/11/crtbeginS.o:(.text+0x76): relocation R_X86_64_PC32 out of range: 48394103061 is not in [-2147483648, 2147483647]; references section '.bss'

With this change,  __BLNK is marked as LARGE_COMMON (SHN_AMD64_LCOMMON),
and ends up in .lbss.

*.o:
Sections:
...
1 .lbss         00000000  0000000000000000  0000000000000000  000000fb  2**0
                ALLOC
...
SYMBOL TABLE:
0000000000000008       O *COM*  0000000000000008 ccc_
0000000b44834508       O LARGE_COMMON   0000000000000008 __BLNK__

a.out:
Sections:
...
25 .bss          00000009  0000000000003cf0  0000000000003cf0  00000cf0  2**3
                 ALLOC
26 .lbss         b44834508  0000000000003d00  0000000000003d00  00000cf0  2**3
                 ALLOC

SYMBOL TABLE:
0000000000003cf0 g     O .bss   0000000000000008              ccc_
0000000000003d00 g     O .lbss  0000000b44834508              __BLNK__

In the assembly, this will appear as:

.section        .lbss,"awl",@nobits
.type   __BLNK__,@object
.largecomm      __BLNK__,48394093832,8
.type   ccc_,@object
.comm   ccc_,8,8

For the large code model, all common blocks will end up in lbss.

Previously, the above example wouldn't link statically with GNU ld.

The change also includes support for parsing the largecomm directive.
There will be a PR for the chages to lld to handle the SHN_AMD64_LCOMMON
section.

// Place the large common variables in the .lbss section for the medium and
// large code models on x86_64. (AMD64 ABI, 9.2.5 COMMON blocks)
MCSection *largeCommonSec =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LargeCommonSec

return MCAsmParserExtension::parseDirectiveCGProfile(S, Loc);
}

bool ELFAsmParser::parseDirectiveLargecomm(StringRef s, SMLoc Loc) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StringRef S

@llvmbot
Copy link
Member

llvmbot commented Oct 2, 2025

@llvm/pr-subscribers-backend-x86
@llvm/pr-subscribers-llvm-binary-utilities

@llvm/pr-subscribers-llvm-mc

Author: None (ssijaric-nv)

Changes

For the medium and large code models on x86-64, the bss is split into two parts,
.bss for regular sized data, and .lbss for data larger than that specified by
-large-data-threshold (for the medium code model). This change tries to mimick
what gcc does for the medium and large code models for Fortran COMMON blocks:

program test
implicit integer (i-n)
implicit real8 (a-h, o-z)
parameter (n1=77777, n2=77777)
common v2(n1,n2)
common /ccc/ v6
v6 = 1
v2(6777,6777) = 2
write (
,*) v6, v2(6777, 6777)
end program test

Currently, we generate:

0000000000000008 O COM 0000000000000008 ccc_
0000000b44834508 O COM 0000000000000008 __BLNK

and ld and lld both fail to link with -mcmodel=medium.

/usr/bin/ld: failed to convert GOTPCREL relocation; relink with --no-relax

or

ld.lld: error: /usr/lib/gcc/x86_64-redhat-linux/11/crtbeginS.o:(.text+0x76): relocation R_X86_64_PC32 out of range: 48394103061 is not in [-2147483648, 2147483647]; references section '.bss'

With this change, __BLNK is marked as LARGE_COMMON (SHN_AMD64_LCOMMON),
and ends up in .lbss.

*.o:
Sections:
...
1 .lbss 00000000 0000000000000000 0000000000000000 000000fb 2**0
ALLOC
...
SYMBOL TABLE:
0000000000000008 O COM 0000000000000008 ccc_
0000000b44834508 O LARGE_COMMON 0000000000000008 BLNK

a.out:
Sections:
...
25 .bss 00000009 0000000000003cf0 0000000000003cf0 00000cf0 23
ALLOC
26 .lbss b44834508 0000000000003d00 0000000000003d00 00000cf0 2
3
ALLOC

SYMBOL TABLE:
0000000000003cf0 g O .bss 0000000000000008 ccc_
0000000000003d00 g O .lbss 0000000b44834508 BLNK

In the assembly, this will appear as:

.section .lbss,"awl",@nobits
.type BLNK,@object
.largecomm BLNK,48394093832,8
.type ccc_,@object
.comm ccc_,8,8

For the large code model, all common blocks will end up in lbss.

Previously, the above example wouldn't link statically with GNU ld.

The change also includes support for parsing the largecomm directive.
There will be a PR for the chages to lld to handle the SHN_AMD64_LCOMMON
section.


Full diff: https://github.com/llvm/llvm-project/pull/161483.diff

16 Files Affected:

  • (modified) llvm/include/llvm/BinaryFormat/ELF.h (+5)
  • (modified) llvm/include/llvm/MC/MCELFStreamer.h (+2)
  • (modified) llvm/include/llvm/MC/MCStreamer.h (+8)
  • (modified) llvm/include/llvm/MC/MCSymbolELF.h (+4)
  • (modified) llvm/include/llvm/Object/ELFTypes.h (+2)
  • (modified) llvm/include/llvm/Target/TargetLoweringObjectFile.h (+6)
  • (modified) llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp (+13-4)
  • (modified) llvm/lib/MC/ELFObjectWriter.cpp (+5-1)
  • (modified) llvm/lib/MC/MCAsmStreamer.cpp (+15-1)
  • (modified) llvm/lib/MC/MCELFStreamer.cpp (+17)
  • (modified) llvm/lib/MC/MCParser/ELFAsmParser.cpp (+50)
  • (modified) llvm/lib/MC/MCStreamer.cpp (+2)
  • (modified) llvm/lib/Target/X86/X86TargetObjectFile.cpp (+13)
  • (modified) llvm/lib/Target/X86/X86TargetObjectFile.h (+3)
  • (added) llvm/test/MC/X86/largecomm.ll (+58)
  • (modified) llvm/tools/llvm-readobj/ELFDumper.cpp (+5)
diff --git a/llvm/include/llvm/BinaryFormat/ELF.h b/llvm/include/llvm/BinaryFormat/ELF.h
index e619b186dfe3d..ca6ec30e1f95e 100644
--- a/llvm/include/llvm/BinaryFormat/ELF.h
+++ b/llvm/include/llvm/BinaryFormat/ELF.h
@@ -589,6 +589,11 @@ enum {
   SHN_MIPS_SUNDEFINED = 0xff04 // Undefined symbols for global data area
 };
 
+// x86-64 speciifc section index
+enum {
+  SHN_AMD64_LCOMMON = 0xff02, // Large FORTRAN COMMON variables
+};
+
 // ELF Relocation types for Mips
 enum {
 #include "ELFRelocs/Mips.def"
diff --git a/llvm/include/llvm/MC/MCELFStreamer.h b/llvm/include/llvm/MC/MCELFStreamer.h
index 144f6bc3bd91c..1b6e4273b1e28 100644
--- a/llvm/include/llvm/MC/MCELFStreamer.h
+++ b/llvm/include/llvm/MC/MCELFStreamer.h
@@ -56,6 +56,8 @@ class MCELFStreamer : public MCObjectStreamer {
   bool emitSymbolAttribute(MCSymbol *Symbol, MCSymbolAttr Attribute) override;
   void emitCommonSymbol(MCSymbol *Symbol, uint64_t Size,
                         Align ByteAlignment) override;
+  void emitLargeCommonSymbol(MCSymbol *Symbol, uint64_t Size,
+                             Align ByteAlignment) override;
 
   void emitELFSize(MCSymbol *Symbol, const MCExpr *Value) override;
   void emitELFSymverDirective(const MCSymbol *OriginalSym, StringRef Name,
diff --git a/llvm/include/llvm/MC/MCStreamer.h b/llvm/include/llvm/MC/MCStreamer.h
index 79c715e3820a6..194cc799d5650 100644
--- a/llvm/include/llvm/MC/MCStreamer.h
+++ b/llvm/include/llvm/MC/MCStreamer.h
@@ -677,6 +677,14 @@ class LLVM_ABI MCStreamer {
   virtual void emitCommonSymbol(MCSymbol *Symbol, uint64_t Size,
                                 Align ByteAlignment) = 0;
 
+  /// Emit a large common (.largecomm) symbol.
+  ///
+  /// \param Symbol - The common symbol to emit.
+  /// \param Size - The size of the common symbol.
+  /// \param ByteAlignment - The alignment of the common symbol in bytes.
+  virtual void emitLargeCommonSymbol(MCSymbol *Symbol, uint64_t Size,
+                                     Align ByteAlignment);
+
   /// Emit a local common (.lcomm) symbol.
   ///
   /// \param Symbol - The common symbol to emit.
diff --git a/llvm/include/llvm/MC/MCSymbolELF.h b/llvm/include/llvm/MC/MCSymbolELF.h
index 1af175a18e8ec..e2377eff688c6 100644
--- a/llvm/include/llvm/MC/MCSymbolELF.h
+++ b/llvm/include/llvm/MC/MCSymbolELF.h
@@ -17,6 +17,7 @@ class MCSymbolELF : public MCSymbol {
   /// An expression describing how to calculate the size of a symbol. If a
   /// symbol has no size this field will be NULL.
   const MCExpr *SymbolSize = nullptr;
+  bool IsLargeCommon = false;
 
 public:
   MCSymbolELF(const MCSymbolTableEntry *Name, bool isTemporary)
@@ -48,6 +49,9 @@ class MCSymbolELF : public MCSymbol {
   void setMemtag(bool Tagged);
   bool isMemtag() const;
 
+  bool isLargeCommon() const { return IsLargeCommon; }
+  void setIsLargeCommon(bool V) { IsLargeCommon = V; }
+
 private:
   void setIsBindingSet() const;
 };
diff --git a/llvm/include/llvm/Object/ELFTypes.h b/llvm/include/llvm/Object/ELFTypes.h
index 5a26e2fc31458..ed4695dccb6a8 100644
--- a/llvm/include/llvm/Object/ELFTypes.h
+++ b/llvm/include/llvm/Object/ELFTypes.h
@@ -252,6 +252,8 @@ struct Elf_Sym_Impl : Elf_Sym_Base<ELFT> {
     return getType() == ELF::STT_COMMON || st_shndx == ELF::SHN_COMMON;
   }
 
+  bool isLargeCommon() const { return st_shndx == ELF::SHN_AMD64_LCOMMON; }
+
   bool isDefined() const { return !isUndefined(); }
 
   bool isProcessorSpecific() const {
diff --git a/llvm/include/llvm/Target/TargetLoweringObjectFile.h b/llvm/include/llvm/Target/TargetLoweringObjectFile.h
index 4d6cbc5540131..d1c16a3f4578f 100644
--- a/llvm/include/llvm/Target/TargetLoweringObjectFile.h
+++ b/llvm/include/llvm/Target/TargetLoweringObjectFile.h
@@ -305,6 +305,12 @@ class LLVM_ABI TargetLoweringObjectFile : public MCObjectFileInfo {
     return nullptr;
   }
 
+  virtual MCSection *LargeSectionForCommon(const GlobalObject *GO,
+                                           SectionKind Kind,
+                                           const TargetMachine &TM) const {
+    return nullptr;
+  }
+
 protected:
   virtual MCSection *SelectSectionForGlobal(const GlobalObject *GO,
                                             SectionKind Kind,
diff --git a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
index 701a6a2f0f7a0..9240e18e12443 100644
--- a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
@@ -782,11 +782,18 @@ void AsmPrinter::emitGlobalVariable(const GlobalVariable *GV) {
     OutContext.reportError(SMLoc(), "symbol '" + Twine(GVSym->getName()) +
                                         "' is already defined");
 
+  SectionKind GVKind = TargetLoweringObjectFile::getKindForGlobal(GV, TM);
+
+  // Place the large common variables in the .lbss section for the medium and
+  // large code models on x86_64. (AMD64 ABI, 9.2.5 COMMON blocks)
+  MCSection *LargeCommonSec =
+      getObjFileLowering().LargeSectionForCommon(GV, GVKind, TM);
+  if (LargeCommonSec)
+    OutStreamer->switchSection(LargeCommonSec);
+
   if (MAI->hasDotTypeDotSizeDirective())
     OutStreamer->emitSymbolAttribute(EmittedSym, MCSA_ELF_TypeObject);
 
-  SectionKind GVKind = TargetLoweringObjectFile::getKindForGlobal(GV, TM);
-
   const DataLayout &DL = GV->getDataLayout();
   uint64_t Size = DL.getTypeAllocSize(GV->getValueType());
 
@@ -802,10 +809,12 @@ void AsmPrinter::emitGlobalVariable(const GlobalVariable *GV) {
   if (GVKind.isCommon()) {
     if (Size == 0) Size = 1;   // .comm Foo, 0 is undefined, avoid it.
     // .comm _foo, 42, 4
-    OutStreamer->emitCommonSymbol(GVSym, Size, Alignment);
+    if (LargeCommonSec)
+      OutStreamer->emitLargeCommonSymbol(GVSym, Size, Alignment);
+    else
+      OutStreamer->emitCommonSymbol(GVSym, Size, Alignment);
     return;
   }
-
   // Determine to which section this global should be emitted.
   MCSection *TheSection = getObjFileLowering().SectionForGlobal(GV, GVKind, TM);
 
diff --git a/llvm/lib/MC/ELFObjectWriter.cpp b/llvm/lib/MC/ELFObjectWriter.cpp
index 759d3e0e14291..aa19e92ad74d3 100644
--- a/llvm/lib/MC/ELFObjectWriter.cpp
+++ b/llvm/lib/MC/ELFObjectWriter.cpp
@@ -27,6 +27,7 @@
 #include "llvm/MC/MCELFObjectWriter.h"
 #include "llvm/MC/MCExpr.h"
 #include "llvm/MC/MCFixup.h"
+#include "llvm/MC/MCObjectFileInfo.h"
 #include "llvm/MC/MCObjectWriter.h"
 #include "llvm/MC/MCSection.h"
 #include "llvm/MC/MCSectionELF.h"
@@ -544,7 +545,10 @@ void ELFWriter::computeSymbolTable(const RevGroupMapTy &RevGroupMap) {
       auto Shndx = Symbol.getIndex();
       if (!Shndx) {
         assert(!Local);
-        Shndx = ELF::SHN_COMMON;
+        if (Symbol.isLargeCommon())
+          Shndx = ELF::SHN_AMD64_LCOMMON;
+        else
+          Shndx = ELF::SHN_COMMON;
       }
       MSD.SectionIndex = Shndx;
     } else if (Symbol.isUndefined()) {
diff --git a/llvm/lib/MC/MCAsmStreamer.cpp b/llvm/lib/MC/MCAsmStreamer.cpp
index be8c022f39ad1..903ddcc8cdd39 100644
--- a/llvm/lib/MC/MCAsmStreamer.cpp
+++ b/llvm/lib/MC/MCAsmStreamer.cpp
@@ -228,7 +228,8 @@ class MCAsmStreamer final : public MCStreamer {
   void emitELFSize(MCSymbol *Symbol, const MCExpr *Value) override;
   void emitCommonSymbol(MCSymbol *Symbol, uint64_t Size,
                         Align ByteAlignment) override;
-
+  void emitLargeCommonSymbol(MCSymbol *Symbol, uint64_t Size,
+                             Align ByteAlignment) override;
   /// Emit a local common (.lcomm) symbol.
   ///
   /// @param Symbol - The common symbol to emit.
@@ -1075,6 +1076,19 @@ void MCAsmStreamer::emitCommonSymbol(MCSymbol *Symbol, uint64_t Size,
   }
 }
 
+void MCAsmStreamer::emitLargeCommonSymbol(MCSymbol *Symbol, uint64_t Size,
+                                          Align ByteAlignment) {
+  OS << "\t.largecomm\t";
+  Symbol->print(OS, MAI);
+  OS << ',' << Size;
+
+  if (MAI->getCOMMDirectiveAlignmentIsInBytes())
+    OS << ',' << ByteAlignment.value();
+  else
+    OS << ',' << Log2(ByteAlignment);
+  EmitEOL();
+}
+
 void MCAsmStreamer::emitLocalCommonSymbol(MCSymbol *Symbol, uint64_t Size,
                                           Align ByteAlign) {
   OS << "\t.lcomm\t";
diff --git a/llvm/lib/MC/MCELFStreamer.cpp b/llvm/lib/MC/MCELFStreamer.cpp
index 2881d7cfab4ba..8314fe4d2f14c 100644
--- a/llvm/lib/MC/MCELFStreamer.cpp
+++ b/llvm/lib/MC/MCELFStreamer.cpp
@@ -244,6 +244,23 @@ bool MCELFStreamer::emitSymbolAttribute(MCSymbol *S, MCSymbolAttr Attribute) {
   return true;
 }
 
+void MCELFStreamer::emitLargeCommonSymbol(MCSymbol *S, uint64_t Size,
+                                          Align ByteAlignment) {
+  auto *Symbol = static_cast<MCSymbolELF *>(S);
+  getAssembler().registerSymbol(*Symbol);
+
+  if (!Symbol->isBindingSet())
+    Symbol->setBinding(ELF::STB_GLOBAL);
+
+  Symbol->setType(ELF::STT_OBJECT);
+
+  if (Symbol->declareCommon(Size, ByteAlignment))
+    report_fatal_error(Twine("Symbol: ") + Symbol->getName() +
+                       " redeclared as different type");
+  Symbol->setIsLargeCommon(true);
+  Symbol->setSize(MCConstantExpr::create(Size, getContext()));
+}
+
 void MCELFStreamer::emitCommonSymbol(MCSymbol *S, uint64_t Size,
                                      Align ByteAlignment) {
   auto *Symbol = static_cast<MCSymbolELF *>(S);
diff --git a/llvm/lib/MC/MCParser/ELFAsmParser.cpp b/llvm/lib/MC/MCParser/ELFAsmParser.cpp
index 6195355626fd5..237b430e3d7b4 100644
--- a/llvm/lib/MC/MCParser/ELFAsmParser.cpp
+++ b/llvm/lib/MC/MCParser/ELFAsmParser.cpp
@@ -77,6 +77,7 @@ class ELFAsmParser : public MCAsmParserExtension {
       &ELFAsmParser::parseDirectiveSymbolAttribute>(".hidden");
     addDirectiveHandler<&ELFAsmParser::parseDirectiveSubsection>(".subsection");
     addDirectiveHandler<&ELFAsmParser::parseDirectiveCGProfile>(".cg_profile");
+    addDirectiveHandler<&ELFAsmParser::parseDirectiveLargecomm>(".largecomm");
   }
 
   // FIXME: Part of this logic is duplicated in the MCELFStreamer. What is
@@ -126,6 +127,7 @@ class ELFAsmParser : public MCAsmParserExtension {
   bool parseDirectiveSymbolAttribute(StringRef, SMLoc);
   bool parseDirectiveSubsection(StringRef, SMLoc);
   bool parseDirectiveCGProfile(StringRef, SMLoc);
+  bool parseDirectiveLargecomm(StringRef, SMLoc);
 
 private:
   bool parseSectionName(StringRef &SectionName);
@@ -886,6 +888,54 @@ bool ELFAsmParser::parseDirectiveCGProfile(StringRef S, SMLoc Loc) {
   return MCAsmParserExtension::parseDirectiveCGProfile(S, Loc);
 }
 
+bool ELFAsmParser::parseDirectiveLargecomm(StringRef, SMLoc Loc) {
+  if (getParser().checkForValidSection())
+    return true;
+
+  MCSymbol *Sym;
+  if (getParser().parseSymbol(Sym))
+    return TokError("expected identifier in directive");
+
+  if (getLexer().isNot(AsmToken::Comma))
+    return TokError("expected a comma");
+  Lex();
+
+  int64_t Size;
+  SMLoc SizeLoc = getLexer().getLoc();
+  if (getParser().parseAbsoluteExpression(Size))
+    return true;
+
+  int64_t Pow2Alignment = 0;
+  SMLoc Pow2AlignmentLoc;
+  if (getLexer().is(AsmToken::Comma)) {
+    Lex();
+    Pow2AlignmentLoc = getLexer().getLoc();
+    if (getParser().parseAbsoluteExpression(Pow2Alignment))
+      return true;
+
+    // If this target takes alignments in bytes (not log) validate and convert.
+    if (getLexer().getMAI().getCOMMDirectiveAlignmentIsInBytes()) {
+      if (!isPowerOf2_64(Pow2Alignment))
+        return Error(Pow2AlignmentLoc, "alignment must be a power of 2");
+      Pow2Alignment = Log2_64(Pow2Alignment);
+    }
+  }
+
+  if (parseEOL())
+    return true;
+
+  if (Size < 0)
+    return Error(SizeLoc, "size must be non-negative");
+
+  Sym->redefineIfPossible();
+  if (!Sym->isUndefined())
+    return Error(Loc, "invalid symbol redefinition");
+
+  getStreamer().emitLargeCommonSymbol(Sym, Size, Align(1ULL << Pow2Alignment));
+
+  return false;
+}
+
 namespace llvm {
 
 MCAsmParserExtension *createELFAsmParser() {
diff --git a/llvm/lib/MC/MCStreamer.cpp b/llvm/lib/MC/MCStreamer.cpp
index bc7398120096e..f4f7d78ec557f 100644
--- a/llvm/lib/MC/MCStreamer.cpp
+++ b/llvm/lib/MC/MCStreamer.cpp
@@ -1309,6 +1309,8 @@ void MCStreamer::emitELFSymverDirective(const MCSymbol *OriginalSym,
                                         StringRef Name, bool KeepOriginalSym) {}
 void MCStreamer::emitLocalCommonSymbol(MCSymbol *Symbol, uint64_t Size,
                                        Align ByteAlignment) {}
+void MCStreamer::emitLargeCommonSymbol(MCSymbol *Symbol, uint64_t Size,
+                                       Align ByteAlignment) {}
 void MCStreamer::emitZerofill(MCSection *, MCSymbol *, uint64_t, Align, SMLoc) {
 }
 void MCStreamer::emitTBSSSymbol(MCSection *Section, MCSymbol *Symbol,
diff --git a/llvm/lib/Target/X86/X86TargetObjectFile.cpp b/llvm/lib/Target/X86/X86TargetObjectFile.cpp
index fc3c06fcc1c9b..60e6f090652de 100644
--- a/llvm/lib/Target/X86/X86TargetObjectFile.cpp
+++ b/llvm/lib/Target/X86/X86TargetObjectFile.cpp
@@ -8,8 +8,11 @@
 
 #include "X86TargetObjectFile.h"
 #include "MCTargetDesc/X86MCAsmInfo.h"
+#include "X86TargetMachine.h"
 #include "llvm/BinaryFormat/Dwarf.h"
+#include "llvm/BinaryFormat/ELF.h"
 #include "llvm/MC/MCExpr.h"
+#include "llvm/MC/MCSectionELF.h"
 #include "llvm/MC/MCValue.h"
 #include "llvm/Target/TargetMachine.h"
 
@@ -71,3 +74,13 @@ const MCExpr *X86_64ELFTargetObjectFile::getIndirectSymViaGOTPCRel(
   const MCExpr *Off = MCConstantExpr::create(FinalOffset, getContext());
   return MCBinaryExpr::createAdd(Res, Off, getContext());
 }
+
+MCSection *X86_64ELFTargetObjectFile::LargeSectionForCommon(
+    const GlobalObject *GV, SectionKind Kind, const TargetMachine &TM) const {
+  auto &X86_TM = static_cast<const X86TargetMachine &>(TM);
+  if (GV && Kind.isCommon() && TM.isLargeGlobalValue(GV) && !X86_TM.isJIT()) {
+    unsigned Flags = ELF::SHF_ALLOC | ELF::SHF_WRITE | ELF::SHF_X86_64_LARGE;
+    return getContext().getELFSection(".lbss", ELF::SHT_NOBITS, Flags);
+  }
+  return nullptr;
+}
diff --git a/llvm/lib/Target/X86/X86TargetObjectFile.h b/llvm/lib/Target/X86/X86TargetObjectFile.h
index cb096f4a9febb..4fc944302fcf6 100644
--- a/llvm/lib/Target/X86/X86TargetObjectFile.h
+++ b/llvm/lib/Target/X86/X86TargetObjectFile.h
@@ -56,6 +56,9 @@ namespace llvm {
                               const MCValue &MV, int64_t Offset,
                               MachineModuleInfo *MMI,
                               MCStreamer &Streamer) const override;
+
+    MCSection *LargeSectionForCommon(const GlobalObject *GO, SectionKind Kind,
+                                     const TargetMachine &TM) const override;
   };
 
 } // end namespace llvm
diff --git a/llvm/test/MC/X86/largecomm.ll b/llvm/test/MC/X86/largecomm.ll
new file mode 100644
index 0000000000000..52d51d562c47e
--- /dev/null
+++ b/llvm/test/MC/X86/largecomm.ll
@@ -0,0 +1,58 @@
+; RUN: llc -filetype=asm -code-model=medium %s --large-data-threshold=65636 -o - | FileCheck %s --check-prefix=CHECKASM-MEDIUM
+; RUN: llc -filetype=asm -code-model=large %s -o - | FileCheck %s --check-prefix=CHECKASM-LARGE
+; RUN: llc -filetype=asm -code-model=medium %s --large-data-threshold=65636 -o - | llvm-mc -triple x86_64-linux-gnu -filetype=obj - | llvm-readelf -s - | FileCheck %s --check-prefix=CHECKOBJ-MEDIUM
+; RUN: llc -filetype=asm -code-model=large %s -o - | llvm-mc -triple x86_64-linux-gnu -filetype=obj - | llvm-readelf -s - | FileCheck %s --check-prefix=CHECKOBJ-LARGE
+
+; CHECKASM-MEDIUM:       .section	.lbss,"awl",@nobits
+; CHECKASM-MEDIUM-NEXT:	 .type	__BLNK__,@object                # @__BLNK__
+; CHECKASM-MEDIUM-NEXT:  .largecomm	__BLNK__,48394093832,8
+; CHECKASM-MEDIUM-NEXT:  .type	ccc_,@object                    # @ccc_
+; CHECKASM-MEDIUM-NEXT:  .comm	ccc_,8,8
+
+; CHECKASM-LARGE:       .section	.lbss,"awl",@nobits
+; CHECKASM-LARGE-NEXT:	.type	__BLNK__,@object                # @__BLNK__
+; CHECKASM-LARGE-NEXT:  .largecomm	__BLNK__,48394093832,8
+; CHECKASM-LARGE-NEXT:  .type	ccc_,@object                    # @ccc_
+; CHECKASM-LARGE-NEXT:  .largecomm	ccc_,8,8
+
+; CHECKOBJ-MEDIUM:       8 OBJECT  GLOBAL DEFAULT   COM ccc_
+; CHECKOBJ-MEDIUM:       48394093832 OBJECT GLOBAL DEFAULT LARGE_COMMON __BLNK
+
+; CHECKOBJ-LARGE:        8 OBJECT  GLOBAL DEFAULT   LARGE_COMMON ccc_
+; CHECKOBJ-LARGE:        48394093832 OBJECT GLOBAL DEFAULT LARGE_COMMON __BLNK__
+
+source_filename = "FIRModule"
+target triple = "x86_64-unknown-linux-gnu"
+
+@__BLNK__ = common global [48394093832 x i8] zeroinitializer, align 8
+@ccc_ = common global [8 x i8] zeroinitializer, align 8
+@_QFECn1 = internal constant i32 77777
+@_QFECn2 = internal constant i32 77777
+
+define void @_QQmain() #0 {
+  store double 1.000000e+00, ptr @ccc_, align 8
+  store double 2.000000e+00, ptr getelementptr inbounds nuw (i8, ptr @__BLNK__, i64 61600176), align 8
+  ret void
+}
+
+declare void @_FortranAProgramStart(i32, ptr, ptr, ptr) #1
+
+declare void @_FortranAProgramEndStatement() #1
+
+define i32 @main(i32 %0, ptr %1, ptr %2) #0 {
+  call void @_FortranAProgramStart(i32 %0, ptr %1, ptr %2, ptr null)
+  call void @_QQmain()
+  call void @_FortranAProgramEndStatement()
+  ret i32 0
+}
+
+attributes #0 = { "frame-pointer"="all" "target-cpu"="x86-64" }
+attributes #1 = { "frame-pointer"="all" }
+
+!llvm.ident = !{!0}
+!llvm.module.flags = !{!1, !2, !3}
+
+!0 = !{!"flang version 22.0.0 (https://github.com/llvm/llvm-project.git e1afe25356b8d2ee14f5f88bdb6c2a1526ed14ef)"}
+!1 = !{i32 2, !"Debug Info Version", i32 3}
+!2 = !{i32 8, !"PIC Level", i32 2}
+!3 = !{i32 7, !"PIE Level", i32 2}
diff --git a/llvm/tools/llvm-readobj/ELFDumper.cpp b/llvm/tools/llvm-readobj/ELFDumper.cpp
index ab93316907cc6..7d34727f07f6d 100644
--- a/llvm/tools/llvm-readobj/ELFDumper.cpp
+++ b/llvm/tools/llvm-readobj/ELFDumper.cpp
@@ -1052,6 +1052,8 @@ ELFDumper<ELFT>::getSymbolSectionIndex(const Elf_Sym &Symbol, unsigned SymIndex,
     return CreateErr("SHN_ABS");
   if (Ndx == ELF::SHN_COMMON)
     return CreateErr("SHN_COMMON");
+  if (Ndx == ELF::SHN_AMD64_LCOMMON)
+    return CreateErr("SHN_AMD64_LCOMMON");
   return CreateErr("SHN_LORESERVE", Ndx - SHN_LORESERVE);
 }
 
@@ -4308,6 +4310,9 @@ std::string GNUELFDumper<ELFT>::getSymbolSectionNdx(
   default:
     // Find if:
     // Processor specific
+    if (this->Obj.getHeader().e_machine == EM_X86_64 &&
+        SectionIndex == ELF::SHN_AMD64_LCOMMON)
+      return "LARGE_COMMON";
     if (SectionIndex >= ELF::SHN_LOPROC && SectionIndex <= ELF::SHN_HIPROC)
       return std::string("PRC[0x") +
              to_string(format_hex_no_prefix(SectionIndex, 4)) + "]";

@MaskRay
Copy link
Member

MaskRay commented Oct 3, 2025

First, could you provide justification for why we need to make changes related to COMMON blocks? These have been obsolescent since the Fortran 90 standard and were deleted entirely from Fortran 2018.
The large common feature is only available for x86-64 and not even supported on other popular 64-bit architectures like AArch64.

Two main concerns with the proposed changes:

Increasing sizeof(MCSymbolELF) negatively impacts memory usage.
The modifications to lld/ELF/InputFiles.cpp add overhead for all users, even though the large common feature is quite niche.


SHN_AMD64_LCOMMON is a Solaris-specific constant name, not used by glibc/binutils.
We can use SHN_X86_64_LCOMMON instead.

For 161697, I don't think we should add LbssSection. Just reuse BssSection.
I am more concerned with the lld/ELF/InputFiles.cpp change, which adds overhead to every user not using the niche large common feature.

default:
// Find if:
// Processor specific
if (this->Obj.getHeader().e_machine == EM_X86_64 &&
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect to see explicit llvm-readobj tests (i.e. located under test/tools/llvm-readobj) for the new code in ELFDumper.cpp, testing its behaviour. This likely would require yaml2obj support and should be in its own PR (together with the functional llvm-readobj changes).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @jh7370. I'll create a separate PR for the llvm-readobj change and add the tests.

@ssijaric-nv
Copy link
Contributor Author

Thank you, @MaskRay, for having a look at this. This is an attempt to address #149222 to match the gfortran behavior.

The increase in size of MCSymbolELF is concerning. I didn't want to introduce a new kind for this corner case, (e.g. LargeCommon) to the 'enum Kind' in MCSymbol.h, even though there is space in the 'kind' bitfield to hold an additional kind. I'll look for a different way to represent large common symbols without increasing the MCSymbolELF size. Will rename SHN_AMD64_LCOMMON to SHN_X86_64_LCOMMON.

For the lld change, I'll remove LbssSection. Thanks. I tested the linker change with the Fortran tests that are included in the llvm-test-suite. I'll see if I can get reliable link times with and without the change with the llvm-test-suite Fortran tests, or a larger application.

@MaskRay
Copy link
Member

MaskRay commented Oct 5, 2025

I have some concerns about adding large COMMON support for the obsoleted Fortran feature (and removed from Fortran 2018):

Architecture-specific limitation: This feature is x86-64 specific, which introduces architectural divergence for what is essentially a legacy feature.
SHN_X86_64_LCOMMON isn't even defined in glibc's elf.h.
While the processor-specific section index range SHN_LOPROC~SHN_HIPROC is present, it's really unused in all linkers and binary utilities on Linux.
Adding support just for SHN_X86_64_LCOMMON would be annoying and introduce overhead in a hot patch in linkers.

Is this solving a real problem? Could you clarify whether #149222 represents an actual real-world use case?

If the goal is to eliminate COMMON blocks entirely, the modern solution is to use modules: define the variable in a module in a single source file and compile it once. This is the recommended practice in modern Fortran.
If you specifically need the linker behavior where duplicate symbols don't cause errors, and want these symbols in a .lbss section, consider changing flang to lower to COMDAT section groups instead:

.section .lbss.foo,"awG",%nobits,foo,comdat
.globl foo
.space 8
.size foo, 8

This approach achieves the same result while working across all architectures, not just x86-64.

@ssijaric-nv
Copy link
Contributor Author

Thanks, @MaskRay, for looking at this. We ran into this issue when trying to build a Fortran SPEC CPU2000 benchmark, where converting common blocks to modules is not feasible. The reduced test case from the issue is representative of the common blocks found in the benchmark.

We thought we would match the gfortran behavior, which generates the x86-64 specific section for large common blocks:

0000000b44834508 O LARGE_COMMON 0000000000000020 __BLNK__

where LARGE_COMMON is 0xff02.

The llvm-readobj dump of the gfortran produced object is:

14: 0000000000000020 48394093832 OBJECT GLOBAL DEFAULT PRC[0xff02] __BLNK__
The assembly produced by gfortran is:

.size   main, .-main
.section        .lbss,"aw"
.largecomm      __BLNK__,48394093832,32
.comm   ccc_,8,16

where largecomm is x86-64 specific (https://sourceware.org/binutils/docs/as/i386_002dDirectives.html)

Thank you for suggesting the COMDAT approach! It's much cleaner.

@MaskRay
Copy link
Member

MaskRay commented Oct 7, 2025

Thanks for the comment that this is for a Fortran SPEC CPU2000 benchmark, where converting common blocks to modules is not feasible. The gas .largecomm and GCC code landed in 2005, before https://groups.google.com/g/x86-64-abi/c/jnQdJeabxiU/m/NNuA0P7pAQAJ

At the time we were adding it to the psABI there were only hear-say inhouse requirements for actually having
ELF executables or shared libs that had > 2GB text, and I have never seen such in public.

The large data issue is also relevant for aarch64 and riscv64 and I would certainly recommend that they use .section .lbss.foo,"awG",%nobits,foo,comdat instead.
I think it's best to treat .largecomm/SHN_X86_64_LCOMMON as legacy features with annoying side effects and better replacement, so we should not implement them in LLVM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants