Skip to content

Conversation

@pcc
Copy link
Contributor

@pcc pcc commented Apr 15, 2025

The purpose of this flag is to allow the compiler to assume that each
object file passed to the linker has been compiled using a unique
source file name. This is useful for reducing link times when doing
ThinLTO in combination with whole-program devirtualization or CFI,
as it allows modules without exported symbols to be built with ThinLTO.

Created using spr 1.3.6-beta.1
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:codegen IR generation bugs: mangling, exceptions, etc. llvm:transforms labels Apr 15, 2025
@pcc pcc requested review from teresajohnson and vitalybuka April 15, 2025 02:04
@llvmbot
Copy link
Member

llvmbot commented Apr 15, 2025

@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-clang-driver

@llvm/pr-subscribers-clang

Author: Peter Collingbourne (pcc)

Changes

The purpose of this flag is to allow the compiler to assume that each
object file passed to the linker has been compiled using a unique
source file name. This is useful for reducing link times when doing
ThinLTO in combination with whole-program devirtualization or CFI,
as it allows modules without exported symbols to be built with ThinLTO.


Full diff: https://github.com/llvm/llvm-project/pull/135728.diff

10 Files Affected:

  • (modified) clang/docs/ControlFlowIntegrity.rst (+11-5)
  • (modified) clang/docs/UsersManual.rst (+7)
  • (modified) clang/include/clang/Basic/CodeGenOptions.def (+2)
  • (modified) clang/include/clang/Driver/Options.td (+7)
  • (modified) clang/lib/CodeGen/CodeGenModule.cpp (+4)
  • (modified) clang/lib/Driver/ToolChains/Clang.cpp (+3)
  • (added) clang/test/CodeGen/unique-source-file-names.c (+2)
  • (added) clang/test/Driver/unique-source-file-names.c (+5)
  • (modified) llvm/lib/Transforms/Utils/ModuleUtils.cpp (+11-11)
  • (added) llvm/test/Transforms/ThinLTOBitcodeWriter/unique-source-file-names.ll (+22)
diff --git a/clang/docs/ControlFlowIntegrity.rst b/clang/docs/ControlFlowIntegrity.rst
index a88271faf6b42..baff9ab54ff26 100644
--- a/clang/docs/ControlFlowIntegrity.rst
+++ b/clang/docs/ControlFlowIntegrity.rst
@@ -19,11 +19,12 @@ of undefined behavior that can potentially allow attackers to subvert the
 program's control flow. These schemes have been optimized for performance,
 allowing developers to enable them in release builds.
 
-To enable Clang's available CFI schemes, use the flag ``-fsanitize=cfi``.
-You can also enable a subset of available :ref:`schemes <cfi-schemes>`.
-As currently implemented, all schemes rely on link-time optimization (LTO);
-so it is required to specify ``-flto``, and the linker used must support LTO,
-for example via the `gold plugin`_.
+To enable Clang's available CFI schemes, use the flag
+``-fsanitize=cfi``. You can also enable a subset of available
+:ref:`schemes <cfi-schemes>`. As currently implemented, all schemes
+except for ``kcfi`` rely on link-time optimization (LTO); so it is
+required to specify ``-flto`` or ``-flto=thin``, and the linker used
+must support LTO, for example via the `gold plugin`_.
 
 To allow the checks to be implemented efficiently, the program must
 be structured such that certain object files are compiled with CFI
@@ -41,6 +42,11 @@ default visibility setting is ``-fvisibility=default``, which would disable
 CFI checks for classes without visibility attributes. Most users will want
 to specify ``-fvisibility=hidden``, which enables CFI checks for such classes.
 
+When using ``-fsanitize=cfi*`` with ``-flto=thin``, it is recommended
+to reduce link times by passing `-funique-source-file-names
+<UsersManual.html#cmdoption-f-no-unique-source-file-names>`_, provided
+that your program is compatible with it.
+
 Experimental support for :ref:`cross-DSO control flow integrity
 <cfi-cross-dso>` exists that does not require classes to have hidden LTO
 visibility. This cross-DSO support has unstable ABI at this time.
diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst
index 2a93c2552d7dc..d473503931144 100644
--- a/clang/docs/UsersManual.rst
+++ b/clang/docs/UsersManual.rst
@@ -2297,6 +2297,13 @@ are listed below.
    pure ThinLTO, as all split regular LTO modules are merged and LTO linked
    with regular LTO.
 
+.. option:: -f[no-]unique-source-file-names
+
+   When enabled, allows the compiler to assume that each object file
+   passed to the linker has been compiled using a unique source file
+   name. This is useful for reducing link times when doing ThinLTO
+   in combination with whole-program devirtualization or CFI.
+
 .. option:: -fforce-emit-vtables
 
    In order to improve devirtualization, forces emitting of vtables even in
diff --git a/clang/include/clang/Basic/CodeGenOptions.def b/clang/include/clang/Basic/CodeGenOptions.def
index a436c0ec98d5b..c5990fb248689 100644
--- a/clang/include/clang/Basic/CodeGenOptions.def
+++ b/clang/include/clang/Basic/CodeGenOptions.def
@@ -278,6 +278,8 @@ CODEGENOPT(SanitizeCfiICallNormalizeIntegers, 1, 0) ///< Normalize integer types
                                                     ///< CFI icall function signatures
 CODEGENOPT(SanitizeCfiCanonicalJumpTables, 1, 0) ///< Make jump table symbols canonical
                                                  ///< instead of creating a local jump table.
+CODEGENOPT(UniqueSourceFileNames, 1, 0) ///< Allow the compiler to assume that TUs
+                                        ///< have unique source file names at link time
 CODEGENOPT(SanitizeKcfiArity, 1, 0) ///< Embed arity in KCFI patchable function prefix
 CODEGENOPT(SanitizeCoverageType, 2, 0) ///< Type of sanitizer coverage
                                        ///< instrumentation.
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index c9d2bc5e81976..e9acb20348654 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4140,6 +4140,13 @@ def ftrigraphs : Flag<["-"], "ftrigraphs">, Group<f_Group>,
 def fno_trigraphs : Flag<["-"], "fno-trigraphs">, Group<f_Group>,
   HelpText<"Do not process trigraph sequences">,
   Visibility<[ClangOption, CC1Option]>;
+defm unique_source_file_names: BoolOption<"f", "unique-source-file-names",
+  CodeGenOpts<"UniqueSourceFileNames">, DefaultFalse,
+  PosFlag<SetTrue, [], [CC1Option], "Allow">,
+  NegFlag<SetFalse, [], [], "Do not allow">,
+  BothFlags<[], [ClangOption], " the compiler to assume that each translation unit has a unique "
+                               "source file name at link time">>,
+  Group<f_clang_Group>;
 def funsigned_bitfields : Flag<["-"], "funsigned-bitfields">, Group<f_Group>;
 def funsigned_char : Flag<["-"], "funsigned-char">, Group<f_Group>;
 def fno_unsigned_char : Flag<["-"], "fno-unsigned-char">;
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp
index 4a48c2f35ff23..26e09fe239242 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -1144,6 +1144,10 @@ void CodeGenModule::Release() {
                               1);
   }
 
+  if (CodeGenOpts.UniqueSourceFileNames) {
+    getModule().addModuleFlag(llvm::Module::Max, "Unique Source File Names", 1);
+  }
+
   if (LangOpts.Sanitize.has(SanitizerKind::KCFI)) {
     getModule().addModuleFlag(llvm::Module::Override, "kcfi", 1);
     // KCFI assumes patchable-function-prefix is the same for all indirectly
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 65910e7fdaaa6..8506a5c00e7bc 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -7744,6 +7744,9 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
   Args.addOptInFlag(CmdArgs, options::OPT_fexperimental_late_parse_attributes,
                     options::OPT_fno_experimental_late_parse_attributes);
 
+  Args.addOptInFlag(CmdArgs, options::OPT_funique_source_file_names,
+                    options::OPT_fno_unique_source_file_names);
+
   // Setup statistics file output.
   SmallString<128> StatsFile = getStatsFileName(Args, Output, Input, D);
   if (!StatsFile.empty()) {
diff --git a/clang/test/CodeGen/unique-source-file-names.c b/clang/test/CodeGen/unique-source-file-names.c
new file mode 100644
index 0000000000000..1d5a4a5e8e4c5
--- /dev/null
+++ b/clang/test/CodeGen/unique-source-file-names.c
@@ -0,0 +1,2 @@
+// RUN: %clang_cc1 -funique-source-file-names -triple x86_64-linux-gnu -emit-llvm %s -o - | FileCheck %s
+// CHECK:  !{i32 7, !"Unique Source File Names", i32 1}
diff --git a/clang/test/Driver/unique-source-file-names.c b/clang/test/Driver/unique-source-file-names.c
new file mode 100644
index 0000000000000..8322f0e37b0c7
--- /dev/null
+++ b/clang/test/Driver/unique-source-file-names.c
@@ -0,0 +1,5 @@
+// RUN: %clang -funique-source-file-names -### %s 2> %t
+// RUN: FileCheck < %t %s
+
+// CHECK: "-cc1"
+// CHECK: "-funique-source-file-names"
diff --git a/llvm/lib/Transforms/Utils/ModuleUtils.cpp b/llvm/lib/Transforms/Utils/ModuleUtils.cpp
index 1c31e851ef4b2..6b923c9660137 100644
--- a/llvm/lib/Transforms/Utils/ModuleUtils.cpp
+++ b/llvm/lib/Transforms/Utils/ModuleUtils.cpp
@@ -355,17 +355,17 @@ std::string llvm::getUniqueModuleId(Module *M) {
     Md5.update(ArrayRef<uint8_t>{0});
   };
 
-  for (auto &F : *M)
-    AddGlobal(F);
-  for (auto &GV : M->globals())
-    AddGlobal(GV);
-  for (auto &GA : M->aliases())
-    AddGlobal(GA);
-  for (auto &IF : M->ifuncs())
-    AddGlobal(IF);
-
-  if (!ExportsSymbols)
-    return "";
+  auto *UniqueSourceFileNames = mdconst::extract_or_null<ConstantInt>(
+          M->getModuleFlag("Unique Source File Names"));
+  if (UniqueSourceFileNames && UniqueSourceFileNames->getZExtValue()) {
+    Md5.update(M->getSourceFileName());
+  } else {
+    for (auto &GV : M->global_values())
+      AddGlobal(GV);
+
+    if (!ExportsSymbols)
+      return "";
+  }
 
   MD5::MD5Result R;
   Md5.final(R);
diff --git a/llvm/test/Transforms/ThinLTOBitcodeWriter/unique-source-file-names.ll b/llvm/test/Transforms/ThinLTOBitcodeWriter/unique-source-file-names.ll
new file mode 100644
index 0000000000000..0f3fd566f9b1c
--- /dev/null
+++ b/llvm/test/Transforms/ThinLTOBitcodeWriter/unique-source-file-names.ll
@@ -0,0 +1,22 @@
+; RUN: opt -thinlto-bc -thin-link-bitcode-file=%t2 -thinlto-split-lto-unit -o %t %s
+; RUN: llvm-modextract -b -n 1 -o %t1 %t
+; RUN: llvm-dis -o - %t1 | FileCheck %s
+
+source_filename = "unique-source-file-names.c"
+
+@llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 65535, ptr @f, ptr null }]
+
+; CHECK: @g.45934e8a5251fb7adbecfff71a4e70ed =
+@g = internal global i8 42, !type !0
+
+declare void @sink(ptr)
+
+define internal void @f() {
+  call void @sink(ptr @g)
+  ret void
+}
+
+!0 = !{i32 0, !"typeid"}
+
+!llvm.module.flags = !{!1}
+!1 = !{i32 1, !"Unique Source File Names", i32 1}

@llvmbot
Copy link
Member

llvmbot commented Apr 15, 2025

@llvm/pr-subscribers-clang-codegen

Author: Peter Collingbourne (pcc)

Changes

The purpose of this flag is to allow the compiler to assume that each
object file passed to the linker has been compiled using a unique
source file name. This is useful for reducing link times when doing
ThinLTO in combination with whole-program devirtualization or CFI,
as it allows modules without exported symbols to be built with ThinLTO.


Full diff: https://github.com/llvm/llvm-project/pull/135728.diff

10 Files Affected:

  • (modified) clang/docs/ControlFlowIntegrity.rst (+11-5)
  • (modified) clang/docs/UsersManual.rst (+7)
  • (modified) clang/include/clang/Basic/CodeGenOptions.def (+2)
  • (modified) clang/include/clang/Driver/Options.td (+7)
  • (modified) clang/lib/CodeGen/CodeGenModule.cpp (+4)
  • (modified) clang/lib/Driver/ToolChains/Clang.cpp (+3)
  • (added) clang/test/CodeGen/unique-source-file-names.c (+2)
  • (added) clang/test/Driver/unique-source-file-names.c (+5)
  • (modified) llvm/lib/Transforms/Utils/ModuleUtils.cpp (+11-11)
  • (added) llvm/test/Transforms/ThinLTOBitcodeWriter/unique-source-file-names.ll (+22)
diff --git a/clang/docs/ControlFlowIntegrity.rst b/clang/docs/ControlFlowIntegrity.rst
index a88271faf6b42..baff9ab54ff26 100644
--- a/clang/docs/ControlFlowIntegrity.rst
+++ b/clang/docs/ControlFlowIntegrity.rst
@@ -19,11 +19,12 @@ of undefined behavior that can potentially allow attackers to subvert the
 program's control flow. These schemes have been optimized for performance,
 allowing developers to enable them in release builds.
 
-To enable Clang's available CFI schemes, use the flag ``-fsanitize=cfi``.
-You can also enable a subset of available :ref:`schemes <cfi-schemes>`.
-As currently implemented, all schemes rely on link-time optimization (LTO);
-so it is required to specify ``-flto``, and the linker used must support LTO,
-for example via the `gold plugin`_.
+To enable Clang's available CFI schemes, use the flag
+``-fsanitize=cfi``. You can also enable a subset of available
+:ref:`schemes <cfi-schemes>`. As currently implemented, all schemes
+except for ``kcfi`` rely on link-time optimization (LTO); so it is
+required to specify ``-flto`` or ``-flto=thin``, and the linker used
+must support LTO, for example via the `gold plugin`_.
 
 To allow the checks to be implemented efficiently, the program must
 be structured such that certain object files are compiled with CFI
@@ -41,6 +42,11 @@ default visibility setting is ``-fvisibility=default``, which would disable
 CFI checks for classes without visibility attributes. Most users will want
 to specify ``-fvisibility=hidden``, which enables CFI checks for such classes.
 
+When using ``-fsanitize=cfi*`` with ``-flto=thin``, it is recommended
+to reduce link times by passing `-funique-source-file-names
+<UsersManual.html#cmdoption-f-no-unique-source-file-names>`_, provided
+that your program is compatible with it.
+
 Experimental support for :ref:`cross-DSO control flow integrity
 <cfi-cross-dso>` exists that does not require classes to have hidden LTO
 visibility. This cross-DSO support has unstable ABI at this time.
diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst
index 2a93c2552d7dc..d473503931144 100644
--- a/clang/docs/UsersManual.rst
+++ b/clang/docs/UsersManual.rst
@@ -2297,6 +2297,13 @@ are listed below.
    pure ThinLTO, as all split regular LTO modules are merged and LTO linked
    with regular LTO.
 
+.. option:: -f[no-]unique-source-file-names
+
+   When enabled, allows the compiler to assume that each object file
+   passed to the linker has been compiled using a unique source file
+   name. This is useful for reducing link times when doing ThinLTO
+   in combination with whole-program devirtualization or CFI.
+
 .. option:: -fforce-emit-vtables
 
    In order to improve devirtualization, forces emitting of vtables even in
diff --git a/clang/include/clang/Basic/CodeGenOptions.def b/clang/include/clang/Basic/CodeGenOptions.def
index a436c0ec98d5b..c5990fb248689 100644
--- a/clang/include/clang/Basic/CodeGenOptions.def
+++ b/clang/include/clang/Basic/CodeGenOptions.def
@@ -278,6 +278,8 @@ CODEGENOPT(SanitizeCfiICallNormalizeIntegers, 1, 0) ///< Normalize integer types
                                                     ///< CFI icall function signatures
 CODEGENOPT(SanitizeCfiCanonicalJumpTables, 1, 0) ///< Make jump table symbols canonical
                                                  ///< instead of creating a local jump table.
+CODEGENOPT(UniqueSourceFileNames, 1, 0) ///< Allow the compiler to assume that TUs
+                                        ///< have unique source file names at link time
 CODEGENOPT(SanitizeKcfiArity, 1, 0) ///< Embed arity in KCFI patchable function prefix
 CODEGENOPT(SanitizeCoverageType, 2, 0) ///< Type of sanitizer coverage
                                        ///< instrumentation.
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index c9d2bc5e81976..e9acb20348654 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4140,6 +4140,13 @@ def ftrigraphs : Flag<["-"], "ftrigraphs">, Group<f_Group>,
 def fno_trigraphs : Flag<["-"], "fno-trigraphs">, Group<f_Group>,
   HelpText<"Do not process trigraph sequences">,
   Visibility<[ClangOption, CC1Option]>;
+defm unique_source_file_names: BoolOption<"f", "unique-source-file-names",
+  CodeGenOpts<"UniqueSourceFileNames">, DefaultFalse,
+  PosFlag<SetTrue, [], [CC1Option], "Allow">,
+  NegFlag<SetFalse, [], [], "Do not allow">,
+  BothFlags<[], [ClangOption], " the compiler to assume that each translation unit has a unique "
+                               "source file name at link time">>,
+  Group<f_clang_Group>;
 def funsigned_bitfields : Flag<["-"], "funsigned-bitfields">, Group<f_Group>;
 def funsigned_char : Flag<["-"], "funsigned-char">, Group<f_Group>;
 def fno_unsigned_char : Flag<["-"], "fno-unsigned-char">;
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp
index 4a48c2f35ff23..26e09fe239242 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -1144,6 +1144,10 @@ void CodeGenModule::Release() {
                               1);
   }
 
+  if (CodeGenOpts.UniqueSourceFileNames) {
+    getModule().addModuleFlag(llvm::Module::Max, "Unique Source File Names", 1);
+  }
+
   if (LangOpts.Sanitize.has(SanitizerKind::KCFI)) {
     getModule().addModuleFlag(llvm::Module::Override, "kcfi", 1);
     // KCFI assumes patchable-function-prefix is the same for all indirectly
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 65910e7fdaaa6..8506a5c00e7bc 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -7744,6 +7744,9 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
   Args.addOptInFlag(CmdArgs, options::OPT_fexperimental_late_parse_attributes,
                     options::OPT_fno_experimental_late_parse_attributes);
 
+  Args.addOptInFlag(CmdArgs, options::OPT_funique_source_file_names,
+                    options::OPT_fno_unique_source_file_names);
+
   // Setup statistics file output.
   SmallString<128> StatsFile = getStatsFileName(Args, Output, Input, D);
   if (!StatsFile.empty()) {
diff --git a/clang/test/CodeGen/unique-source-file-names.c b/clang/test/CodeGen/unique-source-file-names.c
new file mode 100644
index 0000000000000..1d5a4a5e8e4c5
--- /dev/null
+++ b/clang/test/CodeGen/unique-source-file-names.c
@@ -0,0 +1,2 @@
+// RUN: %clang_cc1 -funique-source-file-names -triple x86_64-linux-gnu -emit-llvm %s -o - | FileCheck %s
+// CHECK:  !{i32 7, !"Unique Source File Names", i32 1}
diff --git a/clang/test/Driver/unique-source-file-names.c b/clang/test/Driver/unique-source-file-names.c
new file mode 100644
index 0000000000000..8322f0e37b0c7
--- /dev/null
+++ b/clang/test/Driver/unique-source-file-names.c
@@ -0,0 +1,5 @@
+// RUN: %clang -funique-source-file-names -### %s 2> %t
+// RUN: FileCheck < %t %s
+
+// CHECK: "-cc1"
+// CHECK: "-funique-source-file-names"
diff --git a/llvm/lib/Transforms/Utils/ModuleUtils.cpp b/llvm/lib/Transforms/Utils/ModuleUtils.cpp
index 1c31e851ef4b2..6b923c9660137 100644
--- a/llvm/lib/Transforms/Utils/ModuleUtils.cpp
+++ b/llvm/lib/Transforms/Utils/ModuleUtils.cpp
@@ -355,17 +355,17 @@ std::string llvm::getUniqueModuleId(Module *M) {
     Md5.update(ArrayRef<uint8_t>{0});
   };
 
-  for (auto &F : *M)
-    AddGlobal(F);
-  for (auto &GV : M->globals())
-    AddGlobal(GV);
-  for (auto &GA : M->aliases())
-    AddGlobal(GA);
-  for (auto &IF : M->ifuncs())
-    AddGlobal(IF);
-
-  if (!ExportsSymbols)
-    return "";
+  auto *UniqueSourceFileNames = mdconst::extract_or_null<ConstantInt>(
+          M->getModuleFlag("Unique Source File Names"));
+  if (UniqueSourceFileNames && UniqueSourceFileNames->getZExtValue()) {
+    Md5.update(M->getSourceFileName());
+  } else {
+    for (auto &GV : M->global_values())
+      AddGlobal(GV);
+
+    if (!ExportsSymbols)
+      return "";
+  }
 
   MD5::MD5Result R;
   Md5.final(R);
diff --git a/llvm/test/Transforms/ThinLTOBitcodeWriter/unique-source-file-names.ll b/llvm/test/Transforms/ThinLTOBitcodeWriter/unique-source-file-names.ll
new file mode 100644
index 0000000000000..0f3fd566f9b1c
--- /dev/null
+++ b/llvm/test/Transforms/ThinLTOBitcodeWriter/unique-source-file-names.ll
@@ -0,0 +1,22 @@
+; RUN: opt -thinlto-bc -thin-link-bitcode-file=%t2 -thinlto-split-lto-unit -o %t %s
+; RUN: llvm-modextract -b -n 1 -o %t1 %t
+; RUN: llvm-dis -o - %t1 | FileCheck %s
+
+source_filename = "unique-source-file-names.c"
+
+@llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 65535, ptr @f, ptr null }]
+
+; CHECK: @g.45934e8a5251fb7adbecfff71a4e70ed =
+@g = internal global i8 42, !type !0
+
+declare void @sink(ptr)
+
+define internal void @f() {
+  call void @sink(ptr @g)
+  ret void
+}
+
+!0 = !{i32 0, !"typeid"}
+
+!llvm.module.flags = !{!1}
+!1 = !{i32 1, !"Unique Source File Names", i32 1}

@github-actions
Copy link

github-actions bot commented Apr 15, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Contributor

@teresajohnson teresajohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this will give a duplicate symbol linker error if used inappropriately? Any chance of more subtle bugs? Should there be some error detection at LTO link time? I.e. if the module flag is set on the LTO linked modules can and should we detect and error on duplicate source file names?

Md5.update(M->getSourceFileName());
} else {
for (auto &GV : M->global_values())
AddGlobal(GV);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider inlining the lambda since there is now only one use

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

To enable Clang's available CFI schemes, use the flag
``-fsanitize=cfi``. You can also enable a subset of available
:ref:`schemes <cfi-schemes>`. As currently implemented, all schemes
except for ``kcfi`` rely on link-time optimization (LTO); so it is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to separate commit: 984ec70

Created using spr 1.3.6-beta.1
Copy link
Contributor Author

@pcc pcc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this will give a duplicate symbol linker error if used inappropriately? Any chance of more subtle bugs?

As far as LTO is concerned I think it can only result in a duplicate symbol error as the resulting symbols will have external strong linkage. There are two non-LTO uses of getModuleUniqueId in the tree:

  • The PowerPC AsmPrinter. It looks like this one is used to produce symbols with external linkage, so the usage error would be detected at link time.
  • The ASan pass. It's unclear what would happen if used incorrectly (a discarded section error I think?) but it doesn't look like we can do much about it anyway because LTO isn't necessarily enabled with ASan.

Should there be some error detection at LTO link time? I.e. if the module flag is set on the LTO linked modules can and should we detect and error on duplicate source file names?

I don't think it is needed as the LTO-specific cases are already detectable.

To enable Clang's available CFI schemes, use the flag
``-fsanitize=cfi``. You can also enable a subset of available
:ref:`schemes <cfi-schemes>`. As currently implemented, all schemes
except for ``kcfi`` rely on link-time optimization (LTO); so it is
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to separate commit: 984ec70

Md5.update(M->getSourceFileName());
} else {
for (auto &GV : M->global_values())
AddGlobal(GV);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@teresajohnson
Copy link
Contributor

I assume this will give a duplicate symbol linker error if used inappropriately? Any chance of more subtle bugs?

As far as LTO is concerned I think it can only result in a duplicate symbol error as the resulting symbols will have external strong linkage. There are two non-LTO uses of getModuleUniqueId in the tree:

  • The PowerPC AsmPrinter. It looks like this one is used to produce symbols with external linkage, so the usage error would be detected at link time.
  • The ASan pass. It's unclear what would happen if used incorrectly (a discarded section error I think?) but it doesn't look like we can do much about it anyway because LTO isn't necessarily enabled with ASan.

Could that be addressed by only allowing this flag under LTO?

Should there be some error detection at LTO link time? I.e. if the module flag is set on the LTO linked modules can and should we detect and error on duplicate source file names?

I don't think it is needed as the LTO-specific cases are already detectable.

Probably at least add a note to the new documentation for the option that it will result in linker errors if used inappropriately.

I was mostly thinking that detecting and erroring at LTO time would result in a clearer error (like at the start of LTO::addModule, where it is looking for partially split LTO units, which we do by adding the module flag to the index flags). E.g. I could see a case where someone adds this option to their makefiles, time passes and they forget about it, a new file is added that violates the constraints, and boom they get a linker error that they don't understand (and then the compiler gets an bug report).

Created using spr 1.3.6-beta.1
@pcc
Copy link
Contributor Author

pcc commented Apr 15, 2025

Could that be addressed by only allowing this flag under LTO?

-fsanitize=address -flto=thin would have the same issue (if there is indeed an issue, which I think there probably isn't).

Probably at least add a note to the new documentation for the option that it will result in linker errors if used inappropriately.

Added.

I was mostly thinking that detecting and erroring at LTO time would result in a clearer error (like at the start of LTO::addModule, where it is looking for partially split LTO units, which we do by adding the module flag to the index flags).

I don't think it would be detected at that point. The duplicate symbol error would be detected by the linker when it adds the bitcode file to its symbol table and likely cause it to error out before calling LTO::addModule which is only called once the resolutions are known. So if we wanted a better error message it would need to be added to each linker.

@teresajohnson
Copy link
Contributor

I was mostly thinking that detecting and erroring at LTO time would result in a clearer error (like at the start of LTO::addModule, where it is looking for partially split LTO units, which we do by adding the module flag to the index flags).

I don't think it would be detected at that point. The duplicate symbol error would be detected by the linker when it adds the bitcode file to its symbol table and likely cause it to error out before calling LTO::addModule which is only called once the resolutions are known. So if we wanted a better error message it would need to be added to each linker.

I didn't mean to catch the duplicate symbol, but rather to error if the flag was given but we have multiple LTO modules with the same source name, in which case we can give a specific error that this option was used when the source names were not in fact unique.

I'll lgtm this patch, but something to consider as a follow on.

@pcc pcc merged commit a5aa0c4 into main Apr 15, 2025
7 of 10 checks passed
@pcc pcc deleted the users/pcc/spr/introduce-funique-source-file-names-flag branch April 15, 2025 18:12
Copy link
Collaborator

@efriedma-quic efriedma-quic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the "name" in this context the whole path? Or just the filename? I know many projects have files with the same name in different folders (including LLVM itsef).

@pcc
Copy link
Contributor Author

pcc commented Apr 15, 2025

Is the "name" in this context the whole path? Or just the filename? I know many projects have files with the same name in different folders (including LLVM itsef).

It is the whole path. Let me clarify this in the documentation.

@pcc
Copy link
Contributor Author

pcc commented Apr 15, 2025

#135832

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Apr 15, 2025
The purpose of this flag is to allow the compiler to assume that each
object file passed to the linker has been compiled using a unique
source file name. This is useful for reducing link times when doing
ThinLTO in combination with whole-program devirtualization or CFI,
as it allows modules without exported symbols to be built with ThinLTO.

Reviewers: vitalybuka, teresajohnson

Reviewed By: teresajohnson

Pull Request: llvm/llvm-project#135728
@vitalybuka
Copy link
Collaborator

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:codegen IR generation bugs: mangling, exceptions, etc. clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category llvm:transforms

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants