[Serialization] Fix lazy template loading #133057

hahnjo · 2025-03-26T09:26:51Z

Hash inner template arguments: The code is applied from ODRHash::AddDecl with the reasoning given in the comment, to reduce collisions. This was particularly visible with STL types templated on std::pair where its template arguments were not taken into account.
Complete only needed partial specializations: It is unclear (to me) why this needs to be done "for safety", but this change significantly improves the effectiveness of lazy loading.
Load only needed partial specializations: Similar as the last commit, it is unclear why we need to load all specializations, including non-partial ones, when we have a TPL.
Remove bail-out logic in TemplateArgumentHasher: While it is correct to assign a single fixed hash to all template
arguments, it can reduce the effectiveness of lazy loading and is not actually needed: we are allowed to ignore parts that cannot be handled because they will be analogously ignored by all hashings.

llvmbot · 2025-03-26T09:27:35Z

@llvm/pr-subscribers-clang-modules

@llvm/pr-subscribers-clang

Author: Jonas Hahnfeld (hahnjo)

Changes

Hash inner template arguments: The code is applied from ODRHash::AddDecl with the reasoning given in the comment, to reduce collisions. This was particularly visible with STL types templated on std::pair where its template arguments were not taken into account.
Complete only needed partial specializations: It is unclear (to me) why this needs to be done "for safety", but this change significantly improves the effectiveness of lazy loading.
Load only needed partial specializations: Similar as the last commit, it is unclear why we need to load all specializations, including non-partial ones, when we have a TPL.
Remove bail-out logic in TemplateArgumentHasher: While it is correct to assign a single fixed hash to all template
arguments, it can reduce the effectiveness of lazy loading and is not actually needed: we are allowed to ignore parts that cannot be handled because they will be analogously ignored by all hashings.

Full diff: https://github.com/llvm/llvm-project/pull/133057.diff

3 Files Affected:

(modified) clang/lib/AST/DeclTemplate.cpp (-6)
(modified) clang/lib/Serialization/ASTReader.cpp (+2-8)
(modified) clang/lib/Serialization/TemplateArgumentHasher.cpp (+18-33)

diff --git a/clang/lib/AST/DeclTemplate.cpp b/clang/lib/AST/DeclTemplate.cpp
index c0f5be51db5f3..8560c3928aa84 100644
--- a/clang/lib/AST/DeclTemplate.cpp
+++ b/clang/lib/AST/DeclTemplate.cpp
@@ -367,12 +367,6 @@ bool RedeclarableTemplateDecl::loadLazySpecializationsImpl(
   if (!ExternalSource)
     return false;
 
-  // If TPL is not null, it implies that we're loading specializations for
-  // partial templates. We need to load all specializations in such cases.
-  if (TPL)
-    return ExternalSource->LoadExternalSpecializations(this->getCanonicalDecl(),
-                                                       /*OnlyPartial=*/false);
-
   return ExternalSource->LoadExternalSpecializations(this->getCanonicalDecl(),
                                                      Args);
 }
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index 0cd2cedb48dd9..eb0496c97eb3b 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -7891,14 +7891,8 @@ void ASTReader::CompleteRedeclChain(const Decl *D) {
     }
   }
 
-  if (Template) {
-    // For partitial specialization, load all the specializations for safety.
-    if (isa<ClassTemplatePartialSpecializationDecl,
-            VarTemplatePartialSpecializationDecl>(D))
-      Template->loadLazySpecializationsImpl();
-    else
-      Template->loadLazySpecializationsImpl(Args);
-  }
+  if (Template)
+    Template->loadLazySpecializationsImpl(Args);
 }
 
 CXXCtorInitializer **
diff --git a/clang/lib/Serialization/TemplateArgumentHasher.cpp b/clang/lib/Serialization/TemplateArgumentHasher.cpp
index 3c7177b83ba52..5fb363c4ab148 100644
--- a/clang/lib/Serialization/TemplateArgumentHasher.cpp
+++ b/clang/lib/Serialization/TemplateArgumentHasher.cpp
@@ -21,17 +21,6 @@ using namespace clang;
 namespace {
 
 class TemplateArgumentHasher {
-  // If we bail out during the process of calculating hash values for
-  // template arguments for any reason. We're allowed to do it since
-  // TemplateArgumentHasher are only required to give the same hash value
-  // for the same template arguments, but not required to give different
-  // hash value for different template arguments.
-  //
-  // So in the worst case, it is still a valid implementation to give all
-  // inputs the same BailedOutValue as output.
-  bool BailedOut = false;
-  static constexpr unsigned BailedOutValue = 0x12345678;
-
   llvm::FoldingSetNodeID ID;
 
 public:
@@ -41,14 +30,7 @@ class TemplateArgumentHasher {
 
   void AddInteger(unsigned V) { ID.AddInteger(V); }
 
-  unsigned getValue() {
-    if (BailedOut)
-      return BailedOutValue;
-
-    return ID.computeStableHash();
-  }
-
-  void setBailedOut() { BailedOut = true; }
+  unsigned getValue() { return ID.computeStableHash(); }
 
   void AddType(const Type *T);
   void AddQualType(QualType T);
@@ -92,8 +74,7 @@ void TemplateArgumentHasher::AddTemplateArgument(TemplateArgument TA) {
   case TemplateArgument::Expression:
     // If we meet expression in template argument, it implies
     // that the template is still dependent. It is meaningless
-    // to get a stable hash for the template. Bail out simply.
-    BailedOut = true;
+    // to get a stable hash for the template.
     break;
   case TemplateArgument::Pack:
     AddInteger(TA.pack_size());
@@ -110,10 +91,9 @@ void TemplateArgumentHasher::AddStructuralValue(const APValue &Value) {
 
   // 'APValue::Profile' uses pointer values to make hash for LValue and
   // MemberPointer, but they differ from one compiler invocation to another.
-  // It may be difficult to handle such cases. Bail out simply.
+  // It may be difficult to handle such cases.
 
   if (Kind == APValue::LValue || Kind == APValue::MemberPointer) {
-    BailedOut = true;
     return;
   }
 
@@ -135,14 +115,11 @@ void TemplateArgumentHasher::AddTemplateName(TemplateName Name) {
   case TemplateName::DependentTemplate:
   case TemplateName::SubstTemplateTemplateParm:
   case TemplateName::SubstTemplateTemplateParmPack:
-    BailedOut = true;
     break;
   case TemplateName::UsingTemplate: {
     UsingShadowDecl *USD = Name.getAsUsingShadowDecl();
     if (USD)
       AddDecl(USD->getTargetDecl());
-    else
-      BailedOut = true;
     break;
   }
   case TemplateName::DeducedTemplate:
@@ -167,7 +144,6 @@ void TemplateArgumentHasher::AddDeclarationName(DeclarationName Name) {
   case DeclarationName::ObjCZeroArgSelector:
   case DeclarationName::ObjCOneArgSelector:
   case DeclarationName::ObjCMultiArgSelector:
-    BailedOut = true;
     break;
   case DeclarationName::CXXConstructorName:
   case DeclarationName::CXXDestructorName:
@@ -194,16 +170,29 @@ void TemplateArgumentHasher::AddDeclarationName(DeclarationName Name) {
 void TemplateArgumentHasher::AddDecl(const Decl *D) {
   const NamedDecl *ND = dyn_cast<NamedDecl>(D);
   if (!ND) {
-    BailedOut = true;
     return;
   }
 
   AddDeclarationName(ND->getDeclName());
+
+  // If this was a specialization we should take into account its template
+  // arguments. This helps to reduce collisions coming when visiting template
+  // specialization types (eg. when processing type template arguments).
+  ArrayRef<TemplateArgument> Args;
+  if (auto *CTSD = dyn_cast<ClassTemplateSpecializationDecl>(D))
+    Args = CTSD->getTemplateArgs().asArray();
+  else if (auto *VTSD = dyn_cast<VarTemplateSpecializationDecl>(D))
+    Args = VTSD->getTemplateArgs().asArray();
+  else if (auto *FD = dyn_cast<FunctionDecl>(D))
+    if (FD->getTemplateSpecializationArgs())
+      Args = FD->getTemplateSpecializationArgs()->asArray();
+
+  for (auto &TA : Args)
+    AddTemplateArgument(TA);
 }
 
 void TemplateArgumentHasher::AddQualType(QualType T) {
   if (T.isNull()) {
-    BailedOut = true;
     return;
   }
   SplitQualType split = T.split();
@@ -213,7 +202,6 @@ void TemplateArgumentHasher::AddQualType(QualType T) {
 
 // Process a Type pointer.  Add* methods call back into TemplateArgumentHasher
 // while Visit* methods process the relevant parts of the Type.
-// Any unhandled type will make the hash computation bail out.
 class TypeVisitorHelper : public TypeVisitor<TypeVisitorHelper> {
   typedef TypeVisitor<TypeVisitorHelper> Inherited;
   llvm::FoldingSetNodeID &ID;
@@ -245,9 +233,6 @@ class TypeVisitorHelper : public TypeVisitor<TypeVisitorHelper> {
 
   void Visit(const Type *T) { Inherited::Visit(T); }
 
-  // Unhandled types. Bail out simply.
-  void VisitType(const Type *T) { Hash.setBailedOut(); }
-
   void VisitAdjustedType(const AdjustedType *T) {
     AddQualType(T->getOriginalType());
   }

ChuanqiXu9 · 2025-03-26T09:32:18Z

While I may not able to look into them in detail recently, it may be helpful to split this into seperate patches to review and to land.

hahnjo · 2025-03-26T09:51:12Z

While I may not able to look into them in detail recently, it may be helpful to split this into seperate patches to review and to land.

I initially considered this, but @vgvassilev said in root-project/root#17722 (comment) he prefers a single PR, also for external testing.

vgvassilev · 2025-03-26T09:57:54Z

@ilya-biryukov, would you mind giving this PR a test on your infrastructure and if it works maybe share some performance results?

hahnjo · 2025-03-26T10:07:42Z

Performance measurements with LLVM

I tested these patches for building LLVM itself with modules (LLVM_ENABLE_MODULES=ON). To work around #130795, I apply #131354 before building Clang. In terms of overall performance for the entire build, I'm not able to measure a difference in memory consumption because that is dominated by the linker. The run time performance is very noisy, so it's hard to make accurate statements but it looks unaffected as well.

I did some measurements for individual files, chosen by searching for large object files and excluding generated files. For each version, I first build LLVM completely to populate the module.cache and then delete and rebuild only one object file. Run time performance is not hugely affected, it seems to get slightly faster with this PR.

Maximum resident set size (kbytes) from /usr/bin/time -v:

object file	before*	`main`	this PR
`lib/Analysis/CMakeFiles/LLVMAnalysis.dir/ScalarEvolution.cpp.o`	543100	515184	445784
`lib/Passes/CMakeFiles/LLVMPasses.dir/PassBuilder.cpp.o`	923036	884160	805960
`lib/Transforms/IPO/CMakeFiles/LLVMipo.dir/AttributorAttributes.cpp.o`	639184	600076	522512
`lib/Transforms/Vectorize/CMakeFiles/LLVMVectorize.dir/SLPVectorizer.cpp.o`	876580	857404	776572

before*: reverting fb2c9d9, c5e4afe, 30ea0f0, 20e9049 on current main

Backport upstream PR llvm/llvm-project#133057

ilya-biryukov · 2025-03-26T16:25:51Z

@ilya-biryukov, would you mind giving this PR a test on your infrastructure and if it works maybe share some performance results?

Sure, let me try kicking it off. Note that our infrastructure is much better at detecting the compilations timing out than providing proper benchmarking at scale (there are a few targeted benchmarks too, though).
That means we're good and detecting big regressions, but won't be able to provide very reliable performance measurements.

I'll try to give you what we have, though.

ChuanqiXu9 · 2025-03-27T02:00:46Z

While I may not able to look into them in detail recently, it may be helpful to split this into seperate patches to review and to land.

I initially considered this, but @vgvassilev said in root-project/root#17722 (comment) he prefers a single PR, also for external testing.

Maybe you can test it with this and land it with different patches. So that we can revert one of them if either of them are problematic but other parts are fine.

ChuanqiXu9 · 2025-03-27T02:03:19Z

Complete only needed partial specializations: It is unclear (to me) why this needs to be done "for safety", but this change significantly improves the effectiveness of lazy loading.

This comes from the logic: if we have a partial template specialization A<int, T, U> and we need a full specialization for A<int, double, double>, we hope the partial specialization to be loaded

Backport upstream PR llvm/llvm-project#133057

hahnjo · 2025-03-28T09:34:27Z

Maybe you can test it with this and land it with different patches. So that we can revert one of them if either of them are problematic but other parts are fine.

I'm ok with pushing the commits one-by-one after the PR is reviewed, just let me know.

Complete only needed partial specializations: It is unclear (to me) why this needs to be done "for safety", but this change significantly improves the effectiveness of lazy loading.

This comes from the logic: if we have a partial template specialization A<int, T, U> and we need a full specialization for A<int, double, double>, we hope the partial specialization to be loaded

Sure, but in my understanding, that's not needed on the ASTReader side but is taken care of by Sema (?). For the following example:

//--- partial.cppm
export module partial;

export template <typename S, typename T, typename U>
struct Partial {
  static constexpr int Value() { return 0; }
};

export template <typename T, typename U>
struct Partial<int, T, U> {
  static constexpr int Value() { return 1; }
};

//--- partial.cpp
import partial;

static_assert(Partial<int, double, double>::Value() == 1);

(I assume that's what you have in mind?) I see two calls to ASTReader::CompleteRedeclChain (with this PR applied): The first asks for the full instantiation Partial<int, double, double> and regardless of what we load, the answer to the query is that it's not defined yet. The second asks for the partial specialization Partial<int, T, U> and then instantiation proceeds to do the right thing.

vgvassilev · 2025-03-28T09:38:15Z

While I may not able to look into them in detail recently, it may be helpful to split this into seperate patches to review and to land.

I initially considered this, but @vgvassilev said in root-project/root#17722 (comment) he prefers a single PR, also for external testing.

Maybe you can test it with this and land it with different patches. So that we can revert one of them if either of them are problematic but other parts are fine.

This is a relatively small patch focused on reducing the round trips to modules deserialization. I see this as an atomic change that if it goes in partially would defeat its purpose. What's the goal of a partial optimization?

ChuanqiXu9 · 2025-03-28T09:40:18Z

Maybe you can test it with this and land it with different patches. So that we can revert one of them if either of them are problematic but other parts are fine.

I'm ok with pushing the commits one-by-one after the PR is reviewed, just let me know.

Complete only needed partial specializations: It is unclear (to me) why this needs to be done "for safety", but this change significantly improves the effectiveness of lazy loading.

This comes from the logic: if we have a partial template specialization A<int, T, U> and we need a full specialization for A<int, double, double>, we hope the partial specialization to be loaded

Sure, but in my understanding, that's not needed on the ASTReader side but is taken care of by Sema (?). For the following example:
//--- partial.cppm
export module partial;

export template <typename S, typename T, typename U>
struct Partial {
  static constexpr int Value() { return 0; }
};

export template <typename T, typename U>
struct Partial<int, T, U> {
  static constexpr int Value() { return 1; }
};

//--- partial.cpp
import partial;

static_assert(Partial<int, double, double>::Value() == 1);
(I assume that's what you have in mind?) I see two calls to ASTReader::CompleteRedeclChain (with this PR applied): The first asks for the full instantiation Partial<int, double, double> and regardless of what we load, the answer to the query is that it's not defined yet. The second asks for the partial specialization Partial<int, T, U> and then instantiation proceeds to do the right thing.

If it works, I feel good with it.

ChuanqiXu9 · 2025-03-28T09:43:08Z

While I may not able to look into them in detail recently, it may be helpful to split this into seperate patches to review and to land.

I initially considered this, but @vgvassilev said in root-project/root#17722 (comment) he prefers a single PR, also for external testing.

Maybe you can test it with this and land it with different patches. So that we can revert one of them if either of them are problematic but other parts are fine.

This is a relatively small patch focused on reducing the round trips to modules deserialization. I see this as an atomic change that if it goes in partially would defeat its purpose. What's the goal of a partial optimization?

I think partial optimizations are optimization too. If these codes are not dependent on each other, it should be better to split them.

Given the scale of the patch, it may not be serious problem actually. I still think it is better to land them separately, but if you want to save some typings. I don't feel too bad.

vgvassilev · 2025-03-28T09:46:02Z

While I may not able to look into them in detail recently, it may be helpful to split this into seperate patches to review and to land.

I initially considered this, but @vgvassilev said in root-project/root#17722 (comment) he prefers a single PR, also for external testing.

Maybe you can test it with this and land it with different patches. So that we can revert one of them if either of them are problematic but other parts are fine.

This is a relatively small patch focused on reducing the round trips to modules deserialization. I see this as an atomic change that if it goes in partially would defeat its purpose. What's the goal of a partial optimization?

I think partial optimizations are optimization too. If these codes are not dependent on each other, it should be better to split them.

Given the scale of the patch, it may not be serious problem actually. I still think it is better to land them separately, but if you want to save some typings. I don't feel too bad.

Honestly I am more concerned about the tests that @ilya-biryukov is running. As long as they are happy I do not particularly care about commit style. Although it'd be weird to land 40 line patch in many commits :)

ChuanqiXu9 · 2025-03-28T09:53:04Z

While I may not able to look into them in detail recently, it may be helpful to split this into seperate patches to review and to land.

I initially considered this, but @vgvassilev said in root-project/root#17722 (comment) he prefers a single PR, also for external testing.

Maybe you can test it with this and land it with different patches. So that we can revert one of them if either of them are problematic but other parts are fine.

This is a relatively small patch focused on reducing the round trips to modules deserialization. I see this as an atomic change that if it goes in partially would defeat its purpose. What's the goal of a partial optimization?

I think partial optimizations are optimization too. If these codes are not dependent on each other, it should be better to split them.
Given the scale of the patch, it may not be serious problem actually. I still think it is better to land them separately, but if you want to save some typings. I don't feel too bad.

Honestly I am more concerned about the tests that @ilya-biryukov is running. As long as they are happy I do not particularly care about commit style. Although it'd be weird to land 40 line patch in many commits :)

I don't feel odd. I remember it is (or was) LLVM's policy that smaller patches are preferred : )

Backport upstream PR llvm/llvm-project#133057

ilya-biryukov · 2025-03-28T12:18:35Z

The small-scale benchmarks we had show 10% improvement in CPU and 23% improvement in memory usage for some compilations!

We did hit one compiler error that does not reproduce without modules, however:
error: use of overloaded operator '=' is ambiguous

We're in the process of getting a small reproducer (please bear with us, it takes some time) that we can share. @emaxx-google is working on it.

vgvassilev · 2025-03-28T12:44:17Z

The small-scale benchmarks we had show 10% improvement in CPU and 23% improvement in memory usage for some compilations!

That's very good news. I think we can further reduce these times. IIRC, we still deserialize declarations that we do not need. One of the places to look is the logic that kicks in when at module loading time:

llvm-project/clang/lib/Serialization/ASTReader.cpp

Line 3426 in 772173f

llvm::Error ASTReader::ReadASTBlock(ModuleFile &F,

We did hit one compiler error that does not reproduce without modules, however: error: use of overloaded operator '=' is ambiguous

Ouch. If that's the only issue on your infrastructure that's probably not so bad.

We're in the process of getting a small reproducer (please bear with us, it takes some time) that we can share. @emaxx-google is working on it.

emaxx-google · 2025-04-01T14:32:48Z

Here's the (almost) minimized reproducer for this error: use of overloaded operator '=' is ambiguous error: https://pastebin.com/Ux7TiQhw . (The minimization tool isn't perfect, we know, but we opted to share this result sooner rather than later.)

UPD: To run the reproducer, first "unpack" the archive into separate files using LLVM's split-file (e.g., split-file repro.txt repro/), then run the makefile: CLANG=path/to/clang make -k -C repro.

hahnjo · 2025-04-01T15:24:06Z

Here's the (almost) minimized reproducer for this error: use of overloaded operator '=' is ambiguous error: https://pastebin.com/Ux7TiQhw . (The minimization tool isn't perfect, we know, but we opted to share this result sooner rather than later.)

UPD: To run the reproducer, first "unpack" the archive into separate files using LLVM's split-file (e.g., split-file repro.txt repro/), then run the makefile: CLANG=path/to/clang make -k -C repro.

Thanks for the efforts! I only had a very quick look and it seems the paste is not complete. For example, head1.h has

class Class1 {
public:

and many other definitions look incomplete as well. Can you check if there was maybe a mistake?

emaxx-google · 2025-04-01T15:28:51Z

Here's the (almost) minimized reproducer for this error: use of overloaded operator '=' is ambiguous error: https://pastebin.com/Ux7TiQhw . (The minimization tool isn't perfect, we know, but we opted to share this result sooner rather than later.)
UPD: To run the reproducer, first "unpack" the archive into separate files using LLVM's split-file (e.g., split-file repro.txt repro/), then run the makefile: CLANG=path/to/clang make -k -C repro.

Thanks for the efforts! I only had a very quick look and it seems the paste is not complete. For example, head1.h has
class Class1 {
public:
and many other definitions look incomplete as well. Can you check if there was maybe a mistake?

That's how it looks like - the minimizer tool (based on C-Reduce/C-Vise) basically works by randomly removing chunks of code, which does often end up with code that looks corrupted. The tool could at least do a better job by merging/inlining unnecessary headers, macros, etc., but at least the output, as shared, should be sufficient to trigger the error in question (error: use of overloaded operator '=' is ambiguous). Let me know whether this works for you.

hahnjo · 2025-04-01T18:10:39Z

I had a closer look, but I get plenty of compile errors already on main - including

./head15.h:20:7: error: use of overloaded operator '=' is ambiguous (with operand types 'std::vector<absl::string_view>' and 'strings_internal::Splitter<typename strings_internal::SelectDelimiter<char>::type, AllowEmpty, std::string>' (aka 'Splitter<char, strings_internal::AllowEmpty, basic_string>'))

I haven't even applied the change in this PR - what am I missing?

ChuanqiXu9

BTW, I've landed the 3 patches (excluding the problematic reported by Google) internally and we've used it for months. It looks good.

hahnjo · 2025-07-04T06:38:59Z

I don't think we should merge a partial state of this PR for two reasons: 1. The three patches alone don't actually bring much benefit. When I measured in ROOT, we definitely needed the (currently) problematic one for lazy template loading to become really effective. 2. As commented before, I suspect that the patch only reveals a problem that is already there right now. We manage to trigger it with that change, but I think it must be understood first before changing the baseline.

vgvassilev · 2025-07-04T14:05:36Z

I don't think we should merge a partial state of this PR for two reasons: 1. The three patches alone don't actually bring much benefit. When I measured in ROOT, we definitely needed the (currently) problematic one for lazy template loading to become really effective. 2. As commented before, I suspect that the patch only reveals a problem that is already there right now. We manage to trigger it with that change, but I think it must be understood first before changing the baseline.

@hahnjo, if we have a working reproducer could we see if @zygoloid's comment helps us:

If this issue is indeed specific to conversion function templates, perhaps a path forward would be to disable some part of the lazy loading for only those templates to unblock this patch while the reason for the problem is being investigated?

That would mean that we exclude from lazy serialization the templated conversion functions.

hahnjo · 2025-07-04T14:10:11Z

@hahnjo, if we have a working reproducer could we see if @zygoloid's comment helps us:

If this issue is indeed specific to conversion function templates, perhaps a path forward would be to disable some part of the lazy loading for only those templates to unblock this patch while the reason for the problem is being investigated?

That would mean that we exclude from lazy serialization the templated conversion functions.

I replied to this suggestion in #133057 (comment). That still stands, I don't see that anything changed since May.

vgvassilev · 2025-08-13T13:16:38Z

@hahnjo, if we have a working reproducer could we see if @zygoloid's comment helps us:

If this issue is indeed specific to conversion function templates, perhaps a path forward would be to disable some part of the lazy loading for only those templates to unblock this patch while the reason for the problem is being investigated?

That would mean that we exclude from lazy serialization the templated conversion functions.

I replied to this suggestion in #133057 (comment). That still stands, I don't see that anything changed since May.

I'd like to move forward with this PR as is. It is too good to be held a hostage of one single failing example downstream. Any objections @ilya-biryukov, @emaxx-google?

hahnjo · 2025-08-13T13:19:30Z

I object to merging any of my changes. The failures must be debugged, understood, and fixed.

hahnjo · 2025-08-18T17:01:11Z

Invested some time today and I think the fix is #154158. It works for the reproducer posted in #133057 (comment), @ilya-biryukov or @emaxx-google if you have some time, would it be possible to test the two PRs together in your infrastructure?

With lazy template loading, it is possible to find non-canonical FunctionDecls, depending on when redecl chains are completed. This is a problem for templated conversion operators that would allow to call either the copy assignment or the move assignment operator. This ambiguity is resolved by isBetterReferenceBindingKind (called from CompareStandardConversionSequences) ranking rvalue refs over lvalue refs. Unfortunately, this fix is hard to test in isolation without the changes in llvm#133057 that make lazy template loading more likely to complete redecl chains at "inconvenient" times. The added reproducer passes before and after this commit, but would have failed with the proposed changes of the linked PR. Kudos to Maksim Ivanov for providing an initial version of the reproducer that I further simplified.

emaxx-google · 2025-08-25T23:13:57Z

Invested some time today and I think the fix is #154158. It works for the reproducer posted in #133057 (comment), @ilya-biryukov or @emaxx-google if you have some time, would it be possible to test the two PRs together in your infrastructure?

Unfortunately after repeating the testing on the google monorepo with these two changes cherry-picked, there's a lot of errors observed. Minimizing examples would take a few days, but to name some error messages:

'std::tuple<std::string_view>::operator=' from module '...' is not present in definition of 'std::tuple<std::string_view>' provided earlier
'std::pair<std::string, std::string>::operator=' from module '...' is not present in definition of 'std::pair<std::string, std::string>' provided earlier

hahnjo · 2025-08-26T19:06:54Z

Invested some time today and I think the fix is #154158. It works for the reproducer posted in #133057 (comment), @ilya-biryukov or @emaxx-google if you have some time, would it be possible to test the two PRs together in your infrastructure?

Unfortunately after repeating the testing on the google monorepo with these two changes cherry-picked, there's a lot of errors observed. Minimizing examples would take a few days, but to name some error messages:
* `'std::tuple<std::string_view>::operator=' from module '...' is not present in definition of 'std::tuple<std::string_view>' provided earlier`

* `'std::pair<std::string, std::string>::operator=' from module '...' is not present in definition of 'std::pair<std::string, std::string>' provided earlier`

Hm, and that's after or instead of the previous problems. Ie did #154158 fix a problem and we get past that point, now seeing "later" issues?

emaxx-google · 2025-08-27T01:43:55Z

Invested some time today and I think the fix is #154158. It works for the reproducer posted in #133057 (comment), @ilya-biryukov or @emaxx-google if you have some time, would it be possible to test the two PRs together in your infrastructure?

Unfortunately after repeating the testing on the google monorepo with these two changes cherry-picked, there's a lot of errors observed. Minimizing examples would take a few days, but to name some error messages:
* `'std::tuple<std::string_view>::operator=' from module '...' is not present in definition of 'std::tuple<std::string_view>' provided earlier`

* `'std::pair<std::string, std::string>::operator=' from module '...' is not present in definition of 'std::pair<std::string, std::string>' provided earlier`
Hm, and that's after or instead of the previous problems. Ie did #154158 fix a problem and we get past that point, now seeing "later" issues?

Yes it seems that the original target in question compiles successfully now. Also all new errors seem to come from a rather special build mode.

hahnjo · 2025-08-27T07:37:28Z

Yes it seems that the original target in question compiles successfully now.

Ok, that's progress at least. I will then land the PR with the fix some time soon and we can take that item off the list.

Also all new errors seem to come from a rather special build mode.

Hm, maybe you can share more information already before providing a reproducer, such as special compiler flags used?

(edit: looking at the call chain leading up to err_module_odr_violation_missing_decl and a bit of git history, I assume the "special build mode" is using -fno-skip-odr-check-in-gmf and that maybe doesn't play nicely with lazy template loading of templated member functions, such as operator= of std::pair and std::tuple?)

ChuanqiXu9 · 2025-08-27T07:43:58Z

Ok, that's progress at least. I will then land the PR with the fix some time soon and we can take that item off the list.

How can we make sure the new reported errors are not triggered by #154158?

hahnjo · 2025-08-30T10:13:20Z

Ok, that's progress at least. I will then land the PR with the fix some time soon and we can take that item off the list.

How can we make sure the new reported errors are not triggered by #154158?

#154158 "relaxes" a check so that more conversion functions will compare equal (and subsequently be disambiguated). Also the fact that @emaxx-google mentions a "special build mode" (which probably involves -fno-skip-odr-check-in-gmf) makes me suspect that we're dealing with a different error here.

ChuanqiXu9 · 2025-09-01T01:51:47Z

Ok, that's progress at least. I will then land the PR with the fix some time soon and we can take that item off the list.

How can we make sure the new reported errors are not triggered by #154158?

#154158 "relaxes" a check so that more conversion functions will compare equal (and subsequently be disambiguated). Also the fact that @emaxx-google mentions a "special build mode" (which probably involves -fno-skip-odr-check-in-gmf) makes me suspect that we're dealing with a different error here.

I believe Google was not using -fno-skip-odr-check-in-gmf. It is specific to C++20 named modules. #154158 may trigger other loading, so I am not sure they are not relevent.

hahnjo · 2025-09-04T10:50:52Z

Hm, then I need more information how to reproduce before I can meaningfully look into it...

hahnjo · 2025-09-15T14:46:46Z

@ilya-biryukov @emaxx-google ping, any updates on this? I would be curious to learn more details about the "rather special build mode" and, if possible, if #154158 on its own is good to go

emaxx-google · 2025-09-24T00:24:27Z

@ilya-biryukov @emaxx-google ping, any updates on this? I would be curious to learn more details about the "rather special build mode" and, if possible, if #154158 on its own is good to go

I asked for help from the corresponding team a couple of weeks ago, but we haven't got anything that we can share yet, sorry.

With lazy template loading, it is possible to find non-canonical FunctionDecls, depending on when redecl chains are completed. This is a problem for templated conversion operators that would allow to call either the copy assignment or the move assignment operator. This ambiguity is resolved by isBetterReferenceBindingKind (called from CompareStandardConversionSequences) ranking rvalue refs over lvalue refs. Unfortunately, this fix is hard to test in isolation without the changes in #133057 that make lazy template loading more likely to complete redecl chains at "inconvenient" times. The added reproducer passes before and after this commit, but would have failed with the proposed changes of the linked PR. Kudos to Maksim Ivanov for providing an initial version of the reproducer that I further simplified.

With lazy template loading, it is possible to find non-canonical FunctionDecls, depending on when redecl chains are completed. This is a problem for templated conversion operators that would allow to call either the copy assignment or the move assignment operator. This ambiguity is resolved by isBetterReferenceBindingKind (called from CompareStandardConversionSequences) ranking rvalue refs over lvalue refs. Unfortunately, this fix is hard to test in isolation without the changes in llvm/llvm-project#133057 that make lazy template loading more likely to complete redecl chains at "inconvenient" times. The added reproducer passes before and after this commit, but would have failed with the proposed changes of the linked PR. Kudos to Maksim Ivanov for providing an initial version of the reproducer that I further simplified.

With lazy template loading, it is possible to find non-canonical FunctionDecls, depending on when redecl chains are completed. This is a problem for templated conversion operators that would allow to call either the copy assignment or the move assignment operator. This ambiguity is resolved by isBetterReferenceBindingKind (called from CompareStandardConversionSequences) ranking rvalue refs over lvalue refs. Unfortunately, this fix is hard to test in isolation without the changes in llvm#133057 that make lazy template loading more likely to complete redecl chains at "inconvenient" times. The added reproducer passes before and after this commit, but would have failed with the proposed changes of the linked PR. Kudos to Maksim Ivanov for providing an initial version of the reproducer that I further simplified.

The code is applied from ODRHash::AddDecl with the reasoning given in the comment, to reduce collisions. This was particularly visible with STL types templated on std::pair where its template arguments were not taken into account.

It is unclear (to me) why this needs to be done "for safety", but this change significantly improves the effectiveness of lazy loading.

Similar as the last commit, it is unclear why we need to load all specializations, including non-partial ones, when we have a TPL.

While it is correct to assign a single fixed hash to all template arguments, it can reduce the effectiveness of lazy loading and is not actually needed: we are allowed to ignore parts that cannot be handled because they will be analogously ignored by all hashings.

hahnjo · 2025-10-18T11:54:41Z

I rebased the changes after #154158. Would it be possible to re-test / give information about the observed failures? I understand (public) reproducers take time, but it would be great to get at least something to understand how to trigger...

hahnjo requested review from ChuanqiXu9, ilya-biryukov and vgvassilev March 26, 2025 09:26

llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules labels Mar 26, 2025

hahnjo added a commit to devajithvs/root that referenced this pull request Mar 26, 2025

[llvm-project] Fix lazy template loading

77dcc83

Backport upstream PR llvm/llvm-project#133057

hahnjo added a commit to devajithvs/root that referenced this pull request Mar 26, 2025

[llvm-project] Fix lazy template loading

f4e61d3

Backport upstream PR llvm/llvm-project#133057

hahnjo added a commit to devajithvs/root that referenced this pull request Mar 28, 2025

[llvm-project] Fix lazy template loading

599e306

Backport upstream PR llvm/llvm-project#133057

hahnjo added a commit to devajithvs/root that referenced this pull request Mar 28, 2025

[llvm-project] Fix lazy template loading

1b65dcd

Backport upstream PR llvm/llvm-project#133057

ChuanqiXu9 approved these changes Jul 4, 2025

View reviewed changes

hahnjo marked this pull request as draft July 4, 2025 06:36

hahnjo mentioned this pull request Aug 19, 2025

[Sema] Compare canonical conversion function #154158

Merged

hahnjo force-pushed the clang-modules-lazy branch from 5b401ca to 5b5e242 Compare August 19, 2025 12:26

hahnjo added 4 commits October 18, 2025 13:37

[Serialization] Hash inner template arguments

cae06d1

The code is applied from ODRHash::AddDecl with the reasoning given in the comment, to reduce collisions. This was particularly visible with STL types templated on std::pair where its template arguments were not taken into account.

[Serialization] Complete only needed partial specializations

c8183e7

It is unclear (to me) why this needs to be done "for safety", but this change significantly improves the effectiveness of lazy loading.

[Serialization] Load only needed partial specializations

3a272ba

Similar as the last commit, it is unclear why we need to load all specializations, including non-partial ones, when we have a TPL.

hahnjo force-pushed the clang-modules-lazy branch from 5b5e242 to c44e2b2 Compare October 18, 2025 11:48

[Serialization] Fix lazy template loading #133057

Are you sure you want to change the base?

[Serialization] Fix lazy template loading #133057

Uh oh!

Conversation

hahnjo commented Mar 26, 2025

Uh oh!

llvmbot commented Mar 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ChuanqiXu9 commented Mar 26, 2025

Uh oh!

hahnjo commented Mar 26, 2025

Uh oh!

vgvassilev commented Mar 26, 2025

Uh oh!

hahnjo commented Mar 26, 2025

Performance measurements with LLVM

Uh oh!

ilya-biryukov commented Mar 26, 2025

Uh oh!

ChuanqiXu9 commented Mar 27, 2025

Uh oh!

ChuanqiXu9 commented Mar 27, 2025

Uh oh!

hahnjo commented Mar 28, 2025

Uh oh!

vgvassilev commented Mar 28, 2025

Uh oh!

ChuanqiXu9 commented Mar 28, 2025

Uh oh!

ChuanqiXu9 commented Mar 28, 2025

Uh oh!

vgvassilev commented Mar 28, 2025

Uh oh!

ChuanqiXu9 commented Mar 28, 2025

Uh oh!

ilya-biryukov commented Mar 28, 2025

Uh oh!

vgvassilev commented Mar 28, 2025

Uh oh!

emaxx-google commented Apr 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hahnjo commented Apr 1, 2025

Uh oh!

emaxx-google commented Apr 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hahnjo commented Apr 1, 2025

Uh oh!

ChuanqiXu9 left a comment

Choose a reason for hiding this comment

Uh oh!

hahnjo commented Jul 4, 2025

Uh oh!

vgvassilev commented Jul 4, 2025

Uh oh!

hahnjo commented Jul 4, 2025

Uh oh!

vgvassilev commented Aug 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hahnjo commented Aug 13, 2025

Uh oh!

hahnjo commented Aug 18, 2025

Uh oh!

emaxx-google commented Aug 25, 2025

Uh oh!

hahnjo commented Aug 26, 2025

Uh oh!

emaxx-google commented Aug 27, 2025

Uh oh!

hahnjo commented Aug 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ChuanqiXu9 commented Aug 27, 2025

Uh oh!

hahnjo commented Aug 30, 2025

Uh oh!

llvmbot commented Mar 26, 2025 •

edited

Loading

emaxx-google commented Apr 1, 2025 •

edited

Loading

emaxx-google commented Apr 1, 2025 •

edited

Loading

vgvassilev commented Aug 13, 2025 •

edited

Loading

hahnjo commented Aug 27, 2025 •

edited

Loading