[Serialization] Fix lazy template loading #133057

hahnjo · 2025-03-26T09:26:51Z

Hash inner template arguments: The code is applied from ODRHash::AddDecl with the reasoning given in the comment, to reduce collisions. This was particularly visible with STL types templated on std::pair where its template arguments were not taken into account.
Complete only needed partial specializations: It is unclear (to me) why this needs to be done "for safety", but this change significantly improves the effectiveness of lazy loading.
Load only needed partial specializations: Similar as the last commit, it is unclear why we need to load all specializations, including non-partial ones, when we have a TPL.
Remove bail-out logic in TemplateArgumentHasher: While it is correct to assign a single fixed hash to all template
arguments, it can reduce the effectiveness of lazy loading and is not actually needed: we are allowed to ignore parts that cannot be handled because they will be analogously ignored by all hashings.

llvmbot · 2025-03-26T09:27:35Z

@llvm/pr-subscribers-clang-modules

@llvm/pr-subscribers-clang

Author: Jonas Hahnfeld (hahnjo)

Changes

Hash inner template arguments: The code is applied from ODRHash::AddDecl with the reasoning given in the comment, to reduce collisions. This was particularly visible with STL types templated on std::pair where its template arguments were not taken into account.
Complete only needed partial specializations: It is unclear (to me) why this needs to be done "for safety", but this change significantly improves the effectiveness of lazy loading.
Load only needed partial specializations: Similar as the last commit, it is unclear why we need to load all specializations, including non-partial ones, when we have a TPL.
Remove bail-out logic in TemplateArgumentHasher: While it is correct to assign a single fixed hash to all template
arguments, it can reduce the effectiveness of lazy loading and is not actually needed: we are allowed to ignore parts that cannot be handled because they will be analogously ignored by all hashings.

Full diff: https://github.com/llvm/llvm-project/pull/133057.diff

3 Files Affected:

(modified) clang/lib/AST/DeclTemplate.cpp (-6)
(modified) clang/lib/Serialization/ASTReader.cpp (+2-8)
(modified) clang/lib/Serialization/TemplateArgumentHasher.cpp (+18-33)

diff --git a/clang/lib/AST/DeclTemplate.cpp b/clang/lib/AST/DeclTemplate.cpp
index c0f5be51db5f3..8560c3928aa84 100644
--- a/clang/lib/AST/DeclTemplate.cpp
+++ b/clang/lib/AST/DeclTemplate.cpp
@@ -367,12 +367,6 @@ bool RedeclarableTemplateDecl::loadLazySpecializationsImpl(
   if (!ExternalSource)
     return false;
 
-  // If TPL is not null, it implies that we're loading specializations for
-  // partial templates. We need to load all specializations in such cases.
-  if (TPL)
-    return ExternalSource->LoadExternalSpecializations(this->getCanonicalDecl(),
-                                                       /*OnlyPartial=*/false);
-
   return ExternalSource->LoadExternalSpecializations(this->getCanonicalDecl(),
                                                      Args);
 }
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index 0cd2cedb48dd9..eb0496c97eb3b 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -7891,14 +7891,8 @@ void ASTReader::CompleteRedeclChain(const Decl *D) {
     }
   }
 
-  if (Template) {
-    // For partitial specialization, load all the specializations for safety.
-    if (isa<ClassTemplatePartialSpecializationDecl,
-            VarTemplatePartialSpecializationDecl>(D))
-      Template->loadLazySpecializationsImpl();
-    else
-      Template->loadLazySpecializationsImpl(Args);
-  }
+  if (Template)
+    Template->loadLazySpecializationsImpl(Args);
 }
 
 CXXCtorInitializer **
diff --git a/clang/lib/Serialization/TemplateArgumentHasher.cpp b/clang/lib/Serialization/TemplateArgumentHasher.cpp
index 3c7177b83ba52..5fb363c4ab148 100644
--- a/clang/lib/Serialization/TemplateArgumentHasher.cpp
+++ b/clang/lib/Serialization/TemplateArgumentHasher.cpp
@@ -21,17 +21,6 @@ using namespace clang;
 namespace {
 
 class TemplateArgumentHasher {
-  // If we bail out during the process of calculating hash values for
-  // template arguments for any reason. We're allowed to do it since
-  // TemplateArgumentHasher are only required to give the same hash value
-  // for the same template arguments, but not required to give different
-  // hash value for different template arguments.
-  //
-  // So in the worst case, it is still a valid implementation to give all
-  // inputs the same BailedOutValue as output.
-  bool BailedOut = false;
-  static constexpr unsigned BailedOutValue = 0x12345678;
-
   llvm::FoldingSetNodeID ID;
 
 public:
@@ -41,14 +30,7 @@ class TemplateArgumentHasher {
 
   void AddInteger(unsigned V) { ID.AddInteger(V); }
 
-  unsigned getValue() {
-    if (BailedOut)
-      return BailedOutValue;
-
-    return ID.computeStableHash();
-  }
-
-  void setBailedOut() { BailedOut = true; }
+  unsigned getValue() { return ID.computeStableHash(); }
 
   void AddType(const Type *T);
   void AddQualType(QualType T);
@@ -92,8 +74,7 @@ void TemplateArgumentHasher::AddTemplateArgument(TemplateArgument TA) {
   case TemplateArgument::Expression:
     // If we meet expression in template argument, it implies
     // that the template is still dependent. It is meaningless
-    // to get a stable hash for the template. Bail out simply.
-    BailedOut = true;
+    // to get a stable hash for the template.
     break;
   case TemplateArgument::Pack:
     AddInteger(TA.pack_size());
@@ -110,10 +91,9 @@ void TemplateArgumentHasher::AddStructuralValue(const APValue &Value) {
 
   // 'APValue::Profile' uses pointer values to make hash for LValue and
   // MemberPointer, but they differ from one compiler invocation to another.
-  // It may be difficult to handle such cases. Bail out simply.
+  // It may be difficult to handle such cases.
 
   if (Kind == APValue::LValue || Kind == APValue::MemberPointer) {
-    BailedOut = true;
     return;
   }
 
@@ -135,14 +115,11 @@ void TemplateArgumentHasher::AddTemplateName(TemplateName Name) {
   case TemplateName::DependentTemplate:
   case TemplateName::SubstTemplateTemplateParm:
   case TemplateName::SubstTemplateTemplateParmPack:
-    BailedOut = true;
     break;
   case TemplateName::UsingTemplate: {
     UsingShadowDecl *USD = Name.getAsUsingShadowDecl();
     if (USD)
       AddDecl(USD->getTargetDecl());
-    else
-      BailedOut = true;
     break;
   }
   case TemplateName::DeducedTemplate:
@@ -167,7 +144,6 @@ void TemplateArgumentHasher::AddDeclarationName(DeclarationName Name) {
   case DeclarationName::ObjCZeroArgSelector:
   case DeclarationName::ObjCOneArgSelector:
   case DeclarationName::ObjCMultiArgSelector:
-    BailedOut = true;
     break;
   case DeclarationName::CXXConstructorName:
   case DeclarationName::CXXDestructorName:
@@ -194,16 +170,29 @@ void TemplateArgumentHasher::AddDeclarationName(DeclarationName Name) {
 void TemplateArgumentHasher::AddDecl(const Decl *D) {
   const NamedDecl *ND = dyn_cast<NamedDecl>(D);
   if (!ND) {
-    BailedOut = true;
     return;
   }
 
   AddDeclarationName(ND->getDeclName());
+
+  // If this was a specialization we should take into account its template
+  // arguments. This helps to reduce collisions coming when visiting template
+  // specialization types (eg. when processing type template arguments).
+  ArrayRef<TemplateArgument> Args;
+  if (auto *CTSD = dyn_cast<ClassTemplateSpecializationDecl>(D))
+    Args = CTSD->getTemplateArgs().asArray();
+  else if (auto *VTSD = dyn_cast<VarTemplateSpecializationDecl>(D))
+    Args = VTSD->getTemplateArgs().asArray();
+  else if (auto *FD = dyn_cast<FunctionDecl>(D))
+    if (FD->getTemplateSpecializationArgs())
+      Args = FD->getTemplateSpecializationArgs()->asArray();
+
+  for (auto &TA : Args)
+    AddTemplateArgument(TA);
 }
 
 void TemplateArgumentHasher::AddQualType(QualType T) {
   if (T.isNull()) {
-    BailedOut = true;
     return;
   }
   SplitQualType split = T.split();
@@ -213,7 +202,6 @@ void TemplateArgumentHasher::AddQualType(QualType T) {
 
 // Process a Type pointer.  Add* methods call back into TemplateArgumentHasher
 // while Visit* methods process the relevant parts of the Type.
-// Any unhandled type will make the hash computation bail out.
 class TypeVisitorHelper : public TypeVisitor<TypeVisitorHelper> {
   typedef TypeVisitor<TypeVisitorHelper> Inherited;
   llvm::FoldingSetNodeID &ID;
@@ -245,9 +233,6 @@ class TypeVisitorHelper : public TypeVisitor<TypeVisitorHelper> {
 
   void Visit(const Type *T) { Inherited::Visit(T); }
 
-  // Unhandled types. Bail out simply.
-  void VisitType(const Type *T) { Hash.setBailedOut(); }
-
   void VisitAdjustedType(const AdjustedType *T) {
     AddQualType(T->getOriginalType());
   }

ChuanqiXu9 · 2025-03-26T09:32:18Z

While I may not able to look into them in detail recently, it may be helpful to split this into seperate patches to review and to land.

hahnjo · 2025-03-26T09:51:12Z

While I may not able to look into them in detail recently, it may be helpful to split this into seperate patches to review and to land.

I initially considered this, but @vgvassilev said in root-project/root#17722 (comment) he prefers a single PR, also for external testing.

vgvassilev · 2025-03-26T09:57:54Z

@ilya-biryukov, would you mind giving this PR a test on your infrastructure and if it works maybe share some performance results?

hahnjo · 2025-03-26T10:07:42Z

Performance measurements with LLVM

I tested these patches for building LLVM itself with modules (LLVM_ENABLE_MODULES=ON). To work around #130795, I apply #131354 before building Clang. In terms of overall performance for the entire build, I'm not able to measure a difference in memory consumption because that is dominated by the linker. The run time performance is very noisy, so it's hard to make accurate statements but it looks unaffected as well.

I did some measurements for individual files, chosen by searching for large object files and excluding generated files. For each version, I first build LLVM completely to populate the module.cache and then delete and rebuild only one object file. Run time performance is not hugely affected, it seems to get slightly faster with this PR.

Maximum resident set size (kbytes) from /usr/bin/time -v:

object file	before*	`main`	this PR
`lib/Analysis/CMakeFiles/LLVMAnalysis.dir/ScalarEvolution.cpp.o`	543100	515184	445784
`lib/Passes/CMakeFiles/LLVMPasses.dir/PassBuilder.cpp.o`	923036	884160	805960
`lib/Transforms/IPO/CMakeFiles/LLVMipo.dir/AttributorAttributes.cpp.o`	639184	600076	522512
`lib/Transforms/Vectorize/CMakeFiles/LLVMVectorize.dir/SLPVectorizer.cpp.o`	876580	857404	776572

before*: reverting fb2c9d9, c5e4afe, 30ea0f0, 20e9049 on current main

Backport upstream PR llvm/llvm-project#133057

ilya-biryukov · 2025-03-26T16:25:51Z

@ilya-biryukov, would you mind giving this PR a test on your infrastructure and if it works maybe share some performance results?

Sure, let me try kicking it off. Note that our infrastructure is much better at detecting the compilations timing out than providing proper benchmarking at scale (there are a few targeted benchmarks too, though).
That means we're good and detecting big regressions, but won't be able to provide very reliable performance measurements.

I'll try to give you what we have, though.

ChuanqiXu9 · 2025-03-27T02:00:46Z

While I may not able to look into them in detail recently, it may be helpful to split this into seperate patches to review and to land.

I initially considered this, but @vgvassilev said in root-project/root#17722 (comment) he prefers a single PR, also for external testing.

Maybe you can test it with this and land it with different patches. So that we can revert one of them if either of them are problematic but other parts are fine.

ChuanqiXu9 · 2025-03-27T02:03:19Z

Complete only needed partial specializations: It is unclear (to me) why this needs to be done "for safety", but this change significantly improves the effectiveness of lazy loading.

This comes from the logic: if we have a partial template specialization A<int, T, U> and we need a full specialization for A<int, double, double>, we hope the partial specialization to be loaded

Backport upstream PR llvm/llvm-project#133057

hahnjo · 2025-03-28T09:34:27Z

Maybe you can test it with this and land it with different patches. So that we can revert one of them if either of them are problematic but other parts are fine.

I'm ok with pushing the commits one-by-one after the PR is reviewed, just let me know.

Complete only needed partial specializations: It is unclear (to me) why this needs to be done "for safety", but this change significantly improves the effectiveness of lazy loading.

This comes from the logic: if we have a partial template specialization A<int, T, U> and we need a full specialization for A<int, double, double>, we hope the partial specialization to be loaded

Sure, but in my understanding, that's not needed on the ASTReader side but is taken care of by Sema (?). For the following example:

//--- partial.cppm
export module partial;

export template <typename S, typename T, typename U>
struct Partial {
  static constexpr int Value() { return 0; }
};

export template <typename T, typename U>
struct Partial<int, T, U> {
  static constexpr int Value() { return 1; }
};

//--- partial.cpp
import partial;

static_assert(Partial<int, double, double>::Value() == 1);

(I assume that's what you have in mind?) I see two calls to ASTReader::CompleteRedeclChain (with this PR applied): The first asks for the full instantiation Partial<int, double, double> and regardless of what we load, the answer to the query is that it's not defined yet. The second asks for the partial specialization Partial<int, T, U> and then instantiation proceeds to do the right thing.

vgvassilev · 2025-03-28T09:38:15Z

While I may not able to look into them in detail recently, it may be helpful to split this into seperate patches to review and to land.

I initially considered this, but @vgvassilev said in root-project/root#17722 (comment) he prefers a single PR, also for external testing.

Maybe you can test it with this and land it with different patches. So that we can revert one of them if either of them are problematic but other parts are fine.

This is a relatively small patch focused on reducing the round trips to modules deserialization. I see this as an atomic change that if it goes in partially would defeat its purpose. What's the goal of a partial optimization?

ChuanqiXu9 · 2025-03-28T09:40:18Z

Maybe you can test it with this and land it with different patches. So that we can revert one of them if either of them are problematic but other parts are fine.

I'm ok with pushing the commits one-by-one after the PR is reviewed, just let me know.

Complete only needed partial specializations: It is unclear (to me) why this needs to be done "for safety", but this change significantly improves the effectiveness of lazy loading.

This comes from the logic: if we have a partial template specialization A<int, T, U> and we need a full specialization for A<int, double, double>, we hope the partial specialization to be loaded

Sure, but in my understanding, that's not needed on the ASTReader side but is taken care of by Sema (?). For the following example:
//--- partial.cppm
export module partial;

export template <typename S, typename T, typename U>
struct Partial {
  static constexpr int Value() { return 0; }
};

export template <typename T, typename U>
struct Partial<int, T, U> {
  static constexpr int Value() { return 1; }
};

//--- partial.cpp
import partial;

static_assert(Partial<int, double, double>::Value() == 1);
(I assume that's what you have in mind?) I see two calls to ASTReader::CompleteRedeclChain (with this PR applied): The first asks for the full instantiation Partial<int, double, double> and regardless of what we load, the answer to the query is that it's not defined yet. The second asks for the partial specialization Partial<int, T, U> and then instantiation proceeds to do the right thing.

If it works, I feel good with it.

ChuanqiXu9 · 2025-03-28T09:43:08Z

While I may not able to look into them in detail recently, it may be helpful to split this into seperate patches to review and to land.

I initially considered this, but @vgvassilev said in root-project/root#17722 (comment) he prefers a single PR, also for external testing.

Maybe you can test it with this and land it with different patches. So that we can revert one of them if either of them are problematic but other parts are fine.

This is a relatively small patch focused on reducing the round trips to modules deserialization. I see this as an atomic change that if it goes in partially would defeat its purpose. What's the goal of a partial optimization?

I think partial optimizations are optimization too. If these codes are not dependent on each other, it should be better to split them.

Given the scale of the patch, it may not be serious problem actually. I still think it is better to land them separately, but if you want to save some typings. I don't feel too bad.

vgvassilev · 2025-03-28T09:46:02Z

While I may not able to look into them in detail recently, it may be helpful to split this into seperate patches to review and to land.

I initially considered this, but @vgvassilev said in root-project/root#17722 (comment) he prefers a single PR, also for external testing.

Maybe you can test it with this and land it with different patches. So that we can revert one of them if either of them are problematic but other parts are fine.

This is a relatively small patch focused on reducing the round trips to modules deserialization. I see this as an atomic change that if it goes in partially would defeat its purpose. What's the goal of a partial optimization?

I think partial optimizations are optimization too. If these codes are not dependent on each other, it should be better to split them.

Given the scale of the patch, it may not be serious problem actually. I still think it is better to land them separately, but if you want to save some typings. I don't feel too bad.

Honestly I am more concerned about the tests that @ilya-biryukov is running. As long as they are happy I do not particularly care about commit style. Although it'd be weird to land 40 line patch in many commits :)

ChuanqiXu9 · 2025-03-28T09:53:04Z

While I may not able to look into them in detail recently, it may be helpful to split this into seperate patches to review and to land.

I initially considered this, but @vgvassilev said in root-project/root#17722 (comment) he prefers a single PR, also for external testing.

Maybe you can test it with this and land it with different patches. So that we can revert one of them if either of them are problematic but other parts are fine.

This is a relatively small patch focused on reducing the round trips to modules deserialization. I see this as an atomic change that if it goes in partially would defeat its purpose. What's the goal of a partial optimization?

I think partial optimizations are optimization too. If these codes are not dependent on each other, it should be better to split them.
Given the scale of the patch, it may not be serious problem actually. I still think it is better to land them separately, but if you want to save some typings. I don't feel too bad.

Honestly I am more concerned about the tests that @ilya-biryukov is running. As long as they are happy I do not particularly care about commit style. Although it'd be weird to land 40 line patch in many commits :)

I don't feel odd. I remember it is (or was) LLVM's policy that smaller patches are preferred : )

Backport upstream PR llvm/llvm-project#133057

ilya-biryukov · 2025-03-28T12:18:35Z

The small-scale benchmarks we had show 10% improvement in CPU and 23% improvement in memory usage for some compilations!

We did hit one compiler error that does not reproduce without modules, however:
error: use of overloaded operator '=' is ambiguous

We're in the process of getting a small reproducer (please bear with us, it takes some time) that we can share. @emaxx-google is working on it.

vgvassilev · 2025-03-28T12:44:17Z

The small-scale benchmarks we had show 10% improvement in CPU and 23% improvement in memory usage for some compilations!

That's very good news. I think we can further reduce these times. IIRC, we still deserialize declarations that we do not need. One of the places to look is the logic that kicks in when at module loading time:

llvm-project/clang/lib/Serialization/ASTReader.cpp

Line 3426 in 772173f

llvm::Error ASTReader::ReadASTBlock(ModuleFile &F,

We did hit one compiler error that does not reproduce without modules, however: error: use of overloaded operator '=' is ambiguous

Ouch. If that's the only issue on your infrastructure that's probably not so bad.

We're in the process of getting a small reproducer (please bear with us, it takes some time) that we can share. @emaxx-google is working on it.

emaxx-google · 2025-04-01T14:32:48Z

Here's the (almost) minimized reproducer for this error: use of overloaded operator '=' is ambiguous error: https://pastebin.com/Ux7TiQhw . (The minimization tool isn't perfect, we know, but we opted to share this result sooner rather than later.)

UPD: To run the reproducer, first "unpack" the archive into separate files using LLVM's split-file (e.g., split-file repro.txt repro/), then run the makefile: CLANG=path/to/clang make -k -C repro.

hahnjo · 2025-04-01T15:24:06Z

Here's the (almost) minimized reproducer for this error: use of overloaded operator '=' is ambiguous error: https://pastebin.com/Ux7TiQhw . (The minimization tool isn't perfect, we know, but we opted to share this result sooner rather than later.)

UPD: To run the reproducer, first "unpack" the archive into separate files using LLVM's split-file (e.g., split-file repro.txt repro/), then run the makefile: CLANG=path/to/clang make -k -C repro.

Thanks for the efforts! I only had a very quick look and it seems the paste is not complete. For example, head1.h has

class Class1 {
public:

and many other definitions look incomplete as well. Can you check if there was maybe a mistake?

emaxx-google · 2025-04-01T15:28:51Z

Here's the (almost) minimized reproducer for this error: use of overloaded operator '=' is ambiguous error: https://pastebin.com/Ux7TiQhw . (The minimization tool isn't perfect, we know, but we opted to share this result sooner rather than later.)
UPD: To run the reproducer, first "unpack" the archive into separate files using LLVM's split-file (e.g., split-file repro.txt repro/), then run the makefile: CLANG=path/to/clang make -k -C repro.

Thanks for the efforts! I only had a very quick look and it seems the paste is not complete. For example, head1.h has
class Class1 {
public:
and many other definitions look incomplete as well. Can you check if there was maybe a mistake?

That's how it looks like - the minimizer tool (based on C-Reduce/C-Vise) basically works by randomly removing chunks of code, which does often end up with code that looks corrupted. The tool could at least do a better job by merging/inlining unnecessary headers, macros, etc., but at least the output, as shared, should be sufficient to trigger the error in question (error: use of overloaded operator '=' is ambiguous). Let me know whether this works for you.

hahnjo · 2025-04-01T18:10:39Z

I had a closer look, but I get plenty of compile errors already on main - including

./head15.h:20:7: error: use of overloaded operator '=' is ambiguous (with operand types 'std::vector<absl::string_view>' and 'strings_internal::Splitter<typename strings_internal::SelectDelimiter<char>::type, AllowEmpty, std::string>' (aka 'Splitter<char, strings_internal::AllowEmpty, basic_string>'))

I haven't even applied the change in this PR - what am I missing?

alexfh · 2025-11-24T11:37:07Z

Ok, if possible maybe you can keep it running in parallel, we will still need a reproducer as a regression test later on.

The test case is down to ~130MB and 9K files, but still nothing I can share directly.

I've done some manual build graph cleanup and got the makefile in the test case from a few thousand down to a handful of rules. The reduction started running a bit faster since then: the size of inputs is now ~65MB spread across ~4.5K files. Looking at how it goes, I may have a shareable test case by the end of the week.

hahnjo · 2025-11-27T10:44:54Z

I could allocate some time to debugging this, and here's what I got.

Thanks for trying all that!

It looks like it happens as you suspected ("D->hasExternalLexicalStorage() returns false and we skip the call to CompleteType").

Ok, at least the understanding of the problem is progressing.

I tried changing that condition to getExternalSource() && !D->getDefinition(), but it doesn't help - I'm still getting the same assertion:

Too bad, that would have been too easy I guess... Now I wonder if we maybe need to change the order of operations, ie first complete the redecl chain, then notice that we can get the definition externally, and then actually go and do that. Conceptually something like

  // Complete the redecl chain (if necessary).
  D = D->getMostRecentDecl();

  if (D->hasExternalLexicalStorage() && !D->getDefinition())
    getExternalSource()->CompleteType(const_cast<RecordDecl*>(D));

Not sure if that changes anything (at least it doesn't fail Clang tests when applied to current master on its own), but as I said earlier "redecl chains are a bit black magic for me". But if we are getting close to getting a sharable reproducer, I will be able to have a look myself.

Please let me know, if you have any more ideas I could try.

Only if you have more time to spare, the second part of point 4. above, ie which of the commits is the culprit: Is it a single one and everything is fine if you just exclude / revert that one?

alexfh · 2025-11-28T02:37:42Z

I've done some manual build graph cleanup and got the makefile in the test case from a few thousand down to a handful of rules. The reduction started running a bit faster since then: the size of inputs is now ~65MB spread across ~4.5K files. Looking at how it goes, I may have a shareable test case by the end of the week.

10MB and 2.7K files

alexfh · 2025-11-29T02:32:40Z

After manually cleaning up the ~10KB that CVise could squeeze out of the initial 400+MB, I got this:

//--- a.cppmap
module "a" {
header "a.h"
}
//--- b.cppmap
module "b" {
header "b.h"
}
//--- d.cppmap
module "d" {
header "d.h"
}
//--- stl.cppmap
module "stl" {
header "stl.h"
}
//--- a.h
#include "b.h"
namespace {
void a(absl::set<char> c) {
  absl::map<int> a;
  absl::set<int> b;
  c.end();
  c.contains();
}
}  // namespace
//--- b.h
#include "c.h"
void b() { absl::set<char> x; }
//--- c.h
#include "stl.h"
inline namespace internal {
template <typename>
class set_base {
 public:
  struct iterator {
    void u() const;
  };
  iterator end() const { return {}; }
  void contains() const { end().u(); }
  pair<iterator> e();
  template <typename H>
  friend H h();
};
template <typename>
class map_base {
  template <typename H>
  friend H h();
};
}  // namespace internal
namespace absl {
template <typename T>
class set : public set_base<T> {};
template <typename T>
class map : public map_base<T> {};
}  // namespace absl
//--- d.h
#include "c.h"
void d() { absl::set<char> x; }
//--- stl.h
#ifndef _STL_H_
#define _STL_H_
template <class>
struct pair;
#endif
//--- main.cc
#include "c.h"
void f(absl::set<char> o) { o.contains(); }

$ $CLANG -fmodule-name=stl -fmodule-map-file=stl.cppmap -Xclang=-fno-cxx-modules -xc++ -Xclang=-emit-module -fmodules -std=c++20 -c stl.cppmap -o stl.pcm

$ $CLANG -fmodule-name=d -fmodule-map-file=d.cppmap -Xclang=-fno-cxx-modules -xc++ -Xclang=-emit-module -fmodules -fmodule-file=stl.pcm -std=c++20 -c d.cppmap -o d.pcm

$ $CLANG -fmodule-name=b -fmodule-map-file=b.cppmap -Xclang=-fno-cxx-modules -xc++ -Xclang=-emit-module -fmodules -fmodule-file=stl.pcm -std=c++20 -c b.cppmap -o b.pcm

$ $CLANG -fmodule-name=a -fmodule-map-file=a.cppmap -Xclang=-fno-cxx-modules -xc++ -Xclang=-emit-module -fmodules -fmodule-file=stl.pcm -fmodule-file=b.pcm -fmodule-file=d.pcm -std=c++20 -c a.cppmap -o a.pcm

$ $CLANG -Xclang=-fno-cxx-modules -fmodules -fmodule-file=a.pcm -std=c++20 -c main.cc -o main.o
assert.h assertion failed at clang/lib/AST/RecordLayoutBuilder.cpp:3377 in const ASTRecordLayout &clang::ASTContext::getASTRecordLayout(const RecordDecl *) const: D && "Cannot get layout of forward declarations!"
*** Check failure stack trace: ***
    @     0x55e64e1aed24  __assert_fail
    @     0x55e64a2c7646  clang::ASTContext::getASTRecordLayout()
    @     0x55e649c95e48  clang::ASTContext::getTypeInfoImpl()
    @     0x55e649c97297  clang::ASTContext::getTypeInfo()
    @     0x55e649c979bc  clang::ASTContext::getTypeAlignInChars()
    @     0x55e6481188d3  clang::CodeGen::CodeGenFunction::CreateIRTemp()
    @     0x55e6483843bc  clang::CodeGen::CodeGenFunction::StartFunction()
    @     0x55e648386716  clang::CodeGen::CodeGenFunction::GenerateCode()
    @     0x55e6483b20ec  clang::CodeGen::CodeGenModule::EmitGlobalFunctionDefinition()
    @     0x55e6483a9662  clang::CodeGen::CodeGenModule::EmitGlobalDefinition()
    @     0x55e64839a226  clang::CodeGen::CodeGenModule::EmitDeferred()
    @     0x55e64839a242  clang::CodeGen::CodeGenModule::EmitDeferred()
    @     0x55e648396c6e  clang::CodeGen::CodeGenModule::Release()
    @     0x55e6484dc8ae  (anonymous namespace)::CodeGeneratorImpl::HandleTranslationUnit()
...

alexfh · 2025-11-29T02:37:58Z

There's also a number of compilations that became significantly slower (or hit an infinite loop in Clang).

I will try to find time to get to this part too.

hahnjo · 2025-12-01T08:15:03Z

After manually cleaning up the ~10KB that CVise could squeeze out of the initial 400+MB, I got this:

Thanks for your efforts, very useful! That reduced example actually crashes on main already, I opened #170084 (after some more simplifications). If your full internal code is currently compiling without problems, it is likely that this PR "just" exposes the issue which git-bisect says was introduced by 91cdd35 on August 9.

hahnjo · 2025-12-01T09:27:18Z

Ok, this wasn't too bad actually: #170090 fixes the reproducer, so fingers crossed it also addresses the issue in the full-blown internal code 🤞 @alexfh if you have more cycles available, could you test the two PRs together?

alexfh · 2025-12-04T06:22:39Z

Ok, this wasn't too bad actually: #170090 fixes the reproducer, so fingers crossed it also addresses the issue in the full-blown internal code 🤞 @alexfh if you have more cycles available, could you test the two PRs together?

With both PRs I'm getting at least this Clang assertion error on some of our code. There's more, and I'll try to create a reproducer for this one, but maybe the stack trace is enough to figure out what's wrong:

assert.h assertion failed at clang/include/clang/AST/Redeclarable.h:262 in redecl_iterator &clang::Redeclarable<clang::TagDecl>::redecl_iterator::operator++() [de
cl_type = clang::TagDecl]: 0 && "Passed first decl twice, invalid redecl chain!"
    @     0x561251408044  __assert_fail
    @     0x56124c214b91  clang::ASTDeclMerger::MergeDefinitionData()
    @     0x56124c21722e  clang::ASTDeclReader::VisitClassTemplateSpecializationDeclImpl()
    @     0x56124c207953  clang::declvisitor::Base<>::Visit()
    @     0x56124c207135  clang::ASTDeclReader::Visit()
    @     0x56124d69ddcf  clang::StackExhaustionHandler::runWithSufficientStackSpace()
    @     0x56124c22e7f1  clang::ASTReader::ReadDeclRecord()
    @     0x56124c1aa595  clang::ASTReader::GetDecl()
    @     0x56124c237639  clang::ASTReader::ReadDeclAs<>()
    @     0x56124c20800b  clang::ASTDeclReader::VisitDecl()
    @     0x56124c209e52  clang::ASTDeclReader::VisitValueDecl()
    @     0x56124c20a175  clang::ASTDeclReader::VisitDeclaratorDecl()
    @     0x56124c20ae6a  clang::ASTDeclReader::VisitFunctionDecl()
    @     0x56124c2152b6  clang::ASTDeclReader::VisitCXXMethodDecl()
    @     0x56124c207135  clang::ASTDeclReader::Visit()
    @     0x56124d69ddcf  clang::StackExhaustionHandler::runWithSufficientStackSpace()
    @     0x56124c22e7f1  clang::ASTReader::ReadDeclRecord()
    @     0x56124c1aa595  clang::ASTReader::GetDecl()
    @     0x56124c24af78  clang::ASTStmtReader::VisitMemberExpr()
    @     0x56124c25b2c1  clang::ASTReader::ReadStmtFromStream()
    @     0x56124c1b585d  clang::ASTReader::GetExternalDeclStmt()
    @     0x56124d1f8839  clang::FunctionDecl::getBody()
    @     0x56124d2053f1  clang::FunctionDecl::getBody()
    @     0x56124b52493d  clang::CodeGen::CodeGenFunction::GenerateCode()
    @     0x56124b55084c  clang::CodeGen::CodeGenModule::EmitGlobalFunctionDefinition()
    @     0x56124b547ea2  clang::CodeGen::CodeGenModule::EmitGlobalDefinition()
    @     0x56124b538856  clang::CodeGen::CodeGenModule::EmitDeferred()
    @     0x56124b538872  clang::CodeGen::CodeGenModule::EmitDeferred()
    @     0x56124b538872  clang::CodeGen::CodeGenModule::EmitDeferred()
    @     0x56124b538872  clang::CodeGen::CodeGenModule::EmitDeferred()
    @     0x56124b538872  clang::CodeGen::CodeGenModule::EmitDeferred()
    @     0x56124b538872  clang::CodeGen::CodeGenModule::EmitDeferred()
    @     0x56124b538872  clang::CodeGen::CodeGenModule::EmitDeferred()
    @     0x56124b538872  clang::CodeGen::CodeGenModule::EmitDeferred()
    @     0x56124b5351ee  clang::CodeGen::CodeGenModule::Release()
    @     0x56124b67acee  (anonymous namespace)::CodeGeneratorImpl::HandleTranslationUnit()
    @     0x56124b1d3b9a  clang::BackendConsumer::HandleTranslationUnit()
    @     0x56124c077788  clang::ParseAST()
    @     0x56124bdad5aa  clang::FrontendAction::Execute()
    @     0x56124bd2145d  clang::CompilerInstance::ExecuteAction()
    @     0x56124b1d316e  clang::ExecuteCompilerInvocation()
    @     0x56124b1c65ea  cc1_main()
    @     0x56124b1c3619  ExecuteCC1Tool()
    @     0x56124b1c5dcc  llvm::function_ref<>::callback_fn<>()
    @     0x56124bee291e  llvm::function_ref<>::callback_fn<>()
    @     0x56125108241c  llvm::CrashRecoveryContext::RunSafely()
    @     0x56124bee1e04  clang::driver::CC1Command::Execute()
    @     0x56124be9f453  clang::driver::Compilation::ExecuteCommand()
    @     0x56124be9f6df  clang::driver::Compilation::ExecuteJobs()
    @     0x56124beb9dc0  clang::driver::Driver::ExecuteCompilation()
    @     0x56124b1c2c1e  clang_main()
    @     0x56124b1c0f74  main

hahnjo · 2025-12-04T08:56:34Z

With both PRs I'm getting at least this Clang assertion error on some of our code. There's more, and I'll try to create a reproducer for this one, but maybe the stack trace is enough to figure out what's wrong:

assert.h assertion failed at clang/include/clang/AST/Redeclarable.h:262 in redecl_iterator &clang::Redeclarable<clang::TagDecl>::redecl_iterator::operator++() [de
cl_type = clang::TagDecl]: 0 && "Passed first decl twice, invalid redecl chain!"
    @     0x561251408044  __assert_fail
    @     0x56124c214b91  clang::ASTDeclMerger::MergeDefinitionData()
    @     0x56124c21722e  clang::ASTDeclReader::VisitClassTemplateSpecializationDeclImpl()
    @     0x56124c207953  clang::declvisitor::Base<>::Visit()
    @     0x56124c207135  clang::ASTDeclReader::Visit()
    @     0x56124d69ddcf  clang::StackExhaustionHandler::runWithSufficientStackSpace()

Thanks for testing! "Luckily" it's the exact code that I touched in MergeDefinitionData in #170090. I guess it comes down to @ChuanqiXu9's comment in #170090 (comment) that redecls() can trigger deserialization and that may invalidate the iterator (?). Unfortunately, I don't have an easy idea to fix it - going back to the old code and just calling getMostRecentDecl at the beginning doesn't pass the Modules/GH170084.cpp test case...

ChuanqiXu9 · 2025-12-04T09:02:19Z

With both PRs I'm getting at least this Clang assertion error on some of our code. There's more, and I'll try to create a reproducer for this one, but maybe the stack trace is enough to figure out what's wrong:
assert.h assertion failed at clang/include/clang/AST/Redeclarable.h:262 in redecl_iterator &clang::Redeclarable<clang::TagDecl>::redecl_iterator::operator++() [de
cl_type = clang::TagDecl]: 0 && "Passed first decl twice, invalid redecl chain!"
    @     0x561251408044  __assert_fail
    @     0x56124c214b91  clang::ASTDeclMerger::MergeDefinitionData()
    @     0x56124c21722e  clang::ASTDeclReader::VisitClassTemplateSpecializationDeclImpl()
    @     0x56124c207953  clang::declvisitor::Base<>::Visit()
    @     0x56124c207135  clang::ASTDeclReader::Visit()
    @     0x56124d69ddcf  clang::StackExhaustionHandler::runWithSufficientStackSpace()
Thanks for testing! "Luckily" it's the exact code that I touched in MergeDefinitionData in #170090. I guess it comes down to @ChuanqiXu9's comment in #170090 (comment) that redecls() can trigger deserialization and that may invalidate the iterator (?). Unfortunately, I don't have an easy idea to fix it - going back to the old code and just calling getMostRecentDecl at the beginning doesn't pass the Modules/GH170084.cpp test case...

I feel it may not be too hard to implement a noload_redecls(). You can get the most recent decl without calling getMostRecentDecl () (that will trigger the deserialization of all redecls, I hate it) in Redeclarable.h internally. And you can simply traverse the redecls.

hahnjo · 2025-12-04T09:11:19Z

Unfortunately, I don't have an easy idea to fix it - going back to the old code and just calling getMostRecentDecl at the beginning doesn't pass the Modules/GH170084.cpp test case...

I feel it may not be too hard to implement a noload_redecls(). You can get the most recent decl without calling getMostRecentDecl () (that will trigger the deserialization of all redecls, I hate it) in Redeclarable.h internally. And you can simply traverse the redecls.

As I hinted above, that doesn't work. More specifically, I tried

diff --git a/clang/lib/Serialization/ASTReaderDecl.cpp b/clang/lib/Serialization/ASTReaderDecl.cpp
index 882d54f31280..d63824c4ba18 100644
--- a/clang/lib/Serialization/ASTReaderDecl.cpp
+++ b/clang/lib/Serialization/ASTReaderDecl.cpp
@@ -2107,8 +2107,9 @@ void ASTDeclMerger::MergeDefinitionData(
     auto *Def = DD.Definition;
     DD = std::move(MergeDD);
     DD.Definition = Def;
-    for (auto *D : Def->redecls())
-      cast<CXXRecordDecl>(D)->DefinitionData = &DD;
+    Def = Def->getMostRecentDecl();
+    while ((Def = Def->getPreviousDecl()))
+      cast<CXXRecordDecl>(Def)->DefinitionData = &DD;
     return;
   }

ChuanqiXu9 · 2025-12-04T09:13:28Z

Unfortunately, I don't have an easy idea to fix it - going back to the old code and just calling getMostRecentDecl at the beginning doesn't pass the Modules/GH170084.cpp test case...

I feel it may not be too hard to implement a noload_redecls(). You can get the most recent decl without calling getMostRecentDecl () (that will trigger the deserialization of all redecls, I hate it) in Redeclarable.h internally. And you can simply traverse the redecls.

As I hinted above, that doesn't work. More specifically, I tried
diff --git a/clang/lib/Serialization/ASTReaderDecl.cpp b/clang/lib/Serialization/ASTReaderDecl.cpp
index 882d54f31280..d63824c4ba18 100644
--- a/clang/lib/Serialization/ASTReaderDecl.cpp
+++ b/clang/lib/Serialization/ASTReaderDecl.cpp
@@ -2107,8 +2107,9 @@ void ASTDeclMerger::MergeDefinitionData(
     auto *Def = DD.Definition;
     DD = std::move(MergeDD);
     DD.Definition = Def;
-    for (auto *D : Def->redecls())
-      cast<CXXRecordDecl>(D)->DefinitionData = &DD;
+    Def = Def->getMostRecentDecl();
+    while ((Def = Def->getPreviousDecl()))
+      cast<CXXRecordDecl>(Def)->DefinitionData = &DD;
     return;
   }
 

You can't use getMostRecentDecl , this will trigger deserializations. Then it is the same with before. You need to implement noload_redecls(), possibly in Redeclarable.h

hahnjo · 2025-12-04T10:17:34Z

As I hinted above, that doesn't work. More specifically, I tried

diff --git a/clang/lib/Serialization/ASTReaderDecl.cpp b/clang/lib/Serialization/ASTReaderDecl.cpp
index 882d54f31280..d63824c4ba18 100644
--- a/clang/lib/Serialization/ASTReaderDecl.cpp
+++ b/clang/lib/Serialization/ASTReaderDecl.cpp
@@ -2107,8 +2107,9 @@ void ASTDeclMerger::MergeDefinitionData(
     auto *Def = DD.Definition;
     DD = std::move(MergeDD);
     DD.Definition = Def;
-    for (auto *D : Def->redecls())
-      cast<CXXRecordDecl>(D)->DefinitionData = &DD;
+    Def = Def->getMostRecentDecl();
+    while ((Def = Def->getPreviousDecl()))
+      cast<CXXRecordDecl>(Def)->DefinitionData = &DD;
     return;
   }

I'm stupid, the condition during the first iteration of the while loop will skip over the most recent declaration 😵 the following works:

--- a/clang/lib/Serialization/ASTReaderDecl.cpp
+++ b/clang/lib/Serialization/ASTReaderDecl.cpp
@@ -2107,8 +2107,10 @@ void ASTDeclMerger::MergeDefinitionData(
     auto *Def = DD.Definition;
     DD = std::move(MergeDD);
     DD.Definition = Def;
-    for (auto *D : Def->redecls())
-      cast<CXXRecordDecl>(D)->DefinitionData = &DD;
+    Def = Def->getMostRecentDecl();
+    do {
+      cast<CXXRecordDecl>(Def)->DefinitionData = &DD;
+    } while ((Def = Def->getPreviousDecl()));
     return;
   }

You can't use getMostRecentDecl , this will trigger deserializations. Then it is the same with before. You need to implement noload_redecls(), possibly in Redeclarable.h

If I understand correctly, we could implement this locally in ASTReaderDecl.cpp, matching a number of patterns in ASTReader::finishPendingActions():

diff --git a/clang/lib/Serialization/ASTReaderDecl.cpp b/clang/lib/Serialization/ASTReaderDecl.cpp
index 882d54f31280..ec7c2449d77e 100644
--- a/clang/lib/Serialization/ASTReaderDecl.cpp
+++ b/clang/lib/Serialization/ASTReaderDecl.cpp
@@ -2107,8 +2107,10 @@ void ASTDeclMerger::MergeDefinitionData(
     auto *Def = DD.Definition;
     DD = std::move(MergeDD);
     DD.Definition = Def;
-    for (auto *D : Def->redecls())
-      cast<CXXRecordDecl>(D)->DefinitionData = &DD;
+    // FIXME: this duplicates logic from ASTReader::finishPendingActions()...
+    for (auto *R = Reader.getMostRecentExistingDecl(Def); R;
+         R = R->getPreviousDecl())
+      cast<CXXRecordDecl>(R)->DefinitionData = &DD;
     return;
   }

... which then of course begs the question why the existing code in ASTReader::finishPendingActions() doesn't already take care of it?

Another possibility is to use merged_decls():

diff --git a/clang/lib/Serialization/ASTReaderDecl.cpp b/clang/lib/Serialization/ASTReaderDecl.cpp
index 882d54f31280..1112c64eeb02 100644
--- a/clang/lib/Serialization/ASTReaderDecl.cpp
+++ b/clang/lib/Serialization/ASTReaderDecl.cpp
@@ -2107,8 +2107,8 @@ void ASTDeclMerger::MergeDefinitionData(
     auto *Def = DD.Definition;
     DD = std::move(MergeDD);
     DD.Definition = Def;
-    for (auto *D : Def->redecls())
-      cast<CXXRecordDecl>(D)->DefinitionData = &DD;
+    for (auto *D : merged_redecls(Def))
+      D->DefinitionData = &DD;
     return;
   }

Which of these alternatives is preferable (assuming one of them actually works for @alexfh)?

ChuanqiXu9 · 2025-12-05T01:52:29Z

As I hinted above, that doesn't work. More specifically, I tried

diff --git a/clang/lib/Serialization/ASTReaderDecl.cpp b/clang/lib/Serialization/ASTReaderDecl.cpp
index 882d54f31280..d63824c4ba18 100644
--- a/clang/lib/Serialization/ASTReaderDecl.cpp
+++ b/clang/lib/Serialization/ASTReaderDecl.cpp
@@ -2107,8 +2107,9 @@ void ASTDeclMerger::MergeDefinitionData(
     auto *Def = DD.Definition;
     DD = std::move(MergeDD);
     DD.Definition = Def;
-    for (auto *D : Def->redecls())
-      cast<CXXRecordDecl>(D)->DefinitionData = &DD;
+    Def = Def->getMostRecentDecl();
+    while ((Def = Def->getPreviousDecl()))
+      cast<CXXRecordDecl>(Def)->DefinitionData = &DD;
     return;
   }

I'm stupid, the condition during the first iteration of the while loop will skip over the most recent declaration 😵 the following works:

--- a/clang/lib/Serialization/ASTReaderDecl.cpp
+++ b/clang/lib/Serialization/ASTReaderDecl.cpp
@@ -2107,8 +2107,10 @@ void ASTDeclMerger::MergeDefinitionData(
     auto *Def = DD.Definition;
     DD = std::move(MergeDD);
     DD.Definition = Def;
-    for (auto *D : Def->redecls())
-      cast<CXXRecordDecl>(D)->DefinitionData = &DD;
+    Def = Def->getMostRecentDecl();
+    do {
+      cast<CXXRecordDecl>(Def)->DefinitionData = &DD;
+    } while ((Def = Def->getPreviousDecl()));
     return;
   }

You can't use getMostRecentDecl , this will trigger deserializations. Then it is the same with before. You need to implement noload_redecls(), possibly in Redeclarable.h

If I understand correctly, we could implement this locally in ASTReaderDecl.cpp, matching a number of patterns in ASTReader::finishPendingActions():

diff --git a/clang/lib/Serialization/ASTReaderDecl.cpp b/clang/lib/Serialization/ASTReaderDecl.cpp
index 882d54f31280..ec7c2449d77e 100644
--- a/clang/lib/Serialization/ASTReaderDecl.cpp
+++ b/clang/lib/Serialization/ASTReaderDecl.cpp
@@ -2107,8 +2107,10 @@ void ASTDeclMerger::MergeDefinitionData(
     auto *Def = DD.Definition;
     DD = std::move(MergeDD);
     DD.Definition = Def;
-    for (auto *D : Def->redecls())
-      cast<CXXRecordDecl>(D)->DefinitionData = &DD;
+    // FIXME: this duplicates logic from ASTReader::finishPendingActions()...
+    for (auto *R = Reader.getMostRecentExistingDecl(Def); R;
+         R = R->getPreviousDecl())
+      cast<CXXRecordDecl>(R)->DefinitionData = &DD;
     return;
   }

... which then of course begs the question why the existing code in ASTReader::finishPendingActions() doesn't already take care of it?

Another possibility is to use merged_decls():

diff --git a/clang/lib/Serialization/ASTReaderDecl.cpp b/clang/lib/Serialization/ASTReaderDecl.cpp
index 882d54f31280..1112c64eeb02 100644
--- a/clang/lib/Serialization/ASTReaderDecl.cpp
+++ b/clang/lib/Serialization/ASTReaderDecl.cpp
@@ -2107,8 +2107,8 @@ void ASTDeclMerger::MergeDefinitionData(
     auto *Def = DD.Definition;
     DD = std::move(MergeDD);
     DD.Definition = Def;
-    for (auto *D : Def->redecls())
-      cast<CXXRecordDecl>(D)->DefinitionData = &DD;
+    for (auto *D : merged_redecls(Def))
+      D->DefinitionData = &DD;
     return;
   }

Which of these alternatives is preferable (assuming one of them actually works for @alexfh)?

I feel they are workarounds. The noload_redecls() new interface may be the cleanest solution.

hahnjo · 2025-12-05T07:47:39Z

I feel they are workarounds. The noload_redecls() new interface may be the cleanest solution.

For the first approach, I can potentially agree, even though we are already deserializing with the current code so it can hardly be worse than that. For the other two, can you please elaborate why you consider them workarounds? As I pointed out, they are patterns that are already used in ASTReader.cpp and ASTReaderDecl.cpp, for precisely the reasons we are running into now. Also my understanding is that getMostRecentExistingDecl is implementation wise exactly what you have in mind for noload_redecls(), minus the iterator interface that I personally don't want to spend effort on.

ChuanqiXu9 · 2025-12-05T07:58:54Z

I feel they are workarounds. The noload_redecls() new interface may be the cleanest solution.

For the first approach, I can potentially agree, even though we are already deserializing with the current code so it can hardly be worse than that. For the other two, can you please elaborate why you consider them workarounds? As I pointed out, they are patterns that are already used in ASTReader.cpp and ASTReaderDecl.cpp, for precisely the reasons we are running into now.

Maybe we have understanding for workarounds, but that doesn't matter. noload_redecls is pretty clear than the patterns you mentioned.

Also my understanding is that getMostRecentExistingDecl is implementation wise exactly what you have in mind for noload_redecls(), minus the iterator interface that I personally don't want to spend effort on.

Yeah, if you already have this, why do you think it is burden to implement noload_redecls()? You've already reached it!

hahnjo · 2025-12-05T08:02:30Z

Yeah, if you already have this, why do you think it is burden to implement noload_redecls()? You've already reached it!

Because it's a local implementation that works in ASTReader. I don't want to think about the more general picture of a function that is usable in all of Clang (that will need rounds of review), nor fight with C++ iterators.

ChuanqiXu9 · 2025-12-05T08:11:01Z

Yeah, if you already have this, why do you think it is burden to implement noload_redecls()? You've already reached it!

Because it's a local implementation that works in ASTReader. I don't want to think about the more general picture of a function that is usable in all of Clang (that will need rounds of review), nor fight with C++ iterators.

But getPreviousDecl may trigger deserialization too. Please look at getPrevious() in Redeclarable::DeclLink::getPrevious. I don't feel it is hard or you need to fight any thing. We already have getLatestNotUpdated() and we can implement getPreviousNotUpdated() similarly, then we're almost done. If we're tired of invalid iterations, we can use make_early_inc_range. HDYT?

The patch have already been stalled by 8 months. I feel all of us are patient enough.

hahnjo · 2025-12-05T08:14:25Z

Let me say it clearly: I will not spend time on it, if I've demonstrated that existing patterns would solve the problem. We have lazy template loading working locally since years, since March with the upstream implementation including these fixes. I think it would be worthwhile to have it fixed upstream because right now it's basically useless, but there's a limit to my involvement.

hahnjo · 2025-12-05T08:15:30Z

I don't feel it is hard or you need to fight any thing.

I invite you to submit a patch. Once that passed review and is confirmed working with the Google internal code base, I can come back to this PR.

ChuanqiXu9 · 2025-12-05T08:17:15Z

I don't feel it is hard or you need to fight any thing.

I invite you to submit a patch. Once that passed review and is confirmed working with the Google internal code base, I can come back to this PR.

I can do it. But I don't think that need to be tested by google. It is just an new API. And the testing process is too slow.

hahnjo · 2025-12-05T08:22:41Z

But I don't think that need to be tested by google. It is just an new API.

To clarify, what needs testing is the new API plus this PR on top to see if it addresses the problem reported by @alexfh, and if there are others afterwards.

ChuanqiXu9 · 2025-12-05T09:22:36Z

Sent #170823

@alexfh please test this with this PR.

vgvassilev · 2025-12-05T15:40:29Z

I'd feel uncomfortable exposing a noload_redecls interface because it can potentially break the AST invariants. I think allowing non-modules developers to choose when to complete a redeclaration chain will introduce more bugs than it can fix ;)

I'd propose to move forward with getMostRecentExistingDecl if it does the job. We can have a separate PR discussing the ergonomics if there is a strong feeling about going this way.

ChuanqiXu9 · 2025-12-08T08:19:35Z

I'd feel uncomfortable exposing a noload_redecls interface because it can potentially break the AST invariants. I think allowing non-modules developers to choose when to complete a redeclaration chain will introduce more bugs than it can fix ;)

I'd propose to move forward with getMostRecentExistingDecl if it does the job. We can have a separate PR discussing the ergonomics if there is a strong feeling about going this way.

I'd like it as:
(1) We already have noload_lookup and other similar "unpublic" API (e.g., getOwningModuleID invalidateCachedLinkage) in DeclBase.h.
(2) Merging Redecls is a major performance hurting point of using modules.
(3) In the past, there are many "fix" which just tried to mitigate the bug by loading all redecls without understanding the real problem.

alexfh · 2025-12-09T03:14:29Z

Sent #170823

@alexfh please test this with this PR.

I've started testing of #170090 + #170823 + this PR. I'll get back with the results in a couple of days.

vgvassilev · 2025-12-11T09:58:15Z

I'd feel uncomfortable exposing a noload_redecls interface because it can potentially break the AST invariants. I think allowing non-modules developers to choose when to complete a redeclaration chain will introduce more bugs than it can fix ;)
I'd propose to move forward with getMostRecentExistingDecl if it does the job. We can have a separate PR discussing the ergonomics if there is a strong feeling about going this way.

I'd like it as: (1) We already have noload_lookup and other similar "unpublic" API (e.g., getOwningModuleID invalidateCachedLinkage) in DeclBase.h.

Well, sure, noload_lookup is done to check what's the current state before completing something externally. In fact, I think we should keep these interfaces at minimum and only use them where we really have no other way out.

(2) Merging Redecls is a major performance hurting point of using modules.

Only when the modules approach is not bottom up. The non-bottom up approach has bigger problems than performance at the moment and frankly we cannot reasonably support it right now...

(3) In the past, there are many "fix" which just tried to mitigate the bug by loading all redecls without understanding the real problem.

Does that need to happen in this PR? I believe it brings down the peak memory significantly for most of the module users and we should not wait too long to get this merged.

ChuanqiXu9 · 2025-12-11T10:54:46Z

I'd feel uncomfortable exposing a noload_redecls interface because it can potentially break the AST invariants. I think allowing non-modules developers to choose when to complete a redeclaration chain will introduce more bugs than it can fix ;)
I'd propose to move forward with getMostRecentExistingDecl if it does the job. We can have a separate PR discussing the ergonomics if there is a strong feeling about going this way.

I'd like it as: (1) We already have noload_lookup and other similar "unpublic" API (e.g., getOwningModuleID invalidateCachedLinkage) in DeclBase.h.

Well, sure, noload_lookup is done to check what's the current state before completing something externally. In fact, I think we should keep these interfaces at minimum and only use them where we really have no other way out.

I feel it sounds over nervous. It is not that bad. And the interfaces will look very consistency. We add the interface doesn't mean we ask programmers to use that. It is simply an interface doing the natural things.

(2) Merging Redecls is a major performance hurting point of using modules.

Only when the modules approach is not bottom up. The non-bottom up approach has bigger problems than performance at the moment and frankly we cannot reasonably support it right now...

Even with the bottom up approach, we may face the merging problems due to generated decls (e.g, template instantiations) but just mitigates the problem.

And for C++20 named modules, if we don't export macros (which a lot of people are not in favor), the user may have to face that problem in practice.

For that I sent https://discourse.llvm.org/t/request-for-discussion-clang-a-bidirectional-decl-query-system/88945 and noload_decls is necessary for that.

(3) In the past, there are many "fix" which just tried to mitigate the bug by loading all redecls without understanding the real problem.

Does that need to happen in this PR? I believe it brings down the peak memory significantly for most of the module users and we should not wait too long to get this merged.

If the testing results shows the current failure in google side shows it is needed, yes it is needed for the PR. Otherwise, it is not.

hahnjo requested review from ChuanqiXu9, ilya-biryukov and vgvassilev March 26, 2025 09:26

llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules labels Mar 26, 2025

hahnjo added a commit to devajithvs/root that referenced this pull request Mar 26, 2025

[llvm-project] Fix lazy template loading

77dcc83

Backport upstream PR llvm/llvm-project#133057

hahnjo added a commit to devajithvs/root that referenced this pull request Mar 26, 2025

[llvm-project] Fix lazy template loading

f4e61d3

Backport upstream PR llvm/llvm-project#133057

hahnjo added a commit to devajithvs/root that referenced this pull request Mar 28, 2025

[llvm-project] Fix lazy template loading

599e306

Backport upstream PR llvm/llvm-project#133057

hahnjo added a commit to devajithvs/root that referenced this pull request Mar 28, 2025

[llvm-project] Fix lazy template loading

1b65dcd

Backport upstream PR llvm/llvm-project#133057

hahnjo mentioned this pull request Dec 1, 2025

[clang][modules] Cannot get layout of forward declarations with type in template scope #170084

Closed

hahnjo mentioned this pull request Dec 1, 2025

[clang] Propagate definition data to all redecls #170090

Merged

[Serialization] Fix lazy template loading #133057

Are you sure you want to change the base?

[Serialization] Fix lazy template loading #133057

Uh oh!

Conversation

hahnjo commented Mar 26, 2025

Uh oh!

llvmbot commented Mar 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ChuanqiXu9 commented Mar 26, 2025

Uh oh!

hahnjo commented Mar 26, 2025

Uh oh!

vgvassilev commented Mar 26, 2025

Uh oh!

hahnjo commented Mar 26, 2025

Performance measurements with LLVM

Uh oh!

ilya-biryukov commented Mar 26, 2025

Uh oh!

ChuanqiXu9 commented Mar 27, 2025

Uh oh!

ChuanqiXu9 commented Mar 27, 2025

Uh oh!

hahnjo commented Mar 28, 2025

Uh oh!

vgvassilev commented Mar 28, 2025

Uh oh!

ChuanqiXu9 commented Mar 28, 2025

Uh oh!

ChuanqiXu9 commented Mar 28, 2025

Uh oh!

vgvassilev commented Mar 28, 2025

Uh oh!

ChuanqiXu9 commented Mar 28, 2025

Uh oh!

ilya-biryukov commented Mar 28, 2025

Uh oh!

vgvassilev commented Mar 28, 2025

Uh oh!

emaxx-google commented Apr 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hahnjo commented Apr 1, 2025

Uh oh!

emaxx-google commented Apr 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hahnjo commented Apr 1, 2025

Uh oh!

alexfh commented Nov 24, 2025

Uh oh!

hahnjo commented Nov 27, 2025

Uh oh!

alexfh commented Nov 28, 2025

Uh oh!

alexfh commented Nov 29, 2025

Uh oh!

alexfh commented Nov 29, 2025

Uh oh!

hahnjo commented Dec 1, 2025

Uh oh!

hahnjo commented Dec 1, 2025

Uh oh!

alexfh commented Dec 4, 2025

Uh oh!

hahnjo commented Dec 4, 2025

Uh oh!

ChuanqiXu9 commented Dec 4, 2025

Uh oh!

hahnjo commented Dec 4, 2025

Uh oh!

ChuanqiXu9 commented Dec 4, 2025

Uh oh!

hahnjo commented Dec 4, 2025

Uh oh!

ChuanqiXu9 commented Dec 5, 2025

Uh oh!

hahnjo commented Dec 5, 2025

llvmbot commented Mar 26, 2025 •

edited

Loading

emaxx-google commented Apr 1, 2025 •

edited

Loading

emaxx-google commented Apr 1, 2025 •

edited

Loading

vgvassilev commented Dec 5, 2025 •

edited

Loading