[Serialization] Fix lazy template loading #133057
Conversation
@llvm/pr-subscribers-clang-modules @llvm/pr-subscribers-clang
Author: Jonas Hahnfeld (hahnjo)
Changes
Full diff: https://github.com/llvm/llvm-project/pull/133057.diff 3 Files Affected:
diff --git a/clang/lib/AST/DeclTemplate.cpp b/clang/lib/AST/DeclTemplate.cpp
index c0f5be51db5f3..8560c3928aa84 100644
--- a/clang/lib/AST/DeclTemplate.cpp
+++ b/clang/lib/AST/DeclTemplate.cpp
@@ -367,12 +367,6 @@ bool RedeclarableTemplateDecl::loadLazySpecializationsImpl(
if (!ExternalSource)
return false;
- // If TPL is not null, it implies that we're loading specializations for
- // partial templates. We need to load all specializations in such cases.
- if (TPL)
- return ExternalSource->LoadExternalSpecializations(this->getCanonicalDecl(),
- /*OnlyPartial=*/false);
-
return ExternalSource->LoadExternalSpecializations(this->getCanonicalDecl(),
Args);
}
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index 0cd2cedb48dd9..eb0496c97eb3b 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -7891,14 +7891,8 @@ void ASTReader::CompleteRedeclChain(const Decl *D) {
}
}
- if (Template) {
- // For partitial specialization, load all the specializations for safety.
- if (isa<ClassTemplatePartialSpecializationDecl,
- VarTemplatePartialSpecializationDecl>(D))
- Template->loadLazySpecializationsImpl();
- else
- Template->loadLazySpecializationsImpl(Args);
- }
+ if (Template)
+ Template->loadLazySpecializationsImpl(Args);
}
CXXCtorInitializer **
diff --git a/clang/lib/Serialization/TemplateArgumentHasher.cpp b/clang/lib/Serialization/TemplateArgumentHasher.cpp
index 3c7177b83ba52..5fb363c4ab148 100644
--- a/clang/lib/Serialization/TemplateArgumentHasher.cpp
+++ b/clang/lib/Serialization/TemplateArgumentHasher.cpp
@@ -21,17 +21,6 @@ using namespace clang;
namespace {
class TemplateArgumentHasher {
- // If we bail out during the process of calculating hash values for
- // template arguments for any reason. We're allowed to do it since
- // TemplateArgumentHasher are only required to give the same hash value
- // for the same template arguments, but not required to give different
- // hash value for different template arguments.
- //
- // So in the worst case, it is still a valid implementation to give all
- // inputs the same BailedOutValue as output.
- bool BailedOut = false;
- static constexpr unsigned BailedOutValue = 0x12345678;
-
llvm::FoldingSetNodeID ID;
public:
@@ -41,14 +30,7 @@ class TemplateArgumentHasher {
void AddInteger(unsigned V) { ID.AddInteger(V); }
- unsigned getValue() {
- if (BailedOut)
- return BailedOutValue;
-
- return ID.computeStableHash();
- }
-
- void setBailedOut() { BailedOut = true; }
+ unsigned getValue() { return ID.computeStableHash(); }
void AddType(const Type *T);
void AddQualType(QualType T);
@@ -92,8 +74,7 @@ void TemplateArgumentHasher::AddTemplateArgument(TemplateArgument TA) {
case TemplateArgument::Expression:
// If we meet expression in template argument, it implies
// that the template is still dependent. It is meaningless
- // to get a stable hash for the template. Bail out simply.
- BailedOut = true;
+ // to get a stable hash for the template.
break;
case TemplateArgument::Pack:
AddInteger(TA.pack_size());
@@ -110,10 +91,9 @@ void TemplateArgumentHasher::AddStructuralValue(const APValue &Value) {
// 'APValue::Profile' uses pointer values to make hash for LValue and
// MemberPointer, but they differ from one compiler invocation to another.
- // It may be difficult to handle such cases. Bail out simply.
+ // It may be difficult to handle such cases.
if (Kind == APValue::LValue || Kind == APValue::MemberPointer) {
- BailedOut = true;
return;
}
@@ -135,14 +115,11 @@ void TemplateArgumentHasher::AddTemplateName(TemplateName Name) {
case TemplateName::DependentTemplate:
case TemplateName::SubstTemplateTemplateParm:
case TemplateName::SubstTemplateTemplateParmPack:
- BailedOut = true;
break;
case TemplateName::UsingTemplate: {
UsingShadowDecl *USD = Name.getAsUsingShadowDecl();
if (USD)
AddDecl(USD->getTargetDecl());
- else
- BailedOut = true;
break;
}
case TemplateName::DeducedTemplate:
@@ -167,7 +144,6 @@ void TemplateArgumentHasher::AddDeclarationName(DeclarationName Name) {
case DeclarationName::ObjCZeroArgSelector:
case DeclarationName::ObjCOneArgSelector:
case DeclarationName::ObjCMultiArgSelector:
- BailedOut = true;
break;
case DeclarationName::CXXConstructorName:
case DeclarationName::CXXDestructorName:
@@ -194,16 +170,29 @@ void TemplateArgumentHasher::AddDeclarationName(DeclarationName Name) {
void TemplateArgumentHasher::AddDecl(const Decl *D) {
const NamedDecl *ND = dyn_cast<NamedDecl>(D);
if (!ND) {
- BailedOut = true;
return;
}
AddDeclarationName(ND->getDeclName());
+
+ // If this was a specialization we should take into account its template
+ // arguments. This helps to reduce collisions coming when visiting template
+ // specialization types (eg. when processing type template arguments).
+ ArrayRef<TemplateArgument> Args;
+ if (auto *CTSD = dyn_cast<ClassTemplateSpecializationDecl>(D))
+ Args = CTSD->getTemplateArgs().asArray();
+ else if (auto *VTSD = dyn_cast<VarTemplateSpecializationDecl>(D))
+ Args = VTSD->getTemplateArgs().asArray();
+ else if (auto *FD = dyn_cast<FunctionDecl>(D))
+ if (FD->getTemplateSpecializationArgs())
+ Args = FD->getTemplateSpecializationArgs()->asArray();
+
+ for (auto &TA : Args)
+ AddTemplateArgument(TA);
}
void TemplateArgumentHasher::AddQualType(QualType T) {
if (T.isNull()) {
- BailedOut = true;
return;
}
SplitQualType split = T.split();
@@ -213,7 +202,6 @@ void TemplateArgumentHasher::AddQualType(QualType T) {
// Process a Type pointer. Add* methods call back into TemplateArgumentHasher
// while Visit* methods process the relevant parts of the Type.
-// Any unhandled type will make the hash computation bail out.
class TypeVisitorHelper : public TypeVisitor<TypeVisitorHelper> {
typedef TypeVisitor<TypeVisitorHelper> Inherited;
llvm::FoldingSetNodeID &ID;
@@ -245,9 +233,6 @@ class TypeVisitorHelper : public TypeVisitor<TypeVisitorHelper> {
void Visit(const Type *T) { Inherited::Visit(T); }
- // Unhandled types. Bail out simply.
- void VisitType(const Type *T) { Hash.setBailedOut(); }
-
void VisitAdjustedType(const AdjustedType *T) {
AddQualType(T->getOriginalType());
}
While I may not be able to look into them in detail soon, it may be helpful to split this into separate patches to review and to land.
I initially considered this, but @vgvassilev said in root-project/root#17722 (comment) he prefers a single PR, also for external testing.
@ilya-biryukov, would you mind giving this PR a test on your infrastructure and, if it works, maybe share some performance results?
Performance measurements with LLVM
I tested these patches for building LLVM itself with modules ( I did some measurements for individual files, chosen by searching for large object files and excluding generated files. For each version, I first build LLVM completely to populate the
before*: reverting fb2c9d9, c5e4afe, 30ea0f0, 20e9049 on current
Backport upstream PR llvm/llvm-project#133057
Sure, let me try kicking it off. Note that our infrastructure is much better at detecting compilations that time out than at providing proper benchmarking at scale (there are a few targeted benchmarks too, though). I'll try to give you what we have.
Maybe you can test it with this PR as a whole but land it as separate patches, so that we can revert one of them if it turns out to be problematic while the other parts are fine.
This comes from the logic: if we have a partial template specialization
I'm ok with pushing the commits one-by-one after the PR is reviewed, just let me know.
Sure, but in my understanding, that's not needed on the

//--- partial.cppm
export module partial;
export template <typename S, typename T, typename U>
struct Partial {
  static constexpr int Value() { return 0; }
};
export template <typename T, typename U>
struct Partial<int, T, U> {
  static constexpr int Value() { return 1; }
};
//--- partial.cpp
import partial;
static_assert(Partial<int, double, double>::Value() == 1);

(I assume that's what you have in mind?) I see two calls to
This is a relatively small patch focused on reducing the round trips to modules deserialization. I see this as an atomic change; if it went in partially, that would defeat its purpose. What's the goal of a partial optimization?
If it works, I feel good about it.
I think partial optimizations are optimizations too. If these changes are not dependent on each other, it would be better to split them. Given the scale of the patch, it may not be a serious problem, actually. I still think it is better to land them separately, but if you want to save some typing, I don't feel too bad about it.
Honestly I am more concerned about the tests that @ilya-biryukov is running. As long as they are happy I do not particularly care about commit style, although it'd be weird to land a 40 line patch in many commits :)
I don't find it odd. I remember it is (or was) LLVM's policy that smaller patches are preferred : )
The small-scale benchmarks we had show 10% improvement in CPU and 23% improvement in memory usage for some compilations! We did hit one compiler error that does not reproduce without modules, however: We're in the process of getting a small reproducer (please bear with us, it takes some time) that we can share. @emaxx-google is working on it.
That's very good news. I think we can further reduce these times. IIRC, we still deserialize declarations that we do not need. One of the places to look is the logic that kicks in at module loading time:
Ouch. If that's the only issue on your infrastructure, that's probably not so bad.
Here's the (almost) minimized reproducer for this

UPD: To run the reproducer, first "unpack" the archive into separate files using LLVM's
Thanks for the efforts! I only had a very quick look and it seems the paste is not complete. For example,

class Class1 {
public:

and many other definitions look incomplete as well. Can you check if there was maybe a mistake?
That's how it looks - the minimizer tool (based on C-Reduce/C-Vise) basically works by randomly removing chunks of code, which does often end up with code that looks corrupted. The tool could at least do a better job by merging/inlining unnecessary headers, macros, etc., but the output, as shared, should be sufficient to trigger the error in question (
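For illustration, the chunk-removal loop such reducers run can be sketched as follows. This is a hypothetical toy reducer, not C-Vise's actual implementation; the `interesting` predicate stands in for "the compiler still emits the target error":

```cpp
#include <cassert>
#include <functional>
#include <string>

// Toy delta-reduction sketch: repeatedly try deleting a chunk of the input
// and keep the deletion whenever the "interestingness" predicate still holds
// on the smaller input. Chunk sizes shrink geometrically, like C-Reduce.
std::string reduce(std::string input,
                   const std::function<bool(const std::string &)> &interesting) {
  for (size_t chunk = input.size(); chunk > 0; chunk /= 2) {
    for (size_t pos = 0; pos + chunk <= input.size();) {
      std::string candidate = input.substr(0, pos) + input.substr(pos + chunk);
      if (interesting(candidate))
        input = std::move(candidate); // keep the deletion, retry at same pos
      else
        pos += chunk; // deletion broke the reproducer, move on
    }
  }
  return input;
}
```

A reducer like this freely cuts through the middle of declarations, which is why the surviving code often looks corrupted while still triggering the original diagnostic.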
I had a closer look, but I get plenty of compile errors already on

I haven't even applied the change in this PR - what am I missing?
Not really without further understanding what is going wrong under which circumstances... Some pointers (not sure how much time you would be able to spend on your side debugging the problem):
Ok, if possible maybe you can keep it running in parallel; we will still need a reproducer as a regression test later on.
I could allocate some time to debugging this, and here's what I got. The crash stack is:

Looking closer at

The crash is in the function call above ^^

The CXXRecordDecl that triggers the crash is:

Just in case, a full dump of

It looks like it happens as you suspected ("

One more thing I noticed while trying to debug what's happening in the

Please let me know if you have any more ideas I could try.
The test case is down to ~130MB and 9K files, but still nothing I can share directly.
I've done some manual build graph cleanup and got the makefile in the test case from a few thousand rules down to a handful. The reduction started running a bit faster since then: the size of inputs is now ~65MB spread across ~4.5K files. Looking at how it goes, I may have a shareable test case by the end of the week.
Thanks for trying all that!
Ok, at least the understanding of the problem is progressing.
Too bad, that would have been too easy I guess... Now I wonder if we maybe need to change the order of operations, i.e. first complete the redecl chain, then notice that we can get the definition externally, and then actually go and do that. Conceptually something like

// Complete the redecl chain (if necessary).
D = D->getMostRecentDecl();
if (D->hasExternalLexicalStorage() && !D->getDefinition())
  getExternalSource()->CompleteType(const_cast<RecordDecl*>(D));

Not sure if that changes anything (at least it doesn't fail Clang tests when applied to current
Only if you have more time to spare: the second part of point 4 above, i.e. which of the commits is the culprit - is it a single one, and is everything fine if you just exclude / revert that one?
10MB and 2.7K files |
After manually cleaning up the ~10KB that CVise could squeeze out of the initial 400+MB, I got this:
I will try to find time to get to this part too.
Thanks for your efforts, very useful! That reduced example actually crashes on
With both PRs I'm getting at least this Clang assertion error on some of our code. There's more, and I'll try to create a reproducer for this one, but maybe the stack trace is enough to figure out what's wrong:
Thanks for testing! "Luckily" it's the exact code that I touched in
I feel it may not be too hard to implement a
As I hinted above, that doesn't work. More specifically, I tried

diff --git a/clang/lib/Serialization/ASTReaderDecl.cpp b/clang/lib/Serialization/ASTReaderDecl.cpp
index 882d54f31280..d63824c4ba18 100644
--- a/clang/lib/Serialization/ASTReaderDecl.cpp
+++ b/clang/lib/Serialization/ASTReaderDecl.cpp
@@ -2107,8 +2107,9 @@ void ASTDeclMerger::MergeDefinitionData(
   auto *Def = DD.Definition;
   DD = std::move(MergeDD);
   DD.Definition = Def;
-  for (auto *D : Def->redecls())
-    cast<CXXRecordDecl>(D)->DefinitionData = &DD;
+  Def = Def->getMostRecentDecl();
+  while ((Def = Def->getPreviousDecl()))
+    cast<CXXRecordDecl>(Def)->DefinitionData = &DD;
   return;
 }
You can't use
I'm stupid, the condition during the first iteration of the

--- a/clang/lib/Serialization/ASTReaderDecl.cpp
+++ b/clang/lib/Serialization/ASTReaderDecl.cpp
@@ -2107,8 +2107,10 @@ void ASTDeclMerger::MergeDefinitionData(
   auto *Def = DD.Definition;
   DD = std::move(MergeDD);
   DD.Definition = Def;
-  for (auto *D : Def->redecls())
-    cast<CXXRecordDecl>(D)->DefinitionData = &DD;
+  Def = Def->getMostRecentDecl();
+  do {
+    cast<CXXRecordDecl>(Def)->DefinitionData = &DD;
+  } while ((Def = Def->getPreviousDecl()));
   return;
 }
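As a toy illustration (using a hypothetical `ToyDecl` struct, not Clang's real `Decl` class), the do/while form visits the starting, most recent declaration before following the previous-declaration links, which a plain `while` loop would skip:

```cpp
#include <cassert>

// Toy stand-in for a redeclaration chain: each declaration points at its
// previous redeclaration; the oldest one has Previous == nullptr.
struct ToyDecl {
  ToyDecl *Previous = nullptr;
  bool Updated = false; // stands in for "DefinitionData = &DD"
  ToyDecl *getPreviousDecl() { return Previous; }
};

// The do/while form processes the starting declaration itself before
// following Previous; `while ((D = D->getPreviousDecl()))` would advance
// past the most recent declaration before the first body execution,
// leaving it un-updated.
void updateChain(ToyDecl *MostRecent) {
  ToyDecl *D = MostRecent;
  do {
    D->Updated = true;
  } while ((D = D->getPreviousDecl()));
}
```

The key difference from iterating `redecls()` is that this walk only touches declarations already reachable through `getPreviousDecl()`, without triggering further deserialization.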
If I understand correctly, we could implement this locally in

diff --git a/clang/lib/Serialization/ASTReaderDecl.cpp b/clang/lib/Serialization/ASTReaderDecl.cpp
index 882d54f31280..ec7c2449d77e 100644
--- a/clang/lib/Serialization/ASTReaderDecl.cpp
+++ b/clang/lib/Serialization/ASTReaderDecl.cpp
@@ -2107,8 +2107,10 @@ void ASTDeclMerger::MergeDefinitionData(
   auto *Def = DD.Definition;
   DD = std::move(MergeDD);
   DD.Definition = Def;
-  for (auto *D : Def->redecls())
-    cast<CXXRecordDecl>(D)->DefinitionData = &DD;
+  // FIXME: this duplicates logic from ASTReader::finishPendingActions()...
+  for (auto *R = Reader.getMostRecentExistingDecl(Def); R;
+       R = R->getPreviousDecl())
+    cast<CXXRecordDecl>(R)->DefinitionData = &DD;
   return;
 }

... which then of course begs the question why the existing code in

Another possibility is to use

diff --git a/clang/lib/Serialization/ASTReaderDecl.cpp b/clang/lib/Serialization/ASTReaderDecl.cpp
index 882d54f31280..1112c64eeb02 100644
--- a/clang/lib/Serialization/ASTReaderDecl.cpp
+++ b/clang/lib/Serialization/ASTReaderDecl.cpp
@@ -2107,8 +2107,8 @@ void ASTDeclMerger::MergeDefinitionData(
   auto *Def = DD.Definition;
   DD = std::move(MergeDD);
   DD.Definition = Def;
-  for (auto *D : Def->redecls())
-    cast<CXXRecordDecl>(D)->DefinitionData = &DD;
+  for (auto *D : merged_redecls(Def))
+    D->DefinitionData = &DD;
   return;
 }

Which of these alternatives is preferable (assuming one of them actually works for @alexfh)?
I feel they are workarounds. The
For the first approach, I can potentially agree, even though we are already deserializing with the current code so it can hardly be worse than that. For the other two, can you please elaborate why you consider them workarounds? As I pointed out, they are patterns that are already used in
Maybe we have different understandings of what counts as a workaround, but that doesn't matter.
Yeah, if you already have this, why do you think it is a burden to implement
Because it's a local implementation that works in
But the patch has already been stalled for 8 months. I feel all of us have been patient enough.
Let me say it clearly: I will not spend time on it if I've demonstrated that existing patterns would solve the problem. We have had lazy template loading working locally for years, and since March with the upstream implementation including these fixes. I think it would be worthwhile to have it fixed upstream because right now it's basically useless, but there's a limit to my involvement.
I invite you to submit a patch. Once that has passed review and is confirmed working with the Google internal code base, I can come back to this PR.
I can do it. But I don't think that needs to be tested by Google. It is just a new API. And the testing process is too slow.
To clarify, what needs testing is the new API plus this PR on top, to see if it addresses the problem reported by @alexfh, and whether there are others afterwards.
I'd feel uncomfortable exposing a

I'd propose to move forward with
I'd like it as:
ODRHash::AddDecl with the reasoning given in the comment, to reduce collisions. This was particularly visible with STL types templated on std::pair where its template arguments were not taken into account.
TPL.
TemplateArgumentHasher: While it is correct to assign a single fixed hash to all template arguments, it can reduce the effectiveness of lazy loading and is not actually needed: we are allowed to ignore parts that cannot be handled because they will be analogously ignored by all hashings.
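The contract described in the last point — identical template arguments must hash identically, while collisions between different arguments are merely suboptimal — can be sketched with a toy hasher. This uses hypothetical types, not the real TemplateArgumentHasher:

```cpp
#include <cassert>
#include <vector>

// Toy model of the hashing contract: a hasher must give the same value for
// the same arguments, but is free to ignore parts it cannot handle (in the
// extreme, mapping every input to one constant would still be valid).
// Skipping a dependent expression, as the patch does, keeps determinism,
// because every hashing invocation ignores it in exactly the same way.
enum class ArgKind { Type, Integral, Expression };

struct ToyArg {
  ArgKind Kind;
  unsigned Value = 0; // meaningful for Type/Integral only
};

unsigned hashArgs(const std::vector<ToyArg> &Args) {
  unsigned H = 5381; // djb2-style accumulator
  for (const ToyArg &A : Args) {
    if (A.Kind == ArgKind::Expression)
      continue; // unhashable part: skipped identically on every run
    H = H * 33 + A.Value;
  }
  return H;
}
```

Two argument lists differing only in an unhandled part deliberately collide; that only costs some extra loading on lookup, it never produces a wrong result.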