[clang] Improve nested name specifier AST representation #147835

mizvekov · 2025-07-09T21:39:29Z

This is a major change on how we represent nested name qualifications in the AST.

The nested name specifier itself and how it's stored is changed. The prefixes for types are handled within the type hierarchy, which makes canonicalization for them super cheap, no memory allocation required. Also translating a type into nested name specifier form becomes a no-op. An identifier is stored as a DependentNameType. The nested name specifier gains a lightweight handle class, to be used instead of passing around pointers, which is similar to what is implemented for TemplateName. There is still one free bit available, and this handle can be used within a PointerUnion and PointerIntPair, which should keep bit-packing aficionados happy.
The ElaboratedType node is removed, all type nodes in which it could previously apply to can now store the elaborated keyword and name qualifier, tail allocating when present.
TagTypes can now point to the exact declaration found when producing these, as opposed to the previous situation of there only existing one TagType per entity. This increases the amount of type sugar retained, and can have several applications, for example in tracking module ownership, and other tools which care about source file origins, such as IWYU. These TagTypes are lazily allocated, in order to limit the increase in AST size.

This patch offers a great performance benefit.

It greatly improves compilation time for stdexec. For one datapoint, for test_on2.cpp in that project, which is the slowest compiling test, this patch improves -c compilation time by about 7.2%, with the -fsyntax-only improvement being at ~12%.

This has great results on compile-time-tracker as well:

This patch also further enables other optimziations in the future, and will reduce the performance impact of template specialization resugaring when that lands.

It has some other miscelaneous drive-by fixes.

About the review: Yes the patch is huge, sorry about that. Part of the reason is that I started by the nested name specifier part, before the ElaboratedType part, but that had a huge performance downside, as ElaboratedType is a big performance hog. I didn't have the steam to go back and change the patch after the fact.

There is also a lot of internal API changes, and it made sense to remove ElaboratedType in one go, versus removing it from one type at a time, as that would present much more churn to the users. Also, the nested name specifier having a different API avoids missing changes related to how prefixes work now, which could make existing code compile but not work.

How to review: The important changes are all in clang/include/clang/AST and clang/lib/AST, with also important changes in clang/lib/Sema/TreeTransform.h.

The rest and bulk of the changes are mostly consequences of the changes in API.

PS: TagType::getDecl is renamed to getOriginalDecl in this patch, just for easier to rebasing. I plan to rename it back after this lands.

Fixes #136624
Fixes #43179
Fixes #68670
Fixes #92757

This fixes a regression introduced in #147835 When parsing a lambda where the call operator has a late parsed attribute, we would try to build a 'this' type for the lambda, but in a lambda 'this' never refers to the lambda class itself. This late parsed attribute can be added implicitly by the -ftrapping-math flag. This patch patch makes it so CXXThisScopeRAII ignores lambdas. This became observable in #147835 because that made clang lazily create tag types, and it removed a workaround for a lambda dependency bug where any previously created tag type for the lambda is dicarded after its dependency is recalculated. But the 'this' scope created above would defeat this laziness and create the lambda type too soon, before its dependency was updated. Since this regression was never released, there are no release notes. Fixes #161657

This fixes a regression introduced in #147835 When parsing a lambda where the call operator has a late parsed attribute, we would try to build a 'this' type for the lambda, but in a lambda 'this' never refers to the lambda class itself. This late parsed attribute can be added implicitly by the -ftrapping-math flag. This patch patch makes it so CXXThisScopeRAII ignores lambdas. This became observable in #147835 because that made clang lazily create tag types, and it removed a workaround for a lambda dependency bug where any previously created tag type for the lambda is discarded after its dependency is recalculated. But the 'this' scope created above would defeat this laziness and create the lambda type too soon, before its dependency was updated. Since this regression was never released, there are no release notes. Fixes #161657

…161765) This fixes a regression introduced in llvm#147835 When parsing a lambda where the call operator has a late parsed attribute, we would try to build a 'this' type for the lambda, but in a lambda 'this' never refers to the lambda class itself. This late parsed attribute can be added implicitly by the -ftrapping-math flag. This patch patch makes it so CXXThisScopeRAII ignores lambdas. This became observable in llvm#147835 because that made clang lazily create tag types, and it removed a workaround for a lambda dependency bug where any previously created tag type for the lambda is discarded after its dependency is recalculated. But the 'this' scope created above would defeat this laziness and create the lambda type too soon, before its dependency was updated. Since this regression was never released, there are no release notes. Fixes llvm#161657

emaxx-google · 2025-10-09T01:30:23Z

Hello, did these changes modify the semantics of NestedNameSpecifierLoc::getLocalSourceRange()? For a code like void n::C::f() {}, previous versions of Clang returned C from getLocalSourceRange(), but now it's returning n::C. The new behavior seems to contradict the documentation of this method.

mizvekov · 2025-10-09T03:24:53Z

Hello, did these changes modify the semantics of NestedNameSpecifierLoc::getLocalSourceRange()? For a code like void n::C::f() {}, previous versions of Clang returned C from getLocalSourceRange(), but now it's returning n::C. The new behavior seems to contradict the documentation of this method.

I think the documentation is right and matches what's happening.
It's just that what's considered local has changed in the case of a NestedNameSpecifier which is a Type.
The type was previously split-up, each chunk being its own nested name specifier.
Now a Type is always a NestedNameSpecifier component by itself.

Sounds like you want to get the beginning of the range as something like:

if (auto TL = NNSL->getAsTypeLoc())
  return TL.getNonPrefixBeginLoc();
else
  return NNSL->getLocalBeginLoc();

I didn't add a helper for this because there was just a single user in the llvm codebase using this pattern.
See the clang-tools-extra/clangd/FindTarget.cpp change from this patch.

hokein · 2025-10-09T08:57:26Z

I think the documentation is right and matches what's happening.
It's just that what's considered local has changed in the case of a NestedNameSpecifier which is a Type.
The type was previously split-up, each chunk being its own nested name specifier.
Now a Type is always a NestedNameSpecifier component by itself.

Just my two cents.

I think the term "local" is ambiguous, and my mental model has always been that it refers to the last specifier in a qualified name. This seems to have been the behavior, and the public API doc implies it.

This recent change seems to break that model and introduces behavior that feels inconsistent:

For namespace specifier ns1::ns2::, getLocalSourceRange returns the range of "ns2::";
For type specifier ns1::C1::, getLocalSourceRange returns the range of "ns1::C1::";

This difference is confusing. While I understand the reasoning behind the internal implementation, API users aren't likely to be aware of it, and the "last piece" model seems more intuitive.

If this new behavior is intended, could we update the comments for getLocalSourceRange() to clarify these new rules and provide examples for each case? or even would it make sense to preserve the old behavior for the sake of API consistency?

mizvekov · 2025-10-10T01:04:32Z

Yeah, all except one user of getLocalSourceRange expected the behavior to be what it is now.

I propose adding a note about this in the documentation, add a new helper which does what getLocalSourceRange used to do, and migrate the single existing user to use that.

emaxx-google · 2025-10-12T01:06:47Z

Hello, did these changes modify the semantics of NestedNameSpecifierLoc::getLocalSourceRange()? For a code like void n::C::f() {}, previous versions of Clang returned C from getLocalSourceRange(), but now it's returning n::C. The new behavior seems to contradict the documentation of this method.

I think the documentation is right and matches what's happening. It's just that what's considered local has changed in the case of a NestedNameSpecifier which is a Type. The type was previously split-up, each chunk being its own nested name specifier. Now a Type is always a NestedNameSpecifier component by itself.

Sounds like you want to get the beginning of the range as something like:

Thanks, this workaround works for me. It's just that such subtle changes of the API (the function interface wasn't changed, its documentation wasn't changed and changelog wasn't clear on this backwards-incompatible change) are very hard to pinpoint&address in downstream projects.

This adds a note to the documentation of getLocalSourceRange, and cleans up one of the users of this function which was changed in #147835

This adds a note to the documentation of getLocalSourceRange, and cleans up one of the users of this function which was changed in #147835 This also removes `TypeLoc::getNonPrefixBeginLoc`, which was recently introduced in the above patch, as that became unused.

mizvekov · 2025-10-13T15:18:58Z

After another look at this, we can remove that single potential user and that's a simplification over the status quo.

See #163206

Also removes TypeLoc::getNonPrefixBeginLoc, which becomes unused after this change.

@emaxx-google see if your problem can be solved with a change similar to the above patch, that would be better than the solution I posted earlier.

I added the note on this patch, but also see that the example in that documentation already reflected that change.

This adds a note to the documentation of getLocalSourceRange, and cleans up one of the users of this function which was changed in #147835 This also removes `TypeLoc::getNonPrefixBeginLoc`, which was recently introduced in the above patch, as that became unused.

This adds a note to the documentation of getLocalSourceRange, and cleans up one of the users of this function which was changed in llvm/llvm-project#147835 This also removes `TypeLoc::getNonPrefixBeginLoc`, which was recently introduced in the above patch, as that became unused.

This rename was made as part of #147835 in order to ease rebasing the PR, and give a nice window for other patches to get rebased as well. This has been a while already, so lets go ahead and rename it back.

mizvekov · 2025-10-13T21:45:43Z

I just posted #163271 which will rename getOriginalDecl back to getDecl, as I think enough time has passed and there has been a big enough window to facilitate any rebases.

Let me know if anyone needs more time.

Ideally we would like to get this in before the release, so that upstream users that need to support multiple clang versions don't need a bunch of extra #if #elses.

This adds a note to the documentation of getLocalSourceRange, and cleans up one of the users of this function which was changed in llvm#147835 This also removes `TypeLoc::getNonPrefixBeginLoc`, which was recently introduced in the above patch, as that became unused.

This rename was made as part of #147835 in order to ease rebasing the PR, and give a nice window for other patches to get rebased as well. This has been a while already, so lets go ahead and rename it back.

This rename was made as part of #147835 in order to ease rebasing the PR, and give a nice window for other patches to get rebased as well. It has been a while already, so lets go ahead and rename it back.

…cl (#163271) This rename was made as part of llvm/llvm-project#147835 in order to ease rebasing the PR, and give a nice window for other patches to get rebased as well. It has been a while already, so lets go ahead and rename it back.

mizvekov requested review from AaronBallman, cor3ntin, erichkeane, rjmccall, zygoloid and zyn0217 July 9, 2025 21:39

mizvekov self-assigned this Jul 9, 2025

mizvekov requested review from Endilll, QuietMisdreavus, cyndyishida and daniel-grumberg as code owners July 9, 2025 21:39

mizvekov mentioned this pull request Oct 3, 2025

[clang] fix lambda dependency issue with late parse attributes #161765

Merged

emaxx-google mentioned this pull request Oct 6, 2025

Support for LLVM 22 marxin/cvise#373

Merged

mizvekov added a commit that referenced this pull request Oct 13, 2025

[clang] NFC: NNS getLocalSourceRange user cleanup

bd77676

This adds a note to the documentation of getLocalSourceRange, and cleans up one of the users of this function which was changed in #147835

mizvekov mentioned this pull request Oct 13, 2025

[clang] NFC: NNS getLocalSourceRange user cleanup #163206

Merged

mizvekov mentioned this pull request Oct 13, 2025

[clang] NFC: rename TagType::getOriginalDecl back to getDecl #163271

Merged

philnik777 added a commit to philnik777/llvm-project that referenced this pull request Oct 14, 2025

[libc++][C++03] Cherry-pick llvm#123236 and llvm#147835

0394df1

philnik777 added a commit to philnik777/llvm-project that referenced this pull request Oct 14, 2025

[libc++][C++03] Cherry-pick llvm#123236 and llvm#147835

0ec3265

philnik777 added a commit that referenced this pull request Oct 14, 2025

[libc++][C++03] Cherry-pick #123236 and #147835 (#163363)

6fc2318

akadutta pushed a commit to akadutta/llvm-project that referenced this pull request Oct 14, 2025

[libc++][C++03] Cherry-pick llvm#123236 and llvm#147835 (llvm#163363)

fc5f9f5

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[clang] Improve nested name specifier AST representation #147835

[clang] Improve nested name specifier AST representation #147835

mizvekov commented Jul 9, 2025 •

edited

Loading

Uh oh!

emaxx-google commented Oct 9, 2025

Uh oh!

mizvekov commented Oct 9, 2025

Uh oh!

hokein commented Oct 9, 2025

Uh oh!

mizvekov commented Oct 10, 2025

Uh oh!

emaxx-google commented Oct 12, 2025 •

edited

Loading

Uh oh!

mizvekov commented Oct 13, 2025

Uh oh!

mizvekov commented Oct 13, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

35 participants

[clang] Improve nested name specifier AST representation #147835

[clang] Improve nested name specifier AST representation #147835

Conversation

mizvekov commented Jul 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

emaxx-google commented Oct 9, 2025

Uh oh!

mizvekov commented Oct 9, 2025

Uh oh!

hokein commented Oct 9, 2025

Uh oh!

mizvekov commented Oct 10, 2025

Uh oh!

emaxx-google commented Oct 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mizvekov commented Oct 13, 2025

Uh oh!

mizvekov commented Oct 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

35 participants

mizvekov commented Jul 9, 2025 •

edited

Loading

emaxx-google commented Oct 12, 2025 •

edited

Loading

mizvekov commented Oct 13, 2025 •

edited

Loading