
Merging upstream 91cdd350 [clang] Improve nested name specifier AST representation#249

Open
YukinoHayakawa wants to merge 4828 commits into bloomberg:p2996 from UsagiEngine:yukino-merge-upstream-91cdd350-20251223

Conversation

YukinoHayakawa commented Dec 29, 2025

This PR merges upstream 91cdd35 "[clang] Improve nested name specifier AST representation", which bumped the clang version to 22 and performed a painful refactoring of NestedNameSpecifier. I am not entirely sure how best to handle this change.

Many major changes were introduced up to this commit, including a refactoring of the type system: ElaboratedType was removed, Identifier was removed as a SpecifierKind, and many types are now represented simply as a TagType<> and the like.

NestedNameSpecifier was compressed from 24 bytes to 8 bytes through aggressive bit packing, which forced me to set the alignment of the affected types to 16 to accommodate the new StoredKind bits. I am not sure whether this is the best way to deal with it.
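The alignment requirement comes from low-bit pointer tagging: an object aligned to 2^k bytes always has an address whose k lowest bits are zero, so a compact handle can store a small kind tag in those bits. The sketch below illustrates only that general technique. `Node`, `TaggedPtr` and `kKindBits` are hypothetical names, not clang's `NestedNameSpecifier` or `StoredKind` code, and it assumes objects are created with their declared alignment (automatic storage and C++17 aligned `new` both honor `alignas`).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical payload type: alignas(16) guarantees the 4 low address bits are zero.
struct alignas(16) Node {
  int value;
};

// Hypothetical handle packing a Node pointer and a small "kind" into one word.
class TaggedPtr {
public:
  static constexpr unsigned kKindBits = 4;  // log2(alignof(Node))
  static constexpr std::uintptr_t kKindMask =
      (std::uintptr_t{1} << kKindBits) - 1;
  static_assert(alignof(Node) >= (1u << kKindBits),
                "Node alignment must leave kKindBits free low bits");

  TaggedPtr(Node *ptr, unsigned kind)
      : bits_(reinterpret_cast<std::uintptr_t>(ptr) | kind) {
    // The low bits of the pointer must be free, and the kind must fit in them.
    assert((reinterpret_cast<std::uintptr_t>(ptr) & kKindMask) == 0);
    assert(kind <= kKindMask);
  }

  // Strip the tag bits to recover the original pointer.
  Node *pointer() const { return reinterpret_cast<Node *>(bits_ & ~kKindMask); }
  // Keep only the tag bits.
  unsigned kind() const { return static_cast<unsigned>(bits_ & kKindMask); }

private:
  std::uintptr_t bits_;  // pointer bits | kind bits
};

static_assert(sizeof(TaggedPtr) == sizeof(void *), "fits in one machine word");
```

The trade-off is padding: with `alignas(16)`, `sizeof(Node)` is rounded up to 16 even though it holds a single `int`, which is why raising the alignment of many types at once deserves a second look.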

Again, I tested this on my own test code and game code. For test cases that don't compile, I cross-checked against the compiler on Compiler Explorer (CE), and the errors appear to be identical. I haven't triggered any ICE so far, but I suspect some exist. I tried to stick with the original reflection logic and avoid breaking it.

If you are interested, please test this merge thoroughly.

In the code I've touched, I have left `todo [merge:yukino...` markers for you to inspect.

This PR concludes #248 and can replace #243 and #244.

tbaederr and others added 30 commits August 7, 2025 11:32
... so we don't have to create Pointer instances when we don't need
them.
llvm#152457)

Judging from the reaction to
llvm#152302, we are not ready to
make this a fatal error.

Remove the specific version number, and update the libc message to match
the others' wording.
This fixes llvm#152097

This commit fixes two instances of a (somewhat) recently enabled
assertion. One with a test, the other I can't reproduce (might be dead
code) but certainly looks like an instance of the same problem.

The PR that introduced the regression:
llvm#117558

With this patch, the AVR backend is usable again for TinyGo.
This patch extends llvm#149095 for EOR and ORR.

It uses a simple partition scheme to try to find two suitable disjoint
bitmasks that can be used with EOR/ORR to reconstruct the original mask.

Fixes: llvm#148987.
…lvm#151940)

We need to reject plans that contain recipes with invalid costs. LICM
can move recipes with invalid costs out of the loop region, which then
get missed by the main cost computation.

Extend the logic that checks recipes for invalid costs, which currently
only covers the middle block, to include all skeleton blocks.

Fixes llvm#144358 
Fixes llvm#151664

PR: llvm#151940
…xpr (llvm#152363)

Closes llvm#152324.
Part of llvm#30794.

This PR adds `constexpr` support for the following AVX512 integer
reduction intrinsics:

- `_mm512_reduce_add_epi32`
- `_mm512_reduce_add_epi64`
- `_mm512_reduce_mul_epi32`
- `_mm512_reduce_mul_epi64`
- `_mm512_reduce_and_epi32`
- `_mm512_reduce_and_epi64`
- `_mm512_reduce_or_epi32`
- `_mm512_reduce_or_epi64`
- `_mm512_reduce_max_epi32`
- `_mm512_reduce_max_epi64`
- `_mm512_reduce_min_epi32`
- `_mm512_reduce_min_epi64`
- `_mm512_reduce_max_epu32`
- `_mm512_reduce_max_epu64`
- `_mm512_reduce_min_epu32`
- `_mm512_reduce_min_epu64`
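To illustrate what constant-evaluating one of these reductions involves, here is a portable scalar model of `_mm512_reduce_add_epi32` (16 lanes of 32-bit integers, added with wraparound). It is a hedged sketch, not clang's implementation; `reduce_add_epi32_model` is a hypothetical name. Signed overflow is not permitted in a constant expression, so the model accumulates in `uint32_t`, where wraparound is well defined, and converts back at the end.

```cpp
#include <array>
#include <cstdint>

// Hypothetical scalar model of _mm512_reduce_add_epi32: sums 16 int32 lanes
// with two's-complement wraparound and is usable in constant expressions.
constexpr std::int32_t reduce_add_epi32_model(
    const std::array<std::int32_t, 16> &lanes) {
  std::uint32_t acc = 0;  // unsigned arithmetic wraps instead of overflowing
  for (std::int32_t lane : lanes)
    acc += static_cast<std::uint32_t>(lane);
  // Modular conversion back to signed (guaranteed since C++20,
  // implementation-defined but allowed in constant expressions before that).
  return static_cast<std::int32_t>(acc);
}

static_assert(reduce_add_epi32_model({1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
                                      13, 14, 15, 16}) == 136);
```

With this model, sixteen lanes of `INT32_MAX` reduce to `-16` instead of failing constant evaluation.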

---------

Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
Auto-generate checks for
llvm#151925. Also update some
naming to make more consistent with other tests.
…#152138)

This commit converts RetainCountChecker to the new checker family
framework that was introduced in the commit
6833076

This commit also performs some minor cleanup around the parts that had
to be changed, but lots of technical debt still remains in this old
codebase.
Added by llvm#150846.

Checks the size of a structure, which is only correct for 64-bit
systems.
…ntrinsics to be used in constexpr (llvm#152435)

Fixed llvm#152313

---------

Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
Previously, specializing the GraphWriter class required a full class
specialization.
This change introduces CRTP for GraphWriter, allowing for partial
specialization.

This change is in support of printing the module dependency graph as
part of the RFC for driver-managed module builds, for which we want to
print the graph nodes in a more human-readable format by:
- Printing descriptive IDs instead of pointer addresses as node labels.
- Printing the full node labels separately from the node relations to
avoid clutter.

With this approach, only GraphWriter::writeNodes() needs to be
specialized (aside from DOTGraphTraits).

RFC for driver-managed module builds:
https://discourse.llvm.org/t/rfc-modules-support-simple-c-20-modules-use-from-the-clang-driver-without-a-build-system
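The CRTP approach described above can be sketched generically as follows. This is a self-contained illustration, not LLVM's actual interface: `Graph`, `GraphWriterBase`, `DefaultGraphWriter` and `LabeledGraphWriter` are hypothetical names. The base class fixes the overall algorithm and calls `writeNodes()` through the derived type, so a derived writer can replace just that one step while inheriting everything else.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical graph: node names plus edges as index pairs.
struct Graph {
  std::vector<std::string> nodes;
  std::vector<std::pair<int, int>> edges;
};

// CRTP base: steps are dispatched statically to Derived, so Derived can
// "override" any of them without virtual functions.
template <typename Derived>
class GraphWriterBase {
public:
  GraphWriterBase(std::ostream &os, const Graph &g) : os_(os), g_(g) {}

  void writeGraph() {
    os_ << "digraph {\n";
    static_cast<Derived *>(this)->writeNodes();
    os_ << "}\n";
  }

  // Default step: one line per edge, nodes referred to by index only.
  void writeNodes() {
    for (const auto &[from, to] : g_.edges)
      os_ << "  N" << from << " -> N" << to << ";\n";
  }

protected:
  std::ostream &os_;
  const Graph &g_;
};

// Uses every default step.
class DefaultGraphWriter : public GraphWriterBase<DefaultGraphWriter> {
public:
  using GraphWriterBase::GraphWriterBase;
};

// Customizes only writeNodes(): readable labels first, then the relations.
class LabeledGraphWriter : public GraphWriterBase<LabeledGraphWriter> {
public:
  using GraphWriterBase::GraphWriterBase;

  void writeNodes() {
    for (std::size_t i = 0; i < g_.nodes.size(); ++i)
      os_ << "  N" << i << " [label=\"" << g_.nodes[i] << "\"];\n";
    GraphWriterBase::writeNodes();  // reuse the default edge output
  }
};
```

Because dispatch is resolved at compile time, a writer that customizes nothing pays no virtual-call cost, and a specialized writer only has to spell out the one hook it changes.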
Desc is only used once and we can get that information from the Block as
well.
[NVPTX] Add Prefetch tensormap intrinsics
This PR adds prefetch intrinsics with the relevant tensormap_space.

* Lit tests are added as part of prefetch.ll
* The generated PTX is verified with a 12.3 ptxas executable.
* Added docs for these intrinsics in NVPTXUsage.rst.

For more information, refer to the PTX ISA section on the prefetch instruction:
[Prefetch Tensormap](https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-prefetch-prefetchu)

 @durga4github @schwarzschild-radius
Changes: The original patch, landed as 1336675, was reverted due to a
bug in LoopVectorize resulting in a crash. The bug has now been fixed by
95c32bf ([VPlan] Return invalid cost if any skeleton block has invalid
costs), and this reland is identical to the original patch.
…ops. (llvm#148424)

Adds `linalg-morph-ops` pass to convert an op from one representation to another: 
   named-op <--> category_op (elementwise, contraction, ..) <--> generic
e.g.
```mlir
  %exp = linalg.exp ins(%A : tensor<16x8xf32>) outs(%B :  tensor<16x8xf32>) -> tensor<16x8xf32>
```
After `mlir-opt -linalg-morph-ops=named-to-category ..`

```mlir
  %0 = linalg.elementwise kind=#linalg.elementwise_kind<exp> ins(%arg0 : tensor<16x8xf32> ..
```

Note: this is a generalization of:
- `--linalg-generalize-named-ops`, which is the path `named-op --> generic-op`
- `--linalg-specialize-generic-ops`, which is the path `named-op <-- generic-op`

email: quic_mabsar@quicinc.com
…ows (llvm#152318)

Currently flang-rt assumes that LLVM was always built with the dynamic
MSVC runtime. This may not be the case if the user has specified a
different runtime with -DCMAKE_MSVC_RUNTIME_LIBRARY. Since this flag is
implied by -DLLVM_ENABLE_RPMALLOC=On, which is used by the Windows
release script, this is causing that script to fail.

Fixes llvm#151920
Split out from llvm#150248:

Use the size of the alloca instead of the size passed to the lifetime
intrinsic.

As a bonus, this handles dynamic allocas correctly (see the added test)
instead of doing a memset with size -1...
…llvm#152478)

Adds missing C++ run lines to test files containing `constexpr` tests.
Also adds missing 32/64-bit test coverage to the following tests:
- `clang/test/CodeGen/X86/avx512-reduceIntrin.c`
- `clang/test/CodeGen/X86/avx512-reduceMinMaxIntrin.c`
- `clang/test/CodeGen/X86/avx512vpopcntdq-builtins.c`
- `clang/test/CodeGen/X86/avx512vpopcntdqvl-builtins.c`

Additionally, fixes a `_mm512_popcnt_epi64` `constexpr` test that
incorrectly assumed 32-bit integers, leading to incorrect bit counts.
This change updates the test result to assume 64-bit integers.
We currently log every single test that we run in premerge. This leads
to gigantic logs (200k+ lines on Linux) that can be difficult to parse
through. Having an indicator of progress is nice, especially for the
LLVM tests, but is not strictly necessary and not often used (I
imagine). Having a progress indicator from lit that works in CI cases is
on my TODO list.

For the rare cases where someone does need to see the list of tests that
run, the JUnit XML emitted by lit is available in the artifacts.
…lvm#152007)

This allows not having the END CRITICAL directive in certain situations.
Update semantic checks and symbol resolution.
…m#152466)

`add_conformance_test` checks for libc and prints a warning if it is not
found. However, this warning ends up being printed once for each test,
spamming the cmake log. Moving it up to the folder cmake allows it to
be reported only once.
llvm#152483)

This way all the tracking is self-contained in `TrackingOutputBuffer`
and we can test the `SuffixRange` properly.
Added initial check for potential fmad conversion in reductions and
operands vectorization.
When `LLVM_INSTALL_TOOLCHAIN_ONLY=ON`, the MLIR shared library
(`libMLIR*`) is not installed even though it is built with the
`INSTALL_WITH_TOOLCHAIN` argument to the `add_mlir_library` cmake
function. This patch ensures that `libMLIR*` is installed when
`LLVM_INSTALL_TOOLCHAIN_ONLY=ON`.

Patch verified
[here](llvm#151247 (comment)).

fixes llvm#151247
This patch teaches moveFromOldBuckets to take an iterator_range so
that it can use a range-based for loop.
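The refactoring pattern can be sketched in isolation. `IteratorRange`, `makeRange`, `Bucket` and this `moveFromOldBuckets` are hypothetical stand-ins (similar in spirit to `llvm::iterator_range`, not the real DenseMap code): a range-based for loop only needs `begin()` and `end()`, so bundling the two iterators into one object lets the function body drop the manual pointer loop.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Minimal begin/end pair; this is all a range-based for loop requires.
template <typename It>
class IteratorRange {
public:
  IteratorRange(It b, It e) : begin_(b), end_(e) {}
  It begin() const { return begin_; }
  It end() const { return end_; }

private:
  It begin_, end_;
};

template <typename It>
IteratorRange<It> makeRange(It b, It e) {
  return IteratorRange<It>(b, e);
}

// Hypothetical bucket: empty buckets are skipped when moving entries.
struct Bucket {
  bool occupied = false;
  std::string key;
};

// Taking a single range instead of separate begin/end iterators lets the
// body iterate with a range-based for loop.
template <typename It>
void moveFromOldBuckets(IteratorRange<It> oldBuckets,
                        std::vector<std::string> &out) {
  for (Bucket &b : oldBuckets) {
    if (!b.occupied)
      continue;
    out.push_back(std::move(b.key));
    b.occupied = false;  // the old bucket no longer owns an entry
  }
}
```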
getActiveBits() already returns unsigned.
krishna2803 and others added 28 commits August 8, 2025 17:06
…ctions (llvm#152784)

Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>
…lvm#152038)

These are the strided versions of `riscv.segN.store.mask` intrinsics.
Instead of using the word 'offset' it's probably better to just say
'stride'.

NFC.
Removes the `(batch_)matmul_transpose_{a|b}` variants from OpDSL and
replaces them with `matmul affine_maps [...]` whenever appropriate. This is
in line with the
[plan](https://discourse.llvm.org/t/rfc-op-explosion-in-linalg/82863),
and can be done since llvm#104783 merged.

See:
https://discourse.llvm.org/t/deprecate-batch-matmul-transpose-a-b-linalg-operations/87245

Issues investigated:
* pad transform tests that could use `matmul` instead, so change to
that.
* ArmSME test using transpose actually needed it, so changed to `matmul`
+ affine maps.

Arm tests validated by @banach-space (thanks!!).
Unlike ptrtoint, ptrtoaddr does not capture provenance, only the address.
Note: As defined by the LangRef, we always treat `ptrtoaddr` as a
location-independent address capture since it is a direct inspection of the
pointer address.

Reviewed By: nikic

Pull Request: llvm#152221
…e argument (llvm#152791)

Fixes llvm#152754 

- Fixes the ArgOperand index in `DXILOpLowering.cpp` used to obtain the
pointer operand of a lifetime intrinsic.
- Updates the tests
`llvm/test/CodeGen/DirectX/legalize-lifetimes-valver-1.5.ll`,
`llvm/test/CodeGen/DirectX/legalize-lifetimes-valver-1.6.ll`,
`llvm/test/CodeGen/DirectX/ShaderFlags/lifetimes-noint64op.ll`, and
`llvm/test/tools/dxil-dis/lifetimes.ll` to use the new size-less
lifetime intrinsic
- Removes lifetime intrinsics from the test
`llvm/test/CodeGen/DirectX/legalize-memset.ll` to be consistent with the
corresponding memcpy test which does not have lifetime intrinsics.
(Removal of lifetime intrinsics from tests like this was suggested here
in the past:
llvm#139173 (comment))
- Rewrites the lifetime legalization functions in the EmbedDXILPass to
re-add the explicit size argument for DXIL
This patch fixes:

  lldb/unittests/DAP/ProtocolTypesTest.cpp:112:67: error: missing
  field 'adapterData' initializer
  [-Werror,-Wmissing-field-initializers]

  lldb/unittests/DAP/ProtocolTypesTest.cpp:571:70: error: missing
  field 'adapterData' initializer
  [-Werror,-Wmissing-field-initializers]
Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>
…m#152805)

This test runs `mlir-opt %s | mlir-opt %s | FileCheck` to test the round
trip behavior, but the second command takes input from the pipe, not the
lit test, so it should be `mlir-opt %s | mlir-opt | FileCheck`.

For some reason I haven't figured out, this causes ~50% flakiness when
testing in certain environments (not reproducible in my shell, but
reproduces in an internal buildbot), due to the pipeline raising
`SIGPIPE`.

Test added in llvm#148424.
…152593)

Static analysis complained that:

  child_range(&Init, &Init+1);

in the children member function was potentially out of bounds. This is a
false positive because it is forming an iterator range, but it would be
invalid if Init were a nullptr.

I added an assertion in the constructor for this and removed two FIXME
checks that are related to this. I checked the various usages, and we
always validate that the argument is not nullptr.
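A sketch of the pattern described above, with hypothetical `Expr` and `InitStmt` types standing in for the real AST classes: forming `[&Init, &Init + 1)` is valid pointer arithmetic on a single object, so the range itself is always well formed. What consumers actually rely on is that the stored child is non-null, and asserting that once in the constructor documents the invariant for readers and, ideally, for static analyzers.

```cpp
#include <cassert>

struct Expr {
  int id;  // hypothetical AST node
};

// Hypothetical statement owning exactly one child pointer.
class InitStmt {
public:
  explicit InitStmt(Expr *init) : Init(init) {
    // Establish the invariant once so every consumer can rely on it.
    assert(Init && "InitStmt requires a non-null initializer");
  }

  // &Init + 1 is a valid one-past-the-end pointer for a single object,
  // so this is a well-formed one-element range of Expr* slots.
  Expr **children_begin() { return &Init; }
  Expr **children_end() { return &Init + 1; }

  Expr *getInit() const { return Init; }  // non-null by construction

private:
  Expr *Init;
};
```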
Adds missing 16-bit test cases to the test that src mods are not applied
to integers in instructions with canonicalizing patterns.
llvm#152813)

We'll remove the size estimator after, this change is to get the `ml-*`
build bots green after the aforementioned PR.

We never used the size estimator again after the initial DQN-based
training. Should we want to again, we now have IR2Vec, which the old
estimator was approximating in functionality.
Summary:
Small fix that just ignores all the extra lanes if we're running the
server from a platform that potentially has more.
Without linker relaxation enabled for a particular relocatable file or
section (e.g., using .option norelax), the assembler will not generate
R_RISCV_ALIGN relocations for alignment directives. This becomes
problematic in a two-stage linking process:

```
ld -r a.o b.o -o ab.o
// b.o is norelax. Its alignment information is lost in ab.o.
ld ab.o -o ab
```

When ab.o is linked into an executable, the preceding relaxed section
(a.o's content) might shrink. Since there's no R_RISCV_ALIGN relocation
in b.o for the linker to act upon, the `.word 0x3a393837` data in b.o
may end up unaligned in the final executable.

To address the issue, this patch inserts NOP bytes and synthesizes an
R_RISCV_ALIGN relocation at the beginning of a text section when the
alignment >= 4.

For simplicity, when RVC is disabled, we synthesize an ALIGN relocation
(addend: 2) for a 4-byte aligned section, allowing the linker to trim
the excess 2 bytes.

See also https://sourceware.org/bugzilla/show_bug.cgi?id=33236

Pull Request: llvm#151639
This patch defines a couple of helper functions so that we can convert
four loops to range-based for loops.
This is a major change on how we represent nested name qualifications in
the AST.

* The nested name specifier itself and how it's stored is changed. The
prefixes for types are handled within the type hierarchy, which makes
canonicalization for them super cheap, no memory allocation required.
Also translating a type into nested name specifier form becomes a no-op.
An identifier is stored as a DependentNameType. The nested name
specifier gains a lightweight handle class, to be used instead of
passing around pointers, which is similar to what is implemented for
TemplateName. There is still one free bit available, and this handle can
be used within a PointerUnion and PointerIntPair, which should keep
bit-packing aficionados happy.
* The ElaboratedType node is removed; all type nodes to which it could
previously apply can now store the elaborated keyword and name
qualifier, tail-allocating them when present.
* TagTypes can now point to the exact declaration found when producing
these, as opposed to the previous situation where only one TagType
existed per entity. This increases the amount of type sugar retained,
and can have several applications, for example in tracking module
ownership, and other tools which care about source file origins, such as
IWYU. These TagTypes are lazily allocated, in order to limit the
increase in AST size.

This patch offers a great performance benefit.

It greatly improves compilation time for
[stdexec](https://github.com/NVIDIA/stdexec). For one datapoint, for
`test_on2.cpp` in that project, which is the slowest compiling test,
this patch improves `-c` compilation time by about 7.2%, with the
`-fsyntax-only` improvement being at ~12%.

This has great results on compile-time-tracker as well:

![image](https://github.com/user-attachments/assets/700dce98-2cab-4aa8-97d1-b038c0bee831)

This patch also further enables other optimizations in the future, and
will reduce the performance impact of template specialization resugaring
when that lands.

It has some other miscellaneous drive-by fixes.

About the review: Yes, the patch is huge, sorry about that. Part of the
reason is that I started with the nested name specifier part, before the
ElaboratedType part, but that had a huge performance downside, as
ElaboratedType is a big performance hog. I didn't have the steam to go
back and change the patch after the fact.

There are also a lot of internal API changes, and it made sense to remove
ElaboratedType in one go, versus removing it from one type at a time, as
that would present much more churn to the users. Also, the nested name
specifier having a different API avoids missing changes related to how
prefixes work now, which could make existing code compile but not work.

How to review: The important changes are all in
`clang/include/clang/AST` and `clang/lib/AST`, with also important
changes in `clang/lib/Sema/TreeTransform.h`.

The rest and bulk of the changes are mostly consequences of the changes
in API.

PS: TagType::getDecl is renamed to `getOriginalDecl` in this patch, just
to make rebasing easier. I plan to rename it back after this lands.

Fixes llvm#136624
Fixes llvm#43179
Fixes llvm#68670
Fixes llvm#92757
Notable upstream changes:

- Rearrangement of NamespaceDecl, NamespaceAlias removed.
- Improved consteval propagation.

# Conflicts:
#	clang/include/clang/Basic/DeclNodes.td
#	clang/include/clang/Basic/LangOptions.def
#	clang/include/clang/Serialization/TypeBitCodes.def
#	clang/lib/AST/DeclCXX.cpp
#	clang/lib/Driver/ToolChains/Clang.cpp
#	clang/lib/Parse/ParseExprCXX.cpp
#	clang/lib/Parse/ParseTemplate.cpp
#	clang/lib/Sema/SemaCXXScopeSpec.cpp
#	clang/lib/Sema/SemaDeclCXX.cpp
#	clang/lib/Sema/SemaExprCXX.cpp
#	clang/lib/Sema/SemaLambda.cpp
…Initializer because removing it would break using std::meta::info to initialize constexpr variables in non-const-evaluated functions.
… when it fails to deduce a temp variable's type made from it. e.g. when reflecting `^^decltype([](this auto &&) { })::operator()`
…erge-upstream-91cdd350-20251223

fixing allocation alignment problems caused by NNS refactoring

# Conflicts:
#	clang/include/clang/AST/AbstractBasicReader.h
#	clang/include/clang/AST/AbstractBasicWriter.h
#	clang/include/clang/AST/NestedNameSpecifier.h
#	clang/include/clang/AST/RecursiveASTVisitor.h
#	clang/include/clang/AST/Type.h
#	clang/lib/AST/ASTContext.cpp
#	clang/lib/AST/ASTImporter.cpp
#	clang/lib/AST/ASTStructuralEquivalence.cpp
#	clang/lib/AST/ComputeDependence.cpp
#	clang/lib/AST/ItaniumMangle.cpp
#	clang/lib/AST/NestedNameSpecifier.cpp
#	clang/lib/AST/ODRHash.cpp
#	clang/lib/AST/Type.cpp
#	clang/lib/Index/IndexTypeSourceInfo.cpp
#	clang/lib/Sema/SemaCXXScopeSpec.cpp
#	clang/lib/Sema/SemaExpr.cpp
#	clang/lib/Sema/SemaExprCXX.cpp
#	clang/lib/Sema/SemaHLSL.cpp
#	clang/lib/Sema/SemaTemplate.cpp
#	clang/lib/Sema/TreeTransform.h
#	clang/lib/Serialization/ASTReader.cpp
#	clang/lib/Serialization/ASTWriter.cpp
YukinoHayakawa marked this pull request as ready for review December 29, 2025 22:14
YukinoHayakawa (Author)

CI seems to be failing on a job unrelated to my changes; I hope it has been fixed upstream. I'll continue trying to merge further upstream changes.
