Skip to content

Conversation

@pranavk
Copy link
Owner

@pranavk pranavk commented Dec 3, 2025

This fixes 123.

tblah and others added 30 commits December 1, 2025 10:05
Previously we were less specific for POINTER/TARGET: encoding that they
could alias with (almost) anything.

In the new system, the "target data" tree is now a sibling of the other
trees (e.g. "global data"). POITNTER variables go at the root of the
"target data" tree, whereas TARGET variables get their own nodes under
that tree. For example,

```
integer, pointer :: ip
real, pointer :: rp
integer, target :: it
integer, target :: it2(:)
real, target :: rt
integer :: i
real :: r
```
- `ip` and `rp` may alias with any variable except `i` and `r`.
- `it`, `it2`, and `rt` may alias only with `ip` or `rp`.
- `i` and `r` cannot alias with any other variable.

Fortran 2023 15.5.2.14 gives restrictions on entities associated with
dummy arguments. These do not allow non-target globals to be modified
through dummy arguments and therefore I don't think we need to make all
globals alias with dummy arguments.

I haven't implemented it in this patch, but I wonder whether it is ever
possible for `ip` to alias with `rt` or even `it2`.

While I was updating the tests I fixed up some tests that still assumed
that local alloc tbaa wasn't the default.

I found no functional regressions in the gfortran test suite, fujitsu
test suite, spec2017, or a selection of HPC apps we test internally.
A barrier will pause execution until all threads reach it. If some go to
a different barrier then we deadlock. This manifests in that the
finalization callback must only be run once. Fix by ensuring we always
go through the same finalization block whether the thread in cancelled
or not and no matter which cancellation point causes the cancellation.

The old callback only affected PARALLEL, so it has been moved into the
code generating PARALLEL. For this reason, we don't need similar changes
for other cancellable constructs. We need to create the barrier on the
shared exit from the outlined function instead of only on the cancelled
branch to make sure that threads exiting normally (without cancellation)
meet the same barriers as those which were cancelled. For example,
previously we might have generated code like

```
...
  %ret = call i32 @__kmpc_cancel(...)
  %cond = icmp eq i32 %ret, 0
  br i1 %cond, label %continue, label %cancel

continue:
  // do the rest of the callback, eventually branching to %fini
  br label %fini

cancel:
  // Populated by the callback:
  // unsafe: if any thread makes it to the end without being cancelled
  // it won't reach this barrier and then the program will deadlock
  %unused = call i32 @__kmpc_cancel_barrier(...)
  br label %fini

fini:
  // run destructors etc
  ret
```

In the new version the barrier is moved into fini. I generate it *after*
the destructors because the standard describes the barrier as occurring
after the end of the parallel region.

```
...
  %ret = call i32 @__kmpc_cancel(...)
  %cond = icmp eq i32 %ret, 0
  br i1 %cond, label %continue, label %cancel

continue:
  // do the rest of the callback, eventually branching to %fini
  br label %fini

cancel:
  br label %fini

fini:
  // run destructors etc
  // safe so long as every exit from the function happens via this block:
  %unused = call i32 @__kmpc_cancel_barrier(...)
  ret
```

To achieve this, the barrier is now generated alongside the finalization
code instead of in the callback. This is the reason for the changes to
the unit test.

I'm unsure if I should keep the incorrect barrier generation callback
only on the cancellation branch in clang with the OMPIRBuilder backend
because that would match clang's ordinary codegen. Right now I have
opted to remove it entirely because it is a deadlock waiting to happen.

---

This re-lands llvm#164586 with a small fix for a failing buildbot running
address sanitizer on clang lit tests.

In the previous version of the patch I added an insertion point guard
"just to be safe" and never removed it. There isn't insertion point
guarding on the other route out of this function and we do not
preserve the insertion point around getFiniBB either so it is not
needed here.

The problem flagged by the sanitizers was because the saved insertion
point pointed to an instruction which was then removed inside the FiniCB
for some clang codegen functions. The instruction was freed when it was
removed. Then accessing it to restore the insertion point was a use
after free bug.
As noted in the reproducer provided in
llvm#164762 (comment),
on RISC-V after LTO we sometimes have trip counts exposed to vectorized
loops. The loop vectorizer will have generated calls to
@llvm.experimental.get.vector.length, but there are [some
properties](https://llvm.org/docs/LangRef.html#id2399) about the
intrinsic we can use to simplify it:

- The result is always less than both Count and MaxLanes
- If Count <= MaxLanes, then the result is Count

This teaches SCCP to handle these cases with the intrinsic, which allows
some single-iteration-after-LTO loops to be unfolded.

llvm#169293 is related and also simplifies the intrinsic in InstCombine via
computeKnownBits, but it can't fully remove the loop since
computeKnownBits only does limited reasoning on recurrences.
They don't have side-effects, so this should be fine.

Fixes llvm#170064
Similar to how getElementCount avoids the need to reason about fixed and
scalable ElementCounts separately, this patch adds getTypeSize to do the
same for TypeSize.

It also goes through and replaces some of the manual uses of getVScale
with getTypeSize/getElementCount where possible.
…n into after region (llvm#169892)

When a `scf.if` directly precedes a `scf.condition` in the before region
of a `scf.while` and both share the same condition, move the if into the
after region of the loop. This helps simplify the control flow to enable
uplifting `scf.while` to `scf.for`.
During InsertNegateRAState pass we check the annotations on
instructions,
to decide where to generate the OpNegateRAState CFIs in the output
binary.

As only instructions in the input binary were annotated, we have to make
a judgement on instructions generated by other BOLT passes.
Incorrect placement may cause issues when an (async) unwind request
is received during the new "unknown" instructions.

This patch adds more logic to make a more informed decision on by taking
into account:
- unknown instructions in a BasicBlock with other instruction have the
same RAState. Previously, if the BasicBlock started with an unknown
instruction,
the RAState was copied from the preceding block. Now, the RAState is
copied from
  the succeeding instructions in the same block.
- Some BasicBlocks may only contain instructions with unknown RAState,
As explained in issue llvm#160989, these blocks already have incorrect
unwind info. Because of this, the last known RAState based on the layout order
is copied.

Updated bolt/docs/PacRetDesign.md to reflect changes.
…er region in scf-uplift-while-to-for" (llvm#169888)

Reverts llvm#165216
It is implemented in llvm#169892 .
…70095)

From OpenMP 4.0:

> When an if clause is present on a cancel construct and the if
expression
> evaluates to false, the cancel construct does not activate
cancellation.
> The cancellation point associated with the cancel construct is always
> encountered regardless of the value of the if expression.

This wording is retained unmodified in OpenMP 6.0.

This re-opens the already approved PR llvm#164587, which was closed by
accident. The only changes are a rebase.
…lvm#170096)

Similar to fdiv, we should be trying to concat these high latency instructions together
…together. (llvm#170098)

Can only do this for 128->256 cases as we can't safely convert to the RCP14/RSQRT14 variants
Python multiprocessing is limited to 60 workers at most:

https://github.com/python/cpython/blob/6bc65c30ff1fd0b581a2c93416496fc720bc442c/Lib/concurrent/futures/process.py#L669-L672

The limit being per thread pool, we can work around it by using multiple
pools on windows when we want to actually use more workers.
…lvm#170097)

We can't read from those and will run into an assertion sooner or later.

Fixes llvm#170031
From the review in
llvm#169527 (comment),
there are some users where we want to extend or truncate a ConstantRange
only if it's not already the destination bitwidth. Previously this
asserted, so this PR relaxes it to just be a no-op, similar to
IRBuilder::createZExt and friends.
Follow-up for llvm#169047. The previous PR moved some functions from DA to
Delinearization, but the member function declarations were not updated
accordingly. This patch removes them.
…VECTOR. (llvm#162308)

This PR implements catch all handling for widening the scalable
subvector operand (INSERT_SUBVECTOR) or result (EXTRACT_SUBVECTOR). It
does this via the stack using masked memory operations. With general
handling available we can add optimiations for specific cases.
This PR re-lands llvm#165873.

This PR extends the gpu.subgroup_mma_* ops to support fp64 type.
The extension requires special handling during the lowering to nvvm due
to the return type for load ops for fragment a and b (they return a
scalar instead of a struct).

The original PR did not guard the new test based on the required
architecture (sm80) which lead to a failure on the cuda runners with T4
GPUs.
… together (llvm#170113)

Similar to fdiv, we should be trying to concat these high latency instructions together
…_from_uid() (llvm#168554)

Reland llvm#164392 with Fortran support moved to follow-up PR
…lvm#169926)

Add two test cases where dependencies are missed due to overflows. These
will be fixed by llvm#169927 and llvm#169928, respectively.
These f80 fp types are only supported on X86 and can be removed from
AArch64. It looks like they were copied from another backend by mistake.
Test seems to pass after re-enabling without any additional changes.
This uses lit substitution, which fixes running this test on
some environment where 'python' isn't in the path.
Meinersbur and others added 28 commits December 2, 2025 17:48
Vector registers have synthetic values for display purposes. This causes
SBValue::GetExpressionPath to dispatch
to ValueObjectSynthetic instead of ValueObjectRegister, producing
incorrect results.

Fixes llvm#147144
…70326)

4394aa6 introduced the test
amdgcn_weak_alias, which is failing on the reverse iteration build, due
to the the order of the aliasees being different. This failure is a test
issue, not a bug, so the metadata checks are removed.
)

Currently we try to hoist the transformed IV increment instruction to
the header block to help with generation of postincrement instructions,
but this only works if the user instruction is also in the header. We
should instead be trying to insert it in the same block as the user.
…#169795)

Similar to llvm#169156 again, this is mostly for denormal handling as there
is no rounding step in a minnum/maxnum.
Add additional tests where extra no-alias checks are needed, as future
extensions of llvm#168771.
This moves a few existing debug info flags that were floating in the
general pool of unorganised flags over to the existing groups for debug
info flags (so that they are presented together in documentation).

As a tiny further tweak, this also fixes the spelling of "DWARF" in the
flag docs for consistency with other flags.
One of the previous PRs
llvm#169267 has reintroduced block
count to layout propagation that was removed in
llvm#168504. This PR patches the
issue.
Adding llvm:Support dep since plugin started using llvm/ADT/...
)

In Debug builds, the names of adjusted pointers have a pointer-specific
name prefix which doesn't exist in non-debug builds.

This causes differences in output when looking at the output of SROA
with a Debug or Release compiler.

For most of our ongoing testing, we use essentially Release+Asserts
build (basically release but without NDEBUG defined), however we ship a
Release compiler. Therefore we want to say with reasonable confidence
that building a large project with Release vs a Release+Asserts build
gives us the same output when the same compiler version is used.

This difference however, makes it difficult to prove that the output is
the same if the only difference is the name when using LTO builds and
looking at bitcode.

Hence this change is being proposed.
I had a case where the frontend was generating a zero elem array in
non-shader code so it was just crashing in a release build.
Add a real error and make it not crash.

---------

Signed-off-by: Nick Sarnie <[email protected]>
…ual (llvm#170191)

This change makes StackFrame methods virtual to enable subclass
overrides and introduces BorrowedStackFrame, a wrapper that presents an
existing StackFrame with a different frame index.

This enables creating synthetic frame views or renumbering frames
without copying the underlying frame data, which is useful for frame
manipulation scenarios.

This also adds a new borrowed-info format entity to show what was the
original frame index of the borrowed frame.

Signed-off-by: Med Ismail Bennani <[email protected]>
In MFMA rewrite pass, prevent AGPR_32 reg class assignment for scale
operands, not permitted by instruction format.

---------

Co-authored-by: Matt Arsenault <[email protected]>
* Adds lowering for `amdgpu.make_dma_base`
…#161870)" (llvm#170236)

This patch re-lands llvm#161870 with fixes to the previous test failures.

rdar://161834688

Signed-off-by: Med Ismail Bennani <[email protected]>
@gemini-code-assist
Copy link

Summary of Changes

Hello @pranavk, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on enhancing the CI/CD infrastructure and significantly improving BOLT's capabilities, particularly for AArch64 architectures. It introduces a new premerge advisor service to streamline failure analysis and updates BOLT to correctly handle modern security features like PAC-RET hardening. The changes also include various quality-of-life improvements and refactoring across Clang-Tidy and Clang-Doc tools.

Highlights

  • Premerge Advisor Integration: Introduced new Python scripts (premerge_advisor_explain.py and premerge_advisor_upload.py) to interact with a premerge advisor service. These scripts are integrated into the CI/CD workflow to provide explanations for test failures and upload results, enhancing the feedback loop for pull requests.
  • AArch64 PAC-RET Hardening Support in BOLT: Implemented comprehensive support for AArch64 Pointer Authentication and Return (PAC-RET) hardening. This includes new BOLT passes (MarkRAStates and InsertNegateRAStatePass) to correctly handle DW_CFA_AARCH64_negate_ra_state DWARF instructions, ensuring proper unwinding and code generation for hardened binaries. This involved extensive updates to BinaryContext, BinaryFunction, and MCPlusBuilder to track and manipulate RA states.
  • CI/CD Pipeline Enhancements: Updated various CI/CD scripts to improve robustness and functionality. This includes adding cffi, cryptography, pygithub, and related dependencies to all_requirements.txt, enhancing error handling in cache_lit_timing_files.py, refining test report generation logic in generate_test_report_lib.py, and ensuring .ninja_log files are saved for better debugging.
  • BOLT AArch64 Relaxation and Runtime Hooks: Replaced ADRRelaxationPass with AArch64RelaxationPass to handle both ADR and LDR instruction relaxation on AArch64. Improved runtime library initialization and finalization hooks (DT_INIT, DT_INIT_ARRAY, DT_FINI, DT_FINI_ARRAY) for better instrumentation control in shared libraries and executables.
  • Clang-Tidy and Clang-Doc Refinements: Numerous updates across Clang-Tidy checks, including renaming several Abseil checks for clarity and consistency, and minor refactoring in core Clang-Tidy components. Clang-Doc now uses a new MustacheGenerator base class for HTML/Markdown generation, improving template-based documentation output.
Ignored Files
  • Ignored by pattern: .github/workflows/** (55)
    • .github/workflows/bazel-checks.yml
    • .github/workflows/build-ci-container-tooling.yml
    • .github/workflows/build-ci-container-windows.yml
    • .github/workflows/build-ci-container.yml
    • .github/workflows/build-metrics-container.yml
    • .github/workflows/check-ci.yml
    • .github/workflows/ci-post-commit-analyzer.yml
    • .github/workflows/commit-access-greeter.yml
    • .github/workflows/commit-access-review.py
    • .github/workflows/commit-access-review.yml
    • .github/workflows/containers/github-action-ci-tooling/Dockerfile
    • .github/workflows/containers/github-action-ci-windows/Dockerfile
    • .github/workflows/containers/github-action-ci/Dockerfile
    • .github/workflows/docs.yml
    • .github/workflows/email-check.yaml
    • .github/workflows/gha-codeql.yml
    • .github/workflows/hlsl-test-all.yaml
    • .github/workflows/issue-release-workflow.yml
    • .github/workflows/issue-subscriber.yml
    • .github/workflows/issue-write.yml
    • .github/workflows/libc-fullbuild-tests.yml
    • .github/workflows/libc-overlay-tests.yml
    • .github/workflows/libclang-abi-tests.yml
    • .github/workflows/libclang-python-tests.yml
    • .github/workflows/libcxx-build-and-test.yaml
    • .github/workflows/libcxx-build-containers.yml
    • .github/workflows/libcxx-check-generated-files.yml
    • .github/workflows/libcxx-run-benchmarks.yml
    • .github/workflows/llvm-abi-tests.yml
    • .github/workflows/llvm-bugs.yml
    • .github/workflows/merged-prs.yml
    • .github/workflows/mlir-spirv-tests.yml
    • .github/workflows/new-issues.yml
    • .github/workflows/new-prs.yml
    • .github/workflows/pr-code-format.yml
    • .github/workflows/pr-code-lint.yml
    • .github/workflows/pr-request-release-note.yml
    • .github/workflows/pr-subscriber.yml
    • .github/workflows/premerge.yaml
    • .github/workflows/release-asset-audit.py
    • .github/workflows/release-asset-audit.yml
    • .github/workflows/release-binaries-all.yml
    • .github/workflows/release-binaries-save-stage/action.yml
    • .github/workflows/release-binaries-setup-stage/action.yml
    • .github/workflows/release-binaries.yml
    • .github/workflows/release-documentation.yml
    • .github/workflows/release-doxygen.yml
    • .github/workflows/release-lit.yml
    • .github/workflows/release-sources.yml
    • .github/workflows/release-tasks.yml
    • ... and -5 more files (display capped at 50)
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a significant amount of new functionality, primarily centered around a 'premerge advisor' for CI, which involves new Python scripts, dependencies, and updates to CI workflows. It also includes a large number of refactorings, typo fixes, and modernizations across the codebase, particularly in the BOLT toolchain and clang-tidy configurations. My review has identified a few critical issues, such as hardcoded IP addresses in a new CI script, which should be made configurable. I've also found several typos, including one in a script that could affect log collection. Overall, the changes seem to improve robustness and add valuable CI features, but the identified issues should be addressed.

Comment on lines +17 to +20
PREMERGE_ADVISOR_URLS = [
"http://34.82.126.63:5000/upload",
"http://136.114.125.23:5000/upload",
]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Hardcoding IP addresses is not a good practice as it makes the configuration difficult to manage, update, and secure. These URLs should be passed in as environment variables or command-line arguments.


ninja -C "${BUILD_DIR}" ${runtime_targets_needs_reconfig} \
|& tee ninja_runtimes_needs_reconfig1.log
cp ${BUILD_DIR}/.ninja_log ninja_runtimes_needs_reconig.ninja_log

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There is a typo in the filename. reconig should be reconfig.

Suggested change
cp ${BUILD_DIR}/.ninja_log ninja_runtimes_needs_reconig.ninja_log
cp ${BUILD_DIR}/.ninja_log ninja_runtimes_needs_reconfig.ninja_log

Comment on lines +47 to +48
for premerge_advisor_url in PREMERGE_ADVISOR_URLS:
requests.post(premerge_advisor_url, json=failure_info, timeout=5)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The result of the requests.post call is not checked. If the upload fails, it will do so silently. It's better to check the status code and log a warning or error if the request was not successful.

Suggested change
for premerge_advisor_url in PREMERGE_ADVISOR_URLS:
requests.post(premerge_advisor_url, json=failure_info, timeout=5)
for premerge_advisor_url in PREMERGE_ADVISOR_URLS:
try:
response = requests.post(premerge_advisor_url, json=failure_info, timeout=5)
response.raise_for_status()
except requests.exceptions.RequestException as e:
print(f"Failed to upload to {premerge_advisor_url}: {e}", file=sys.stderr)

```
$ llvm-bolt <executable> -o <executable>.bolt -data=perf.fdata -reorder-blocks=ext-tsp -reorder-functions=hfsort -split-functions -split-all-cold -split-eh -dyno-stats
$ llvm-bolt <executable> -o <executable>.bolt -data=perf.fdata -reorder-blocks=ext-tsp -reorder-functions=cdsort -split-functions -split-all-cold -split-eh -dyno-stats
```

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There's a typo in the recommended command. cdsort is likely the intended function reordering algorithm, not hfsort which has been superseded.

Suggested change
```
$ llvm-bolt <executable> -o <executable>.bolt -data=perf.fdata -reorder-blocks=ext-tsp -reorder-functions=cdsort -split-functions -split-all-cold -split-eh -dyno-stats


if (!hasValidCodePadding(BF)) {
NumInvalid++;
if (HasRelocations) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Typo in assertion message: unexpect should be unexpected.

             "unexpected non-zero sized symbol at end of section");

// The rest of dynamic relocations - DT_RELA.
// The static executable might have .rela.dyn secion and not have PT_DYNAMIC
// The static executable might have .rela.dyn section and not have PT_DYNAMIC
if (!DynamicRelocationsSize && BC->IsStaticExecutable) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Typo in comment: secion should be section.

  // The static executable might have .rela.dyn section and not have PT_DYNAMIC

#endif

// Function constains trampoline to _start,
// Function constrains trampoline to _start,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Typo in comment: constains should be contains.

// Function contains trampoline to _start,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.