Skip to content

Conversation

@pranavk
Copy link
Owner

@pranavk pranavk commented Dec 3, 2025

This fixes 123.

tblah and others added 30 commits December 1, 2025 10:05
Previously we were less specific for POINTER/TARGET: encoding that they
could alias with (almost) anything.

In the new system, the "target data" tree is now a sibling of the other
trees (e.g. "global data"). POITNTER variables go at the root of the
"target data" tree, whereas TARGET variables get their own nodes under
that tree. For example,

```
integer, pointer :: ip
real, pointer :: rp
integer, target :: it
integer, target :: it2(:)
real, target :: rt
integer :: i
real :: r
```
- `ip` and `rp` may alias with any variable except `i` and `r`.
- `it`, `it2`, and `rt` may alias only with `ip` or `rp`.
- `i` and `r` cannot alias with any other variable.

Fortran 2023 15.5.2.14 gives restrictions on entities associated with
dummy arguments. These do not allow non-target globals to be modified
through dummy arguments and therefore I don't think we need to make all
globals alias with dummy arguments.

I haven't implemented it in this patch, but I wonder whether it is ever
possible for `ip` to alias with `rt` or even `it2`.

While I was updating the tests I fixed up some tests that still assumed
that local alloc tbaa wasn't the default.

I found no functional regressions in the gfortran test suite, fujitsu
test suite, spec2017, or a selection of HPC apps we test internally.
A barrier will pause execution until all threads reach it. If some go to
a different barrier then we deadlock. This manifests in that the
finalization callback must only be run once. Fix by ensuring we always
go through the same finalization block whether the thread in cancelled
or not and no matter which cancellation point causes the cancellation.

The old callback only affected PARALLEL, so it has been moved into the
code generating PARALLEL. For this reason, we don't need similar changes
for other cancellable constructs. We need to create the barrier on the
shared exit from the outlined function instead of only on the cancelled
branch to make sure that threads exiting normally (without cancellation)
meet the same barriers as those which were cancelled. For example,
previously we might have generated code like

```
...
  %ret = call i32 @__kmpc_cancel(...)
  %cond = icmp eq i32 %ret, 0
  br i1 %cond, label %continue, label %cancel

continue:
  // do the rest of the callback, eventually branching to %fini
  br label %fini

cancel:
  // Populated by the callback:
  // unsafe: if any thread makes it to the end without being cancelled
  // it won't reach this barrier and then the program will deadlock
  %unused = call i32 @__kmpc_cancel_barrier(...)
  br label %fini

fini:
  // run destructors etc
  ret
```

In the new version the barrier is moved into fini. I generate it *after*
the destructors because the standard describes the barrier as occurring
after the end of the parallel region.

```
...
  %ret = call i32 @__kmpc_cancel(...)
  %cond = icmp eq i32 %ret, 0
  br i1 %cond, label %continue, label %cancel

continue:
  // do the rest of the callback, eventually branching to %fini
  br label %fini

cancel:
  br label %fini

fini:
  // run destructors etc
  // safe so long as every exit from the function happens via this block:
  %unused = call i32 @__kmpc_cancel_barrier(...)
  ret
```

To achieve this, the barrier is now generated alongside the finalization
code instead of in the callback. This is the reason for the changes to
the unit test.

I'm unsure if I should keep the incorrect barrier generation callback
only on the cancellation branch in clang with the OMPIRBuilder backend
because that would match clang's ordinary codegen. Right now I have
opted to remove it entirely because it is a deadlock waiting to happen.

---

This re-lands llvm#164586 with a small fix for a failing buildbot running
address sanitizer on clang lit tests.

In the previous version of the patch I added an insertion point guard
"just to be safe" and never removed it. There isn't insertion point
guarding on the other route out of this function and we do not
preserve the insertion point around getFiniBB either so it is not
needed here.

The problem flagged by the sanitizers was because the saved insertion
point pointed to an instruction which was then removed inside the FiniCB
for some clang codegen functions. The instruction was freed when it was
removed. Then accessing it to restore the insertion point was a use
after free bug.
As noted in the reproducer provided in
llvm#164762 (comment),
on RISC-V after LTO we sometimes have trip counts exposed to vectorized
loops. The loop vectorizer will have generated calls to
@llvm.experimental.get.vector.length, but there are [some
properties](https://llvm.org/docs/LangRef.html#id2399) about the
intrinsic we can use to simplify it:

- The result is always less than both Count and MaxLanes
- If Count <= MaxLanes, then the result is Count

This teaches SCCP to handle these cases with the intrinsic, which allows
some single-iteration-after-LTO loops to be unfolded.

llvm#169293 is related and also simplifies the intrinsic in InstCombine via
computeKnownBits, but it can't fully remove the loop since
computeKnownBits only does limited reasoning on recurrences.
They don't have side-effects, so this should be fine.

Fixes llvm#170064
Similar to how getElementCount avoids the need to reason about fixed and
scalable ElementCounts separately, this patch adds getTypeSize to do the
same for TypeSize.

It also goes through and replaces some of the manual uses of getVScale
with getTypeSize/getElementCount where possible.
…n into after region (llvm#169892)

When a `scf.if` directly precedes a `scf.condition` in the before region
of a `scf.while` and both share the same condition, move the if into the
after region of the loop. This helps simplify the control flow to enable
uplifting `scf.while` to `scf.for`.
During InsertNegateRAState pass we check the annotations on
instructions,
to decide where to generate the OpNegateRAState CFIs in the output
binary.

As only instructions in the input binary were annotated, we have to make
a judgement on instructions generated by other BOLT passes.
Incorrect placement may cause issues when an (async) unwind request
is received during the new "unknown" instructions.

This patch adds more logic to make a more informed decision on by taking
into account:
- unknown instructions in a BasicBlock with other instruction have the
same RAState. Previously, if the BasicBlock started with an unknown
instruction,
the RAState was copied from the preceding block. Now, the RAState is
copied from
  the succeeding instructions in the same block.
- Some BasicBlocks may only contain instructions with unknown RAState,
As explained in issue llvm#160989, these blocks already have incorrect
unwind info. Because of this, the last known RAState based on the layout order
is copied.

Updated bolt/docs/PacRetDesign.md to reflect changes.
…er region in scf-uplift-while-to-for" (llvm#169888)

Reverts llvm#165216
It is implemented in llvm#169892 .
…70095)

From OpenMP 4.0:

> When an if clause is present on a cancel construct and the if
expression
> evaluates to false, the cancel construct does not activate
cancellation.
> The cancellation point associated with the cancel construct is always
> encountered regardless of the value of the if expression.

This wording is retained unmodified in OpenMP 6.0.

This re-opens the already approved PR llvm#164587, which was closed by
accident. The only changes are a rebase.
…lvm#170096)

Similar to fdiv, we should be trying to concat these high latency instructions together
…together. (llvm#170098)

Can only do this for 128->256 cases as we can't safely convert to the RCP14/RSQRT14 variants
Python multiprocessing is limited to 60 workers at most:

https://github.com/python/cpython/blob/6bc65c30ff1fd0b581a2c93416496fc720bc442c/Lib/concurrent/futures/process.py#L669-L672

The limit being per thread pool, we can work around it by using multiple
pools on windows when we want to actually use more workers.
…lvm#170097)

We can't read from those and will run into an assertion sooner or later.

Fixes llvm#170031
From the review in
llvm#169527 (comment),
there are some users where we want to extend or truncate a ConstantRange
only if it's not already the destination bitwidth. Previously this
asserted, so this PR relaxes it to just be a no-op, similar to
IRBuilder::createZExt and friends.
Follow-up for llvm#169047. The previous PR moved some functions from DA to
Delinearization, but the member function declarations were not updated
accordingly. This patch removes them.
…VECTOR. (llvm#162308)

This PR implements catch all handling for widening the scalable
subvector operand (INSERT_SUBVECTOR) or result (EXTRACT_SUBVECTOR). It
does this via the stack using masked memory operations. With general
handling available we can add optimiations for specific cases.
This PR re-lands llvm#165873.

This PR extends the gpu.subgroup_mma_* ops to support fp64 type.
The extension requires special handling during the lowering to nvvm due
to the return type for load ops for fragment a and b (they return a
scalar instead of a struct).

The original PR did not guard the new test based on the required
architecture (sm80) which lead to a failure on the cuda runners with T4
GPUs.
… together (llvm#170113)

Similar to fdiv, we should be trying to concat these high latency instructions together
…_from_uid() (llvm#168554)

Reland llvm#164392 with Fortran support moved to follow-up PR
…lvm#169926)

Add two test cases where dependencies are missed due to overflows. These
will be fixed by llvm#169927 and llvm#169928, respectively.
These f80 fp types are only supported on X86 and can be removed from
AArch64. It looks like they were copied from another backend by mistake.
Test seems to pass after re-enabling without any additional changes.
This uses lit substitution, which fixes running this test on
some environment where 'python' isn't in the path.
Meinersbur and others added 28 commits December 2, 2025 17:48
Vector registers have synthetic values for display purposes. This causes
SBValue::GetExpressionPath to dispatch
to ValueObjectSynthetic instead of ValueObjectRegister, producing
incorrect results.

Fixes llvm#147144
…70326)

4394aa6 introduced the test
amdgcn_weak_alias, which is failing on the reverse iteration build, due
to the the order of the aliasees being different. This failure is a test
issue, not a bug, so the metadata checks are removed.
)

Currently we try to hoist the transformed IV increment instruction to
the header block to help with generation of postincrement instructions,
but this only works if the user instruction is also in the header. We
should instead be trying to insert it in the same block as the user.
…#169795)

Similar to llvm#169156 again, this is mostly for denormal handling as there
is no rounding step in a minnum/maxnum.
Add additional tests where extra no-alias checks are needed, as future
extensions of llvm#168771.
This moves a few existing debug info flags that were floating in the
general pool of unorganised flags over to the existing groups for debug
info flags (so that they are presented together in documentation).

As a tiny further tweak, this also fixes the spelling of "DWARF" in the
flag docs for consistency with other flags.
One of the previous PRs
llvm#169267 has reintroduced block
count to layout propagation that was removed in
llvm#168504. This PR patches the
issue.
Adding llvm:Support dep since plugin started using llvm/ADT/...
)

In Debug builds, the names of adjusted pointers have a pointer-specific
name prefix which doesn't exist in non-debug builds.

This causes differences in output when looking at the output of SROA
with a Debug or Release compiler.

For most of our ongoing testing, we use essentially Release+Asserts
build (basically release but without NDEBUG defined), however we ship a
Release compiler. Therefore we want to say with reasonable confidence
that building a large project with Release vs a Release+Asserts build
gives us the same output when the same compiler version is used.

This difference however, makes it difficult to prove that the output is
the same if the only difference is the name when using LTO builds and
looking at bitcode.

Hence this change is being proposed.
I had a case where the frontend was generating a zero elem array in
non-shader code so it was just crashing in a release build.
Add a real error and make it not crash.

---------

Signed-off-by: Nick Sarnie <[email protected]>
…ual (llvm#170191)

This change makes StackFrame methods virtual to enable subclass
overrides and introduces BorrowedStackFrame, a wrapper that presents an
existing StackFrame with a different frame index.

This enables creating synthetic frame views or renumbering frames
without copying the underlying frame data, which is useful for frame
manipulation scenarios.

This also adds a new borrowed-info format entity to show what was the
original frame index of the borrowed frame.

Signed-off-by: Med Ismail Bennani <[email protected]>
In MFMA rewrite pass, prevent AGPR_32 reg class assignment for scale
operands, not permitted by instruction format.

---------

Co-authored-by: Matt Arsenault <[email protected]>
* Adds lowering for `amdgpu.make_dma_base`
…#161870)" (llvm#170236)

This patch re-lands llvm#161870 with fixes to the previous test failures.

rdar://161834688

Signed-off-by: Med Ismail Bennani <[email protected]>
@gemini-code-assist
Copy link

Summary of Changes

Hello @pranavk, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request delivers a comprehensive set of enhancements across the project's build and CI/CD systems, with a strong emphasis on improving AArch64 architecture support and integrating advanced premerge quality checks. The changes aim to streamline development workflows, ensure correct handling of platform-specific features like pointer authentication, and leverage containerization for more consistent build environments.

Highlights

  • Bazel and CI/CD Infrastructure Updates: The pull request introduces significant updates to the Bazel build system and continuous integration/continuous deployment (CI/CD) infrastructure. This includes adding new Python dependencies, refining test reporting mechanisms, and integrating with a premerge advisor service for enhanced code quality and stability.
  • AArch64 Support and Pointer Authentication: Extensive changes have been made to improve AArch64 support within BOLT, particularly concerning pointer authentication (PAC-RET) hardening. New passes (MarkRAStates, InsertNegateRAState) are introduced to correctly handle DW_CFA_AARCH64_negate_ra_state directives, ensuring proper unwinding information after optimizations. Additionally, an AArch64RelaxationPass is added to handle non-local ADR/LDR instructions.
  • Premerge Advisor Integration: Two new Python scripts, premerge_advisor_explain.py and premerge_advisor_upload.py, are added to integrate with an external premerge advisor service. These scripts parse test failures and build logs, send relevant data to the advisor, and can generate GitHub comments with explanations for identified issues, streamlining the review process.
  • Containerization and GitHub Actions: New GitHub Actions (build-container/action.yml, push-container/action.yml) are introduced to standardize container building and pushing processes using Podman. This enhances the reproducibility and deployment of CI environments and tools.
  • Code Quality and Maintainability: Various code quality improvements and refactorings are present across the codebase. This includes updating Python dependency management, fixing typos in documentation and code comments, and adjusting build scripts for better robustness and clarity. The .clang-tidy configuration has been updated, and new maintainer files (Maintainers.rst) have been added for clang-tools-extra.
Ignored Files
  • Ignored by pattern: .github/workflows/** (55)
    • .github/workflows/bazel-checks.yml
    • .github/workflows/build-ci-container-tooling.yml
    • .github/workflows/build-ci-container-windows.yml
    • .github/workflows/build-ci-container.yml
    • .github/workflows/build-metrics-container.yml
    • .github/workflows/check-ci.yml
    • .github/workflows/ci-post-commit-analyzer.yml
    • .github/workflows/commit-access-greeter.yml
    • .github/workflows/commit-access-review.py
    • .github/workflows/commit-access-review.yml
    • .github/workflows/containers/github-action-ci-tooling/Dockerfile
    • .github/workflows/containers/github-action-ci-windows/Dockerfile
    • .github/workflows/containers/github-action-ci/Dockerfile
    • .github/workflows/docs.yml
    • .github/workflows/email-check.yaml
    • .github/workflows/gha-codeql.yml
    • .github/workflows/hlsl-test-all.yaml
    • .github/workflows/issue-release-workflow.yml
    • .github/workflows/issue-subscriber.yml
    • .github/workflows/issue-write.yml
    • .github/workflows/libc-fullbuild-tests.yml
    • .github/workflows/libc-overlay-tests.yml
    • .github/workflows/libclang-abi-tests.yml
    • .github/workflows/libclang-python-tests.yml
    • .github/workflows/libcxx-build-and-test.yaml
    • .github/workflows/libcxx-build-containers.yml
    • .github/workflows/libcxx-check-generated-files.yml
    • .github/workflows/libcxx-run-benchmarks.yml
    • .github/workflows/llvm-abi-tests.yml
    • .github/workflows/llvm-bugs.yml
    • .github/workflows/merged-prs.yml
    • .github/workflows/mlir-spirv-tests.yml
    • .github/workflows/new-issues.yml
    • .github/workflows/new-prs.yml
    • .github/workflows/pr-code-format.yml
    • .github/workflows/pr-code-lint.yml
    • .github/workflows/pr-request-release-note.yml
    • .github/workflows/pr-subscriber.yml
    • .github/workflows/premerge.yaml
    • .github/workflows/release-asset-audit.py
    • .github/workflows/release-asset-audit.yml
    • .github/workflows/release-binaries-all.yml
    • .github/workflows/release-binaries-save-stage/action.yml
    • .github/workflows/release-binaries-setup-stage/action.yml
    • .github/workflows/release-binaries.yml
    • .github/workflows/release-documentation.yml
    • .github/workflows/release-doxygen.yml
    • .github/workflows/release-lit.yml
    • .github/workflows/release-sources.yml
    • .github/workflows/release-tasks.yml
    • ... and -5 more files (display capped at 50)
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a wide range of changes, including significant enhancements to the CI infrastructure, new features for AArch64 such as PAC-RET and BTI support, and numerous bug fixes, refactorings, and cleanups across the codebase. The introduction of a pre-merge advisor service is a notable addition. Overall, the changes are positive and improve the project's infrastructure and feature set. I've provided a few comments on potential improvements and minor issues.

@@ -1,3 +1,4 @@
HeaderFilterRegex: ''

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Setting HeaderFilterRegex: '' at the root level might unintentionally disable clang-tidy checks for all header files, as the default is '.*' which checks all headers. An empty string makes clang-tidy fall back to checking file extensions, which might not cover all intended headers. If the goal is to override this in subdirectories, it might be safer to set a regex that matches nothing (e.g., using an invalid regex) or to rely on the default and have subdirectories be more restrictive.

Comment on lines +56 to +61
except exceptions.ClientError as client_error:
print(
"::warning file=cache_lit_timing_files.py::Failed to list blobs "
"in bucket."
)
sys.exit(0)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The script exits with status 0 even when it fails to list blobs in the GCS bucket. This will cause the CI step to pass silently, which could hide configuration or permission issues. It might be better to exit with a non-zero status to fail the step and alert developers of the problem. If this failure is considered non-critical, this behavior might be acceptable, but it's worth reconsidering.


ninja -C "${BUILD_DIR}" ${runtime_targets_needs_reconfig} \
|& tee ninja_runtimes_needs_reconfig1.log
cp ${BUILD_DIR}/.ninja_log ninja_runtimes_needs_reconig.ninja_log

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There appears to be a typo in the filename. It's likely that ninja_runtimes_needs_reconig.ninja_log should be ninja_runtimes_needs_reconfig.ninja_log.

Suggested change
cp ${BUILD_DIR}/.ninja_log ninja_runtimes_needs_reconig.ninja_log
cp ${BUILD_DIR}/.ninja_log ninja_runtimes_needs_reconfig.ninja_log

Comment on lines +17 to +20
PREMERGE_ADVISOR_URLS = [
"http://34.82.126.63:5000/upload",
"http://136.114.125.23:5000/upload",
]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The URLs for the pre-merge advisor service are hardcoded as IP addresses. It would be more maintainable and robust to use DNS names instead. This would make it easier to update the service locations in the future without changing the code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet