[Bazel] Fixes 123 #6
Previously we were less specific for POINTER/TARGET: encoding that they could alias with (almost) anything. In the new system, the "target data" tree is now a sibling of the other trees (e.g. "global data"). POINTER variables go at the root of the "target data" tree, whereas TARGET variables get their own nodes under that tree. For example:

```
integer, pointer :: ip
real, pointer :: rp
integer, target :: it
integer, target :: it2(:)
real, target :: rt
integer :: i
real :: r
```

- `ip` and `rp` may alias with any variable except `i` and `r`.
- `it`, `it2`, and `rt` may alias only with `ip` or `rp`.
- `i` and `r` cannot alias with any other variable.

Fortran 2023 15.5.2.14 gives restrictions on entities associated with dummy arguments. These do not allow non-target globals to be modified through dummy arguments, so I don't think we need to make all globals alias with dummy arguments. I haven't implemented it in this patch, but I wonder whether it is ever possible for `ip` to alias with `rt` or even `it2`.

While updating the tests I also fixed up some that still assumed local alloc TBAA wasn't the default. I found no functional regressions in the gfortran test suite, the Fujitsu test suite, SPEC2017, or a selection of HPC apps we test internally.
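The three alias rules above can be modeled with a short sketch. This is a hypothetical illustration (in Python) of the decision table the TBAA trees encode, not flang's actual implementation:

```python
# Hypothetical model of the aliasing rules described above;
# not flang's actual TBAA tree encoding.
POINTER, TARGET, PLAIN = "pointer", "target", "plain"

def may_alias(a, b):
    """May two distinct variables of these attribute kinds alias?"""
    if a == PLAIN or b == PLAIN:
        # Plain variables like `i` and `r` never alias anything else.
        return False
    if a == TARGET and b == TARGET:
        # Distinct TARGET variables get their own sibling nodes under
        # the "target data" tree, so they do not alias each other.
        return False
    # Any remaining combination involves a POINTER and may alias.
    return True
```

For the example variables, `may_alias(POINTER, TARGET)` covers the `ip`/`it` pair, while `may_alias(PLAIN, POINTER)` covers `i`/`ip`.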
A barrier will pause execution until all threads reach it. If some threads go to a different barrier, we deadlock. This manifests in that the finalization callback must only be run once. Fix by ensuring we always go through the same finalization block whether or not the thread is cancelled, and no matter which cancellation point causes the cancellation.

The old callback only affected PARALLEL, so it has been moved into the code generating PARALLEL. For this reason, we don't need similar changes for other cancellable constructs.

We need to create the barrier on the shared exit from the outlined function instead of only on the cancelled branch, to make sure that threads exiting normally (without cancellation) meet the same barriers as those which were cancelled. For example, previously we might have generated code like

```
...
%ret = call i32 @__kmpc_cancel(...)
%cond = icmp eq i32 %ret, 0
br i1 %cond, label %continue, label %cancel

continue:
  // do the rest of the callback, eventually branching to %fini
  br label %fini

cancel:
  // Populated by the callback:
  // unsafe: if any thread makes it to the end without being cancelled
  // it won't reach this barrier and then the program will deadlock
  %unused = call i32 @__kmpc_cancel_barrier(...)
  br label %fini

fini:
  // run destructors etc
  ret
```

In the new version the barrier is moved into `fini`. I generate it *after* the destructors because the standard describes the barrier as occurring after the end of the parallel region.

```
...
%ret = call i32 @__kmpc_cancel(...)
%cond = icmp eq i32 %ret, 0
br i1 %cond, label %continue, label %cancel

continue:
  // do the rest of the callback, eventually branching to %fini
  br label %fini

cancel:
  br label %fini

fini:
  // run destructors etc
  // safe so long as every exit from the function happens via this block:
  %unused = call i32 @__kmpc_cancel_barrier(...)
  ret
```

To achieve this, the barrier is now generated alongside the finalization code instead of in the callback.
This is the reason for the changes to the unit test. I was unsure whether to keep the incorrect barrier-generation callback only on the cancellation branch in clang with the OMPIRBuilder backend, because that would match clang's ordinary codegen. For now I have opted to remove it entirely because it is a deadlock waiting to happen.

---

This re-lands llvm#164586 with a small fix for a failing buildbot running AddressSanitizer on clang lit tests. In the previous version of the patch I added an insertion point guard "just to be safe" and never removed it. There is no insertion point guarding on the other route out of this function, and we do not preserve the insertion point around getFiniBB either, so it is not needed here. The sanitizers flagged a problem because the saved insertion point pointed to an instruction which was then removed (and thereby freed) inside the FiniCB for some clang codegen functions; accessing it afterwards to restore the insertion point was a use-after-free bug.
As noted in the reproducer provided in llvm#164762 (comment), on RISC-V after LTO we sometimes have known trip counts exposed to vectorized loops. The loop vectorizer will have generated calls to @llvm.experimental.get.vector.length, but there are [some properties](https://llvm.org/docs/LangRef.html#id2399) of the intrinsic we can use to simplify it:

- The result is never greater than Count or MaxLanes
- If Count <= MaxLanes, then the result is Count

This teaches SCCP to handle these cases for the intrinsic, which allows some single-iteration-after-LTO loops to be unfolded. llvm#169293 is related and also simplifies the intrinsic in InstCombine via computeKnownBits, but it can't fully remove the loop since computeKnownBits only does limited reasoning on recurrences.
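A minimal sketch of the folding rule those two properties permit; the helper name is hypothetical and this is a model of the semantics, not the SCCP code itself:

```python
def fold_get_vector_length(count, max_lanes):
    """Model of when @llvm.experimental.get.vector.length can be folded
    to a constant, per the properties listed above (hypothetical helper,
    not the actual SCCP implementation).

    Returns the known constant result, or None if only a range is known.
    """
    if count <= max_lanes:
        # Property: if Count <= MaxLanes, the result is exactly Count.
        return count
    # Otherwise we only know the result is at most min(count, max_lanes).
    return None
```

A trip count of 4 with 8 max lanes therefore folds to 4, which is what lets a single-iteration loop collapse after LTO.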
They don't have side-effects, so this should be fine. Fixes llvm#170064
Similar to how getElementCount avoids the need to reason about fixed and scalable ElementCounts separately, this patch adds getTypeSize to do the same for TypeSize. It also goes through and replaces some of the manual uses of getVScale with getTypeSize/getElementCount where possible.
…n into after region (llvm#169892) When a `scf.if` directly precedes a `scf.condition` in the before region of a `scf.while` and both share the same condition, move the if into the after region of the loop. This helps simplify the control flow to enable uplifting `scf.while` to `scf.for`.
During the InsertNegateRAState pass we check the annotations on instructions to decide where to generate the OpNegateRAState CFIs in the output binary. As only instructions in the input binary were annotated, we have to make a judgement on instructions generated by other BOLT passes. Incorrect placement may cause issues when an (async) unwind request is received during the new "unknown" instructions. This patch makes a more informed decision by taking into account:

- Unknown instructions in a BasicBlock with other instructions have the same RAState. Previously, if the BasicBlock started with an unknown instruction, the RAState was copied from the preceding block. Now, the RAState is copied from the succeeding instructions in the same block.
- Some BasicBlocks may only contain instructions with unknown RAState. As explained in issue llvm#160989, these blocks already have incorrect unwind info. Because of this, the last known RAState based on the layout order is copied.

Updated bolt/docs/PacRetDesign.md to reflect the changes.
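The two rules can be sketched as follows, using a hypothetical `fill_rastates` helper where `None` marks an unknown instruction; this is an illustration of the propagation logic described above, not BOLT's implementation:

```python
def fill_rastates(block_states, last_known_layout):
    """block_states: per-instruction RAState (True/False) or None for
    unknown. last_known_layout: the last known RAState in layout order,
    used only when the whole block is unknown. Hypothetical model."""
    known = [s for s in block_states if s is not None]
    if not known:
        # Rule 2: an all-unknown block falls back to the last known
        # RAState based on layout order.
        return [last_known_layout] * len(block_states)
    out = []
    for i, s in enumerate(block_states):
        if s is None:
            # Rule 1: copy from the succeeding annotated instruction in
            # the same block (falling back to the block's last known
            # state for trailing unknowns).
            s = next((t for t in block_states[i + 1:] if t is not None),
                     known[-1])
        out.append(s)
    return out
```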
…er region in scf-uplift-while-to-for" (llvm#169888) Reverts llvm#165216 It is implemented in llvm#169892 .
Reverts llvm#169544 [Regressed](https://lab.llvm.org/buildbot/#/builders/143/builds/12956) gfortran test suite
…70095) From OpenMP 4.0:

> When an if clause is present on a cancel construct and the if expression
> evaluates to false, the cancel construct does not activate cancellation.
> The cancellation point associated with the cancel construct is always
> encountered regardless of the value of the if expression.

This wording is retained unmodified in OpenMP 6.0.

This re-opens the already approved PR llvm#164587, which was closed by accident. The only changes are a rebase.
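The quoted semantics can be sketched as a small runtime model (hypothetical Python, purely to illustrate the wording, not the codegen):

```python
# Model of the OpenMP wording above: the if clause gates whether
# cancellation is *activated*, but the cancellation point associated
# with the cancel construct is encountered either way.
def cancel_construct(if_expr, state):
    if if_expr:
        state["cancel_requested"] = True
    cancellation_point(state)  # always encountered

def cancellation_point(state):
    # Any encountered cancellation point observes a pending request.
    if state.get("cancel_requested"):
        state["cancelled"] = True
```

With `if_expr` false, the thread still passes through the cancellation point but no cancellation is observed unless some other construct requested it.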
…lvm#170096) Similar to fdiv, we should be trying to concat these high latency instructions together
…together. (llvm#170098) Can only do this for 128->256 cases as we can't safely convert to the RCP14/RSQRT14 variants
Python multiprocessing is limited to 60 workers at most on Windows: https://github.com/python/cpython/blob/6bc65c30ff1fd0b581a2c93416496fc720bc442c/Lib/concurrent/futures/process.py#L669-L672 Since the limit applies per pool, we can work around it on Windows by using multiple pools when we want to use more workers.
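A sketch of the workaround, with hypothetical helper names (`pool_sizes`, `make_pools`) that are not taken from the patch:

```python
import concurrent.futures

# Windows caps each ProcessPoolExecutor at roughly 60 workers, so to
# use more workers we shard them across several pools. Hypothetical
# sketch of the approach, not the actual patch.
MAX_WORKERS_PER_POOL = 60

def pool_sizes(total_workers):
    """Split a desired worker count into per-pool sizes that each
    respect the platform limit."""
    sizes = []
    remaining = total_workers
    while remaining > 0:
        sizes.append(min(MAX_WORKERS_PER_POOL, remaining))
        remaining -= sizes[-1]
    return sizes

def make_pools(total_workers):
    return [concurrent.futures.ProcessPoolExecutor(max_workers=n)
            for n in pool_sizes(total_workers)]
```

Work items are then distributed across the pools (e.g. round-robin) instead of being submitted to a single executor.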
…lvm#170097) We can't read from those and will run into an assertion sooner or later. Fixes llvm#170031
From the review in llvm#169527 (comment), there are some users where we want to extend or truncate a ConstantRange only if it's not already the destination bitwidth. Previously this asserted, so this PR relaxes it to just be a no-op, similar to IRBuilder::createZExt and friends.
Follow-up for llvm#169047. The previous PR moved some functions from DA to Delinearization, but the member function declarations were not updated accordingly. This patch removes them.
…VECTOR. (llvm#162308) This PR implements catch-all handling for widening the scalable subvector operand (INSERT_SUBVECTOR) or result (EXTRACT_SUBVECTOR). It does this via the stack using masked memory operations. With general handling available, we can add optimizations for specific cases.
This PR re-lands llvm#165873. It extends the gpu.subgroup_mma_* ops to support the fp64 type. The extension requires special handling during the lowering to nvvm due to the return type of the load ops for fragments a and b (they return a scalar instead of a struct). The original PR did not guard the new test based on the required architecture (sm80), which led to a failure on the CUDA runners with T4 GPUs.
… together (llvm#170113) Similar to fdiv, we should be trying to concat these high latency instructions together
…_from_uid() (llvm#168554) Reland llvm#164392 with Fortran support moved to follow-up PR
…lvm#169926) Add two test cases where dependencies are missed due to overflows. These will be fixed by llvm#169927 and llvm#169928, respectively.
These f80 fp types are only supported on X86 and can be removed from AArch64. It looks like they were copied from another backend by mistake.
Test seems to pass after re-enabling without any additional changes.
This uses lit substitution, which fixes running this test in environments where 'python' isn't on the path.
…xtension (llvm#169137)" This reverts commit e7748e9. It broke the Windows build https://github.com/llvm/llvm-project/actions/runs/19842117405/job/56852610863 https://lab.llvm.org/buildbot/#/builders/166/builds/4535 After llvm#170142 fixed another issue, this was also the remaining reason for this buildbot to fail: https://lab.llvm.org/buildbot/#/builders/207/builds/10423
Vector registers have synthetic values for display purposes. This causes SBValue::GetExpressionPath to dispatch to ValueObjectSynthetic instead of ValueObjectRegister, producing incorrect results. Fixes llvm#147144
…#169795) Similar to llvm#169156 again, this is mostly for denormal handling as there is no rounding step in a minnum/maxnum.
Add additional tests where extra no-alias checks are needed, as future extensions of llvm#168771.
This moves a few existing debug info flags that were floating in the general pool of unorganised flags over to the existing groups for debug info flags (so that they are presented together in documentation). As a tiny further tweak, this also fixes the spelling of "DWARF" in the flag docs for consistency with other flags.
Fix a comment post llvm#162167
A previous PR, llvm#169267, reintroduced the block count to layout propagation that was removed in llvm#168504. This PR fixes the issue.
Adding llvm:Support dep since plugin started using llvm/ADT/...
In Debug builds, the names of adjusted pointers have a pointer-specific name prefix which doesn't exist in non-debug builds. This causes differences when looking at the output of SROA with a Debug versus a Release compiler. For most of our ongoing testing we use what is essentially a Release+Asserts build (basically Release but without NDEBUG defined); however, we ship a Release compiler. Therefore we want to say with reasonable confidence that building a large project with Release versus Release+Asserts gives the same output when the same compiler version is used. This difference, however, makes it difficult to prove that the output is the same when the only difference is the name, particularly when using LTO builds and looking at bitcode. Hence this change is being proposed.
… 63))) (llvm#170128) Follow up to llvm#71844 after shl implementation
I had a case where the frontend was generating a zero-element array in non-shader code, which simply crashed in a release build. Add a real error and make it not crash. --------- Signed-off-by: Nick Sarnie <[email protected]>
…ual (llvm#170191) This change makes StackFrame methods virtual to enable subclass overrides and introduces BorrowedStackFrame, a wrapper that presents an existing StackFrame with a different frame index. This enables creating synthetic frame views or renumbering frames without copying the underlying frame data, which is useful for frame manipulation scenarios. It also adds a new borrowed-info format entity to show the original frame index of the borrowed frame. Signed-off-by: Med Ismail Bennani <[email protected]>
In the MFMA rewrite pass, prevent AGPR_32 register class assignment for scale operands, which is not permitted by the instruction format. --------- Co-authored-by: Matt Arsenault <[email protected]>
* Adds lowering for `amdgpu.make_dma_base`
…#161870)" (llvm#170236) This patch re-lands llvm#161870 with fixes to the previous test failures. rdar://161834688 Signed-off-by: Med Ismail Bennani <[email protected]>
Summary of Changes: This pull request focuses on enhancing the CI/CD infrastructure and significantly improving BOLT's capabilities, particularly for AArch64 architectures. It introduces a new premerge advisor service to streamline failure analysis and updates BOLT to correctly handle modern security features like PAC-RET hardening. The changes also include various quality-of-life improvements and refactoring across the Clang-Tidy and Clang-Doc tools.
Code Review
This pull request introduces a significant amount of new functionality, primarily centered around a 'premerge advisor' for CI, which involves new Python scripts, dependencies, and updates to CI workflows. It also includes a large number of refactorings, typo fixes, and modernizations across the codebase, particularly in the BOLT toolchain and clang-tidy configurations. My review has identified a few critical issues, such as hardcoded IP addresses in a new CI script, which should be made configurable. I've also found several typos, including one in a script that could affect log collection. Overall, the changes seem to improve robustness and add valuable CI features, but the identified issues should be addressed.
```python
PREMERGE_ADVISOR_URLS = [
    "http://34.82.126.63:5000/upload",
    "http://136.114.125.23:5000/upload",
]
```
```shell
ninja -C "${BUILD_DIR}" ${runtime_targets_needs_reconfig} \
  |& tee ninja_runtimes_needs_reconfig1.log
cp ${BUILD_DIR}/.ninja_log ninja_runtimes_needs_reconig.ninja_log
```
```python
for premerge_advisor_url in PREMERGE_ADVISOR_URLS:
    requests.post(premerge_advisor_url, json=failure_info, timeout=5)
```
The result of the requests.post call is not checked. If the upload fails, it will do so silently. It's better to check the status code and log a warning or error if the request was not successful.
Suggested change:

```python
for premerge_advisor_url in PREMERGE_ADVISOR_URLS:
    try:
        response = requests.post(premerge_advisor_url, json=failure_info, timeout=5)
        response.raise_for_status()
    except requests.exceptions.RequestException as e:
        print(f"Failed to upload to {premerge_advisor_url}: {e}", file=sys.stderr)
```
```
$ llvm-bolt <executable> -o <executable>.bolt -data=perf.fdata -reorder-blocks=ext-tsp -reorder-functions=hfsort -split-functions -split-all-cold -split-eh -dyno-stats
$ llvm-bolt <executable> -o <executable>.bolt -data=perf.fdata -reorder-blocks=ext-tsp -reorder-functions=cdsort -split-functions -split-all-cold -split-eh -dyno-stats
```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a typo in the recommended command. cdsort is likely the intended function reordering algorithm, not hfsort, which has been superseded.
Suggested change:

```
$ llvm-bolt <executable> -o <executable>.bolt -data=perf.fdata -reorder-blocks=ext-tsp -reorder-functions=cdsort -split-functions -split-all-cold -split-eh -dyno-stats
```
```cpp
if (!hasValidCodePadding(BF)) {
  NumInvalid++;
  if (HasRelocations) {
```
```diff
 // The rest of dynamic relocations - DT_RELA.
-// The static executable might have .rela.dyn secion and not have PT_DYNAMIC
+// The static executable might have .rela.dyn section and not have PT_DYNAMIC
 if (!DynamicRelocationsSize && BC->IsStaticExecutable) {
```
```diff
-// Function constains trampoline to _start,
+// Function constrains trampoline to _start,
```
This fixes 123.