@buggfg
Contributor

@buggfg buggfg commented Sep 9, 2025

We recommend enabling dfa-jump-thread by default. It’s a mature optimization that GCC has supported since 9.1.0, and both the GCC and ICX compilers enable it by default at the -O2 optimization level.

With it enabled, we measured a 13% performance improvement on the CoreMark benchmark on x86 (Intel i9-11900K, Rocket Lake) and a 15% improvement on the KunMingHu FPGA. We also verified the correctness of this pass using SPEC 2017.

Co-Authored-By: YinZd <[email protected]>
Co-Authored-By: ict-ql <[email protected]>
Co-Authored-By: Lin Wang <[email protected]>
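
For readers unfamiliar with the kind of code this targets, here is a minimal sketch of a loop-carried state variable dispatched through a switch, next to a hand-threaded equivalent. The word-counting example and both function names are hypothetical and do not come from CoreMark or from this PR. Whether the pass actually fires on code like this depends on its own legality and cost checks.

```cpp
#include <cstddef>
#include <string>

// Hypothetical two-state machine: counts space-separated words.
enum class State { Space, Word };

// Switch-in-loop form: every iteration re-dispatches on `st`,
// even when the next state is already known on the current path.
int countWordsSwitch(const std::string &s) {
  State st = State::Space;
  int words = 0;
  for (char c : s) {
    switch (st) {
    case State::Space:
      if (c != ' ') { ++words; st = State::Word; }
      break;
    case State::Word:
      if (c == ' ') st = State::Space;
      break;
    }
  }
  return words;
}

// Hand-threaded form: each state jumps directly to its known successor,
// which is roughly the control flow that threading aims to produce.
int countWordsThreaded(const std::string &s) {
  std::size_t i = 0;
  int words = 0;
space:
  if (i == s.size()) return words;
  if (s[i++] != ' ') { ++words; goto word; }
  goto space;
word:
  if (i == s.size()) return words;
  if (s[i++] == ' ') goto space;
  goto word;
}
```

Both functions return 3 for the input "a bc  d". The threaded form avoids the per-iteration dispatch at the cost of duplicated loop-exit checks, which is the code-size trade-off discussed later in this thread.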
@github-actions

github-actions bot commented Sep 9, 2025

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@efriedma-quic
Collaborator

See previous discussion on #83033.

@nikic nikic requested a review from XChy September 10, 2025 08:37
@nikic
Contributor

nikic commented Sep 10, 2025

This pass currently does not have a maintainer. We should have one if we enable it by default.

@XChy What is your opinion on the current quality of this pass?

@buggfg Beyond running SPEC, have you performed any due diligence to ensure this pass is ready for default enablement? Have you performed any targeted fuzzing for this enablement?

I'll provide new compile-time numbers.

@XChy
Member

XChy commented Sep 10, 2025

The current implementation is conservative and optimizes only one switch. After numerous fixes, this pass has become more stable and safe. For me, it's acceptable to enable this pass now if there is no severe compile-time regression. And enabling it will uncover more hidden problems.
My main concern is that compile-time may increase massively if we optimize multiple switches in the future, though we don't allow it currently.
Anyway, I am not against enabling DFAJumpThreading by default.

@nikic
Contributor

nikic commented Sep 10, 2025

Compile-time: https://llvm-compile-time-tracker.com/compare.php?from=d685508bec02a676383b284d268fe8a2e4cbf7f3&to=ba220372f5c68b12c15ce2efbccf4edc9ea300f1&stat=instructions:u

Generally looks okay, with some outliers that need to be investigated (libclamav_nsis_LZMADecode.c +95%, NativeInlineSiteSymbol.cpp +33%). These might be fine assuming the increase is due to unavoidable second order effects from the transformation being performed.

@buggfg
Contributor Author

buggfg commented Sep 11, 2025

> This pass currently does not have a maintainer. We should have one if we enable it by default.
>
> @XChy What is your opinion on the current quality of this pass?
>
> @buggfg Beyond running SPEC, have you performed any due diligence to ensure this pass is ready for default enablement? Have you performed any targeted fuzzing for this enablement?
>
> I'll provide new compile-time numbers.

Hi, I’m really glad to hear that both reviewers support enabling DFAJumpThreading by default. I haven’t had the chance to do any additional research beyond the SPEC2017 benchmarks yet. However, I believe that enabling this optimization by default is essential for the embedded domain, as it will help increase LLVM's impact and uncover other potential issues. :)

@XChy
Member

XChy commented Sep 27, 2025

@nikic, I can be a candidate for maintainer, as I am familiar with the codebase of this pass and actively involved in it. But it would be better if there were someone else to co-maintain. I'm concerned that only one person will be working on it, actually. @dybv-sc @UsmanNadeem, do you volunteer too?

@UsmanNadeem
Contributor

@XChy Yes, I can volunteer to be a co-maintainer as well.

We are also interested in seeing DFAJumpThreading enabled by default, and I think others have shown interest as well. See my previous patch: #83033

@XChy
Member

XChy commented Oct 1, 2025

> Generally looks okay, with some outliers that need to be investigated (libclamav_nsis_LZMADecode.c +95%, NativeInlineSiteSymbol.cpp +33%)

Will look into these cases.

@XChy
Member

XChy commented Oct 1, 2025

I profiled libclamav_nsis_LZMADecode with valgrind. I found that the SLPVectorizer becomes slower after DFAJumpThreading. Specifically, the time proportion of SLPVectorizer::BoUpSLP::buildTreeRec increases from 10% to 37%.

Before enabling DFAJumpThreading:
[image]

After:
[image]

@XChy
Member

XChy commented Oct 3, 2025

I believe #161632 has resolved most outliers. The bottleneck mostly lies in the later optimizations that run in the pipeline after blocks are duplicated, rather than in DFAJumpThreading itself.

@XChy XChy requested a review from UsmanNadeem October 7, 2025 15:20
@XChy
Member

XChy commented Oct 7, 2025

After effort on reducing compile-time, we get a better number: https://llvm-compile-time-tracker.com/compare.php?from=ed113e7904943565b4cd05588f6b639e40187510&to=2b901ca2e1b77d2b7a31cbcb57a921aa662341f9&stat=instructions:u.
@nikic, are you satisfied with the current compile-time?

@nikic
Contributor

nikic commented Oct 7, 2025

> After effort on reducing compile-time, we get a better number: https://llvm-compile-time-tracker.com/compare.php?from=ed113e7904943565b4cd05588f6b639e40187510&to=2b901ca2e1b77d2b7a31cbcb57a921aa662341f9&stat=instructions:u. @nikic, are you satisfied with the current compile-time?

Yeah, those results look great.

I also started a compile-time run on llvm-opt-benchmark with these results: dtcxzyw/llvm-opt-benchmark#2906 (comment). If you have time, looking at the big outlier there (openssl/smime.ll with +175%) would be good.

@nikic
Contributor

nikic commented Oct 7, 2025

@zyw-bot csmith-fuzz

@XChy
Member

XChy commented Oct 8, 2025

> looking at the big outlier there (openssl/smime.ll with +175%) would be good.

#162447 resolves it partially, but the main cause remains: the number of phi nodes increases massively, and thus SLPVectorizer slows down. I have not come up with a solution for DFAJumpThreading yet. Maybe we can redesign the SSA updating method there.

@nikic
Contributor

nikic commented Oct 10, 2025

@XChy Can you share what the before/after IR for that case looks like?

@XChy
Member

XChy commented Oct 10, 2025

> @XChy Can you share what the before/after IR for that case looks like?

@nikic Sure, see also https://gist.github.com/XChy/9fd567c44be7f5097b7edb973ef236f8.

@XChy
Member

XChy commented Oct 10, 2025

The IR file may be too big to read. To illustrate it more clearly, we can imagine a threadable path BB1 -> BB2 -> BB3, where the terminator of BB2 is a switch with many successors. Duplicating BB2 requires inserting many phi nodes into these successors for the cloned instructions throughout the path.
To avoid such cases, I have only come up with a workaround: create a common block with the switch and insert common phis into this common block. I don't think it's a good solution, though, since it may require inserting additional branches.
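
A rough way to see why the phi count grows so quickly is a back-of-the-envelope bound. This is only a toy model under stated assumptions: the struct, its fields, and the function name are hypothetical, and this is not how DFAJumpThreading or the SSA updater actually computes anything. It assumes that, after cloning, every off-path successor of a duplicated block is reachable from both the original and the clone, so every value that is live out of the path may need one phi there.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical summary of one block that gets duplicated on a threaded path.
struct DuplicatedBlock {
  std::size_t liveOutValues;         // values defined on the path and used after this block
  std::size_t successorsOutsidePath; // successors of this block that are not on the path
};

// Coarse upper bound on inserted phis: one phi per live-out value
// in each off-path successor of each duplicated block.
std::size_t estimatePhiUpperBound(const std::vector<DuplicatedBlock> &path) {
  std::size_t total = 0;
  for (const DuplicatedBlock &b : path)
    total += b.liveOutValues * b.successorsOutsidePath;
  return total;
}
```

With 4 live-out values, a block ending in a plain branch with one off-path successor contributes at most 4 phis, while a block ending in a 50-way switch contributes up to 200, which is the shape of the blow-up described above.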

@nikic
Contributor

nikic commented Oct 10, 2025

Would it make sense to limit the number of phi nodes?

Looking at this example, I'm also somewhat surprised that we perform the optimization in a case where the result still has multiple large dispatch switches -- is that expected/profitable?

@XChy
Member

XChy commented Oct 10, 2025

> Would it make sense to limit the number of phi nodes?

Yes, but the problem is that it's hard to predict the exact number of inserted phi nodes before threading. Coarsely, we can set a threshold for the number of unduplicated successors.
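
As an illustration of that coarse threshold idea, the sketch below counts the distinct successors that leave a candidate path and rejects the path when the count exceeds a budget. The `Block` type, the function names, and the budget parameter are all assumptions made for this example. They do not correspond to any existing option or data structure in the pass.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical minimal CFG node: an id plus the ids of its successors.
struct Block {
  int id;
  std::vector<int> successors;
};

// Count distinct successors of path blocks that are not themselves on the path.
// These stay unduplicated and are where new phis would have to be placed.
std::size_t countUnduplicatedSuccessors(const std::vector<Block> &path) {
  std::unordered_set<int> onPath;
  for (const Block &b : path)
    onPath.insert(b.id);

  std::unordered_set<int> offPath;
  for (const Block &b : path)
    for (int succ : b.successors)
      if (onPath.count(succ) == 0)
        offPath.insert(succ);
  return offPath.size();
}

// Gate threading on a caller-chosen budget (a made-up knob for illustration).
bool withinSuccessorBudget(const std::vector<Block> &path,
                           std::size_t maxUnduplicated) {
  return countUnduplicatedSuccessors(path) <= maxUnduplicated;
}
```

For a path BB1 -> BB2 -> BB3 where BB2 also branches to three blocks outside the path, the count is 3, so a budget of 2 would reject the path and a budget of 3 would accept it. The check is cheap because it only looks at the CFG, but it ignores how many values are actually live, so it can only approximate the real phi count.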

> Looking at this example, I'm also somewhat surprised that we perform the optimization in a case where the result still has multiple large dispatch switches -- is that expected/profitable?

The multiple large dispatch switches are not the targets to thread over; it's just like a sub-switch under the switch. It's semantically correct to duplicate such switches. For the small sub-switch, it's as profitable as common branches. But for such a big sub-switch, I am not quite sure without testing on real benchmarks. If the threading path is cold, I guess it's unprofitable due to the increase in code size (compares and jump tables?).

@XChy
Member

XChy commented Oct 10, 2025

It occurs to me that the cost models of some targets, like X86/AArch64, assume the codesize cost of branches to be the basic cost. Thus, the duplication cost of switches is not correctly estimated. We can adjust the threading cost here.

@XChy
Member

XChy commented Nov 1, 2025

Looks like we have resolved most outliers. What do you think about the current status of this pass? @nikic @UsmanNadeem

Contributor

@nikic nikic left a comment


From https://llvm.org/docs/DeveloperPolicy.html#adding-or-enabling-a-new-llvm-pass:

> Maintenance: The pass (and any analyses it depends on) must have at least one maintainer.

The pass has two maintainers: https://github.com/llvm/llvm-project/blob/main/llvm/Maintainers.md#dfajumpthreading

> Usefulness: There should be evidence that the pass improves performance (or whatever metric it optimizes for) on real-world workloads. Improvements seen only on synthetic benchmarks may be insufficient.

There are improvements in benchmarks as indicated in the PR description. I also know that it helps zlib-rs performance. Generally I think it's pretty clear that this will provide real-world performance benefits for state machine type code. Which will only affect a small subset of code, but...

> Compile-Time: The pass should not have a large impact on compile-time, where the evaluation of what “large” means is up to reviewer discretion, and may differ based on the value the pass provides. In any case, it is expected that a concerted effort has been made to mitigate the compile-time impact, both for the average case, and for pathological cases.

...the latest compile-time numbers indicate that the pass is essentially free.

On llvm-opt-benchmark we do see some regressions (up to 20%) on specific files, but these are all cases where the transform triggers. Regressions mostly seem to be due to second order effects, where further compilation slows down after the transform. It's likely that this can be further mitigated, but I think the current state is good enough.

> Correctness: The pass should have no known correctness issues (except global correctness issues that affect all of LLVM). If an old pass is being enabled (rather than implementing a new one incrementally), additional due diligence is required. The pass should be fully reviewed to ensure that it still complies with current quality standards. Fuzzing with disabled profitability checks may help gain additional confidence in the implementation.

There are no open DFAJumpThreading issues that affect correctness, only one missed optimization issue (#70767).

I've not reviewed this code myself, but I believe @XChy has been looking at it a lot. There is probably more we could do to gain confidence in the correctness of the pass, but I think at this point I'm fine with flipping the switch.

LGTM

@nikic
Contributor

nikic commented Nov 1, 2025

One thing that I am wondering about is the position of the pass. It's currently in the function simplification pipeline, but I'm not sure whether this is the right position. It might make more sense to perform it as part of the optimization pipeline rather than the simplification pipeline. But I think this doesn't need to block the default enablement; we can always adjust the position later.

@XChy
Member

XChy commented Nov 1, 2025

@nikic Thanks for your help in pushing this optimization forward! I have opened a meta issue, #165984, to track the problems to be solved at this initial stage. I am happy to see this pass enabled. @UsmanNadeem, just to double-check, do you think DFAJumpThreading is ready to be enabled at this stage?

@UsmanNadeem
Contributor

> @UsmanNadeem, just to double-check, do you think DFAJumpThreading is ready to be enabled at this stage?

Sorry for the delayed replies, I was busy with the LLVM dev conference/travel.

Yes it looks good to me. All the issues I know of have been resolved and any refinements can be done later. I think some downstream users have already been using this pass (judging from the occasional bug/perf issue reports), wider usage after default enablement might uncover more opportunities.

Member

@XChy XChy left a comment


LGTM, cheers.

@XChy XChy merged commit 0ba7bfc into llvm:main Nov 4, 2025
10 checks passed
@github-actions

github-actions bot commented Nov 4, 2025

@buggfg Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

@gulfemsavrun
Contributor

We have started seeing a test failure in lld :: ELF/arm-thunk-arm-thumb-reuse.s after landing this patch.

FAIL: lld :: ELF/arm-thunk-arm-thumb-reuse.s (833 of 3156)
******************** TEST 'lld :: ELF/arm-thunk-arm-thumb-reuse.s' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 2
split-file /b/s/w/ir/x/w/llvm-llvm-project/lld/test/ELF/arm-thunk-arm-thumb-reuse.s /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/tools/lld/test/ELF/Output/arm-thunk-arm-thumb-reuse.s.tmp
# executed command: split-file /b/s/w/ir/x/w/llvm-llvm-project/lld/test/ELF/arm-thunk-arm-thumb-reuse.s /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/tools/lld/test/ELF/Output/arm-thunk-arm-thumb-reuse.s.tmp
# RUN: at line 3
/b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/bin/llvm-mc -arm-add-build-attributes -filetype=obj -triple=thumbv7a-none-linux-gnueabi /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/tools/lld/test/ELF/Output/arm-thunk-arm-thumb-reuse.s.tmp/test.s -o /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/tools/lld/test/ELF/Output/arm-thunk-arm-thumb-reuse.s.tmp.o
# executed command: /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/bin/llvm-mc -arm-add-build-attributes -filetype=obj -triple=thumbv7a-none-linux-gnueabi /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/tools/lld/test/ELF/Output/arm-thunk-arm-thumb-reuse.s.tmp/test.s -o /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/tools/lld/test/ELF/Output/arm-thunk-arm-thumb-reuse.s.tmp.o
# RUN: at line 4
/b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/bin/ld.lld --script /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/tools/lld/test/ELF/Output/arm-thunk-arm-thumb-reuse.s.tmp/script /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/tools/lld/test/ELF/Output/arm-thunk-arm-thumb-reuse.s.tmp.o -o /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/tools/lld/test/ELF/Output/arm-thunk-arm-thumb-reuse.s.tmp2
# executed command: /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/bin/ld.lld --script /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/tools/lld/test/ELF/Output/arm-thunk-arm-thumb-reuse.s.tmp/script /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/tools/lld/test/ELF/Output/arm-thunk-arm-thumb-reuse.s.tmp.o -o /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/tools/lld/test/ELF/Output/arm-thunk-arm-thumb-reuse.s.tmp2
# RUN: at line 5
/b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/bin/llvm-objdump --no-print-imm-hex --no-show-raw-insn -d /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/tools/lld/test/ELF/Output/arm-thunk-arm-thumb-reuse.s.tmp2 | /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/bin/FileCheck /b/s/w/ir/x/w/llvm-llvm-project/lld/test/ELF/arm-thunk-arm-thumb-reuse.s
# executed command: /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/bin/llvm-objdump --no-print-imm-hex --no-show-raw-insn -d /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/tools/lld/test/ELF/Output/arm-thunk-arm-thumb-reuse.s.tmp2
# executed command: /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/bin/FileCheck /b/s/w/ir/x/w/llvm-llvm-project/lld/test/ELF/arm-thunk-arm-thumb-reuse.s
# RUN: at line 7
/b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/bin/llvm-mc -arm-add-build-attributes -filetype=obj -triple=thumbv7aeb-none-linux-gnueabi -mcpu=cortex-a8 /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/tools/lld/test/ELF/Output/arm-thunk-arm-thumb-reuse.s.tmp/test.s -o /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/tools/lld/test/ELF/Output/arm-thunk-arm-thumb-reuse.s.tmp.o
# executed command: /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/bin/llvm-mc -arm-add-build-attributes -filetype=obj -triple=thumbv7aeb-none-linux-gnueabi -mcpu=cortex-a8 /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/tools/lld/test/ELF/Output/arm-thunk-arm-thumb-reuse.s.tmp/test.s -o /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/tools/lld/test/ELF/Output/arm-thunk-arm-thumb-reuse.s.tmp.o
# RUN: at line 8
/b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/bin/ld.lld --script /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/tools/lld/test/ELF/Output/arm-thunk-arm-thumb-reuse.s.tmp/script /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/tools/lld/test/ELF/Output/arm-thunk-arm-thumb-reuse.s.tmp.o -o /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/tools/lld/test/ELF/Output/arm-thunk-arm-thumb-reuse.s.tmp2
# executed command: /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/bin/ld.lld --script /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/tools/lld/test/ELF/Output/arm-thunk-arm-thumb-reuse.s.tmp/script /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/tools/lld/test/ELF/Output/arm-thunk-arm-thumb-reuse.s.tmp.o -o /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/tools/lld/test/ELF/Output/arm-thunk-arm-thumb-reuse.s.tmp2
# RUN: at line 9
/b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/bin/llvm-objdump --no-print-imm-hex --no-show-raw-insn -d /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/tools/lld/test/ELF/Output/arm-thunk-arm-thumb-reuse.s.tmp2 | /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/bin/FileCheck /b/s/w/ir/x/w/llvm-llvm-project/lld/test/ELF/arm-thunk-arm-thumb-reuse.s
# executed command: /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/bin/llvm-objdump --no-print-imm-hex --no-show-raw-insn -d /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/tools/lld/test/ELF/Output/arm-thunk-arm-thumb-reuse.s.tmp2
# executed command: /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/bin/FileCheck /b/s/w/ir/x/w/llvm-llvm-project/lld/test/ELF/arm-thunk-arm-thumb-reuse.s
# RUN: at line 10
/b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/bin/ld.lld --be8 --script /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/tools/lld/test/ELF/Output/arm-thunk-arm-thumb-reuse.s.tmp/script /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/tools/lld/test/ELF/Output/arm-thunk-arm-thumb-reuse.s.tmp.o -o /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/tools/lld/test/ELF/Output/arm-thunk-arm-thumb-reuse.s.tmp2
# executed command: /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/bin/ld.lld --be8 --script /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/tools/lld/test/ELF/Output/arm-thunk-arm-thumb-reuse.s.tmp/script /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/tools/lld/test/ELF/Output/arm-thunk-arm-thumb-reuse.s.tmp.o -o /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/tools/lld/test/ELF/Output/arm-thunk-arm-thumb-reuse.s.tmp2
# RUN: at line 11
/b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/bin/llvm-objdump --no-print-imm-hex --no-show-raw-insn -d /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/tools/lld/test/ELF/Output/arm-thunk-arm-thumb-reuse.s.tmp2 | /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/bin/FileCheck /b/s/w/ir/x/w/llvm-llvm-project/lld/test/ELF/arm-thunk-arm-thumb-reuse.s
# executed command: /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/bin/llvm-objdump --no-print-imm-hex --no-show-raw-insn -d /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/tools/lld/test/ELF/Output/arm-thunk-arm-thumb-reuse.s.tmp2
# executed command: /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/bin/FileCheck /b/s/w/ir/x/w/llvm-llvm-project/lld/test/ELF/arm-thunk-arm-thumb-reuse.s
# .---command stderr------------
# | /b/s/w/ir/x/w/llvm-llvm-project/lld/test/ELF/arm-thunk-arm-thumb-reuse.s:38:16: error: CHECK-NEXT: expected string not found in input
# | // CHECK-NEXT: 10000: bl 0x10010 <__ARMv7ABSLongThunk_far>
# |                ^
# | <stdin>:6:19: note: scanning from here
# | 00010000 <_start>:
# |                   ^
# | <stdin>:12:1: note: possible intended match here
# | 00010010 <__ARMv7ABSLongThunk_far>:
# | ^
# | 
# | Input file: <stdin>
# | Check file: /b/s/w/ir/x/w/llvm-llvm-project/lld/test/ELF/arm-thunk-arm-thumb-reuse.s
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |            1:  
# |            2: /b/s/w/ir/x/w/llvm_build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/tools/lld/test/ELF/Output/arm-thunk-arm-thumb-reuse.s.tmp2: file format elf32-bigarm 
# |            3:  
# |            4: Disassembly of section .text: 
# |            5:  
# |            6: 00010000 <_start>: 
# | next:38'0                       X error: no match found
# |            7:  10000: andeq lr, r0, #0, #22 
# | next:38'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |            8:  10004: <unknown> 
# | next:38'0     ~~~~~~~~~~~~~~~~~~
# |            9:  10008: <unknown> 
# | next:38'0     ~~~~~~~~~~~~~~~~~~
# |           10:  1000c: <unknown> 
# | next:38'0     ~~~~~~~~~~~~~~~~~~
# |           11:  
# | next:38'0     ~
# |           12: 00010010 <__ARMv7ABSLongThunk_far>: 
# | next:38'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# | next:38'1     ?                                    possible intended match
# |           13:  10010: sbceq lr, r0, r0, lsl #6 
# | next:38'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           14:  10014: sbceq lr, r0, r1, asr #6 
# | next:38'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           15:  10018: ldclne p1, c14, [pc], #188 
# | next:38'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           16:  
# | next:38'0     ~
# |           17: 0001001c <__Thumbv7ABSLongThunk_far2>: 
# | next:38'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |            .
# |            .
# |            .
# | >>>>>>
# `-----------------------------
# error: command failed with exit status: 1

--

********************

https://luci-milo.appspot.com/ui/p/fuchsia/builders/prod/clang-linux-x64/b8699130233180762193/overview

We confirmed that reverting this patch resolves the issue. For context, the failure only reproduces in builders configured to run a two-stage build with PGO enabled. I'm working on gathering more detailed information, but I wanted to give you a quick heads-up.

petrhosek added a commit that referenced this pull request Nov 10, 2025
XChy pushed a commit that referenced this pull request Nov 11, 2025
Reverts #157646, DFAJumpThread is causing miscompiles
when building Clang with PGO, see #166868 for details.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Nov 11, 2025
…." (#167352)

Reverts llvm/llvm-project#157646, DFAJumpThread is causing miscompiles
when building Clang with PGO, see llvm/llvm-project#166868 for details.