LLVM and SPIRV-LLVM-Translator pulldown (WW42 2025) #20373
base: sycl
Remove includes that ClangSYCLLinker.cpp does not use directly. Explicitly include FormatVariadic.h for the `formatv` declaration, which was previously pulled in implicitly by the removed headers.
This change moves the getUsualDeleteParams function into the FunctionDecl class so that it can be shared between LLVM IR and CIR codegen.
Build on the Clang-BOLT infrastructure to collect a sample profile for CSSPGO. Add CSSPGO.cmake and BOLT-CSSPGO.cmake to automate CSSPGO and CSSPGO+BOLT Clang builds. Note that `CLANG_PGO_TRAINING_DATA_SOURCE_DIR` is required, as the built-in training set is inadequate for collecting a sampled profile.

Hardware compatibility: CSSPGO requires synchronized (0-skid) call and branch stacks, which are only available with Intel PEBS (Sandy Bridge+), AMD Zen3 with BRS, Zen4 with LBRv2+LBR_PMC_FREEZE, and Zen5 with LBRv2. This patch adds support for the Intel `br_inst_retired.near_taken:uppp` event.

Test Plan: Added BOLT-CSSPGO.cmake, used the same way as BOLT-PGO.cmake. For example, for a bootstrapped ThinLTO+CSSPGO+BOLT build, with the CSSPGO profile collected from an LLVM build and the BOLT profile collected from Hello World (instrumentation):

```
cmake -B clang-csspgo-bolt -S /path/to/llvm-project/llvm \
  -DLLVM_ENABLE_LLD=ON -DBOOTSTRAP_LLVM_ENABLE_LLD=ON \
  -DBOOTSTRAP_BOOTSTRAP_LLVM_ENABLE_LLD=ON \
  -DPGO_INSTRUMENT_LTO=Thin \
  -DBOOTSTRAP_CLANG_PGO_TRAINING_DATA_SOURCE_DIR=/path/to/llvm-project/llvm \
  -GNinja -C /path/to/llvm-project/clang/cmake/caches/BOLT-CSSPGO.cmake
ninja stage2-clang-bolt
...
warning: Sample PGO is estimated to optimize better with 19.5x more samples. Please consider increasing sampling rate or profiling for longer duration to get more samples.
...
[2800/2801] Optimizing Clang with BOLT
BOLT-INFO: 8189 out of 106942 functions in the binary (7.7%) have non-empty execution profile
1377639 : taken branches (-42.1%)
```

Performance testing with Clang:
- Setup: Clang-BOLT testing harness aaupov/llvm-devmtg-2022@9f2b46f
- CSSPGO training: building LLVM
- InstrPGO training: building Hello World
- BOLT training: building Hello World, instrumentation
- Benchmark: building a small LLVM tool (`not`)
- Machine: 2S Intel SKX Xeon 6138 with 40C/80T and 256GB RAM, using 20C/40T for the build
- Results (wall time, lower is better):
  - Baseline (bootstrapped build): 10.36s
  - InstrPGO + ThinLTO: 9.34s
  - CSSPGO + ThinLTO: 8.85s
- BOLT results, for reference:
  - Baseline: 9.09s
  - InstrPGO + ThinLTO: 9.09s
  - CSSPGO + ThinLTO: 8.58s

---------

Co-authored-by: Matthias Braun <[email protected]>
This patch adds documentation files for GFX12.
There's a pattern throughout LLVM of `cl::opt`s being exported. That in itself is probably a bit unfortunate, but what's especially bad is that many of those symbols are in the global namespace. Move them into the `llvm` namespace. While doing this, I noticed some other variables in the global namespace and moved them as well.
This test failed on the RISC-V target because we couldn't strip the `main` symbol from the binary: `main` is dynamically linked when `-fPIC` is enabled. The RISC-V ABI requires that executables support loading at arbitrary addresses to enable shared libraries and secure loading (ASLR). In PIC mode, function addresses cannot be hardcoded; instead, the generated code loads them from the GOT/PLT, which the dynamic loader initializes. The reference to `main` thus ends up in `.dynsym` and is dynamically bound. We cannot strip `main` or any other dynamically linked function, because these functions are referenced indirectly through the dynamic linking tables (`.plt` and `.got`). Removing these symbols would break the dynamic linking mechanism needed to resolve function addresses at runtime, and the executable would fail to call them correctly.
Co-authored-by: Andy Kaylor <[email protected]>
…(#156952) Comparison predicates (equal, not equal, greater than, etc.) provide important semantic information about program behavior. Previously, IR2Vec only captured that a comparison was happening, not what kind of comparison it was. This PR extends the IR2Vec vocabulary to include comparison predicates (ICmp and FCmp) as part of the embedding space. The changes are:
1. Expand the vocabulary slot layout to include predicate entries after opcodes, types, and operands.
2. Add methods to handle predicate embedding lookups and conversions.
3. Update the embedder implementations to include predicate information when processing CmpInst instructions.
4. Update test files to include the new predicate entries in the vocabulary.

(Tracking issues: #141817, #141833)
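As a sketch of the expanded slot layout, assuming small illustrative section sizes (the constants, the `Section` enum, and `slotIndex` are hypothetical, not the IR2Vec API), predicate slots are appended after the opcode, type, and operand sections, so a predicate's flat index is offset by the sizes of all earlier sections:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical section sizes; the real vocabulary derives these from
// LLVM's opcode, type, operand-kind, and predicate enumerations.
constexpr std::size_t NumOpcodes = 4;
constexpr std::size_t NumTypes = 3;
constexpr std::size_t NumOperandKinds = 2;
constexpr std::size_t NumPredicates = 5;

enum class Section { Opcode, Type, Operand, Predicate };

// Flat slot index for entry `local` within `section`, with sections laid
// out in the order opcodes, types, operands, predicates.
std::size_t slotIndex(Section section, std::size_t local) {
  switch (section) {
  case Section::Opcode:
    assert(local < NumOpcodes);
    return local;
  case Section::Type:
    assert(local < NumTypes);
    return NumOpcodes + local;
  case Section::Operand:
    assert(local < NumOperandKinds);
    return NumOpcodes + NumTypes + local;
  case Section::Predicate:
    assert(local < NumPredicates);
    return NumOpcodes + NumTypes + NumOperandKinds + local;
  }
  return static_cast<std::size_t>(-1); // not reached for valid sections
}

// Total number of slots once predicates are included.
constexpr std::size_t vocabSize() {
  return NumOpcodes + NumTypes + NumOperandKinds + NumPredicates;
}
```

Appending the new section at the end means, in this sketch, that the offsets of the existing opcode, type, and operand sections are unchanged; only the total vocabulary size grows.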
Fixes llvm/llvm-project#155459 by making sure the cases are considered in the right order. Previously, intrinsic types were overriding the pointer cases, which have higher precedence in the specification. This also makes the following [tests](llvm/llvm-test-suite#287) pass.
… (#161112) Previously, we only used `objcopy`, which is not available in some build configurations. With this patch, we still try `objcopy`, but also try `llvm-objcopy` if it is available. This is a follow-up to llvm/llvm-project#156383.
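The lookup itself can be sketched as "first available candidate wins". The patch lives in the build/test configuration, not in C++, so the function below is only an illustration; its name, the candidate order (`objcopy` preferred, `llvm-objcopy` as the alternative), and the caller-supplied `isAvailable` check are all assumptions.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Returns the first tool in `candidates` for which `isAvailable` returns
// true, or std::nullopt if none is found. `isAvailable` is a hypothetical
// hook (for example, a PATH lookup supplied by the caller).
std::optional<std::string> findFirstAvailableTool(
    const std::vector<std::string> &candidates,
    const std::function<bool(const std::string &)> &isAvailable) {
  for (const std::string &tool : candidates)
    if (isAvailable(tool))
      return tool;
  return std::nullopt;
}
```

Called as `findFirstAvailableTool({"objcopy", "llvm-objcopy"}, check)`, it picks `objcopy` when both exist and `llvm-objcopy` when only that one does.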
Refactored the IR2Vec vocabulary and introduced an IR-agnostic (semantics-agnostic) `VocabStorage`:
- `Vocabulary` *has-a* `VocabStorage`.
- `Vocabulary` deals with LLVM IR-specific entities. This should help reuse parts of the logic efficiently for MIR.
- Storage uses a section-based approach instead of a flat vector, improving organization and access patterns.
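A minimal C++ sketch of the *has-a* split, assuming embeddings are plain `std::vector<double>` and only two illustrative sections (the member names and section IDs are hypothetical, not the actual IR2Vec interface): `VocabStorage` only knows about numbered sections, while `Vocabulary` maps IR concepts onto them.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Embedding = std::vector<double>;

// IR-agnostic storage: embeddings grouped into sections and addressed by
// (section, index) rather than by one flat vector.
class VocabStorage {
public:
  explicit VocabStorage(std::vector<std::vector<Embedding>> sections)
      : Sections(std::move(sections)) {}

  std::size_t numSections() const { return Sections.size(); }

  const Embedding &get(std::size_t section, std::size_t index) const {
    assert(section < Sections.size() && index < Sections[section].size());
    return Sections[section][index];
  }

private:
  std::vector<std::vector<Embedding>> Sections;
};

// IR-specific layer: knows what each section means and delegates storage
// to a VocabStorage member (has-a, not is-a).
class Vocabulary {
public:
  enum SectionID : std::size_t { Opcodes = 0, Types = 1 };

  explicit Vocabulary(VocabStorage storage) : Storage(std::move(storage)) {}

  const Embedding &opcode(std::size_t opcodeIdx) const {
    return Storage.get(Opcodes, opcodeIdx);
  }
  const Embedding &type(std::size_t typeIdx) const {
    return Storage.get(Types, typeIdx);
  }

private:
  VocabStorage Storage;
};
```

Because `VocabStorage` carries no IR semantics in this sketch, a MIR-specific wrapper could reuse it unchanged with its own section mapping.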
Fast strlen implementations (naive wide reads, SIMD-based, and x86_64/aarch64-optimized versions) may all perform technically out-of-bounds reads, which leads to reports under ASan, HWASan (on ARM machines), and TSan (which can also detect heap out-of-bounds reads). So we need to explicitly disable instrumentation for all three sanitizers. Unfortunately, Clang didn't support the `[[gnu::no_sanitize]]` syntax until recently, and since we support both GCC and Clang, we have to fall back to the `__attribute__` syntax.
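To illustrate why such reads happen and how the attribute is applied, here is a minimal word-at-a-time strlen sketch (not the libc implementation; `wide_strlen` and `NO_SANITIZE_OOB_READS` are hypothetical names). Once the pointer is 8-byte aligned it loads whole words, so the last load may cover bytes past the terminating NUL. An aligned 8-byte load cannot cross a page boundary and therefore cannot fault, but it is still out of bounds as far as the sanitizers are concerned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// __attribute__ form accepted by both GCC and Clang (hypothetical macro).
#define NO_SANITIZE_OOB_READS \
  __attribute__((no_sanitize("address", "hwaddress", "thread")))

NO_SANITIZE_OOB_READS
std::size_t wide_strlen(const char *s) {
  assert(s != nullptr);
  const char *p = s;
  // Scan byte by byte until p is 8-byte aligned.
  while (reinterpret_cast<std::uintptr_t>(p) % sizeof(std::uint64_t) != 0) {
    if (*p == '\0')
      return static_cast<std::size_t>(p - s);
    ++p;
  }
  const std::uint64_t ones = 0x0101010101010101ULL;
  const std::uint64_t highs = 0x8080808080808080ULL;
  for (;;) {
    std::uint64_t word;
    // Aligned 8-byte load; may read past the NUL (the "technical" OOB read).
    std::memcpy(&word, p, sizeof(word));
    // Nonzero iff at least one byte of `word` is zero.
    if ((word - ones) & ~word & highs)
      break;
    p += sizeof(word);
  }
  // The NUL is somewhere in this word; find it byte by byte.
  while (*p != '\0')
    ++p;
  return static_cast<std::size_t>(p - s);
}
```

The zero-byte test `(w - 0x01..01) & ~w & 0x80..80` is the classic bit trick; it answers only "is there a zero byte", which is why the final byte loop is still needed.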
This reverts commit d392563.
…move the fixme (#161531) Move the LowerBufferFatPointers pass after the CodeGenPrepare and LoadStoreVectorizer passes, and remove the FIXME about that.
Co-authored-by: ronlieb <[email protected]>
It's unnecessary to build the whole symtab, and on top of that, it's suboptimal to do so for every function. All we really need is the instrumented PGO name (taking LTO-ness into account), from which we can compute the function name.
…pp. NFC. (#161476)
R_AARCH64_TLSDESC_CALL is a relocation emitted as a hint for the linker to replace the `blr r` instruction with a `nop`. BOLT does not currently require any special handling for it. Note that the previously existing extraction of the relocated value was incorrect.
…nt/APInt (#161474)
Fold `mulf(x, 0) -> 0` when both the `nnan` and `nsz` fast-math flags are set (`nnan | nsz`).
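Both flags matter for this fold because, under plain IEEE 754 arithmetic, `x * 0` is not always `+0`: a negative `x` yields `-0` (a difference `nsz` allows ignoring), and a NaN or infinite `x` yields NaN (which `nnan` rules out). The small check below (a hypothetical helper, not part of the patch, and assuming the program is compiled without fast-math) shows the three cases.

```cpp
#include <cmath>

// True if x * 0.0 evaluates to exactly +0.0 (value and sign).
bool mulByZeroIsPositiveZero(double x) {
  double product = x * 0.0;
  return product == 0.0 && !std::signbit(product);
}
```

For positive finite `x` the check holds; for `-3.0` the product is `-0.0`, and for NaN or infinity the product is NaN, so the check fails.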
CONFLICT (content): Merge conflict in clang/lib/Driver/ToolChains/Clang.cpp
This matches what we do for regular i8 extloads, due to the lack of `c.lb` in Zcb. This only affects GlobalISel, because SelectionDAG won't create an anyext i8 atomic_load today.
This patch fixes: `llvm/lib/Analysis/IR2Vec.cpp:289:14: error: unused variable 'allSameDim' [-Werror,-Wunused-variable]`
Previously, if a subregister extract read from a full copy, the incoming copy (which has no subregister) would overwrite the DefSubReg index of the folding context. There's one ugly RVV regression, but it is a downstream effect of this fix: an unnecessary same-class reg-to-reg full copy is now avoided.
…ers (#161491) Both Usman Nadeem and I have consistently contributed to the DFAJumpThreading pass so far. To push DFAJumpThreading forward and get it enabled by default, I volunteer myself and Usman Nadeem as DFAJumpThreading maintainers.
The `Skip` parameter is neither used nor set inside `analyzeRelocation()`.
…161624)

### Summary
Stabilize ASan wchar tests across Darwin and Android. NFC: test-only. Follow-up to PR #160493 (adds wchar interceptors/tests).

### Motivation
- Darwin: The top frame often resolves to `libclang_rt.asan_*` rather than a source file, so strict checks that include file/line can fail. See Chromium issue [448631142](https://g-issues.chromium.org/issues/448631142).
- Android: The “ERROR:” header can go to logcat instead of stderr, so FileCheck may not see it; stdout/stderr reordering also makes pre-crash markers racy. See Android Buildbot [186/12821](https://lab.llvm.org/buildbot/#/builders/186/builds/12821).

### Changes
- Android:
  - Force reports to stderr via `%env_asan_opts=log_to_stderr=1`, so the “ERROR:” header does not go to logcat.
  - Print the pre-crash “Good so far.” marker to stderr and `fflush(stderr)` to avoid stdout/stderr reordering.
- Darwin:
  - Relax the stack-frame check to require only the function name (`wcscpy/wcsncpy/wcscat/wcsncat`), tolerating `libclang_rt.asan_*` frames.
- Common:
  - Reuse the FileCheck variable `[[ADDR]]` instead of redefining it.
  - Make wide string literals `const wchar_t*` to silence `-Wwritable-strings`.

### Risk
- NFC: test-only; no change to runtime behavior.

### References
- Follow-up to PR #160493.
- Chromium: [448631142](https://g-issues.chromium.org/issues/448631142) (Darwin failures).
- Android Buildbot: [186/12821](https://lab.llvm.org/buildbot/#/builders/186/builds/12821).

Signed-off-by: Yixuan Cao <[email protected]>
@wenju-he Can you help take a look at the libclc failure after 7f36611? It looks like there might be a problem with the native_cpu target support.
…as static When `INLINE` functions fail to be inlined, EliminateAvailableExternallyPass deletes them because they have `available_externally` linkage. As a result, they become unresolved in libspirv-native_cpu.bc. Mark them as `static` to fix the linkage. An alternative fix is to move the function definitions into clc/lib/generic/math/clc_sincos_helpers.cl.
@jsji It is a bug exposed by native_cpu. It is fixed in ce2bd60.
Thanks @wenju-he!
Branch updated from 015798b to ce2bd60.
LLVM: llvm/llvm-project@2d67cb1
SPIRV-LLVM-Translator: KhronosGroup/SPIRV-LLVM-Translator@54525b6