@iclsrc iclsrc commented Oct 15, 2025

bader and others added 30 commits October 1, 2025 15:07
These includes are not used by ClangSYCLLinker.cpp directly.
Explicitly include FormatVariadic.h for formatv declaration, which was implicitly included by removed headers.
This change moves the getUsualDeleteParams function into the
FunctionDecl class so that it can be shared between LLVM IR and CIR
codegen.
Build on Clang-BOLT infrastructure to collect sample profile for CSSPGO.
Add CSSPGO.cmake and BOLT-CSSPGO.cmake to automate CSSPGO/+BOLT 
Clang builds.

Note that `CLANG_PGO_TRAINING_DATA_SOURCE_DIR` is required, as the built-in
training set is inadequate for collecting a sampled profile.

Hardware compatibility: CSSPGO requires synchronized (0-skid) call
and branch stacks, which are only available with Intel PEBS (Sandy
Bridge+), AMD Zen3 with BRS, Zen4 with LBRv2+LBR_PMC_FREEZE, and Zen5
with LBRv2.
This patch adds support for Intel `br_inst_retired.near_taken:uppp`
event.

Test Plan:
Added BOLT-CSSPGO.cmake with same use as BOLT-PGO.cmake, 
e.g. for bootstrapped ThinLTO+CSSPGO+BOLT, with CSSPGO profile collected
from LLVM build, and BOLT profile collected from Hello World
(instrumentation):
```
cmake -B clang-csspgo-bolt -S /path/to/llvm-project/llvm \
-DLLVM_ENABLE_LLD=ON -DBOOTSTRAP_LLVM_ENABLE_LLD=ON \
-DBOOTSTRAP_BOOTSTRAP_LLVM_ENABLE_LLD=ON \
-DPGO_INSTRUMENT_LTO=Thin \
-DBOOTSTRAP_CLANG_PGO_TRAINING_DATA_SOURCE_DIR=/path/to/llvm-project/llvm \
-GNinja  -C /path/to/llvm-project/clang/cmake/caches/BOLT-CSSPGO.cmake
ninja stage2-clang-bolt
...
warning: Sample PGO is estimated to optimize better with 19.5x more samples. Please consider increasing sampling rate or profiling for longer duration to get more samples.
...
[2800/2801] Optimizing Clang with BOLT
BOLT-INFO: 8189 out of 106942 functions in the binary (7.7%) have non-empty execution profile
            1377639 : taken branches (-42.1%)
```

Performance testing with Clang:
- Setup: Clang-BOLT testing harness aaupov/llvm-devmtg-2022@9f2b46f
  - CSSPGO training: building LLVM,
  - InstrPGO training: building Hello World,
  - BOLT training: building Hello World, instrumentation,
  - benchmark: building small LLVM tool (not),
- 2S Intel SKX Xeon 6138 with 40C/80T and 256GB RAM, using 20C/40T for build,
- Results, wall time, lower is better
  - Baseline (bootstrapped build): 10.36s,
  - InstrPGO + ThinLTO: 9.34s,
  - CSSPGO + ThinLTO: 8.85s.
- BOLT results, for reference:
  - Baseline: 9.09s,
  - InstrPGO + ThinLTO: 9.09s,
  - CSSPGO + ThinLTO: 8.58s.

---------

Co-authored-by: Matthias Braun <[email protected]>
This patch adds documentation files for GFX12.
There's a pattern throughout LLVM of cl::opts being exported. That in
itself is probably a bit unfortunate, but what's especially bad about it
is that a lot of those symbols are in the global namespace. Move them
into the llvm namespace.

While doing this, I noticed some other variables in the global namespace
and moved them as well.
This test failed during testing on the RISC-V target because we couldn't
strip the main label from the binary. main is dynamically linked when
the -fPIC flag is enabled. The RISC-V ABI requires that executables
support loading at arbitrary addresses to enable shared libraries and
secure loading (ASLR). In PIC mode, function addresses cannot be
hardcoded in the code. Instead, code is generated to load addresses from
the GOT/PLT tables, which are initialized by the dynamic loader. The
reference to main thus ends up in .dynsym and is dynamically bound. We
cannot strip main or any other dynamically linked functions because
these functions are referenced indirectly via dynamic linking tables
(.plt and .got). Removing these symbols would break the dynamic linking
mechanism needed to resolve function addresses at runtime, causing the
executable to fail to correctly call them.
…(#156952)

Comparison predicates (equal, not equal, greater than, etc.) provide important semantic information about program behavior. Previously, IR2Vec only captured that a comparison was happening but not what kind of comparison it was. This PR extends the IR2Vec vocabulary to include comparison predicates (ICmp and FCmp) as part of the embedding space.

Following are the changes:
1. Expand the vocabulary slot layout to include predicate entries after opcodes, types, and operands
2. Add methods to handle predicate embedding lookups and conversions
3. Update the embedder implementations to include predicate information when processing CmpInst instructions
4. Update test files to include the new predicate entries in the vocabulary

(Tracking issues: #141817, #141833)
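
The slot layout in item 1 can be pictured as consecutive sections in one index space. The following is a minimal sketch under that assumption only; `VocabLayout`, its member names, and the section sizes are hypothetical and are not the IR2Vec API.

```cpp
#include <cstddef>

// Hypothetical layout helper: sections are placed back to back in the order
// opcodes, types, operands, predicates. The real section sizes are not
// stated in the commit message, so callers supply them.
struct VocabLayout {
  std::size_t NumOpcodes;
  std::size_t NumTypes;
  std::size_t NumOperandKinds;
  std::size_t NumPredicates;

  std::size_t opcodeSlot(std::size_t Opcode) const { return Opcode; }
  std::size_t typeSlot(std::size_t Type) const { return NumOpcodes + Type; }
  std::size_t operandSlot(std::size_t Kind) const {
    return NumOpcodes + NumTypes + Kind;
  }
  // Predicate entries come after every pre-existing section, so the slots
  // of opcodes, types, and operands keep their old indices.
  std::size_t predicateSlot(std::size_t Pred) const {
    return NumOpcodes + NumTypes + NumOperandKinds + Pred;
  }
  std::size_t totalSlots() const {
    return NumOpcodes + NumTypes + NumOperandKinds + NumPredicates;
  }
};
```

Appending the predicate section at the end is what lets the existing sections stay where they were in this sketch.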
Fixes llvm/llvm-project#155459 by making sure
the cases are considered in the right order. Previously, intrinsic types
were overriding the pointer cases, which have higher precedence in the
specification.

Also passes the following
[tests](llvm/llvm-test-suite#287).
… (#161112)

Previously, we only used `objcopy`, which is not available for some
build configurations. With this patch, we not only try to use `objcopy`,
but also try to use `llvm-objcopy` if available.

This is a follow-up of llvm/llvm-project#156383.
Refactored IR2Vec vocabulary and introduced IR (semantics) agnostic `VocabStorage`
- `Vocabulary` *has-a* `VocabStorage`
- `Vocabulary` deals with LLVM IR specific entities. This would help in efficient reuse of parts of the logic for MIR.
- Storage uses a section-based approach instead of a flat vector, improving organization and access patterns.
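
To illustrate the *has-a* split described above, here is a rough sketch assuming a store that is addressed by (section, local index) instead of one flat index. `SectionedStorage` and `IRVocabulary` are hypothetical stand-ins and do not reproduce the LLVM classes or their signatures.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using Embedding = std::vector<double>;

// Hypothetical IR-agnostic store: it only knows about numbered sections.
class SectionedStorage {
  std::vector<std::vector<Embedding>> Sections;

public:
  explicit SectionedStorage(std::vector<std::vector<Embedding>> S)
      : Sections(std::move(S)) {}

  std::size_t numSections() const { return Sections.size(); }

  // Total number of embeddings across all sections.
  std::size_t size() const {
    std::size_t N = 0;
    for (const auto &Sec : Sections)
      N += Sec.size();
    return N;
  }

  // Bounds-checked lookup by (section, local index).
  const Embedding &at(std::size_t Section, std::size_t Local) const {
    return Sections.at(Section).at(Local);
  }
};

// Hypothetical IR-specific wrapper: owns a store and gives sections meaning.
class IRVocabulary {
  SectionedStorage Storage;

public:
  enum Section : std::size_t { Opcodes = 0, Types = 1 };

  explicit IRVocabulary(SectionedStorage S) : Storage(std::move(S)) {}

  const Embedding &opcode(std::size_t Id) const {
    return Storage.at(Opcodes, Id);
  }
  const Embedding &type(std::size_t Id) const { return Storage.at(Types, Id); }
};
```

A MIR-oriented wrapper could, in the same spirit, reuse the store and define its own section meanings.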
Fast strlen implementations (naive wide-reads, SIMD-based, and
x86_64/aarch64-optimized versions) all may perform
technically-out-of-bound reads, which leads to reports under ASan,
HWASan (on ARM machines), and also TSan (which also has the capability
to detect heap out-of-bound reads). So, we need to explicitly disable
instrumentation in all three cases.

Tragically, Clang didn't support `[[gnu::no_sanitize]]` syntax until
recently, and since we're supporting both GCC and Clang, we have to
revert to `__attribute__` syntax.
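
As an illustration of the kind of function affected, the sketch below shows a naive wide-read strlen annotated with the `__attribute__` spelling. It assumes GCC or Clang; the macro name, the `word_t` typedef, and the exact list of sanitizers are assumptions and may differ from what the patch uses.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed macro name. Both GCC and Clang accept this GNU attribute syntax.
#define NO_SANITIZE_WIDE_READS \
  __attribute__((no_sanitize("address", "hwaddress", "thread")))

// GNU extension: a word type that may alias char storage.
typedef std::uintptr_t __attribute__((__may_alias__)) word_t;

NO_SANITIZE_WIDE_READS
std::size_t wide_strlen(const char *s) {
  constexpr std::uintptr_t ones = ~std::uintptr_t(0) / 0xFF; // 0x0101...01
  constexpr std::uintptr_t highs = ones * 0x80;              // 0x8080...80
  const char *p = s;
  // Scan byte by byte until p is word-aligned.
  while (reinterpret_cast<std::uintptr_t>(p) % sizeof(word_t) != 0) {
    if (*p == '\0')
      return static_cast<std::size_t>(p - s);
    ++p;
  }
  // Aligned word reads: the last one may cover bytes past the terminator,
  // which is the technically-out-of-bounds read that sanitizers report.
  for (;;) {
    word_t w = *reinterpret_cast<const word_t *>(p);
    if ((w - ones) & ~w & highs) // true iff some byte of w is zero
      break;
    p += sizeof(word_t);
  }
  // Locate the exact terminator inside the final word.
  while (*p != '\0')
    ++p;
  return static_cast<std::size_t>(p - s);
}
```

Because each wide read is aligned, it stays within the word that contains the terminator, which is why such reads are harmless in practice despite being flagged.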
…move the fixme (#161531)

Move LowerBufferFatPointers pass after CodegenPrepare and
LoadStoreVectorizer pass, and remove the fixme about that.
It's unnecessary to build the whole symtable, and on top of everything,
suboptimal to do so for every function. All we really need is the
instrumented PGO name - considering also LTO-ness - and then we can
compute the function name.
R_AARCH64_TLSDESC_CALL is a relocation emitted as a hint for a linker to
replace `blr r` instruction with nop. BOLT does not currently require
any special handling for it.

Note that previously existing extraction of the relocated value was
incorrect.
Fold  `mulf(x, 0) -> 0` when (nnan | nsz)
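
A small sketch of why the fold needs fast-math flags, reading `(nnan | nsz)` as "both flags set" (an assumption about the notation). The flag enum and function names here are hypothetical and are not the dialect's actual definitions.

```cpp
#include <cmath>
#include <limits>

// Hypothetical flag bits for illustration only.
enum FastMath : unsigned { NoFlags = 0, NNaN = 1u << 0, NSZ = 1u << 1 };

// Folding x * 0 to +0.0 is treated as sound only when both flags are set.
constexpr bool canFoldMulByZero(unsigned Flags) {
  return (Flags & (NNaN | NSZ)) == (NNaN | NSZ);
}

// Without nsz: a negative x times +0.0 gives -0.0, not +0.0.
inline bool negativeTimesZeroIsNegativeZero() {
  volatile double X = -1.0; // volatile keeps the multiply from being folded
  return std::signbit(X * 0.0);
}

// Without nnan: infinity (or NaN) times zero gives NaN.
inline bool infinityTimesZeroIsNaN() {
  volatile double X = std::numeric_limits<double>::infinity();
  return std::isnan(X * 0.0);
}
```

Both helper checks hold under IEEE 754 arithmetic without `-ffast-math`, which is what rules out an unconditional fold.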
  CONFLICT (content): Merge conflict in clang/lib/Driver/ToolChains/Clang.cpp
This matches what we do for regular i8 extload due to the lack of c.lb
in Zcb.

This only affects global isel because SelectionDAG won't create an
anyext i8 atomic_load today.
This patch fixes:

  llvm/lib/Analysis/IR2Vec.cpp:289:14: error: unused variable
  'allSameDim' [-Werror,-Wunused-variable]
Previously if we had a subregister extract reading from a
full copy, the no-subregister incoming copy would overwrite
the DefSubReg index of the folding context.

There's one ugly rvv regression, but it's a downstream
issue of this; an unnecessary same class reg-to-reg full copy
was avoided.
…ers (#161491)

Both Usman Nadeem and I have consistently contributed to the
DFAJumpThreading pass so far. To push DFAJumpThreading forward and get
it enabled by default, I volunteer myself and Usman Nadeem as
DFAJumpThreading maintainers.
`Skip` parameter not used/set inside `analyzeRelocation()`.
…161624)

### Summary
Stabilize ASan wchar tests across Darwin and Android. NFC: test-only.
Follow-up to PR #160493 (adds wchar interceptors/tests).

### Motivation
- Darwin: The top frame often resolves to `libclang_rt.asan_*` rather
than a source file, so strict checks that include file/line can fail.
See Chromium issue
[448631142](https://g-issues.chromium.org/issues/448631142).
- Android: The “ERROR:” header can go to logcat instead of stderr, so
FileCheck may not see it; stdout/stderr reordering also makes pre-crash
markers racy. See Android Buildbot
[186/12821](https://lab.llvm.org/buildbot/#/builders/186/builds/12821).

### Changes
- Android:
  - Force reports to stderr via `%env_asan_opts=log_to_stderr=1`, avoiding
    the “ERROR:” header going to logcat.
  - Print the pre-crash “Good so far.” to stderr and `fflush(stderr)` to
    avoid stdout/stderr reordering.
- Darwin:
  - Relax the stack-frame check to only require the function name
    (`wcscpy/wcsncpy/wcscat/wcsncat`) to tolerate `libclang_rt.asan_*`
    frames.
- Common:
  - Reuse FileCheck var `[[ADDR]]` instead of redefining.
  - Make wide string literals `const wchar_t*` to silence
    `-Wwritable-strings`.
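
The two source-level changes (a `const wchar_t*` literal and a flushed stderr marker) might look roughly like the sketch below. `copyAndMark` is a hypothetical helper; the real tests go on to perform an out-of-bounds wide-string call, which is deliberately left out here.

```cpp
#include <cstddef>
#include <cstdio>
#include <cwchar>

// Hypothetical helper mirroring the test setup up to the pre-crash marker.
// Returns the copied length, or 0 if the destination is too small.
inline std::size_t copyAndMark(wchar_t *Dst, std::size_t DstLen) {
  const wchar_t *Src = L"Hello"; // const pointer avoids -Wwritable-strings
  if (std::wcslen(Src) + 1 > DstLen)
    return 0;
  std::wcscpy(Dst, Src);
  // Marker goes to stderr and is flushed so it cannot be reordered
  // relative to a sanitizer report that is also written to stderr.
  std::fprintf(stderr, "Good so far.\n");
  std::fflush(stderr);
  return std::wcslen(Dst);
}
```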

### Risk
- NFC: test-only; no change to runtime behavior.

### References
- Follow-up to PR #160493.
- Chromium: [448631142](https://g-issues.chromium.org/issues/448631142)
(Darwin failures).
- Android Buildbot:
[186/12821](https://lab.llvm.org/buildbot/#/builders/186/builds/12821).

Signed-off-by: Yixuan Cao <[email protected]>

jsji commented Oct 17, 2025

@wenju-he Can you help take a look at the libclc failure after 7f36611?


cd /localdisk2/jinsongj/llvmspirpd/llvm/build/tools/libclc && /rdrive/ics/itools/pkgtools/cmake/v_3_29_2/efi2_rhxx/bin/cmake -E make_directory /localdisk2/jinsongj/llvmspirpd/llvm/build/./lib/clc && /localdisk2/jinsongj/llvmspirpd/llvm/build/bin/libclc-remangler -o /localdisk2/jinsongj/llvmspirpd/llvm/build/./lib/clc/remangled-l64-signed_char.libspirv-native_cpu.bc --triple=native_cpu --long-width=l64 --char-signedness=signed --input-ir=/localdisk2/jinsongj/llvmspirpd/llvm/build/./lib/clc/libspirv-native_cpu.bc /localdisk2/jinsongj/llvmspirpd/llvm/build/./lib/clc/libclc_dummy_in.cc
Unable to demangle name: _Z18__clc_sincos_piby4ddPdS0_
Unable to demangle name: _Z15__clc_tan_piby4ddPdS0_
Failed to remangle all mangled functions in module.

Looks like the native_cpu target support might have some problem?

…as static

When INLINE functions fail to be inlined, they are deleted by
EliminateAvailableExternallyPass because they have available_externally
linkage. Therefore, they become unresolved in libspirv-native_cpu.bc.

Mark them as static to fix the linkage. An alternative fix is to move the
function definitions into clc/lib/generic/math/clc_sincos_helpers.cl.
@wenju-he
@jsji it is a bug exposed by native_cpu. It is fixed in ce2bd60
I'll move function definitions into clc/lib/generic/math/clc_sincos_helpers.cl later in upstream.

jsji commented Oct 18, 2025

Thanks @wenju-he !

jsji added 2 commits October 18, 2025 10:05
2740e4b added symlinks unconditionally
for non-Windows.

All the sycl-in-tree tests are failing due to check_build_features
failures.
65d730b added IMG_SPIRV with value 6, so we need to update
IMG_SYCLBIN to 7 instead.