
@ianayl (Contributor) commented Apr 16, 2025

Change the benchmarking CI to use the shared-library, no-assertions build, in case assertions end up affecting performance.

@ianayl temporarily deployed to WindowsCILock April 16, 2025 21:30 with GitHub Actions
@ianayl temporarily deployed to WindowsCILock April 16, 2025 22:37 with GitHub Actions
@ianayl temporarily deployed to WindowsCILock April 16, 2025 22:50 with GitHub Actions
@ianayl marked this pull request as ready for review April 21, 2025 19:05
@ianayl requested a review from a team as a code owner April 21, 2025 19:05

  run-sycl-benchmarks:
-   needs: [ubuntu2204_build]
+   needs: [linux_shared_build]
Contributor

Have you done a perf comparison of static vs shared linking?

@ianayl (Contributor Author), Apr 21, 2025

Admittedly no, although I suppose we will find out in https://github.com/intel/llvm/actions/runs/14579704073 whether it's slower.

steffenlarsen and others added 21 commits May 26, 2025 06:14
This commit adds the bundle state as an argument to the -fsyclbin driver
option (defaulting to executable) and to the --syclbin clang-linker-wrapper
option (no default). This argument is propagated to the SYCLBIN files.

---------

Signed-off-by: Larsen, Steffen <[email protected]>
…of a function is annotated. (#18590)

Fixes #17591.

---------

Co-authored-by: aelovikov-intel <[email protected]>

Fixes the presentation of benchmark results on
https://oneapi-src.github.io/unified-runtime/performance/ so that the
same runs now have the same color across subsequent charts.

Signed-off-by: Mateusz P. Nowak <[email protected]>
…18645)

We would like to extend urDeviceSelectBinary downstream to allow for
device-specific binary targets. This commit refactors the current
handling to make this easier.
Additionally, L0-specific tests are added for urDeviceSelectBinary to
verify that the fallback logic works as expected.

Improve the readability of Historical Results chart titles and tooltips,
and of all charts' legends, by introducing a new Result object field,
`display_name`, for Compute Benchmarks.

Combined, the following changes enable adding guards around sections in
linker script files, whose syntax does not support conditional inclusion.
This is done by pre-processing the linker scripts as part of build
configuration.

- Clean up template `helper.py` and add typing hints
- Use dict for functions in loader script templates
  - The `get_loader_functions()` template helper now returns `list[dict]`
    instead of `list[str]`; this enables passing additional information
    into the loader script templates
- Add script to strip guarded lines from file
  - To pre-process files which don't support conditional inclusion of
    line blocks, such as linker scripts, we can use this script to remove
    lines which should not be included unless specified
- Hook up `strip-guarded-lines.py`
  - Actually use the `strip-guarded-lines.py` script when using
    `configure_file()` on linker scripts
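A minimal sketch of what a guarded-line stripper can look like, written in Python like `helper.py`. The guard markers (`# BEGIN <NAME>` / `# END <NAME>`), the function name `strip_guarded_lines`, and the nesting rules are all assumptions for illustration; the actual `strip-guarded-lines.py` may use a different marker syntax and interface.

```python
import re
from typing import Iterable, List, Set

# Hypothetical guard markers; the real strip-guarded-lines.py may differ.
BEGIN = re.compile(r"^\s*#\s*BEGIN\s+(\w+)\s*$")
END = re.compile(r"^\s*#\s*END\s+(\w+)\s*$")


def strip_guarded_lines(lines: Iterable[str], enabled: Set[str]) -> List[str]:
    """Return `lines` without the guarded blocks whose name is not enabled.

    Marker lines are always dropped. Guards may nest, and a line is kept
    only if every guard enclosing it is enabled.
    """
    out: List[str] = []
    open_guards: List[str] = []  # stack of currently open guard names
    for line in lines:
        begin = BEGIN.match(line)
        if begin:
            open_guards.append(begin.group(1))
            continue
        end = END.match(line)
        if end:
            if not open_guards or open_guards[-1] != end.group(1):
                raise ValueError(f"unbalanced guard: {line.strip()!r}")
            open_guards.pop()
            continue
        if all(name in enabled for name in open_guards):
            out.append(line)
    if open_guards:
        raise ValueError(f"unterminated guard(s): {open_guards}")
    return out
```

Because the markers are removed in every case, the output is a plain linker script that needs no conditional-inclusion support from the linker; only the build step has to know which guard names are enabled.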

An MSVC compiler update slightly changed the assertion message, so the
test needed a simple update.

Fixes #17116

On some systems `CUDA_Toolkit_ROOT` might be empty, even though CMake
properly finds CUDA. This can cause the search for
`generated_cuda_meta.h` to fail. In that case only a warning is emitted
when building `cuda_trace_collector`, but this can fail the `-Werror`
build. This patch ensures that if we can find `CUPTI`, we can also find
this file.

In practice we only build (and thus test) three libclc targets:
'nvptx64--nvidiacl', 'amdgcn--amdhsa' and 'native_cpu'. All other
upstream libclc targets are never built in our CI and would in fact fail
to build.

This commit rectifies this by selectively building libspirv only for
those three supported targets. More can be added in time if required.

There are still certain OpenCL libclc targets that can't be built with
this commit. The r600 target, for example, can't build because we
unconditionally enable the fp64 OpenCL extension across the board, but
the r600 target doesn't support that.

The clspv and clspv64 targets also fail to build due to SOURCES files
referencing missing files. This will be resolved in the next pulldown.
#18431)

To align with the comment in the file that specifies 32 storage
locations and 128 bits per warp.
Change file to opaque pointer mode.
Add more global variables for different sizes to resolve `Reducing
storage for small data types`.

The macro __CLC_FUNCTION is special and is used in CLC headers. Defining
it in an implementation file (nextafter.cl) for another purpose and then
including half_nextafter.inc, which pulls in the SPIR-V headers, results
in a macro redefinition warning.

Since the name isn't important and is local to this one file, this
commit just changes the macro definition to fix the issue.

- Implement the `dynamic_local_accessor` class with compiler support.
- Refactor the recently added `dynamic_work_group_memory` class to use
only one `impl` member variable. This brings it closer to the design of
other SYCL classes and avoids future ABI breaks.
- There are two ABI-breaking changes. However, both relate to the
`dynamic_work_group_memory` class, whose
[specification](#16712) has not been
merged yet and is not yet officially supported.

Some resource destruction is done in the destructor, but if we don't
manually clear the map, the destructor is called after the adapter is
released, which leads to a leak report and possibly undefined behavior
(trying to use the adapter after it has been released).

After #18437, the runtime library produces a warning about an unused
variable `AccTarget` in handler.cpp. This is because the variable is only
used in an `assert`, which is compiled out when assertions are disabled.
This commit removes the variable in favor of performing the conversion
inside the assert.

Signed-off-by: Larsen, Steffen <[email protected]>
#18627)

The proprietary Intel Compiler (ICX) uses a different installation
layout than clang. It puts tools into a bin/compiler subdirectory (so
they are not exposed on the PATH by default). Handle this by not assuming
that the compiler is in the same directory as the other LLVM tools.
Instead, ask the compiler for the path to `llvm-config`, using the
`-print-prog-name` option, and use its directory as the tools directory.
`llvm-config` was chosen because lit already assumes it is in the tools
directory; see, for example, [llvm/utils/lit/lit/llvm/config.py:285][1].

[1]: https://github.com/llvm/llvm-project/blob/9cac4bf485e64f7992f2c01bb9517f6379e58164/llvm/utils/lit/lit/llvm/config.py#L285
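A hedged Python sketch of that lookup, in the spirit of lit's configuration code. `find_llvm_tools_dir` and `tools_dir_from_prog_path` are hypothetical helpers, not lit APIs. The sketch also assumes that a compiler which cannot locate the tool prints the bare name back, so any non-absolute answer is treated as "not found".

```python
import os
import subprocess
from typing import Optional


def tools_dir_from_prog_path(prog_path: str) -> Optional[str]:
    """Turn the output of `-print-prog-name=llvm-config` into a directory.

    Assumption: if the compiler cannot find the tool it echoes the bare
    name back, so a non-absolute answer is treated as "not found".
    """
    prog_path = prog_path.strip()
    if not os.path.isabs(prog_path):
        return None
    return os.path.dirname(prog_path)


def find_llvm_tools_dir(compiler: str) -> Optional[str]:
    """Ask `compiler` where llvm-config lives instead of assuming the LLVM
    tools sit next to the compiler binary (ICX keeps them in bin/compiler)."""
    result = subprocess.run(
        [compiler, "-print-prog-name=llvm-config"],
        capture_output=True,
        text=True,
        check=True,
    )
    return tools_dir_from_prog_path(result.stdout)
```

Splitting the parsing out of the subprocess call keeps the path logic testable without a compiler installed.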
- Remove unused Context parameters
- Avoid unnecessary copy in `guessLocalWorkSize`
- Simplify the control flow in setKernelParams
- Move cached properties fetching code to constructors
- Query HIP for occupancy in `guessLocalWorkSize`
jzc and others added 12 commits June 11, 2025 13:25
…t function (#18824)

This PR adds new e2e tests for the free function kernels extension, based
on the test plan:
https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/FreeFunctionKernels/test-plan.md#perform-test-that-free-function-kernel-can-be-used-as-device-function-within-another-kernel

Extension spec:
https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc

The overall idea behind this test is to verify that a free function
kernel, even if marked with one of the properties (`nd_range_kernel` or
`single_task_kernel`), can still be used as a device or host function.
…8914)

A recent community change, llvm/llvm-project#124858, affected how we
render some non-type template parameters in the integration header. We
were generating malformed arguments of the form 'value-parameter-0-1',
which obviously led to compilation errors when including the header in
host compilations.

This resolves Coverity issue `440250`
(https://scan.coverity.com/projects/intel-llvm?tab=overview).

We know that `ND` cannot be `nullptr`, but the presence of the check makes
Coverity think that earlier accesses (within the `while` loop condition)
are potentially unsafe.
…ease on... (#18619)

Command Buffer, while it is still executing.
…set (#18869)

We are getting a warning on some scorecard; `read-all` doesn't warn.

---------

Signed-off-by: Sarnie, Nick <[email protected]>
#18787)

This PR decreases the number of TLS accesses in the `NestedCallsTracker`
and `tls_code_loc_t`. The idea is to cache a reference to the TLS
location, so that only a single lookup of the TLS location is needed.

Before this PR, one thread could add new events to the queue while
another removed events, with both modifying, and potentially corrupting,
the NativeCPU `queue::events`. This PR adds a mutex to the NativeCPU
queue handle to prevent this corruption.

Aims to at least fix: `SYCL/HostInteropTask/host-task-two-queues.cpp`
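The locking pattern can be sketched in Python, with `threading.Lock` standing in for the mutex added to the C++ queue handle. `QueueSketch` and its methods are hypothetical names. CPython's GIL already makes a single `list.append` or `list.remove` atomic, so this only illustrates the pattern; in C++, unsynchronized concurrent modification of a standard container is a data race.

```python
import threading
from typing import Any, List


class QueueSketch:
    """Hypothetical stand-in for a queue handle that tracks in-flight events.

    Every access to `_events` goes through `_lock`, so a thread appending
    events cannot interleave with a thread removing them.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._events: List[Any] = []

    def add_event(self, event: Any) -> None:
        with self._lock:
            self._events.append(event)

    def remove_event(self, event: Any) -> None:
        with self._lock:
            self._events.remove(event)

    def snapshot(self) -> List[Any]:
        # Return a copy so callers can iterate without holding the lock.
        with self._lock:
            return list(self._events)
```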
…not compressed (#18906)

**Problem**
When linking device images, we reject dependencies whose image format
does not match the parent image. However, consider the case where the
parent image is compressed while its dependencies are not (demonstrated
in the test case attached to this PR). In this case, we incorrectly
reject device images and thus cause a `No device image found for
external symbol` error.

**Solution**
If the formats of the main and dependent device images differ and one of
them is compressed, we decompress them and recheck the formats of the
decompressed device images.
One side effect of this solution is that we now have to decompress
device images even if we end up not using them, for example when the
formats of the decompressed main and dependent images differ.
Unfortunately, there's no way to find the format of a compressed device
image without first decompressing it.
However, I don't think this will incur a significant overhead because
(1) we decompress a device image only once and cache it for subsequent
use, and (2) we decompress only if the dependent device image has an
export symbol that the main device image wants (when finding which
images to link) and if it is compatible with the device.
@ianayl requested review from a team and bader as code owners June 11, 2025 19:45
@ianayl requested review from againull and frasercrmck June 11, 2025 19:45
@ianayl temporarily deployed to WindowsCILock June 11, 2025 19:45 with GitHub Actions
@ianayl closed this Jun 11, 2025
@ianayl (Contributor Author) commented Jun 11, 2025

Oh crap... sorry all for the ping

@ianayl deleted the ianayl/benchmark-ci-use-noassert branch June 11, 2025 19:55
@ianayl temporarily deployed to WindowsCILock June 11, 2025 20:31 with GitHub Actions