Skip to content

Commit fe1c56d

Browse files
committed
Merge branch 'sycl' into issue-17527
2 parents e6d4ab3 + 6aa808f commit fe1c56d

35 files changed

+284
-361
lines changed
Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,12 @@
1+
name: Benchmarks
2+
3+
# This workflow is a WIP: this workflow file acts as a placeholder.
4+
5+
on: [ workflow_dispatch ]
6+
7+
jobs:
8+
do-nothing:
9+
runs-on: ubuntu-latest
10+
steps:
11+
- run: echo 'This workflow is a WIP.'
12+

.github/workflows/ur-benchmarks-reusable.yml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,8 @@
11
name: Benchmarks Reusable
22

33
# This workflow is a WIP: This workflow file acts as a placeholder.
4+
#
5+
# This workflow is set to be merged into benchmark.yml
46

57
on: [ workflow_call ]
68

.github/workflows/ur-benchmarks.yml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,8 @@
11
name: Benchmarks
22

33
# This workflow is a WIP: this workflow file acts as a placeholder.
4+
#
5+
# This workflow is set to be merged into benchmark.yml
46

57
on: [ workflow_dispatch ]
68

sycl/doc/GetStartedGuide.md

Lines changed: 8 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -39,15 +39,14 @@ and a wide range of compute accelerators such as GPU and FPGA.
3939

4040
## Prerequisites
4141

42-
* `git` - [Download](https://git-scm.com/downloads)
43-
* `cmake` version 3.20 or later - [Download](http://www.cmake.org/download/)
44-
* `python` - [Download](https://www.python.org/downloads/)
45-
* `ninja` -
46-
[Download](https://github.com/ninja-build/ninja/wiki/Pre-built-Ninja-packages)
47-
* `hwloc` version 2.3 or later (Linux only)
48-
* libhwloc-dev or hwloc-devel package on linux
49-
* C++ compiler
50-
* See LLVM's [host compiler toolchain requirements](https://github.com/intel/llvm/blob/sycl/llvm/docs/GettingStarted.rst#host-c-toolchain-both-compiler-and-standard-library)
42+
| Software | Version |
43+
| --- | --- |
44+
| [Git](https://git-scm.com/downloads) | |
45+
| [CMake](http://www.cmake.org/download/) | [See LLVM](https://github.com/intel/llvm/blob/sycl/llvm/docs/GettingStarted.rst#software) |
46+
| [Python](https://www.python.org/downloads/) | [See LLVM](https://github.com/intel/llvm/blob/sycl/llvm/docs/GettingStarted.rst#software) |
47+
| [Ninja](https://github.com/ninja-build/ninja/wiki/Pre-built-Ninja-packages) | |
48+
| `hwloc` | >= 2.3 (Linux only, `libhwloc-dev` or `hwloc-devel`) |
49+
| C++ compiler | [See LLVM](https://github.com/intel/llvm/blob/sycl/llvm/docs/GettingStarted.rst#host-c-toolchain-both-compiler-and-standard-library) |
5150

5251
Alternatively, you can create a Docker image that has everything you need for
5352
building pre-installed using the [Ubuntu 24.04 build Dockerfile](https://github.com/intel/llvm/blob/sycl/devops/containers/ubuntu2404_build.Dockerfile).

sycl/doc/design/SYCL2020-SpecializationConstants.md

Lines changed: 15 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -80,7 +80,7 @@ as:
8080
[sycl-2020-spec-constant-glossary]: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#specialization-constant
8181
[sycl-2020-glossary]: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#glossary
8282

83-
And implementation is based on [SPIR-V speficiation][spirv-spec] support
83+
And implementation is based on [SPIR-V specification][spirv-spec] support
8484
for [Specialization][spirv-specialization]. However, the specification also
8585
states the following:
8686

@@ -172,7 +172,7 @@ Based on those limitations, the following mapping design is proposed:
172172
```
173173
namespace detail {
174174
// assuming user defined the following specialization_id:
175-
// constexpr specialiation_id<int> int_const;
175+
// constexpr specialization_id<int> int_const;
176176
// class Wrapper {
177177
// public:
178178
// static constexpr specialization_id<float> float_const;
@@ -341,7 +341,7 @@ used to identify the specialization constants at SPIR-V level.
341341
As noted above one symbolic ID can have several numeric IDs assigned to it -
342342
such 1:N mapping comes from the fact that at SPIR-V level, composite
343343
specialization constants don't have dedicated IDs and they are being identified
344-
and specialized through their scalar leafs and corresponding numeric IDs.
344+
and specialized through their scalar leaves and corresponding numeric IDs.
345345

346346
For example, the following code:
347347
```
@@ -375,7 +375,7 @@ unique_symbolic_id_for_id_A -> { 1, 2, 3 }
375375

376376
As it is shown in the example above, if a composite specialization constant
377377
contains another composite within it, that nested composite is also being
378-
"flattened" and its leafs are considered to be leafs of the parent
378+
"flattened" and its leaves are considered to be leaves of the parent
379379
specialization constants. This done by depth-first search through the composite
380380
elements.
381381

@@ -509,8 +509,8 @@ constant in that buffer:
509509
```
510510
[
511511
0, // for id_int, the first constant is at the beginning of the buffer
512-
4, // sizeof(int) == 4, the second constant is located right after the fisrt one
513-
16, // sizeof(int) + sizezof(A) == 4, the same approach for the third constant
512+
4, // sizeof(int) == 4, the second constant is located right after the first one
513+
16, // sizeof(int) + sizeof(A) == 4, the same approach for the third constant
514514
]
515515
```
516516

@@ -661,9 +661,9 @@ While transforming SYCL kernel function into an OpenCL kernel, DPC++ FE should
661661
- Communicate to DPC++ RT which kernel argument should be used for passing
662662
a buffer with specialization constant values when they are emulated.
663663

664-
DPC++ FE provides implementation of `__builtin_sycl_unique_id` built-in function and
665-
it also populates special integration footer with the content required by DPC++
666-
RT for access to right device image properties describing specialization
664+
DPC++ FE provides implementation of `__builtin_sycl_unique_id` built-in function
665+
and it also populates special integration footer with the content required by
666+
DPC++ RT for access to right device image properties describing specialization
667667
constants.
668668

669669
#### SYCL Kernel function transformations
@@ -763,7 +763,7 @@ struct A {
763763
};
764764
765765
constexpr specialization_id<int> id_int;
766-
struct Wraper {
766+
struct Wrapper {
767767
public:
768768
static constexpr specialization_id<A> id_A;
769769
};
@@ -839,10 +839,10 @@ constexpr sycl::specialization_id<int> same_name{1};
839839
840840
namespace {
841841
constexpr sycl::specialization_id<int> same_name{2}:
842-
/* application code that referenes ::(unnamed)::same_name */
842+
/* application code that references ::(unnamed)::same_name */
843843
namespace {
844844
constexpr sycl::specialization_id<int> same_name{3}:
845-
/* application code that referenes ::(unnamed)::(unnamed)::same_name */
845+
/* application code that references ::(unnamed)::(unnamed)::same_name */
846846
}
847847
}
848848
@@ -899,7 +899,7 @@ namespace {
899899
900900
namespace __sycl_detail {
901901
// Sometimes we need a 'shim', which points to another 'shim' in order to
902-
// "extract" a variable from an anonymous namespace unambiguosly
902+
// "extract" a variable from an anonymous namespace unambiguously
903903
static constexpr decltype(__sycl_detail::__shim_1()) &__shim_2() {
904904
// still address of ::(unnamed)::(unnamed)::same_name;
905905
return __sycl_detail::__shim_1();
@@ -972,7 +972,7 @@ address of the specialization constant provided by user and `offset` field of
972972
the descriptor as `(char*)(SpecConstantValuesMap[SymbolicID]) + offset`.
973973

974974
That calculation is required, because at SPIR-V level composite
975-
specialization constants are respresented by several specialization constants
975+
specialization constants are represented by several specialization constants
976976
for each element of a composite, whilst on a SYCL level, the whole composite
977977
is passed by user as a single blob of data. `offset` field from properties is
978978
used to specify which exact piece of that blob should be extracted to perform
@@ -1053,7 +1053,7 @@ the translator will generate `OpSpecConstant` SPIR-V instructions with proper
10531053
OpDecorate %A.float SpecId 44 ; ID of the 2nd member
10541054
%A.i32 = OpSpecConstant %int.type 0 ; 1st member with default value
10551055
%A.float = OpSpecConstant %float.type 0.0 ; 2nd member with default value
1056-
%struct = OpSpecConstantComposite %struct.type %A.i32 %A.float ; Composite doens't need IDs or default value
1056+
%struct = OpSpecConstantComposite %struct.type %A.i32 %A.float ; Composite doesn't need IDs or default value
10571057
%1 = OpTypeFunction %int
10581058
10591059
%get = OpFunction %int None %1

sycl/include/sycl/detail/spinlock.hpp

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -26,6 +26,8 @@ namespace detail {
2626
/// std::mutex, that doesn't provide such guarantees).
2727
class SpinLock {
2828
public:
29+
bool try_lock() { return !MLock.test_and_set(std::memory_order_acquire); }
30+
2931
void lock() {
3032
while (MLock.test_and_set(std::memory_order_acquire))
3133
std::this_thread::yield();

sycl/source/detail/kernel_program_cache.hpp

Lines changed: 9 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -15,6 +15,7 @@
1515
#include <sycl/detail/common.hpp>
1616
#include <sycl/detail/locked.hpp>
1717
#include <sycl/detail/os_util.hpp>
18+
#include <sycl/detail/spinlock.hpp>
1819
#include <sycl/detail/ur.hpp>
1920
#include <sycl/detail/util.hpp>
2021

@@ -421,7 +422,7 @@ class KernelProgramCache {
421422

422423
template <typename KeyT>
423424
KernelFastCacheValT tryToGetKernelFast(KeyT &&CacheKey) {
424-
std::unique_lock<std::mutex> Lock(MKernelFastCacheMutex);
425+
KernelFastCacheReadLockT Lock(MKernelFastCacheMutex);
425426
auto It = MKernelFastCache.find(CacheKey);
426427
if (It != MKernelFastCache.end()) {
427428
traceKernel("Kernel fetched.", CacheKey.second, true);
@@ -445,7 +446,7 @@ class KernelProgramCache {
445446
return;
446447
}
447448
// Save reference between the program and the fast cache key.
448-
std::unique_lock<std::mutex> Lock(MKernelFastCacheMutex);
449+
KernelFastCacheWriteLockT Lock(MKernelFastCacheMutex);
449450
MProgramToKernelFastCacheKeyMap[Program].emplace_back(CacheKey);
450451

451452
// if no insertion took place, thus some other thread has already inserted
@@ -483,7 +484,7 @@ class KernelProgramCache {
483484

484485
{
485486
// Remove corresponding entries from KernelFastCache.
486-
std::unique_lock<std::mutex> Lock(MKernelFastCacheMutex);
487+
KernelFastCacheWriteLockT Lock(MKernelFastCacheMutex);
487488
if (auto FastCacheKeyItr =
488489
MProgramToKernelFastCacheKeyMap.find(NativePrg);
489490
FastCacheKeyItr != MProgramToKernelFastCacheKeyMap.end()) {
@@ -630,7 +631,7 @@ class KernelProgramCache {
630631
std::lock_guard<std::mutex> EvictionListLock(MProgramEvictionListMutex);
631632
std::lock_guard<std::mutex> L1(MProgramCacheMutex);
632633
std::lock_guard<std::mutex> L2(MKernelsPerProgramCacheMutex);
633-
std::lock_guard<std::mutex> L3(MKernelFastCacheMutex);
634+
KernelFastCacheWriteLockT L3(MKernelFastCacheMutex);
634635
MCachedPrograms = ProgramCache{};
635636
MKernelsPerProgramCache = KernelCacheT{};
636637
MKernelFastCache = KernelFastCacheT{};
@@ -758,7 +759,10 @@ class KernelProgramCache {
758759
KernelCacheT MKernelsPerProgramCache;
759760
ContextPtr MParentContext;
760761

761-
std::mutex MKernelFastCacheMutex;
762+
using KernelFastCacheMutexT = SpinLock;
763+
using KernelFastCacheReadLockT = std::lock_guard<KernelFastCacheMutexT>;
764+
using KernelFastCacheWriteLockT = std::lock_guard<KernelFastCacheMutexT>;
765+
KernelFastCacheMutexT MKernelFastCacheMutex;
762766
KernelFastCacheT MKernelFastCache;
763767

764768
// Map between fast kernel cache keys and program handle.

unified-runtime/source/adapters/cuda/command_buffer.cpp

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -68,10 +68,10 @@ ur_exp_command_buffer_handle_t_::ur_exp_command_buffer_handle_t_(
6868
/// all the memory objects allocated for command_buffer managment
6969
ur_exp_command_buffer_handle_t_::~ur_exp_command_buffer_handle_t_() {
7070
// Release the memory allocated to the Context stored in the command_buffer
71-
UR_TRACE(urContextRelease(Context));
71+
UR_CALL_NOCHECK(urContextRelease(Context));
7272

7373
// Release the device
74-
UR_TRACE(urDeviceRelease(Device));
74+
UR_CALL_NOCHECK(urDeviceRelease(Device));
7575
}
7676

7777
// This may throw so it must be called from within a try...catch

unified-runtime/source/adapters/cuda/command_buffer.hpp

Lines changed: 0 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -18,23 +18,6 @@
1818
#include <memory>
1919
#include <unordered_set>
2020

21-
// Trace an internal UR call
22-
#define UR_TRACE(Call) \
23-
{ \
24-
ur_result_t Result; \
25-
UR_CALL(Call, Result); \
26-
}
27-
28-
// Trace an internal UR call and return the result to the user.
29-
#define UR_CALL(Call, Result) \
30-
{ \
31-
if (PrintTrace) \
32-
logger::always("UR ---> {}", #Call); \
33-
Result = (Call); \
34-
if (PrintTrace) \
35-
logger::always("UR <--- {}({})", #Call, Result); \
36-
}
37-
3821
enum class CommandType {
3922
Kernel,
4023
USMMemcpy,

unified-runtime/source/adapters/cuda/common.cpp

Lines changed: 0 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -92,11 +92,6 @@ std::string getCudaVersionString() {
9292
return stream.str();
9393
}
9494

95-
void detail::ur::die(const char *Message) {
96-
logger::always("ur_die:{}", Message);
97-
std::terminate();
98-
}
99-
10095
void detail::ur::assertion(bool Condition, const char *Message) {
10196
if (!Condition)
10297
die(Message);

0 commit comments

Comments
 (0)