Skip to content

Commit c5f2ebb

Browse files
author
Victor Lomuller
committed
address comments and disable dynamic tests on gen12 (not locally reproducible)
1 parent 0963146 commit c5f2ebb

File tree

14 files changed

+108
-117
lines changed

14 files changed

+108
-117
lines changed

sycl/doc/extensions/experimental/sycl_ext_oneapi_work_group_scratch_memory.asciidoc

Lines changed: 22 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -36,7 +36,7 @@ https://github.com/intel/llvm/issues
3636

3737
== Dependencies
3838

39-
This extension is written against the SYCL 2020 revision 8 specification. All
39+
This extension is written against the SYCL 2020 revision 9 specification. All
4040
references below to the "core SYCL specification" or to section numbers in the
4141
SYCL specification refer to that revision.
4242

@@ -49,11 +49,12 @@ The following extensions are required:
4949

5050
== Status
5151

52-
This is an experimental extension specification, intended to gather community
53-
feedback. Interfaces defined in this specification may not be implemented yet
54-
or may be in a preliminary state. The specification itself may also change in
55-
incompatible ways before it is finalized. *Shipping software products should
56-
not rely on APIs defined in this specification.*
52+
This is an experimental extension specification, intended to provide early
53+
access to features and gather community feedback. Interfaces defined in this
54+
specification are implemented in {dpcpp}, but they are not finalized and may
55+
change incompatibly in future versions of {dpcpp} without prior notice.
56+
*Shipping software products should not rely on APIs defined in this
57+
specification.*
5758

5859

5960
== Overview
@@ -71,7 +72,7 @@ The behavior is similar to the usage of unbounded array with the CUDA `+__shared
7172

7273
This extension provides a feature-test macro as described in the core SYCL
7374
specification. An implementation supporting this extension must predefine the
74-
macro `SYCL_EXT_ONEAPI_WORK_GROUP_DYNAMIC` to one of the values defined in the
75+
macro `SYCL_EXT_ONEAPI_WORK_GROUP_SCRATCH_MEMORY` to one of the values defined in the
7576
table below. Applications can test for the existence of this macro to
7677
determine if the implementation supports this feature, or applications can test
7778
the macro's value to determine which of the extension's features the
@@ -95,9 +96,10 @@ to a dynamically allocated buffer in the device local memory.
9596

9697
[source,c++]
9798
----
98-
void* get_work_group_scratch_memory()
99+
namespace sycl::ext::oneapi::experimental {
100+
void* get_work_group_scratch_memory();
101+
}
99102
----
100-
_Constraints_: `T` must be trivially constructible and trivially destructible.
101103

102104
_Returns_: A pointer to a dynamically allocated buffer
103105
in the device local memory.
@@ -141,29 +143,27 @@ device local memory required by the kernel in bytes.
141143

142144
=== Total allocation check
143145

144-
If the total amount of device local memory requested (i.e., the sum of
145-
all memory requested by `local_accessor`, `group_local_memory`,
146-
`group_local_memory_for_overwrite`, `work_group_static` and `work_group_scratch_size`) exceeds a device's
147-
local memory capacity (as reported by `local_mem_size`) then the implementation
148-
must throw a synchronous `exception` with the `errc::memory_allocation` error
149-
code from the kernel invocation command (e.g. `parallel_for`).
150-
151-
==== Usage examples
146+
If the total amount of device local memory requested exceeds a device's
147+
local memory capacity as reported by `info::device::local_mem_size`
148+
then the implementation must throw a synchronous exception with the
149+
`errc::memory_allocation` error code from the kernel invocation command
150+
(e.g. `parallel_for`). This check must take all APIs that allocation device
151+
local memory into account, whether via the `work_group_scratch_size` property
152+
or other APIs such as `local_accessor`.
152153

153-
===== Allocations with size unknown at compile-time
154+
=== Example
154155

155156
[source,c++]
156157
----
157-
using namespace syclex = sycl::ext::oneapi::experimental;
158-
158+
namespace syclex = sycl::ext::oneapi::experimental;
159159
160160
...
161161
162162
q.parallel_for(sycl::nd_range<1>{N, M},
163163
syclex::properties{syclex::work_group_scratch_size(M * sizeof(int))},
164164
[=](sycl::nd_item<1> it) {
165-
auto ptr = get_work_group_scratch_memory();
166-
auto ptr2 = get_work_group_scratch_memory();
165+
auto ptr = syclex::get_work_group_scratch_memory();
166+
auto ptr2 = syclex::get_work_group_scratch_memory();
167167
});
168168
----
169169

sycl/doc/extensions/experimental/sycl_ext_oneapi_work_group_static.asciidoc

Lines changed: 29 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -20,7 +20,7 @@
2020
== Notice
2121

2222
[%hardbreaks]
23-
Copyright (C) 2024 Intel Corporation. All rights reserved.
23+
Copyright (C) 2023 Intel Corporation. All rights reserved.
2424

2525
Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks
2626
of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by
@@ -36,23 +36,23 @@ https://github.com/intel/llvm/issues
3636

3737
== Dependencies
3838

39-
This extension is written against the SYCL 2020 revision 8 specification. All
39+
This extension is written against the SYCL 2020 revision 9 specification. All
4040
references below to the "core SYCL specification" or to section numbers in the
4141
SYCL specification refer to that revision.
4242

4343

4444
== Status
4545

46-
This is an experimental extension specification, intended to gather community
47-
feedback. Interfaces defined in this specification may not be implemented yet
48-
or may be in a preliminary state. The specification itself may also change in
49-
incompatible ways before it is finalized. *Shipping software products should
50-
not rely on APIs defined in this specification.*
51-
46+
This is an experimental extension specification, intended to provide early
47+
access to features and gather community feedback. Interfaces defined in this
48+
specification are implemented in {dpcpp}, but they are not finalized and may
49+
change incompatibly in future versions of {dpcpp} without prior notice.
50+
*Shipping software products should not rely on APIs defined in this
51+
specification.*
5252

5353
== Overview
5454

55-
This extension adds a ways to allocate device local memory, without passing a
55+
This extension adds a way to allocate device local memory, without passing a
5656
kernel argument: `work_group_static`.
5757
Device local memory is memory that is shared by all work-items in a work-group.
5858
The behavior is similar to the CUDA `+__shared__+` keyword, and the extension
@@ -111,15 +111,12 @@ public:
111111
const work_group_static& operator=(const T& value) const noexcept;
112112
113113
T* operator&() const noexcept;
114-
115-
private:
116-
T storage;
117114
};
118115
119116
} // namespace sycl::ext::oneapi::experimental
120117
----
121118

122-
`T` must be trivially constructible and trivially destructible.
119+
`T` must be a cv-unqualified trivially constructible and trivially destructible.
123120

124121
The storage for the object is allocated in device local memory before
125122
calling the user's kernel lambda, and deallocated when all work-items
@@ -164,7 +161,7 @@ work_group_static& operator=(const T& value) noexcept;
164161
----
165162
_Constraints_: Available only if `std::is_array_v<T>` is false.
166163

167-
_Effects_: Replaces the value referenced by `*ptr` with `value`.
164+
_Effects_: Copy `value` into the storage of the `work_group_static` instance.
168165

169166
_Returns_: A reference to this instance of `work_group_static`.
170167

@@ -173,7 +170,7 @@ _Returns_: A reference to this instance of `work_group_static`.
173170
T* operator&() noexcept;
174171
----
175172
_Returns_: A pointer to the device local memory associated with this
176-
instance of `work_group_static` (i.e., `ptr`).
173+
instance of `work_group_static`.
177174

178175
==== Interaction with common address space deduction rules
179176

@@ -183,33 +180,39 @@ the local address space.
183180

184181
=== Total allocation check
185182

186-
If the total amount of device local memory requested (i.e., the sum of
187-
all memory requested by `local_accessor`, `group_local_memory`,
188-
`group_local_memory_for_overwrite`, `work_group_static` and `work_group_scratch_size`) exceeds a device's
189-
local memory capacity (as reported by `local_mem_size`) then the implementation
190-
must throw a synchronous `exception` with the `errc::memory_allocation` error
191-
code from the kernel invocation command (e.g. `parallel_for`).
192-
193-
==== Usage examples
183+
If the total amount of device local memory requested exceeds a device's
184+
local memory capacity as reported by `info::device::local_mem_size`
185+
then the implementation must throw a synchronous exception with the
186+
`errc::memory_allocation` error code from the kernel invocation command
187+
(e.g. `parallel_for`). This check must take all APIs that allocation device
188+
local memory into account, whether via the `work_group_scratch_size` property
189+
or other APIs such as `local_accessor`.
194190

195-
===== Allocations with size known at compile-time
191+
=== Example
196192

197193
[source,c++]
198194
----
199-
using namespace syclex = sycl::ext::oneapi::experimental;
195+
namespace syclex = sycl::ext::oneapi::experimental;
200196
201197
/* optional: static */ syclex::work_group_static<int> program_scope_scalar;
202198
/* optional: static */ syclex::work_group_static<int[16]> program_scope_array;
203199
200+
class ClassScope {
201+
static syclex::work_group_static<int> class_scope_scalar;
202+
};
203+
204+
syclex::work_group_static<int> ClassScope::class_scope_scalar;
205+
204206
void foo() {
205207
/* optional: static */ syclex::work_group_static<int> function_scope_scalar;
206208
function_scope_scalar = 1; // assignment via overloaded = operator
207209
function_scope_scalar += 2; // += operator via implicit conversion to int&
210+
class_scope_scalar = 3;
208211
int* ptr = &function_scope_scalar; // conversion to pointer via overloaded & operator
209212
}
210213
211214
void bar() {
212-
/* optional: static */ sylex::work_group_static<int[64]> function_scope_array;
215+
/* optional: static */ syclex::work_group_static<int[64]> function_scope_array;
213216
function_scope_array[0] = 1; // [] operator via implicit conversion to int(&)[64]
214217
int* ptr = function_scope_array; // conversion to pointer via implicit conversion to int(&)[64]
215218
}

sycl/include/sycl/ext/oneapi/experimental/graph.hpp

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -59,7 +59,8 @@ enum class UnsupportedGraphFeatures {
5959
sycl_ext_oneapi_device_global = 6,
6060
sycl_ext_oneapi_bindless_images = 7,
6161
sycl_ext_oneapi_experimental_cuda_cluster_launch = 8,
62-
sycl_ext_codeplay_enqueue_native_command = 9
62+
sycl_ext_codeplay_enqueue_native_command = 9,
63+
sycl_ext_oneapi_work_group_scratch_memory = 10
6364
};
6465

6566
inline const char *

sycl/include/sycl/ext/oneapi/work_group_scratch_memory.hpp

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -23,7 +23,7 @@ inline void* get_work_group_scratch_memory() {
2323
#else
2424
throw sycl::exception(
2525
sycl::errc::feature_not_supported,
26-
"sycl_ext_oneapi_work_group_dynamic extension is not supported on host");
26+
"sycl_ext_oneapi_work_scratch_memory extension is not supported on host");
2727
#endif
2828
}
2929

sycl/include/sycl/ext/oneapi/work_group_static.hpp

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -10,7 +10,7 @@
1010
#include <sycl/detail/defines_elementary.hpp> // for __SYCL_ALWAYS_INLINE
1111
#include <sycl/exception.hpp> // for exception
1212

13-
#include <type_traits> // for enable_if_t
13+
#include <type_traits> // for enable_if_t, is_trivially_destructible_v ...
1414

1515
namespace sycl {
1616
inline namespace _V1 {
@@ -34,6 +34,8 @@ template <typename T> class __SYCL_WG_SCOPE work_group_static final {
3434
std::is_trivially_destructible_v<T> &&
3535
std::is_trivially_constructible_v<T>,
3636
"Can only be used with trivially constructible and destructible types");
37+
static_assert(!std::is_const_v<T> || !std::is_volatile_v<T>,
38+
"Can only be used with non const and non volatile types");
3739
__SYCL_ALWAYS_INLINE work_group_static() = default;
3840
work_group_static(const work_group_static &) = delete;
3941
work_group_static &operator=(const work_group_static &) = delete;

sycl/source/feature_test.hpp.in

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -109,7 +109,7 @@ inline namespace _V1 {
109109
#define SYCL_EXT_ONEAPI_PROFILING_TAG 1
110110
#define SYCL_EXT_ONEAPI_ENQUEUE_NATIVE_COMMAND 1
111111
#define SYCL_EXT_ONEAPI_GET_KERNEL_INFO 1
112-
#define SYCL_EXT_ONEAPI_WORK_GROUP_DYNAMIC 1
112+
#define SYCL_EXT_ONEAPI_WORK_GROUP_SCRATCH_MEMORY 1
113113
#define SYCL_EXT_ONEAPI_WORK_GROUP_STATIC 1
114114
// In progress yet
115115
#define SYCL_EXT_ONEAPI_ATOMIC16 0

sycl/source/handler.cpp

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1951,6 +1951,8 @@ void handler::setKernelClusterLaunch(sycl::range<3> ClusterSize, int Dims) {
19511951
}
19521952

19531953
void handler::setKernelWorkGroupMem(size_t Size) {
1954+
throwIfGraphAssociated<syclex::detail::UnsupportedGraphFeatures::
1955+
sycl_ext_oneapi_work_group_scratch_memory>();
19541956
impl->MKernelWorkGroupMemorySize = Size;
19551957
}
19561958

sycl/test-e2e/Graph/Inputs/work_group_dynamic_memory.cpp

Lines changed: 0 additions & 65 deletions
This file was deleted.

sycl/test-e2e/WorkGroupMemory/Dynamic/copy_dynamic_size.cpp

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,8 @@
22
// RUN: %{run} %t.out
33
//
44

5+
// UNSUPPORTED: gpu-intel-gen12
6+
57
#include <sycl/detail/core.hpp>
68
#include <sycl/ext/oneapi/work_group_scratch_memory.hpp>
79
#include <sycl/group_barrier.hpp>

sycl/test-e2e/WorkGroupMemory/Dynamic/dynamic_alloc_local_accessor.cpp

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,8 @@
22
// RUN: %{run} %t.out
33
//
44

5+
// UNSUPPORTED: gpu-intel-gen12
6+
57
// Test work_group_dynamic extension with allocation size specified at runtime
68
// and an additional local accessor.
79

0 commit comments

Comments
 (0)