@@ -64,7 +64,7 @@ Device local memory is memory that is shared by all work-items in a work-group.
 The behavior is similar to the CUDA `+__shared__+` keyword, and the extension
 draws some inspiration from the {cpp} `thread_local` keyword.
 
-`work_group_static` variables can be allocated at namespace or block scope,
+`work_group_static` can only be used to declare variables at namespace, block or class scope,
 lifting many of the restrictions in the existing
 link:../supported/sycl_ext_oneapi_local_memory.asciidoc[sycl_ext_oneapi_local_memory]
 extension. Note, however, that `work_group_static` variables currently place
@@ -106,7 +106,7 @@ an object into device local memory.
 namespace sycl::ext::oneapi::experimental {
 
 template <typename T>
-class work_group_static {
+class work_group_static final {
 public:
 
   work_group_static() = default;
@@ -121,8 +121,7 @@ public:
   T* operator&() const noexcept;
 
 private:
-  T* ptr; // exposition only
-
+  T storage;
 };
 
 } // namespace sycl::ext::oneapi::experimental
@@ -134,6 +133,9 @@ The storage for the object is allocated in device local memory before
 calling the user's kernel lambda, and deallocated when all work-items
 in the work-group have completed execution of the kernel.
 
+Objects of type `work_group_static` must only be declared at namespace, block, lambda or class scope.
+If the object is declared in class scope, it must be declared as a static data member.
+
 SYCL implementations conforming to the full feature set treat
 `work_group_static` similarly to the `thread_local` keyword, and when
 a `work_group_static` object is declared at block scope it behaves
@@ -150,18 +152,11 @@ multiple times, developers must take care to avoid race conditions (e.g., by
 calling `group_barrier` before and after using the memory).
 ====
 
-Change to SYCL 2020 section `5.9.2 Common address space deduction rules`:
-Namespace scope: if the variable is `work_group_static` object,
-then the variable is assigned to the local address space.
-Otherwise normal rules applies.
-
 SYCL 2020 requires that all global variables accessed by a device function are
 `const` or `constexpr`. This extension lifts that restriction for
 `work_group_static` variables.
 
-When `T` is a class type or bounded array, the size of the allocation is known
-at compile-time, and a SYCL implementation embeds the size of the allocation
-directly within a kernel. Each instance of `work_group_static<T>` is associated
+Each instance of `work_group_static<T>` is associated
 with a unique allocation in device local memory.
 
 [source,c++]
@@ -173,7 +168,7 @@ associated with this instance of `work_group_static`.
173168
 [source,c++]
 ----
-work_group_static<T> & operator=(const T& value) noexcept;
+work_group_static& operator=(const T& value) noexcept;
 ----
 _Constraints_: Available only if `std::is_array_v<T>` is false.
179174
@@ -188,6 +183,11 @@ T* operator&() noexcept;
 _Returns_: A pointer to the device local memory associated with this
 instance of `work_group_static` (i.e., `ptr`).
 
+==== Interaction with common address space deduction rules
+
+Objects of type `work_group_static` are assigned to
+the local address space.
+
 === `get_dynamic_work_group_memory` function
 
 The `get_dynamic_work_group_memory` function provides access
@@ -213,15 +213,7 @@ in device local memory, regardless of `T`. For example, two call declared
 as `get_dynamic_work_group_memory<int>` and
 `get_dynamic_work_group_memory<float>` will be associated with the same shared allocation.
 
-If the total amount of device local memory requested (i.e., the sum of
-all memory requested by `local_accessor`, `group_local_memory`,
-`group_local_memory_for_overwrite` and `work_group_static`) exceeds a device's
-local memory capacity (as reported by `local_mem_size`) then the implementation
-must throw a synchronous `exception` with the `errc::memory_allocation` error
-code from the kernel invocation command (e.g. `parallel_for`).
-
-
-==== Kernel properties
+=== Kernel properties
 
 The `work_group_static_size` property must be passed to a kernel to determine
 the run-time size of the device local memory allocation associated with
@@ -252,6 +244,14 @@ device local memory required by the kernel in bytes.
252244
 |===
 
+=== Total allocation check
+
+If the total amount of device local memory requested (i.e., the sum of
+all memory requested by `local_accessor`, `group_local_memory`,
+`group_local_memory_for_overwrite`, `work_group_static` and `work_group_static_size`) exceeds a device's
+local memory capacity (as reported by `local_mem_size`) then the implementation
+must throw a synchronous `exception` with the `errc::memory_allocation` error
+code from the kernel invocation command (e.g. `parallel_for`).
 
 ==== Usage examples
257257
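Review note: the "Total allocation check" paragraph added in the last hunk amounts to a sum followed by a strict comparison. The sketch below models that rule on the host in plain C++ so the arithmetic is easy to follow. It is only an illustration under stated assumptions: `LocalMemoryRequests`, `total_requested`, `check_local_memory` and `memory_allocation_error` are hypothetical names that appear in neither SYCL nor this extension, and a conforming implementation would instead throw `sycl::exception` with `errc::memory_allocation` from the kernel invocation command.

```cpp
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical host-side model of the local memory a kernel requests.
// None of these names come from SYCL or the extension.
struct LocalMemoryRequests {
    std::vector<std::size_t> local_accessor_bytes;      // one entry per local_accessor
    std::vector<std::size_t> group_local_memory_bytes;  // group_local_memory[_for_overwrite]
    std::vector<std::size_t> work_group_static_bytes;   // one entry per work_group_static<T>
    std::size_t work_group_static_size_bytes = 0;       // value of the kernel property
};

// Stand-in for sycl::exception carrying errc::memory_allocation.
struct memory_allocation_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Sum every source of device local memory named in the paragraph.
inline std::size_t total_requested(const LocalMemoryRequests& r) {
    auto sum = [](const std::vector<std::size_t>& v) {
        return std::accumulate(v.begin(), v.end(), std::size_t{0});
    };
    return sum(r.local_accessor_bytes) + sum(r.group_local_memory_bytes) +
           sum(r.work_group_static_bytes) + r.work_group_static_size_bytes;
}

// Throw only when the total strictly exceeds the capacity that the
// device would report through local_mem_size.
inline void check_local_memory(const LocalMemoryRequests& r,
                               std::size_t local_mem_size) {
    if (total_requested(r) > local_mem_size) {
        throw memory_allocation_error(
            "requested device local memory exceeds local_mem_size");
    }
}
```

Under this model a request that exactly matches `local_mem_size` passes, because the text says "exceeds"; whether the equality case deserves an explicit sentence in the spec may be worth confirming with the authors.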