@@ -213,42 +213,6 @@ These translate any kernel dimensions from one convention to the other. An
213213example of an equivalent SYCL call for a 3D kernel using `compat` is
214214`syclcompat::global_id::x() == get_global_id(2)`.
215215
216- ### ptr_to_int
217-
218- The following cuda backend specific function is introduced in order
219- to translate from the local memory pointers introduced above to `uint32_t` or
220- `size_t` variables that contain a byte address to the local
221- (local refers to `.shared` in nvptx) memory state space.
222-
223- ``` c++
224- namespace syclcompat {
225- template <typename T>
226- __syclcompat_inline__
227- std::enable_if_t<std::is_same_v<T, uint32_t> || std::is_same_v<T, size_t>,
228- T>
229- ptr_to_int(void *ptr)
230- } // syclcompat
231- ```
232-
233- These variables can be used in inline PTX instructions that take address
234- operands. Such inline PTX instructions are commonly used in optimized libraries.
235- A simplified example usage of the above functions is as follows:
236-
237- ``` c++
238- half *data = syclcompat::local_mem<half[NUM_ELEMENTS]>();
239- // ...
240- // ...
241- T addr =
242- syclcompat::ptr_to_int<T>(reinterpret_cast<char *>(data) + (id % 8) * 16);
243-
244- uint32_t fragment;
245- #if defined(__NVPTX__)
246- asm volatile("ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%0}, [%1];\n"
247- : "=r"(fragment)
248- : "r"(addr));
249- #endif
250- ```
251-
252216### launch<function>
253217
254218SYCLcompat provides a kernel `launch` interface which accepts a function that
@@ -983,6 +947,42 @@ public:
983947};
984948```
985949
950+ ### ptr_to_int
951+
952+ The following CUDA-backend-specific function translates local memory
953+ pointers to `uint32_t` or `size_t` variables that contain a byte address in
954+ the local memory state space (here, local refers to `.shared` in
955+ NVPTX).
956+
957+ ``` c++
958+ namespace syclcompat {
959+ template <typename T>
960+ __syclcompat_inline__
961+ std::enable_if_t<std::is_same_v<T, uint32_t> || std::is_same_v<T, size_t>,
962+ T>
963+ ptr_to_int(void *ptr);
964+ } // syclcompat
965+ ```
966+
967+ The resulting values can be used in inline PTX instructions that take address
968+ operands. Such instructions are commonly used in optimized libraries. A
969+ simplified usage example of the above function is as follows:
970+
971+ ``` c++
972+ half *data = syclcompat::local_mem<half[NUM_ELEMENTS]>();
973+ // ...
974+ // ...
975+ T addr =
976+ syclcompat::ptr_to_int<T>(reinterpret_cast<char *>(data) + (id % 8) * 16);
977+
978+ uint32_t fragment;
979+ #if defined(__NVPTX__)
980+ asm volatile("ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%0}, [%1];\n"
981+ : "=r"(fragment)
982+ : "r"(addr));
983+ #endif
984+ ```
985+
986986### Device Management
987987
988988Multiple SYCL functionalities are exposed through utility functions to manage