| 1 | +.. _sycl-vs-dpex: |
| 2 | + |
| 3 | + |
| 4 | +SYCL* and numba-dpex Feature Comparison |
| 5 | +####################################### |
| 6 | + |
| 7 | +The numba-dpex kernel API aims to provide SYCL*-like kernel programming |
| 8 | +features directly in Python. This page summarizes the SYCL* kernel |
| 9 | +programming features that are currently supported in numba-dpex's kernel |
| 10 | +API. |
| 11 | + |
| 12 | +Numba-dpex does not implement wrappers or analogues of SYCL's host-callable |
| 13 | +runtime API. Such features are provided by the ``dpctl`` package. |
| 14 | + |
| 15 | +.. list-table:: Ranges and index space identifiers |
| 16 | + :widths: 25 25 50 |
| 17 | + :header-rows: 1 |
| 18 | + |
| 19 | + * - SYCL* class |
| 20 | + - numba-dpex class |
| 21 | + - Notes |
| 22 | + * - ``range`` |
| 23 | + - :class:`numba_dpex.kernel_api.Range` |
| 24 | + - |
| 25 | + * - ``nd_range`` |
| 26 | + - :class:`numba_dpex.kernel_api.NdRange` |
| 27 | + - |
| 28 | + * - ``id`` |
| 29 | + - |
| 30 | + - Not directly supported. All functions that return an ``id`` object in |
| 31 | + SYCL have versions in numba-dpex that require the dimension to be |
| 32 | + explicitly specified. This is equivalent to ``get_id.get(dim)`` in SYCL. |
| 33 | + * - ``item`` |
| 34 | + - :class:`numba_dpex.kernel_api.Item` |
| 35 | + - |
| 36 | + * - ``nd_item`` |
| 37 | + - :class:`numba_dpex.kernel_api.NdItem` |
| 38 | + - |
| 39 | + * - ``h_item`` |
| 40 | + - |
| 41 | + - Not supported. There is no corresponding API in numba-dpex for |
| 42 | + ``group::parallel_for_work_item`` or ``parallel_for_work_group``. |
| 43 | + * - ``group`` |
| 44 | + - :class:`numba_dpex.kernel_api.Group` |
| 45 | + - |
| 46 | + * - ``sub_group`` |
| 47 | + - |
| 48 | + - Not supported |
| 49 | + |
| 50 | +.. list-table:: Reduction variables |
| 51 | + :widths: 25 25 50 |
| 52 | + :header-rows: 1 |
| 53 | + |
| 54 | + * - SYCL* class |
| 55 | + - numba-dpex class |
| 56 | + - Notes |
| 57 | + * - ``reduction`` |
| 58 | + - |
| 59 | + - Not supported |
| 60 | + * - ``reducer`` |
| 61 | + - |
| 62 | + - Not supported |
| 63 | + |
| 64 | +.. list-table:: Invoking kernels |
| 65 | + :widths: 25 25 50 |
| 66 | + :header-rows: 1 |
| 67 | + |
| 68 | + * - SYCL* function for invoking kernels |
| 69 | + - numba-dpex function for invoking kernels |
| 70 | + - Notes |
| 71 | + * - ``single_task`` |
| 72 | + - |
| 73 | + - Not supported |
| 74 | + * - ``parallel_for`` |
| 75 | + - :func:`numba_dpex.core.kernel_launcher.call_kernel` |
| 76 | + - |
| 77 | + |
| 78 | + |
| 79 | +.. list-table:: Synchronization and atomics |
| 80 | + :widths: 25 25 50 |
| 81 | + :header-rows: 1 |
| 82 | + |
| 83 | + * - SYCL* feature |
| 84 | + - numba-dpex feature |
| 85 | + - Notes |
| 86 | + * - Accessor classes |
| 87 | + - |
| 88 | + - Not supported. Instead, ``sycl::event`` objects, exposed in Python as |
| 89 | + ``dpctl.SyclEvent`` objects, can be used for asynchronous kernel |
| 90 | + invocation with the |
| 91 | + :func:`numba_dpex.core.kernel_launcher.call_kernel_async` function. |
| 92 | + * - ``group_barrier`` |
| 93 | + - :func:`numba_dpex.kernel_api.group_barrier` |
| 94 | + - ``group_barrier`` does not support synchronization across a sub-group. |
| 95 | + * - ``atomic_fence`` |
| 96 | + - :func:`numba_dpex.kernel_api.atomic_fence` |
| 97 | + - |
| 98 | + * - ``device_event`` |
| 99 | + - |
| 100 | + - Not supported |
| 101 | + * - ``atomic_ref`` |
| 102 | + - :class:`numba_dpex.kernel_api.AtomicRef` |
| 103 | + - Atomic references are supported for both global and local memory. |
| 104 | + |
| 105 | +.. list-table:: On-device memory allocation |
| 106 | + :widths: 25 25 50 |
| 107 | + :header-rows: 1 |
| 108 | + |
| 109 | + * - SYCL* class |
| 110 | + - numba-dpex class |
| 111 | + - Notes |
| 112 | + * - ``local_accessor`` |
| 113 | + - :class:`numba_dpex.kernel_api.LocalAccessor` |
| 114 | + - |
| 115 | + * - ``private_memory`` |
| 116 | + - |
| 117 | + - Not supported, as there is no corresponding API in numba-dpex for |
| 118 | + ``group::parallel_for_work_item`` or ``parallel_for_work_group``. |
| 119 | + |
| 120 | + Variables can be allocated in a work-item's private memory using |
| 121 | + :class:`numba_dpex.kernel_api.PrivateMemory`. |
| 122 | + * - Constant memory |
| 123 | + - |
| 124 | + - SYCL 2020 no longer defines a constant memory region in the device memory |
| 125 | + model specification and as such the feature is not implemented by |
| 126 | + numba-dpex. |
| 127 | + * - Global memory |
| 128 | + - |
| 129 | + - Global memory allocation is not handled by numba-dpex. Kernel |
| 130 | + arguments are expected to have been allocated in a device's global |
| 131 | + memory region using a USM allocator. Such allocators are provided by |
| 132 | + the ``dpctl`` package. |
| 133 | + |
| 134 | +.. list-table:: Group functions |
| 135 | + :widths: 25 25 50 |
| 136 | + :header-rows: 1 |
| 137 | + |
| 138 | + * - SYCL* group function |
| 139 | + - numba-dpex function |
| 140 | + - Notes |
| 141 | + * - ``group_broadcast`` |
| 142 | + - |
| 143 | + - Not supported |
| 144 | + * - ``group_barrier`` |
| 145 | + - :func:`numba_dpex.kernel_api.group_barrier` |
| 146 | + - ``group_barrier`` does not support synchronization across a sub-group. |
| 147 | + |
| 148 | +.. list-table:: Group algorithms |
| 149 | + :widths: 25 25 50 |
| 150 | + :header-rows: 1 |
| 151 | + |
| 152 | + * - SYCL* group algorithm |
| 153 | + - numba-dpex function |
| 154 | + - Notes |
| 155 | + * - ``joint_any_of`` |
| 156 | + - |
| 157 | + - Not supported |
| 158 | + * - ``joint_all_of`` |
| 159 | + - |
| 160 | + - Not supported |
| 161 | + * - ``joint_none_of`` |
| 162 | + - |
| 163 | + - Not supported |
| 164 | + * - ``any_of_group`` |
| 165 | + - |
| 166 | + - Not supported |
| 167 | + * - ``all_of_group`` |
| 168 | + - |
| 169 | + - Not supported |
| 170 | + * - ``none_of_group`` |
| 171 | + - |
| 172 | + - Not supported |
| 173 | + * - ``shift_group_left`` |
| 174 | + - |
| 175 | + - Not supported |
| 176 | + * - ``shift_group_right`` |
| 177 | + - |
| 178 | + - Not supported |
| 179 | + * - ``permute_group_by_xor`` |
| 180 | + - |
| 181 | + - Not supported |
| 182 | + * - ``select_from_group`` |
| 183 | + - |
| 184 | + - Not supported |
| 185 | + * - ``joint_reduce`` |
| 186 | + - |
| 187 | + - Not supported |
| 188 | + * - ``reduce_over_group`` |
| 189 | + - |
| 190 | + - Not supported |
| 191 | + * - ``joint_exclusive_scan`` |
| 192 | + - |
| 193 | + - Not supported |
| 194 | + * - ``joint_inclusive_scan`` |
| 195 | + - |
| 196 | + - Not supported |
| 197 | + * - ``exclusive_scan_over_group`` |
| 198 | + - |
| 199 | + - Not supported |
| 200 | + * - ``inclusive_scan_over_group`` |
| 201 | + - |
| 202 | + - Not supported |
| 203 | + |
| 204 | +.. list-table:: Math functions |
| 205 | + :widths: 25 25 50 |
| 206 | + :header-rows: 1 |
| 207 | + |
| 208 | + * - SYCL* math function category |
| 209 | + - numba-dpex |
| 210 | + - Notes |
| 211 | + * - Math functions |
| 212 | + - |
| 213 | + - Refer to the kernel programming guide for the list of supported functions. |
| 214 | + * - Half and reduced precision math functions |
| 215 | + - |
| 216 | + - Not supported |