@@ -15,23 +15,23 @@ implementation of `NumPy*`_'s API using the `SYCL*`_ language.
.. the same time automatically running such code parallelly on various types of
.. architecture.
- ``numba-dpex`` is developed as part of `Intel AI Analytics Toolkit`_ and
- is distributed with the `Intel Distribution for Python*`_. The extension is
- available on Anaconda cloud and as a Docker image on GitHub. Please refer the
- :doc:`getting_started` page to learn more.
+ ``numba-dpex`` is an open-source project and can be installed as part of `Intel
+ AI Analytics Toolkit`_ or the `Intel Distribution for Python*`_. The package is
+ also available on Anaconda cloud and as a Docker image on GitHub. Please refer
+ to the :doc:`getting_started` page to learn more.
Main Features
-------------
Portable Kernel Programming
~~~~~~~~~~~~~~~~~~~~~~~~~~~
- The ``numba-dpex`` kernel API has a design and API similar to Numba's
+ The ``numba-dpex`` kernel programming API has a design similar to Numba's
``cuda.jit`` sub-module. The API is modeled after the `SYCL*`_ language and uses
the `DPC++`_ SYCL runtime. Currently, compilation of kernels is supported for
SPIR-V-based OpenCL and `oneAPI Level Zero`_ CPU and GPU devices. In the
- future, the API can be extended to other architectures that are supported by
- DPC++.
+ future, compilation support will be added for other types of hardware that
+ DPC++ supports.
The following example illustrates a vector addition kernel written with
the ``numba-dpex`` kernel API.
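The example's body falls outside this hunk; a minimal sketch of such a vector
addition kernel, assuming the ``kernel`` decorator, ``get_global_id``, and
``Range`` launcher that ``numba_dpex`` exports (the names and sizes here are
illustrative, not the diff's exact code):

.. code-block:: python

    import dpnp
    import numba_dpex as ndpx


    @ndpx.kernel
    def vecadd(a, b, c):
        # Each work item computes one element of the result vector.
        i = ndpx.get_global_id(0)
        c[i] = a[i] + b[i]


    # Allocate the arrays on a GPU device using dpnp.
    a = dpnp.ones(1024, device="gpu")
    b = dpnp.ones(1024, device="gpu")
    c = dpnp.zeros(1024, device="gpu")

    # Launch 1024 work items; the kernel runs on the arrays' device.
    vecadd[ndpx.Range(1024)](a, b, c)
    print(c)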
@@ -56,31 +56,33 @@ The following example illustrates a vector addition kernel written with
print(c)
In the above example, three arrays are allocated on a default ``gpu`` device
- using the ``dpnp`` library. These arrays are then passed as input arguments to
- the kernel function. The compilation target and the subsequent execution of the
- kernel is determined completely by the input arguments and follow the
+ using the ``dpnp`` library. The arrays are then passed as input arguments to the
+ kernel function. The compilation target and the subsequent execution of the
+ kernel are determined by the input arguments and follow the
"compute-follows-data" programming model as specified in the `Python* Array API
Standard`_. To change the execution target to a CPU, the ``device`` keyword needs to
be changed to ``cpu`` when allocating the ``dpnp`` arrays. It is also possible
to leave the ``device`` keyword undefined and let the ``dpnp`` library select a
default device based on environment flag settings. Refer to the
:doc:`user_guide/kernel_programming/index` for further details.
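A short sketch of the allocation choices described above (the environment
variable named in the comment is an assumption about the installed ``dpctl``
version, not something this document specifies):

.. code-block:: python

    import dpnp

    # "Compute follows data": kernels execute on the device that the
    # input arrays were allocated on.
    a_gpu = dpnp.ones(1024, device="gpu")  # execution targets the GPU
    a_cpu = dpnp.ones(1024, device="cpu")  # execution targets the CPU

    # Leaving ``device`` unset lets dpnp pick a default device, e.g. via
    # an environment flag such as SYCL_DEVICE_FILTER (assumption).
    a_default = dpnp.ones(1024)
    print(a_default.device)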

- ``dpnp`` compilation support
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
- ``numba-dpex`` extends Numba's type system and compilation pipeline to compile
- ``dpnp`` functions and expressions in the same way as NumPy. Unlike Numba's
- NumPy compilation that is serial by default, ``numba-dpex`` always compiles
- ``dpnp`` expressions into data-parallel kernels and executes them in parallel.
- The ``dpnp`` compilation feature is provided using a decorator ``dpjit`` that
- behaves identically to ``numba.njit(parallel=True)`` with the addition of
- ``dpnp`` compilation and kernel offloading. Offloading by ``numba-dpex`` is not
- just restricted to CPUs and supports all devices that are presently supported by
- the kernel API. ``dpjit`` allows using NumPy and ``dpnp`` expressions in the
- same function. All NumPy compilation and parallelization is done via the default
- Numba code-generation pipeline, whereas ``dpnp`` expressions are compiled using
- the ``numba-dpex`` pipeline.
+ ``dpjit`` decorator
+ ~~~~~~~~~~~~~~~~~~~
+
+ The ``numba-dpex`` package provides a new decorator ``dpjit`` that extends
+ Numba's ``njit`` decorator. The new decorator is equivalent to
+ ``numba.njit(parallel=True)``, but additionally supports compiling ``dpnp``
+ functions, ``prange`` loops, and array expressions that use ``dpnp.ndarray``
+ objects.
+
+ Unlike Numba's NumPy parallelization, which only supports CPUs, ``dpnp``
+ expressions are first converted to data-parallel kernels and can then be
+ offloaded to different types of devices. As ``dpnp`` implements the same API
+ as NumPy*, an existing ``numba.njit``-decorated function that uses
+ ``numpy.ndarray`` may be refactored to use ``dpnp.ndarray`` and decorated with
+ ``dpjit``. Such a refactoring can allow the parallel regions to be offloaded
+ to a supported GPU device, providing users an additional option to execute their
+ code in parallel.
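A sketch of the refactoring path described in the added paragraph, assuming a
``prange`` reduction is among the constructs ``dpjit`` can compile (the
function and array sizes are hypothetical):

.. code-block:: python

    import numba
    import dpnp
    from numba_dpex import dpjit

    # Before: numba.njit(parallel=True) over numpy.ndarray, CPU only.
    # After: the same body over dpnp.ndarray, decorated with dpjit, so
    # the parallel region can be offloaded to the arrays' device.
    @dpjit
    def sum_of_squares(a):
        total = 0.0
        for i in numba.prange(a.shape[0]):
            total += a[i] * a[i]
        return total

    a = dpnp.arange(1024, dtype=dpnp.float64, device="gpu")
    print(sum_of_squares(a))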
The vector addition example depicted using the kernel API can also be
expressed in several different ways using ``dpjit``.
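For instance, two of those formulations might look as follows (a sketch,
assuming ``dpnp`` array expressions and ``dpnp`` function calls are both
compilable inside ``dpjit``, as the text above states):

.. code-block:: python

    import dpnp
    from numba_dpex import dpjit


    @dpjit
    def vecadd_expr(a, b):
        # dpnp array expression, compiled into a data-parallel kernel.
        return a + b


    @dpjit
    def vecadd_func(a, b):
        # The same operation written as a dpnp function call.
        return dpnp.add(a, b)


    a = dpnp.ones(1024, device="gpu")
    b = dpnp.ones(1024, device="gpu")
    print(vecadd_expr(a, b))
    print(vecadd_func(a, b))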