Commit 9b4b8aa

Author: Diptorup Deb

Edits to overview section.

Parent: cec6ad8

3 files changed: +36, -30 lines


docs/source/ext_links.txt

Lines changed: 1 addition & 0 deletions

@@ -24,3 +24,4 @@
 .. _Data Parallel Extensions for Python*: https://intelpython.github.io/DPEP/main/
 .. _Intel VTune Profiler: https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html
 .. _Intel Advisor: https://www.intel.com/content/www/us/en/developer/tools/oneapi/advisor.html
+.. _oneMKL: https://www.intel.com/content/www/us/en/docs/oneapi/programming-guide/2023-2/intel-oneapi-math-kernel-library-onemkl.html

docs/source/overview.rst

Lines changed: 27 additions & 25 deletions

@@ -15,23 +15,23 @@ implementation of `NumPy*`_'s API using the `SYCL*`_ language.
 .. the same time automatically running such code parallelly on various types of
 .. architecture.
-``numba-dpex`` is developed as part of `Intel AI Analytics Toolkit`_ and
-is distributed with the `Intel Distribution for Python*`_. The extension is
-available on Anaconda cloud and as a Docker image on GitHub. Please refer the
-:doc:`getting_started` page to learn more.
+``numba-dpex`` is an open-source project and can be installed as part of the
+`Intel AI Analytics Toolkit`_ or the `Intel Distribution for Python*`_. The
+package is also available on Anaconda cloud and as a Docker image on GitHub.
+Please refer to the :doc:`getting_started` page to learn more.

 Main Features
 -------------

 Portable Kernel Programming
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~

-The ``numba-dpex`` kernel API has a design and API similar to Numba's
+The ``numba-dpex`` kernel programming API has a design similar to Numba's
 ``cuda.jit`` sub-module. The API is modeled after the `SYCL*`_ language and uses
 the `DPC++`_ SYCL runtime. Currently, compilation of kernels is supported for
 SPIR-V-based OpenCL and `oneAPI Level Zero`_ CPU and GPU devices. In the
-future, the API can be extended to other architectures that are supported by
-DPC++.
+future, compilation support for other types of hardware that are supported by
+DPC++ will be added.

 The following example illustrates a vector addition kernel written with the
 ``numba-dpex`` kernel API.
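The hunk above refers to a vector-addition kernel example that the diff itself does not include. The following is a pure-NumPy sketch of the semantics such a kernel implements, for readers without numba-dpex or dpnp installed; ``vecadd_body`` and the serial launch loop are hypothetical stand-ins for an ``@ndpx.kernel``-decorated function and the SYCL execution range.

```python
import numpy as np

def vecadd_body(i, a, b, c):
    # Stand-in for a kernel body; in numba-dpex, ``i`` would come from
    # ndpx.get_global_id(0) rather than being passed as an argument.
    c[i] = a[i] + b[i]

a = np.arange(8, dtype=np.float32)
b = np.arange(8, dtype=np.float32)
c = np.zeros(8, dtype=np.float32)

# The SYCL runtime would launch one work-item per index in the execution
# range; a serial loop emulates that grid of work-items here.
for i in range(len(c)):
    vecadd_body(i, a, b, c)

print(c)  # each element holds a[i] + b[i]
```

With numba-dpex installed, the loop disappears: the kernel is launched over a range (one work-item per element) and each invocation computes a single index in parallel on the device.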
@@ -56,31 +56,33 @@ The following example illustrates a vector addition kernel written with
     print(c)

 In the above example, three arrays are allocated on a default ``gpu`` device
-using the ``dpnp`` library. These arrays are then passed as input arguments to
-the kernel function. The compilation target and the subsequent execution of the
-kernel is determined completely by the input arguments and follow the
+using the ``dpnp`` library. The arrays are then passed as input arguments to the
+kernel function. The compilation target and the subsequent execution of the
+kernel are determined by the input arguments and follow the
 "compute-follows-data" programming model as specified in the `Python* Array API
 Standard`_. To change the execution target to a CPU, the ``device`` keyword needs
 to be changed to ``cpu`` when allocating the ``dpnp`` arrays. It is also possible
 to leave the ``device`` keyword undefined and let the ``dpnp`` library select a
 default device based on environment flag settings. Refer to the
 :doc:`user_guide/kernel_programming/index` page for further details.

-``dpnp`` compilation support
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-``numba-dpex`` extends Numba's type system and compilation pipeline to compile
-``dpnp`` functions and expressions in the same way as NumPy. Unlike Numba's
-NumPy compilation that is serial by default, ``numba-dpex`` always compiles
-``dpnp`` expressions into data-parallel kernels and executes them in parallel.
-The ``dpnp`` compilation feature is provided using a decorator ``dpjit`` that
-behaves identically to ``numba.njit(parallel=True)`` with the addition of
-``dpnp`` compilation and kernel offloading. Offloading by ``numba-dpex`` is not
-just restricted to CPUs and supports all devices that are presently supported by
-the kernel API. ``dpjit`` allows using NumPy and ``dpnp`` expressions in the
-same function. All NumPy compilation and parallelization is done via the default
-Numba code-generation pipeline, whereas ``dpnp`` expressions are compiled using
-the ``numba-dpex`` pipeline.
+``dpjit`` decorator
+~~~~~~~~~~~~~~~~~~~
+
+The ``numba-dpex`` package provides a new decorator ``dpjit`` that extends
+Numba's ``njit`` decorator. The new decorator is equivalent to
+``numba.njit(parallel=True)``, but additionally supports compiling ``dpnp``
+functions, ``prange`` loops, and array expressions that use ``dpnp.ndarray``
+objects.
+
+Unlike Numba's NumPy parallelization, which only supports CPUs, ``dpnp``
+expressions are first converted to data-parallel kernels and can then be
+`offloaded` to different types of devices. As ``dpnp`` implements the same API
+as NumPy*, an existing ``numba.njit``-decorated function that uses
+``numpy.ndarray`` may be refactored to use ``dpnp.ndarray`` and decorated with
+``dpjit``. Such a refactoring allows the parallel regions to be offloaded to a
+supported GPU device, giving users an additional option to execute their code
+in parallel.

 The vector addition example depicted using the kernel API can also be
 expressed in several different ways using ``dpjit``.
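To make the ``dpjit`` description above concrete, here is an illustrative sketch using plain NumPy and a no-op decorator stub, since numba-dpex and dpnp may not be available; ``dpjit_stub`` is hypothetical and only marks where ``numba_dpex.dpjit`` would compile the array expression into an offloaded data-parallel kernel.

```python
import numpy as np

def dpjit_stub(func):
    # Assumption: a no-op stand-in for numba_dpex.dpjit, which would JIT-compile
    # the function like numba.njit(parallel=True) and offload dpnp expressions.
    return func

@dpjit_stub
def vecadd(a, b):
    # An array expression; dpjit would turn this into a data-parallel kernel
    # executed on the device where ``a`` and ``b`` were allocated.
    return a + b

a = np.full(8, 2.0)
b = np.full(8, 3.0)
print(vecadd(a, b))  # every element is 5.0
```

Under compute-follows-data, swapping the ``numpy`` import for ``dpnp`` and the stub for the real decorator would move this computation to the arrays' device without changing the function body.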

docs/source/user_guide/dpnp_offload.rst

Lines changed: 8 additions & 5 deletions

@@ -3,11 +3,14 @@
 Compiling and Offloading ``dpnp`` Functions
 ===========================================

-Data-Parallel Numeric Python (``dpnp``) is a drop-in ``NumPy*`` replacement library. The
-library is developed using SYCL and oneMKL. ``numba-dpex`` relies on ``dpnp`` to
-support offloading ``NumPy`` library functions to SYCL devices. For ``NumPy`` functions
-that are offloaded using ``dpnp``, ``numba-dpex`` generates library calls directly to
-``dpnp``'s `low-level API`_ inside the generated LLVM IR.
+Data Parallel Extension for NumPy* (``dpnp``) is a drop-in ``NumPy*``
+replacement library built on top of `oneMKL`_.
+
+``numba-dpex`` relies on ``dpnp`` to support offloading ``NumPy`` library
+functions to SYCL devices. For ``NumPy`` functions that are offloaded using
+``dpnp``, ``numba-dpex`` generates library calls directly to ``dpnp``'s
+`low-level API`_ inside the generated LLVM IR.

 .. _low-level API: https://github.com/IntelPython/dpnp/tree/master/dpnp/backend
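The "drop-in replacement" claim above means that ``dpnp`` mirrors NumPy's API, so the same source can target either library and the execution device follows the arrays. A minimal sketch, using ``numpy`` as a stand-in because ``dpnp`` requires a SYCL device; the commented import line shows the hypothetical swap:

```python
# With dpnp installed, the only change would be: import dpnp as np
import numpy as np  # stand-in: dpnp implements this same API

def mean_normalize(x):
    # Identical source works with either library; with dpnp arrays, the
    # reduction and subtraction run on the device where ``x`` was allocated.
    return x - x.mean()

x = np.array([1.0, 2.0, 3.0, 4.0])
print(mean_normalize(x))  # mean is 2.5, so [-1.5, -0.5, 0.5, 1.5]
```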
