
Commit 0bcc50a

Author: Diptorup Deb (committed)
Update to overview section
1 parent 3d8c8fe commit 0bcc50a

File tree

4 files changed: +77 additions, -136 deletions


docs/source/conf.py

Lines changed: 0 additions & 1 deletion

@@ -27,7 +27,6 @@
     "sphinx.ext.extlinks",
    "sphinx.ext.githubpages",
    "sphinx.ext.napoleon",
-   "sphinx.ext.autosectionlabel",
    "sphinxcontrib.programoutput",
    "sphinxcontrib.googleanalytics",
    "myst_parser",

docs/source/ext_links.txt

Lines changed: 1 addition & 0 deletions

@@ -25,3 +25,4 @@
 .. _Intel VTune Profiler: https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html
 .. _Intel Advisor: https://www.intel.com/content/www/us/en/developer/tools/oneapi/advisor.html
 .. _oneMKL: https://www.intel.com/content/www/us/en/docs/oneapi/programming-guide/2023-2/intel-oneapi-math-kernel-library-onemkl.html
+.. _UXL: https://uxlfoundation.org/

docs/source/overview.rst

Lines changed: 70 additions & 131 deletions

@@ -4,147 +4,86 @@
 Overview
 ========

-Data Parallel Extension for Numba* (`numba-dpex`_) is an extension to
-the `Numba*`_ Python JIT compiler adding an architecture-agnostic kernel
-programming API, and a new front-end to compile the Data Parallel Extension
-for Numpy* (`dpnp`_) library. The ``dpnp`` Python library is a data-parallel
-implementation of `NumPy*`_'s API using the `SYCL*`_ language.
-
-.. ``numba-dpex``'s support for ``dpnp`` compilation is a new way for Numba* users
-.. to write code in a NumPy-like API that is already supported by Numba*, while at
-.. the same time automatically running such code parallelly on various types of
-.. architecture.
-
-``numba-dpex`` is an open-source project and can be installed as part of `Intel
-AI Analytics Toolkit`_ or the `Intel Distribution for Python*`_. The package is
-also available on Anaconda cloud and as a Docker image on GitHub. Please refer
-the :doc:`getting_started` page to learn more.
-
-Main Features
--------------
-
-Portable Kernel Programming
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The ``numba-dpex`` kernel programming API has a design similar to Numba's
-``cuda.jit`` sub-module. The API is modeled after the `SYCL*`_ language and uses
-the `DPC++`_ SYCL runtime. Currently, compilation of kernels is supported for
-SPIR-V-based OpenCL and `oneAPI Level Zero`_ devices CPU and GPU devices. In the
-future, compilation support for other types of hardware that are supported by
-DPC++ will be added.
-
-The following example illustrates a vector addition kernel written with
-``numba-dpex`` kernel API.
+Data Parallel Extension for Numba* (`numba-dpex`_) is a free and open-source
+LLVM-based code generator for portable accelerator programming in Python. The
+code generator implements a new pseudo-kernel programming domain-specific
+language (DSL) called `KAPI` that is modeled after the C++ DSL `SYCL*`_. The
+SYCL language is an open standard developed under the Unified Acceleration
+Foundation (`UXL`_) as a vendor-agnostic way of programming different types of
+data-parallel hardware such as multi-core CPUs, GPUs, and FPGAs. Numba-dpex and
+KAPI aim to bring the same vendor-agnostic and standard-compliant programming
+model to Python.
+
+Numba-dpex is built on top of the open-source `Numba*`_ JIT compiler that
+implements a CPython bytecode parser and code generator to lower the bytecode to
+LLVM IR. The Numba* compiler is able to compile a large subset of Python and
+most of the NumPy library. Numba-dpex uses Numba*'s tooling to implement the
+parsing and typing support for the data types and functions defined in the KAPI
+DSL. A custom code generator is then used to lower KAPI to a form of LLVM IR
+that includes special LLVM instructions that define a low-level data-parallel
+kernel API. Thus, a function defined in KAPI is compiled to a data-parallel
+kernel that can run on different types of hardware. Currently, compilation of
+KAPI is possible for x86 CPU devices, Intel Gen9 integrated GPUs, Intel UHD
+integrated GPUs, and Intel discrete GPUs.
+
+The following example shows a pairwise distance matrix computation in KAPI.

 .. code-block:: python

-    import dpnp
-    import numba_dpex as dpex
-
-
-    @dpex.kernel
-    def vecadd_kernel(a, b, c):
-        i = dpex.get_global_id(0)
-        c[i] = a[i] + b[i]
-
-
-    a = dpnp.ones(1024, device="gpu")
-    b = dpnp.ones(1024, device="gpu")
-    c = dpnp.empty_like(a)
-
-    vecadd_kernel[dpex.Range(1024)](a, b, c)
-    print(c)
-
-In the above example, three arrays are allocated on a default ``gpu`` device
-using the ``dpnp`` library. The arrays are then passed as input arguments to the
-kernel function. The compilation target and the subsequent execution of the
-kernel is determined by the input arguments and follow the
-"compute-follows-data" programming model as specified in the `Python* Array API
-Standard`_. To change the execution target to a CPU, the device keyword needs to
-be changed to ``cpu`` when allocating the ``dpnp`` arrays. It is also possible
-to leave the ``device`` keyword undefined and let the ``dpnp`` library select a
-default device based on environment flag settings. Refer the
-:doc:`user_guide/kernel_programming/index` for further details.
-
-``dpjit`` decorator
-~~~~~~~~~~~~~~~~~~~
-
-The ``numba-dpex`` package provides a new decorator ``dpjit`` that extends
-Numba's ``njit`` decorator. The new decorator is equivalent to
-``numba.njit(parallel=True)``, but additionally supports compiling ``dpnp``
-functions, ``prange`` loops, and array expressions that use ``dpnp.ndarray``
-objects.
-
-Unlike Numba's NumPy parallelization that only supports CPUs, ``dpnp``
-expressions are first converted to data-parallel kernels and can then be
-`offloaded` to different types of devices. As ``dpnp`` implements the same API
-as NumPy*, an existing ``numba.njit`` decorated function that uses
-``numpy.ndarray`` may be refactored to use ``dpnp.ndarray`` and decorated with
-``dpjit``. Such a refactoring can allow the parallel regions to be offloaded
-to a supported GPU device, providing users an additional option to execute their
-code parallelly.
-
-The vector addition example depicted using the kernel API can also be
-expressed in several different ways using ``dpjit``.
-
-.. code-block:: python
-
-    import dpnp
-    import numba_dpex as dpex
-
-
-    @dpex.dpjit
-    def vecadd_v1(a, b):
-        return a + b
-
+    from numba_dpex import kernel_api as kapi
+    import math

-    @dpex.dpjit
-    def vecadd_v2(a, b):
-        return dpnp.add(a, b)

+    def pairwise_distance_kernel(item: kapi.Item, data, distance):
+        i = item.get_id(0)
+        j = item.get_id(1)

-    @dpex.dpjit
-    def vecadd_v3(a, b):
-        c = dpnp.empty_like(a)
-        for i in prange(a.shape[0]):
-            c[i] = a[i] + b[i]
-        return c
+        data_dims = data.shape[1]

-As with the kernel API example, a ``dpjit`` function if invoked with ``dpnp``
-input arguments follows the compute-follows-data programming model. Refer
-:doc:`user_manual/dpnp_offload/index` for further details.
+        d = data.dtype.type(0.0)
+        for k in range(data_dims):
+            tmp = data[i, k] - data[j, k]
+            d += tmp * tmp

+        distance[j, i] = math.sqrt(d)

-.. Project Goal
-.. ------------

-.. If C++ is not your language, you can skip writing data-parallel kernels in SYCL
-.. and directly write them in Python.
+Skipping over much of the language details, at a high level the
+``pairwise_distance_kernel`` can be viewed as a data-parallel function that gets
+executed individually by a set of "work items". That is, each work item runs the
+same function for a subset of the elements of the input ``data`` and
+``distance`` arrays. For programmers familiar with the CUDA or OpenCL languages,
+it is the same programming model that is referred to as Single Program Multiple
+Data (SPMD). As Python has no concept of a work item, the KAPI function itself
+is sequential and needs to be compiled to convert it into a parallel version.
+The next example shows the changes to the original script to compile and run
+the ``pairwise_distance_kernel`` in parallel.

-.. Our package ``numba-dpex`` extends the Numba compiler to allow kernel creation
-.. directly in Python via a custom compute API
-
-
-.. Contributing
-.. ------------
-
-.. Refer the `contributing guide
-.. <https://github.com/IntelPython/numba-dpex/blob/main/CONTRIBUTING>`_ for
-.. information on coding style and standards used in ``numba-dpex``.
-
-.. License
-.. -------
-
-.. ``numba-dpex`` is Licensed under Apache License 2.0 that can be found in `LICENSE
-.. <https://github.com/IntelPython/numba-dpex/blob/main/LICENSE>`_. All usage and
-.. contributions to the project are subject to the terms and conditions of this
-.. license.
+.. code-block:: python

+    from numba_dpex import kernel, call_kernel
+    import dpnp

-.. Along with the kernel programming API an auto-offload feature is also provided.
-.. The feature enables automatic generation of kernels from data-parallel NumPy
-.. library calls and array expressions, Numba ``prange`` loops, and `other
-.. "data-parallel by construction" expressions
-.. <https://numba.pydata.org/numba-doc/latest/user/parallel.html>`_ that Numba is
-.. able to parallelize. Following two examples demonstrate the two ways in which
-.. kernels may be written using numba-dpex.
+    data = dpnp.random.ranf((10000, 3), device="gpu")
+    distance = dpnp.empty(shape=(data.shape[0], data.shape[0]), device="gpu")
+    exec_range = kapi.Range(data.shape[0], data.shape[0])
+    call_kernel(kernel(pairwise_distance_kernel), exec_range, data, distance)
+
+To compile a KAPI function into a data-parallel kernel and run it on a device,
+three things need to be done: allocate the arguments to the function on the
+device where the function is to execute, compile the function by applying a
+numba-dpex decorator, and `launch` or execute the compiled kernel on the device.
+
+Allocating arrays or scalars to be passed to a compiled KAPI function is not
+done directly in numba-dpex. Instead, numba-dpex supports passing in
+tensors/ndarrays created using either the `dpnp`_ NumPy drop-in replacement
+library or the `dpctl`_ SYCL-based Python Array API library. To trigger
+compilation, the ``numba_dpex.kernel`` decorator has to be used, and finally to
+launch a compiled kernel the ``numba_dpex.call_kernel`` function should be
+invoked.
+
+For a more detailed description of programming with numba-dpex, refer to
+the :doc:`programming_model`, :doc:`user_guide/index` and the
+:doc:`autoapi/index` sections of the documentation. To set up numba-dpex and
+try it out, refer to the :doc:`getting_started` section.
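
To make the SPMD model above concrete, the KAPI kernel body from this diff can be emulated in plain NumPy by looping sequentially over every ``(i, j)`` index in the launch range. This is only an illustrative sketch of what each work item computes; it is not numba-dpex's actual compilation or execution path, and the ``item`` tuple here is a stand-in for ``kapi.Item``.

```python
import math
import numpy as np


def pairwise_distance_kernel(item, data, distance):
    # Same body as the KAPI kernel in the diff above; `item` is a plain
    # (i, j) tuple standing in for a kapi.Item work-item index.
    i, j = item
    data_dims = data.shape[1]

    d = data.dtype.type(0.0)
    for k in range(data_dims):
        tmp = data[i, k] - data[j, k]
        d += tmp * tmp

    distance[j, i] = math.sqrt(d)


# Sequential emulation of call_kernel over a 2-D Range: every "work item"
# runs the same function for one (i, j) pair of rows.
data = np.random.rand(4, 3)
distance = np.empty((4, 4))
for i in range(data.shape[0]):
    for j in range(data.shape[0]):
        pairwise_distance_kernel((i, j), data, distance)

# Cross-check against a vectorized NumPy pairwise-distance computation.
expected = np.sqrt(((data[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1))
assert np.allclose(distance, expected)
```

Because every iteration of the double loop is independent, the same body can be executed by many work items at once, which is exactly what the compiled kernel does on a device.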

docs/source/programming_model.rst

Lines changed: 6 additions & 4 deletions

@@ -227,10 +227,12 @@ three keyword arguments present in all `array creation functions
 For example, consider

 .. code:: python
-    # Use usm_type = 'device' to get USM-device allocation (default),
-    # usm_type = 'shared' to get USM-shared allocation,
-    # usm_type = 'host' to get USM-host allocation
-    # def dpt.empty(..., device=None, usm_type=None, sycl_queue=None) -> dpctl.tensor.usm_ndarray: ...
+
+    # Use usm_type = 'device' to get USM-device allocation (default),
+    # usm_type = 'shared' to get USM-shared allocation,
+    # usm_type = 'host' to get USM-host allocation
+    def dpt.empty(..., device=None, usm_type=None, sycl_queue=None) -> dpctl.tensor.usm_ndarray:
+        ...

 The keyword ``device`` is `mandated by the array API
 <https://data-apis.org/array-api/latest/design_topics/device_support.html#syntax-for-device-assignment>`__.
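
The ``device`` keyword ties into the "compute-follows-data" model referenced elsewhere in these docs: the execution device is inferred from where the input arrays were allocated rather than passed explicitly at the call site. A minimal pure-Python sketch of that dispatch rule follows; the ``Array`` class and ``infer_execution_device`` helper are hypothetical stand-ins, not dpctl's real types or API.

```python
class Array:
    """Hypothetical stand-in for a device-allocated ndarray (not dpctl's API)."""

    def __init__(self, data, device="cpu"):
        self.data = data
        self.device = device


def infer_execution_device(*arrays):
    # Compute-follows-data: a kernel runs on the device where its inputs
    # live; mixing devices is an error rather than an implicit copy.
    devices = {a.device for a in arrays}
    if len(devices) != 1:
        raise ValueError(f"inputs allocated on different devices: {sorted(devices)}")
    return devices.pop()


a = Array([1.0, 2.0], device="gpu")
b = Array([3.0, 4.0], device="gpu")
print(infer_execution_device(a, b))  # -> gpu
```

The design choice this models is that call sites never name a device: moving a computation from GPU to CPU only requires changing where the arrays are allocated.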
