
Commit 216947c

Improvements to overview
Author: Diptorup Deb
1 parent f28b064

File tree: 1 file changed, +84 -46 lines


docs/source/overview.rst

Lines changed: 84 additions & 46 deletions
@@ -6,33 +6,38 @@ Overview
 
 Data Parallel Extension for Numba* (`numba-dpex`_) is a free and open-source
 LLVM-based code generator for portable accelerator programming in Python. The
-code generator implements a new pseudo-kernel programming domain-specific
-language (DSL) called `KAPI` that is modeled after the C++ DSL `SYCL*`_. The
-SYCL language is an open standard developed under the Unified Acceleration
-Foundation (`UXL`_) as a vendor-agnostic way of programming different types of
-data-parallel hardware such as multi-core CPUs, GPUs, and FPGAs. Numba-dpex and
-KAPI aim to bring the same vendor-agnostic and standard-compliant programming
-model to Python.
+code generator implements a new kernel programming API (kapi) in pure Python
+that is modeled after the API of the C++ embedded domain-specific language
+(eDSL) `SYCL*`_. The SYCL eDSL is an open standard developed under the Unified
+Acceleration Foundation (`UXL`_) as a vendor-agnostic way of programming
+different types of data-parallel hardware such as multi-core CPUs, GPUs, and
+FPGAs. Numba-dpex and kapi aim to bring the same vendor-agnostic and
+standard-compliant programming model to Python.
 
 Numba-dpex is built on top of the open-source `Numba*`_ JIT compiler that
 implements a CPython bytecode parser and code generator to lower the bytecode to
-LLVM IR. The Numba* compiler is able to compile a large sub-set of Python and
-most of the NumPy library. Numba-dpex uses Numba*'s tooling to implement the
-parsing and typing support for the data types and functions defined in the KAPI
-DSL. A custom code generator is then used to lower KAPI to a form of LLVM IR
-that includes special LLVM instructions that define a low-level data-parallel
-kernel API. Thus, a function defined in KAPI is compiled to a data-parallel
-kernel that can run on different types of hardware. Currently, compilation of
-KAPI is possible for x86 CPU devices, Intel Gen9 integrated GPUs, Intel UHD
-integrated GPUs, and Intel discrete GPUs.
-
-
-The following example shows a pairwise distance matrix computation in KAPI.
+LLVM intermediate representation (IR). The Numba* compiler is able to compile a
+large subset of Python and most of the NumPy library. Numba-dpex uses Numba*'s
+tooling to implement the parsing and typing support for the data types and
+functions defined in kapi. A custom code generator is also introduced to lower
+kapi functions to a form of LLVM IR that defines a low-level data-parallel
+kernel. Thus, a function written in kapi, although purely sequential when
+executed in Python, can be compiled to an actual data-parallel kernel that can
+run on different types of hardware. Compilation of kapi is possible for x86
+CPU devices, Intel Gen9 integrated GPUs, Intel UHD integrated GPUs, and Intel
+discrete GPUs.
+
+The following example presents a pairwise distance matrix computation as written
+in kapi. A detailed description of the API and all relevant concepts is given
+elsewhere in the documentation; for now, the example introduces the core tenet
+of the programming model.
 
 .. code-block:: python
+    :linenos:
 
     from numba_dpex import kernel_api as kapi
     import math
+    import dpnp
 
 
     def pairwise_distance_kernel(item: kapi.Item, data, distance):
@@ -49,41 +54,74 @@ The following example shows a pairwise distance matrix computation in KAPI.
         distance[j, i] = math.sqrt(d)
 
 
-Skipping over much of the language details, at a high-level the
-``pairwise_distance_kernel`` can be viewed as a data-parallel function that gets
-executed individually by a set of "work items". That is, each work item runs the
-same function for a subset of the elements of the input ``data`` and
-``distance`` arrays. For programmers familiar with the CUDA or OpenCL languages,
-it is the same programming model that is referred to as Single Program Multiple
-Data (SPMD). As Python has no concept of a work item the KAPI function itself is
-sequential and needs to be compiled to convert it into a parallel version. The
-next example shows the changes to the original script to compile and run the
+    data = dpnp.random.ranf((10000, 3), device="gpu")
+    dist = dpnp.empty(shape=(data.shape[0], data.shape[0]), device="gpu")
+    exec_range = kapi.Range(data.shape[0], data.shape[0])
+    kapi.call_kernel(pairwise_distance_kernel, exec_range, data, dist)
+
+The ``pairwise_distance_kernel`` function conceptually defines a data-parallel
+function to be executed individually by a set of "work items". That is, each
+work item runs the function for a subset of the elements of the input ``data``
+and ``distance`` arrays. The ``item`` argument passed to the function identifies
+the work item that is executing a specific instance of the function. The set of
+work items is defined by the ``exec_range`` object, and the ``call_kernel`` call
+instructs every work item in ``exec_range`` to execute
+``pairwise_distance_kernel`` for a specific subset of the data.
+
+The logical abstraction exposed by kapi is referred to as the Single Program
+Multiple Data (SPMD) programming model. CUDA or OpenCL programmers will
+recognize it as similar to the programming model used in those languages.
+However, as Python has no concept of a work item, a kapi function executes
+sequentially when invoked from Python. To convert it into a true data-parallel
+function, the function first has to be compiled using numba-dpex. The next
+example shows the changes to the original script to compile and run the
 ``pairwise_distance_kernel`` in parallel.
 
 .. code-block:: python
+    :linenos:
+    :emphasize-lines: 7, 25
+
+    import numba_dpex as dpex
 
-    from numba_dpex import kernel, call_kernel
+    from numba_dpex import kernel_api as kapi
+    import math
     import dpnp
 
+
+    @dpex.kernel
+    def pairwise_distance_kernel(item: kapi.Item, data, distance):
+        i = item.get_id(0)
+        j = item.get_id(1)
+
+        data_dims = data.shape[1]
+
+        d = data.dtype.type(0.0)
+        for k in range(data_dims):
+            tmp = data[i, k] - data[j, k]
+            d += tmp * tmp
+
+        distance[j, i] = math.sqrt(d)
+
+
     data = dpnp.random.ranf((10000, 3), device="gpu")
-    distance = dpnp.empty(shape=(data.shape[0], data.shape[0]), device="gpu")
+    dist = dpnp.empty(shape=(data.shape[0], data.shape[0]), device="gpu")
     exec_range = kapi.Range(data.shape[0], data.shape[0])
-    call_kernel(kernel(pairwise_distance_kernel), exec_range, data, distance)
 
-To compile a KAPI function into a data-parallel kernel and run it on a device,
-three things need to be done: allocate the arguments to the function on the
-device where the function is to execute, compile the function by applying a
-numba-dpex decorator, and `launch` or execute the compiled kernel on the device.
+    dpex.call_kernel(pairwise_distance_kernel, exec_range, data, dist)
 
-Allocating arrays or scalars to be passed to a compiled KAPI function is not
-done directly in numba-dpex. Instead, numba-dpex supports passing in
+To compile a kapi function, the ``call_kernel`` function from kapi has to be
+substituted by the one provided in ``numba_dpex``, and the ``kernel`` decorator
+has to be added to the kapi function. The actual device for which the function
+is compiled and on which it executes is controlled by the input arguments to
+``call_kernel``. Allocating the input arguments to be passed to a compiled kapi
+function is not done by numba-dpex. Instead, numba-dpex supports passing in
 tensors/ndarrays created using either the `dpnp`_ NumPy drop-in replacement
-library or the `dpctl`_ SYCl-based Python Array API library. To trigger
-compilation, the ``numba_dpex.kernel`` decorator has to be used, and finally to
-launch a compiled kernel the ``numba_dpex.call_kernel`` function should be
-invoked.
-
-For a more detailed description about programming with numba-dpex, refer
-the :doc:`programming_model`, :doc:`user_guide/index` and the
-:doc:`autoapi/index` sections of the documentation. To setup numba-dpex and try
-it out refer the :doc:`getting_started` section.
+library or the `dpctl`_ SYCL-based Python Array API library. The objects
+allocated by these libraries encode the device information for that allocation.
+Numba-dpex extracts the information and uses it to compile a kernel for that
+specific device and then executes the compiled kernel on it.
+
+For a more detailed description of programming with numba-dpex, refer to the
+:doc:`programming_model`, :doc:`user_guide/index` and the :doc:`autoapi/index`
+sections of the documentation. To set up numba-dpex and try it out, refer to
+the :doc:`getting_started` section.
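As a quick illustration of the device-follows-data behavior described in the new
text, the sketch below shows the same compiled kernel being targeted by nothing
more than where its arguments are allocated. It is a minimal, hedged example and
not part of the commit: ``scale_kernel`` is a hypothetical kernel written for
illustration, and it assumes a working numba-dpex and dpnp installation with a
SYCL CPU device available (``device="cpu"``); swapping the device string
retargets the same kernel without any other change.

.. code-block:: python

    import dpnp
    import numba_dpex as dpex
    from numba_dpex import kernel_api as kapi


    # Hypothetical kernel used only to illustrate device-follows-data;
    # each work item doubles one element of the input array.
    @dpex.kernel
    def scale_kernel(item: kapi.Item, a, out):
        i = item.get_id(0)
        out[i] = a[i] * 2.0


    # Allocating the arguments on the CPU SYCL device (assumed available)
    # makes numba-dpex compile and launch the kernel for that device.
    a = dpnp.arange(1024, dtype=dpnp.float32, device="cpu")
    out = dpnp.empty_like(a)
    dpex.call_kernel(scale_kernel, kapi.Range(a.shape[0]), a, out)

The design choice highlighted by the commit is that the allocation itself, not a
separate compiler flag or decorator argument, selects the execution device.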
