@@ -6,33 +6,38 @@ Overview

  Data Parallel Extension for Numba* (`numba-dpex`_) is a free and open-source
  LLVM-based code generator for portable accelerator programming in Python. The
- code generator implements a new pseudo-kernel programming domain-specific
- language (DSL) called `KAPI` that is modeled after the C++ DSL `SYCL*`_. The
- SYCL language is an open standard developed under the Unified Acceleration
- Foundation (`UXL`_) as a vendor-agnostic way of programming different types of
- data-parallel hardware such as multi-core CPUs, GPUs, and FPGAs. Numba-dpex and
- KAPI aim to bring the same vendor-agnostic and standard-compliant programming
- model to Python.
+ code generator implements a new kernel programming API (kapi) in pure Python
+ that is modeled after the API of the C++ embedded domain-specific language
+ (eDSL) `SYCL*`_. The SYCL eDSL is an open standard developed under the Unified
+ Acceleration Foundation (`UXL`_) as a vendor-agnostic way of programming
+ different types of data-parallel hardware such as multi-core CPUs, GPUs, and
+ FPGAs. Numba-dpex and kapi aim to bring the same vendor-agnostic and
+ standard-compliant programming model to Python.

  Numba-dpex is built on top of the open-source `Numba*`_ JIT compiler that
  implements a CPython bytecode parser and code generator to lower the bytecode to
- LLVM IR. The Numba* compiler is able to compile a large sub-set of Python and
- most of the NumPy library. Numba-dpex uses Numba*'s tooling to implement the
- parsing and typing support for the data types and functions defined in the KAPI
- DSL. A custom code generator is then used to lower KAPI to a form of LLVM IR
- that includes special LLVM instructions that define a low-level data-parallel
- kernel API. Thus, a function defined in KAPI is compiled to a data-parallel
- kernel that can run on different types of hardware. Currently, compilation of
- KAPI is possible for x86 CPU devices, Intel Gen9 integrated GPUs, Intel UHD
- integrated GPUs, and Intel discrete GPUs.
-
-
- The following example shows a pairwise distance matrix computation in KAPI.
+ LLVM intermediate representation (IR). The Numba* compiler is able to compile a
+ large sub-set of Python and most of the NumPy library. Numba-dpex uses Numba*'s
+ tooling to implement the parsing and the typing support for the data types and
+ functions defined in kapi. A custom code generator is also introduced to lower
+ kapi functions to a form of LLVM IR that defines a low-level data-parallel
+ kernel. Thus, a function written in kapi, although purely sequential when
+ executed in Python, can be compiled to an actual data-parallel kernel that can
+ run on different types of hardware. Compilation of kapi is possible for x86
+ CPU devices, Intel Gen9 integrated GPUs, Intel UHD integrated GPUs, and Intel
+ discrete GPUs.
+
+ The following example presents a pairwise distance matrix computation as written
+ in kapi. A detailed description of the API and all relevant concepts is given
+ elsewhere in the documentation; for now, the example introduces the core
+ tenet of the programming model.

  .. code-block:: python
+     :linenos:

      from numba_dpex import kernel_api as kapi
      import math
+     import dpnp


      def pairwise_distance_kernel(item: kapi.Item, data, distance):
@@ -49,41 +54,74 @@ The following example shows a pairwise distance matrix computation in KAPI.
          distance[j, i] = math.sqrt(d)


- Skipping over much of the language details, at a high-level the
- ``pairwise_distance_kernel`` can be viewed as a data-parallel function that gets
- executed individually by a set of "work items". That is, each work item runs the
- same function for a subset of the elements of the input ``data`` and
- ``distance`` arrays. For programmers familiar with the CUDA or OpenCL languages,
- it is the same programming model that is referred to as Single Program Multiple
- Data (SPMD). As Python has no concept of a work item the KAPI function itself is
- sequential and needs to be compiled to convert it into a parallel version. The
- next example shows the changes to the original script to compile and run the
+     data = dpnp.random.ranf((10000, 3), device="gpu")
+     dist = dpnp.empty(shape=(data.shape[0], data.shape[0]), device="gpu")
+     exec_range = kapi.Range(data.shape[0], data.shape[0])
+     kapi.call_kernel(pairwise_distance_kernel, exec_range, data, dist)
+
+ The ``pairwise_distance_kernel`` function conceptually defines a data-parallel
+ function to be executed individually by a set of "work items". That is, each
+ work item runs the function for a subset of the elements of the input ``data``
+ and ``distance`` arrays. The ``item`` argument passed to the function identifies
+ the work item that is executing a specific instance of the function. The set of
+ work items is defined by the ``exec_range`` object and the ``call_kernel`` call
+ instructs every work item in ``exec_range`` to execute
+ ``pairwise_distance_kernel`` for a specific subset of the data.
+
+ The logical abstraction exposed by kapi is referred to as the Single Program
+ Multiple Data (SPMD) programming model. CUDA or OpenCL programmers will
+ recognize the programming model exposed by kapi as similar to the one in those
+ languages. However, as Python has no concept of a work item, a kapi function
+ executes sequentially when invoked from Python. To convert it into a true
+ data-parallel function, the function first has to be compiled using numba-dpex.
+ The next example shows the changes to the original script to compile and run the
  ``pairwise_distance_kernel`` in parallel.

  .. code-block:: python
+     :linenos:
+     :emphasize-lines: 7, 25
+
+     import numba_dpex as dpex

-     from numba_dpex import kernel, call_kernel
+     from numba_dpex import kernel_api as kapi
+     import math
      import dpnp

+
+     @dpex.kernel
+     def pairwise_distance_kernel(item: kapi.Item, data, distance):
+         i = item.get_id(0)
+         j = item.get_id(1)
+
+         data_dims = data.shape[1]
+
+         d = data.dtype.type(0.0)
+         for k in range(data_dims):
+             tmp = data[i, k] - data[j, k]
+             d += tmp * tmp
+
+         distance[j, i] = math.sqrt(d)
+
+
      data = dpnp.random.ranf((10000, 3), device="gpu")
-     distance = dpnp.empty(shape=(data.shape[0], data.shape[0]), device="gpu")
+     dist = dpnp.empty(shape=(data.shape[0], data.shape[0]), device="gpu")
      exec_range = kapi.Range(data.shape[0], data.shape[0])
-     call_kernel(kernel(pairwise_distance_kernel), exec_range, data, distance)

- To compile a KAPI function into a data-parallel kernel and run it on a device,
- three things need to be done: allocate the arguments to the function on the
- device where the function is to execute, compile the function by applying a
- numba-dpex decorator, and `launch` or execute the compiled kernel on the device.
+     dpex.call_kernel(pairwise_distance_kernel, exec_range, data, dist)

- Allocating arrays or scalars to be passed to a compiled KAPI function is not
- done directly in numba-dpex. Instead, numba-dpex supports passing in
+ To compile a kapi function, the ``call_kernel`` function from kapi has to be
+ substituted by the one provided in ``numba_dpex`` and the ``kernel`` decorator
+ has to be added to the kapi function. The actual device for which the function
+ is compiled and on which it executes is controlled by the input arguments to
+ ``call_kernel``. Allocating the input arguments to be passed to a compiled kapi
+ function is not done by numba-dpex. Instead, numba-dpex supports passing in
  tensors/ndarrays created using either the `dpnp`_ NumPy drop-in replacement
- library or the `dpctl`_ SYCl-based Python Array API library. To trigger
- compilation, the ``numba_dpex.kernel`` decorator has to be used, and finally to
- launch a compiled kernel the ``numba_dpex.call_kernel`` function should be
- invoked.
-
- For a more detailed description about programming with numba-dpex, refer
- the :doc:`programming_model`, :doc:`user_guide/index` and the
- :doc:`autoapi/index` sections of the documentation. To setup numba-dpex and try
- it out refer the :doc:`getting_started` section.
+ library or the `dpctl`_ SYCL-based Python Array API library. The objects
+ allocated by these libraries encode the device information for that allocation.
+ Numba-dpex extracts the information and uses it to compile a kernel for that
+ specific device and then executes the compiled kernel on it.
+
+ For a more detailed description of programming with numba-dpex, refer to the
+ :doc:`programming_model`, :doc:`user_guide/index` and the :doc:`autoapi/index`
+ sections of the documentation. To set up numba-dpex and try it out, refer to
+ the :doc:`getting_started` section.
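
The added text says that a kapi function runs sequentially when invoked from Python, with each work item in the range executing the same function. The following is a minimal sketch of that behavior in plain Python. It is not numba-dpex code and not the library's implementation: ``SketchItem`` and ``run_sequentially`` are hypothetical stand-ins for ``kapi.Item`` and the uncompiled ``call_kernel``, and nested lists replace dpnp arrays so that no device or extra package is assumed.

```python
import math
from itertools import product


class SketchItem:
    """Hypothetical stand-in for kapi.Item: stores one work item's index."""

    def __init__(self, index):
        self._index = tuple(index)

    def get_id(self, dim):
        # Index of this work item along dimension `dim`.
        return self._index[dim]


def run_sequentially(kernel_fn, global_range, *args):
    """Hypothetical launcher: call kernel_fn once per index, one at a time."""
    for index in product(*(range(n) for n in global_range)):
        kernel_fn(SketchItem(index), *args)


def pairwise_distance_kernel(item, data, distance):
    # Same logic as the kernel in the diff, written for nested lists.
    i = item.get_id(0)
    j = item.get_id(1)

    d = 0.0
    for k in range(len(data[0])):
        tmp = data[i][k] - data[j][k]
        d += tmp * tmp

    distance[j][i] = math.sqrt(d)


data = [[0.0, 0.0, 0.0], [3.0, 4.0, 0.0], [0.0, 0.0, 2.0]]
n = len(data)
distance = [[0.0] * n for _ in range(n)]

# One "work item" per (i, j) pair; each fills a single matrix entry.
run_sequentially(pairwise_distance_kernel, (n, n), data, distance)
```

Each call handles exactly one ``(i, j)`` pair. Because no call depends on another, the loop order does not matter, which is the property that lets a compiler hand the same function to parallel hardware.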