.. include:: ./../ext_links.txt

- Compiling and Offloading ``dpnp`` Functions
- ===========================================
+ Compiling and Offloading Mechanisms
+ ====================================
+
+ ``numba-dpex`` can directly compile and offload different data parallel
+ programming constructs and function libraries onto SYCL based devices.
+
+ ``dpnp`` Functions
+ -------------------

Data Parallel Extension for NumPy* (``dpnp``) is a drop-in ``NumPy*``
replacement library built on top of oneMKL. ``numba-dpex`` allows various
@@ -35,8 +41,8 @@ in the runtime and the function call is inlined in the generated LLVM IR.
The following sections go over aspects of the dpnp integration inside
numba-dpex.

- Repository map
- --------------
+ Repository Map
+ ---------------

- The code for numba-dpex's ``dpnp`` integration runtime resides in the
  :file:`numba_dpex/core/runtime` sub-module.
@@ -48,7 +54,7 @@ Repository map
- Tests reside in :file:`numba_dpex/tests/dpjit_tests/dpnp`.

Design
- ------
+ -------

``numba_dpex`` uses the |numba.extending.overload| decorator to create a Numba*
implementation of a function that can be used in `nopython mode`_ functions.
@@ -96,17 +102,72 @@ The corresponding intrinsic implementation is in :file:`numba_dpex/dpnp_iface/_i
...

Parallel Range
- --------------
+ ---------------
+
+ ``numba-dpex`` implements the ability to run loops in parallel. The language
+ construct is adapted from Numba*'s ``prange`` concept, which was initially
+ designed to run OpenMP parallel for loops. In Numba*, the loop body is scheduled
+ in separate threads that execute in a ``nopython`` Numba* context.
+ ``prange`` automatically takes care of data privatization. ``numba-dpex``
+ employs the ``prange`` compilation mechanism to offload parallel-loop-like
+ programming constructs onto SYCL-enabled devices.
+
+ The ``prange`` compilation pass is delegated through Numba's
+ :file:`numba/parfor/parfor_lowering.py` module, for which ``numba-dpex``
+ provides the :file:`numba_dpex/core/parfors/parfor_lowerer.py` module to be
+ used as the *lowering* mechanism through the
+ :py:class:`numba_dpex.core.parfors.parfor_lowerer.ParforLowerImpl` class. This
+ provides a custom lowerer for ``prange`` nodes that generates a SYCL kernel for
+ a ``prange`` node and submits it to a queue. Here is an example of a ``prange``
+ use case in a ``@dpjit`` context:
+
+ .. code-block:: python
+
+     from numba import prange
+     import dpnp
+     from numba_dpex import dpjit
+
+
+     @dpjit
+     def foo(a, b):
+         x = dpnp.ones(10)
+         for i in prange(10):
+             x[i] = a[i] + b[i]
+         return x
+
+
+     a = dpnp.ones(10)
+     b = dpnp.ones(10)
+
+     c = foo(a, b)
+     print(c)
+     print(type(c))
+
+ Each ``prange`` instruction in Numba* has an optional *lowerer* attribute. The
+ lowerer attribute determines how the parfor instruction should be lowered to
+ LLVM IR. In addition, the lowerer attribute decides which ``prange``
+ instructions can be fused together. At this point ``numba-dpex`` does not
+ generate device-specific code, and the lowerer used is the same for all device
+ types. However, a different
+ :py:class:`numba_dpex.core.parfors.parfor_lowerer.ParforLowerImpl`
+ instance is returned for every ``prange`` instruction for each corresponding CFD
+ (Compute Follows Data) inferred device to prevent illegal ``prange`` fusion.
+
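The fusion rule described above can be pictured with a small pure-Python toy model. This is only a sketch and not the numba-dpex implementation: ``ToyLowerer``, ``ToyParfor``, ``get_lowerer`` and ``fuse_adjacent`` are hypothetical names, and the model assumes that two adjacent parfor nodes may be fused only when they carry the same lowerer instance.

```python
from dataclasses import dataclass


class ToyLowerer:
    """Hypothetical stand-in for a per-device parfor lowerer."""

    def __init__(self, device):
        self.device = device


# One lowerer instance is cached per device, so parfors that target the
# same device share an instance and parfors on different devices do not.
_lowerer_cache = {}


def get_lowerer(device):
    if device not in _lowerer_cache:
        _lowerer_cache[device] = ToyLowerer(device)
    return _lowerer_cache[device]


@dataclass
class ToyParfor:
    name: str
    lowerer: ToyLowerer


def fuse_adjacent(parfors):
    """Group consecutive parfors that share the same lowerer instance."""
    groups = []
    for p in parfors:
        if groups and groups[-1][-1].lowerer is p.lowerer:
            groups[-1].append(p)  # same device: candidate for fusion
        else:
            groups.append([p])  # different device: start a new group
    return [[p.name for p in g] for g in groups]


parfors = [
    ToyParfor("p0", get_lowerer("gpu")),
    ToyParfor("p1", get_lowerer("gpu")),
    ToyParfor("p2", get_lowerer("cpu")),
]
fused = fuse_adjacent(parfors)
```

In this toy model ``p0`` and ``p1`` end up in one group because they share a lowerer, while ``p2`` stays separate because its lowerer belongs to a different device.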

- ``numba-dpex`` implements the ability to run loops in parallel,
- similar to OpenMP parallel for loops and Numba*’s ``prange``. The loop-
- body is scheduled in seperate threads, and they execute in a ``nopython`` numba
- context. ``prange`` automatically takes care of data privatization:
+ Fusion of Kernels
+ ------------------

+ ``numba-dpex`` can identify each NumPy* (or ``dpnp``) array expression as a
+ data-parallel kernel and fuse them together to generate a single SYCL kernel.
+ The kernel is automatically offloaded to the specified device where the fusion
+ operation is invoked. Here is a simple example of a Black-Scholes formula
+ computation where kernel fusion occurs across different ``dpnp`` math functions:

+ .. literalinclude:: ./../../../numba_dpex/examples/blacksholes_njit.py
+     :language: python
+     :pyobject: blackscholes
+     :caption: **EXAMPLE:** Black-Scholes formula computation with fused ``dpnp`` math functions
+     :name: blackscholes_dpjit
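
As a rough illustration of what fusing array expressions means, the sketch below uses plain Python lists instead of ``dpnp`` arrays and does not reflect the code that numba-dpex generates. The function names ``unfused`` and ``fused`` are hypothetical. The unfused version runs one loop per elementwise operation and materializes a temporary, whereas the fused version computes the same result in a single loop.

```python
import math


def unfused(a, b):
    # Each array expression is treated as its own "kernel" with its own loop.
    tmp = [x * y for x, y in zip(a, b)]  # kernel 1: tmp = a * b
    return [math.sqrt(t) + 1.0 for t in tmp]  # kernel 2: sqrt(tmp) + 1


def fused(a, b):
    # A single loop evaluates both expressions per element, with no temporary.
    return [math.sqrt(x * y) + 1.0 for x, y in zip(a, b)]


a = [1.0, 4.0, 9.0]
b = [1.0, 4.0, 9.0]
result = fused(a, b)
```

Both functions return the same values. The fused form avoids the intermediate list, which is the kind of saving that kernel fusion aims for on a device.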

- - prange, reduction prange
- - blackscholes, math example

.. |numba.extending.overload| replace:: ``numba.extending.overload``
.. |numba.extending.intrinsic| replace:: ``numba.extending.intrinsic``