
Commit 8e3b63d

Author: Diptorup Deb

Merge pull request #1388 from IntelPython/docs/programming_model
[Documentation] Programming Model, Kernel Programming guide

2 parents: f28b064 + f82360b

19 files changed: 1180 additions, 464 deletions

docs/source/bibliography.bib

Lines changed: 26 additions & 0 deletions

@@ -0,0 +1,26 @@
+
+
+@techreport{scott70,
+  author      = {Dana Scott},
+  institution = {OUCL},
+  month       = {November},
+  number      = {PRG02},
+  pages       = {30},
+  title       = {OUTLINE OF A MATHEMATICAL THEORY OF COMPUTATION},
+  year        = {1970}
+}
+
+@article{PLOTKIN20043,
+  abstract = {We review the origins of structural operational semantics. The main publication `A Structural Approach to Operational Semantics,' also known as the `Aarhus Notes,' appeared in 1981 [G.D. Plotkin, A structural approach to operational semantics, DAIMI FN-19, Computer Science Department, Aarhus University, 1981]. The development of the ideas dates back to the early 1970s, involving many people and building on previous work on programming languages and logic. The former included abstract syntax, the SECD machine, and the abstract interpreting machines of the Vienna school; the latter included the λ-calculus and formal systems. The initial development of structural operational semantics was for simple functional languages, more or less variations of the λ-calculus; after that the ideas were gradually extended to include languages with parallel features, such as Milner's CCS. This experience set the ground for a more systematic exposition, the subject of an invited course of lectures at Aarhus University; some of these appeared in print as the 1981 Notes. We discuss the content of these lectures and some related considerations such as `small state' versus `grand state,' structural versus compositional semantics, the influence of the Scott–Strachey approach to denotational semantics, the treatment of recursion and jumps, and static semantics. We next discuss relations with other work and some immediate further development. We conclude with an account of an old, previously unpublished, idea: an alternative, perhaps more readable, graphical presentation of systems of rules for operational semantics.},
+  author   = {Gordon D Plotkin},
+  doi      = {https://doi.org/10.1016/j.jlap.2004.03.009},
+  issn     = {1567-8326},
+  journal  = {The Journal of Logic and Algebraic Programming},
+  keywords = {Semantics of programming languages, (Structural) operational semantics, Structural induction, (Labelled) transition systems, λ-calculus, Concurrency, Big step semantics, Small-step semantics, Abstract machines, Static semantics},
+  note     = {Structural Operational Semantics},
+  pages    = {3-15},
+  title    = {The origins of structural operational semantics},
+  url      = {https://www.sciencedirect.com/science/article/pii/S1567832604000268},
+  volume   = {60-61},
+  year     = {2004}
+}
docs/source/conf.py

Lines changed: 3 additions & 0 deletions

@@ -31,8 +31,11 @@
     "sphinxcontrib.googleanalytics",
     "myst_parser",
     "autoapi.extension",
+    "sphinxcontrib.bibtex",
 ]
 
+bibtex_bibfiles = ["bibliography.bib"]
+
 # Add any paths that contain templates here, relative to this directory.
 # templates_path = ['_templates']
 templates_path = []
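With ``sphinxcontrib.bibtex`` registered and ``bibtex_bibfiles`` pointing at ``bibliography.bib`` as above, entries from the new bib file can be cited from any page of the docs. A minimal sketch of the usage (the citing sentence below is hypothetical, not part of this commit):

```rst
The notes build on early work on operational semantics :cite:`PLOTKIN20043`
and on Scott's theory of computation :cite:`scott70`.

.. bibliography::
```

The ``.. bibliography::`` directive renders the list of cited entries in place; pages that use the ``:cite:`` role link back to it.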

docs/source/ext_links.txt

Lines changed: 7 additions & 1 deletion

@@ -2,7 +2,7 @@
 **********************************************************
 THESE ARE EXTERNAL PROJECT LINKS USED IN THE DOCUMENTATION
 **********************************************************
-
+.. _math: https://docs.python.org/3/library/math.html
 .. _NumPy*: https://numpy.org/
 .. _Numba*: https://numba.pydata.org/
 .. _numba-dpex: https://github.com/IntelPython/numba-dpex
@@ -14,6 +14,7 @@
 .. _SYCL*: https://www.khronos.org/sycl/
 .. _dpctl: https://intelpython.github.io/dpctl/latest/index.html
 .. _Data Parallel Control: https://intelpython.github.io/dpctl/latest/index.html
+.. _DLPack: https://dmlc.github.io/dlpack/latest/
 .. _Dpnp: https://intelpython.github.io/dpnp/
 .. _dpnp: https://intelpython.github.io/dpnp/
 .. _Data Parallel Extension for Numpy*: https://intelpython.github.io/dpnp/
@@ -28,3 +29,8 @@
 .. _oneDPL: https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-library.html#gs.5izf63
 .. _UXL: https://uxlfoundation.org/
 .. _oneAPI GPU optimization guide: https://www.intel.com/content/www/us/en/docs/oneapi/optimization-guide-gpu/2024-0/general-purpose-computing-on-gpu.html
+.. _dpctl.tensor.usm_ndarray: https://intelpython.github.io/dpctl/latest/docfiles/dpctl/usm_ndarray.html#dpctl.tensor.usm_ndarray
+.. _dpnp.ndarray: https://intelpython.github.io/dpnp/reference/ndarray.html
+
+.. _Dispatcher: https://numba.readthedocs.io/en/stable/reference/jit-compilation.html#dispatcher-objects
+.. _Unboxes: https://numba.readthedocs.io/en/stable/extending/interval-example.html#boxing-and-unboxing

docs/source/overview.rst

Lines changed: 84 additions & 46 deletions

@@ -6,33 +6,38 @@ Overview
 
 Data Parallel Extension for Numba* (`numba-dpex`_) is a free and open-source
 LLVM-based code generator for portable accelerator programming in Python. The
-code generator implements a new pseudo-kernel programming domain-specific
-language (DSL) called `KAPI` that is modeled after the C++ DSL `SYCL*`_. The
-SYCL language is an open standard developed under the Unified Acceleration
-Foundation (`UXL`_) as a vendor-agnostic way of programming different types of
-data-parallel hardware such as multi-core CPUs, GPUs, and FPGAs. Numba-dpex and
-KAPI aim to bring the same vendor-agnostic and standard-compliant programming
-model to Python.
+code generator implements a new kernel programming API (kapi) in pure Python
+that is modeled after the API of the C++ embedded domain-specific language
+(eDSL) `SYCL*`_. The SYCL eDSL is an open standard developed under the Unified
+Acceleration Foundation (`UXL`_) as a vendor-agnostic way of programming
+different types of data-parallel hardware such as multi-core CPUs, GPUs, and
+FPGAs. Numba-dpex and kapi aim to bring the same vendor-agnostic and
+standard-compliant programming model to Python.
 
 Numba-dpex is built on top of the open-source `Numba*`_ JIT compiler that
 implements a CPython bytecode parser and code generator to lower the bytecode to
-LLVM IR. The Numba* compiler is able to compile a large sub-set of Python and
-most of the NumPy library. Numba-dpex uses Numba*'s tooling to implement the
-parsing and typing support for the data types and functions defined in the KAPI
-DSL. A custom code generator is then used to lower KAPI to a form of LLVM IR
-that includes special LLVM instructions that define a low-level data-parallel
-kernel API. Thus, a function defined in KAPI is compiled to a data-parallel
-kernel that can run on different types of hardware. Currently, compilation of
-KAPI is possible for x86 CPU devices, Intel Gen9 integrated GPUs, Intel UHD
-integrated GPUs, and Intel discrete GPUs.
-
-
-The following example shows a pairwise distance matrix computation in KAPI.
+LLVM intermediate representation (IR). The Numba* compiler is able to compile a
+large sub-set of Python and most of the NumPy library. Numba-dpex uses Numba*'s
+tooling to implement the parsing and the typing support for the data types and
+functions defined in kapi. A custom code generator is also introduced to lower
+kapi functions to a form of LLVM IR that defines a low-level data-parallel
+kernel. Thus, a function written in kapi, although purely sequential when
+executed in Python, can be compiled to an actual data-parallel kernel that can
+run on different types of hardware. Compilation of kapi is possible for x86 CPU
+devices, Intel Gen9 integrated GPUs, Intel UHD integrated GPUs, and Intel
+discrete GPUs.
+
+The following example presents a pairwise distance matrix computation as
+written in kapi. A detailed description of the API and all relevant concepts
+is given elsewhere in the documentation; for now, the example introduces the
+core tenet of the programming model.
 
 .. code-block:: python
+    :linenos:
 
     from numba_dpex import kernel_api as kapi
     import math
+    import dpnp
 
 
     def pairwise_distance_kernel(item: kapi.Item, data, distance):
@@ -49,41 +54,74 @@ The following example presents a pairwise distance matrix computation in kapi.
         distance[j, i] = math.sqrt(d)
 
 
-Skipping over much of the language details, at a high-level the
-``pairwise_distance_kernel`` can be viewed as a data-parallel function that gets
-executed individually by a set of "work items". That is, each work item runs the
-same function for a subset of the elements of the input ``data`` and
-``distance`` arrays. For programmers familiar with the CUDA or OpenCL languages,
-it is the same programming model that is referred to as Single Program Multiple
-Data (SPMD). As Python has no concept of a work item the KAPI function itself is
-sequential and needs to be compiled to convert it into a parallel version. The
-next example shows the changes to the original script to compile and run the
+    data = dpnp.random.ranf((10000, 3), device="gpu")
+    dist = dpnp.empty(shape=(data.shape[0], data.shape[0]), device="gpu")
+    exec_range = kapi.Range(data.shape[0], data.shape[0])
+    kapi.call_kernel(pairwise_distance_kernel, exec_range, data, dist)
+
+The ``pairwise_distance_kernel`` function conceptually defines a data-parallel
+function to be executed individually by a set of "work items". That is, each
+work item runs the function for a subset of the elements of the input ``data``
+and ``distance`` arrays. The ``item`` argument passed to the function
+identifies the work item that is executing a specific instance of the function.
+The set of work items is defined by the ``exec_range`` object, and the
+``call_kernel`` call instructs every work item in ``exec_range`` to execute
+``pairwise_distance_kernel`` for a specific subset of the data.
+
+The logical abstraction exposed by kapi is referred to as the Single Program
+Multiple Data (SPMD) programming model. CUDA or OpenCL programmers will
+recognize it as similar to the model in those languages. However, as Python
+has no concept of a work item, a kapi function executes sequentially when
+invoked from Python. To convert it into a true data-parallel function, the
+function first has to be compiled using numba-dpex. The next example shows the
+changes to the original script needed to compile and run the
 ``pairwise_distance_kernel`` in parallel.
 
 .. code-block:: python
+    :linenos:
+    :emphasize-lines: 7, 25
+
+    import numba_dpex as dpex
 
-    from numba_dpex import kernel, call_kernel
+    from numba_dpex import kernel_api as kapi
+    import math
     import dpnp
 
+
+    @dpex.kernel
+    def pairwise_distance_kernel(item: kapi.Item, data, distance):
+        i = item.get_id(0)
+        j = item.get_id(1)
+
+        data_dims = data.shape[1]
+
+        d = data.dtype.type(0.0)
+        for k in range(data_dims):
+            tmp = data[i, k] - data[j, k]
+            d += tmp * tmp
+
+        distance[j, i] = math.sqrt(d)
+
+
    data = dpnp.random.ranf((10000, 3), device="gpu")
-   distance = dpnp.empty(shape=(data.shape[0], data.shape[0]), device="gpu")
+   dist = dpnp.empty(shape=(data.shape[0], data.shape[0]), device="gpu")
    exec_range = kapi.Range(data.shape[0], data.shape[0])
-   call_kernel(kernel(pairwise_distance_kernel), exec_range, data, distance)
 
-To compile a KAPI function into a data-parallel kernel and run it on a device,
-three things need to be done: allocate the arguments to the function on the
-device where the function is to execute, compile the function by applying a
-numba-dpex decorator, and `launch` or execute the compiled kernel on the device.
+   dpex.call_kernel(pairwise_distance_kernel, exec_range, data, dist)
 
-Allocating arrays or scalars to be passed to a compiled KAPI function is not
-done directly in numba-dpex. Instead, numba-dpex supports passing in
+To compile a kapi function, the ``call_kernel`` function from kapi has to be
+substituted by the one provided in ``numba_dpex``, and the ``kernel`` decorator
+has to be added to the kapi function. The actual device for which the function
+is compiled and on which it executes is controlled by the input arguments to
+``call_kernel``. Allocating the input arguments to be passed to a compiled
+kapi function is not done by numba-dpex. Instead, numba-dpex supports passing
+in
 tensors/ndarrays created using either the `dpnp`_ NumPy drop-in replacement
-library or the `dpctl`_ SYCl-based Python Array API library. To trigger
-compilation, the ``numba_dpex.kernel`` decorator has to be used, and finally to
-launch a compiled kernel the ``numba_dpex.call_kernel`` function should be
-invoked.
-
-For a more detailed description about programming with numba-dpex, refer
-the :doc:`programming_model`, :doc:`user_guide/index` and the
-:doc:`autoapi/index` sections of the documentation. To setup numba-dpex and try
-it out refer the :doc:`getting_started` section.
+library or the `dpctl`_ SYCL-based Python Array API library. The objects
+allocated by these libraries encode the device information for that
+allocation. Numba-dpex extracts the information and uses it to compile a
+kernel for that specific device and then executes the compiled kernel on it.
+
+For a more detailed description of programming with numba-dpex, refer to the
+:doc:`programming_model`, :doc:`user_guide/index` and :doc:`autoapi/index`
+sections of the documentation. To set up numba-dpex and try it out, refer to
+the :doc:`getting_started` section.
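As the new overview text notes, a kapi function is purely sequential when invoked from Python. The sequential semantics can be sketched with plain NumPy, so the snippet runs without dpnp, numba-dpex, or a SYCL device. The kernel body is taken from the diff; the nested ``(i, j)`` loops below are a stand-in for the ``kapi.Range(n, n)`` execution range and are not part of the numba-dpex API.

```python
# Sequential sketch of what each kapi "work item" computes, using plain
# NumPy in place of dpnp. The kernel body mirrors the one in the diff;
# (i, j) are passed in directly instead of being read from a kapi.Item.
import math

import numpy as np


def pairwise_distance_kernel(i, j, data, distance):
    data_dims = data.shape[1]
    d = data.dtype.type(0.0)
    for k in range(data_dims):
        tmp = data[i, k] - data[j, k]
        d += tmp * tmp
    distance[j, i] = math.sqrt(d)


rng = np.random.default_rng(0)
data = rng.random((4, 3))
dist = np.empty((4, 4))

# Emulate call_kernel over a 2D range: one invocation per work item.
for i in range(data.shape[0]):
    for j in range(data.shape[0]):
        pairwise_distance_kernel(i, j, data, dist)

# Cross-check against a vectorized NumPy computation of the same matrix.
expected = np.sqrt(((data[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1))
assert np.allclose(dist, expected)
```

Because every ``(i, j)`` invocation writes a distinct element of ``dist``, the iterations are independent, which is exactly what allows numba-dpex to execute them in parallel as work items.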
