Commit 9b356f9

chudur-budur authored and Diptorup Deb committed

Major edits from huddle

1 parent 2ad4ad7 · commit 9b356f9

File tree

10 files changed: +316 −70 lines changed
File renamed without changes.
File renamed without changes.

docs/source/overview.rst (34 additions, 33 deletions)

@@ -1,20 +1,21 @@
-.. _overview
+.. _overview:
 .. include:: ./ext_links.txt
 
 Overview
 ========
 
-Data-Parallel Extensions for Numba* (`numba-dpex`_) is a standalone extension
-for the `Numba*`_ Python JIT compiler. Numba-dpex adds two new features to
-Numba: an architecture-agnostic kernel programming API, and a new compilation
-target that adds typing and compilation support for the `dpnp`_ library. Dpnp is
-a Python library for numerical computing that provides a data-parallel
-reimplementation of `NumPy*`_'s API. Numba-dpex's support for dpnp compilation
-is a new way for Numba users to write code in a NumPy-like API that is
-already supported by Numba, while at the same time automatically running such
-code in parallel on various types of architecture.
-
-Numba-dpex is being developed as part of `Intel AI Analytics Toolkit`_ and is
+Data Parallel Extension for Numba* (`numba-dpex`_) is a standalone extension for
+the `Numba*`_ Python JIT compiler. ``numba-dpex`` adds two new features to
+Numba*: an architecture-agnostic kernel programming API, and a new compilation
+target that adds typing and compilation support for the Data Parallel Extension
+for Numpy* (`dpnp`_) library. ``dpnp`` is a Python package for numerical
+computing that provides a data-parallel reimplementation of `NumPy*`_'s API.
+``numba-dpex``'s support for ``dpnp`` compilation gives Numba* users a new way
+to write code in a NumPy-like API that is already supported by Numba*, while at
+the same time automatically running such code in parallel on various types of
+architecture.
+
+``numba-dpex`` is being developed as part of `Intel AI Analytics Toolkit`_ and is
 distributed with the `Intel Distribution for Python*`_. The extension is also
 available on Anaconda cloud and as a Docker image on GitHub. Please refer to the
 :doc:`getting_started` page to learn more.
@@ -27,7 +28,7 @@ Portable Kernel Programming
 
 The kernel API has a design and API similar to Numba's ``cuda.jit`` module.
 However, the API uses the `SYCL*`_ language runtime and as such is extensible to
-various hardware types supported by a SYCL runtime. Presently, numba-dpex uses
+various hardware types supported by a SYCL runtime. Presently, ``numba-dpex`` uses
 the `DPC++`_ SYCL runtime and only supports SPIR-V-based OpenCL and `oneAPI
 Level Zero`_ CPU and GPU devices.
@@ -54,30 +55,30 @@ interface.
 
     print(c)
 
 In the above example, we allocated three arrays on a default ``gpu`` device
-using the dpnp library. These arrays are then passed as input arguments to the
+using the ``dpnp`` library. These arrays are then passed as input arguments to the
 kernel function. The compilation target and the subsequent execution of the
 kernel are determined completely by the input arguments and follow the
 "compute-follows-data" programming model as specified in the `Python* Array API
 Standard`_. To change the execution target to a CPU, the device keyword needs to
-be changed to ``cpu`` when allocating the dpnp arrays. It is also possible to
-leave the ``device`` keyword undefined and let the dpnp library select a default
+be changed to ``cpu`` when allocating the ``dpnp`` arrays. It is also possible to
+leave the ``device`` keyword undefined and let the ``dpnp`` library select a default
 device based on environment flag settings. Refer to the
 :doc:`user_manual/kernel_programming/index` for further details.
 
-dpnp compilation and offload
+``dpnp`` compilation and offload
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Numba-dpex extends Numba's type system and compilation pipeline to compile dpnp
+``numba-dpex`` extends Numba's type system and compilation pipeline to compile ``dpnp``
 functions and expressions in the same way as NumPy. Unlike Numba's NumPy
-compilation that is serial by default, numba-dpex always compiles dpnp
+compilation, which is serial by default, ``numba-dpex`` always compiles ``dpnp``
 expressions into offloadable kernels and executes them in parallel. The feature
 is provided using a decorator ``dpjit`` that behaves identically to
-``numba.njit(parallel=True)`` with the addition of dpnp compilation and offload.
-Offloading by numba-dpex is not just restricted to CPUs and supports all devices
+``numba.njit(parallel=True)`` with the addition of ``dpnp`` compilation and offload.
+Offloading by ``numba-dpex`` is not restricted to CPUs and supports all devices
 that are presently supported by the kernel API. ``dpjit`` allows using NumPy and
-dpnp expressions in the same function. All NumPy compilation and parallelization
-is done via the default Numba code-generation pipeline, whereas dpnp expressions
-are compiled using the numba-dpex pipeline.
+``dpnp`` expressions in the same function. All NumPy compilation and parallelization
+is done via the default Numba code-generation pipeline, whereas ``dpnp`` expressions
+are compiled using the ``numba-dpex`` pipeline.
 
 The vector addition example depicted using the kernel API can be easily
 expressed in several different ways using ``dpjit``.
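The "compute-follows-data" dispatch described in this hunk can be sketched in plain Python. The ``Array`` class and ``launch`` helper below are illustrative stand-ins invented for this sketch, not the ``numba-dpex`` or ``dpnp`` API; the point is only that the execution device is inferred from the inputs rather than passed at launch:

```python
# Minimal sketch of "compute-follows-data": the execution device is
# inferred from the input arrays, never passed to the kernel launch.
# Array and launch() are hypothetical stand-ins, not numba-dpex API.
from dataclasses import dataclass


@dataclass
class Array:
    data: list
    device: str = "gpu"  # a dpnp array carries its allocation device


def launch(kernel, *arrays):
    devices = {a.device for a in arrays}
    if len(devices) != 1:
        # Mixing devices violates the compute-follows-data model
        raise ValueError("all inputs must live on the same device")
    kernel(*arrays)
    return devices.pop()  # the device the kernel executed on


def vector_sum(a, b, c):
    for i in range(len(c.data)):
        c.data[i] = a.data[i] + b.data[i]


a = Array([1, 2, 3], device="cpu")
b = Array([4, 5, 6], device="cpu")
c = Array([0, 0, 0], device="cpu")
print(launch(vector_sum, a, b, c))  # cpu
print(c.data)  # [5, 7, 9]
```

Changing ``device="cpu"`` to ``device="gpu"`` at allocation time is all that retargeting requires, which mirrors how the documented example switches targets.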
@@ -105,32 +106,32 @@ expressed in several different ways using ``dpjit``.
 
         c[i] = a[i] + b[i]
     return c
 
-As with the kernel API example, a ``dpjit`` function if invoked with dpnp
+As with the kernel API example, a ``dpjit`` function, if invoked with ``dpnp``
 input arguments, follows the compute-follows-data programming model. Refer to
 :doc:`user_manual/dpnp_offload/index` for further details.
 
 
-Project Goal
-------------
+.. Project Goal
+.. ------------
 
-If C++ is not your language, you can skip writing data-parallel kernels in SYCL
-and directly write them in Python.
+.. If C++ is not your language, you can skip writing data-parallel kernels in SYCL
+.. and directly write them in Python.
 
-Our package numba-dpex extends the Numba compiler to allow kernel creation
-directly in Python via a custom compute API
+.. Our package ``numba-dpex`` extends the Numba compiler to allow kernel creation
+.. directly in Python via a custom compute API
 
 
 .. Contributing
 .. ------------
 
 .. Refer the `contributing guide
 .. <https://github.com/IntelPython/numba-dpex/blob/main/CONTRIBUTING>`_ for
-.. information on coding style and standards used in numba-dpex.
+.. information on coding style and standards used in ``numba-dpex``.
 
 .. License
 .. -------
 
-.. Numba-dpex is Licensed under Apache License 2.0 that can be found in `LICENSE
+.. ``numba-dpex`` is licensed under Apache License 2.0, which can be found in `LICENSE
 .. <https://github.com/IntelPython/numba-dpex/blob/main/LICENSE>`_. All usage and
 .. contributions to the project are subject to the terms and conditions of this
 .. license.
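For readers without a SYCL device at hand, the semantics of the ``dpjit`` vector-addition pattern can be sketched with plain NumPy. Here ``dpnp`` arrays are swapped for ``numpy`` arrays and ``numba.prange`` for ``range``, so this shows only *what* the offloaded code computes, not how ``numba-dpex`` compiles it:

```python
import numpy as np


# NumPy stand-in for the dpjit example: dpnp.ndarray -> numpy.ndarray,
# numba.prange -> range. A real dpjit function would compile this loop
# into an offloadable kernel and run the iterations in parallel.
def vecadd(a, b):
    c = np.empty_like(a)
    for i in range(len(c)):  # prange in the dpjit version
        c[i] = a[i] + b[i]
    return c


a = np.arange(5.0)
b = np.full(5, 10.0)
print(vecadd(a, b))  # [10. 11. 12. 13. 14.]
```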

docs/source/user_guide/dpnp_offload/index.rst (2 additions, 1 deletion)

@@ -4,4 +4,5 @@
 Compiling and Offloading DPNP
 ==============================
 
-TODO
+- prange, reduction prange
+- blackscholes, math example
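As a placeholder for the planned ``prange`` reduction example, the shape such code takes under ``dpjit`` can be sketched in plain Python. The ``dpjit`` decorator and ``prange`` below are emulated by no-op stand-ins so the sketch runs without ``numba-dpex`` installed; the loop structure, not the decorator, is what the planned example would demonstrate:

```python
# Shape of a prange reduction as it would appear under dpjit. The
# decorator and prange are emulated so the sketch runs anywhere;
# they are NOT the numba_dpex.dpjit / numba.prange implementations.
def dpjit(func):  # stand-in for numba_dpex.dpjit
    return func


prange = range  # stand-in for numba.prange


@dpjit
def sum_reduce(a):
    s = 0.0
    for i in prange(len(a)):  # dpjit parallelizes this reduction over s
        s += a[i]
    return s


print(sum_reduce([1.0, 2.0, 3.0, 4.0]))  # 10.0
```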

docs/source/user_guide/index.rst (1 addition, 0 deletions)

@@ -12,6 +12,7 @@ User Guide
 .. toctree::
    :maxdepth: 2
 
+   programming_model.rst
    kernel_programming/index
    dpnp_offload/index
    debugging/index

docs/source/user_guide/kernel_programming/index.rst (0 additions, 2 deletions)

@@ -44,11 +44,9 @@ hardware vendors.
    :maxdepth: 2
 
    writing_kernels
-   memory-management
    synchronization
    device-functions
    atomic-operations
-   selecting_device
   memory_allocation_address_space
    reduction
    ufunc

docs/source/user_guide/kernel_programming/synchronization.rst (9 additions, 10 deletions)

@@ -1,7 +1,7 @@
 Synchronization Functions
 =========================
 
-Numba-dpex only supports some of the SYCL synchronization operations. For
+``numba-dpex`` only supports some of the SYCL synchronization operations. For
 synchronization of all threads in the same thread block, numba-dpex provides
 a helper function called ``numba_dpex.barrier()``. This function implements the
 same pattern as barriers in traditional multi-threaded programming: invoking the

@@ -10,23 +10,22 @@ barrier, at which point it returns control to all its callers.
 
 ``numba_dpex.barrier()`` supports two memory fence options:
 
-- ``numba_dpex.CLK_GLOBAL_MEM_FENCE``: The barrier function will queue a memory
+- ``numba_dpex.GLOBAL_MEM_FENCE``: The barrier function will queue a memory
   fence to ensure correct ordering of memory operations to global memory. Using
   this option can be useful when work-items, for example, write to buffer or
   image objects and then want to read the updated data. Passing no arguments to
   ``numba_dpex.barrier()`` is equivalent to setting the global memory fence
-  option. For example,
+  option.
 
-  .. literalinclude:: ./../../../../numba_dpex/examples/barrier.py
-     :pyobject: no_arg_barrier_support
+  .. .. literalinclude:: ./../../../../numba_dpex/examples/barrier.py
+  ..    :pyobject: no_arg_barrier_support
 
-- ``numba_dpex.CLK_LOCAL_MEM_FENCE``: The barrier function will either flush
+- ``numba_dpex.LOCAL_MEM_FENCE``: The barrier function will either flush
   any variables stored in local memory or queue a memory fence to ensure
-  correct ordering of memory operations to local memory. For example,
-
-  .. literalinclude:: ./../../../../numba_dpex/examples/barrier.py
-     :pyobject: local_memory
+  correct ordering of memory operations to local memory.
 
+  .. .. literalinclude:: ./../../../../numba_dpex/examples/barrier.py
+  ..    :pyobject: local_memory
 
 .. note::

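The wait-until-all-arrive pattern that ``numba_dpex.barrier()`` implements on a device can be illustrated with Python's standard ``threading.Barrier``. This is a host-side analogy for the concept only, not the ``numba_dpex`` device API and not a model of its memory-fence options:

```python
import threading

# Host-side analogy for a work-group barrier: every worker blocks at the
# barrier until all workers have arrived, then all of them proceed.
NUM_WORKERS = 4
barrier = threading.Barrier(NUM_WORKERS)
phase_log = []
lock = threading.Lock()


def worker(wid):
    with lock:
        phase_log.append("before")
    barrier.wait()  # no worker passes until all NUM_WORKERS have arrived
    with lock:
        phase_log.append("after")


threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every "before" entry precedes every "after" entry, regardless of scheduling.
print(all(p == "before" for p in phase_log[:NUM_WORKERS]))  # True
```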
docs/source/user_guide/kernel_programming/writing_kernels.rst (25 additions, 24 deletions)

@@ -30,40 +30,39 @@ storing the result of vector summation:
    :name: ex_kernel_declaration_vector_sum
 
 
-Kernel Invocation
-------------------
+.. Kernel Invocation
+.. ------------------
 
-When a kernel is launched you must specify the *global size* and the *local size*,
-which determine the hierarchy of threads, that is, the order in which kernels
-will be invoked.
+.. When a kernel is launched you must specify the *global size* and the *local size*,
+.. which determine the hierarchy of threads, that is, the order in which kernels
+.. will be invoked.
 
-The following syntax is used in ``numba-dpex`` for kernel invocation with
-specified global and local sizes:
+.. The following syntax is used in ``numba-dpex`` for kernel invocation with
+.. specified global and local sizes:
 
-``kernel_function_name[global_size, local_size](kernel arguments)``
+.. ``kernel_function_name[global_size, local_size](kernel arguments)``
 
-In the following example we invoke the kernel ``kernel_vector_sum`` with a global size
-specified via the variable ``global_size``, and use the ``numba_dpex.DEFAULT_LOCAL_SIZE``
-constant to set the local size to a default value:
+.. In the following example we invoke the kernel ``kernel_vector_sum`` with a global size
+.. specified via the variable ``global_size``, and use the ``numba_dpex.DEFAULT_LOCAL_SIZE``
+.. constant to set the local size to a default value:
 
-.. code-block:: python
+.. .. code-block:: python
 
-    import numba_dpex as ndpx
+..     import numba_dpex as ndpx
 
-    global_size = 10
-    kernel_vector_sum[global_size, ndpx.DEFAULT_LOCAL_SIZE](a, b, c)
+..     global_size = 10
+..     kernel_vector_sum[global_size, ndpx.DEFAULT_LOCAL_SIZE](a, b, c)
 
-.. note::
-    Each kernel is compiled once, but it can be called multiple times with different global and local size settings.
+.. .. note::
+..     Each kernel is compiled once, but it can be called multiple times with different global and local size settings.
 
 
-Kernel Invocation (New Syntax)
-------------------------------
+Kernel Invocation
+------------------
 
-Since the release 0.20.0 (Phoenix), we have introduced new kernel launch
-parameter syntax for specifying global and local sizes that is similar to
-``SYCL``'s ``range`` and ``ndrange`` classes. The global and local sizes can
-now be specified with ``numba_dpex``'s ``Range`` and ``NdRange`` classes.
+The kernel launch parameter syntax for specifying global and local sizes is
+similar to ``SYCL``'s ``range`` and ``ndrange`` classes. The global and local
+sizes need to be specified with ``numba_dpex``'s ``Range`` and ``NdRange`` classes.
 
 For example, we have the following kernel that computes a sum of two vectors:
 
@@ -79,7 +78,7 @@ it like this (where ``global_size`` is an ``int``):
 
 .. literalinclude:: ./../../../../numba_dpex/examples/kernel/vector_sum.py
    :language: python
    :lines: 8-9, 18-24
-   :emphasize-lines: 3
+   :emphasize-lines: 5
    :caption: **EXAMPLE:** A vector sum kernel with a global size/range
    :name: vector_sum_kernel_with_launch_param
 
@@ -105,6 +104,8 @@ and ``args.l`` are ``int``):
    :name: pairwise_distance_kernel_with_launch_param
 
 
+
+
 Kernel Indexing Functions
 -------------------------

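The ``kernel[Range(n)](args...)`` launch syntax described in this hunk can be sketched with a small host-side emulator. The ``Range`` and ``Kernel`` classes below are simplified stand-ins written for this sketch (in the real API the global index comes from an indexing function, not a parameter), not the actual ``numba_dpex`` implementation:

```python
# Sequential host-side emulation of the kernel[Range(n)](a, b, c) launch
# syntax. Range and Kernel are simplified stand-ins, not numba_dpex classes.
class Range:
    def __init__(self, *dims):
        self.dims = dims


class Kernel:
    def __init__(self, func):
        self.func = func

    def __getitem__(self, launch_range):
        # kernel[Range(n)] returns a launcher bound to that global range
        def launcher(*args):
            for gid in range(launch_range.dims[0]):  # 1-D case only
                self.func(gid, *args)  # gid plays the role of get_global_id(0)
        return launcher


def vector_sum_body(i, a, b, c):
    c[i] = a[i] + b[i]


vector_sum = Kernel(vector_sum_body)
a, b, c = [1, 2, 3], [4, 5, 6], [0, 0, 0]
vector_sum[Range(3)](a, b, c)  # mirrors vector_sum[ndpx.Range(3)](a, b, c)
print(c)  # [5, 7, 9]
```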
0 commit comments