Commit 9b356f9

chudur-budur authored and Diptorup Deb committed

Major edits from huddle

1 parent 2ad4ad7 · commit 9b356f9

File tree

10 files changed: +316 −70 lines changed
File renamed without changes.
File renamed without changes.

docs/source/overview.rst (34 additions, 33 deletions)

@@ -1,20 +1,21 @@
-.. _overview
+.. _overview:
 .. include:: ./ext_links.txt
 
 Overview
 ========
 
-Data-Parallel Extensions for Numba* (`numba-dpex`_) is a standalone extension
-for the `Numba*`_ Python JIT compiler. Numba-dpex adds two new features to
-Numba: an architecture-agnostic kernel programming API, and a new compilation
-target that adds typing and compilation support for the `dpnp`_ library. Dpnp is
-a Python library for numerical computing that provides a data-parallel
-reimplementation of `NumPy*`_'s API. Numba-dpex's support for dpnp compilation
-is a new way for Numba users to write code in a NumPy-like API that is
-already supported by Numba, while at the same time automatically running such
-code in parallel on various types of architecture.
-
-Numba-dpex is being developed as part of `Intel AI Analytics Toolkit`_ and is
+Data Parallel Extension for Numba* (`numba-dpex`_) is a standalone extension for
+the `Numba*`_ Python JIT compiler. ``numba-dpex`` adds two new features to
+Numba*: an architecture-agnostic kernel programming API, and a new compilation
+target that adds typing and compilation support for the Data Parallel Extension
+for Numpy* (`dpnp`_) library. ``dpnp`` is a Python package for numerical
+computing that provides a data-parallel reimplementation of `NumPy*`_'s API.
+``numba-dpex``'s support for ``dpnp`` compilation gives Numba* users a new way
+to write code in a NumPy-like API that is already supported by Numba*, while at
+the same time automatically running such code in parallel on various types of
+architecture.
+
+``numba-dpex`` is being developed as part of `Intel AI Analytics Toolkit`_ and is
 distributed with the `Intel Distribution for Python*`_. The extension is also
 available on Anaconda cloud and as a Docker image on GitHub. Please refer to the
 :doc:`getting_started` page to learn more.
@@ -27,7 +28,7 @@ Portable Kernel Programming
 
 The kernel API has a design and API similar to Numba's ``cuda.jit`` module.
 However, the API uses the `SYCL*`_ language runtime and as such is extensible to
-various hardware types supported by a SYCL runtime. Presently, numba-dpex uses
+various hardware types supported by a SYCL runtime. Presently, ``numba-dpex`` uses
 the `DPC++`_ SYCL runtime and only supports SPIR-V-based OpenCL and `oneAPI
 Level Zero`_ CPU and GPU devices.
@@ -54,30 +55,30 @@ interface.
 
     print(c)
 
 In the above example, we allocated three arrays on a default ``gpu`` device
-using the dpnp library. These arrays are then passed as input arguments to the
+using the ``dpnp`` library. These arrays are then passed as input arguments to the
 kernel function. The compilation target and the subsequent execution of the
 kernel are determined completely by the input arguments and follow the
 "compute-follows-data" programming model as specified in the `Python* Array API
 Standard`_. To change the execution target to a CPU, the device keyword needs to
-be changed to ``cpu`` when allocating the dpnp arrays. It is also possible to
-leave the ``device`` keyword undefined and let the dpnp library select a default
+be changed to ``cpu`` when allocating the ``dpnp`` arrays. It is also possible to
+leave the ``device`` keyword undefined and let the ``dpnp`` library select a default
 device based on environment flag settings. Refer to the
 :doc:`user_manual/kernel_programming/index` for further details.
 
-dpnp compilation and offload
+``dpnp`` compilation and offload
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Numba-dpex extends Numba's type system and compilation pipeline to compile dpnp
+``numba-dpex`` extends Numba's type system and compilation pipeline to compile ``dpnp``
 functions and expressions in the same way as NumPy. Unlike Numba's NumPy
-compilation that is serial by default, numba-dpex always compiles dpnp
+compilation, which is serial by default, ``numba-dpex`` always compiles ``dpnp``
 expressions into offloadable kernels and executes them in parallel. The feature
 is provided using a decorator ``dpjit`` that behaves identically to
-``numba.njit(parallel=True)`` with the addition of dpnp compilation and offload.
-Offloading by numba-dpex is not just restricted to CPUs and supports all devices
+``numba.njit(parallel=True)`` with the addition of ``dpnp`` compilation and offload.
+Offloading by ``numba-dpex`` is not restricted to CPUs and supports all devices
 that are presently supported by the kernel API. ``dpjit`` allows using NumPy and
-dpnp expressions in the same function. All NumPy compilation and parallelization
-is done via the default Numba code-generation pipeline, whereas dpnp expressions
-are compiled using the numba-dpex pipeline.
+``dpnp`` expressions in the same function. All NumPy compilation and parallelization
+is done via the default Numba code-generation pipeline, whereas ``dpnp`` expressions
+are compiled using the ``numba-dpex`` pipeline.
 
 The vector addition example depicted using the kernel API can be easily
 expressed in several different ways using ``dpjit``.
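The "compute-follows-data" dispatch described in this hunk can be sketched in plain Python. The ``Array`` class and ``launch`` helper below are illustrative stand-ins invented for this sketch, not the ``numba-dpex`` or ``dpnp`` API; the point is only that the execution device is inferred from the inputs rather than passed at launch:

```python
# Minimal sketch of "compute-follows-data": the execution device is
# inferred from the input arrays, never passed to the kernel launch.
# Array and launch() are hypothetical stand-ins, not numba-dpex API.
from dataclasses import dataclass


@dataclass
class Array:
    data: list
    device: str = "gpu"  # a dpnp array carries its allocation device


def launch(kernel, *arrays):
    devices = {a.device for a in arrays}
    if len(devices) != 1:
        # Mixing devices violates the compute-follows-data model
        raise ValueError("all inputs must live on the same device")
    kernel(*arrays)
    return devices.pop()  # the device the kernel executed on


def vector_sum(a, b, c):
    for i in range(len(c.data)):
        c.data[i] = a.data[i] + b.data[i]


a = Array([1, 2, 3], device="cpu")
b = Array([4, 5, 6], device="cpu")
c = Array([0, 0, 0], device="cpu")
print(launch(vector_sum, a, b, c))  # cpu
print(c.data)  # [5, 7, 9]
```

Changing ``device="cpu"`` to ``device="gpu"`` at allocation time is all that retargeting requires, which mirrors how the documented example switches targets.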
@@ -105,32 +106,32 @@ expressed in several different ways using ``dpjit``.
 
         c[i] = a[i] + b[i]
     return c
 
-As with the kernel API example, a ``dpjit`` function if invoked with dpnp
+As with the kernel API example, a ``dpjit`` function, if invoked with ``dpnp``
 input arguments, follows the compute-follows-data programming model. Refer to
 :doc:`user_manual/dpnp_offload/index` for further details.
 
 
-Project Goal
-------------
+.. Project Goal
+.. ------------
 
-If C++ is not your language, you can skip writing data-parallel kernels in SYCL
-and directly write them in Python.
+.. If C++ is not your language, you can skip writing data-parallel kernels in SYCL
+.. and directly write them in Python.
 
-Our package numba-dpex extends the Numba compiler to allow kernel creation
-directly in Python via a custom compute API
+.. Our package ``numba-dpex`` extends the Numba compiler to allow kernel creation
+.. directly in Python via a custom compute API
 
 
 .. Contributing
 .. ------------
 
 .. Refer the `contributing guide
 .. <https://github.com/IntelPython/numba-dpex/blob/main/CONTRIBUTING>`_ for
-.. information on coding style and standards used in numba-dpex.
+.. information on coding style and standards used in ``numba-dpex``.
 
 .. License
 .. -------
 
-.. Numba-dpex is Licensed under Apache License 2.0 that can be found in `LICENSE
+.. ``numba-dpex`` is licensed under Apache License 2.0, which can be found in `LICENSE
 .. <https://github.com/IntelPython/numba-dpex/blob/main/LICENSE>`_. All usage and
 .. contributions to the project are subject to the terms and conditions of this
 .. license.
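For readers without a SYCL device at hand, the semantics of the ``dpjit`` vector-addition pattern can be sketched with plain NumPy. Here ``dpnp`` arrays are swapped for ``numpy`` arrays and ``numba.prange`` for ``range``, so this shows only *what* the offloaded code computes, not how ``numba-dpex`` compiles it:

```python
import numpy as np


# NumPy stand-in for the dpjit example: dpnp.ndarray -> numpy.ndarray,
# numba.prange -> range. A real dpjit function would compile this loop
# into an offloadable kernel and run the iterations in parallel.
def vecadd(a, b):
    c = np.empty_like(a)
    for i in range(len(c)):  # prange in the dpjit version
        c[i] = a[i] + b[i]
    return c


a = np.arange(5.0)
b = np.full(5, 10.0)
print(vecadd(a, b))  # [10. 11. 12. 13. 14.]
```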

docs/source/user_guide/dpnp_offload/index.rst (2 additions, 1 deletion)

@@ -4,4 +4,5 @@
 Compiling and Offloading DPNP
 ==============================
 
-TODO
+- prange, reduction prange
+- blackscholes, math example
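As a placeholder for the planned ``prange`` reduction example, the shape such code takes under ``dpjit`` can be sketched in plain Python. The ``dpjit`` decorator and ``prange`` below are emulated by no-op stand-ins so the sketch runs without ``numba-dpex`` installed; the loop structure, not the decorator, is what the planned example would demonstrate:

```python
# Shape of a prange reduction as it would appear under dpjit. The
# decorator and prange are emulated so the sketch runs anywhere;
# they are NOT the numba_dpex.dpjit / numba.prange implementations.
def dpjit(func):  # stand-in for numba_dpex.dpjit
    return func


prange = range  # stand-in for numba.prange


@dpjit
def sum_reduce(a):
    s = 0.0
    for i in prange(len(a)):  # dpjit parallelizes this reduction over s
        s += a[i]
    return s


print(sum_reduce([1.0, 2.0, 3.0, 4.0]))  # 10.0
```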

docs/source/user_guide/index.rst (1 addition, 0 deletions)

@@ -12,6 +12,7 @@ User Guide
 .. toctree::
    :maxdepth: 2
 
+   programming_model.rst
    kernel_programming/index
    dpnp_offload/index
    debugging/index

docs/source/user_guide/kernel_programming/index.rst (0 additions, 2 deletions)

@@ -44,11 +44,9 @@ hardware vendors.
    :maxdepth: 2
 
    writing_kernels
-   memory-management
    synchronization
    device-functions
    atomic-operations
-   selecting_device
   memory_allocation_address_space
    reduction
    ufunc

docs/source/user_guide/kernel_programming/synchronization.rst (9 additions, 10 deletions)

@@ -1,7 +1,7 @@
 Synchronization Functions
 =========================
 
-Numba-dpex only supports some of the SYCL synchronization operations. For
+``numba-dpex`` only supports some of the SYCL synchronization operations. For
 synchronization of all threads in the same thread block, numba-dpex provides
 a helper function called ``numba_dpex.barrier()``. This function implements the
 same pattern as barriers in traditional multi-threaded programming: invoking the

@@ -10,23 +10,22 @@ barrier, at which point it returns control to all its callers.
 
 ``numba_dpex.barrier()`` supports two memory fence options:
 
-- ``numba_dpex.CLK_GLOBAL_MEM_FENCE``: The barrier function will queue a memory
+- ``numba_dpex.GLOBAL_MEM_FENCE``: The barrier function will queue a memory
   fence to ensure correct ordering of memory operations to global memory. Using
   this option can be useful when work-items, for example, write to buffer or
   image objects and then want to read the updated data. Passing no arguments to
   ``numba_dpex.barrier()`` is equivalent to setting the global memory fence
-  option. For example,
+  option.
 
-  .. literalinclude:: ./../../../../numba_dpex/examples/barrier.py
-     :pyobject: no_arg_barrier_support
+  .. .. literalinclude:: ./../../../../numba_dpex/examples/barrier.py
+  ..    :pyobject: no_arg_barrier_support
 
-- ``numba_dpex.CLK_LOCAL_MEM_FENCE``: The barrier function will either flush
+- ``numba_dpex.LOCAL_MEM_FENCE``: The barrier function will either flush
   any variables stored in local memory or queue a memory fence to ensure
-  correct ordering of memory operations to local memory. For example,
-
-  .. literalinclude:: ./../../../../numba_dpex/examples/barrier.py
-     :pyobject: local_memory
+  correct ordering of memory operations to local memory.
 
+  .. .. literalinclude:: ./../../../../numba_dpex/examples/barrier.py
+  ..    :pyobject: local_memory
 
 .. note::

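The wait-until-all-arrive pattern that ``numba_dpex.barrier()`` implements on a device can be illustrated with Python's standard ``threading.Barrier``. This is a host-side analogy for the concept only, not the ``numba_dpex`` device API and not a model of its memory-fence options:

```python
import threading

# Host-side analogy for a work-group barrier: every worker blocks at the
# barrier until all workers have arrived, then all of them proceed.
NUM_WORKERS = 4
barrier = threading.Barrier(NUM_WORKERS)
phase_log = []
lock = threading.Lock()


def worker(wid):
    with lock:
        phase_log.append("before")
    barrier.wait()  # no worker passes until all NUM_WORKERS have arrived
    with lock:
        phase_log.append("after")


threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every "before" entry precedes every "after" entry, regardless of scheduling.
print(all(p == "before" for p in phase_log[:NUM_WORKERS]))  # True
```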
docs/source/user_guide/kernel_programming/writing_kernels.rst (25 additions, 24 deletions)

@@ -30,40 +30,39 @@ storing the result of vector summation:
    :name: ex_kernel_declaration_vector_sum
 
 
-Kernel Invocation
-------------------
+.. Kernel Invocation
+.. ------------------
 
-When a kernel is launched you must specify the *global size* and the *local size*,
-which determine the hierarchy of threads, that is, the order in which kernels
-will be invoked.
+.. When a kernel is launched you must specify the *global size* and the *local size*,
+.. which determine the hierarchy of threads, that is, the order in which kernels
+.. will be invoked.
 
-The following syntax is used in ``numba-dpex`` for kernel invocation with
-specified global and local sizes:
+.. The following syntax is used in ``numba-dpex`` for kernel invocation with
+.. specified global and local sizes:
 
-``kernel_function_name[global_size, local_size](kernel arguments)``
+.. ``kernel_function_name[global_size, local_size](kernel arguments)``
 
-In the following example we invoke the kernel ``kernel_vector_sum`` with a global size
-specified via the variable ``global_size``, and use the ``numba_dpex.DEFAULT_LOCAL_SIZE``
-constant to set the local size to a default value:
+.. In the following example we invoke the kernel ``kernel_vector_sum`` with a global size
+.. specified via the variable ``global_size``, and use the ``numba_dpex.DEFAULT_LOCAL_SIZE``
+.. constant to set the local size to a default value:
 
-.. code-block:: python
+.. .. code-block:: python
 
-    import numba_dpex as ndpx
+..     import numba_dpex as ndpx
 
-    global_size = 10
-    kernel_vector_sum[global_size, ndpx.DEFAULT_LOCAL_SIZE](a, b, c)
+..     global_size = 10
+..     kernel_vector_sum[global_size, ndpx.DEFAULT_LOCAL_SIZE](a, b, c)
 
-.. note::
-    Each kernel is compiled once, but it can be called multiple times with different global and local size settings.
+.. .. note::
+..     Each kernel is compiled once, but it can be called multiple times with different global and local size settings.
 
 
-Kernel Invocation (New Syntax)
-------------------------------
+Kernel Invocation
+------------------
 
-Since the release 0.20.0 (Phoenix), we have introduced new kernel launch
-parameter syntax for specifying global and local sizes that is similar to
-``SYCL``'s ``range`` and ``ndrange`` classes. The global and local sizes can
-now be specified with ``numba_dpex``'s ``Range`` and ``NdRange`` classes.
+The kernel launch parameter syntax for specifying global and local sizes is
+similar to ``SYCL``'s ``range`` and ``ndrange`` classes. The global and local
+sizes need to be specified with ``numba_dpex``'s ``Range`` and ``NdRange`` classes.
 
 For example, we have the following kernel that computes a sum of two vectors:
 
@@ -79,7 +78,7 @@ it like this (where ``global_size`` is an ``int``):
 
 .. literalinclude:: ./../../../../numba_dpex/examples/kernel/vector_sum.py
    :language: python
    :lines: 8-9, 18-24
-   :emphasize-lines: 3
+   :emphasize-lines: 5
    :caption: **EXAMPLE:** A vector sum kernel with a global size/range
    :name: vector_sum_kernel_with_launch_param
 
@@ -105,6 +104,8 @@ and ``args.l`` are ``int``):
    :name: pairwise_distance_kernel_with_launch_param
 
 
+
+
 Kernel Indexing Functions
 -------------------------

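The ``kernel[Range(n)](args...)`` launch syntax described in this hunk can be sketched with a small host-side emulator. The ``Range`` and ``Kernel`` classes below are simplified stand-ins written for this sketch (in the real API the global index comes from an indexing function, not a parameter), not the actual ``numba_dpex`` implementation:

```python
# Sequential host-side emulation of the kernel[Range(n)](a, b, c) launch
# syntax. Range and Kernel are simplified stand-ins, not numba_dpex classes.
class Range:
    def __init__(self, *dims):
        self.dims = dims


class Kernel:
    def __init__(self, func):
        self.func = func

    def __getitem__(self, launch_range):
        # kernel[Range(n)] returns a launcher bound to that global range
        def launcher(*args):
            for gid in range(launch_range.dims[0]):  # 1-D case only
                self.func(gid, *args)  # gid plays the role of get_global_id(0)
        return launcher


def vector_sum_body(i, a, b, c):
    c[i] = a[i] + b[i]


vector_sum = Kernel(vector_sum_body)
a, b, c = [1, 2, 3], [4, 5, 6], [0, 0, 0]
vector_sum[Range(3)](a, b, c)  # mirrors vector_sum[ndpx.Range(3)](a, b, c)
print(c)  # [5, 7, 9]
```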
0 commit comments