From 5c3823d5d38bca284cab978ab123933826bcd367 Mon Sep 17 00:00:00 2001
From: Nikita Grigorian <nikita.grigorian@intel.com>
Date: Tue, 9 Jul 2024 05:28:41 +0000
Subject: [PATCH 01/11] Adjust kde-python page so that sample source is fully
 visible without scrolling left or right

---
 content/en/docs/kde-python.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/content/en/docs/kde-python.md b/content/en/docs/kde-python.md
index 405bc84..60e7ef6 100644
--- a/content/en/docs/kde-python.md
+++ b/content/en/docs/kde-python.md
@@ -29,7 +29,9 @@ For illustration purpose, here is a sample extension source code:
 #include <vector>
 
 sycl::event
-py_foo(dpctl::tensor::usm_ndarray inp, dpctl::tensor::usm_ndarray out, const std::vector<sycl::event> &deps) {
+py_foo(dpctl::tensor::usm_ndarray inp,
+       dpctl::tensor::usm_ndarray out,
+       const std::vector<sycl::event> &deps) {
     // validation steps skipped
 
     // Execution queue is the queue associated with input arrays

From af944bda5e5f8b2e211f12ebd74637ead020d464 Mon Sep 17 00:00:00 2001
From: Nikita Grigorian <nikita.grigorian@intel.com>
Date: Tue, 9 Jul 2024 05:47:01 +0000
Subject: [PATCH 02/11] Add a bit of detail about unified shared memory to KDE
 Python extension chapter

Also fixes a small typo
---
 content/en/docs/_index.md     | 2 +-
 content/en/docs/kde-python.md | 8 ++++----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/content/en/docs/_index.md b/content/en/docs/_index.md
index b627492..f02e0c2 100755
--- a/content/en/docs/_index.md
+++ b/content/en/docs/_index.md
@@ -10,4 +10,4 @@ by [Nikita Grigorian](https://github.com/ndgrigorian) and [Oleksandr Pavlyk](htt
 
 This poster is intended to introduce writing portable data-parallel Python extensions using oneAPI.
 
-We present several examples, starting with the basics of initializing a USM (universal shared memory) array, then a KDE (kernel density estimation) with pure DPC++/Sycl, then a KDE Python extension, and finally how to write a portable Python extension which uses oneMKL.
+We present several examples, starting with the basics of initializing a USM (unified shared memory) array, then a KDE (kernel density estimation) with pure DPC++/Sycl, then a KDE Python extension, and finally how to write a portable Python extension which uses oneMKL.
diff --git a/content/en/docs/kde-python.md b/content/en/docs/kde-python.md
index 60e7ef6..dfecc42 100644
--- a/content/en/docs/kde-python.md
+++ b/content/en/docs/kde-python.md
@@ -5,7 +5,7 @@ date: 2024-07-02
 weight: 3
 ---
 
-Since SYCL builds on C++, we are going to use `pybind11` project to generate Python extension.
+Since SYCL builds on C++, we are going to use the `pybind11` project to generate a Python extension.
 We also need Python objects to carry USM allocations of input and output data, such as `dpctl` ([Data Parallel Control](https://github.com/IntelPython/dpctl.git) Python package). The `dpctl` package also provides Python objects corresponding to DPC++ runtime objects:
 
 | Python object         | SYCL C++ object   |
@@ -15,9 +15,9 @@ We also need Python objects to carry USM allocations of input and output data, s
 | ``dpctl.SyclContext`` | ``sycl::context`` |
 | ``dpctl.SyclEvent``   | ``sycl::event``   |
 
-`dpctl` provides integration with `pybind11` supporting castings between `dpctl` Python objects and corresponding C++ SYCL classes listed in the table above. Furthermore, the integration provides C++ class ``dpctl::tensor::usm_ndarray`` which derives from ``pybind11::object``.
-It stores `dpctl.tensor.usm_ndarray` object and provides methods to query its attributes, such as data pointer, dimensionality, shape, strides
-and elemental type information.
+`dpctl` provides integration with `pybind11`, supporting casting between `dpctl` Python objects and the corresponding C++ SYCL classes listed in the table above. Furthermore, the integration provides the C++ class ``dpctl::tensor::usm_ndarray``, which derives from ``pybind11::object``.
+It stores the `dpctl.tensor.usm_ndarray` object and provides methods to query its attributes, such as the data pointer, dimensionality, shape, strides,
+and element type information. Each `dpctl.tensor.usm_ndarray` is backed by a SYCL unified shared memory (USM) allocation; see the [SYCL 2020 specification](https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:usm) or the [dpctl.memory documentation](https://intelpython.github.io/dpctl/latest/api_reference/dpctl/memory.html#dpctl-memory-pyapi) for more details.
 
 For illustration purpose, here is a sample extension source code:
 

From d1a09277423ce44d951586fa4f97ad1b5eb7dac6 Mon Sep 17 00:00:00 2001
From: Nikita Grigorian <nikita.grigorian@intel.com>
Date: Tue, 9 Jul 2024 05:47:09 +0000
Subject: [PATCH 03/11] More content in oneMKL chapter

---
 content/en/docs/oneMKL.md | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/content/en/docs/oneMKL.md b/content/en/docs/oneMKL.md
index e97de98..9823b38 100755
--- a/content/en/docs/oneMKL.md
+++ b/content/en/docs/oneMKL.md
@@ -5,7 +5,7 @@ date: 2024-07-02
 weight: 4
 ---
 
-Since `dpctl.tensor.usm_ndarray` is a Python object carrying a USM allocation, it is possible to write extensions which wrap `oneAPI Math Kernel Library Interfaces` ([oneMKL Interfaces](https://github.com/oneapi-src/oneMKL)) routines and then call them on the USM data underlying the `usm_ndarray` container from Python.
+Since `dpctl.tensor.usm_ndarray` is a Python object with an underlying USM allocation, it is possible to write extensions which wrap `oneAPI Math Kernel Library Interfaces` ([oneMKL Interfaces](https://github.com/oneapi-src/oneMKL)) USM routines and then call them on the `dpctl.tensor.usm_ndarray` from Python. These low-level routines have the potential to greatly improve the performance of extensions.
 
 For an example routine from the `oneMKL` documentation, take [`geqrf`](https://spec.oneapi.io/versions/latest/elements/oneMKL/source/domains/lapack/geqrf.html#geqrf-usm-version):
 ```cpp
@@ -22,6 +22,10 @@ namespace oneapi::mkl::lapack {
 }
 ```
 
-The `pybind11` castings discussed in the previous section enable us to write a simple wrapper function for this routine with `dpctl::tensor::usm_ndarray` inputs and outputs, so long as we take the same precautions to avoid deadlocks. As a result, we can write the extension in much the same way as the `kde_sycl_ext` extension in the previous chapter.
+This general format (``sycl::queue``, arguments, and a vector of ``sycl::event``s) is more or less the same throughout the `oneMKL` USM routine.
 
-An example of a Python extension "mkl_interface_ext" that uses `oneMKL` calls to implement a QR decomposition can be found in "steps/mkl_interface" folder (see [README](steps/mkl_interface/README.md)).
+The `pybind11` castings discussed in the previous section enable us to write a simple wrapper function for this routine with ``dpctl::tensor::usm_ndarray`` inputs and outputs, so long as we take the same precautions to avoid deadlocks. As a result, we can write the extension in much the same way as the `"kde_sycl_ext"` extension in the previous chapter.
+
+An example of a Python extension `"mkl_interface_ext"` that uses `oneMKL` calls to implement a QR decomposition can be found in `"steps/mkl_interface"` folder (see [README](steps/mkl_interface/README.md)).
+
+`oneMKL` can be built for a variety of backends (see [oneMKL interfaces README](https://github.com/oneapi-src/oneMKL?tab=readme-ov-file#oneapi-math-kernel-library-onemkl-interfaces)). The example extension provides instructions for compiling for Intel, CUDA, and AMD, but the [`portBLAS`](https://github.com/codeplaysoftware/portBLAS) and [`portFFT`](https://github.com/codeplaysoftware/portFFT) backends are worth mentioning that. While the routines in `"mkl_interface_ext"` are not supported, these libraries are written in pure SYCL, and are therefore highly portable: they can offload to CPU, Intel, CUDA, and AMD devices. They are also open-source.

From d8ce111af7a6df1aade38480af8d576a6922f2b5 Mon Sep 17 00:00:00 2001
From: Nikita Grigorian <nikita.grigorian@intel.com>
Date: Tue, 9 Jul 2024 22:24:56 +0000
Subject: [PATCH 04/11] Add a run on NVIDIA to `oneMKL.md`

Fix formatting in `kde-python.md`
---
 content/en/docs/kde-python.md |  2 +-
 content/en/docs/oneMKL.md     | 18 ++++++++++++++++++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/content/en/docs/kde-python.md b/content/en/docs/kde-python.md
index dfecc42..5011679 100644
--- a/content/en/docs/kde-python.md
+++ b/content/en/docs/kde-python.md
@@ -105,7 +105,7 @@ section can be found in `"steps/sycl_python_extension"` folder (see [README](ste
 The folder contains comparison between `dpctl`-based implementation of the KDE implementation following the NumPy
 implementation [above](#kde_numpy) and the dedicated C++ code:
 
-```
+```bash
 KDE for n_sample = 1000000, n_est = 17, n_dim = 7, h = 0.05
 Result agreed.
 kde_dpctl took 0.3404452269896865 seconds
diff --git a/content/en/docs/oneMKL.md b/content/en/docs/oneMKL.md
index 9823b38..3ce0a12 100755
--- a/content/en/docs/oneMKL.md
+++ b/content/en/docs/oneMKL.md
@@ -28,4 +28,22 @@ The `pybind11` castings discussed in the previous section enable us to write a s
 
 An example of a Python extension `"mkl_interface_ext"` that uses `oneMKL` calls to implement a QR decomposition can be found in `"steps/mkl_interface"` folder (see [README](steps/mkl_interface/README.md)).
 
+The folder executes the tests found in `"steps/mkl_interface/tests"` as well as running a larger benchmark which compares Numpy's `linalg.qr` (for reference) to the extension's implementation:
+
+```bash
+$ python run.py
+Using device NVIDIA GeForce GT 1030
+================================================= test session starts ==================================================
+collected 8 items
+
+tests/test_qr.py ........                                                                                        [100%]
+
+================================================== 8 passed in 0.45s ===================================================
+QR decomposition for matrix of size = (3000, 3000)
+Result agreed.
+qr took 0.016026005148887634 seconds
+np.linalg.qr took 0.5165981948375702 seconds
+
+```
+
 `oneMKL` can be built for a variety of backends (see [oneMKL interfaces README](https://github.com/oneapi-src/oneMKL?tab=readme-ov-file#oneapi-math-kernel-library-onemkl-interfaces)). The example extension provides instructions for compiling for Intel, CUDA, and AMD, but the [`portBLAS`](https://github.com/codeplaysoftware/portBLAS) and [`portFFT`](https://github.com/codeplaysoftware/portFFT) backends are worth mentioning that. While the routines in `"mkl_interface_ext"` are not supported, these libraries are written in pure SYCL, and are therefore highly portable: they can offload to CPU, Intel, CUDA, and AMD devices. They are also open-source.

From 8801cc09dd296ff57fc3e3a43b9712e80833cad5 Mon Sep 17 00:00:00 2001
From: Nikita Grigorian <nikita.grigorian@intel.com>
Date: Wed, 10 Jul 2024 05:49:31 +0000
Subject: [PATCH 05/11] Add links to
 `example-portable-data-parallel-extensions` repo and add more to MKL chapter

---
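The KDE application linked from `kde-cpp.md` estimates density with a spherically symmetric multivariate Gaussian kernel. The sketch below is a minimal pure-Python reference for that estimator. It assumes the standard form f(x) = (1/n) * sum_i (2*pi*h^2)^(-d/2) * exp(-|x - x_i|^2 / (2*h^2)) with bandwidth `h`. The function name `kde_gaussian` is hypothetical and is not taken from the examples repository.

```python
import math


def kde_gaussian(samples, points, h):
    """Estimate the density at each of `points` from `samples` using a
    spherically symmetric multivariate Gaussian kernel with bandwidth h.

    samples: list of d-dimensional points (lists of floats)
    points:  list of d-dimensional evaluation points
    """
    d = len(samples[0])
    # Normalization constant of the d-dimensional Gaussian with covariance h^2 * I.
    norm = (2.0 * math.pi * h * h) ** (-d / 2.0)
    estimates = []
    for x in points:
        total = 0.0
        for s in samples:
            dist2 = sum((xi - si) ** 2 for xi, si in zip(x, s))
            total += math.exp(-dist2 / (2.0 * h * h))
        # Average the kernel contributions over all samples.
        estimates.append(norm * total / len(samples))
    return estimates


# Usage: density of two 1-D samples at x = -1 and x = 1, evaluated at x = 0.
print(kde_gaussian([[-1.0], [1.0]], [[0.0]], h=1.0))
```

The inner sum over samples for each evaluation point is, roughly, the reduction that the tree-reduction and atomic variants in `kde.hpp` parallelize across work-items.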
 content/en/_index.md          |  2 +-
 content/en/docs/_index.md     |  2 ++
 content/en/docs/kde-cpp.md    |  8 ++++----
 content/en/docs/kde-python.md |  2 +-
 content/en/docs/oneMKL.md     | 12 ++++++++----
 layouts/404.html              |  1 -
 6 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/content/en/_index.md b/content/en/_index.md
index f52f310..f6e7f9a 100644
--- a/content/en/_index.md
+++ b/content/en/_index.md
@@ -20,7 +20,7 @@ title: Portable Data-Parallel Python Extensions with oneAPI
         <a class="btn btn-lg btn-secondary me-3 mb-4" href="https://IntelPython.github.io/portable-data-parallel-extensions-scipy-2024/docs/">
           First<i class="fa-solid fa-question ms-2 "></i>
         </a>
-        <a class="btn btn-lg btn-secondary me-3 mb-4" href="https://github.com/google/docsy-example">
+        <a class="btn btn-lg btn-secondary me-3 mb-4" href="https://github.com/IntelPython/example-portable-data-parallel-extensions">
           Demonstration<i class="fab fa-github ms-2 "></i>
         </a>
       </div>
diff --git a/content/en/docs/_index.md b/content/en/docs/_index.md
index f02e0c2..3725318 100755
--- a/content/en/docs/_index.md
+++ b/content/en/docs/_index.md
@@ -11,3 +11,5 @@ by [Nikita Grigorian](https://github.com/ndgrigorian) and [Oleksandr Pavlyk](htt
 This poster is intended to introduce writing portable data-parallel Python extensions using oneAPI.
 
 We present several examples, starting with the basics of initializing a USM (unified shared memory) array, then a KDE (kernel density estimation) with pure DPC++/Sycl, then a KDE Python extension, and finally how to write a portable Python extension which uses oneMKL.
+
+The examples can be found in the [example-portable-data-parallel-extensions](https://github.com/IntelPython/example-portable-data-parallel-extensions) repository.
diff --git a/content/en/docs/kde-cpp.md b/content/en/docs/kde-cpp.md
index 68c881b..ce46462 100644
--- a/content/en/docs/kde-cpp.md
+++ b/content/en/docs/kde-cpp.md
@@ -61,7 +61,7 @@ for further summation by another kernel operating in a similar fashion.
 ```
 
 Such an approach, known as tree reduction, is implemented in ``kernel_density_esimation_temps`` function found in
-``"steps/kernel_density_estimation_cpp/kde.hpp"``.
+[``"steps/kernel_density_estimation_cpp/kde.hpp"``](https://github.com/IntelPython/example-portable-data-parallel-extensions/blob/main/steps/kernel_density_estimation_cpp/kde.hpp).
 
 Use of temporary allocation can be avoided if each work-item atomically adds the value of the local sum to the
 appropriate zero-initialized location in the output array, as in implementation ``kernel_density_estimation_atomic_ref``
@@ -119,10 +119,10 @@ in the work-group without accessing the global memory. This could be done effici
 ```
 
 Complete implementation can be found in ``kernel_density_estimation_work_group_reduce_and_atomic_ref`` function
-in ``"steps/kernel_density_estimation_cpp/kde.hpp"``.
+in [``"steps/kernel_density_estimation_cpp/kde.hpp"``](https://github.com/IntelPython/example-portable-data-parallel-extensions/blob/main/steps/kernel_density_estimation_cpp/kde.hpp).
 
-These implementations are called from C++ application ``"steps/kernel_density_estimation_cpp/app.cpp"``, which
+These implementations are called from the C++ application [``"steps/kernel_density_estimation_cpp/app.cpp"``](https://github.com/IntelPython/example-portable-data-parallel-extensions/blob/main/steps/kernel_density_estimation_cpp/app.cpp), which
 samples data uniformly distributed over unit cuboid, and estimates the density using Kernel Density Estimation
 and spherically symmetric multivariate Gaussian probability density function as the kernel.
 
-The application can be built using `CMake`, or `Meson`, please refer to [README](steps/kernel_density_estimation_cpp/README.md) document in that folder.
+The application can be built using `CMake` or `Meson`; please refer to the [README](https://github.com/IntelPython/example-portable-data-parallel-extensions/blob/main/steps/kernel_density_estimation_cpp/README.md) in that folder.
diff --git a/content/en/docs/kde-python.md b/content/en/docs/kde-python.md
index 5011679..0feabf5 100644
--- a/content/en/docs/kde-python.md
+++ b/content/en/docs/kde-python.md
@@ -100,7 +100,7 @@ of the host task a chance at execution.
 Of course, if USM memory is not managed by Python, it may be possible to avoid using GIL altogether.
 
 An example of Python extension `"kde_sycl_ext"` that exposes kernel density estimation code from previous
-section can be found in `"steps/sycl_python_extension"` folder (see [README](steps/sycl_python_extension/README.md)).
+section can be found in the [`"steps/sycl_python_extension"`](https://github.com/IntelPython/example-portable-data-parallel-extensions/tree/main/steps/sycl_python_extension) folder (see the [README](https://github.com/IntelPython/example-portable-data-parallel-extensions/blob/main/steps/sycl_python_extension/README.md)).
 
 The folder contains comparison between `dpctl`-based implementation of the KDE implementation following the NumPy
 implementation [above](#kde_numpy) and the dedicated C++ code:
diff --git a/content/en/docs/oneMKL.md b/content/en/docs/oneMKL.md
index 3ce0a12..c80bd36 100755
--- a/content/en/docs/oneMKL.md
+++ b/content/en/docs/oneMKL.md
@@ -5,9 +5,14 @@ date: 2024-07-02
 weight: 4
 ---
 
-Since `dpctl.tensor.usm_ndarray` is a Python object with an underlying USM allocation, it is possible to write extensions which wrap `oneAPI Math Kernel Library Interfaces` ([oneMKL Interfaces](https://github.com/oneapi-src/oneMKL)) USM routines and then call them on the `dpctl.tensor.usm_ndarray` from Python. These low-level routines have the potential to greatly improve the performance of extensions.
+Given a matrix \\(A\\), the QR decomposition of \\(A\\) is defined as the decomposition of \\(A\\) into the product of matrices \\(Q\\) and \\(R\\) such that \\(Q\\) is orthonormal and \\(R\\) is an upper-triangular.
+
+QR factorization is a common routine in more optimized LAPACK libraries, so rather than write and implement an algorithm ourselves, it would be preferable to find a suitable library routine.
+
+Since `dpctl.tensor.usm_ndarray` is a Python object with an underlying USM allocation, it is possible to write extensions which wrap `oneAPI Math Kernel Library Interfaces` ([oneMKL Interfaces](https://github.com/oneapi-src/oneMKL)) USM routines and then call them on the `dpctl.tensor.usm_ndarray` from Python. These low-level routines can greatly improve the performance of an extension.
+
+Looking to the `oneMKL` documentation on [`geqrf`](https://spec.oneapi.io/versions/latest/elements/oneMKL/source/domains/lapack/geqrf.html#geqrf-usm-version):
 
-For an example routine from the `oneMKL` documentation, take [`geqrf`](https://spec.oneapi.io/versions/latest/elements/oneMKL/source/domains/lapack/geqrf.html#geqrf-usm-version):
 ```cpp
 namespace oneapi::mkl::lapack {
   cl::sycl::event geqrf(cl::sycl::queue &queue,
@@ -26,7 +31,7 @@ This general format (``sycl::queue``, arguments, and a vector of ``sycl::event``
 
 The `pybind11` castings discussed in the previous section enable us to write a simple wrapper function for this routine with ``dpctl::tensor::usm_ndarray`` inputs and outputs, so long as we take the same precautions to avoid deadlocks. As a result, we can write the extension in much the same way as the `"kde_sycl_ext"` extension in the previous chapter.
 
-An example of a Python extension `"mkl_interface_ext"` that uses `oneMKL` calls to implement a QR decomposition can be found in `"steps/mkl_interface"` folder (see [README](steps/mkl_interface/README.md)).
+An example of a Python extension `"mkl_interface_ext"` that uses `oneMKL` calls to implement a QR decomposition can be found in [`"steps/mkl_interface"`](https://github.com/IntelPython/example-portable-data-parallel-extensions/tree/main/steps/mkl_interface) folder (see [README](https://github.com/IntelPython/example-portable-data-parallel-extensions/blob/main/steps/mkl_interface/README.md)).
 
 The folder executes the tests found in `"steps/mkl_interface/tests"` as well as running a larger benchmark which compares Numpy's `linalg.qr` (for reference) to the extension's implementation:
 
@@ -43,7 +48,6 @@ QR decomposition for matrix of size = (3000, 3000)
 Result agreed.
 qr took 0.016026005148887634 seconds
 np.linalg.qr took 0.5165981948375702 seconds
-
 ```
 
 `oneMKL` can be built for a variety of backends (see [oneMKL interfaces README](https://github.com/oneapi-src/oneMKL?tab=readme-ov-file#oneapi-math-kernel-library-onemkl-interfaces)). The example extension provides instructions for compiling for Intel, CUDA, and AMD, but the [`portBLAS`](https://github.com/codeplaysoftware/portBLAS) and [`portFFT`](https://github.com/codeplaysoftware/portFFT) backends are worth mentioning that. While the routines in `"mkl_interface_ext"` are not supported, these libraries are written in pure SYCL, and are therefore highly portable: they can offload to CPU, Intel, CUDA, and AMD devices. They are also open-source.
diff --git a/layouts/404.html b/layouts/404.html
index 1a9bd70..d32828c 100644
--- a/layouts/404.html
+++ b/layouts/404.html
@@ -2,6 +2,5 @@
 <div class="td-content">
   <h1>Not found</h1>
   <p>Oops! This page doesn't exist. Try going back to the <a href="{{ "" | relURL }}">home page</a>.</p>
-  <p>You can learn how to make a 404 page like this in <a href="https://gohugo.io/templates/404/">Custom 404 Pages</a>.</p>
 </div>
 {{- end }}

From da728a7ba83a3967f9894777b5fdba62845b22cf Mon Sep 17 00:00:00 2001
From: Nikita Grigorian <nikita.grigorian@intel.com>
Date: Wed, 10 Jul 2024 06:06:50 +0000
Subject: [PATCH 06/11] Add more to the MKL chapter on portability

---
 content/en/docs/oneMKL.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/content/en/docs/oneMKL.md b/content/en/docs/oneMKL.md
index c80bd36..443a16a 100755
--- a/content/en/docs/oneMKL.md
+++ b/content/en/docs/oneMKL.md
@@ -11,6 +11,10 @@ QR factorization is a common routine in more optimized LAPACK libraries, so rath
 
 Since `dpctl.tensor.usm_ndarray` is a Python object with an underlying USM allocation, it is possible to write extensions which wrap `oneAPI Math Kernel Library Interfaces` ([oneMKL Interfaces](https://github.com/oneapi-src/oneMKL)) USM routines and then call them on the `dpctl.tensor.usm_ndarray` from Python. These low-level routines can greatly improve the performance of an extension.
 
+`oneMKL Interfaces` can be built to dispatch to a variety of backends including `cuBLAS` and `rocBLAS` (see [oneMKL interfaces README](https://github.com/oneapi-src/oneMKL?tab=readme-ov-file#oneapi-math-kernel-library-onemkl-interfaces)). The [`portBLAS`](https://github.com/codeplaysoftware/portBLAS) backend is also notable as it is open-source and written in pure SYCL.
+
+`oneMKL` routines are essentially wrappers for the same routine in an underlying backend library, depending on the targeted device. This means that the same code can be used for NVidia, AMD, and Intel devices, making it highly portable.  
+
 Looking to the `oneMKL` documentation on [`geqrf`](https://spec.oneapi.io/versions/latest/elements/oneMKL/source/domains/lapack/geqrf.html#geqrf-usm-version):
 
 ```cpp
@@ -49,5 +53,3 @@ Result agreed.
 qr took 0.016026005148887634 seconds
 np.linalg.qr took 0.5165981948375702 seconds
 ```
-
-`oneMKL` can be built for a variety of backends (see [oneMKL interfaces README](https://github.com/oneapi-src/oneMKL?tab=readme-ov-file#oneapi-math-kernel-library-onemkl-interfaces)). The example extension provides instructions for compiling for Intel, CUDA, and AMD, but the [`portBLAS`](https://github.com/codeplaysoftware/portBLAS) and [`portFFT`](https://github.com/codeplaysoftware/portFFT) backends are worth mentioning that. While the routines in `"mkl_interface_ext"` are not supported, these libraries are written in pure SYCL, and are therefore highly portable: they can offload to CPU, Intel, CUDA, and AMD devices. They are also open-source.

From be12249bb1988860d02ec3cf7ff95dc68f56db62 Mon Sep 17 00:00:00 2001
From: Nikita Grigorian <nikita.grigorian@intel.com>
Date: Wed, 10 Jul 2024 06:07:43 +0000
Subject: [PATCH 07/11] Correct small typo

---
 content/en/docs/oneMKL.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/en/docs/oneMKL.md b/content/en/docs/oneMKL.md
index 443a16a..de30b1f 100755
--- a/content/en/docs/oneMKL.md
+++ b/content/en/docs/oneMKL.md
@@ -11,7 +11,7 @@ QR factorization is a common routine in more optimized LAPACK libraries, so rath
 
 Since `dpctl.tensor.usm_ndarray` is a Python object with an underlying USM allocation, it is possible to write extensions which wrap `oneAPI Math Kernel Library Interfaces` ([oneMKL Interfaces](https://github.com/oneapi-src/oneMKL)) USM routines and then call them on the `dpctl.tensor.usm_ndarray` from Python. These low-level routines can greatly improve the performance of an extension.
 
-`oneMKL Interfaces` can be built to dispatch to a variety of backends including `cuBLAS` and `rocBLAS` (see [oneMKL interfaces README](https://github.com/oneapi-src/oneMKL?tab=readme-ov-file#oneapi-math-kernel-library-onemkl-interfaces)). The [`portBLAS`](https://github.com/codeplaysoftware/portBLAS) backend is also notable as it is open-source and written in pure SYCL.
+`oneMKL Interfaces` can be built to dispatch to a variety of backends, including `cuBLAS` and `rocBLAS` (see [oneMKL Interfaces README](https://github.com/oneapi-src/oneMKL?tab=readme-ov-file#oneapi-math-kernel-library-onemkl-interfaces)). The [`portBLAS`](https://github.com/codeplaysoftware/portBLAS) backend is also notable because it is open-source and written in pure SYCL.
 
 `oneMKL` routines are essentially wrappers for the same routine in an underlying backend library, depending on the targeted device. This means that the same code can be used for NVidia, AMD, and Intel devices, making it highly portable.  
 

From 9dde206f4165b08ee3d279eaad665b0397eedc56 Mon Sep 17 00:00:00 2001
From: Nikita Grigorian <nikita.grigorian@intel.com>
Date: Wed, 10 Jul 2024 06:08:45 +0000
Subject: [PATCH 08/11] Add link to `tests` directory in examples repo

---
 content/en/docs/oneMKL.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/en/docs/oneMKL.md b/content/en/docs/oneMKL.md
index de30b1f..81c6764 100755
--- a/content/en/docs/oneMKL.md
+++ b/content/en/docs/oneMKL.md
@@ -37,7 +37,7 @@ The `pybind11` castings discussed in the previous section enable us to write a s
 
 An example of a Python extension `"mkl_interface_ext"` that uses `oneMKL` calls to implement a QR decomposition can be found in [`"steps/mkl_interface"`](https://github.com/IntelPython/example-portable-data-parallel-extensions/tree/main/steps/mkl_interface) folder (see [README](https://github.com/IntelPython/example-portable-data-parallel-extensions/blob/main/steps/mkl_interface/README.md)).
 
-The folder executes the tests found in `"steps/mkl_interface/tests"` as well as running a larger benchmark which compares Numpy's `linalg.qr` (for reference) to the extension's implementation:
+The example's `run.py` script runs the tests found in [`"steps/mkl_interface/tests"`](https://github.com/IntelPython/example-portable-data-parallel-extensions/tree/main/steps/mkl_interface/tests), as well as a larger benchmark comparing NumPy's `linalg.qr` (for reference) against the extension's implementation:
 
 ```bash
 $ python run.py

From cb9c66fa05f295829fbab1896401c546d1e4ac37 Mon Sep 17 00:00:00 2001
From: Nikita Grigorian <nikita.grigorian@intel.com>
Date: Wed, 10 Jul 2024 06:09:14 +0000
Subject: [PATCH 09/11] Another typo in `oneMKL.md`

---
 content/en/docs/oneMKL.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/en/docs/oneMKL.md b/content/en/docs/oneMKL.md
index 81c6764..f601914 100755
--- a/content/en/docs/oneMKL.md
+++ b/content/en/docs/oneMKL.md
@@ -31,7 +31,7 @@ namespace oneapi::mkl::lapack {
 }
 ```
 
-This general format (``sycl::queue``, arguments, and a vector of ``sycl::event``s) is more or less the same throughout the `oneMKL` USM routine.
+Most `oneMKL` USM routines follow this same general pattern: they take a ``sycl::queue``, the routine-specific arguments, and a vector of ``sycl::event``s to wait on, and they return a ``sycl::event``.
 
 The `pybind11` castings discussed in the previous section enable us to write a simple wrapper function for this routine with ``dpctl::tensor::usm_ndarray`` inputs and outputs, so long as we take the same precautions to avoid deadlocks. As a result, we can write the extension in much the same way as the `"kde_sycl_ext"` extension in the previous chapter.
 

From d0b99dfa7579d1fb842aebac041dcc3d1a208649 Mon Sep 17 00:00:00 2001
From: Nikita Grigorian <nikita.grigorian@intel.com>
Date: Wed, 10 Jul 2024 06:26:53 +0000
Subject: [PATCH 10/11] Change front page buttons and icons

---
 content/en/_index.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/content/en/_index.md b/content/en/_index.md
index f6e7f9a..8636933 100644
--- a/content/en/_index.md
+++ b/content/en/_index.md
@@ -18,10 +18,10 @@ title: Portable Data-Parallel Python Extensions with oneAPI
     <div class="lead text-center">
       <div class="mx-auto mb-5">
         <a class="btn btn-lg btn-secondary me-3 mb-4" href="https://IntelPython.github.io/portable-data-parallel-extensions-scipy-2024/docs/">
-          First<i class="fa-solid fa-question ms-2 "></i>
+          Get Started<i class="fa-solid fa-play ms-2"></i>
         </a>
         <a class="btn btn-lg btn-secondary me-3 mb-4" href="https://github.com/IntelPython/example-portable-data-parallel-extensions">
-          Demonstration<i class="fab fa-github ms-2 "></i>
+          Examples<i class="fab fa-github ms-2 "></i>
         </a>
       </div>
     </div>

From def97de81238fea32238dd4b091880ba2bdbde62 Mon Sep 17 00:00:00 2001
From: Nikita Grigorian <nikita.grigorian@intel.com>
Date: Wed, 10 Jul 2024 06:28:04 +0000
Subject: [PATCH 11/11] Fix grammar error in `oneMKL.md`

---
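The QR decomposition defined in this chapter factors A as QR, with R upper-triangular and the columns of Q orthonormal. The sketch below is a minimal pure-Python check of both properties. It computes a thin QR with modified Gram-Schmidt. This is not the algorithm behind LAPACK/oneMKL `geqrf`, which uses Householder reflections and returns Q implicitly. The helper names `qr_mgs`, `matmul`, and `transpose` are hypothetical.

```python
import math


def qr_mgs(a):
    """Thin QR factorization of an m x n matrix `a` (m >= n, full column rank)
    using modified Gram-Schmidt. Returns (q, r): q is m x n with orthonormal
    columns and r is n x n upper-triangular, so that a == q @ r."""
    m, n = len(a), len(a[0])
    # Working copy of the columns of `a`.
    cols = [[a[i][j] for i in range(m)] for j in range(n)]
    q = [[0.0] * n for _ in range(m)]
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        r[j][j] = math.sqrt(sum(x * x for x in cols[j]))
        qj = [x / r[j][j] for x in cols[j]]
        for i in range(m):
            q[i][j] = qj[i]
        # Remove the qj component from every remaining column.
        for k in range(j + 1, n):
            r[j][k] = sum(qj[i] * cols[k][i] for i in range(m))
            cols[k] = [cols[k][i] - r[j][k] * qj[i] for i in range(m)]
    return q, r


def matmul(x, y):
    """Plain nested-list matrix product."""
    return [[sum(x[i][t] * y[t][j] for t in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def transpose(x):
    return [list(row) for row in zip(*x)]


a = [[12.0, -51.0, 4.0], [6.0, 167.0, -68.0], [-4.0, 24.0, -41.0]]
q, r = qr_mgs(a)
print(r)  # upper-triangular, approximately [[14, 21, -14], [0, 175, -70], [0, 0, 35]]
```

Gram-Schmidt keeps the sketch short. For production use, the library routine wrapped by the extension is the better choice.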
 content/en/docs/oneMKL.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/en/docs/oneMKL.md b/content/en/docs/oneMKL.md
index f601914..15fa5e7 100755
--- a/content/en/docs/oneMKL.md
+++ b/content/en/docs/oneMKL.md
@@ -5,7 +5,7 @@ date: 2024-07-02
 weight: 4
 ---
 
-Given a matrix \\(A\\), the QR decomposition of \\(A\\) is defined as the decomposition of \\(A\\) into the product of matrices \\(Q\\) and \\(R\\) such that \\(Q\\) is orthonormal and \\(R\\) is an upper-triangular.
+Given a matrix \\(A\\), the QR decomposition of \\(A\\) is its factorization \\(A = QR\\) into a matrix \\(Q\\) with orthonormal columns and an upper-triangular matrix \\(R\\).
 
 QR factorization is a common routine in more optimized LAPACK libraries, so rather than write and implement an algorithm ourselves, it would be preferable to find a suitable library routine.