Commit 8801cc0

Add links to example-portable-data-parallel-extensions repo and add more to MKL chapter
1 parent d8ce111 commit 8801cc0

6 files changed: 16 additions, 11 deletions

content/en/_index.md

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ title: Portable Data-Parallel Python Extensions with oneAPI
 <a class="btn btn-lg btn-secondary me-3 mb-4" href="https://IntelPython.github.io/portable-data-parallel-extensions-scipy-2024/docs/">
 First<i class="fa-solid fa-question ms-2 "></i>
 </a>
-<a class="btn btn-lg btn-secondary me-3 mb-4" href="https://github.com/google/docsy-example">
+<a class="btn btn-lg btn-secondary me-3 mb-4" href="https://github.com/IntelPython/example-portable-data-parallel-extensions">
 Demonstration<i class="fab fa-github ms-2 "></i>
 </a>
 </div>

content/en/docs/_index.md

Lines changed: 2 additions & 0 deletions
@@ -11,3 +11,5 @@ by [Nikita Grigorian](https://github.com/ndgrigorian) and [Oleksandr Pavlyk](htt
 This poster is intended to introduce writing portable data-parallel Python extensions using oneAPI.
 
 We present several examples, starting with the basics of initializing a USM (unified shared memory) array, then a KDE (kernel density estimation) with pure DPC++/Sycl, then a KDE Python extension, and finally how to write a portable Python extension which uses oneMKL.
+
+The examples can be found [here](https://github.com/IntelPython/example-portable-data-parallel-extensions).

content/en/docs/kde-cpp.md

Lines changed: 4 additions & 4 deletions
@@ -61,7 +61,7 @@ for further summation by another kernel operating in a similar fashion.
 ```
 
 Such an approach, known as tree reduction, is implemented in ``kernel_density_esimation_temps`` function found in
-``"steps/kernel_density_estimation_cpp/kde.hpp"``.
+[``"steps/kernel_density_estimation_cpp/kde.hpp"``](https://github.com/IntelPython/example-portable-data-parallel-extensions/blob/main/steps/kernel_density_estimation_cpp/kde.hpp).
 
 Use of temporary allocation can be avoided if each work-item atomically adds the value of the local sum to the
 appropriate zero-initialized location in the output array, as in implementation ``kernel_density_estimation_atomic_ref``
@@ -119,10 +119,10 @@ in the work-group without accessing the global memory. This could be done effici
 ```
 
 Complete implementation can be found in ``kernel_density_estimation_work_group_reduce_and_atomic_ref`` function
-in ``"steps/kernel_density_estimation_cpp/kde.hpp"``.
+in [``"steps/kernel_density_estimation_cpp/kde.hpp"``](https://github.com/IntelPython/example-portable-data-parallel-extensions/blob/main/steps/kernel_density_estimation_cpp/kde.hpp).
 
-These implementations are called from C++ application ``"steps/kernel_density_estimation_cpp/app.cpp"``, which
+These implementations are called from C++ application [``"steps/kernel_density_estimation_cpp/app.cpp"``](https://github.com/IntelPython/example-portable-data-parallel-extensions/blob/main/steps/kernel_density_estimation_cpp/app.cpp), which
 samples data uniformly distributed over unit cuboid, and estimates the density using Kernel Density Estimation
 and spherically symmetric multivariate Gaussian probability density function as the kernel.
 
-The application can be built using `CMake`, or `Meson`, please refer to [README](steps/kernel_density_estimation_cpp/README.md) document in that folder.
+The application can be built using `CMake`, or `Meson`, please refer to [README](https://github.com/IntelPython/example-portable-data-parallel-extensions/blob/main/steps/kernel_density_estimation_cpp/README.md) document in that folder.
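
For context on the estimator this hunk's prose describes (uniform samples over a unit cuboid, scored with a spherically symmetric Gaussian kernel), a minimal pure-Python sketch follows. It is not the repository's SYCL code: the function name `gaussian_kde`, the bandwidth parameter `h`, and the sample sizes are hypothetical choices made for illustration, and the plain nested loop stands in for the tree-reduction and atomic variants discussed in `kde.hpp`.

```python
import math
import random


def gaussian_kde(points, data, h):
    """Evaluate a Gaussian kernel density estimate at each query point.

    points: list of query coordinates (each a list of floats)
    data:   list of sample coordinates with the same dimension
    h:      bandwidth (hypothetical parameter name)
    """
    dim = len(data[0])
    # Normalization constant of a spherically symmetric Gaussian in `dim` dimensions.
    norm = (2.0 * math.pi * h * h) ** (-0.5 * dim)
    estimates = []
    for x in points:
        total = 0.0
        for y in data:
            sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
            total += math.exp(-0.5 * sq_dist / (h * h))
        # Average the kernel contributions over all samples.
        estimates.append(norm * total / len(data))
    return estimates


# Sample uniformly from the unit cube and estimate the density at its center.
rng = random.Random(0)
data = [[rng.random() for _ in range(3)] for _ in range(2000)]
density = gaussian_kde([[0.5, 0.5, 0.5]], data, h=0.1)
print(density[0])
```

Because the samples are uniform on a unit-volume cube, the estimate at the center should come out near 1, up to sampling noise. The inner loop over `data` is the summation that the C++ kernels in this chapter parallelize.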

content/en/docs/kde-python.md

Lines changed: 1 addition & 1 deletion
@@ -100,7 +100,7 @@ of the host task a chance at execution.
 Of course, if USM memory is not managed by Python, it may be possible to avoid using GIL altogether.
 
 An example of Python extension `"kde_sycl_ext"` that exposes kernel density estimation code from previous
-section can be found in `"steps/sycl_python_extension"` folder (see [README](steps/sycl_python_extension/README.md)).
+section can be found in [`"steps/sycl_python_extension"`](https://github.com/IntelPython/example-portable-data-parallel-extensions/tree/main/steps/sycl_python_extension) folder (see [README](https://github.com/IntelPython/example-portable-data-parallel-extensions/blob/main/steps/sycl_python_extension/README.md)).
 
 The folder contains comparison between `dpctl`-based implementation of the KDE implementation following the NumPy
 implementation [above](#kde_numpy) and the dedicated C++ code:

content/en/docs/oneMKL.md

Lines changed: 8 additions & 4 deletions
@@ -5,9 +5,14 @@ date: 2024-07-02
 weight: 4
 ---
 
-Since `dpctl.tensor.usm_ndarray` is a Python object with an underlying USM allocation, it is possible to write extensions which wrap `oneAPI Math Kernel Library Interfaces` ([oneMKL Interfaces](https://github.com/oneapi-src/oneMKL)) USM routines and then call them on the `dpctl.tensor.usm_ndarray` from Python. These low-level routines have the potential to greatly improve the performance of extensions.
+Given a matrix \\(A\\), the QR decomposition of \\(A\\) is defined as the decomposition of \\(A\\) into the product of matrices \\(Q\\) and \\(R\\) such that \\(Q\\) is orthonormal and \\(R\\) is an upper-triangular.
+
+QR factorization is a common routine in more optimized LAPACK libraries, so rather than write and implement an algorithm ourselves, it would be preferable to find a suitable library routine.
+
+Since `dpctl.tensor.usm_ndarray` is a Python object with an underlying USM allocation, it is possible to write extensions which wrap `oneAPI Math Kernel Library Interfaces` ([oneMKL Interfaces](https://github.com/oneapi-src/oneMKL)) USM routines and then call them on the `dpctl.tensor.usm_ndarray` from Python. These low-level routines can greatly improve the performance of an extension.
+
+Looking to the `oneMKL` documentation on [`geqrf`](https://spec.oneapi.io/versions/latest/elements/oneMKL/source/domains/lapack/geqrf.html#geqrf-usm-version):
 
-For an example routine from the `oneMKL` documentation, take [`geqrf`](https://spec.oneapi.io/versions/latest/elements/oneMKL/source/domains/lapack/geqrf.html#geqrf-usm-version):
 ```cpp
 namespace oneapi::mkl::lapack {
 cl::sycl::event geqrf(cl::sycl::queue &queue,
@@ -26,7 +31,7 @@ This general format (``sycl::queue``, arguments, and a vector of ``sycl::event``
 
 The `pybind11` castings discussed in the previous section enable us to write a simple wrapper function for this routine with ``dpctl::tensor::usm_ndarray`` inputs and outputs, so long as we take the same precautions to avoid deadlocks. As a result, we can write the extension in much the same way as the `"kde_sycl_ext"` extension in the previous chapter.
 
-An example of a Python extension `"mkl_interface_ext"` that uses `oneMKL` calls to implement a QR decomposition can be found in `"steps/mkl_interface"` folder (see [README](steps/mkl_interface/README.md)).
+An example of a Python extension `"mkl_interface_ext"` that uses `oneMKL` calls to implement a QR decomposition can be found in [`"steps/mkl_interface"`](https://github.com/IntelPython/example-portable-data-parallel-extensions/tree/main/steps/mkl_interface) folder (see [README](https://github.com/IntelPython/example-portable-data-parallel-extensions/blob/main/steps/mkl_interface/README.md)).
 
 The folder executes the tests found in `"steps/mkl_interface/tests"` as well as running a larger benchmark which compares Numpy's `linalg.qr` (for reference) to the extension's implementation:
 
@@ -43,7 +48,6 @@ QR decomposition for matrix of size = (3000, 3000)
 Result agreed.
 qr took 0.016026005148887634 seconds
 np.linalg.qr took 0.5165981948375702 seconds
-
 ```
 
 `oneMKL` can be built for a variety of backends (see [oneMKL interfaces README](https://github.com/oneapi-src/oneMKL?tab=readme-ov-file#oneapi-math-kernel-library-onemkl-interfaces)). The example extension provides instructions for compiling for Intel, CUDA, and AMD, but the [`portBLAS`](https://github.com/codeplaysoftware/portBLAS) and [`portFFT`](https://github.com/codeplaysoftware/portFFT) backends are worth mentioning that. While the routines in `"mkl_interface_ext"` are not supported, these libraries are written in pure SYCL, and are therefore highly portable: they can offload to CPU, Intel, CUDA, and AMD devices. They are also open-source.
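
The definition of QR decomposition added in this hunk can be illustrated with a small pure-Python sketch. It uses modified Gram-Schmidt purely for brevity and makes no claim about the algorithm behind `geqrf` or the `"mkl_interface_ext"` extension. The helper names `qr_mgs` and `matmul` are hypothetical, and the input is assumed to have full column rank.

```python
import math


def qr_mgs(a):
    """Reduced QR of a full-column-rank matrix (list of rows) via modified Gram-Schmidt.

    Returns (q, r) with q of shape m x n having orthonormal columns and
    r of shape n x n upper-triangular, such that a = q @ r.
    """
    m, n = len(a), len(a[0])
    # v[j] holds the (progressively orthogonalized) j-th column of `a`.
    v = [[a[i][j] for i in range(m)] for j in range(n)]
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Normalize the current column to obtain the j-th column of Q.
        r[j][j] = math.sqrt(sum(x * x for x in v[j]))
        qj = [x / r[j][j] for x in v[j]]
        q_cols.append(qj)
        # Remove the component along qj from every remaining column.
        for k in range(j + 1, n):
            r[j][k] = sum(qi * vi for qi, vi in zip(qj, v[k]))
            v[k] = [vi - r[j][k] * qi for vi, qi in zip(v[k], qj)]
    q = [[q_cols[j][i] for j in range(n)] for i in range(m)]
    return q, r


def matmul(x, y):
    """Plain triple-loop matrix product of two lists of rows."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
q, r = qr_mgs(a)
qr_product = matmul(q, r)                    # should reproduce `a`
q_transposed = [list(col) for col in zip(*q)]
gram = matmul(q_transposed, q)               # should be close to the identity
```

Here `qr_product` recovers `a` up to rounding, `gram` is close to the identity (orthonormal columns of \\(Q\\)), and the entries of `r` below the diagonal are zero (upper-triangular \\(R\\)), which are the two properties the definition names.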

layouts/404.html

Lines changed: 0 additions & 1 deletion
@@ -2,6 +2,5 @@
 <div class="td-content">
 <h1>Not found</h1>
 <p>Oops! This page doesn't exist. Try going back to the <a href="{{ "" | relURL }}">home page</a>.</p>
-<p>You can learn how to make a 404 page like this in <a href="https://gohugo.io/templates/404/">Custom 404 Pages</a>.</p>
 </div>
 {{- end }}
