Fourier correlation has many applications, e.g.: measuring the similarity of two 1D signals, finding the best translation to overlay similar images, volumetric medical image segmentation, etc. This sample shows how to implement 1D and 2D Fourier correlation using SYCL, oneMKL, and oneDPL kernel functions.
2
+
Cross-correlation has many applications, *e.g.*, measuring the similarity of two one-dimensional signals, finding the best translation to overlay similar images, and volumetric medical image segmentation. This sample shows how to implement one-dimensional and two-dimensional cross-correlations using SYCL and oneMKL Discrete Fourier Transform (DFT) functions. This sample requires oneMKL 2024.1 (or newer).
3
3
4
4
For more information on oneMKL, and complete documentation of all oneMKL routines, see https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/onemkl.html.
5
5
6
-
| Optimized for | Description
7
-
|:--- |:---
8
-
| OS | Linux* Ubuntu* 18.04; Windows 10*
9
-
| Hardware | Intel® Skylake with Gen9 or newer
10
-
| Software | Intel® oneMKL, Intel® oneDPL
11
-
| What you will learn | How to implement the Fourier correlation algorithm using SYCL, oneMKL, and oneDPL functions
12
-
| Time to complete | 15 minutes
6
+
For more information on supported systems and the corresponding requirements, see https://www.intel.com/content/www/us/en/developer/articles/system-requirements/intel-oneapi-base-toolkit-system-requirements.html.
13
7
14
8
## Purpose
15
-
This sample shows how to implement the Fourier correlation algorithm:
9
+
This sample shows how to find the optimal translational shift maximizing the cross-correlation between two real, periodic signals $u$ and $v$ in ${ℝ}^{n_{1}\times n_{2} \times \ldots \times n_{d}}$, *i.e.*, the (integer) value(s) $s_{i}, \ i \in \lbrace 1, \ldots, d\rbrace$ maximizing
Where ``DFT`` is the discrete Fourier transform, ``IDFT`` is the inverse DFT, ``CONJG`` is the complex conjugate, and ``MAXLOC`` is the location of the maximum value.
13
+
> **_NOTE:_** Given the periodic nature of $u$, $u_{j_{1} + k_{1} n_{1}, j_{2} + k_{2} n_{2}, \ldots, j_{d} + k_{d} n_{d}}$ and $u_{j_{1}, j_{2}, \ldots, j_{d}}$ are equal $\forall \lbrace k_{1}, \ldots, k_{d}\rbrace \in {ℤ}^{d}$; the same remark holds for $v$.
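To make the definition concrete, here is a minimal host-side sketch of a naive periodic cross-correlation in one dimension, written in plain C++ rather than SYCL. It assumes the convention $c_{s} = \sum_{j=0}^{n-1} u_{j} v_{j-s}$ (indices taken modulo $n$), which matches the "right-shift the second signal" wording used in this README but is not spelled out here. The function names `naive_cross_correlation` and `argmax` are hypothetical and are not taken from the sample sources.

```cpp
#include <cstddef>
#include <vector>

// Naive O(n^2) periodic cross-correlation of two real 1D signals.
// Assumed convention (not confirmed by the sample): c[s] = sum_j u[j] * v[(j - s) mod n].
std::vector<double> naive_cross_correlation(const std::vector<double>& u,
                                            const std::vector<double>& v) {
  const std::size_t n = u.size();  // v is assumed to have the same length as u
  std::vector<double> c(n, 0.0);
  for (std::size_t s = 0; s < n; ++s) {
    for (std::size_t j = 0; j < n; ++j) {
      // (j + n - s) % n wraps j - s into [0, n) without going negative.
      c[s] += u[j] * v[(j + n - s) % n];
    }
  }
  return c;
}

// Index of the largest entry of c (the first one if there are ties).
std::size_t argmax(const std::vector<double>& c) {
  std::size_t best = 0;
  for (std::size_t s = 1; s < c.size(); ++s) {
    if (c[s] > c[best]) best = s;
  }
  return best;
}
```

If `u` is `v` right-shifted by three elements, `argmax(naive_cross_correlation(u, v))` returns 3 under this convention.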
20
14
21
-
The algorithm can be composed using SYCL, oneMKL, and/or oneDPL. SYCL provides the device offload and host-device memory transfer mechanisms. oneMKL provides optimized forward and backward transforms and complex conjugate multiplication functions. oneDPL provides the MAXLOC function. Therefore, the entire computation can be performed on the accelerator device.
15
+
Discrete Fourier transforms may be used to evaluate the above $c$ efficiently, via
22
16
23
-
The following articles provide more detailed explanations of the implementations:
-[Efficiently Implementing Fourier Correlation Using oneAPI Math Kernel Library](https://www.intel.com/content/www/us/en/docs/oneapi/optimization-guide-gpu/2023-1/efficiently-implementing-fourier-correlation-using.html)
26
-
-[Accelerating the 2D Fourier Correlation Algorithm with ArrayFire and oneAPI](https://www.intel.com/content/www/us/en/developer/articles/technical/accelerate-2d-fourier-correlation-algorithm.html)
19
+
where ${𝓕}$ (resp. ${𝓕}^{-1}$) represents the forward (resp. backward) unscaled Discrete Fourier Transform, $\odot$ represents a component-wise product, and $\lambda^{*}$ is the complex conjugate of $\lambda$.
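As an illustration of the DFT-based evaluation, the sketch below computes a one-dimensional periodic cross-correlation as $c = \frac{1}{n} {𝓕}^{-1}\left({𝓕}(u) \odot {𝓕}(v)^{*}\right)$. Both the $1/n$ factor (which follows from the transforms being unscaled) and the choice of conjugating ${𝓕}(v)$ are assumptions tied to the convention $c_{s} = \sum_{j} u_{j} v_{j-s}$. A deliberately simple $O(n^{2})$ DFT stands in for the oneMKL transforms, and the names `dft` and `dft_cross_correlation` are hypothetical.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Unscaled O(n^2) DFT; sign = -1 gives the forward transform, +1 the backward one.
// This is a stand-in for an optimized library transform, not the sample's code.
std::vector<cplx> dft(const std::vector<cplx>& x, int sign) {
  const std::size_t n = x.size();
  const double two_pi = 2.0 * std::acos(-1.0);
  std::vector<cplx> X(n);
  for (std::size_t k = 0; k < n; ++k) {
    cplx acc(0.0, 0.0);
    for (std::size_t j = 0; j < n; ++j) {
      // Reduce j * k modulo n before forming the angle to limit round-off.
      const double angle = sign * two_pi * static_cast<double>((j * k) % n) /
                           static_cast<double>(n);
      acc += x[j] * std::polar(1.0, angle);
    }
    X[k] = acc;
  }
  return X;
}

// Assumed formula: c = (1/n) * IDFT(DFT(u) .* conj(DFT(v))) for real u and v.
std::vector<double> dft_cross_correlation(const std::vector<double>& u,
                                          const std::vector<double>& v) {
  const std::size_t n = u.size();  // v is assumed to have the same length as u
  std::vector<cplx> U = dft(std::vector<cplx>(u.begin(), u.end()), -1);
  const std::vector<cplx> V = dft(std::vector<cplx>(v.begin(), v.end()), -1);
  for (std::size_t k = 0; k < n; ++k) U[k] *= std::conj(V[k]);  // component-wise product
  const std::vector<cplx> C = dft(U, +1);
  std::vector<double> c(n);
  for (std::size_t s = 0; s < n; ++s) c[s] = C[s].real() / static_cast<double>(n);
  return c;
}
```

With an FFT in place of the quadratic `dft`, the same three steps (two forward transforms, one component-wise product, one backward transform) cost $O(n \log n)$ instead of the $O(n^{2})$ of the direct sum.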
27
20
28
-
## Key Implementation Details
29
-
In many applications, only the final correlation result matters, so this is all that has to be transferred from the device back to the host. In this example, two artificial signals will be created on the device, transformed in-place, then correlated. The host will retrieve the final result (i.e., the location of the maximum value) and report the optimal translation and correlation score.
21
+
The implementations use SYCL and oneMKL. SYCL provides the device offload and host-device memory transfer mechanisms. oneMKL provides interfaces to compute forward and backward transforms on the device as well as other required functions like the component-wise product of two complex sequences. The result of the operation described above is compared with a naive SYCL implementation and the two are verified to be within numerical tolerance of each other.
30
22
31
-
Two implementations of the 1D Fourier correlation algorithm are provided: one that uses explicit buffering and one that uses Unified Shared Memory (USM). Both implementations perform the correlation on the selected device, but the buffered implementation uses the oneDPL max_element function to perform the final MAXLOC reduction while the USM implementation uses the SYCL reduction operator. A 2D Fourier correlation example is also included to show how 2D data layout is handled during the real-to-complex and complex-to-real transforms.
23
+
## Implementation Details
24
+
25
+
In this sample, two artificial signals are created on the device. Their cross-correlation is evaluated 1) via a naive SYCL implementation and 2) by using the DFT-based procedure described above. The host retrieves the results and verifies that they are within numerical tolerance of each other. The optimal shift maximizing the cross-correlation between the signals is also extracted on the host and reported along with the corresponding normalized correlation score
where $\overline{x}$ and $\sigma_{x}$ are the average value and standard deviation of $x$, respectively.
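One plausible reading of the normalized score is the Pearson correlation coefficient between $u$ and the shifted $v$, *i.e.*, $\left(\frac{1}{n} c_{s} - \overline{u}\,\overline{v}\right) / \left(\sigma_{u} \sigma_{v}\right)$ with population standard deviations. That formula is an assumption, chosen because it yields a score of 1 for signals that are exact periodic shifts of each other, as in the example output. The helper names below are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Average value of x.
double mean_of(const std::vector<double>& x) {
  double sum = 0.0;
  for (double xi : x) sum += xi;
  return sum / static_cast<double>(x.size());
}

// Population standard deviation of x (divides by n, not n - 1).
double stddev_of(const std::vector<double>& x) {
  const double m = mean_of(x);
  double sum_sq = 0.0;
  for (double xi : x) sum_sq += (xi - m) * (xi - m);
  return std::sqrt(sum_sq / static_cast<double>(x.size()));
}

// Assumed normalization: Pearson correlation of u with v shifted by s,
// given the raw cross-correlation value c_s = sum_j u[j] * v[j - s].
double normalized_score(double c_s, const std::vector<double>& u,
                        const std::vector<double>& v) {
  const double n = static_cast<double>(u.size());
  return (c_s / n - mean_of(u) * mean_of(v)) / (stddev_of(u) * stddev_of(v));
}
```

Under this normalization the score lies in $[-1, 1]$, and it is undefined when either signal is constant because the corresponding standard deviation is zero.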
30
+
31
+
Two implementations of the one-dimensional algorithm are provided: one that uses explicit buffering and one that uses Unified Shared Memory (USM). Both implementations compute the cross-correlation on the selected device. A two-dimensional Fourier correlation example using USM is also included, illustrating how to define and use a two-dimensional data layout compliant with the requirements for in-place real-to-complex and complex-to-real transforms.
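The layout requirement can be sketched as follows, assuming the common convention for in-place real-to-complex transforms on row-major data: a row of $n_{2}$ real values transforms into $\lfloor n_{2}/2 \rfloor + 1$ complex values, so each row must reserve $2\left(\lfloor n_{2}/2 \rfloor + 1\right)$ real slots. The struct `PaddedLayout2D` and its members are hypothetical, and the strides actually configured in the sample may differ.

```cpp
#include <cstddef>

// Index helper for an n1 x n2 real array stored with padded rows, so that the
// same memory can hold the complex result of an in-place real-to-complex DFT.
// Assumption: padding is applied along the last (contiguous) dimension.
struct PaddedLayout2D {
  std::size_t n1;  // number of rows
  std::size_t n2;  // number of real samples per row

  // Complex entries per row after the forward transform (conjugate-even half).
  std::size_t complex_row_length() const { return n2 / 2 + 1; }

  // Real slots reserved per row: two reals for every complex entry.
  std::size_t real_row_stride() const { return 2 * complex_row_length(); }

  // Position of real sample (i, j) in the padded buffer, with j < n2.
  std::size_t real_index(std::size_t i, std::size_t j) const {
    return i * real_row_stride() + j;
  }

  // Total number of real slots to allocate for the whole array.
  std::size_t total_real_elements() const { return n1 * real_row_stride(); }
};
```

For example, a 4 x 6 image would be allocated as 4 rows of 8 real slots under this convention, with the last two slots of each row unused before the forward transform.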
32
32
33
33
## License
34
34
Code samples are licensed under the MIT license. See [License.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/License.txt) for details.
@@ -68,7 +68,7 @@ After learning how to use the extensions for Intel oneAPI Toolkits, return to th
68
68
>For more information on environment variables, see Use the setvars Script for [Linux or macOS](https://www.intel.com/content/www/us/en/docs/oneapi/programming-guide/2023-1/use-the-setvars-script-with-linux-or-macos.html), or [Windows](https://www.intel.com/content/www/us/en/docs/oneapi/programming-guide/2023-1/use-the-setvars-script-with-windows.html).
69
69
70
70
### On a Linux System
71
-
Run `make` to build and run the sample. Two programs are generated: onethat uses explicit buffering and one that uses USM.
71
+
Run `make` to build and run the sample. One two-dimensional program (using USM) and two one-dimensional programs (one that uses explicit buffering and one that uses USM) are created.
72
72
73
73
You can remove all generated files with `make clean`.
74
74
@@ -78,20 +78,22 @@ Run `nmake` to build and run the sample.
78
78
Note: To remove temporary files, run `nmake clean`.
79
79
80
80
### Example of Output
81
-
If everything is working correctly, the program will generate two artificial 1D signals, cross-correlate them, and report the relative shift that gives the maximum correlation score the output should be similar to this:
81
+
The one-dimensional programs generate two artificial one-dimensional signals, compute their cross-correlation, and report the optimal (right-)shift of the second signal that maximizes its correlation score with the first. The output should be similar to this:
82
82
```
83
83
./fcorr_1d_buff 4096
84
-
Running on: Intel(R) Graphics Gen9 [0x3e96]
85
-
Shift the second signal 2048 elements relative to the first signal to get a maximum, normalized correlation score of 2.99976.
84
+
Running on: Intel(R) Data Center GPU Max 1550
85
+
Right-shift the second signal 2048 elements to get a maximum, normalized correlation score of 1 (treating the signals as periodic).
86
+
Max difference between naive and Fourier-based calculations : 2.38419e-07 (verification threshold: 6.66459e-06).
86
87
./fcorr_1d_usm 4096
87
-
Running on: Intel(R) Graphics Gen9 [0x3e96]
88
-
Shift the second signal 2048 elements relative to the first signal to get a maximum, normalized correlation score of 2.99975.
88
+
Running on: Intel(R) Data Center GPU Max 1550
89
+
Right-shift the second signal 2048 elements to get a maximum, normalized correlation score of 1 (treating the signals as periodic).
90
+
Max difference between naive and Fourier-based calculations : 2.38419e-07 (verification threshold: 6.66459e-06).
89
91
```
90
-
If the 2D example is working correctly, two small binary images are cross-correlated and their optimum relative shift and correlation score are reported:
92
+
For the two-dimensional case, the program generates two artificial two-dimensional images, computes their cross-correlation, and reports the optimal translation vector of the second image that maximizes its correlation score with the first. The output should be similar to this:
Shift the second image (x, y) = (4, 3) elements relative to the first image to get a maximum,
126
-
normalized correlation score of 4. Treat the images as circularly shifted versions of each other.
115
+
Shift the second signal by translation vector (3, 4) to get a maximum, normalized correlation score of 1 (treating the signals as periodic along both dimensions).
116
+
Max difference between naive and Fourier-based calculations : 1.19209e-07 (verification threshold: 4.91989e-06).
127
117
```
128
118
129
119
### Troubleshooting
130
120
If an error occurs, troubleshoot the problem using the Diagnostics Utility for Intel® oneAPI Toolkits.