|
3 | 3 | VkResample is a demonstration that FFT upscaling and transition to the frequency domain can be done on GPUs in real-time. VkResample is based on VkFFT library (https://github.com/DTolm/VkFFT). |
4 | 4 |
|
5 | 5 | ## What VkResample does |
6 | | -Upscaling and supersampling became hot topics in the modern GPU world. There are a lot of options available already, starting from the simplest nearest neighbor algorithm, where each pixel is split in multiple, and coming to Nvidia's DLSS technology, which uses trained neural networks. One of such options is a Fast Fourier Transform (FFT) based upscaling.\ |
7 | | -FFT is an essential algorithm in image processing. It is used for convolution calculations (see: convolution theorem), filtering (high- and low-bandwidth passes), denoising (Wiener filter). FFT can be used as well for Lanczos upsampling, or in other words, for convolutions with sinc window. In the frequency domain, sinc window corresponds to a step-wise function, which is then multiplied with a centered FFT of an image, padded with zeros to the required size and transformed back to the spatial domain to retrieve upscaled image.\ |
8 | | -The main time consuming part, that was previously immensely limiting FFT-based algorithms, was forward and inverse FFTs themselves. The computational cost of them was simply too high to be performed in real-time. However, modern advances in general purpose GPU computing allow for efficient parallelization of FFT, which is done in a form of Vulkan FFT library - VkFFT. It can be used as a part of a rendering process to perform frequency based computations on a frame before showing it to the user.\ |
9 | | -VkResample uses various optimizations available in VkFFT package, such as R2C/C2R mode and native zero padding support, which greatly reduce the amount of memory transfers and computations. With them enabled, it is possible to upscale 2048x1024 image to 4096x2048 in under 2ms on Nvidia GTX 1660Ti GPU. Measured time covers command buffer submission and execution, which include data transfers to the chip, FFT algorithm, modifications in frequency domain and inverse transformation with its own data trasnfers. Support for arbitrary resolutions will be added in one of the next updates of VkFFT.\ |
10 | | -Possible improvements to this algorithm can include: implementing the Discrete Cosine Transform, which is better suited for real-world images; using additional data from previous frames and/or motion vectors; more low-precision tests and optimizations; using deep learning methods in the frequency domain. As of now, VkResample is more of a proof of concept that can be greatly enchanced in the future.\ |
11 | | -Below you can find a collection of screenshots details comparison from Cyberpunk 2077 game upscaled 2x using nearest neighbor method (NN), FFT method (FFT) and rendered in native resolution (Native). All of the images can be found in the samples folder as well.\ |
| 6 | +Upscaling and supersampling became hot topics in the modern GPU world. There are a lot of options available already, starting from the simplest nearest neighbor algorithm, where each pixel is split in multiple, and coming to Nvidia's DLSS technology, which uses trained neural networks. One of such options is a Fast Fourier Transform (FFT) based upscaling. |
| 7 | + |
| 8 | +FFT is an essential algorithm in image processing. It is used for convolution calculations (see: convolution theorem), filtering (high- and low-bandwidth passes), denoising (Wiener filter). FFT can be used as well for Lanczos upsampling, or in other words, for convolutions with sinc window. In the frequency domain, sinc window corresponds to a step-wise function, which is then multiplied with a centered FFT of an image, padded with zeros to the required size and transformed back to the spatial domain to retrieve upscaled image. |
| 9 | + |
| 10 | +The main time consuming part, that was previously immensely limiting FFT-based algorithms, was forward and inverse FFTs themselves. The computational cost of them was simply too high to be performed in real-time. However, modern advances in general purpose GPU computing allow for efficient parallelization of FFT, which is done in a form of Vulkan FFT library - VkFFT. It can be used as a part of a rendering process to perform frequency based computations on a frame before showing it to the user. |
| 11 | + |
| 12 | +VkResample uses various optimizations available in VkFFT package, such as R2C/C2R mode and native zero padding support, which greatly reduce the amount of memory transfers and computations. With them enabled, it is possible to upscale 2048x1024 image to 4096x2048 in under 2ms on Nvidia GTX 1660Ti GPU. Measured time covers command buffer submission and execution, which include data transfers to the chip, FFT algorithm, modifications in frequency domain and inverse transformation with its own data trasnfers. Support for arbitrary resolutions will be added in one of the next updates of VkFFT. |
| 13 | + |
| 14 | +Possible improvements to this algorithm can include: implementing the Discrete Cosine Transform, which is better suited for real-world images; using additional data from previous frames and/or motion vectors; more low-precision tests and optimizations; using deep learning methods in the frequency domain. As of now, VkResample is more of a proof of concept that can be greatly enchanced in the future. |
| 15 | + |
| 16 | +Below you can find a collection of screenshots details comparison from Cyberpunk 2077 game upscaled 2x using nearest neighbor method (NN), FFT method (FFT) and rendered in native resolution (Native). All of the images can be found in the samples folder as well. |
12 | 17 |
|
13 | 18 |  |
14 | 19 |  |
@@ -36,7 +41,8 @@ VkResample has a command-line interface with the following set of commands:\ |
36 | 41 | -o NAME: specify output png file path (default X_X_upscale.png)\ |
37 | 42 | -u X: specify upscale factor (power of 2, default 1)\ |
38 | 43 | -p X: specify precision (0 - single, 1 - double, 2 - half, default - single)\ |
39 | | --n X: specify how many times to perform upscale. This removes dispatch overhead and will show the real application performance (default 1)\ |
| 44 | +-n X: specify how many times to perform upscale. This removes dispatch overhead and will show the real application performance (default 1) |
| 45 | + |
40 | 46 | The simplest way to launch a 2x upscaler will be: -i no_upscaling.png -u 2 |
41 | 47 |
|
42 | 48 | ## I am looking for a PhD position/job that may be interested in my set of skills. Contact me by email: <[email protected]> | <[email protected]> |
|
0 commit comments