|
6 | 6 | * `cxx/c2c_<single/double>_many_example` computes batched forward and backward 1D/2D/3D C2C FFTs in single/double precision with strided data. |
7 | 7 | * `cxx/r2c_c2r_<single/double>_many_example` computes batched forward R2C and backward C2R 1D/2D/3D FFTs in single/double precision with strided data. |
8 | 8 | * `cxx/<r2c_c2r/c2c>_single_withomp_example` computes batched forward and backward 1D FFTs in single precision with contiguous data. Plan creation and execution are called inside an omp parallel region. |
9 | | -* `cxx/c2c_c2r_r2c_many_bench_example` measures performance of 1D/2D/3D C2C/C2R/R2C FFTs with single and double precision. |
| 9 | +* `cxx/plan_many_dft_benchmark_example` measures performance of batched 1D/2D/3D C2C/C2R/R2C FFTs in single and double precision, with arbitrary strides. |
10 | 10 | * `cxx/c2c_c2r_r2c_single_apis_example` demonstrates usage of simple and advanced FFTW APIs for computing C2C / R2C / C2R FFTs with inplace / out-of-place data. |
11 | 11 | * `cxx/auxiliary_apis_example` demonstrates usage of a few auxiliary APIs. |
12 | 12 | * `cxx/include_header_example` demonstrates inclusion of `fftw3.h` (as opposed to using `nvpl_fftw3.h`). |
|
43 | 43 | ./fortran/c2c_c2r_r2c_single_apis_example.f90 |
44 | 44 |
|
45 | 45 | ``` |
46 | | -### c2c_c2r_r2c_many_bench_example |
| 46 | +### plan_many_dft_benchmark_example |
| 47 | +It's recommended to run `plan_many_dft_benchmark_example` through the `./scripts/nvpl/nvplbench_generic.py` script, |
| 48 | +which allows testing multiple configurations grouped into one use case. |
47 | 49 | ``` |
48 | | -Usage: ./c2c_r2c_c2r_many_bench_example |
| 50 | +Usage: ./plan_many_dft_benchmark_example |
49 | 51 | Arguments: |
50 | | - --prec precision: The precision of the transform fp32 or fp64. |
51 | | - --fft_type fft_type: The type of the transform c2c, r2c or c2r. |
52 | | - --mode mode: (optional) The mode of the transform ip or oop (default: ip). |
53 | | - --config config_name: (optional) Name of the config to be logged (default: no_config). |
54 | | - --cat bench_category: (optional) The case to benchmark p_2357, f_2357_l_512_r_1, f_2357_l_512_r_2, varargs_r_1 (default: p_2357). |
55 | | - --size data_size: (optional) Transform data size. Supported options: |
56 | | - * 0 - default, total data size is 256 MB. |
57 | | - * <number> - number of batches to process for each FFT size. |
58 | | - * <number>k or <number>m - for example 64m - the size of data in KB or MB to process. |
59 | | - --cycles cycles: (optional) The number of cycles (default: 100). |
60 | | - --warmup warmup_runs: (optional) The number of warm-up runs (default: 10). |
61 | | - --fft_sizes *fft_sizes: (optional) If `varargs_r_1` is selected, fft sizes can be listed manually (for rank 1). This must be the last argument! |
| 52 | + --prec precision: The precision of the transform fp32 or fp64. |
| 53 | + --fft_type fft_type: (optional) The type of the transform c2c, r2c or c2r (default: c2c). |
| 54 | + --mode mode: (optional) The mode of the transform ip or oop (default: ip). |
| 55 | + --rank rank: (optional) Rank of the transform (default: 1). |
| 56 | + --size data_size: (optional) Transform data size. Supported options: |
| 57 | + * 0 - default, total data size is 256 MB. |
| 58 | + * <number> - number of batches to process for each FFT size. |
| 59 | + * <number>k or <number>m - for example 64m - the size of data in KB or MB to process. |
| 60 | + --cycles cycles: (optional) The number of cycles (default: 100). |
| 61 | + --warmup warmup_runs: (optional) The number of warm-up runs (default: 10). |
| 62 | + --fft_sizes *fft_sizes: (optional) Size of the FFT transform. If the rank != 1, --rank must be specified before this argument! |
| 63 | + --istride istride: (optional) Input stride - distance between elements of the sample (default: 1) |
| 64 | + --idist idist: (optional) Distance between start of each input sample (~ number of transformed elements) |
| 65 | + --ostride ostride: (optional) Output stride - distance between elements of the sample (default: 1) |
| 66 | + --odist odist: (optional) Distance between start of each output sample (~ number of transformed elements) |
62 | 67 | ``` |
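The stride and distance options are easier to reason about with the addressing rule written out. The sketch below assumes the benchmark passes `--istride`/`--idist` (and `--ostride`/`--odist`) through with the same meaning as the `istride`/`idist` arguments of FFTW's advanced interface (`fftw_plan_many_dft`), where element `j` of batch `k` lives at offset `j * stride + k * dist`. The helper names `element_offset`, `gather_sample`, and `required_elements` are hypothetical and are not part of the example sources.

```cpp
#include <cstddef>
#include <vector>

// Offset (in elements) of element j of batch k, following the FFTW
// advanced-interface convention: j * stride + k * dist.
// Assumption: the benchmark's --istride/--idist use this same convention.
inline std::size_t element_offset(std::size_t j, std::size_t k,
                                  std::size_t stride, std::size_t dist) {
    return j * stride + k * dist;
}

// Hypothetical helper: copy one strided sample of length n (batch k)
// out of a larger buffer into a contiguous vector.
template <typename T>
std::vector<T> gather_sample(const std::vector<T>& buffer, std::size_t n,
                             std::size_t k, std::size_t stride,
                             std::size_t dist) {
    std::vector<T> sample(n);
    for (std::size_t j = 0; j < n; ++j) {
        // .at() throws if the stride/dist combination runs past the buffer.
        sample[j] = buffer.at(element_offset(j, k, stride, dist));
    }
    return sample;
}

// Smallest buffer length (in elements) that can hold `batch` samples of
// length n with the given stride and dist: last touched offset plus one.
inline std::size_t required_elements(std::size_t n, std::size_t batch,
                                     std::size_t stride, std::size_t dist) {
    if (n == 0 || batch == 0) {
        return 0;
    }
    return element_offset(n - 1, batch - 1, stride, dist) + 1;
}
```

With `stride = 1` and `dist = n` the samples are packed back to back, which matches the contiguous layout. A larger `stride` interleaves elements of a sample with unused (or other) data, and a larger `dist` leaves padding between consecutive samples.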
| 68 | + |
| 69 | + |