Commit be9922d

Improved documentation in tuning.md. (CNugteren#615)
* Improved documentation. * Improved documentation for the threads parameter as @CNugteren commented on.
1 parent ec91015 commit be9922d

1 file changed

+17
-1
lines changed

doc/tuning.md

Lines changed: 17 additions & 1 deletion
@@ -216,7 +216,23 @@ The kernels `gemm` and `gemm_direct` have too many parameters to explore. Theref
216216

217217
There are also several routine-level tuners. They tune inter-kernel parameters and should only be run after the kernels are tuned. However, they automatically pick up kernel tuning results from the current folder if there are any. An example is the GEMM routine tuner, which determines when to use the direct or the indirect GEMM kernel.
218218

219-
The tuners also proivide a `-threads` option allowing you to control how many threads are used for OpenCL kernel compilation (not for actually executing the kernels). It defaults to running the single threaded version with 1 thread but more can be specified via the parameter. It is recommended to use the same amount of threads as CPU cores to maximize performance. More threads may hurt or improve performance. It is also the safest option to use the default of 1 thread.
219+
Common Tuning Parameters
220+
-------------
221+
222+
All tuners share a few common parameters:
223+
1. **Precision** -- The precision type to tune for. Valid options are:
224+
* 16 for real 16 bit floating point numbers
225+
* 32 for real 32 bit floating point numbers
226+
* 64 for real 64 bit floating point numbers
227+
* 3232 for complex 32 bit floating point numbers
228+
* 6464 for complex 64 bit floating point numbers
229+
2. **Platform** -- The OpenCL platform to use
230+
3. **Device** -- The OpenCL device to use
231+
4. **Fraction** -- The fraction of a larger search space to explore when running the tuners. A value of 100 is equal to 1% and so 10000 is equal to the whole search space (100%)
232+
5. **Threads** -- The number of threads the tuner uses to compile the OpenCL kernels. This does NOT run the OpenCL kernels with multithreading; it only compiles them with multithreading. The minimum is 1. The recommended value is the number of cores in the CPU (counting hyperthreading); using more may hurt performance. Using 1 thread is the safest choice, especially when tuning CPUs. If you specify more threads than there are kernels, the tuner simply uses the number of kernels as the number of threads.
233+
234+
Running All Tuners For All Precisions
235+
-------------
220236

221237
Here are all the tuners included in the `make alltuners` target (in the same order) with all their precision arguments:
222238
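The precision codes and the thread cap described in the added lines can be illustrated with a short Python sketch. It is not taken from the tuner code: the names `PRECISIONS`, `describe_precision`, and `effective_threads` are hypothetical, and rejecting a thread count below 1 is an assumption based on the statement that 1 is the minimum.

```python
# Illustrative sketch only; these helpers are hypothetical and are not part
# of the tuners themselves.

# Precision codes as listed in the documentation change.
PRECISIONS = {
    16: "real 16 bit floating point",
    32: "real 32 bit floating point",
    64: "real 64 bit floating point",
    3232: "complex 32 bit floating point",
    6464: "complex 64 bit floating point",
}


def describe_precision(code: int) -> str:
    """Return the description for a precision code, or raise ValueError."""
    if code not in PRECISIONS:
        valid = ", ".join(str(c) for c in sorted(PRECISIONS))
        raise ValueError(f"unsupported precision {code}; valid options: {valid}")
    return PRECISIONS[code]


def effective_threads(requested: int, num_kernels: int) -> int:
    """Cap the compilation thread count at the number of kernels.

    Assumption: values below 1 are rejected, since 1 is the documented minimum.
    """
    if requested < 1:
        raise ValueError("threads must be at least 1")
    if num_kernels < 1:
        raise ValueError("there must be at least one kernel to compile")
    return min(requested, num_kernels)


if __name__ == "__main__":
    print(describe_precision(3232))
    print(effective_threads(requested=8, num_kernels=3))
```

Requesting 8 threads for 3 kernels yields 3 in this sketch, mirroring the rule that extra threads beyond the kernel count are not used.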
