* Fixed c_ld documentation.
* Ran the generator script.
* Added multithreaded tuning.
* Added a parameter to control the number of threads, since more threads are not always faster.
* Update the args struct and constants.
* Fixed include order for tuning.cpp
* Prevented the user from using more threads than there are configurations, which would waste resources.
* Fixed a typo: the number of threads was computed with std::max where it should have been std::min.
* Made tuning.cpp easier to understand.
Co-authored-by: Cedric Nugteren <web@cedricnugteren.nl>
* Fixed warnings of buffer overflow with GCC.
* Added support for single threading, which is the default.
* Updated changelog and added information to tuning.md
* Update CHANGELOG
Co-authored-by: Cedric Nugteren <web@cedricnugteren.nl>
* Improve doc/tuning.md
Co-authored-by: Cedric Nugteren <web@cedricnugteren.nl>
* Improve readability of src/tuning/tuning.cpp
Co-authored-by: Cedric Nugteren <web@cedricnugteren.nl>
* Improved naming of variables.
* Improved const correctness in src/tuning/tuning.cpp
Co-authored-by: Cedric Nugteren <web@cedricnugteren.nl>
---------
Co-authored-by: Cedric Nugteren <web@cedricnugteren.nl>
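The thread-count cap described in the commit message (no more threads than configurations, using `std::min` rather than `std::max`) could be sketched roughly as follows. This is only an illustration and not the code from `src/tuning/tuning.cpp`: the function name `EffectiveThreads` and its parameters are hypothetical, and the fallback to one thread is an assumption based on the single-threaded default.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper (not taken from tuning.cpp): returns the number of
// worker threads to start for a given user request.
std::size_t EffectiveThreads(const std::size_t requested_threads,
                             const std::size_t num_configurations) {
  // std::min, not std::max: starting more threads than there are
  // configurations would leave the extra threads with nothing to do.
  const std::size_t capped = std::min(requested_threads, num_configurations);
  // Assumption: always keep at least one thread, matching the default of 1.
  return std::max<std::size_t>(capped, 1);
}
```

With this sketch, `EffectiveThreads(8, 3)` yields 3, while `EffectiveThreads(0, 10)` falls back to 1.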
doc/tuning.md (2 additions, 0 deletions)
@@ -216,6 +216,8 @@ The kernels `gemm` and `gemm_direct` have too many parameters to explore. Theref
There are also several routine-level tuners. They tune inter-kernel parameters and should only be run after the kernels are tuned. However, they do automatically pick up kernel tuning results from the current folder if there are any. An example is the GEMM routine tuner, which determines when to use the direct or the indirect GEMM kernel.
+The tuners also provide a `-threads` option that controls how many threads are used for OpenCL kernel compilation (not for executing the kernels). It defaults to 1, which runs the single-threaded version and is the safest choice. More threads can speed up tuning but can also slow it down; as a starting point, use as many threads as there are CPU cores.
+
Here are all the tuners included in the `make alltuners` target (in the same order) with all their precision arguments: