@@ -12,7 +12,7 @@ The Kernel Tuner is designed to be extensible and support
 different search and execution strategies. The current architecture of
 the Kernel Tuner can be seen as:
 
-.. image:: design.png
+.. image:: architecture_0.4.3.png
     :width: 500pt
 
 At the top we have the kernel code and the Python script that tunes it,
@@ -33,32 +33,33 @@ the only supported runner, which does exactly what its name says. It compiles
 and benchmarks configurations using a single sequential Python process.
 Other runners are foreseen in future releases.
 
-The runners are implemented on top of a high-level *Device Interface*,
+The runners are implemented on top of the core, which implements a
+high-level *Device Interface*,
 which wraps all the functionality for compiling and benchmarking
 kernel configurations based on the low-level *Device Function Interface*.
 Currently, we have
-four different implementations of the device function interface, which
+five different implementations of the device function interface, which
 basically abstracts the different backends into a set of simple
 functions such as ``ready_argument_list`` which allocates GPU memory and
 moves data to the GPU, and functions like ``compile``, ``benchmark``, or
 ``run_kernel``. The functions in the core are basically the main
 building blocks for implementing runners.
 
-At the bottom, three of the backends are shown.
-PyCUDA and PyOpenCL are for tuning either CUDA or OpenCL kernels.
-A relatively new addition is the Cupy backend based on Cupy for tuning
-CUDA kernels using the NVRTC compiler.
+The observers are explained in :ref:`observers`.
+
+At the bottom, the backends are shown.
+PyCUDA, CuPy, cuda-python and PyOpenCL are for tuning either CUDA or OpenCL kernels.
 The C
 Functions implementation can actually call any compiler, typically NVCC
-or GCC is used. This backend was created not just to be able to tune C
-functions, but mostly to tune C functions that in turn launch GPU kernels.
+or GCC is used. There is limited support for tuning Fortran kernels.
+This backend was created not just to be able to tune C
+functions, but in particular to tune C functions that in turn launch GPU kernels.
 
 The rest of this section contains the API documentation of the modules
 discussed above. For the documentation of the user API see the
 :doc:`user-api`.
 
 
-
 Strategies
 ----------
 
@@ -109,6 +110,12 @@ kernel_tuner.cupy.CupyFunctions
     :special-members: __init__
     :members:
 
+kernel_tuner.nvcuda.CudaFunctions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+.. autoclass:: kernel_tuner.nvcuda.CudaFunctions
+    :special-members: __init__
+    :members:
+
 kernel_tuner.opencl.OpenCLFunctions
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 .. autoclass:: kernel_tuner.opencl.OpenCLFunctions