doc/source/hostcode.rst
3 additions & 1 deletion
@@ -15,6 +15,8 @@ There are few differences with tuning just a single CUDA or OpenCL kernel, to li
 * You have to specify the lang="C" option
 * The C function should return a ``float``
 * You have to do your own timing and error handling in C
+* Data is not automatically copied to and from device memory. To use an array in host memory, pass in a :mod:`numpy` array. To use an array
+  in device memory, pass in a :mod:`cupy` array.

 You have to specify the language as "C" because the Kernel Tuner will be calling a host function. This means that the Kernel
 Tuner will have to interface with C and in fact uses a different backend. This also means you can use this way of tuning
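
The list above asks for a host function that returns a ``float`` and does its own timing and error handling. A minimal sketch of such a function in plain C follows. The names ``vector_add_host`` and ``now_ms``, the millisecond unit, and the negative return value on bad input are assumptions made for illustration and are not taken from the patched documentation.

```c
#include <stddef.h>
#include <time.h>

/* Wall-clock time in milliseconds, using C11 timespec_get.
 * Note: TIME_UTC is not a monotonic clock; good enough for a sketch. */
static double now_ms(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return ts.tv_sec * 1e3 + ts.tv_nsec / 1e6;
}

/* Hypothetical host function: adds two vectors and returns the elapsed
 * time as a float. The pointers are expected to refer to host memory. */
float vector_add_host(float *c, const float *a, const float *b, int n) {
    /* Error handling is the host function's job. Returning a negative
     * value is an illustrative convention only (an assumption). */
    if (c == NULL || a == NULL || b == NULL || n < 0) {
        return -1.0f;
    }

    double start = now_ms();
    for (int i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
    double elapsed = now_ms() - start;

    /* The measured time is the function's return value. */
    return (float)elapsed;
}
```

On the Python side, the arrays for ``a``, ``b`` and ``c`` would be passed as :mod:`numpy` arrays, since this sketch only touches host memory.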
@@ -94,7 +96,7 @@ compiled C code. This way, you don't have to compute the grid size in C, you can
 The filter is not passed separately as a constant memory argument, because the CudaMemcpyToSymbol operation is now performed by the C host function. Also,
 because the code is compiled differently, we have no direct reference to the compiled module that is uploaded to the device and therefore we can not perform this
-operation directly from Python. If you are tuning host code, you have to perform all memory allocations, frees, and memcpy operations inside the C host code,
+operation directly from Python. If you are tuning host code, you have the option to perform all memory allocations, frees, and memcpy operations inside the C host code,
 that's the purpose of host code after all. That is also why you have to do the timing yourself in C, as you may not want to include the time spent on memory
 allocations and other setup into your time measurements.
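
The paragraph above explains why timing is left to the host code: allocations and copies can be kept outside the measured region. The sketch below illustrates that structure in plain C. Here ``malloc``, ``memcpy`` and ``free`` are stand-ins for the device allocation and copy calls a real CUDA host function would use, and ``scale_host``, ``now_ms`` and the negative error return are hypothetical names and conventions chosen for this example.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Wall-clock time in milliseconds, using C11 timespec_get. */
static double now_ms(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return ts.tv_sec * 1e3 + ts.tv_nsec / 1e6;
}

/* Hypothetical host function: multiplies n floats by factor.
 * Only the computation is timed; setup and teardown are not. */
float scale_host(float *output, const float *input, int n, float factor) {
    if (output == NULL || input == NULL || n <= 0) {
        return -1.0f;  /* illustrative error convention (an assumption) */
    }
    size_t bytes = (size_t)n * sizeof(float);

    /* Setup, not timed: allocate a working buffer and copy the input in.
     * In CUDA host code this is where device allocation and copies go. */
    float *scratch = malloc(bytes);
    if (scratch == NULL) {
        return -1.0f;
    }
    memcpy(scratch, input, bytes);

    /* Timed region: just the work we want to measure. */
    double start = now_ms();
    for (int i = 0; i < n; i++) {
        scratch[i] *= factor;
    }
    double elapsed = now_ms() - start;

    /* Teardown, not timed: copy the result back and release the buffer. */
    memcpy(output, scratch, bytes);
    free(scratch);

    return (float)elapsed;
}
```

Moving the two ``now_ms`` calls to enclose the allocation and copies as well would measure end-to-end time instead; which one is appropriate depends on what is being tuned.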