Commit e101642

Fix small issues in README
1 parent 3857bd7 commit e101642

1 file changed: +4 -4 lines changed


README.md

Lines changed: 4 additions & 4 deletions
@@ -12,7 +12,7 @@



-_Kernel Launcher_ is a C++ library that enables dynamic compilation _CUDA_ kernels at run time (using [NVRTC](https://docs.nvidia.com/cuda/nvrtc/index.html)) and launching them in an easy type-safe way using C++ magic.
+_Kernel Launcher_ is a C++ library that enables dynamic compilation of _CUDA_ kernels at run time (using [NVRTC](https://docs.nvidia.com/cuda/nvrtc/index.html)) and launching them in an easy type-safe way using C++ magic.
 On top of that, Kernel Launcher supports _capturing_ kernel launches, to enable tuning by [Kernel Tuner](https://github.com/KernelTuner/kernel_tuner), and importing the tuning results, known as _wisdom_ files, back into the application.
 The result: highly efficient GPU applications with maximum portability.

@@ -25,7 +25,7 @@ Recommended installation is using CMake. See the [installation guide](https://ke

 ## Example

-There are many ways of using Kernel Launcher. See the documentation for [examples](https://kerneltuner.github.io/kernel_launcher/example.html) or check out the [examples](https://github.com/KernelTuner/kernel_launcher/tree/master/examples) directory.
+There are many ways of using Kernel Launcher. See the documentation for [examples](https://kerneltuner.github.io/kernel_launcher/example.html) or check out the [examples/](https://github.com/KernelTuner/kernel_launcher/tree/master/examples) directory.


 ### Pragma-based API
@@ -37,9 +37,9 @@ Below shows an example of using the pragma-based API, which allows existing CUDA
 #pragma kernel block_size(threads_per_block)
 #pragma kernel problem_size(n)
 #pragma kernel buffers(A[n], B[n], C[n])
-template <typename T>
+template <typename T, int threads_per_block>
 __global__ void vector_add(int n, T *C, const T *A, const T *B) {
-    int i = blockIdx.x * blockDim.x + threadIdx.x;
+    int i = blockIdx.x * threads_per_block + threadIdx.x;
     if (i < n) {
         C[i] = A[i] + B[i];
     }
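
The last hunk replaces `blockDim.x` with the new template parameter `threads_per_block`, so the block size becomes a compile-time constant inside the kernel. That index computation is only correct if the kernel is actually launched with `threads_per_block` threads per block, which the `block_size(threads_per_block)` pragma appears to be there to ensure. The sketch below is a plain C++ host-side emulation, not CUDA and not Kernel Launcher code: `Dim`, `vector_add_thread`, `launch_vector_add`, and `check_vector_add` are hypothetical names that loop over blocks and threads to show the index formula, the rounded-up grid size, and the `i < n` guard.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for CUDA's built-in index variables (only .x is used).
struct Dim {
    int x;
};

// Emulates one GPU thread of the updated kernel: the block size is the
// template parameter threads_per_block, not the runtime blockDim.x.
template <typename T, int threads_per_block>
void vector_add_thread(Dim blockIdx, Dim threadIdx, int n, T *C, const T *A, const T *B) {
    int i = blockIdx.x * threads_per_block + threadIdx.x;
    if (i < n) {  // the last block may run past the end of the arrays
        C[i] = A[i] + B[i];
    }
}

// Emulates a launch with exactly threads_per_block threads per block;
// the index formula above relies on that match.
template <typename T, int threads_per_block>
void launch_vector_add(int n, T *C, const T *A, const T *B) {
    assert(n >= 0 && "problem size must be non-negative");
    // Round up so that every element is covered by some thread.
    int grid_size = (n + threads_per_block - 1) / threads_per_block;
    for (int b = 0; b < grid_size; ++b) {
        for (int t = 0; t < threads_per_block; ++t) {
            vector_add_thread<T, threads_per_block>(Dim{b}, Dim{t}, n, C, A, B);
        }
    }
}

// Runs the emulated launch on buffers padded to a whole number of blocks,
// then checks both the results and that the padding was never written.
template <int threads_per_block>
bool check_vector_add(int n) {
    int padded = (n + threads_per_block - 1) / threads_per_block * threads_per_block;
    std::vector<float> A(padded, 0.0f), B(padded, 0.0f), C(padded, -1.0f);
    for (int i = 0; i < n; ++i) {
        A[i] = static_cast<float>(i);
        B[i] = 2.0f * static_cast<float>(i);
    }
    launch_vector_add<float, threads_per_block>(n, C.data(), A.data(), B.data());
    for (int i = 0; i < n; ++i) {
        if (C[i] != 3.0f * static_cast<float>(i)) return false;
    }
    for (int i = n; i < padded; ++i) {
        if (C[i] != -1.0f) return false;  // the guard kept surplus threads idle
    }
    return true;
}
```

For example, `check_vector_add<256>(1000)` emulates 4 blocks (1024 threads), and the 24 surplus threads write nothing because of the bounds check.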

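Because `vector_add` is a template, compiling it at run time means choosing one concrete instantiation, such as `vector_add<float, 256>`. NVRTC identifies an instantiation through a C++ name expression: the string is registered with `nvrtcAddNameExpression` before `nvrtcCompileProgram`, and `nvrtcGetLoweredName` afterwards returns the mangled symbol to look up. The helper below only builds that string; `make_name_expression` is hypothetical, and whether Kernel Launcher constructs kernel names this way internally is an assumption.

```cpp
#include <cassert>
#include <string>

// Builds a C++ name expression for one instantiation of a two-parameter
// template kernel, e.g. "vector_add<float, 256>". With NVRTC, such a string
// is passed to nvrtcAddNameExpression() before nvrtcCompileProgram(), and
// nvrtcGetLoweredName() then yields the mangled name of that instantiation.
// make_name_expression is a hypothetical helper for illustration only.
std::string make_name_expression(const std::string &kernel,
                                 const std::string &type_arg,
                                 int threads_per_block) {
    assert(threads_per_block > 0 && "block size must be positive");
    return kernel + "<" + type_arg + ", " + std::to_string(threads_per_block) + ">";
}
```

Each distinct `threads_per_block` value names a distinct instantiation, so a block size chosen at run time can be compiled directly into the kernel.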