
Commit 6e27d0f

Improve text in documentation and remove some typos
1 parent 091c955 commit 6e27d0f


5 files changed, 29 additions & 25 deletions


docs/examples/basic.rst

Lines changed: 5 additions & 5 deletions
@@ -30,8 +30,8 @@ Code Explanation
 :lineno-start: 8

 First, we need to define a ``KernelBuilder`` instance.
-A ``KernelBuilder`` is essentially a `blueprint` that describes the information required to compile the CUDA kernel.
-The constructor takes the name of the kernel function and the `.cu` file where the code is located.
+A ``KernelBuilder`` is essentially a ``blueprint`` that describes the information required to compile the CUDA kernel.
+The constructor takes the name of the kernel function and the ``.cu`` file where the code is located.
 Optionally, we can also provide the kernel source as the third parameter.


@@ -40,15 +40,15 @@ Optionally, we can also provide the kernel source as the third parameter.
 :lineno-start: 11

 CUDA kernels often have tunable parameters that can impact their performance, such as block size, thread granularity, register usage, and the use of shared memory.
-Here, we define two tunable parameters: the number of threads per blocks and the number of elements processed per thread.
+Here, we define two tunable parameters: the number of threads per block and the number of elements processed per thread.



 .. literalinclude:: basic.cpp
 :lines: 15-16
 :lineno-start: 15

-The values returned by ``tune`` are placeholder objecs.
+The values returned by ``tune`` are placeholder objects.
 These objects can be combined using C++ operators to create new expressions objects.
 Note that ``elements_per_block`` does not actually contain a specific value;
 instead, it is an abstract expression that, upon kernel instantiation, is evaluated as the product of ``threads_per_block`` and ``elements_per_thread``.
@@ -64,7 +64,7 @@ The following properties are supported:

 * ``problem_size``: This is an N-dimensional vector that represents the size of the problem. In this case, is one-dimensional and ``kl::arg0`` means that the size is specified as the first kernel argument (`argument 0`).
 * ``block_size``: A triplet ``(x, y, z)`` representing the block dimensions.
-* ``grid_divsor``: This property is used to calculate the size of the grid (i.e., the number of blocks along each axis). For each kernel launch, the problem size is divided by the divisors to calculate the grid size. In other words, this property expresses the number of elements processed per thread block.
+* ``grid_divisor``: This property is used to calculate the size of the grid (i.e., the number of blocks along each axis). For each kernel launch, the problem size is divided by the divisors to calculate the grid size. In other words, this property expresses the number of elements processed per thread block.
 * ``template_args``: This property specifies template arguments, which can be type names and integral values.
 * ``define``: Define preprocessor constants.
 * ``shared_memory``: Specify the amount of shared memory required, in bytes.
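To make the placeholder and ``grid_divisor`` descriptions in this file more concrete, here is a small, self-contained C++ sketch. It is not Kernel Launcher's implementation: ``Config``, ``Expr``, ``param`` and ``grid_size`` are hypothetical names, and rounding the division up (so that a partially filled block is still launched) is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-in for a tuning configuration: parameter name -> value.
using Config = std::map<std::string, std::int64_t>;

// Hypothetical deferred expression: it records *how* to compute a value and
// only produces a number when evaluated against a concrete Config.
struct Expr {
    std::function<std::int64_t(const Config&)> eval;
};

// A placeholder for a tunable parameter, similar in spirit to what `tune` returns.
// Looking up a parameter that is missing from the Config throws std::out_of_range.
Expr param(const std::string& name) {
    return Expr{[name](const Config& cfg) { return cfg.at(name); }};
}

// Combining two expressions yields a new expression; nothing is computed yet.
Expr operator*(const Expr& lhs, const Expr& rhs) {
    return Expr{[lhs, rhs](const Config& cfg) { return lhs.eval(cfg) * rhs.eval(cfg); }};
}

// Number of blocks along one axis, assuming the problem size is divided by the
// grid divisor and rounded up so that every element is covered.
std::int64_t grid_size(std::int64_t problem_size, std::int64_t divisor) {
    assert(divisor > 0);
    return (problem_size + divisor - 1) / divisor;
}
```

For example, with ``threads_per_block = 256`` and ``elements_per_thread = 4``, the expression ``param("threads_per_block") * param("elements_per_thread")`` evaluates to 1024, and a problem size of 1000000 then needs ``grid_size(1000000, 1024) == 977`` blocks.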

docs/examples/pragma.rst

Lines changed: 4 additions & 4 deletions
@@ -2,7 +2,7 @@ Pragma Kernels
 ===========================

 In the previous examples, we demonstrated how a tunable kernel can be specified by defining a ``KernelBuilder`` instance in the host-side code.
-While this API offers flexiblity, it can be cumbersome and requires keeping the kernel code in CUDA in sync with the host-side code in C++.
+While this API offers flexibility, it can be cumbersome and requires keeping the kernel code in CUDA in sync with the host-side code in C++.

 Kernel Launcher also provides a way to define kernel specifications directly in the CUDA code by using pragma directives to annotate the kernel code.
 Although this method is less flexible than the ``KernelBuilder`` API, it is much more convenient and suitable for most CUDA kernels.
@@ -30,7 +30,7 @@ The kernel contains the following ``pragma`` directives:
 :lineno-start: 1

 The tune directives specify the tunable parameters: ``threads_per_block`` and ``items_per_thread``.
-Since ``items_per_thread`` is also the name of the template parameter, so it is passed to the kernel as a compile-time constant via this parameter.
+Since ``items_per_thread`` is also the name of the template parameter, it is passed to the kernel as a compile-time constant via this parameter.
 The value of ``threads_per_block`` is not passed to the kernel but is used by subsequent pragmas.

 .. literalinclude:: vector_add_annotated.cu
@@ -44,7 +44,7 @@ In this case, the constant ``items_per_block`` is defined as the product of ``th
 :lines: 4-6
 :lineno-start: 4

-The ``problem_size`` directive defines the problem size (as discussed in as discussed in :doc:`basic`), ``block_size`` specifies the thread block size, and ``grid_divisor`` specifies how the problem size should be divided to obtain the thread grid size.
+The ``problem_size`` directive defines the problem size (as discussed in :doc:`basic`), ``block_size`` specifies the thread block size, and ``grid_divisor`` specifies how the problem size should be divided to obtain the thread grid size.
 Alternatively, ``grid_size`` can be used to specify the grid size directly.


@@ -67,7 +67,7 @@ In this example, the tuning key is ``"vector_add_" + T``, where ``T`` is the nam
 Host Code
 ---------

-The below code shows how to call the kernel from the host in C++::
+The code below shows how to call the kernel from the host in C++::

 #include "kernel_launcher/pragma.h"
 using namespace kl = kernel_launcher;
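The hunk header ``@@ -67,7 +67,7 @@`` mentions that the pragma example uses the tuning key ``"vector_add_" + T``, where ``T`` names the element type. The sketch below only illustrates why such a key keeps tuning results separate per element type; ``type_name`` and ``tuning_key`` are hypothetical helpers, not part of Kernel Launcher.

```cpp
#include <cassert>
#include <string>

// Hypothetical trait that maps an element type to the name used in a
// tuning key. Only a few types are covered for this illustration.
template <typename T>
struct type_name;

template <>
struct type_name<float> {
    static std::string get() { return "float"; }
};

template <>
struct type_name<double> {
    static std::string get() { return "double"; }
};

template <>
struct type_name<int> {
    static std::string get() { return "int"; }
};

// Builds a key of the form prefix + name-of-T, e.g. "vector_add_" + "float",
// so that each element type gets its own entry (and its own tuned configuration).
template <typename T>
std::string tuning_key(const std::string& prefix) {
    assert(!prefix.empty());  // the prefix identifies the kernel
    return prefix + type_name<T>::get();
}
```

With this helper, ``tuning_key<float>("vector_add_")`` yields ``"vector_add_float"``, which differs from the key for ``double`` or ``int``.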

docs/examples/registry.rst

Lines changed: 6 additions & 6 deletions
@@ -7,11 +7,11 @@ Kernel Registry
 .. The kernel registry essentially acts like a global cache of compiled kernels.

 In the previous example, we saw how to use wisdom files by creating a ``WisdomKernel`` object.
-This object will compile the kernel code on the first call and the keep the kernel loaded as long as the object exists.
+This object will compile the kernel code on the first call and then keep the kernel loaded as long as the object exists.
 Typically, one would define the ``WisdomKernel`` object as part of a class or as a global variable.

 However, in certain scenarios, it is inconvenient or impractical to store ``WisdomKernel`` objects.
-In these cases, it is possible to use the ``KernelRegistry``, that essentially acts like a global table of compiled kernel instances.
+In these cases, it is possible to use the ``KernelRegistry`` that essentially acts like a global table of compiled kernel instances.


 Source code
@@ -36,8 +36,8 @@ Defining a kernel descriptor
 :lines: 6-43
 :lineno-start: 6

-This part of the code defines a ``IKernelDescriptor``:
-a class that encapsulate the information required to compile a kernel.
+This part of the code defines an ``IKernelDescriptor``:
+a class that encapsulates the information required to compile a kernel.
 This class should override two methods:

 - ``build`` to instantiate a ``KernelBuilder``,
@@ -64,7 +64,7 @@ kernel is only compiled once and stored in the registry.
 :lineno-start: 59

 Alternatively, it is possible to use the above short-hand syntax.
-This syntax also make it is easy to replace the element type ``float`` to some other type such as ``int``::
+This syntax also makes it easy to replace the element type ``float`` with some other type such as ``int``::

 kl::launch(VectorAddDescriptor::for_type<int>(), n, dev_C, dev_A, dev_B);

@@ -75,4 +75,4 @@ It is even possible to define a templated function that passes type ``T`` on to
 kl::launch(VectorAddDescriptor::for_type<T>(), n, C, A, B);
 }

-Instead of using the global kernel registery, it is also possible to create local registry by creating a ``KernelRegistry`` instance.
+Instead of using the global kernel registry, it is also possible to create a local registry by creating a ``KernelRegistry`` instance.
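The compile-once behaviour described in this file (a kernel "is only compiled once and stored in the registry") can be sketched as a keyed cache. This is a hypothetical illustration, not Kernel Launcher's ``KernelRegistry``: ``Registry``, ``CompiledKernel`` and ``global_registry`` are made-up names, a string key stands in for a kernel descriptor, and a counter stands in for the actual runtime compilation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for a compiled kernel; a real one would hold a
// handle to the loaded GPU code.
struct CompiledKernel {
    std::string key;
};

// Hypothetical registry: compiles each distinct key once and returns the
// cached instance on later lookups. std::map never invalidates references to
// existing elements on insertion, so returned references stay valid for the
// lifetime of the registry.
class Registry {
public:
    const CompiledKernel& lookup(const std::string& key) {
        assert(!key.empty());
        std::lock_guard<std::mutex> guard(mutex_);
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            ++compilations_;  // stands in for an expensive runtime compilation
            it = cache_.emplace(key, CompiledKernel{key}).first;
        }
        return it->second;
    }

    std::size_t compilations() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return compilations_;
    }

private:
    mutable std::mutex mutex_;
    std::map<std::string, CompiledKernel> cache_;
    std::size_t compilations_ = 0;
};

// One process-wide instance plays the role of the global registry.
Registry& global_registry() {
    static Registry instance;
    return instance;
}
```

In this sketch, a ``Registry`` object created locally behaves like a local registry, while ``global_registry()`` always returns the same shared instance.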

docs/examples/wisdom.rst

Lines changed: 2 additions & 1 deletion
@@ -6,6 +6,7 @@ Wisdom Files

 In the previous example, we demonstrated how to compile a kernel by providing both a ``KernelBuilder`` instance (describing the `blueprint` for the kernel) and a ``Config`` instance (describing the configuration of the tunable parameters).

+
 However, determining the optimal configuration can often be challenging, as it depends on both the problem size and the specific type of GPU being used.
 To address this problem, Kernel Launcher provides a solution in the form of **wisdom files** (terminology borrowed from `FFTW <http://www.fftw.org/>`_).

@@ -86,7 +87,7 @@ To do so, we need to run the program with the environment variable ``KERNEL_LAUN
 This generates a file called ``vector_add_1000000.json`` in the directory set by ``set_global_capture_directory``.

 Alternatively, it is possible to capture several kernels at once by using the wildcard ``*``.
-For example, the following command export all kernels that are start with ``vector_``::
+For example, the following command exports all kernels that start with ``vector_``::

 $ KERNEL_LAUNCHER_CAPTURE=vector_* ./main

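The wildcard capture can be pictured as matching kernel names against the pattern given in ``KERNEL_LAUNCHER_CAPTURE``. The matcher below is an assumption about the semantics, not Kernel Launcher's code: the hypothetical ``matches_capture_pattern`` treats ``*`` as any (possibly empty) run of characters and compares all other characters literally.

```cpp
#include <cstddef>
#include <string>

// Returns true if `name` matches `pattern`, where '*' matches any (possibly
// empty) sequence of characters and every other character must match exactly.
// Uses the classic greedy scan that backtracks to the most recent '*'.
bool matches_capture_pattern(const std::string& pattern, const std::string& name) {
    std::size_t p = 0;                     // position in pattern
    std::size_t n = 0;                     // position in name
    std::size_t star = std::string::npos;  // position of the last '*' seen in pattern
    std::size_t resume = 0;                // position in name to retry from after that '*'

    while (n < name.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            star = p++;      // let '*' match nothing for now
            resume = n;
        } else if (p < pattern.size() && pattern[p] == name[n]) {
            ++p;             // literal character matches
            ++n;
        } else if (star != std::string::npos) {
            p = star + 1;    // let the last '*' absorb one more character
            n = ++resume;
        } else {
            return false;    // mismatch and no '*' to fall back on
        }
    }
    // Trailing '*' characters can match the empty remainder.
    while (p < pattern.size() && pattern[p] == '*') {
        ++p;
    }
    return p == pattern.size();
}
```

Under this rule, ``vector_*`` selects ``vector_add`` but not ``matrix_add``, nor a kernel named just ``vector``.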
docs/index.rst

Lines changed: 12 additions & 9 deletions
@@ -19,9 +19,9 @@ Kernel Launcher

 .. image:: /logo.png
 :width: 670
-:alt: kernel launcher
+:alt: Kernel Launcher logo

-**Kernel Launcher** is a C++ library that makes it easy to dynamically compile *CUDA* kernels at runtime (using `NVRTC <https://docs.nvidia.com/cuda/nvrtc/index.html>`_) and launching them in a type-safe manner using C++ magic. There are two main reasons for using runtime compilation:
+**Kernel Launcher** is a C++ library designed to dynamically compile *CUDA* kernels at runtime (using `NVRTC <https://docs.nvidia.com/cuda/nvrtc/index.html>`_) and to launch them in a type-safe manner using C++ magic. Runtime compilation offers two significant advantages:

 * Kernels that have tunable parameters (block size, elements per thread, loop unroll factors, etc.) where the optimal configuration depends on dynamic factors such as the GPU type and problem size.

@@ -33,12 +33,14 @@ Kernel Tuner Integration

 .. image:: /kernel_tuner_integration.png
 :width: 670
-:alt: kernel launcher integration
+:alt: Kernel Launcher and Kernel Tuner integration


-Kernel Launcher's tight integration with `Kernel Tuner <https://kerneltuner.github.io/>`_ results in highly-tuned kernels, as visualized above.
-Kernel Launcher **captures** kernel launches within your application, which are then **tuned** by Kernel Tuner and saved as **wisdom** files.
-These files are processed by Kernel Launcher during execution to **compile** the tuned kernel at runtime.
+The tight integration of **Kernel Launcher** with `Kernel Tuner <https://kerneltuner.github.io/>`_ ensures that kernels are highly optimized, as illustrated in the image above.
+Kernel Launcher can **capture** kernel launches within your application at runtime.
+These captured kernels can then be **tuned** by Kernel Tuner and the tuning results are saved as **wisdom** files.
+These wisdom files are used by Kernel Launcher during execution to **compile** the tuned kernel at runtime.
+

 See :doc:`examples/wisdom` for an example of how this works in practise.

@@ -48,21 +50,22 @@ See :doc:`examples/wisdom` for an example of how this works in practise.
 Basic Example
 =============

-This sections hows a basic code example. See :ref:`example` for a more advance example.
+This section presents a simple code example illustrating how to use the Kernel Launcher.
+For a more detailed example, refer to :ref:`example`.

 Consider the following CUDA kernel for vector addition.
 This kernel has a template parameter ``T`` and a tunable parameter ``ELEMENTS_PER_THREAD``.

 .. literalinclude:: examples/vector_add.cu


-The following C++ snippet shows how to use *Kernel Launcher* in host code:
+The following C++ snippet demonstrates how to use the Kernel Launcher in the host code:

 .. literalinclude:: examples/index.cpp



-Indices and tables
+Indices and Tables
 ============

 * :ref:`genindex`
