
Commit d034aff
Merge branch 'master' into refactor_interface
2 parents: 0c7ec04 + a60e060

File tree

17 files changed: +671 −290 lines


CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -4,6 +4,7 @@ This project adheres to [Semantic Versioning](http://semver.org/).
 
 ## Unreleased
 
+## [0.4.4] - 2023-03-09
 ### Added
 - Support for using time_limit in simulation mode
 - Helper functions for energy tuning
@@ -12,6 +13,7 @@ This project adheres to [Semantic Versioning](http://semver.org/).
 
 ### Changed
 - Changed what timings are stored in cache files
+- No longer inserting partial loop unrolling factor of 0 in CUDA
 
 ## [0.4.3] - 2022-10-19
 ### Added
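The "time_limit in simulation mode" entry refers to replaying results from a cache file instead of compiling and benchmarking on hardware. As a minimal sketch of how the two might be combined (the cache file name and the vector_add setup are illustrative assumptions, not part of this commit):

    # hedged sketch: replay a previous run from cache, stopping after 60 seconds
    results, env = tune_kernel("vector_add", kernel_string, size, args, tune_params,
                               cache="vector_add_cache.json",  # hypothetical cache file
                               simulation_mode=True,
                               strategy_options=dict(time_limit=60))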

doc/source/conf.py

Lines changed: 2 additions & 2 deletions
@@ -59,9 +59,9 @@
 # built documents.
 #
 # The short X.Y version.
-version = u'0.4.3'
+version = u'0.4.4'
 # The full version, including alpha/beta/rc tags.
-release = u'0.4.3'
+release = u'0.4.4'
 
 # The language for content autogenerated by Sphinx. Refer to documentation
 # for a list of supported languages.

doc/source/optimization.rst

Lines changed: 15 additions & 5 deletions
@@ -28,16 +28,26 @@ the ``strategy=`` optional argument of ``tune_kernel()``. Kernel Tuner currently
 
 Most strategies have some mechanism built in to detect when to stop tuning, which may be controlled through specific
 parameters that can be passed to the strategies using the ``strategy_options=`` optional argument of ``tune_kernel()``. You
-can also override whatever internal stop criterion the strategy uses, and set either a time limit in seconds or a maximum
-number of unique function evaluations.
+can also override whatever internal stop criterion the strategy uses, and set either a time limit in seconds (using ``time_limit=``) or a maximum
+number of unique function evaluations (using ``max_fevals=``).
+
+To give an example, one could simply add these two arguments to any code calling ``tune_kernel()``:
+
+.. code-block:: python
+
+    results, env = tune_kernel("vector_add", kernel_string, size, args, tune_params,
+                               strategy="random_sample",
+                               strategy_options=dict(max_fevals=5))
+
 
 A 'unique function evaluation' corresponds to the first time that Kernel Tuner tries to compile and benchmark a parameter
 configuration that has been selected by the optimization strategy. If you are continuing from a previous tuning session using
-cache files, serving a value from the cache also counts as a function evaluation for the strategy. Only unique function
-evaluations are counted, so the second time a parameter configuration is selected by the strategy it is served from the
+cache files, serving a value from the cache for the first time in the run also counts as a function evaluation for the strategy.
+Only unique function evaluations are counted, so the second time a parameter configuration is selected by the strategy it is served from the
 cache, but not counted as a unique function evaluation.
 
-The ``strategy_options=`` argument of ``tune_kernel()`` should be used as follows:
+Below all the strategies are listed with their strategy-specific options that can be passed in a dictionary to the ``strategy_options=`` argument
+of ``tune_kernel()``.
 
 
 kernel_tuner.strategies.basinhopping
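The new hunk above demonstrates max_fevals; for comparison, a minimal sketch of the time_limit variant, assuming the same vector_add setup as in the documentation example (the choice of the basinhopping strategy here is an arbitrary illustration, not from this commit):

    # hedged sketch: stop on wall-clock time instead of evaluation count
    results, env = tune_kernel("vector_add", kernel_string, size, args, tune_params,
                               strategy="basinhopping",
                               strategy_options=dict(time_limit=30.0))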

examples/c/matrix_multiply.cpp

Lines changed: 0 additions & 17 deletions
This file was deleted.

examples/c/matrix_multiply.py

Lines changed: 0 additions & 38 deletions
This file was deleted.
New file

Lines changed: 52 additions & 0 deletions

@@ -0,0 +1,52 @@
+#!/usr/bin/env python
+"""
+This is the vector_add example modified to show
+how to use PythonKernel with the CuPy backend
+"""
+
+import cupy as cp
+import numpy as np
+from kernel_tuner.kernelbuilder import PythonKernel
+
+def kernelbuilder_example():
+
+    # To make this example self-contained we include the kernel as a string
+    # here, but you can also just point to a file with the kernel code
+    kernel_string = """
+    __global__ void vector_add(float *c, float *a, float *b, int n) {
+        int i = blockIdx.x * block_size_x + threadIdx.x;
+        if (i<n) {
+            c[i] = a[i] + b[i];
+        }
+    }
+    """
+
+    # Setup the arguments for our vector add kernel
+    size = 100000
+    a = cp.random.randn(size).astype(np.float32)
+    b = cp.random.randn(size).astype(np.float32)
+    c = cp.zeros_like(b)
+    n = np.int32(size)
+
+    # Note that the type and order should match our GPU code
+    # Because the arguments are all CuPy arrays, our PythonKernel does not need to
+    # worry about moving data between host and device
+    args = [c, a, b, n]
+
+    # We can instantiate a specific kernel configuration
+    params = {"block_size_x": 128}
+
+    # Here we construct a Python object that represents the kernel
+    # we can use it to conveniently use the GPU kernel in Python
+    # applications that want to frequently call the GPU kernel
+    vector_add = PythonKernel("vector_add", kernel_string, size, args, params, lang="cupy")
+
+    # We can use the PythonKernel instance as a regular Python function
+    vector_add(c, a, b, n)
+
+    # Compare the result in c with a+b computed in Python
+    assert np.allclose(c, a+b)
+
+
+if __name__ == "__main__":
+    kernelbuilder_example()
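A PythonKernel instance appears to be bound to one specific configuration (the params dictionary above), so switching configurations means constructing another instance. A minimal sketch reusing only the signature shown in this file; block_size_x=256 is an arbitrary illustration, not part of the commit:

    # hypothetical: build and call the same kernel with a different block size
    params = {"block_size_x": 256}
    vector_add_256 = PythonKernel("vector_add", kernel_string, size, args, params, lang="cupy")
    vector_add_256(c, a, b, n)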

examples/cuda/reduction.cu

Lines changed: 1 addition & 1 deletion
@@ -54,7 +54,7 @@ __global__ void sum_floats(float *sum_global, floatvector *array, int n) {
         sum = sh_mem[ti];
         #pragma unroll
         for (unsigned int s=16; s>0; s>>=1) {
-            sum += __shfl_down_sync(0, sum, s);
+            sum += __shfl_down_sync(0xffffffff, sum, s);
         }
     }
 #else
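Note on the fix: the first argument of __shfl_down_sync is a member mask naming the lanes that participate in the shuffle. A mask of 0 names no lanes, which is invalid usage; 0xffffffff includes all 32 lanes of the warp, matching the full-warp reduction this loop performs.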

examples/cuda/reduction.py

Lines changed: 1 addition & 1 deletion
@@ -11,10 +11,10 @@ def tune():
 
     tune_params = OrderedDict()
     tune_params["block_size_x"] = [2**i for i in range(5,11)]
-    tune_params["use_shuffle"] = [0, 1]
     tune_params["vector"] = [2**i for i in range(3)]
    tune_params["num_blocks"] = [2**i for i in range(5,16)]
     tune_params["loop_unroll_factor_0"] = [0, 1, 8, 16, 32, 64]
+    tune_params["use_shuffle"] = [0, 1]
 
     problem_size = "num_blocks"
     size = 800000000

kernel_tuner/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 from kernel_tuner.integration import store_results, create_device_targets
 from kernel_tuner.interface import tune_kernel, run_kernel
 
-__version__ = "0.4.3"
+__version__ = "0.4.4"
