
Commit 84a3563

Merge branch 'master' into refactor_interface

2 parents: 1148813 + 588d93f

4 files changed: +49 -5 lines

CONTRIBUTING.rst

Lines changed: 1 addition & 1 deletion

@@ -28,7 +28,7 @@ Before creating a pull request please ensure the following:
 
 If you are in doubt on where to put your additions to the Kernel Tuner, please
 have a look at the `design documentation
-<http://benvanwerkhoven.github.io/kernel_tuner/design.html>`__, or discuss it in the issue regarding your additions.
+<https://kerneltuner.github.io/kernel_tuner/stable/design.html>`__, or discuss it in the issue regarding your additions.
 
 Development setup
 -----------------

doc/source/matrix_multiplication.ipynb

Lines changed: 3 additions & 3 deletions

@@ -161,7 +161,7 @@
     "As we can see, the execution times printed by `tune_kernel` already vary quite dramatically between the different values for `block_size_x` and `block_size_y`. However, even with the best thread block dimensions our kernel is still not very efficient.\n",
     "\n",
     "Therefore, we'll have a look at the Nvidia Visual Profiler to find that the utilization of our kernel is actually pretty low:\n",
-    "![](https://raw.githubusercontent.com/kerneltuner/kernel_tuner/master/tutorial/matmul/matmul_naive.png)\n",
+    "![](https://raw.githubusercontent.com/kerneltuner/kernel_tuner/master/doc/source/matmul/matmul_naive.png)\n",
     "There is, however, a lot of opportunity for data reuse, which is realized by making the threads in a thread block collaborate."
    ]
   },
@@ -270,7 +270,7 @@
    "source": [
     "This kernel drastically reduces memory bandwidth consumption. Compared to our naive kernel, it is about three times faster now, which comes from the highly increased memory utilization:\n",
     "\n",
-    "![](https://raw.githubusercontent.com/kerneltuner/kernel_tuner/master/tutorial/matmul/matmul_shared.png)\n",
+    "![](https://raw.githubusercontent.com/kerneltuner/kernel_tuner/master/doc/source/matmul/matmul_shared.png)\n",
     "\n",
     "The compute utilization has actually decreased slightly, which is due to the synchronization overhead, because ``__syncthreads()`` is called frequently.\n",
     "\n",
@@ -422,7 +422,7 @@
    "source": [
     "As we can see, the number of kernel configurations evaluated by the tuner has increased again. The performance has also increased quite dramatically, by roughly another factor of 3. If we look at the Nvidia Visual Profiler output of our kernel, we see the following:\n",
     "\n",
-    "![](https://raw.githubusercontent.com/kerneltuner/kernel_tuner/master/tutorial/matmul/matmul.png)\n",
+    "![](https://raw.githubusercontent.com/kerneltuner/kernel_tuner/master/doc/source/matmul/matmul.png)\n",
     "\n",
     "As expected, the compute utilization of our kernel has improved. There may even be some more room for improvement, but our tutorial on how to use Kernel Tuner ends here. In this tutorial, we have seen how you can use Kernel Tuner to tune kernels with a small number of tunable parameters, how to impose restrictions on the parameter space, and how to use grid divisor lists to specify how grid dimensions are computed."
    ]
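The closing cell names the tutorial's three main concepts: tunable parameters, restrictions, and grid divisor lists. A minimal, self-contained sketch of how they appear together in a tune_kernel call, assuming a naive kernel and illustrative parameter values; none of this is taken verbatim from the notebook:

    import numpy as np
    import kernel_tuner

    # illustrative naive matmul kernel; WIDTH is fixed, while block_size_x and
    # block_size_y are injected by the tuner as preprocessor defines
    kernel_string = """
    #define WIDTH 512
    __global__ void matmul_kernel(float *C, float *A, float *B) {
        int x = blockIdx.x * block_size_x + threadIdx.x;
        int y = blockIdx.y * block_size_y + threadIdx.y;
        float sum = 0.0f;
        for (int k = 0; k < WIDTH; k++) {
            sum += A[y * WIDTH + k] * B[k * WIDTH + x];
        }
        C[y * WIDTH + x] = sum;
    }
    """

    n = 512
    A = np.random.randn(n, n).astype(np.float32)
    B = np.random.randn(n, n).astype(np.float32)
    C = np.zeros_like(A)

    tune_params = {"block_size_x": [16, 32, 64], "block_size_y": [2, 4, 8, 16]}

    results, env = kernel_tuner.tune_kernel(
        "matmul_kernel", kernel_string, (n, n), [C, A, B], tune_params,
        # restrictions prune configurations before they are compiled
        restrictions=["block_size_x*block_size_y >= 64"],
        # grid divisor lists tell the tuner how to derive grid dimensions
        # from the problem size for each configuration
        grid_div_x=["block_size_x"], grid_div_y=["block_size_y"])

With these divisor lists, a 512x512 problem size with block_size_x=32 and block_size_y=8 yields a 16x64 grid of thread blocks.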

setup.py

Lines changed: 1 addition & 1 deletion

@@ -50,7 +50,7 @@ def readme():
         'Topic :: System :: Distributed Computing',
         'Development Status :: 5 - Production/Stable',
     ],
-    install_requires=['numpy>=1.13.3', 'scipy>=1.8.1', 'jsonschema', 'python-constraint'],
+    install_requires=['numpy>=1.13.3,<1.24.0', 'scipy>=1.8.1', 'jsonschema', 'python-constraint'],
     extras_require={
         'doc': ['sphinx', 'sphinx_rtd_theme', 'nbsphinx', 'pytest', 'ipython', 'markupsafe==2.0.1'],
         'cuda': ['pycuda', 'nvidia-ml-py', 'pynvml>=11.4.1'],
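A note on the new upper bound: NumPy 1.24 removed aliases such as np.float and np.int that had been deprecated since 1.20, which broke many downstream packages. The commit message does not state the motivation for the pin, so attributing it to the removed aliases is an assumption; a quick way to observe the behavior difference:

    import numpy as np

    # the alias below was deprecated in NumPy 1.20 and removed in 1.24;
    # code that still evaluates np.float raises AttributeError on >= 1.24
    print(np.__version__, hasattr(np, "float"))  # True on 1.20-1.23, False on >= 1.24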

test/test_runners.py

Lines changed: 44 additions & 0 deletions

@@ -7,6 +7,9 @@
 
 import kernel_tuner
 from kernel_tuner import util
+from kernel_tuner import core
+from kernel_tuner.interface import Options, _kernel_options, _device_options, _tuning_options
+from kernel_tuner.runners.sequential import SequentialRunner
 
 from .context import skip_if_no_pycuda
 
@@ -186,3 +189,44 @@ def test_interface_handles_compile_failures(env):
 
     failed_config = [record for record in results if record["block_size_x"] == 256][0]
     assert isinstance(failed_config["time"], util.CompilationFailedConfig)
+
+
+@skip_if_no_pycuda
+def test_runner(env):
+
+    kernel_name, kernel_source, problem_size, arguments, tune_params = env
+
+    # create a KernelSource
+    kernelsource = core.KernelSource(kernel_name, kernel_source, lang=None, defines=None)
+
+    # create the option bags; locals() collects the names below
+    # (plus the fixture values) into a dict to fill the Options from
+    device = 0
+    atol = 1e-6
+    platform = 0
+    iterations = 7
+    verbose = False
+    objective = "time"
+    opts = locals()
+    kernel_options = Options([(k, opts.get(k, None)) for k in _kernel_options.keys()])
+    tuning_options = Options([(k, opts.get(k, None)) for k in _tuning_options.keys()])
+    device_options = Options([(k, opts.get(k, None)) for k in _device_options.keys()])
+    tuning_options.cachefile = None
+
+    # create the runner
+    runner = SequentialRunner(kernelsource, kernel_options, device_options, iterations, observers=None)
+    runner.warmed_up = True  # disable warm-up for this test
+
+    # insert the configurations to run with this runner into this list;
+    # each configuration is described as a list of values, one for each
+    # tunable parameter, in the order the parameters appear in tune_params
+    searchspace = []
+    searchspace.append([32])  # vector_add has only one tunable parameter (block_size_x)
+
+    # call the runner
+    results, _ = runner.run(searchspace, kernel_options, tuning_options)
+
+    assert len(results) == 1
+    assert results[0]['block_size_x'] == 32
+    assert len(results[0]['times']) == iterations
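The test unpacks an `env` fixture that is defined elsewhere in the test suite and not shown in this diff. For orientation, a plausible sketch of what it provides, assuming the usual vector_add example; the exact kernel body, sizes, and parameter values here are assumptions, not part of this commit:

    import numpy as np
    import pytest

    @pytest.fixture
    def env():
        # hypothetical reconstruction of the fixture this diff relies on;
        # block_size_x is injected by the tuner as a preprocessor define
        kernel_string = """
        __global__ void vector_add(float *c, float *a, float *b, int n) {
            int i = blockIdx.x * block_size_x + threadIdx.x;
            if (i < n) {
                c[i] = a[i] + b[i];
            }
        }
        """
        size = 100
        a = np.random.randn(size).astype(np.float32)
        b = np.random.randn(size).astype(np.float32)
        c = np.zeros_like(b)
        n = np.int32(size)
        args = [c, a, b, n]
        tune_params = {"block_size_x": [32, 64, 128, 256]}
        return "vector_add", kernel_string, size, args, tune_params

Because the runner is driven directly here, the [32] configuration is executed as-is; tune_params only supplies the parameter names and their order.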
