
Commit 84a3563

Merge branch 'master' into refactor_interface

2 parents: 1148813 + 588d93f

4 files changed: +49 -5 lines

CONTRIBUTING.rst

Lines changed: 1 addition & 1 deletion

@@ -28,7 +28,7 @@ Before creating a pull request please ensure the following:
 
 If you are in doubt on where to put your additions to the Kernel Tuner, please
 have a look at the `design documentation
-<http://benvanwerkhoven.github.io/kernel_tuner/design.html>`__, or discuss it in the issue regarding your additions.
+<https://kerneltuner.github.io/kernel_tuner/stable/design.html>`__, or discuss it in the issue regarding your additions.
 
 Development setup
 -----------------

doc/source/matrix_multiplication.ipynb

Lines changed: 3 additions & 3 deletions

@@ -161,7 +161,7 @@
     "As we can see, the execution times printed by `tune_kernel` already vary quite dramatically between the different values for `block_size_x` and `block_size_y`. However, even with the best thread block dimensions our kernel is still not very efficient.\n",
     "\n",
     "Therefore, we'll have a look at the Nvidia Visual Profiler to find that the utilization of our kernel is actually pretty low:\n",
-    "![](https://raw.githubusercontent.com/kerneltuner/kernel_tuner/master/tutorial/matmul/matmul_naive.png)\n",
+    "![](https://raw.githubusercontent.com/kerneltuner/kernel_tuner/master/doc/source/matmul/matmul_naive.png)\n",
     "There is, however, a lot of opportunity for data reuse, which is realized by making the threads in a thread block collaborate."
    ]
   },
@@ -270,7 +270,7 @@
    "source": [
     "This kernel drastically reduces memory bandwidth consumption. Compared to our naive kernel, it is about three times faster now, which comes from the highly increased memory utilization:\n",
     "\n",
-    "![](https://raw.githubusercontent.com/kerneltuner/kernel_tuner/master/tutorial/matmul/matmul_shared.png)\n",
+    "![](https://raw.githubusercontent.com/kerneltuner/kernel_tuner/master/doc/source/matmul/matmul_shared.png)\n",
     "\n",
     "The compute utilization has actually decreased slightly, which is due to the synchronization overhead, because ``__syncthreads()`` is called frequently.\n",
     "\n",
@@ -422,7 +422,7 @@
    "source": [
     "As we can see, the number of kernel configurations evaluated by the tuner has increased again. The performance has also increased quite dramatically, by roughly another factor of 3. If we look at the Nvidia Visual Profiler output of our kernel, we see the following:\n",
     "\n",
-    "![](https://raw.githubusercontent.com/kerneltuner/kernel_tuner/master/tutorial/matmul/matmul.png)\n",
+    "![](https://raw.githubusercontent.com/kerneltuner/kernel_tuner/master/doc/source/matmul/matmul.png)\n",
     "\n",
     "As expected, the compute utilization of our kernel has improved. There may even be some more room for improvement, but our tutorial on how to use Kernel Tuner ends here. In this tutorial, we have seen how you can use Kernel Tuner to tune kernels with a small number of tunable parameters, how to impose restrictions on the parameter space, and how to use grid divisor lists to specify how grid dimensions are computed."
    ]
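The closing cell names the tutorial's three main concepts: tunable parameters, restrictions, and grid divisor lists. A minimal, self-contained sketch of how they appear together in a tune_kernel call, assuming a naive kernel and illustrative parameter values; none of this is taken verbatim from the notebook:

    import numpy as np
    import kernel_tuner

    # illustrative naive matmul kernel; WIDTH is fixed, while block_size_x and
    # block_size_y are injected by the tuner as preprocessor defines
    kernel_string = """
    #define WIDTH 512
    __global__ void matmul_kernel(float *C, float *A, float *B) {
        int x = blockIdx.x * block_size_x + threadIdx.x;
        int y = blockIdx.y * block_size_y + threadIdx.y;
        float sum = 0.0f;
        for (int k = 0; k < WIDTH; k++) {
            sum += A[y * WIDTH + k] * B[k * WIDTH + x];
        }
        C[y * WIDTH + x] = sum;
    }
    """

    n = 512
    A = np.random.randn(n, n).astype(np.float32)
    B = np.random.randn(n, n).astype(np.float32)
    C = np.zeros_like(A)

    tune_params = {"block_size_x": [16, 32, 64], "block_size_y": [2, 4, 8, 16]}

    results, env = kernel_tuner.tune_kernel(
        "matmul_kernel", kernel_string, (n, n), [C, A, B], tune_params,
        # restrictions prune configurations before they are compiled
        restrictions=["block_size_x*block_size_y >= 64"],
        # grid divisor lists tell the tuner how to derive grid dimensions
        # from the problem size for each configuration
        grid_div_x=["block_size_x"], grid_div_y=["block_size_y"])

With these divisor lists, a 512x512 problem size with block_size_x=32 and block_size_y=8 yields a 16x64 grid of thread blocks.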

setup.py

Lines changed: 1 addition & 1 deletion

@@ -50,7 +50,7 @@ def readme():
         'Topic :: System :: Distributed Computing',
         'Development Status :: 5 - Production/Stable',
     ],
-    install_requires=['numpy>=1.13.3', 'scipy>=1.8.1', 'jsonschema', 'python-constraint'],
+    install_requires=['numpy>=1.13.3,<1.24.0', 'scipy>=1.8.1', 'jsonschema', 'python-constraint'],
     extras_require={
         'doc': ['sphinx', 'sphinx_rtd_theme', 'nbsphinx', 'pytest', 'ipython', 'markupsafe==2.0.1'],
         'cuda': ['pycuda', 'nvidia-ml-py', 'pynvml>=11.4.1'],
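A note on the new upper bound: NumPy 1.24 removed aliases such as np.float and np.int that had been deprecated since 1.20, which broke many downstream packages. The commit message does not state the motivation for the pin, so attributing it to the removed aliases is an assumption; a quick way to observe the behavior difference:

    import numpy as np

    # the alias below was deprecated in NumPy 1.20 and removed in 1.24;
    # code that still evaluates np.float raises AttributeError on >= 1.24
    print(np.__version__, hasattr(np, "float"))  # True on 1.20-1.23, False on >= 1.24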

test/test_runners.py

Lines changed: 44 additions & 0 deletions

@@ -7,6 +7,9 @@
 
 import kernel_tuner
 from kernel_tuner import util
+from kernel_tuner import core
+from kernel_tuner.interface import Options, _kernel_options, _device_options, _tuning_options
+from kernel_tuner.runners.sequential import SequentialRunner
 
 from .context import skip_if_no_pycuda
 
@@ -186,3 +189,44 @@ def test_interface_handles_compile_failures(env):
 
     failed_config = [record for record in results if record["block_size_x"] == 256][0]
     assert isinstance(failed_config["time"], util.CompilationFailedConfig)
+
+
+@skip_if_no_pycuda
+def test_runner(env):
+
+    kernel_name, kernel_source, problem_size, arguments, tune_params = env
+
+    # create a KernelSource
+    kernelsource = core.KernelSource(kernel_name, kernel_source, lang=None, defines=None)
+
+    # create the option bags; locals() collects the names below
+    # (plus the fixture values) into a dict to fill the Options from
+    device = 0
+    atol = 1e-6
+    platform = 0
+    iterations = 7
+    verbose = False
+    objective = "time"
+    opts = locals()
+    kernel_options = Options([(k, opts.get(k, None)) for k in _kernel_options.keys()])
+    tuning_options = Options([(k, opts.get(k, None)) for k in _tuning_options.keys()])
+    device_options = Options([(k, opts.get(k, None)) for k in _device_options.keys()])
+    tuning_options.cachefile = None
+
+    # create the runner
+    runner = SequentialRunner(kernelsource, kernel_options, device_options, iterations, observers=None)
+    runner.warmed_up = True  # disable warm-up for this test
+
+    # insert the configurations to run with this runner into this list;
+    # each configuration is described as a list of values, one for each
+    # tunable parameter, in the order the parameters appear in tune_params
+    searchspace = []
+    searchspace.append([32])  # vector_add has only one tunable parameter (block_size_x)
+
+    # call the runner
+    results, _ = runner.run(searchspace, kernel_options, tuning_options)
+
+    assert len(results) == 1
+    assert results[0]['block_size_x'] == 32
+    assert len(results[0]['times']) == iterations
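The test unpacks an `env` fixture that is defined elsewhere in the test suite and not shown in this diff. For orientation, a plausible sketch of what it provides, assuming the usual vector_add example; the exact kernel body, sizes, and parameter values here are assumptions, not part of this commit:

    import numpy as np
    import pytest

    @pytest.fixture
    def env():
        # hypothetical reconstruction of the fixture this diff relies on;
        # block_size_x is injected by the tuner as a preprocessor define
        kernel_string = """
        __global__ void vector_add(float *c, float *a, float *b, int n) {
            int i = blockIdx.x * block_size_x + threadIdx.x;
            if (i < n) {
                c[i] = a[i] + b[i];
            }
        }
        """
        size = 100
        a = np.random.randn(size).astype(np.float32)
        b = np.random.randn(size).astype(np.float32)
        c = np.zeros_like(b)
        n = np.int32(size)
        args = [c, a, b, n]
        tune_params = {"block_size_x": [32, 64, 128, 256]}
        return "vector_add", kernel_string, size, args, tune_params

Because the runner is driven directly here, the [32] configuration is executed as-is; tune_params only supplies the parameter names and their order.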
