Commit dd4a4ed

Merge branch 'master' into parallel_runner
2 parents c4f7f32 + a48abc1 commit dd4a4ed


45 files changed: +2276 −751 lines

.gitignore

Lines changed: 4 additions & 0 deletions

@@ -2,6 +2,8 @@
 poetry.lock
 noxenv.txt
 noxsettings.toml
+hyperparamtuning/
+*.prof
 
 ### Python ###
 *.pyc
@@ -16,6 +18,8 @@ push_to_pypi.sh
 .nfs*
 *.log
 *.json
+!kernel_tuner/schema/T1/1.0.0/input-schema.json
+!test/test_T1_input.json
 *.csv
 .cache
 *.ipynb_checkpoints

CHANGELOG.md

Lines changed: 5 additions & 1 deletion

@@ -3,13 +3,17 @@ All notable changes to this project will be documented in this file.
 This project adheres to [Semantic Versioning](http://semver.org/).
 
 ## Unreleased
+<!-- ## [1.1.0] - 2025 ?? -->
+- Additional improvements to search space construction
 - changed HIP python bindings from pyhip-interface to the official hip-python
+- Added Python 3.13 and experimental 3.14 support
+- Dropped Python 3.8 and 3.9 support (due to incompatibility with newer scipy versions)
 
 ## [1.0.0] - 2024-04-04
 - HIP backend to support tuning HIP kernels on AMD GPUs
 - Experimental features for mixed-precision and accuracy tuning
 - Experimental features for OpenACC tuning
-- Major speedup due to new parser and using revamped python-constraint for searchspace building
+- Major speedup due to new parser and using revamped python-constraint for search space construction
 - Implemented ability to use `PySMT` and `ATF` for searchspace building
 - Added Poetry for dependency and build management
 - Switched from `setup.py` and `setup.cfg` to `pyproject.toml` for centralized metadata, added relevant tests

INSTALL.rst

Lines changed: 1 addition & 1 deletion

@@ -20,7 +20,7 @@ Linux users could type the following to download and install Python 3 using Mini
 wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
 bash Miniconda3-latest-Linux-x86_64.sh
 
-You are of course also free to use your own Python installation, and the Kernel Tuner is developed to be fully compatible with Python 3.9 and newer.
+You are of course also free to use your own Python installation, and the Kernel Tuner is developed to be fully compatible with Python 3.10 and newer.
 
 Installing Python Packages
 --------------------------

doc/requirements_test.txt

Lines changed: 291 additions & 164 deletions
(Large diff not rendered by default.)

doc/source/conf.py

Lines changed: 1 addition & 1 deletion

@@ -29,7 +29,7 @@
 # import data from pyproject.toml using https://github.com/sphinx-toolbox/sphinx-pyproject
 # additional data can be added with `[tool.sphinx-pyproject]` and retrieved with `config['']`.
 config = SphinxConfig(
-    "../../pyproject.toml", style="poetry"
+    "../../pyproject.toml",
 ) # add `, globalns=globals()` to directly insert in namespace
 year = time.strftime("%Y")
 startyear = "2016"

doc/source/dev-environment.rst

Lines changed: 2 additions & 2 deletions

@@ -27,8 +27,8 @@ Steps with :bash:`sudo` access (e.g. on a local device):
    * After installation, restart your shell.
 #. Install the required Python versions:
    * On some systems, additional packages may be needed to build Python versions. For example on Ubuntu: :bash:`sudo apt install build-essential zlib1g-dev libncurses5-dev libgdbm-dev libnss3-dev libssl-dev libreadline-dev libffi-dev libsqlite3-dev wget libbz2-dev liblzma-dev lzma`.
-   * Install the Python versions with: :bash:`pyenv install 3.9 3.10 3.11 3.12`. The reason we're installing all these versions as opposed to just one, is so we can test against all supported Python versions.
-#. Set the Python versions so they can be found: :bash:`pyenv local 3.9 3.10 3.11 3.12` (replace :bash:`local` with :bash:`global` when not using the virtualenv).
+   * Install the Python versions with: :bash:`pyenv install 3.9 3.10 3.11 3.12 3.13`. The reason we're installing all these versions as opposed to just one, is so we can test against all supported Python versions.
+#. Set the Python versions so they can be found: :bash:`pyenv local 3.9 3.10 3.11 3.12 3.13` (replace :bash:`local` with :bash:`global` when not using the virtualenv).
 #. Setup a local virtual environment in the folder: :bash:`pyenv virtualenv 3.11 kerneltuner` (or whatever environment name and Python version you prefer).
 #. `Install Poetry <https://python-poetry.org/docs/#installing-with-the-official-installer>`__.
    * Use :bash:`curl -sSL https://install.python-poetry.org | python3 -` to install Poetry.

doc/source/optimization.rst

Lines changed: 6 additions & 0 deletions

@@ -46,6 +46,12 @@ cache files, serving a value from the cache for the first time in the run also c
 Only unique function evaluations are counted, so the second time a parameter configuration is selected by the strategy it is served from the
 cache, but not counted as a unique function evaluation.
 
+All optimization algorithms, except for brute_force, random_sample, and bayes_opt, allow the user to specify an initial guess or
+starting point for the optimization, called ``x0``. This can be passed to the strategy using the ``strategy_options=`` dictionary with ``"x0"`` as key and
+a list of values, one for each parameter in tune_params, to denote the starting point. For example, for a kernel that has parameters ``block_size_x`` (64, 128, 256)
+and ``tile_size_x`` (1, 2, 3), one could pass ``strategy_options=dict(x0=[128, 2])`` to ``tune_kernel()`` to make sure the strategy starts from
+the configuration with ``block_size_x=128, tile_size_x=2``. The order in the ``x0`` list should match the order in the tunable parameters dictionary.
+
 Below all the strategies are listed with their strategy-specific options that can be passed in a dictionary to the ``strategy_options=`` argument
 of ``tune_kernel()``.
kernel_tuner/__init__.py

Lines changed: 1 addition & 1 deletion

@@ -1,5 +1,5 @@
 from kernel_tuner.integration import store_results, create_device_targets
-from kernel_tuner.interface import tune_kernel, run_kernel
+from kernel_tuner.interface import tune_kernel, tune_kernel_T1, run_kernel
 
 from importlib.metadata import version
 
kernel_tuner/backends/backend.py

Lines changed: 5 additions & 5 deletions

@@ -1,16 +1,16 @@
-"""This module contains the interface of all kernel_tuner backends"""
+"""This module contains the interface of all kernel_tuner backends."""
 from __future__ import print_function
 
 from abc import ABC, abstractmethod
 
 
 class Backend(ABC):
-    """Base class for kernel_tuner backends"""
+    """Base class for kernel_tuner backends."""
 
     @abstractmethod
     def ready_argument_list(self, arguments):
         """This method must implement the allocation of the arguments on device memory."""
-        pass
+        return arguments
 
     @abstractmethod
     def compile(self, kernel_instance):
@@ -64,7 +64,7 @@ def refresh_memory(self, device_memory, host_arguments, should_sync):
 
 
 class GPUBackend(Backend):
-    """Base class for GPU backends"""
+    """Base class for GPU backends."""
 
     @abstractmethod
     def __init__(self, device, iterations, compiler_options, observers):
@@ -93,7 +93,7 @@ def refresh_memory(self, gpu_memory, host_arguments, should_sync):
 
 
 class CompilerBackend(Backend):
-    """Base class for compiler backends"""
+    """Base class for compiler backends."""
 
     @abstractmethod
     def __init__(self, iterations, compiler_options, compiler):
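The change from ``pass`` to ``return arguments`` is more than cosmetic: Python allows an ``@abstractmethod`` to carry a default body that concrete subclasses can reuse via ``super()``, which is what the new hypertuner backend in this commit relies on. A minimal sketch of the pattern, using illustrative stand-in classes rather than the real Kernel Tuner backends:

```python
# Sketch: an @abstractmethod with a default body that subclasses delegate to.
from abc import ABC, abstractmethod

class Backend(ABC):
    @abstractmethod
    def ready_argument_list(self, arguments):
        """Default behavior: hand the arguments back unchanged."""
        return arguments

class PassthroughBackend(Backend):
    # A concrete subclass must still override the abstract method,
    # but it can reuse the default body through super().
    def ready_argument_list(self, arguments):
        return super().ready_argument_list(arguments)

print(PassthroughBackend().ready_argument_list([1, 2, 3]))  # [1, 2, 3]
```

Subclasses that do allocate device memory simply override the method without calling ``super()``, so the default only affects backends that want passthrough behavior.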
Lines changed: 131 additions & 0 deletions

@@ -0,0 +1,131 @@
+"""This module contains a 'device' for hyperparameter tuning using the autotuning methodology."""
+
+import platform
+from pathlib import Path
+
+from numpy import mean
+
+from kernel_tuner.backends.backend import Backend
+from kernel_tuner.observers.observer import BenchmarkObserver
+
+try:
+    methodology_available = True
+    from autotuning_methodology.experiments import generate_experiment_file
+    from autotuning_methodology.report_experiments import get_strategy_scores
+except ImportError:
+    methodology_available = False
+
+
+class ScoreObserver(BenchmarkObserver):
+    def __init__(self, dev):
+        self.dev = dev
+        self.scores = []
+
+    def after_finish(self):
+        self.scores.append(self.dev.last_score)
+
+    def get_results(self):
+        results = {'score': mean(self.scores), 'scores': self.scores.copy()}
+        self.scores = []
+        return results
+
+class HypertunerFunctions(Backend):
+    """Class for executing hyperparameter tuning."""
+    units = {}
+
+    def __init__(self, iterations):
+        self.iterations = iterations
+        self.observers = [ScoreObserver(self)]
+        self.name = platform.processor()
+        self.max_threads = 1024
+        self.last_score = None
+
+        # set the environment options
+        env = dict()
+        env["iterations"] = self.iterations
+        self.env = env
+
+        # check for the methodology package
+        if methodology_available is not True:
+            raise ImportError("Unable to import the autotuning methodology, run `pip install autotuning_methodology`.")
+
+    def ready_argument_list(self, arguments):
+        arglist = super().ready_argument_list(arguments)
+        if arglist is None:
+            arglist = []
+        return arglist
+
+    def compile(self, kernel_instance):
+        super().compile(kernel_instance)
+        path = Path(__file__).parent.parent.parent / "hyperparamtuning"
+        path.mkdir(exist_ok=True)
+
+        # TODO get applications & GPUs args from benchmark
+        gpus = ["RTX_3090", "RTX_2080_Ti"]
+        applications = None
+        # applications = [
+        #     {
+        #         "name": "convolution",
+        #         "folder": "./cached_data_used/kernels",
+        #         "input_file": "convolution.json"
+        #     },
+        #     {
+        #         "name": "pnpoly",
+        #         "folder": "./cached_data_used/kernels",
+        #         "input_file": "pnpoly.json"
+        #     }
+        # ]
+
+        # strategy settings
+        strategy: str = kernel_instance.arguments[0]
+        hyperparams = [{'name': k, 'value': v} for k, v in kernel_instance.params.items()]
+        hyperparams_string = "_".join(f"{k}={str(v)}" for k, v in kernel_instance.params.items())
+        searchspace_strategies = [{
+            "autotuner": "KernelTuner",
+            "name": f"{strategy.lower()}_{hyperparams_string}",
+            "display_name": strategy.replace('_', ' ').capitalize(),
+            "search_method": strategy.lower(),
+            'search_method_hyperparameters': hyperparams
+        }]
+
+        # any additional settings
+        override = {
+            "experimental_groups_defaults": {
+                "samples": self.iterations
+            }
+        }
+
+        name = kernel_instance.name if len(kernel_instance.name) > 0 else kernel_instance.kernel_source.kernel_name
+        experiments_filepath = generate_experiment_file(name, path, searchspace_strategies, applications, gpus,
+                                                        override=override, overwrite_existing_file=True)
+        return str(experiments_filepath)
+
+    def start_event(self):
+        return super().start_event()
+
+    def stop_event(self):
+        return super().stop_event()
+
+    def kernel_finished(self):
+        super().kernel_finished()
+        return True
+
+    def synchronize(self):
+        return super().synchronize()
+
+    def run_kernel(self, func, gpu_args=None, threads=None, grid=None, stream=None):
+        # generate the experiments file
+        experiments_filepath = Path(func)
+
+        # run the methodology to get a fitness score for this configuration
+        scores = get_strategy_scores(str(experiments_filepath))
+        self.last_score = scores[list(scores.keys())[0]]['score']
+
+    def memset(self, allocation, value, size):
+        return super().memset(allocation, value, size)
+
+    def memcpy_dtoh(self, dest, src):
+        return super().memcpy_dtoh(dest, src)
+
+    def memcpy_htod(self, dest, src):
+        return super().memcpy_htod(dest, src)
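The ``ScoreObserver`` in the new backend above accumulates ``dev.last_score`` after every benchmark run and reports the mean when results are collected. A self-contained sketch of that aggregation, using a hypothetical ``Dev`` stand-in for the real device object:

```python
# Sketch of ScoreObserver's score aggregation (Dev is an illustrative
# stand-in, not part of Kernel Tuner).
from numpy import mean

class Dev:
    last_score = None

class ScoreObserver:
    def __init__(self, dev):
        self.dev = dev
        self.scores = []

    def after_finish(self):
        # record the score of the benchmark run that just finished
        self.scores.append(self.dev.last_score)

    def get_results(self):
        # report the mean of all recorded scores, then reset
        results = {'score': mean(self.scores), 'scores': self.scores.copy()}
        self.scores = []
        return results

dev = Dev()
obs = ScoreObserver(dev)
for s in [1.0, 2.0, 3.0]:
    dev.last_score = s
    obs.after_finish()
print(obs.get_results()['score'])  # 2.0
```

Resetting ``self.scores`` in ``get_results()`` matters: each tuning configuration gets a fresh score list, so results from one configuration never leak into the next.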
