Hyperparameter tuning #289


Draft: wants to merge 183 commits into base: master

Commits (183)
5442303
New Bayesian Optimization methods
fjwillemsen Dec 1, 2021
1c95770
Updated to latest Kernel Tuner version
fjwillemsen Dec 1, 2021
3273dd3
Completely new Bayesian Optimization implementation
fjwillemsen Jan 12, 2022
5e0bfde
Enormous improvement in both performance and speed with BO GPyTorch, …
fjwillemsen Jan 15, 2022
e355c58
Made experimental Python runner parallel, completely new hyperparamet…
fjwillemsen Feb 16, 2022
cf1d4e4
Reverted file permissions
fjwillemsen Feb 16, 2022
531627a
Search spaces are now generated much more efficiently using python-co…
fjwillemsen Mar 24, 2022
2bc55be
Brought branch up to date with master
fjwillemsen Mar 25, 2022
5983e24
Added backwards compatibility with most python-constraint Constraints…
fjwillemsen Mar 25, 2022
f4c8e0b
Added new minmax initial sampling
fjwillemsen Apr 5, 2022
79bc266
Merge with master
fjwillemsen Apr 5, 2022
85f244e
Completed merge with searchspace_experiments
fjwillemsen Oct 23, 2024
b33c6bd
Skip strategies that don't have their dependencies installed
fjwillemsen Oct 25, 2024
208fe7b
Tuning new optimization algorithm
fjwillemsen Oct 25, 2024
6281a0c
Added new BO strategies to interface
fjwillemsen Oct 25, 2024
5ab70df
Made BO GPyTorch implementations importable
fjwillemsen Oct 25, 2024
e407a84
Compatibility with optional dependencies
fjwillemsen Oct 26, 2024
e6c457d
Improved time unit conversion
fjwillemsen Oct 29, 2024
a9f8de4
Changed hyperparameter tuning setup
fjwillemsen Oct 29, 2024
4231999
Added the hyperparamtuning experiments file
fjwillemsen Oct 29, 2024
6fe94ca
Changed from BAT to HIP paper searchspaces
fjwillemsen Oct 29, 2024
05c39cb
Changed from BAT to HIP paper searchspaces
fjwillemsen Oct 29, 2024
b0e4573
Complex restrictions with tunable parameters provided are compiled
fjwillemsen Oct 30, 2024
0a2748d
Made original BO compatible with Searchspaces
fjwillemsen Oct 31, 2024
6354f4d
Implemented a new acquisition function that takes the ratio between p…
fjwillemsen Oct 31, 2024
5401519
Changed supported Python versions to include 3.13, updated dependencies
fjwillemsen Nov 4, 2024
04eacc4
Setup Searchspace to Ax SearchSpace conversion
fjwillemsen Nov 5, 2024
5f31dfc
Implemented Ax as a BO strategy
fjwillemsen Nov 5, 2024
2e4f490
Made BO compatible with StopCriterion
fjwillemsen Nov 5, 2024
705e724
Minor compatibility change to BO strategies
fjwillemsen Nov 5, 2024
6cde57e
Extended hyperparameter tuning benchmark
fjwillemsen Nov 5, 2024
aed5f0d
Implemented Bayesian Optimization using BOTorch
fjwillemsen Nov 5, 2024
8c0dc49
Automatically time out any PyTest that takes longer than 60 seconds
fjwillemsen Nov 5, 2024
b9b748d
Avoided inadvertent use of cache in hyperparametertuning tests
fjwillemsen Nov 5, 2024
1778026
Avoided inadvertent use of cache in hyperparametertuning tests
fjwillemsen Nov 5, 2024
034352f
Shallow copy if the restrictions are copiable
fjwillemsen Nov 5, 2024
eba03f8
Refactored BO BOTorch into class structure
fjwillemsen Nov 5, 2024
c6b243a
Switched to newer fit function, more efficient model initialization b…
fjwillemsen Nov 6, 2024
1581840
Added option to return invalid configurations in CostFunc
fjwillemsen Nov 6, 2024
620ee60
Added the handling of invalid configurations, training data is direct…
fjwillemsen Nov 6, 2024
009cf01
Setup structure for Tensorspace in Searchspace
fjwillemsen Nov 6, 2024
33983f7
Implemented mappings and conversions to and from tensor to parameter …
fjwillemsen Nov 7, 2024
f3fc81b
Improved efficiency of acquisition function by removing evaluated con…
fjwillemsen Nov 7, 2024
a5a0471
Removed Ax, added BOTorch as dependency
fjwillemsen Nov 7, 2024
9429539
Convenience script for benchmarking BO
fjwillemsen Nov 7, 2024
176b8f5
Added objective, tuning direction and hyperparameter tuning language …
fjwillemsen Nov 7, 2024
196af62
Completed implementation of mixed-type handling and handling of inval…
fjwillemsen Nov 8, 2024
55a5c1a
Added docstrings, improved formatting
fjwillemsen Nov 8, 2024
d64f783
Extended strategies test to test for ability to handle non-numeric an…
fjwillemsen Nov 8, 2024
e95ab30
Mixed-type parameters are not converted to numeric constraints
fjwillemsen Nov 8, 2024
10a6a5c
CostFunc can now encode and decode non-numeric configurations for str…
fjwillemsen Nov 8, 2024
6ae3ba6
Fixed logging statements, improved formatting
fjwillemsen Nov 8, 2024
4873a20
Improved the performance of get_bounds
fjwillemsen Nov 8, 2024
bae7e96
Applied non-numeric encoding in differential evolution to handle non-…
fjwillemsen Nov 8, 2024
7eb7ef7
Implemented automatic conversion to multiple types for encoded tensor…
fjwillemsen Nov 8, 2024
91d3ce4
Added tests for Searchspace tensor encoding and conversion
fjwillemsen Nov 8, 2024
80d514e
Separated strategies and runners test cache file
fjwillemsen Nov 8, 2024
a489252
Implemented handling of categorical parameters
fjwillemsen Nov 8, 2024
68aee14
Implemented variational GP and likelihood
fjwillemsen Nov 8, 2024
b9c012d
Using LogExpectedImprovement to avoid stability issues
fjwillemsen Nov 8, 2024
41ce663
Implemented tensor space bounds in searchspace
fjwillemsen Nov 9, 2024
07ef1d4
Implemented normalization for input features
fjwillemsen Nov 9, 2024
721d072
Tensorspace is reduced by removing inconsequential parameters
fjwillemsen Nov 9, 2024
2434b3b
Extended strategies tests to include single parameter value
fjwillemsen Nov 9, 2024
1679751
Fixed an indexing error for tensorspace bounds
fjwillemsen Nov 9, 2024
2b816a6
Extended searchspace tests to include single parameter value
fjwillemsen Nov 9, 2024
c417585
Implemented additional acquisition functions, reduced number of reini…
fjwillemsen Nov 10, 2024
3d53b29
Implemented division of tensorspace into chunks for faster optimization
fjwillemsen Nov 10, 2024
3ed43a6
Switch to fit_gpytorch_mll_torch for faster fitting, use approximate …
fjwillemsen Nov 12, 2024
559813f
Implemented running BO on GPU / Apple Silicon, settable precision
fjwillemsen Nov 12, 2024
c391428
Removed Apple Silicon MPS support as cholesky operation is not yet im…
fjwillemsen Nov 12, 2024
07925c5
Implemented discrete local search for cases where the tensorspace isn…
fjwillemsen Nov 12, 2024
4113513
Implemented standardization of output
fjwillemsen Nov 12, 2024
ed12b5a
Implemented unified optimization direction
fjwillemsen Nov 12, 2024
d62c941
Updated outcome standardization
fjwillemsen Nov 12, 2024
1c015cb
Using extra information from variance in BO for better fits
fjwillemsen Nov 13, 2024
cad10f8
Implemented gradual cooldown on multi-feval depending on number of fe…
fjwillemsen Nov 13, 2024
1ed0352
Adjusted the calculation of number of optimization spaces to be more …
fjwillemsen Nov 19, 2024
38f084c
Two different kernels as test files for BO
fjwillemsen Nov 19, 2024
c447dc2
Setup structure for BOTorch transfer learning strategy as separate st…
fjwillemsen Nov 21, 2024
7c2fd51
Implemented Rank-Weighted GP Ensemble for transferlearning
fjwillemsen Nov 21, 2024
fec0e65
Avoided import of whole util submodule
fjwillemsen Nov 21, 2024
091ef47
Simplified BO transfer run loop
fjwillemsen Nov 21, 2024
ee11757
Implemented transfer learning caches in interface to be read and pass…
fjwillemsen Nov 21, 2024
1162ece
Added BO transfer learning strategy
fjwillemsen Nov 22, 2024
62fa135
Implemented optionally constructing a searchspace from a cache dictio…
fjwillemsen Nov 22, 2024
57a262f
Implemented construction of Searchspaces from caches
fjwillemsen Nov 22, 2024
964a6ee
Transfer learning inputs and outcomes are represented in Tensors
fjwillemsen Nov 22, 2024
24c6767
More general approach to model and likelihood initialization to make …
fjwillemsen Nov 22, 2024
dc4b4c7
Fitting a model for each base transfer learning task
fjwillemsen Nov 22, 2024
e21a605
Account for invalid configurations in base task caches
fjwillemsen Nov 22, 2024
e3cfe91
Implement main RGPE BO loop
fjwillemsen Nov 22, 2024
2334214
Improved the efficiency of taking initial sample
fjwillemsen Nov 22, 2024
c78a18c
Use of state dictionary is made optional
fjwillemsen Nov 22, 2024
8416098
Renamed RGPE strategy
fjwillemsen Nov 22, 2024
dc000b7
Implemented new transfer learning strategy with multiple independent GPs
fjwillemsen Nov 22, 2024
aa30ec2
Removed redundant min/max results adjustment
fjwillemsen Nov 23, 2024
fd6f95e
Result registration must be optimization direction dependent
fjwillemsen Nov 26, 2024
6963feb
Transfer learning by direct transfer of best configurations
fjwillemsen Nov 26, 2024
a08953e
BO update
fjwillemsen Mar 5, 2025
ecd7802
Improved conversion of tunable parameter
fjwillemsen Mar 5, 2025
b7cda36
Extended and improved conversion of T1 arguments, improved error repo…
fjwillemsen Mar 6, 2025
8836ce2
Improved selection of transfer learning caches
fjwillemsen Mar 7, 2025
539aed3
Fixed an error with quotes in an f-string
fjwillemsen Mar 7, 2025
435b56b
Fixed torch import error due to Tensor type hint
fjwillemsen Mar 7, 2025
3c48b49
Fixed torch import error due to Tensor type hint
fjwillemsen Mar 7, 2025
388f325
Fixed torch import error due to Tensor type hint
fjwillemsen Mar 7, 2025
373782f
Fixed torch import error due to Tensor type hint
fjwillemsen Mar 7, 2025
874c390
Merge with searchspace_experiments
fjwillemsen Mar 7, 2025
db3abb3
Merge with searchspace_experiments
fjwillemsen Mar 7, 2025
c692ba6
Loosened required positional arguments
fjwillemsen Mar 7, 2025
fe113e6
Changed benchmarks location for hypertuner
fjwillemsen Mar 7, 2025
5e65abd
Used hip-python-fork package as hip-python is not available
fjwillemsen Mar 7, 2025
0ba00a0
Removed transfer learning references
fjwillemsen Mar 7, 2025
6633bed
Updated pyproject
fjwillemsen Mar 7, 2025
23f557f
Merge branch 'searchspace_experiments' into hyperparametertuning
fjwillemsen Mar 7, 2025
c39ac5a
Adjusted hyper.py for paper
fjwillemsen Mar 7, 2025
cc19515
Extended hypertuner with additional kernels, adjusted for benchmark_hub
fjwillemsen Mar 7, 2025
1433930
Merge branch 'hyperparametertuning' of https://github.com/benvanwerkh…
fjwillemsen Mar 7, 2025
638d216
Implemented passing strategy to hyperparametertune by CLI argument
fjwillemsen Mar 8, 2025
d36adb5
Extended hyperparameter tuning with 4 more strategies
fjwillemsen Mar 8, 2025
4e46459
Generate a unique filename for generated experiment files to avoid co…
fjwillemsen Mar 8, 2025
d28fdbe
Adjusted the test / train sets and number of repeats
fjwillemsen Mar 10, 2025
49fa92f
Added simulated_annealing to hyperparameter tuning, adjusted greedy_i…
fjwillemsen Mar 10, 2025
1056269
Updated hyperparameters
fjwillemsen Mar 13, 2025
7ce2234
Updated search spaces used in hyperparameter tuning and number of rep…
fjwillemsen Mar 13, 2025
1ed1893
Added bayes_opt to hyperparamtuning
fjwillemsen Mar 15, 2025
1e2532f
Fixed link with hyperparameter tuning attributes
fjwillemsen Mar 17, 2025
7b7bd8b
Merge branch 'hyperparametertuning' of https://github.com/benvanwerkh…
fjwillemsen Mar 17, 2025
afbf83e
Added support for evaluating T1 strings as a type
fjwillemsen Mar 17, 2025
84a2b1f
Added automatic scaling of random sample size if necessary
fjwillemsen Mar 17, 2025
9e80479
Formatting
fjwillemsen Mar 17, 2025
ce552d0
Minor update to hyperparameter tuning
fjwillemsen Mar 18, 2025
5da8845
Merge branch 'hyperparametertuning' of https://github.com/benvanwerkh…
fjwillemsen Mar 18, 2025
2714c28
Set new default hyperparameters for PSO, dual annealing and simulated…
fjwillemsen Mar 18, 2025
25d5202
Set new default hyperparameters for Genetic Algorithm and Differentia…
fjwillemsen Mar 18, 2025
651c42c
Avoid requesting more random samples than the searchspace size
fjwillemsen Mar 20, 2025
b953a69
Clearer message when exceeding the stop criterion
fjwillemsen Mar 20, 2025
a401008
Add soft maximum function evaluations limit to dual annealing
fjwillemsen Mar 20, 2025
425b4f4
Improved rounding of encoded parameter values
fjwillemsen Mar 20, 2025
0b7ec15
Merge of doc requirements files
fjwillemsen Mar 20, 2025
3bba923
Updated pyproject and requirements files
fjwillemsen Mar 20, 2025
64dfd95
Improved assertion error message
fjwillemsen Mar 20, 2025
5a83d36
Added logging in case default block size restriction is added
fjwillemsen Mar 20, 2025
5e3512b
Adjusted path to benchmarking kernels
fjwillemsen Mar 20, 2025
bff6d7b
Automatically adjust genetic algorithm popsize for smaller search spaces
fjwillemsen Mar 20, 2025
8ddce18
Updated poetry configuration fields to project configuration fields, …
fjwillemsen Mar 20, 2025
19470e4
Removed not yet fully implemented bayesian optimization references, m…
fjwillemsen Mar 20, 2025
d2bb76a
Avoid import of whole util module
fjwillemsen Mar 20, 2025
58f147f
Avoid import of whole util module
fjwillemsen Mar 20, 2025
a48394a
Avoid import of whole util module
fjwillemsen Mar 20, 2025
5dd3e4c
Updated dependencies, required python version and bumped version
fjwillemsen Mar 25, 2025
02833f3
Updated dependencies, required python version and bumped version
fjwillemsen Mar 25, 2025
b820419
Updated documentation dependencies
fjwillemsen Mar 25, 2025
11b378f
Added python version classifiers
fjwillemsen Mar 26, 2025
6550916
Improved code quality based on sonarcloud issues
fjwillemsen Mar 26, 2025
6770d3c
Removed PythonFunctions approach to hyperparameter tuning that is no …
fjwillemsen Mar 26, 2025
3dbe379
Removed bayes_opt_old as a strategy
fjwillemsen Mar 26, 2025
dcd102b
Report last HIP error on error
fjwillemsen Mar 26, 2025
1f935a1
Merge branch 'hyperparametertuning' of https://github.com/KernelTuner…
fjwillemsen Mar 26, 2025
290a860
Added docstring to ScoreObserver class
fjwillemsen Mar 26, 2025
496af94
Reduced cognitive complexity
fjwillemsen Mar 26, 2025
063fe97
Merge branch 'hyperparametertuning' of https://github.com/KernelTuner…
fjwillemsen Mar 26, 2025
c1c3a71
Improved development environment creation specification
fjwillemsen Mar 26, 2025
26914be
Merge with recent additions to searchspace_experiments
fjwillemsen Apr 3, 2025
54010b4
introduced repair technique in genetic algorithm
benvanwerkhoven Apr 30, 2025
71e3de8
added non-constraint-aware initialization and mutation for comparison
benvanwerkhoven Apr 30, 2025
67a5070
fix test_mutate
benvanwerkhoven May 1, 2025
939ea19
constraint-aware variants for pso, firefly, and sa
benvanwerkhoven May 12, 2025
b358265
remove unused variable
benvanwerkhoven May 12, 2025
2d24ae9
Added objective performance keys
fjwillemsen May 12, 2025
77676c8
Support for time-based cutoff with T1 format
fjwillemsen May 13, 2025
2e718f4
Merge with constrained_optimization
fjwillemsen May 13, 2025
9196266
Improvements to constraint-aware strategies
fjwillemsen May 13, 2025
83df948
Implemented passing settings to hyperparameter tuner, improved hyperp…
fjwillemsen May 13, 2025
f6811ab
Added firefly to hyperparameter tuning, various minor improvements
fjwillemsen May 14, 2025
e4af9f7
Added explicit restrictions definition to hyperparameter tuning
fjwillemsen May 15, 2025
5f3b6fc
Updated tune_kernel_T1 to be more broadly applicable
fjwillemsen May 16, 2025
7f3a4a3
Updated hyperparameters to newly tuned defaults
fjwillemsen May 24, 2025
80a5b62
Set default arguments if not provided
fjwillemsen May 28, 2025
79fe080
Merge with master branch
fjwillemsen May 28, 2025
e9797e2
Made Hypertuner backend compatible with changes to Backend ABC
fjwillemsen May 28, 2025
1a4c439
Adjusted GA popsize to only be adjusted when necessary
fjwillemsen May 28, 2025
6 changes: 4 additions & 2 deletions .gitignore
@@ -2,7 +2,7 @@
poetry.lock
noxenv.txt
noxsettings.toml
hyperparamtuning/
hyperparamtuning*/*
*.prof

### Python ###
@@ -20,13 +20,15 @@ push_to_pypi.sh
*.json
!kernel_tuner/schema/T1/1.0.0/input-schema.json
!test/test_T1_input.json
!test_cache_file.json
*.csv
.cache
*.ipynb_checkpoints
examples/cuda/output
deploy_key
*.mod
temp_*.*
.DS_Store
.python-version
.nox

@@ -41,4 +43,4 @@ temp_*.*
.LSOverride

.vscode
.idea
.idea
2 changes: 1 addition & 1 deletion README.md
@@ -16,7 +16,7 @@


Create optimized GPU applications in any mainstream GPU
programming language (CUDA, HIP, OpenCL, OpenACC).
programming language (CUDA, HIP, OpenCL, OpenACC, OpenMP).

What Kernel Tuner does:

179 changes: 92 additions & 87 deletions doc/requirements.txt

Large diffs are not rendered by default.

518 changes: 272 additions & 246 deletions doc/requirements_test.txt

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion doc/source/dev-environment.rst
@@ -78,7 +78,7 @@ Steps without :bash:`sudo` access (e.g. on a cluster):
* Verify that your development environment has no missing installs or updates with :bash:`poetry install --sync --dry-run --with test`.
#. Check if the environment is setup correctly by running :bash:`pytest`. All tests should pass, except if you're not on a GPU node, or one or more extras has been left out in the previous step, then these tests will skip gracefully.
#. Set Nox to use the correct backend and location:
* Run :bash:`conda -- create-settings-file` to automatically create a settings file.
* Run :bash:`nox -- create-settings-file` to automatically create a settings file.
* In this settings file :bash:`noxsettings.toml`, change the :bash:`venvbackend`:
* If you used Mamba in step 2, to :bash:`mamba`.
* If you used Miniconda or Anaconda in step 2, to :bash:`conda`.
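For orientation, a minimal sketch of what the generated noxsettings.toml could contain after running the step above. Only the venvbackend key is confirmed by this diff; the comment and layout are assumptions.

# noxsettings.toml (hypothetical contents; only venvbackend is confirmed by this change)
venvbackend = "conda"  # set to "mamba" or "conda", matching the environment manager used in step 2
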
7 changes: 3 additions & 4 deletions examples/c/vector_add.py
100755 → 100644
@@ -26,7 +26,7 @@
}
"""

size = 72*1024*1024
size = 72 * 1024 * 1024

a = numpy.random.randn(size).astype(numpy.float32)
b = numpy.random.randn(size).astype(numpy.float32)
@@ -39,7 +39,6 @@
tune_params["nthreads"] = [1, 2, 3, 4, 8, 12, 16, 24, 32]
tune_params["vecsize"] = [1, 2, 4, 8, 16]

answer = [a+b, None, None, None]
answer = [a + b, None, None, None]

tune_kernel("vector_add", kernel_string, size, args, tune_params,
answer=answer, compiler_options=['-O3'])
tune_kernel("vector_add", kernel_string, size, args, tune_params, answer=answer, compiler_options=["-fopenmp", "-O3"])
Empty file modified examples/cuda-c++/vector_add.py
100755 → 100644
Empty file.
Empty file modified examples/cuda-c++/vector_add_blocksize.py
100755 → 100644
Empty file.
Empty file modified examples/cuda-c++/vector_add_cupy.py
100755 → 100644
Empty file.
Empty file modified examples/cuda/convolution.py
100755 → 100644
Empty file.
Empty file modified examples/cuda/convolution_correct.py
100755 → 100644
Empty file.
Empty file modified examples/cuda/convolution_streams.py
100755 → 100644
Empty file.
Empty file modified examples/cuda/expdist.py
100755 → 100644
Empty file.
Empty file modified examples/cuda/matmul.py
100755 → 100644
Empty file.
Empty file modified examples/cuda/pnpoly.py
100755 → 100644
Empty file.
Empty file modified examples/cuda/python_kernel.py
100755 → 100644
Empty file.
Empty file modified examples/cuda/reduction.py
100755 → 100644
Empty file.
Empty file modified examples/cuda/sepconv.py
100755 → 100644
Empty file.
Empty file modified examples/cuda/spmv.py
100755 → 100644
Empty file.
Empty file modified examples/cuda/stencil.py
100755 → 100644
Empty file.
Empty file modified examples/cuda/test_vector_add.py
100755 → 100644
Empty file.
Empty file modified examples/cuda/test_vector_add_parameterized.py
100755 → 100644
Empty file.
Empty file modified examples/cuda/vector_add.py
100755 → 100644
Empty file.
Empty file modified examples/cuda/vector_add_codegen.py
100755 → 100644
Empty file.
Empty file modified examples/cuda/vector_add_cupy.py
100755 → 100644
Empty file.
Empty file modified examples/cuda/vector_add_jinja.py
100755 → 100644
Empty file.
Empty file modified examples/cuda/vector_add_metric.py
100755 → 100644
Empty file.
Empty file modified examples/cuda/vector_add_observers.py
100755 → 100644
Empty file.
Empty file modified examples/cuda/zeromeanfilter.py
100755 → 100644
Empty file.
72 changes: 72 additions & 0 deletions examples/directives/histogram_c_openacc.py
@@ -0,0 +1,72 @@
#!/usr/bin/env python
"""This is a simple example for tuning C++ OpenACC code with the kernel tuner"""
import numpy as np

from kernel_tuner import tune_kernel
from kernel_tuner.utils.directives import Code, OpenACC, Cxx, process_directives


# Naive Python histogram implementation
def histogram(vector, hist):
for i in range(0, len(vector)):
hist[vector[i]] += 1
return hist


code = """
#include <stdlib.h>

#define HIST_SIZE 256
#define VECTOR_SIZE 1000000

#pragma tuner start histogram vector(int*:VECTOR_SIZE) hist(int*:HIST_SIZE)
#if enable_reduction == 1
#pragma acc parallel num_gangs(ngangs) vector_length(nthreads) reduction(+:hist[:HIST_SIZE])
#else
#pragma acc parallel num_gangs(ngangs) vector_length(nthreads)
#endif
#pragma acc loop independent
for ( int i = 0; i < VECTOR_SIZE; i++ ) {
#if enable_atomic == 1
#pragma acc atomic update
#endif
hist[vector[i]] += 1;
}
#pragma tuner stop
"""

# Extract tunable directive
app = Code(OpenACC(), Cxx())
kernel_string, kernel_args = process_directives(app, code)

tune_params = dict()
tune_params["ngangs"] = [2**i for i in range(1, 11)]
tune_params["nthreads"] = [32 * i for i in range(1, 33)]
tune_params["enable_reduction"] = [0, 1]
tune_params["enable_atomic"] = [0, 1]
constraints = ["enable_reduction != enable_atomic"]
metrics = dict()
metrics["GB/s"] = (
lambda x: ((2 * 4 * len(kernel_args["histogram"][0])) + (4 * len(kernel_args["histogram"][0])))
/ (x["time"] / 10**3)
/ 10**9
)

kernel_args["histogram"][0] = np.random.randint(0, 256, len(kernel_args["histogram"][0]), dtype=np.int32)
kernel_args["histogram"][1] = np.zeros(len(kernel_args["histogram"][1])).astype(np.int32)
reference_hist = np.zeros_like(kernel_args["histogram"][1]).astype(np.int32)
reference_hist = histogram(kernel_args["histogram"][0], reference_hist)
answer = [None, reference_hist]

tune_kernel(
"histogram",
kernel_string["histogram"],
0,
kernel_args["histogram"],
tune_params,
restrictions=constraints,
metrics=metrics,
answer=answer,
compiler="nvc++",
compiler_options=["-fast", "-acc=gpu"],
)
71 changes: 71 additions & 0 deletions examples/directives/histogram_c_openmp.py
@@ -0,0 +1,71 @@
#!/usr/bin/env python
"""This is a simple example for tuning C++ OpenMP code with the kernel tuner"""
import numpy as np

from kernel_tuner import tune_kernel
from kernel_tuner.utils.directives import Code, OpenMP, Cxx, process_directives


# Naive Python histogram implementation
def histogram(vector, hist):
for i in range(0, len(vector)):
hist[vector[i]] += 1
return hist


code = """
#include <stdlib.h>

#define HIST_SIZE 256
#define VECTOR_SIZE 1000000

#pragma tuner start histogram vector(int*:VECTOR_SIZE) hist(int*:HIST_SIZE)
#if enable_reduction == 1
#pragma omp target teams distribute parallel for num_teams(nteams) num_threads(nthreads) reduction(+:hist[:HIST_SIZE])
#else
#pragma omp target teams distribute parallel for num_teams(nteams) num_threads(nthreads)
#endif
for ( int i = 0; i < VECTOR_SIZE; i++ ) {
#if enable_atomic == 1
#pragma omp atomic update
#endif
hist[vector[i]] += 1;
}
#pragma tuner stop
"""

# Extract tunable directive
app = Code(OpenMP(), Cxx())
kernel_string, kernel_args = process_directives(app, code)

tune_params = dict()
tune_params["nteams"] = [2**i for i in range(1, 11)]
tune_params["nthreads"] = [32 * i for i in range(1, 33)]
tune_params["enable_reduction"] = [0, 1]
tune_params["enable_atomic"] = [0, 1]
constraints = ["enable_reduction != enable_atomic"]
metrics = dict()
metrics["GB/s"] = (
lambda x: ((2 * 4 * len(kernel_args["histogram"][0])) + (4 * len(kernel_args["histogram"][0])))
/ (x["time"] / 10**3)
/ 10**9
)

kernel_args["histogram"][0] = np.random.randint(0, 256, len(kernel_args["histogram"][0]), dtype=np.int32)
kernel_args["histogram"][1] = np.zeros(len(kernel_args["histogram"][1])).astype(np.int32)
reference_hist = np.zeros_like(kernel_args["histogram"][1]).astype(np.int32)
reference_hist = histogram(kernel_args["histogram"][0], reference_hist)
answer = [None, reference_hist]

tune_kernel(
"histogram",
kernel_string["histogram"],
0,
kernel_args["histogram"],
tune_params,
restrictions=constraints,
metrics=metrics,
answer=answer,
compiler="nvc++",
compiler_options=["-fast", "-mp=gpu"],
)
18 changes: 10 additions & 8 deletions examples/directives/matrix_multiply_c_openacc.py
@@ -1,13 +1,8 @@
#!/usr/bin/env python
"""This is an example tuning a naive matrix multiplication using the simplified directives interface"""

from kernel_tuner import tune_kernel
from kernel_tuner.utils.directives import (
Code,
OpenACC,
Cxx,
process_directives
)
from kernel_tuner import tune_kernel, run_kernel
from kernel_tuner.utils.directives import Code, OpenACC, Cxx, process_directives

N = 4096

@@ -45,13 +40,20 @@
metrics["GB/s"] = lambda x: ((N**3 * 2 * 4) + (N**2 * 4)) / x["time_s"] / 10**9
metrics["GFLOP/s"] = lambda x: (N**3 * 3) / x["time_s"] / 10**9

# compute reference solution from CPU
results = run_kernel(
"mm", kernel_string["mm"], 0, kernel_args["mm"], {"nthreads": 1}, compiler="nvc++", compiler_options=["-fast"]
)
answer = [None, None, results[2]]

tune_kernel(
"mm",
kernel_string["mm"],
0,
kernel_args["mm"],
tune_params,
metrics=metrics,
compiler_options=["-fast", "-acc=gpu"],
answer=answer,
compiler="nvc++",
compiler_options=["-fast", "-acc=gpu"],
)
59 changes: 59 additions & 0 deletions examples/directives/matrix_multiply_c_openmp.py
@@ -0,0 +1,59 @@
#!/usr/bin/env python
"""This is an example tuning a naive matrix multiplication using the simplified directives interface"""

from kernel_tuner import tune_kernel, run_kernel
from kernel_tuner.utils.directives import Code, OpenMP, Cxx, process_directives

N = 4096

code = """
#define N 4096

void matrix_multiply(float *A, float *B, float *C) {
#pragma tuner start mm A(float*:NN) B(float*:NN) C(float*:NN)
float temp_sum = 0.0f;
#pragma omp target
#pragma omp teams distribute collapse(2)
for ( int i = 0; i < N; i++) {
for ( int j = 0; j < N; j++ ) {
temp_sum = 0.0f;
#pragma omp parallel for num_threads(nthreads) reduction(+:temp_sum)
for ( int k = 0; k < N; k++ ) {
temp_sum += A[(i * N) + k] * B[(k * N) + j];
}
C[(i * N) + j] = temp_sum;
}
}
#pragma tuner stop
}
"""

# Extract tunable directive
app = Code(OpenMP(), Cxx())
dims = {"NN": N**2}
kernel_string, kernel_args = process_directives(app, code, user_dimensions=dims)

tune_params = dict()
tune_params["nthreads"] = [32 * i for i in range(1, 33)]
metrics = dict()
metrics["time_s"] = lambda x: x["time"] / 10**3
metrics["GB/s"] = lambda x: ((N**3 * 2 * 4) + (N**2 * 4)) / x["time_s"] / 10**9
metrics["GFLOP/s"] = lambda x: (N**3 * 3) / x["time_s"] / 10**9

# compute reference solution from CPU
results = run_kernel(
"mm", kernel_string["mm"], 0, kernel_args["mm"], {"nthreads": 1}, compiler="nvc++", compiler_options=["-fast"]
)
answer = [None, None, results[2]]

tune_kernel(
"mm",
kernel_string["mm"],
0,
kernel_args["mm"],
tune_params,
metrics=metrics,
answer=answer,
compiler="nvc++",
compiler_options=["-fast", "-mp=gpu"],
)
2 changes: 1 addition & 1 deletion examples/directives/vector_add_c_openacc.py
@@ -67,6 +67,6 @@
tune_params,
metrics=metrics,
answer=answer,
compiler_options=["-fast", "-acc=gpu"],
compiler="nvc++",
compiler_options=["-fast", "-acc=gpu"],
)
57 changes: 57 additions & 0 deletions examples/directives/vector_add_c_openmp.py
@@ -0,0 +1,57 @@
#!/usr/bin/env python
"""This is a simple example for tuning C++ OpenMP code with the kernel tuner"""

from kernel_tuner import tune_kernel
from kernel_tuner.utils.directives import Code, OpenMP, Cxx, process_directives

code = """
#include <stdlib.h>

#define VECTOR_SIZE 1000000

int main(void) {
int size = VECTOR_SIZE;
float * a = (float *) malloc(VECTOR_SIZE * sizeof(float));
float * b = (float *) malloc(VECTOR_SIZE * sizeof(float));
float * c = (float *) malloc(VECTOR_SIZE * sizeof(float));

#pragma tuner start vector_add a(float*:VECTOR_SIZE) b(float*:VECTOR_SIZE) c(float*:VECTOR_SIZE) size(int:VECTOR_SIZE)
#pragma omp target teams distribute parallel for num_teams(nteams) num_threads(nthreads)
for ( int i = 0; i < size; i++ ) {
c[i] = a[i] + b[i];
}
#pragma tuner stop

free(a);
free(b);
free(c);
}
"""

# Extract tunable directive
app = Code(OpenMP(), Cxx())
kernel_string, kernel_args = process_directives(app, code)

tune_params = dict()
tune_params["nteams"] = [2**i for i in range(1, 11)]
tune_params["nthreads"] = [32 * i for i in range(1, 33)]
metrics = dict()
metrics["GB/s"] = (
lambda x: ((2 * 4 * len(kernel_args["vector_add"][0])) + (4 * len(kernel_args["vector_add"][0])))
/ (x["time"] / 10**3)
/ 10**9
)

answer = [None, None, kernel_args["vector_add"][0] + kernel_args["vector_add"][1], None]

tune_kernel(
"vector_add",
kernel_string["vector_add"],
0,
kernel_args["vector_add"],
tune_params,
metrics=metrics,
answer=answer,
compiler="nvc++",
compiler_options=["-fast", "-mp=gpu"],
)
2 changes: 1 addition & 1 deletion examples/directives/vector_add_fortran_openacc.py
@@ -62,6 +62,6 @@
tune_params,
metrics=metrics,
answer=answer,
compiler_options=["-fast", "-acc=gpu"],
compiler="nvfortran",
compiler_options=["-fast", "-acc=gpu"],
)