
Commit 1148813

Merge branch 'master' into refactor_interface
2 parents 582489b + 92553cc

File tree

16 files changed: +587 −27 lines changed


CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -6,6 +6,8 @@ This project adheres to [Semantic Versioning](http://semver.org/).
 
 ### Added
 - Support for using time_limit in simulation mode
+- Helper functions for energy tuning
+- Example to show ridge frequency and power-frequency model
 
 ### Changed
 - Changed what timings are stored in cache files

README.rst

Lines changed: 11 additions & 1 deletion
@@ -172,7 +172,17 @@ If you use Kernel Tuner in research or research software, please cite the most r
   author={Schoonhoven, Richard and van Werkhoven, Ben and Batenburg, K Joost},
   journal={IEEE Transactions on Evolutionary Computation},
   year={2022},
-  publisher={IEEE}
+  publisher={IEEE},
+  url = {https://arxiv.org/abs/2210.01465}
+}
+
+@article{schoonhoven2022going,
+  author = {Schoonhoven, Richard and Veenboer, Bram and van Werkhoven, Ben and Batenburg, K Joost},
+  title = {Going green: optimizing GPUs for energy efficiency through model-steered auto-tuning},
+  journal = {International Workshop on Performance Modeling, Benchmarking and Simulation
+             of High Performance Computer Systems (PMBS) at Supercomputing (SC22)},
+  year = {2022},
+  url = {https://arxiv.org/abs/2211.07260}
 }

doc/source/cache_files.rst

Lines changed: 2 additions & 2 deletions
@@ -4,14 +4,14 @@ Cache files
 ===========
 
 A very useful feature of Kernel Tuner is the ability to store benchmarking results in a cache file during tuning. You can enable cache files by
-passing any filename to the ``cache=`` optional argument of ``tune_kernel()``.
+passing any filename to the ``cache=`` optional argument of ``tune_kernel``.
 
 The benchmark results of individual kernel configurations are appended to the cache file as Kernel Tuner is running. This also allows Kernel Tuner
 to restart a ``tune_kernel()`` session from an existing cache file, should something have terminated the previous session before the run had
 completed. This happens quite often in HPC environments when a job reservation runs out.
 
 Cache files enable a number of other features, such as simulations and visualizations. Simulations are useful for benchmarking optimization
-strategies. You can start a simulation by call tune_kernel with a cache file that contains the full search space and the ``simulation=True`` option.
+strategies. You can start a simulation by calling ``tune_kernel`` with a cache file that contains the full search space and the ``simulation=True`` option.
 
 Cache files can be used to create visualizations of the search space. This even works while Kernel Tuner is still running. As the new results are
 coming in, they are streamed to the visualization. Please see the `Kernel Tuner Dashboard <https://github.com/KernelTuner/dashboard>`__.
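
As a minimal sketch of this workflow (reusing the ``vector_add`` kernel from the quickstart; the data setup is assumed, and the ``simulation=True`` keyword is spelled as in the text above):

.. code-block:: python

    import numpy as np
    import kernel_tuner

    size = 10000000
    a = np.random.randn(size).astype(np.float32)
    b = np.random.randn(size).astype(np.float32)
    c = np.zeros_like(a)
    args = [c, a, b, np.int32(size)]
    tune_params = {"block_size_x": [32, 64, 128, 256, 512, 1024]}

    # Results are appended to the cache file as tuning proceeds; rerunning
    # the same call resumes from whatever the file already contains.
    kernel_tuner.tune_kernel("vector_add", "vector_add_kernel.cu", size, args,
                             tune_params, cache="vector_add_cache.json")

    # Once the cache holds the full search space, replay it without a GPU,
    # e.g. to benchmark an optimization strategy.
    kernel_tuner.tune_kernel("vector_add", "vector_add_kernel.cu", size, args,
                             tune_params, cache="vector_add_cache.json",
                             simulation=True)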

doc/source/index.rst

Lines changed: 8 additions & 1 deletion
@@ -98,4 +98,11 @@ If you use Kernel Tuner in research or research software, please cite the most r
   publisher={IEEE}
 }
 
-
+@article{schoonhoven2022going,
+  author = {Schoonhoven, Richard and Veenboer, Bram and van Werkhoven, Ben and Batenburg, K Joost},
+  title = {Going green: optimizing GPUs for energy efficiency through model-steered auto-tuning},
+  journal = {International Workshop on Performance Modeling, Benchmarking and Simulation
+             of High Performance Computer Systems (PMBS) at Supercomputing (SC22)},
+  year = {2022},
+  url = {https://arxiv.org/abs/2211.07260}
+}

doc/source/quickstart.rst

Lines changed: 8 additions & 8 deletions
@@ -3,7 +3,7 @@ Getting Started
 
 So you have installed Kernel Tuner! That's great! But now you'd like to get started tuning some GPU code.
 
-Let's say we have a simple CUDA kernel stored in a file called vector_add_kernel.cu:
+Let's say we have a simple CUDA kernel stored in a file called ``vector_add_kernel.cu``:
 
 .. code-block:: cuda
 
@@ -16,7 +16,7 @@ Let's say we have a simple CUDA kernel stored in a file called vector_add_kernel
     }
 
 
-This kernel simply performs a point-wise addition of vectors a and b and stores the result in c.
+This kernel simply performs a point-wise addition of vectors ``a`` and ``b`` and stores the result in ``c``.
 
 To tune this kernel with Kernel Tuner, we are going to create the input and output data in Python using Numpy arrays.
 
@@ -34,30 +34,30 @@ To tune this kernel with Kernel Tuner, we are going to create the input and outp
 To tell Kernel Tuner how it should call the kernel, we can create a list in Python that should correspond to
 our CUDA kernel's argument list with the same order and types.
 
-.. code-block::python
+.. code-block:: python
 
     args = [c, a, b, n]
 
 So far, we have created the data structures needed by Kernel Tuner to call our kernel, but we have not yet specified what we
 want Kernel Tuner to tune in our kernel. For that, we create a dictionary that we call tune_params, in which keys correspond
 to tunable parameters in our kernel and the values are lists of values that these parameters may take.
 
-.. code-block::python
+.. code-block:: python
 
     tune_params = dict()
    tune_params["block_size_x"] = [32, 64, 128, 256, 512, 1024]
 
-In the code above, we have inserted a key into our dictionary called "block_size_x". This is a special name for a tunable
+In the code above, we have inserted a key into our dictionary, namely ``"block_size_x"``. This is a special name for a tunable
 parameter that is recognized by Kernel Tuner to denote the size of our thread block in the x-dimension.
 For a full list of special parameter names, please see the :ref:`parameter-vocabulary`.
 
-Alright, we are all set to start calling Kernel Tuner's main function, which is called tune_kernel.
+Alright, we are all set to start calling Kernel Tuner's main function, which is called ``tune_kernel``.
 
-.. code-block::python
+.. code-block:: python
 
     results, env = kernel_tuner.tune_kernel("vector_add", "vector_add_kernel.cu", size, args, tune_params)
 
-In the above, tune_kernel takes five arguments:
+In the above, ``tune_kernel`` takes five arguments:
 
 * The kernel name passed as a string
 * The filename of the kernel, also as a string
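
Once ``tune_kernel`` returns, the ``results`` list holds one dictionary per benchmarked configuration, mapping each tunable parameter to its value alongside the measured timings. A minimal sketch of picking the winner (assuming the default ``time`` key, reported in milliseconds):

.. code-block:: python

    # Select the configuration with the lowest measured time
    best = min(results, key=lambda config: config["time"])
    print(f"best block_size_x: {best['block_size_x']} ({best['time']:.4f} ms)")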

doc/source/structs.rst

Lines changed: 3 additions & 3 deletions
@@ -1,12 +1,12 @@
 Using structs
 -------------
 
-One of the issues with calling GPU kernels from Python is the use of custom data types in kernel arguments. In general, it is recommended for portability of your GPU code to be used from
-many different host languages to keep the interface of your kernels as simple as possible. This means sticking to simple pointers of primitive types such as integer, float, and double.
+One of the issues with calling GPU kernels from Python is the use of custom data types in kernel arguments. In general, it is recommended for portability of your GPU code, which may be
+used in any host program in any host programming language, to keep the interface of your kernels as simple as possible. This means sticking to simple pointers of primitive types such as integer, float, and double.
 For performance reasons, it is also recommended to not use arrays of structs for kernel arguments, as this is very likely to lead to inefficient memory accesses on the GPU.
 
 However, there are situations, in particular in scientific applications, where the GPU code needs a lot of input parameters where it makes sense to collect these in a struct that
-describes the simulation or experimental setup. For these use cases it is possible to use Python's built-in ``struct`` library, in particular the function ``struct.pack()``. For how to use
+describes the simulation or experimental setup. For these use cases, it is possible to use Python's built-in ``struct`` library, in particular the function ``struct.pack()``. For how to use
 ``struct.pack``, please consult the `Python documentation <https://docs.python.org/3/library/struct.html>`__. In the code below we show part of a Python script that uses ``struct.pack``,
 Numpy, and Kernel Tuner to call a CUDA kernel that uses a struct as kernel argument.
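
As an illustration of the packing step, here is a minimal sketch assuming a hypothetical device-side struct ``struct Setup { int n; float dt; float scale; };`` (field order, types, and alignment must match the CUDA definition exactly):

.. code-block:: python

    import struct
    import numpy as np

    # '<iff' packs one int and two floats, little-endian, without padding,
    # matching the layout of the hypothetical Setup struct above
    packed = struct.pack('<iff', 1024, 0.01, 2.0)

    # Wrapping the raw bytes in a Numpy array lets the struct travel as a
    # single kernel argument alongside ordinary Numpy arrays
    setup_arg = np.frombuffer(packed, dtype=np.uint8)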

Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
#!/usr/bin/env python
"""
This example demonstrates how to use the power-frequency model presented in

 * Going green: optimizing GPUs for energy efficiency through model-steered auto-tuning
   R. Schoonhoven, B. Veenboer, B. van Werkhoven, K. J. Batenburg
   International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS) at Supercomputing (SC22) 2022

to reduce the number of frequencies for GPU energy tuning.

In particular, this example creates a plot with the modeled power consumption vs
frequency curve, highlighting the ridge frequency and the frequency range
selected by the user.

This example requires CUDA and NVML as well as PyCuda and a CUDA-capable
GPU with the ability (and permissions) to set application clocks. GPUs
that do support locked clocks but not application clocks may use the
locked_clocks=True option.
"""
import argparse

import matplotlib.pyplot as plt
import numpy as np

# PyCuda is required to run this example; fail early with a clear error
try:
    from pycuda import driver as drv  # noqa: F401
except ImportError as e:
    raise ImportError("This example requires PyCuda") from e

from kernel_tuner.energy import energy
from kernel_tuner.nvml import get_nvml_gr_clocks


def get_default_parser():
    parser = argparse.ArgumentParser(
        description='Find energy efficient frequencies')
    parser.add_argument("-d", dest="device", nargs="?", type=int,
                        default=0, help="GPU ID to use")
    parser.add_argument("-s", dest="samples", nargs="?", type=int,
                        default=10, help="Number of frequency samples")
    parser.add_argument("-r", dest="range", nargs="?", type=int,
                        default=10, help="Frequency spread (10%% of 'optimum')")
    parser.add_argument("-n", dest="number", nargs="?", type=int, default=10,
                        help="Maximum number of suggested frequencies")
    parser.add_argument("-l", dest="locked_clocks", nargs="?", const=True, default=False,
                        help="Whether to use locked clocks over application clocks")
    parser.add_argument("-nsf", dest="nvidia_smi_fallback", nargs="?", default=None,
                        help="Path to nvidia-smi as fallback when missing NVML permissions")
    return parser


if __name__ == "__main__":
    parser = get_default_parser()
    args = parser.parse_args()

    ridge_frequency, freqs, nvml_power, fitted_params, scaling = \
        energy.create_power_frequency_model(device=args.device,
                                            n_samples=args.samples,
                                            verbose=True,
                                            nvidia_smi_fallback=args.nvidia_smi_fallback,
                                            use_locked_clocks=args.locked_clocks)

    all_frequencies = np.array(get_nvml_gr_clocks(args.device, quiet=True)['nvml_gr_clock'])

    frequency_selection = energy.get_frequency_range_around_ridge(
        ridge_frequency, all_frequencies, args.range, args.number, verbose=True)
    print(f"Search space reduction: {np.round(100 - len(frequency_selection) / len(all_frequencies) * 100, 1)} %")

    xs = np.linspace(all_frequencies[0], all_frequencies[-1], 100)
    # scale to start at 0
    xs -= scaling[0]
    modelled_power = energy.estimated_power(xs, *fitted_params)
    # undo scaling
    xs += scaling[0]
    modelled_power *= scaling[1]

    # Add point for ridge frequency
    P_ridge = energy.estimated_power([ridge_frequency - scaling[0]], *fitted_params) * scaling[1]

    # Add the frequency range
    min_freq = 1e-2 * (100 - args.range) * ridge_frequency
    max_freq = 1e-2 * (100 + args.range) * ridge_frequency

    # plot measurements with model, using seaborn styling if available
    try:
        import seaborn as sns
        sns.set_theme(style="darkgrid")
        sns.set_context("paper", rc={"font.size": 10,
                                     "axes.titlesize": 9, "axes.labelsize": 12})
        fig, ax = plt.subplots()
    except ImportError:
        fig, ax = plt.subplots()
        plt.grid()

    plt.scatter(x=freqs, y=nvml_power, label='NVML measurements')
    plt.scatter(x=ridge_frequency, y=P_ridge, color='g',
                label='Ridge frequency (MHz)')
    plt.plot(xs, modelled_power, label='Modelled power consumption')
    ax.axvspan(min_freq, max_freq, alpha=0.15, color='green',
               label='Recommended frequency range')
    plt.title('GPU modelled power consumption', size=18)
    plt.xlabel('Core frequency (MHz)')
    plt.ylabel('Power consumption (W)')
    plt.legend()

    # Save before show: show() may leave an empty figure on some backends
    plt.savefig("GPU_power_consumption_model.pdf")
    plt.show()
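
The resulting ``frequency_selection`` can then drive an energy-tuning run. A hedged sketch of a continuation (treating the clock as just another tunable parameter, under the ``nvml_gr_clock`` key matching ``get_nvml_gr_clocks`` above, is our assumption; a real run would also configure an energy objective and observer):

    # Hypothetical continuation of the script above: explore only the
    # model-recommended frequencies during tuning, instead of all clocks
    tune_params = dict()
    tune_params["block_size_x"] = [128, 256, 512]
    tune_params["nvml_gr_clock"] = list(frequency_selection)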

kernel_tuner/energy/__init__.py

Whitespace-only changes.
