
Commit 1148813

Merge branch 'master' into refactor_interface
2 parents 582489b + 92553cc

File tree

16 files changed: +587 −27 lines changed


CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -6,6 +6,8 @@ This project adheres to [Semantic Versioning](http://semver.org/).
 
 ### Added
 - Support for using time_limit in simulation mode
+- Helper functions for energy tuning
+- Example to show ridge frequency and power-frequency model
 
 ### Changed
 - Changed what timings are stored in cache files

README.rst

Lines changed: 11 additions & 1 deletion
@@ -172,7 +172,17 @@ If you use Kernel Tuner in research or research software, please cite the most r
   author={Schoonhoven, Richard and van Werkhoven, Ben and Batenburg, K Joost},
   journal={IEEE Transactions on Evolutionary Computation},
   year={2022},
-  publisher={IEEE}
+  publisher={IEEE},
+  url = {https://arxiv.org/abs/2210.01465}
+}
+
+@article{schoonhoven2022going,
+  author = {Schoonhoven, Richard and Veenboer, Bram and van Werkhoven, Ben and Batenburg, K Joost},
+  title = {Going green: optimizing GPUs for energy efficiency through model-steered auto-tuning},
+  journal = {International Workshop on Performance Modeling, Benchmarking and Simulation
+             of High Performance Computer Systems (PMBS) at Supercomputing (SC22)},
+  year = {2022},
+  url = {https://arxiv.org/abs/2211.07260}
 }

doc/source/cache_files.rst

Lines changed: 2 additions & 2 deletions
@@ -4,14 +4,14 @@ Cache files
 ===========
 
 A very useful feature of Kernel Tuner is the ability to store benchmarking results in a cache file during tuning. You can enable cache files by
-passing any filename to the ``cache=`` optional argument of ``tune_kernel()``.
+passing any filename to the ``cache=`` optional argument of ``tune_kernel``.
 
 The benchmark results of individual kernel configurations are appended to the cache file as Kernel Tuner is running. This also allows Kernel Tuner
 to restart a ``tune_kernel()`` session from an existing cache file, should something have terminated the previous session before the run had
 completed. This happens quite often in HPC environments when a job reservation runs out.
 
 Cache files enable a number of other features, such as simulations and visualizations. Simulations are useful for benchmarking optimization
-strategies. You can start a simulation by call tune_kernel with a cache file that contains the full search space and the ``simulation=True`` option.
+strategies. You can start a simulation by calling ``tune_kernel`` with a cache file that contains the full search space and the ``simulation=True`` option.
 
 Cache files can be used to create visualizations of the search space. This even works while Kernel Tuner is still running. As the new results are
 coming in, they are streamed to the visualization. Please see the `Kernel Tuner Dashboard <https://github.com/KernelTuner/dashboard>`__.
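
As a minimal sketch of this workflow (reusing the ``vector_add`` kernel from the quickstart; the data setup is assumed, and the ``simulation=True`` keyword is spelled as in the text above):

.. code-block:: python

    import numpy as np
    import kernel_tuner

    size = 10000000
    a = np.random.randn(size).astype(np.float32)
    b = np.random.randn(size).astype(np.float32)
    c = np.zeros_like(a)
    args = [c, a, b, np.int32(size)]
    tune_params = {"block_size_x": [32, 64, 128, 256, 512, 1024]}

    # Results are appended to the cache file as tuning proceeds; rerunning
    # the same call resumes from whatever the file already contains.
    kernel_tuner.tune_kernel("vector_add", "vector_add_kernel.cu", size, args,
                             tune_params, cache="vector_add_cache.json")

    # Once the cache holds the full search space, replay it without a GPU,
    # e.g. to benchmark an optimization strategy.
    kernel_tuner.tune_kernel("vector_add", "vector_add_kernel.cu", size, args,
                             tune_params, cache="vector_add_cache.json",
                             simulation=True)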

doc/source/index.rst

Lines changed: 8 additions & 1 deletion
@@ -98,4 +98,11 @@ If you use Kernel Tuner in research or research software, please cite the most r
   publisher={IEEE}
 }
 
-
+@article{schoonhoven2022going,
+  author = {Schoonhoven, Richard and Veenboer, Bram and van Werkhoven, Ben and Batenburg, K Joost},
+  title = {Going green: optimizing GPUs for energy efficiency through model-steered auto-tuning},
+  journal = {International Workshop on Performance Modeling, Benchmarking and Simulation
+             of High Performance Computer Systems (PMBS) at Supercomputing (SC22)},
+  year = {2022},
+  url = {https://arxiv.org/abs/2211.07260}
+}

doc/source/quickstart.rst

Lines changed: 8 additions & 8 deletions
@@ -3,7 +3,7 @@ Getting Started
 
 So you have installed Kernel Tuner! That's great! But now you'd like to get started tuning some GPU code.
 
-Let's say we have a simple CUDA kernel stored in a file called vector_add_kernel.cu:
+Let's say we have a simple CUDA kernel stored in a file called ``vector_add_kernel.cu``:
 
 .. code-block:: cuda
 
@@ -16,7 +16,7 @@ Let's say we have a simple CUDA kernel stored in a file called vector_add_kernel
     }
 
 
-This kernel simply performs a point-wise addition of vectors a and b and stores the result in c.
+This kernel simply performs a point-wise addition of vectors ``a`` and ``b`` and stores the result in ``c``.
 
 To tune this kernel with Kernel Tuner, we are going to create the input and output data in Python using Numpy arrays.
 
@@ -34,30 +34,30 @@ To tune this kernel with Kernel Tuner, we are going to create the input and outp
 To tell Kernel Tuner how it should call the kernel, we can create a list in Python that should correspond to
 our CUDA kernel's argument list with the same order and types.
 
-.. code-block::python
+.. code-block:: python
 
     args = [c, a, b, n]
 
 So far, we have created the data structures needed by Kernel Tuner to call our kernel, but we have not yet specified what we
 want Kernel Tuner to tune in our kernel. For that, we create a dictionary that we call tune_params, in which keys correspond
 to tunable parameters in our kernel and the values are lists of values that these parameters may take.
 
-.. code-block::python
+.. code-block:: python
 
     tune_params = dict()
    tune_params["block_size_x"] = [32, 64, 128, 256, 512, 1024]
 
-In the code above, we have inserted a key into our dictionary called "block_size_x". This is a special name for a tunable
+In the code above, we have inserted a key into our dictionary, namely ``"block_size_x"``. This is a special name for a tunable
 parameter that is recognized by Kernel Tuner to denote the size of our thread block in the x-dimension.
 For a full list of special parameter names, please see the :ref:`parameter-vocabulary`.
 
-Alright, we are all set to start calling Kernel Tuner's main function, which is called tune_kernel.
+Alright, we are all set to start calling Kernel Tuner's main function, which is called ``tune_kernel``.
 
-.. code-block::python
+.. code-block:: python
 
     results, env = kernel_tuner.tune_kernel("vector_add", "vector_add_kernel.cu", size, args, tune_params)
 
-In the above, tune_kernel takes five arguments:
+In the above, ``tune_kernel`` takes five arguments:
 
 * The kernel name passed as a string
 * The filename of the kernel, also as a string
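
Once ``tune_kernel`` returns, the ``results`` list holds one dictionary per benchmarked configuration, mapping each tunable parameter to its value alongside the measured timings. A minimal sketch of picking the winner (assuming the default ``time`` key, reported in milliseconds):

.. code-block:: python

    # Select the configuration with the lowest measured time
    best = min(results, key=lambda config: config["time"])
    print(f"best block_size_x: {best['block_size_x']} ({best['time']:.4f} ms)")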

doc/source/structs.rst

Lines changed: 3 additions & 3 deletions
@@ -1,12 +1,12 @@
 Using structs
 -------------
 
-One of the issues with calling GPU kernels from Python is the use of custom data types in kernel arguments. In general, it is recommended for portability of your GPU code to be used from
-many different host languages to keep the interface of your kernels as simple as possible. This means sticking to simple pointers of primitive types such as integer, float, and double.
+One of the issues with calling GPU kernels from Python is the use of custom data types in kernel arguments. In general, it is recommended for portability of your GPU code, which may be
+used in any host program in any host programming language, to keep the interface of your kernels as simple as possible. This means sticking to simple pointers of primitive types such as integer, float, and double.
 For performance reasons, it is also recommended to not use arrays of structs for kernel arguments, as this is very likely to lead to inefficient memory accesses on the GPU.
 
 However, there are situations, in particular in scientific applications, where the GPU code needs a lot of input parameters where it makes sense to collect these in a struct that
-describes the simulation or experimental setup. For these use cases it is possible to use Python's built-in ``struct`` library, in particular the function ``struct.pack()``. For how to use
+describes the simulation or experimental setup. For these use cases, it is possible to use Python's built-in ``struct`` library, in particular the function ``struct.pack()``. For how to use
 ``struct.pack``, please consult the `Python documentation <https://docs.python.org/3/library/struct.html>`__. In the code below we show part of a Python script that uses ``struct.pack``,
 Numpy, and Kernel Tuner to call a CUDA kernel that uses a struct as kernel argument.
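
As an illustration of the packing step, here is a minimal sketch assuming a hypothetical device-side struct ``struct Setup { int n; float dt; float scale; };`` (field order, types, and alignment must match the CUDA definition exactly):

.. code-block:: python

    import struct
    import numpy as np

    # '<iff' packs one int and two floats, little-endian, without padding,
    # matching the layout of the hypothetical Setup struct above
    packed = struct.pack('<iff', 1024, 0.01, 2.0)

    # Wrapping the raw bytes in a Numpy array lets the struct travel as a
    # single kernel argument alongside ordinary Numpy arrays
    setup_arg = np.frombuffer(packed, dtype=np.uint8)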

Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
#!/usr/bin/env python
"""
This example demonstrates how to use the power-frequency model presented in

 * Going green: optimizing GPUs for energy efficiency through model-steered auto-tuning
   R. Schoonhoven, B. Veenboer, B. van Werkhoven, K. J. Batenburg
   International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS) at Supercomputing (SC22) 2022

to reduce the number of frequencies for GPU energy tuning.

In particular, this example creates a plot with the modeled power consumption vs
frequency curve, highlighting the ridge frequency and the frequency range
selected by the user.

This example requires CUDA and NVML as well as PyCuda and a CUDA-capable
GPU with the ability (and permissions) to set application clocks. GPUs
that do support locked clocks but not application clocks may use the
locked_clocks=True option.
"""
import argparse

import matplotlib.pyplot as plt
import numpy as np

# PyCuda is required to run this example; fail early with a clear error
try:
    from pycuda import driver as drv  # noqa: F401
except ImportError as e:
    raise ImportError("This example requires PyCuda") from e

from kernel_tuner.energy import energy
from kernel_tuner.nvml import get_nvml_gr_clocks


def get_default_parser():
    parser = argparse.ArgumentParser(
        description='Find energy efficient frequencies')
    parser.add_argument("-d", dest="device", nargs="?", type=int,
                        default=0, help="GPU ID to use")
    parser.add_argument("-s", dest="samples", nargs="?", type=int,
                        default=10, help="Number of frequency samples")
    parser.add_argument("-r", dest="range", nargs="?", type=int,
                        default=10, help="Frequency spread (10%% of 'optimum')")
    parser.add_argument("-n", dest="number", nargs="?", type=int, default=10,
                        help="Maximum number of suggested frequencies")
    parser.add_argument("-l", dest="locked_clocks", nargs="?", const=True, default=False,
                        help="Whether to use locked clocks over application clocks")
    parser.add_argument("-nsf", dest="nvidia_smi_fallback", nargs="?", default=None,
                        help="Path to nvidia-smi as fallback when missing NVML permissions")
    return parser


if __name__ == "__main__":
    parser = get_default_parser()
    args = parser.parse_args()

    ridge_frequency, freqs, nvml_power, fitted_params, scaling = \
        energy.create_power_frequency_model(device=args.device,
                                            n_samples=args.samples,
                                            verbose=True,
                                            nvidia_smi_fallback=args.nvidia_smi_fallback,
                                            use_locked_clocks=args.locked_clocks)

    all_frequencies = np.array(get_nvml_gr_clocks(args.device, quiet=True)['nvml_gr_clock'])

    frequency_selection = energy.get_frequency_range_around_ridge(
        ridge_frequency, all_frequencies, args.range, args.number, verbose=True)
    print(f"Search space reduction: {np.round(100 - len(frequency_selection) / len(all_frequencies) * 100, 1)} %")

    xs = np.linspace(all_frequencies[0], all_frequencies[-1], 100)
    # scale to start at 0
    xs -= scaling[0]
    modelled_power = energy.estimated_power(xs, *fitted_params)
    # undo scaling
    xs += scaling[0]
    modelled_power *= scaling[1]

    # Add point for ridge frequency
    P_ridge = energy.estimated_power([ridge_frequency - scaling[0]], *fitted_params) * scaling[1]

    # Add the frequency range
    min_freq = 1e-2 * (100 - args.range) * ridge_frequency
    max_freq = 1e-2 * (100 + args.range) * ridge_frequency

    # plot measurements with model, using seaborn styling if available
    try:
        import seaborn as sns
        sns.set_theme(style="darkgrid")
        sns.set_context("paper", rc={"font.size": 10,
                                     "axes.titlesize": 9, "axes.labelsize": 12})
        fig, ax = plt.subplots()
    except ImportError:
        fig, ax = plt.subplots()
        plt.grid()

    plt.scatter(x=freqs, y=nvml_power, label='NVML measurements')
    plt.scatter(x=ridge_frequency, y=P_ridge, color='g',
                label='Ridge frequency (MHz)')
    plt.plot(xs, modelled_power, label='Modelled power consumption')
    ax.axvspan(min_freq, max_freq, alpha=0.15, color='green',
               label='Recommended frequency range')
    plt.title('GPU modelled power consumption', size=18)
    plt.xlabel('Core frequency (MHz)')
    plt.ylabel('Power consumption (W)')
    plt.legend()

    # Save before show: show() may leave an empty figure on some backends
    plt.savefig("GPU_power_consumption_model.pdf")
    plt.show()
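
The resulting ``frequency_selection`` can then drive an energy-tuning run. A hedged sketch of a continuation (treating the clock as just another tunable parameter, under the ``nvml_gr_clock`` key matching ``get_nvml_gr_clocks`` above, is our assumption; a real run would also configure an energy objective and observer):

    # Hypothetical continuation of the script above: explore only the
    # model-recommended frequencies during tuning, instead of all clocks
    tune_params = dict()
    tune_params["block_size_x"] = [128, 256, 512]
    tune_params["nvml_gr_clock"] = list(frequency_selection)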

kernel_tuner/energy/__init__.py

Whitespace-only changes.
