Commit 890938b

Merge pull request #134 from martius-lab/fkloss/basic_tutorials
docs: Add basic tutorials
2 parents 1bae454 + f1f0233 commit 890938b

7 files changed: +1883 -9 lines changed

CHANGELOG.md

Lines changed: 0 additions & 9 deletions
@@ -122,17 +122,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 **Last version supporting Python 3.6.**
 
-
-## 2.4 - 2022-02-08
-
-## [2.1] - 2020-06-20
-
-## [2.0] - 2020-03-25
-
 ---
 
 [Unreleased]: https://github.com/martius-lab/cluster_utils/compare/v3.0.0...HEAD
 [3.0.0]: https://github.com/martius-lab/cluster_utils/compare/v2.5...v3.0.0
 [2.5]: https://github.com/martius-lab/cluster_utils/compare/v2.1...v2.5
-[2.1]: https://github.com/martius-lab/cluster_utils/compare/v2.0...v2.1
-[2.0]: https://github.com/martius-lab/cluster_utils/releases/tag/v2.0

docs/configuration.rst

Lines changed: 2 additions & 0 deletions
@@ -530,6 +530,8 @@ settings (i.e. the ones independent of the optimisation method set in
 not set, the user will be asked at runtime in this case.
 
 
+.. _config.hp_optimization_iterations:
+
 About Iterations
 ~~~~~~~~~~~~~~~~
 
docs/images/Rosenbrock-contour.svg

Lines changed: 1450 additions & 0 deletions

docs/index.rst

Lines changed: 2 additions & 0 deletions
@@ -64,6 +64,8 @@ For more information see :doc:`usage` and the examples in the ``examples/basic/`
 
    installation
    usage
+   tutorials/grid_search.rst
+   tutorials/hp_optimization.rst
 
 
 .. toctree::

docs/tutorials/grid_search.rst

Lines changed: 211 additions & 0 deletions
@@ -0,0 +1,211 @@
***************************
Tutorial: Basic Grid Search
***************************

In this tutorial you will learn

- how to write a simple script that can be executed by cluster_utils, and
- how to configure cluster_utils to run a grid search over a few parameters on your
  script.

It does not cover all available options but instead shows the minimal steps needed to
get started.

--------


What is grid search?
====================

For grid search, you specify a list of parameters and, for each of them, a list of
values to check. cluster_utils will then execute your script with all possible
combinations of parameter values and collect the resulting metrics (e.g. the reward
achieved by a policy trained with the given parameters).
In the end, you will get an overview of the results and a list of parameter values that
performed best with respect to your metric.

In the example below, we use the Rosenbrock function::

    f(x,y) = (1 - x)² + 100 · (y - x²)²

For each of the two parameters ``x`` and ``y``, we will check the values
``[0.0, 0.5, 1.0, 1.5, 2.0]``. That is, a total of 25 jobs will be run with the
following parameter values:

.. csv-table::
    :header-rows: 1

    x,y
    0.0,0.0
    0.0,0.5
    0.0,1.0
    0.0,1.5
    0.0,2.0
    0.5,0.0
    0.5,0.5
    ...,...
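
To make the counting concrete, the grid can be enumerated in a few lines of plain
Python. This is only an illustrative sketch and not part of cluster_utils: the variable
names are made up, and the order in which cluster_utils actually schedules the jobs may
differ.

.. code-block:: python

    import itertools

    values = [0.0, 0.5, 1.0, 1.5, 2.0]

    # Cartesian product: every value of x is paired with every value of y.
    grid = list(itertools.product(values, values))

    print(len(grid))  # 25
    print(grid[:3])   # [(0.0, 0.0), (0.0, 0.5), (0.0, 1.0)]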


Prepare your code
=================

For the sake of this tutorial, we will use the two-dimensional Rosenbrock function.
However, any other function could be used here without affecting the general setup to
run with cluster_utils.

.. code-block:: python

    def rosenbrock(x, y):
        return (1 - x) ** 2 + 100 * (y - x**2) ** 2

The function has a minimum value of zero at (x, y) = (1, 1):

.. figure:: ../images/Rosenbrock-contour.svg
    :alt: Plot of the Rosenbrock function.

    Image by Nschloe - Own work, CC BY-SA 4.0, `link <https://commons.wikimedia.org/w/index.php?curid=114931732>`_
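
The location of the minimum can also be read off the formula directly. The short
argument below uses only the definition of the function; rendering it assumes that math
support is enabled in the Sphinx build.

.. math::

    f(x, y) = (1 - x)^2 + 100\,(y - x^2)^2 \ge 0,
    \qquad
    f(x, y) = 0 \iff 1 - x = 0 \text{ and } y - x^2 = 0 \iff (x, y) = (1, 1).

Both terms are squares, so the sum can only vanish when each term does. Since (1, 1) is
one of the 25 grid points, the grid search should report it as the best parameter set.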


To be able to run the grid search on this function, we need to write a small script,
referred to as ``rosenbrock.py`` below:


.. code-block:: python

    # rosenbrock.py
    from cluster_utils import cluster_main

    def rosenbrock(x, y):
        return (1 - x) ** 2 + 100 * (y - x**2) ** 2

    @cluster_main
    def main(**params):
        # params holds the parameter values of the current job (here "x" and "y")
        value = rosenbrock(params["x"], params["y"])

        # metrics to report back to cluster_utils
        metrics = {"rosenbrock_value": value}
        return metrics

    if __name__ == "__main__":
        main()


This script will later be called by cluster_utils for each set of parameters in the grid
search.
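
Before handing the script to cluster_utils, it can help to check the computation on its
own. The sketch below does this without importing cluster_utils; ``compute_metrics`` is
a hypothetical stand-in for the body of ``main`` and is not part of the library.

.. code-block:: python

    def rosenbrock(x, y):
        return (1 - x) ** 2 + 100 * (y - x**2) ** 2


    def compute_metrics(params):
        """Hypothetical stand-in for main(): same computation, no cluster_utils."""
        return {"rosenbrock_value": rosenbrock(params["x"], params["y"])}


    # A few hand-picked parameter sets from the grid.
    print(compute_metrics({"x": 1.0, "y": 1.0}))  # {'rosenbrock_value': 0.0}
    print(compute_metrics({"x": 0.0, "y": 0.0}))  # {'rosenbrock_value': 1.0}
    print(compute_metrics({"x": 2.0, "y": 2.0}))  # {'rosenbrock_value': 401.0}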

**cluster_utils expects your code to be committed to a Git repository.** This
helps to keep track of the exact version of the code you ran the grid search on (the
Git revision will be included in the report). Thus, create a Git repository, commit the
``rosenbrock.py`` script and push to the remote (cluster_utils will later pull from
there).


Write a cluster_utils configuration file
========================================

Now we need to write a configuration file to tell cluster_utils how to run the script,
which parameters to do the grid search over, where to save results, etc.

This config file can be JSON, YAML or TOML. In the following, we use TOML but
the other formats would work just as well (JSON is discouraged, though, as it is rather
annoying to write by hand and doesn't support comments).


.. code-block:: toml

    # Name and base of the output directory. With the given config, results will be
    # written to /tmp/rosenbrock_grid_search/.
    optimization_procedure_name = "rosenbrock_grid_search"
    results_dir = "/tmp"

    # Automatically generate a PDF report when finished
    generate_report = "when_finished"

    # Path to the job script. Note that this is relative to the repository's root
    # directory, not to this config file!
    script_relative_path = "rosenbrock.py"

    # How often to run each configuration (useful if there is some randomness
    # in the result).
    restarts = 1

    [git_params]
    # which repo/branch to check out
    url = "<url to your git repository>"
    branch = "main"

    [cluster_requirements]
    request_cpus = 1

    [environment_setup]
    # This section is required, even if no options are set here.

    [fixed_params]
    # Likewise required but may be empty.

    [[hyperparam_list]]
    param = "x"
    values = [0.0, 0.5, 1.0, 1.5, 2.0]

    [[hyperparam_list]]
    param = "y"
    values = [0.0, 0.5, 1.0, 1.5, 2.0]


In plain words, this config tells cluster_utils to do the following: Run a grid search
over the two parameters ``x`` and ``y``, checking the values
``[0.0, 0.5, 1.0, 1.5, 2.0]`` for each of them (entries in ``hyperparam_list``). Get the
Python script ``rosenbrock.py`` (``script_relative_path``) from the specified Git
repository (``git_params``). For each combination of ``(x, y)``, execute the script once
(``restarts``) on a single CPU core (``cluster_requirements``). When finished, generate
a PDF report (``generate_report``) and store it, together with other output files,
in ``/tmp/rosenbrock_grid_search`` (``results_dir``, ``optimization_procedure_name``).
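
As a rough cross-check of the description above, the following sketch derives the number
of jobs and the output directory from the same settings. It mirrors the relevant config
values in a plain dictionary instead of parsing the TOML file. It also assumes that the
job count is the number of parameter combinations times ``restarts``, which is an
interpretation of "how often to run each configuration" and not taken from the
cluster_utils source.

.. code-block:: python

    import math
    import posixpath

    # Plain-dict mirror of the relevant settings (not parsed from the config file).
    config = {
        "optimization_procedure_name": "rosenbrock_grid_search",
        "results_dir": "/tmp",
        "restarts": 1,
        "hyperparam_list": [
            {"param": "x", "values": [0.0, 0.5, 1.0, 1.5, 2.0]},
            {"param": "y", "values": [0.0, 0.5, 1.0, 1.5, 2.0]},
        ],
    }

    # Assumed job count: product of the value-list lengths, times the restarts.
    num_combinations = math.prod(len(hp["values"]) for hp in config["hyperparam_list"])
    num_jobs = num_combinations * config["restarts"]

    # Output directory: results_dir joined with the procedure name.
    output_dir = posixpath.join(
        config["results_dir"], config["optimization_procedure_name"]
    )

    print(num_jobs)    # 25
    print(output_dir)  # /tmp/rosenbrock_grid_search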


**Note:** You will need to adjust the settings in the ``[git_params]`` section to point
to the repository that contains ``rosenbrock.py``.


Run the grid search
===================

Now you can run the grid search locally:

.. code-block:: sh

    python3 -m cluster_utils.grid_search path/to/config.toml

cluster_utils will detect that it is not running on a cluster and ask for confirmation
to run locally. Simply press Enter to confirm. It will then start executing the jobs
and, when finished, create a report. The output should look something like this:

.. code-block:: text

    Detailed logging available in /tmp/rosenbrock_grid_search/cluster_run.log
    Creating directory /tmp/rosenbrock_grid_search/working_directories
    Logs of individual jobs stored at /home/arada/.cache/cluster_utils/rosenbrock_grid_search-20241031-135040-jobs
    Using project direcory /home/arada/.cache/cluster_utils/rosenbrock_grid_search-20241031-135040-project
    No cluster detected. Do you want to run locally? [Y/n]:
    Completed: 92%|████████████████████████████████████████████████████▋ | 23/25
    Started execution: 92%|████████████████████████████████████ | 23/25, Failed=0
    Submitted: 100%|█████████████████████████████████████████████████████████████| 25/25

    Killing remaining jobs...
    Results are stored in /tmp/rosenbrock_grid_search
    Procedure successfully finished
    Producing basic report...
    Report saved at /tmp/rosenbrock_grid_search/rosenbrock_grid_search_report.pdf

All results of the grid search are stored in ``/tmp/rosenbrock_grid_search``. The most
relevant files are:

- ``rosenbrock_grid_search_report.pdf``: The PDF report, which includes a list of the
  best parameters and several plots for further analysis.
- ``all_data.csv``: Results of all runs as a CSV file.
- ``cluster_run.log``: Log of cluster_utils. Useful for debugging if something goes wrong.
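
For further analysis outside the report, ``all_data.csv`` can be loaded with standard
tools. The sketch below picks the best run using only the Python standard library. The
exact column layout of ``all_data.csv`` is not described here, so the column names
(``x``, ``y``, ``rosenbrock_value``) are assumptions. The example works on a small
made-up sample so that it runs on its own.

.. code-block:: python

    import csv
    import io

    # Made-up sample standing in for all_data.csv. The column names are assumptions
    # and the real file will likely contain additional columns.
    sample = """x,y,rosenbrock_value
    0.0,0.0,1.0
    1.0,1.0,0.0
    2.0,2.0,401.0
    """.replace("    ", "")

    rows = list(csv.DictReader(io.StringIO(sample)))

    # The run with the smallest metric value is the best one.
    best = min(rows, key=lambda row: float(row["rosenbrock_value"]))
    print(best["x"], best["y"], best["rosenbrock_value"])  # 1.0 1.0 0.0

To use it on real results, the ``io.StringIO(sample)`` part would be replaced by an open
file handle for ``/tmp/rosenbrock_grid_search/all_data.csv``, with the column names
adjusted to what the file actually contains.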


.. important::

    Every time you run cluster_utils, it creates a temporary working copy of the
    specified Git repository. This means that when you make changes to the code, you
    need to **commit and push** them before running cluster_utils again.
