Commit be8cffd

added a tiny bit of documentation
1 parent 7f028be commit be8cffd

2 files changed: 21 additions, 0 deletions

examples/advanced_example_config.yaml

Lines changed: 2 additions & 0 deletions
@@ -59,6 +59,8 @@ seml:
   output_dir: logs
   project_root_dir: .
   description: "An advanced example configuration. We can also use variable interpolation here: ${config.model.model_type}"
+  reschedule_timeout: 300  # The time (in seconds) left on the job at which SEML will try to reschedule unfinished experiments.
+  # Note that you have to implement a `reschedule_hook` to use this feature.

 slurm:
   - experiments_per_job: 1
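
Read literally, the comment on `reschedule_timeout` describes a threshold on the time remaining in the job. The sketch below illustrates that reading only; `should_reschedule` and `job_end_time` are hypothetical names, and the commit does not show how SEML itself performs this check.

```python
import time
from typing import Optional


def should_reschedule(
    job_end_time: float,
    reschedule_timeout: float,
    now: Optional[float] = None,
) -> bool:
    """Return True once the time left on the job is at most `reschedule_timeout` seconds.

    Hypothetical helper for illustration; not part of SEML.
    """
    if now is None:
        now = time.time()
    remaining = job_end_time - now
    return remaining <= reschedule_timeout


# A job that ends at t=1000 s, with reschedule_timeout=300 as in the config above.
print(should_reschedule(job_end_time=1000.0, reschedule_timeout=300.0, now=600.0))  # 400 s left -> False
print(should_reschedule(job_end_time=1000.0, reschedule_timeout=300.0, now=750.0))  # 250 s left -> True
```

Under this reading, the timeout should be at least as long as the time needed to write a checkpoint, since the hook has to finish saving before the job ends.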

examples/advanced_example_experiment.py

Lines changed: 19 additions & 0 deletions
@@ -147,6 +147,9 @@ def train(self, patience, num_epochs):
         # everything is set up
         for e in range(num_epochs):
             # simulate training
+
+            # calling reschedule hook
+            reschedule_hook(model_weights={}, step=e)
             continue
         results = {
             "test_acc": 0.5 + 0.3 * np.random.randn(),
@@ -165,6 +168,22 @@ def get_experiment(init_all=False):
     return experiment


+# This function will be called when a reschedule is triggered.
+# It should save the current state of the experiment and return a
+# dictionary that may be used to update the configuration upon rescheduling.
+# You are responsible for implementing the actual saving/loading of the experiment state
+# based on the updated config.
+@ex.reschedule_hook
+def reschedule_hook(model_weights, step, **kwargs):
+    # Here you would save the current state of the experiment
+    # and return any necessary configuration updates.
+
+    # !!! You will need to call this function regularly from within your training loop
+    # to check if rescheduling is needed.
+    # Pass everything you need to store your state to this function.
+    return {"checkpoint_path": "path/to/saved/checkpoint"}
+
+
 # This function will be called by default. Note that we could in principle manually pass an experiment instance,
 # e.g., obtained by loading a model from the database or by calling this from a Jupyter notebook.
 @ex.automain
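
The hook in this commit leaves saving and loading to the user. As a rough, SEML-free sketch of what that could look like, the example below writes the state to a JSON file, returns a `checkpoint_path` config update as the hook does, and resumes from that path on the next run. The helpers `save_state` and `load_state`, the `stop_after` parameter, and the JSON file layout are all assumptions made for illustration; a real experiment would more likely use its framework's own checkpoint format.

```python
import json
import tempfile
from pathlib import Path


def save_state(checkpoint_dir, model_weights, step):
    """Write the state to a JSON file and return a config update (hypothetical helper)."""
    path = Path(checkpoint_dir) / "checkpoint.json"
    path.write_text(json.dumps({"model_weights": model_weights, "step": step}))
    return {"checkpoint_path": str(path)}


def load_state(checkpoint_path):
    """Return (model_weights, next_step); start fresh when no checkpoint is configured."""
    if checkpoint_path is None:
        return {}, 0
    state = json.loads(Path(checkpoint_path).read_text())
    return state["model_weights"], state["step"] + 1


def train(num_epochs, checkpoint_dir, checkpoint_path=None, stop_after=None):
    """Toy loop; `stop_after` simulates the epoch at which a reschedule is triggered."""
    weights, start = load_state(checkpoint_path)
    for e in range(start, num_epochs):
        weights[f"epoch_{e}"] = e  # stand-in for a real training step
        if stop_after is not None and e == stop_after:
            # Save and hand back the config update, as the hook above does.
            return save_state(checkpoint_dir, weights, e)
    return {"finished": True, "epochs_run": sorted(weights.values())}


with tempfile.TemporaryDirectory() as tmp:
    # First run is interrupted after epoch 2 and returns the config update.
    update = train(num_epochs=5, checkpoint_dir=tmp, stop_after=2)
    # Second run receives the updated config and continues at epoch 3.
    result = train(num_epochs=5, checkpoint_dir=tmp, checkpoint_path=update["checkpoint_path"])
    print(result)  # {'finished': True, 'epochs_run': [0, 1, 2, 3, 4]}
```

Storing the last completed step next to the weights lets the resumed run pick up at the following epoch instead of repeating work.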
