Commit d302b99

Merge pull request #582 from teojgo/doc/document_flex_alloc_tasks
[doc] Document the `--flex-alloc-tasks` feature
2 parents 3e66bbb + c64aff1 commit d302b99

5 files changed

+121
-12
lines changed

docs/advanced.rst

Lines changed: 46 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -397,3 +397,49 @@ Another way, which is quite useful if you want to generate lots of different tes
397397

398398
Combining parameterized tests and test class hierarchies offers a very flexible way of generating multiple related tests at once while keeping the maintenance cost low.
399399
We use this technique extensively in our tests.
400+
401+
402+
Flexible Regression Tests
403+
-------------------------
404+
405+
.. versionadded:: 2.15
406+
407+
ReFrame can automatically set the number of tasks of a particular test, if its :attr:`num_tasks <reframe.core.pipeline.RegressionTest.num_tasks>` attribute is set to ``0``.
408+
In ReFrame's terminology, such tests are called `flexible`.
409+
By default, ReFrame will spawn such a test on all the idle nodes of the current system partition, but this behavior can be adjusted from the command-line.
410+
Flexible tests are very useful for diagnostic tests, e.g., tests that check the health of a whole set of nodes.
411+
In this example, we demonstrate this feature through a simple test that runs ``hostname``.
412+
The test will verify that all the nodes print the expected host name:
413+
414+
.. literalinclude:: ../tutorial/advanced/advanced_example9.py
415+
416+
The first thing to notice in this test is that :attr:`num_tasks <reframe.core.pipeline.RegressionTest.num_tasks>` is set to ``0``.
417+
This is a requirement for flexible tests:
418+
419+
.. literalinclude:: ../tutorial/advanced/advanced_example9.py
420+
   :lines: 13
421+
   :dedent: 8
422+
423+
The sanity function of this test simply counts the host names and verifies that they are as many as expected:
424+
425+
.. literalinclude:: ../tutorial/advanced/advanced_example9.py
426+
   :lines: 15-18
427+
   :dedent: 8
428+
429+
Notice, however, that the sanity check does not use :attr:`num_tasks` for verification, but rather a custom attribute, ``num_tasks_assigned``.
430+
This happens for two reasons:
431+
432+
a. At the time the sanity check expression is created, :attr:`num_tasks` is ``0``.
433+
So the actual number of tasks assigned must be a deferred expression as well.
434+
b. When ReFrame determines and sets the number of tasks of the test, it does not set the :attr:`num_tasks` attribute of the :class:`RegressionTest`.
434+
It only sets the corresponding attribute of the associated job instance.
436+
437+
Here is how the new deferred attribute is defined:
438+
439+
.. literalinclude:: ../tutorial/advanced/advanced_example9.py
440+
   :lines: 22-25
441+
   :dedent: 4
442+
443+
444+
The behavior of the flexible task allocation is controlled by the ``--flex-alloc-tasks`` command line option.
445+
See the corresponding `section <running.html#controlling-the-flexible-task-allocation>`__ for more information.
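
Reason (a) in the text above can be illustrated with a minimal sketch in plain Python. This is not ReFrame's implementation: the classes `FakeJob` and `FakeTest` and the use of a lambda as the deferred value are assumptions made purely for illustration, whereas the actual test relies on `@sn.sanity_function`. The sketch only shows why a value read at construction time stays at ``0``, while a value read lazily sees the task count assigned later.

```python
class FakeJob:
    """Hypothetical stand-in for the job object; not ReFrame's class."""

    def __init__(self):
        self.num_tasks = 0  # unknown until the scheduler allocates nodes


class FakeTest:
    """Hypothetical stand-in for a flexible test."""

    def __init__(self):
        self.job = FakeJob()
        # Eager: the value is read now, while it is still 0.
        self.eager_expected = self.job.num_tasks
        # Deferred: the value is read only when the callable is invoked.
        self.deferred_expected = lambda: self.job.num_tasks


test = FakeTest()
test.job.num_tasks = 4  # pretend the scheduler assigned four tasks

print(test.eager_expected)       # 0
print(test.deferred_expected())  # 4
```

The eager value was fixed before the allocation happened, which is the situation the sanity expression would be in if it used `num_tasks` directly.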

docs/running.rst

Lines changed: 33 additions & 1 deletion
@@ -460,11 +460,12 @@ They are summarized below:
460460
In this example, Slurm's policy is that later definitions of options override previous ones.
461461
So, in this way, you would override the standard output for all the submitted jobs!
462462

463+
* ``--flex-alloc-tasks {all|idle|NUM}``: Automatically determine the number of tasks allocated for each test.
463464
* ``--force-local``: Force the local execution of the selected tests.
464465
No jobs will be submitted.
465466
* ``--skip-sanity-check``: Skip sanity checking phase.
466467
* ``--skip-performance-check``: Skip performance verification phase.
467-
* ``--strict``: Force strict performance checking. Some tests may set their :attr:`strict_check <reframe.core.pipeline.RegressionTest.strick_check>` attribute to :class:`False` (see `"Reference Guide" <reference.html>`__) in order to just let their performance recorded but not yield an error.
468+
* ``--strict``: Force strict performance checking. Some tests may set their :attr:`strict_check <reframe.core.pipeline.RegressionTest.strict_check>` attribute to :class:`False` (see `"Reference Guide" <running.html#controlling-the-execution-of-regression-tests>`__) so that their performance is only recorded and does not yield an error.
468469
This option overrides this behavior and forces all tests to be strict.
469470
* ``--skip-system-check``: Skip the system check and run the selected tests even if they do not support the current system.
470471
This option is sometimes useful when you need to quickly verify if a regression test supports a new system.
@@ -998,3 +999,34 @@ If you now try to run a test that loads the module `cudatoolkit`, the following
998999
* Failing phase: setup
9991000
* Reason: caught framework exception: module cyclic dependency: cudatoolkit->foo->bar->foobar->cudatoolkit
10001001
------------------------------------------------------------------------------
1002+
1003+
Controlling the Flexible Task Allocation
1004+
----------------------------------------
1005+
1006+
.. versionadded:: 2.15
1007+
1008+
ReFrame can automatically set the number of tasks of a particular test, if its :attr:`num_tasks <reframe.core.pipeline.RegressionTest.num_tasks>` attribute is set to ``0``.
1009+
By default, ReFrame will spawn such a test on all the idle nodes of the current system partition.
1010+
This behavior can be adjusted using the ``--flex-alloc-tasks`` command line option.
1011+
This option accepts three values:
1012+
1013+
1. ``idle``: (default) In this case, ReFrame will set the number of tasks to the number of idle nodes of the current logical partition multiplied by the :attr:`num_tasks_per_node <reframe.core.pipeline.RegressionTest.num_tasks_per_node>` attribute of the particular test.
1014+
2. ``all``: In this case, ReFrame will set the number of tasks to the number of all the nodes of the current logical partition multiplied by the :attr:`num_tasks_per_node <reframe.core.pipeline.RegressionTest.num_tasks_per_node>` attribute of the particular test.
1015+
1016+
3. Any positive integer: In this case, ReFrame will set the number of tasks to the given value.
1017+
1018+
Flexible task allocation takes into account any additional logical constraints imposed by the command line options that affect the job allocation, such as ``--partition``, ``--reservation``, ``--nodelist``, ``--exclude-nodes`` and ``--job-option`` (if the scheduler option passed to the latter imposes a restriction).
1019+
Notice that ReFrame will issue an error if the resulting number of nodes is zero.
1020+
1021+
For example, using the following options would run a flexible test on all the nodes of reservation ``foo`` except the nodes ``n0[1-5]``:
1022+
1023+
.. code-block:: bash
1024+
1025+
   --flex-alloc-tasks=all --reservation=foo --exclude-nodes=n0[1-5]
1026+
1027+
1028+
.. note::
1029+
   Flexible task allocation is supported only for the Slurm scheduler backend.
1030+
1031+
.. warning::
1032+
   Test cases resulting from flexible ReFrame tests may not be run using the asynchronous execution policy, because the nodes satisfying the required criteria will be allocated for the first test case, causing all subsequent ones to fail.
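
The three values accepted by ``--flex-alloc-tasks``, as described in "Controlling the Flexible Task Allocation" above, can be sketched roughly as follows. The function `resolve_num_tasks` and its parameters are hypothetical names chosen for this illustration; this is not ReFrame's internal code, and the node counts are assumed to be already filtered by any constraints such as ``--reservation`` or ``--exclude-nodes``.

```python
def resolve_num_tasks(policy, num_idle_nodes, num_all_nodes,
                      num_tasks_per_node):
    """Hypothetical helper mirroring the documented semantics only."""
    if policy == 'idle':
        num_nodes = num_idle_nodes
    elif policy == 'all':
        num_nodes = num_all_nodes
    else:
        # Any other value is expected to be a positive integer.
        num_tasks = int(policy)
        if num_tasks <= 0:
            raise ValueError('number of tasks must be a positive integer')
        return num_tasks

    # The documentation states that zero resulting nodes is an error.
    if num_nodes == 0:
        raise RuntimeError('no nodes satisfy the allocation constraints')

    return num_nodes * num_tasks_per_node


print(resolve_num_tasks('idle', 3, 8, 2))  # 6
print(resolve_num_tasks('all', 3, 8, 2))   # 16
print(resolve_num_tasks('12', 3, 8, 2))    # 12
```

Note that for an explicit integer the value is used as given, without multiplying by the tasks per node, which matches the description of the third option.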

reframe/core/pipeline.py

Lines changed: 9 additions & 11 deletions
@@ -255,22 +255,20 @@ class RegressionTest:
255255

256256
#: Number of tasks required by this test.
257257
#:
258-
#: If the number of tasks is set to ``0``, ReFrame will try to use all
259-
#: the available nodes of a reservation. A reservation *must* be specified
260-
#: through the `--reservation` command-line option, otherwise the
261-
#: regression test will fail during submission. ReFrame will try to run the
262-
#: test on all the nodes of the reservation that satisfy the selection
263-
#: criteria of the current
264-
#: `virtual partition <configure.html#partition-configuration>`__
265-
#: (i.e., constraints and/or partitions).
258+
#: If the number of tasks is set to ``0``, ReFrame will try to flexibly
259+
#: allocate the number of tasks, based on the command line option
260+
#: ``--flex-alloc-tasks``.
266261
#:
267262
#: :type: integral
268263
#: :default: ``1``
269264
#:
270265
#: .. note::
271-
#: .. versionchanged:: 2.9
272-
#: Added support for running the test using all the nodes of the
273-
#: specified reservation if the number of tasks is set to ``0``.
266+
#: .. versionchanged:: 2.15
267+
#: Added support for flexible allocation of the number of tasks
268+
#: according to the ``--flex-alloc-tasks`` command line option
269+
#: (see `Flexible task allocation
270+
#: <running.html#controlling-the-flexible-task-allocation>`__)
271+
#: if the number of tasks is set to ``0``.
274272
num_tasks = fields.TypedField('num_tasks', int)
275273

276274
#: Number of tasks per node required by this test.

reframe/core/schedulers/__init__.py

Lines changed: 8 additions & 0 deletions
@@ -141,6 +141,14 @@ def workdir(self):
141141

142142
@property
143143
def num_tasks(self):
144+
"""The number of tasks assigned to this job.
145+
146+
This attribute is useful in a flexible regression test for determining
147+
the actual number of tasks that ReFrame assigned to the test.
148+
149+
For more information on flexible task allocation, please refer to the
150+
`tutorial <advanced.html#flexible-regression-tests>`__.
151+
"""
144152
return self._num_tasks
145153

146154
@property
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
1+
import reframe as rfm
2+
import reframe.utility.sanity as sn
3+
4+
5+
@rfm.simple_test
6+
class HostnameCheck(rfm.RunOnlyRegressionTest):
7+
    def __init__(self):
8+
        super().__init__()
9+
        self.valid_systems = ['daint:gpu', 'daint:mc']
10+
        self.valid_prog_environs = ['PrgEnv-cray']
11+
        self.executable = 'hostname'
12+
        self.sourcesdir = None
13+
        self.num_tasks = 0
14+
        self.num_tasks_per_node = 1
15+
        self.sanity_patterns = sn.assert_eq(
16+
            self.num_tasks_assigned,
17+
            sn.count(sn.findall(r'nid\d+', self.stdout))
18+
        )
19+
        self.maintainers = ['you-can-type-your-email-here']
20+
        self.tags = {'tutorial'}
21+
22+
    @property
23+
    @sn.sanity_function
24+
    def num_tasks_assigned(self):
25+
        return self.job.num_tasks
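
The logic of the sanity check in the test above can be tried out without ReFrame using a minimal sketch in plain Python. The helper names `count_hostnames` and `hostnames_match` and the sample output are assumptions for illustration; the sketch evaluates eagerly with the standard `re` module, whereas the test uses the deferred `sn.findall`, `sn.count` and `sn.assert_eq`.

```python
import re


def count_hostnames(stdout_text, pattern=r'nid\d+'):
    """Count host names in the output that match the given pattern."""
    return len(re.findall(pattern, stdout_text))


def hostnames_match(stdout_text, num_tasks_assigned):
    """Return True if one host name was printed per assigned task."""
    return count_hostnames(stdout_text) == num_tasks_assigned


# Hypothetical output of `hostname` run as three tasks, one per node.
sample = 'nid00001\nnid00002\nnid00003\n'

print(hostnames_match(sample, 3))  # True
print(hostnames_match(sample, 4))  # False
```

The pattern `nid\d+` is taken from the test itself and is specific to the node naming of the target system, so it would need adjusting elsewhere.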
