
Commit 5971855

Author: Vasileios Karakasis
Merge branch 'master' into feat/slurm-N-option
2 parents 5af4781 + 2195f68 commit 5971855


12 files changed: 233 additions & 115 deletions


docs/dependencies.rst

Lines changed: 11 additions & 0 deletions
@@ -170,3 +170,14 @@ In fact, you can rewrite :func:`set_executable` function as follows:
170170
Now it's easier to understand what the ``@require_deps`` decorator does behind the scenes.
171171
It binds the function arguments to a partial realization of the :func:`getdep` function and attaches the decorated function as an after-setup hook.
172172
In fact, any ``@require_deps``-decorated function will be invoked before any other after-setup hook.
173+
174+
175+
.. _cleaning-up-stage-files:
176+
177+
Cleaning up stage files
178+
-----------------------
179+
180+
In principle, the output of a test might be needed by its dependent tests.
181+
As a result, the stage directory of the test will only be cleaned up after all of its *immediate* dependent tests have finished successfully.
182+
If any of its children has failed, the cleanup phase is skipped, so that all of the test's files remain in the stage directory.
183+
This allows users to manually reproduce the error of a failed test with dependencies, since all the resources needed by the failing test are left in their original location.
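The deferral rule above can be written down as a small decision function. This is a minimal sketch, not ReFrame's implementation: `CaseRecord`, its `status` values and `cleanup_action` are hypothetical names chosen for illustration. Only the *immediate* dependents are inspected, as the text describes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CaseRecord:
    # Hypothetical record; ReFrame's real test case objects differ.
    name: str
    status: Optional[str] = None   # None (still running), 'success' or 'failure'
    dependents: List['CaseRecord'] = field(default_factory=list)


def cleanup_action(test: CaseRecord) -> str:
    '''Decide what happens to the stage directory of a finished test.'''
    statuses = [d.status for d in test.dependents]
    if any(s == 'failure' for s in statuses):
        # A child failed: keep every file so the error can be reproduced.
        return 'keep'

    if any(s is None for s in statuses):
        # Some immediate dependents have not finished yet: defer cleanup.
        return 'defer'

    # No dependents, or all of them succeeded.
    return 'clean'
```

For example, a build test whose only dependent is still running yields `'defer'`; once that dependent succeeds the result becomes `'clean'`, and if it fails the result is `'keep'`.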

docs/manpage.rst

Lines changed: 0 additions & 9 deletions
@@ -366,15 +366,6 @@ If no node can be selected, the test will be marked as a failure with an appropr
366366
This is the default policy.
367367
- Any positive integer: Flexible tests will be assigned as many tasks as needed in order to span over the specified number of nodes from the node pool.
368368

369-
.. option:: --flex-alloc-tasks[=POLICY]
370-
371-
.. deprecated:: 2.21
372-
373-
Please use |--flex-alloc-nodes|_ instead.
374-
375-
.. |--flex-alloc-nodes| replace:: :attr:`--flex-alloc-nodes`
376-
.. _--flex-alloc-nodes: #cmdoption-flex-alloc-nodes
377-
378369
---------------------------------------
379370
Options controlling ReFrame environment
380371
---------------------------------------

docs/pipeline.rst

Lines changed: 19 additions & 0 deletions
@@ -91,6 +91,9 @@ The Cleanup Phase
9191
During this final stage of the pipeline, the test's resources are cleaned up.
9292
More specifically, if the test has finished successfully, all interesting test files (build/job scripts, build/job script output and any user-specified files) are copied to ReFrame's output directory and the stage directory of the test is deleted.
9393

94+
.. note::
95+
This phase may be deferred if a test has dependents (see :ref:`cleaning-up-stage-files` for more details).
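The copy-then-delete behaviour of the cleanup phase described above can be sketched with the standard library. This is a minimal sketch under stated assumptions: `cleanup_stage` and its `keep_files` parameter are hypothetical, and ReFrame itself decides which files count as "interesting".

```python
import os
import shutil


def cleanup_stage(stage_dir, output_dir, keep_files):
    '''Copy selected files from stage_dir to output_dir, then delete stage_dir.

    keep_files: hypothetical list of file names relative to stage_dir,
    e.g. job scripts and their output.
    '''
    os.makedirs(output_dir, exist_ok=True)
    for name in keep_files:
        src = os.path.join(stage_dir, name)
        # Skip files that the test never produced.
        if os.path.isfile(src):
            shutil.copy2(src, output_dir)

    # Only after the copies are made is the whole stage directory removed.
    shutil.rmtree(stage_dir)
```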
96+
9497

9598
Execution Policies
9699
------------------
@@ -129,3 +132,19 @@ ReFrame tries to keep concurrency high by maintaining as many test cases as poss
129132
When the `concurrency limit <config_reference.html#.systems[].partitions[].max_jobs>`__ is reached, ReFrame will first try to free up execution slots by checking if any of the spawned jobs have finished, and it will fill those slots first before throttling execution.
130133

131134
ReFrame uses polling to check the status of the spawned jobs, but it does so in a dynamic way, in order to remain responsive while avoiding overloading the system job scheduler with excessive polling.
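The text does not spell out ReFrame's actual polling schedule. The toy model below only illustrates the trade-off it mentions, assuming a geometric back-off that is capped and reset whenever a poll finds a finished job; `polling_intervals` and all of its parameters are hypothetical.

```python
def polling_intervals(num_polls, start=1.0, factor=2.0, cap=10.0, reset_at=()):
    '''Return the sleep interval used before each poll (toy model).

    The interval grows by `factor` after every poll, up to `cap`, and drops
    back to `start` after any poll index listed in `reset_at`, i.e. a poll
    that found a finished job and freed an execution slot.
    '''
    intervals = []
    interval = start
    for i in range(num_polls):
        intervals.append(interval)
        if i in reset_at:
            interval = start          # be responsive again after activity
        else:
            interval = min(interval * factor, cap)  # back off, but not forever
    return intervals
```

With the defaults, six polls wait 1, 2, 4, 8, 10 and 10 seconds; if the third poll finds a finished job, the sequence becomes 1, 2, 4, 1, 2, 4.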
135+
136+
Timing the Test Pipeline
137+
------------------------
138+
139+
.. versionadded:: 3.0
140+
141+
ReFrame keeps track of the time a test spends in every pipeline stage and reports these timings after each test finishes.
142+
However, it does so from its own perspective and not from that of the scheduler backend used.
143+
This has some practical implications:
144+
As soon as a test enters the "run" phase, ReFrame's timer for that phase starts ticking, regardless of whether the associated job is still pending.
145+
Similarly, the "run" phase ends only when ReFrame notices that it is over.
146+
This will happen after the associated job has finished.
147+
For this reason, the time spent in the pipeline's "run" phase should *not* be interpreted as the actual runtime of the test, especially if a non-local scheduler backend is used.
148+
149+
Finally, the execution time of the "cleanup" phase is not reported when a test finishes, since this phase may be deferred if other tests depend on the finished test.
150+
See :doc:`dependencies` for more information on how ReFrame treats tests with dependencies.
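To see why the reported "run" time can exceed a job's real runtime, consider a toy model (the function and its parameters are hypothetical, not ReFrame code): the timer starts when the job is submitted and stops at the first poll that observes the finished job, so queue time and polling latency are both included.

```python
import math


def observed_run_time(pending, runtime, poll_interval):
    '''Length of the "run" phase as seen by a polling framework (toy model).

    pending:       seconds the job waits in the scheduler queue
    runtime:       seconds the job actually executes
    poll_interval: fixed interval between status polls, starting at submission
    '''
    job_end = pending + runtime
    # The first poll at or after job_end is the one that notices completion.
    polls = math.ceil(job_end / poll_interval)
    return polls * poll_interval
```

With a 30 s queue wait, a 12 s job and a 5 s polling interval, the reported "run" time is 45 s, almost four times the actual runtime.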

docs/tutorial_basic.rst

Lines changed: 45 additions & 44 deletions
@@ -177,50 +177,51 @@ If everything is configured correctly for your system, you should get an output
177177

178178
.. code-block:: none
179179
180-
[ReFrame Setup]
181-
version: 3.0-dev6 (rev: 89d50861)
182-
command: './bin/reframe -C tutorial/config/settings.py -c tutorial/example1.py -r'
183-
launched by: user@daint101
184-
working directory: '/path/to/reframe'
185-
check search path: (R) '/path/to/reframe/tutorial/example1.py'
186-
stage directory: '/path/to/reframe/stage'
187-
output directory: '/path/to/reframe/output'
188-
189-
[==========] Running 1 check(s)
190-
[==========] Started on Sat May 9 22:10:16 2020
191-
192-
[----------] started processing Example1Test (Simple matrix-vector multiplication example)
193-
[ RUN ] Example1Test on daint:login using PrgEnv-cray
194-
[ RUN ] Example1Test on daint:login using PrgEnv-gnu
195-
[ RUN ] Example1Test on daint:login using PrgEnv-intel
196-
[ RUN ] Example1Test on daint:login using PrgEnv-pgi
197-
[ RUN ] Example1Test on daint:gpu using PrgEnv-cray
198-
[ RUN ] Example1Test on daint:gpu using PrgEnv-gnu
199-
[ RUN ] Example1Test on daint:gpu using PrgEnv-intel
200-
[ RUN ] Example1Test on daint:gpu using PrgEnv-pgi
201-
[ RUN ] Example1Test on daint:mc using PrgEnv-cray
202-
[ RUN ] Example1Test on daint:mc using PrgEnv-gnu
203-
[ RUN ] Example1Test on daint:mc using PrgEnv-intel
204-
[ RUN ] Example1Test on daint:mc using PrgEnv-pgi
205-
[----------] finished processing Example1Test (Simple matrix-vector multiplication example)
206-
207-
[----------] waiting for spawned checks to finish
208-
[ OK ] ( 1/12) Example1Test on daint:mc using PrgEnv-cray
209-
[ OK ] ( 2/12) Example1Test on daint:gpu using PrgEnv-intel
210-
[ OK ] ( 3/12) Example1Test on daint:gpu using PrgEnv-cray
211-
[ OK ] ( 4/12) Example1Test on daint:login using PrgEnv-intel
212-
[ OK ] ( 5/12) Example1Test on daint:login using PrgEnv-cray
213-
[ OK ] ( 6/12) Example1Test on daint:mc using PrgEnv-gnu
214-
[ OK ] ( 7/12) Example1Test on daint:gpu using PrgEnv-gnu
215-
[ OK ] ( 8/12) Example1Test on daint:login using PrgEnv-gnu
216-
[ OK ] ( 9/12) Example1Test on daint:login using PrgEnv-pgi
217-
[ OK ] (10/12) Example1Test on daint:gpu using PrgEnv-pgi
218-
[ OK ] (11/12) Example1Test on daint:mc using PrgEnv-intel
219-
[ OK ] (12/12) Example1Test on daint:mc using PrgEnv-pgi
220-
[----------] all spawned checks have finished
221-
222-
[ PASSED ] Ran 12 test case(s) from 1 check(s) (0 failure(s))
223-
[==========] Finished on Sat May 9 22:10:46 2020
180+
[ReFrame Setup]
181+
version: 3.0-dev7 (rev: 85ca676f)
182+
command: './bin/reframe -C tutorial/config/settings.py -c tutorial/example1.py -r'
183+
launched by: user@daint104
184+
working directory: '/path/to/reframe'
185+
settings file: 'tutorial/config/settings.py'
186+
check search path: (R) '/path/to/reframe/tutorial/example1.py'
187+
stage directory: '/path/to/reframe/stage'
188+
output directory: '/path/to/reframe/output'
189+
190+
[==========] Running 1 check(s)
191+
[==========] Started on Wed Jun 3 08:50:46 2020
192+
193+
[----------] started processing Example1Test (Simple matrix-vector multiplication example)
194+
[ RUN ] Example1Test on daint:login using PrgEnv-cray
195+
[ RUN ] Example1Test on daint:login using PrgEnv-gnu
196+
[ RUN ] Example1Test on daint:login using PrgEnv-intel
197+
[ RUN ] Example1Test on daint:login using PrgEnv-pgi
198+
[ RUN ] Example1Test on daint:gpu using PrgEnv-cray
199+
[ RUN ] Example1Test on daint:gpu using PrgEnv-gnu
200+
[ RUN ] Example1Test on daint:gpu using PrgEnv-intel
201+
[ RUN ] Example1Test on daint:gpu using PrgEnv-pgi
202+
[ RUN ] Example1Test on daint:mc using PrgEnv-cray
203+
[ RUN ] Example1Test on daint:mc using PrgEnv-gnu
204+
[ RUN ] Example1Test on daint:mc using PrgEnv-intel
205+
[ RUN ] Example1Test on daint:mc using PrgEnv-pgi
206+
[----------] finished processing Example1Test (Simple matrix-vector multiplication example)
207+
208+
[----------] waiting for spawned checks to finish
209+
[ OK ] ( 1/12) Example1Test on daint:login using PrgEnv-intel [compile: 1.940s run: 20.747s total: 23.778s]
210+
[ OK ] ( 2/12) Example1Test on daint:login using PrgEnv-cray [compile: 0.347s run: 25.096s total: 26.591s]
211+
[ OK ] ( 3/12) Example1Test on daint:mc using PrgEnv-intel [compile: 2.019s run: 6.286s total: 8.357s]
212+
[ OK ] ( 4/12) Example1Test on daint:mc using PrgEnv-cray [compile: 0.506s run: 11.056s total: 11.744s]
213+
[ OK ] ( 5/12) Example1Test on daint:gpu using PrgEnv-cray [compile: 0.435s run: 19.499s total: 20.483s]
214+
[ OK ] ( 6/12) Example1Test on daint:login using PrgEnv-gnu [compile: 1.648s run: 25.631s total: 27.964s]
215+
[ OK ] ( 7/12) Example1Test on daint:mc using PrgEnv-gnu [compile: 1.825s run: 10.434s total: 12.301s]
216+
[ OK ] ( 8/12) Example1Test on daint:gpu using PrgEnv-intel [compile: 2.018s run: 16.316s total: 18.529s]
217+
[ OK ] ( 9/12) Example1Test on daint:login using PrgEnv-pgi [compile: 1.643s run: 22.118s total: 24.090s]
218+
[ OK ] (10/12) Example1Test on daint:gpu using PrgEnv-gnu [compile: 1.729s run: 20.028s total: 21.901s]
219+
[ OK ] (11/12) Example1Test on daint:gpu using PrgEnv-pgi [compile: 1.753s run: 15.128s total: 16.923s]
220+
[ OK ] (12/12) Example1Test on daint:mc using PrgEnv-pgi [compile: 1.732s run: 35.556s total: 37.330s]
221+
[----------] all spawned checks have finished
222+
223+
[ PASSED ] Ran 12 test case(s) from 1 check(s) (0 failure(s))
224+
[==========] Finished on Wed Jun 3 08:51:46 2020
224225
225226
226227
Notice how our regression test is run on every partition of the configured system and for every programming environment.
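Each ``[ OK ]`` line now ends with a per-phase timing summary. If you want to post-process these numbers, a small parser is enough. The regular expression below is written against the sample output above and is an assumption, not a format that ReFrame guarantees to keep stable.

```python
import re

# Matches the timing suffix seen in the sample output, e.g.
# "[compile: 1.940s run: 20.747s total: 23.778s]"
TIMING_RE = re.compile(
    r'\[compile: (?P<compile>[\d.]+)s run: (?P<run>[\d.]+)s '
    r'total: (?P<total>[\d.]+)s\]'
)


def parse_timings(line):
    '''Return a dict mapping phase name to seconds, or None if absent.'''
    match = TIMING_RE.search(line)
    if match is None:
        return None
    return {phase: float(value) for phase, value in match.groupdict().items()}
```

In the sample output above, ``total`` exceeds ``compile`` + ``run`` on every line, presumably because it also covers the remaining pipeline phases.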

docs/tutorial_deps.rst

Lines changed: 23 additions & 23 deletions
@@ -77,7 +77,7 @@ Here is the output when running the OSU tests with the asynchronous execution po
7777
.. code-block:: none
7878
7979
[==========] Running 7 check(s)
80-
[==========] Started on Wed Mar 25 13:51:06 2020
80+
[==========] Started on Wed Jun 3 09:00:40 2020
8181
8282
[----------] started processing OSUBuildTest (OSU benchmarks build test)
8383
[ RUN ] OSUBuildTest on daint:gpu using PrgEnv-gnu
@@ -140,31 +140,31 @@ Here is the output when running the OSU tests with the asynchronous execution po
140140
[----------] finished processing OSUAllreduceTest_16 (OSU Allreduce test)
141141
142142
[----------] waiting for spawned checks to finish
143-
[ OK ] ( 1/21) OSUBuildTest on daint:gpu using PrgEnv-pgi
144-
[ OK ] ( 2/21) OSUBuildTest on daint:gpu using PrgEnv-gnu
145-
[ OK ] ( 3/21) OSUBuildTest on daint:gpu using PrgEnv-intel
146-
[ OK ] ( 4/21) OSUAllreduceTest_2 on daint:gpu using PrgEnv-pgi
147-
[ OK ] ( 5/21) OSUAllreduceTest_4 on daint:gpu using PrgEnv-pgi
148-
[ OK ] ( 6/21) OSUAllreduceTest_8 on daint:gpu using PrgEnv-pgi
149-
[ OK ] ( 7/21) OSUAllreduceTest_16 on daint:gpu using PrgEnv-pgi
150-
[ OK ] ( 8/21) OSUAllreduceTest_4 on daint:gpu using PrgEnv-gnu
151-
[ OK ] ( 9/21) OSUAllreduceTest_16 on daint:gpu using PrgEnv-gnu
152-
[ OK ] (10/21) OSUAllreduceTest_8 on daint:gpu using PrgEnv-gnu
153-
[ OK ] (11/21) OSUAllreduceTest_16 on daint:gpu using PrgEnv-intel
154-
[ OK ] (12/21) OSULatencyTest on daint:gpu using PrgEnv-pgi
155-
[ OK ] (13/21) OSUAllreduceTest_2 on daint:gpu using PrgEnv-gnu
156-
[ OK ] (14/21) OSULatencyTest on daint:gpu using PrgEnv-gnu
157-
[ OK ] (15/21) OSUBandwidthTest on daint:gpu using PrgEnv-pgi
158-
[ OK ] (16/21) OSUBandwidthTest on daint:gpu using PrgEnv-gnu
159-
[ OK ] (17/21) OSUAllreduceTest_8 on daint:gpu using PrgEnv-intel
160-
[ OK ] (18/21) OSUAllreduceTest_4 on daint:gpu using PrgEnv-intel
161-
[ OK ] (19/21) OSULatencyTest on daint:gpu using PrgEnv-intel
162-
[ OK ] (20/21) OSUAllreduceTest_2 on daint:gpu using PrgEnv-intel
163-
[ OK ] (21/21) OSUBandwidthTest on daint:gpu using PrgEnv-intel
143+
[ OK ] ( 1/21) OSUBuildTest on daint:gpu using PrgEnv-pgi [compile: 29.581s run: 0.086s total: 29.708s]
144+
[ OK ] ( 2/21) OSUBuildTest on daint:gpu using PrgEnv-gnu [compile: 26.250s run: 69.120s total: 95.437s]
145+
[ OK ] ( 3/21) OSUBuildTest on daint:gpu using PrgEnv-intel [compile: 39.385s run: 89.213s total: 129.871s]
146+
[ OK ] ( 4/21) OSULatencyTest on daint:gpu using PrgEnv-pgi [compile: 0.012s run: 145.355s total: 154.504s]
147+
[ OK ] ( 5/21) OSUAllreduceTest_2 on daint:gpu using PrgEnv-pgi [compile: 0.014s run: 148.276s total: 154.433s]
148+
[ OK ] ( 6/21) OSUAllreduceTest_4 on daint:gpu using PrgEnv-pgi [compile: 0.011s run: 149.763s total: 154.407s]
149+
[ OK ] ( 7/21) OSUAllreduceTest_8 on daint:gpu using PrgEnv-pgi [compile: 0.013s run: 151.262s total: 154.378s]
150+
[ OK ] ( 8/21) OSUAllreduceTest_16 on daint:gpu using PrgEnv-pgi [compile: 0.010s run: 152.716s total: 154.360s]
151+
[ OK ] ( 9/21) OSULatencyTest on daint:gpu using PrgEnv-gnu [compile: 0.014s run: 210.952s total: 220.847s]
152+
[ OK ] (10/21) OSUBandwidthTest on daint:gpu using PrgEnv-pgi [compile: 0.015s run: 213.285s total: 220.758s]
153+
[ OK ] (11/21) OSUAllreduceTest_4 on daint:gpu using PrgEnv-gnu [compile: 0.011s run: 215.596s total: 220.717s]
154+
[ OK ] (12/21) OSUAllreduceTest_16 on daint:gpu using PrgEnv-gnu [compile: 0.011s run: 218.742s total: 220.651s]
155+
[ OK ] (13/21) OSUAllreduceTest_2 on daint:gpu using PrgEnv-intel [compile: 0.013s run: 203.214s total: 206.115s]
156+
[ OK ] (14/21) OSUAllreduceTest_8 on daint:gpu using PrgEnv-intel [compile: 0.016s run: 204.819s total: 206.078s]
157+
[ OK ] (15/21) OSUBandwidthTest on daint:gpu using PrgEnv-gnu [compile: 0.012s run: 258.772s total: 266.873s]
158+
[ OK ] (16/21) OSUAllreduceTest_8 on daint:gpu using PrgEnv-gnu [compile: 0.014s run: 263.576s total: 266.752s]
159+
[ OK ] (17/21) OSULatencyTest on daint:gpu using PrgEnv-intel [compile: 0.011s run: 227.234s total: 231.789s]
160+
[ OK ] (18/21) OSUAllreduceTest_4 on daint:gpu using PrgEnv-intel [compile: 0.013s run: 229.729s total: 231.724s]
161+
[ OK ] (19/21) OSUAllreduceTest_2 on daint:gpu using PrgEnv-gnu [compile: 0.013s run: 286.203s total: 292.444s]
162+
[ OK ] (20/21) OSUAllreduceTest_16 on daint:gpu using PrgEnv-intel [compile: 0.028s run: 242.030s total: 242.091s]
163+
[ OK ] (21/21) OSUBandwidthTest on daint:gpu using PrgEnv-intel [compile: 0.013s run: 243.719s total: 247.384s]
164164
[----------] all spawned checks have finished
165165
166166
[ PASSED ] Ran 21 test case(s) from 7 check(s) (0 failure(s))
167-
[==========] Finished on Wed Mar 25 14:37:53 2020
167+
[==========] Finished on Wed Jun 3 09:07:24 2020
168168
169169
Before running the tests, ReFrame topologically sorts them based on their dependencies and schedules them for running using the selected execution policy.
170170
With the serial execution policy, ReFrame simply executes the tests to completion as they "arrive", since the tests are already topologically sorted.
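The ordering step can be illustrated with the standard library's ``graphlib`` module (Python 3.9+). This is a sketch of the idea only, not how ReFrame implements the sort, and the dependency graph is an assumption: every OSU benchmark is taken to depend on ``OSUBuildTest``.

```python
from graphlib import TopologicalSorter

# Assumed dependency graph: each test maps to the set of tests it depends on.
deps = {
    'OSULatencyTest': {'OSUBuildTest'},
    'OSUBandwidthTest': {'OSUBuildTest'},
    'OSUAllreduceTest_2': {'OSUBuildTest'},
    'OSUAllreduceTest_4': {'OSUBuildTest'},
    'OSUAllreduceTest_8': {'OSUBuildTest'},
    'OSUAllreduceTest_16': {'OSUBuildTest'},
}


def serial_order(graph):
    '''Return all tests in an order where dependencies always come first.'''
    return list(TopologicalSorter(graph).static_order())
```

A serial policy can simply run the tests in this order. For concurrent scheduling, ``TopologicalSorter`` also offers ``get_ready()`` and ``done()``, which release a test only once everything it depends on has been marked as done.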

reframe/core/deferrable.py

Lines changed: 0 additions & 17 deletions
@@ -338,20 +338,3 @@ def __abs__(a):
338338
@deferrable
339339
def __invert__(a):
340340
return ~a
341-
342-
343-
def evaluate(expr):
344-
user_deprecation_warning('evaluate() is deprecated: '
345-
'please use reframe.utility.sanity.evaluate')
346-
347-
if isinstance(expr, _DeferredExpression):
348-
return expr.evaluate()
349-
else:
350-
return expr
351-
352-
353-
@deferrable
354-
def make_deferrable(a):
355-
user_deprecation_warning('make_deferrable() is deprecated: '
356-
'please use reframe.utility.sanity.defer')
357-
return a
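The removed ``evaluate()`` helper forced a deferred expression and passed any other value through unchanged. Its deprecation messages point to ``reframe.utility.sanity.evaluate`` and ``reframe.utility.sanity.defer`` as the replacements. The self-contained toy below shows those semantics; the ``Deferred`` class is hypothetical and is not ReFrame's ``_DeferredExpression``.

```python
class Deferred:
    '''Hypothetical stand-in for a deferred expression.'''

    def __init__(self, fn, *args):
        self._fn = fn
        self._args = args

    def evaluate(self):
        # Force any deferred arguments first, then apply the function.
        args = [evaluate(a) for a in self._args]
        return self._fn(*args)


def evaluate(expr):
    '''Force a deferred expression; return any other value unchanged.'''
    if isinstance(expr, Deferred):
        return expr.evaluate()
    return expr


def defer(value):
    '''Wrap a plain value so that it is only produced on evaluation.'''
    return Deferred(lambda: value)
```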

reframe/core/schedulers/slurm.py

Lines changed: 8 additions & 1 deletion
@@ -134,7 +134,14 @@ def completion_time(self, job):
134134
if not state_match:
135135
return None
136136

137-
self._completion_time = max(float(s.group('end')) for s in state_match)
137+
completion_times = []
138+
for s in state_match:
139+
with suppress(ValueError):
140+
completion_times.append(float(s.group('end')))
141+
142+
if completion_times:
143+
self._completion_time = max(completion_times)
144+
138145
return self._completion_time
139146

140147
def _format_option(self, var, option):
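The new loop skips end times that cannot be converted to a number instead of letting ``float()`` raise. A standalone version of the same idea follows; the input format (a list of raw end-time strings, with a placeholder such as ``Unknown`` for job steps that have not finished) is an assumption for illustration.

```python
from contextlib import suppress


def completion_time(end_fields):
    '''Return the latest parseable end time, or None if there is none.'''
    completion_times = []
    for end in end_fields:
        # Ignore values such as 'Unknown' that are not valid numbers.
        with suppress(ValueError):
            completion_times.append(float(end))

    return max(completion_times) if completion_times else None
```

Unlike the diff, which leaves ``self._completion_time`` untouched when nothing can be parsed, this sketch returns ``None`` explicitly.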

reframe/core/schedulers/torque.py

Lines changed: 14 additions & 0 deletions
@@ -35,6 +35,13 @@
3535
class TorqueJobScheduler(PbsJobScheduler):
3636
TASKS_OPT = '-l nodes={num_nodes}:ppn={num_cpus_per_node}'
3737

38+
def _set_nodelist(self, job, nodespec):
39+
if job.nodelist is not None:
40+
return
41+
42+
job.nodelist = [x.split('/')[0] for x in nodespec.split('+')]
43+
job.nodelist.sort()
44+
3845
def _update_state(self, job):
3946
'''Check the status of the job.'''
4047

@@ -53,6 +60,13 @@ def _update_state(self, job):
5360
if completed.returncode != 0:
5461
raise JobError('qstat failed: %s' % completed.stderr, job.jobid)
5562

63+
nodelist_match = re.search(
64+
r'exec_host = (?P<nodespec>\S+)', completed.stdout
65+
)
66+
if nodelist_match:
67+
nodespec = nodelist_match.group('nodespec')
68+
self._set_nodelist(job, nodespec)
69+
5670
state_match = re.search(
5771
r'^\s*job_state = (?P<state>[A-Z])', completed.stdout, re.MULTILINE
5872
)
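The two Torque hunks extract the ``exec_host`` field from the ``qstat`` output and turn it into a sorted node list. A standalone sketch of the same parsing is shown below; ``parse_nodelist`` is a hypothetical name, and the input is assumed to be ``qstat -f``-style ``key = value`` output like the one the regexes above expect.

```python
import re


def parse_nodelist(qstat_output):
    '''Extract a sorted node list from the exec_host field, if present.

    Returns None when there is no exec_host yet (e.g. the job is queued).
    '''
    match = re.search(r'exec_host = (?P<nodespec>\S+)', qstat_output)
    if match is None:
        return None

    # "node02/0+node01/1" -> ["node01", "node02"]
    nodespec = match.group('nodespec')
    return sorted(x.split('/')[0] for x in nodespec.split('+'))
```

Like the code in the diff, this keeps one entry per ``host/slot`` element, so a host that appears with several slots is listed several times.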

reframe/frontend/cli.py

Lines changed: 0 additions & 12 deletions
@@ -258,11 +258,6 @@ def main():
258258
help='Set the maximum number of times a failed regression test '
259259
'may be retried (default: 0)'
260260
)
261-
run_options.add_argument(
262-
'--flex-alloc-tasks', action='store',
263-
dest='flex_alloc_tasks', metavar='{all|idle|NUM}', default=None,
264-
help='*deprecated*, please use --flex-alloc-nodes instead'
265-
)
266261
run_options.add_argument(
267262
'--flex-alloc-nodes', action='store',
268263
dest='flex_alloc_nodes', metavar='{all|idle|NUM}', default=None,
@@ -598,13 +593,6 @@ def print_infoline(param, value):
598593
"Skipping..." % m)
599594
printer.debug(str(e))
600595

601-
if options.flex_alloc_tasks:
602-
printer.warning("`--flex-alloc-tasks' is deprecated and "
603-
"will be removed in the future; "
604-
"you should use --flex-alloc-nodes instead")
605-
options.flex_alloc_nodes = (options.flex_alloc_nodes or
606-
options.flex_alloc_tasks)
607-
608596
options.flex_alloc_nodes = options.flex_alloc_nodes or 'idle'
609597
if options.account:
610598
printer.warning(f"`--account' is deprecated and "

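With ``--flex-alloc-tasks`` removed, the only remaining input is ``--flex-alloc-nodes``, which falls back to ``idle`` when it is not set. The ``argparse`` sketch below mirrors the option definition and the fallback shown in the diff; ``build_parser``, ``flex_alloc_policy`` and the help string are hypothetical, and the real parser defines many more options.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog='reframe')
    # Only the node-based option remains; --flex-alloc-tasks is gone.
    parser.add_argument(
        '--flex-alloc-nodes', action='store',
        dest='flex_alloc_nodes', metavar='{all|idle|NUM}', default=None,
        help='flexible node allocation policy (default: idle)'
    )
    return parser


def flex_alloc_policy(argv):
    options = build_parser().parse_args(argv)
    # Same fallback as in the diff: an unset option means 'idle'.
    return options.flex_alloc_nodes or 'idle'
```

Passing the old ``--flex-alloc-tasks`` option to this parser is rejected as an unrecognized argument, and ``argparse`` exits with an error.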