Commit 770f463

Vasileios Karakasis authored
Merge branch 'master' into feat/repeat-tests
2 parents 52e95fd + c4a270b commit 770f463

13 files changed

+448
-159
lines changed

docs/config_reference.rst

Lines changed: 10 additions & 0 deletions
@@ -472,6 +472,16 @@ ReFrame can launch containerized applications, but you need to configure properl
472472
- ``Singularity``: The `Singularity <https://sylabs.io/>`__ container runtime.
473473

474474

475+
.. js:attribute:: .systems[].partitions[].container_platforms[].default
476+
477+
:required: No
478+
479+
If set to ``true``, this is the default container platform of this partition.
480+
If not specified, the default container platform is assumed to be the first in the list of :js:attr:`container_platforms`.
481+
482+
.. versionadded:: 3.12.0
483+
484+
475485
.. js:attribute:: .systems[].partitions[].container_platforms[].modules
476486

477487
:required: No
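
The selection rule documented for the new ``default`` attribute can be sketched in a few lines of Python. This is a minimal standalone illustration under stated assumptions, not ReFrame's actual implementation: the helper name ``pick_default_platform`` and the sample partition entries are hypothetical.

```python
def pick_default_platform(container_platforms):
    '''Return the type of the default container platform of a partition.

    Hypothetical helper: the first entry marked with ``default: True`` wins;
    otherwise the first entry in the list is used.
    '''
    if not container_platforms:
        return None

    for platform in container_platforms:
        if platform.get('default', False):
            return platform['type']

    # No explicit default: fall back to the first platform in the list
    return container_platforms[0]['type']


# Illustrative `container_platforms` entries (assumed values)
platforms = [
    {'type': 'Sarus', 'modules': ['sarus']},
    {'type': 'Singularity', 'modules': ['singularity'], 'default': True},
]
print(pick_default_platform(platforms))
```

Dropping the ``default`` key from the second entry would make the function fall back to the first entry, ``Sarus``.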

docs/tutorial_advanced.rst

Lines changed: 120 additions & 3 deletions
@@ -683,6 +683,9 @@ First, we need to enable the container platform support in ReFrame's configurati
683683
For each partition, users can define a list of container platforms supported using the :js:attr:`container_platforms` `configuration parameter <config_reference.html#.systems[].partitions[].container_platforms>`__.
684684
In this case, we define the `Sarus <https://github.com/eth-cscs/sarus>`__ platform for which we set the :js:attr:`modules` parameter in order to instruct ReFrame to load the ``sarus`` module, whenever it needs to run with this container platform.
685685
Similarly, we add an entry for the `Singularity <https://sylabs.io>`__ platform.
686+
Optionally, users are allowed to set the ``default`` attribute to :obj:`True` in order to mark a specific container platform as the default of that partition (see below on how this information is being used).
687+
If no default container platform is specified explicitly, the first one in the list is always used as the default.
688+
686689

687690
The following parameterized test will create two tests, one for each of the supported container platforms:
688691

@@ -698,14 +701,17 @@ The following parameterized test, will create two tests, one for each of the sup
698701

699702
A container-based test can be written as a :class:`~reframe.core.pipeline.RunOnlyRegressionTest` that sets the :attr:`~reframe.core.pipeline.RegressionTest.container_platform` attribute.
700703
This attribute accepts a string that corresponds to the name of the container platform that will be used to run the container for this test.
701-
If such a platform is not `configured <config_reference.html#container-platform-configuration>`__ for the current system, the test will fail.
704+
Setting this attribute is optional; if it is not set, the default container platform of the current partition is used.
705+
You can still differentiate your test based on the actual container platform that is being used by checking the ``self.container_platform.name`` variable.
706+
707+
As soon as the container platform to be used is determined, you need to specify the container image to use by setting the :attr:`~reframe.core.containers.ContainerPlatform.image` attribute.
708+
If the image is not specified, then the container logic is skipped and the test executes as if the :attr:`~reframe.core.pipeline.RegressionTest.container_platform` was never set.
702709

703-
As soon as the container platform to be used is defined, you need to specify the container image to use by setting the :attr:`~reframe.core.containers.ContainerPlatform.image`.
704710
In the ``Singularity`` test variant, we add the ``docker://`` prefix to the image name, in order to instruct ``Singularity`` to pull the image from `DockerHub <https://hub.docker.com/>`__.
705711
The default command that the container runs can be overwritten by setting the :attr:`~reframe.core.containers.ContainerPlatform.command` attribute of the container platform.
706712

707713
The :attr:`~reframe.core.containers.ContainerPlatform.image` is the only mandatory attribute for container-based checks.
708-
It is important to note that the :attr:`~reframe.core.pipeline.RegressionTest.executable` and :attr:`~reframe.core.pipeline.RegressionTest.executable_opts` attributes of the actual test are ignored in case of container-based tests.
714+
It is important to note that the :attr:`~reframe.core.pipeline.RegressionTest.executable` and :attr:`~reframe.core.pipeline.RegressionTest.executable_opts` attributes of the actual test are ignored if the containerized code path is taken, i.e., when :attr:`~reframe.core.containers.ContainerPlatform.image` is not :obj:`None`.
709715

710716
ReFrame will run the container according to the given platform as follows:
711717

@@ -759,6 +765,117 @@ For a complete list of the available attributes of a specific container platform
759765
On how to configure ReFrame for running containerized tests, please have a look at the :ref:`container-platform-configuration` section of the :doc:`config_reference`.
760766

761767

768+
.. versionchanged:: 3.12.0
769+
It is no longer necessary to explicitly set the :attr:`container_platform` in the test.
770+
This is automatically initialized from the default platform of the current partition.
771+
772+
773+
774+
Combining containerized and native application tests
775+
====================================================
776+
777+
.. versionadded:: 3.12.0
778+
779+
It is very easy in ReFrame to have a single run-only test that tests either the native or the containerized version of an application.
780+
This is possible, since the framework will take the "containerized" code path only if the :attr:`~reframe.core.containers.ContainerPlatform.image` attribute of the :attr:`~reframe.core.pipeline.RegressionTest.container_platform` is not :obj:`None`.
781+
Otherwise, the *bare metal* version of the tested application will be run.
782+
The following test uses exactly this trick to test a series of GROMACS images as well as the native one provided on the Piz Daint supercomputer.
783+
It also extends the GROMACS benchmark tests that are provided with ReFrame's test library (see :doc:`hpctestlib`).
784+
For simplicity, we are assuming a single system here (the hybrid partition of Piz Daint) and we set fixed values for the :attr:`num_cpus_per_task` as well as the ``-ntomp`` option of GROMACS (NB: in a real-world test we would use the auto-detected processor topology information to set these values; see :ref:`proc-autodetection` for more information).
785+
We also redefine and restrict the benchmark's parameters ``benchmark_info`` and ``nb_impl`` to the values that are of interest for the demonstration of this test.
786+
Finally, we also reset the executable to use ``gmx`` instead of the ``gmx_mpi`` that is used from the library test.
787+
788+
789+
.. literalinclude:: ../tutorials/advanced/containers/gromacs_test.py
790+
:start-after: # rfmdocstart: gromacstest
791+
792+
All this test does in addition to the library test it inherits from is to set the :attr:`~reframe.core.containers.ContainerPlatform.image` and the :attr:`~reframe.core.containers.ContainerPlatform.command` attributes of the :attr:`~reframe.core.pipeline.RegressionTest.container_platform`.
793+
The former is set from the ``gromacs_image`` test parameter whereas the latter from the test's :attr:`~reframe.core.pipeline.RegressionTest.executable` and :attr:`~reframe.core.pipeline.RegressionTest.executable_opts` attributes.
794+
Remember that these attributes are ignored if the framework takes the path of launching a container.
795+
Finally, if the image is :obj:`None` we handle the case of the native run, in which case we load the modules required to run GROMACS natively on the target system.
796+
797+
In the following, we run the GPU version of a single benchmark with a series of images from NVIDIA and natively:
798+
799+
.. code-block:: console
800+
801+
$ ./bin/reframe -C tutorials/config/settings.py -c tutorials/advanced/containers/gromacs_test.py -r
802+
803+
.. code-block:: console
804+
805+
[==========] Running 6 check(s)
806+
[==========] Started on Fri Jun 17 16:20:16 2022
807+
808+
[----------] start processing checks
809+
[ RUN ] gromacs_containerized_test %benchmark_info=HECBioSim/hEGFRDimerSmallerPL %nb_impl=gpu %gromacs_image=nvcr.io/hpc/gromacs:2022.1 @daint:gpu+gnu
810+
[ RUN ] gromacs_containerized_test %benchmark_info=HECBioSim/hEGFRDimerSmallerPL %nb_impl=gpu %gromacs_image=nvcr.io/hpc/gromacs:2021.3 @daint:gpu+gnu
811+
[ RUN ] gromacs_containerized_test %benchmark_info=HECBioSim/hEGFRDimerSmallerPL %nb_impl=gpu %gromacs_image=nvcr.io/hpc/gromacs:2021 @daint:gpu+gnu
812+
[ RUN ] gromacs_containerized_test %benchmark_info=HECBioSim/hEGFRDimerSmallerPL %nb_impl=gpu %gromacs_image=nvcr.io/hpc/gromacs:2020.2 @daint:gpu+gnu
813+
[ RUN ] gromacs_containerized_test %benchmark_info=HECBioSim/hEGFRDimerSmallerPL %nb_impl=gpu %gromacs_image=nvcr.io/hpc/gromacs:2020 @daint:gpu+gnu
814+
[ RUN ] gromacs_containerized_test %benchmark_info=HECBioSim/hEGFRDimerSmallerPL %nb_impl=gpu %gromacs_image=None @daint:gpu+gnu
815+
[ OK ] (1/6) gromacs_containerized_test %benchmark_info=HECBioSim/hEGFRDimerSmallerPL %nb_impl=gpu %gromacs_image=nvcr.io/hpc/gromacs:2020.2 @daint:gpu+gnu
816+
[ OK ] (2/6) gromacs_containerized_test %benchmark_info=HECBioSim/hEGFRDimerSmallerPL %nb_impl=gpu %gromacs_image=nvcr.io/hpc/gromacs:2020 @daint:gpu+gnu
817+
[ OK ] (3/6) gromacs_containerized_test %benchmark_info=HECBioSim/hEGFRDimerSmallerPL %nb_impl=gpu %gromacs_image=None @daint:gpu+gnu
818+
[ OK ] (4/6) gromacs_containerized_test %benchmark_info=HECBioSim/hEGFRDimerSmallerPL %nb_impl=gpu %gromacs_image=nvcr.io/hpc/gromacs:2022.1 @daint:gpu+gnu
819+
[ OK ] (5/6) gromacs_containerized_test %benchmark_info=HECBioSim/hEGFRDimerSmallerPL %nb_impl=gpu %gromacs_image=nvcr.io/hpc/gromacs:2021 @daint:gpu+gnu
820+
[ OK ] (6/6) gromacs_containerized_test %benchmark_info=HECBioSim/hEGFRDimerSmallerPL %nb_impl=gpu %gromacs_image=nvcr.io/hpc/gromacs:2021.3 @daint:gpu+gnu
821+
[----------] all spawned checks have finished
822+
823+
[ PASSED ] Ran 6/6 test case(s) from 6 check(s) (0 failure(s), 0 skipped)
824+
[==========] Finished on Fri Jun 17 16:23:47 2022
825+
826+
827+
We can also inspect the generated job scripts for the native and a containerized run:
828+
829+
.. code-block:: console
830+
831+
$ cat output/daint/gpu/gnu/gromacs_containerized_test_0/rfm_gromacs_containerized_test_0_job.sh
832+
833+
.. code-block:: bash
834+
835+
#!/bin/bash
836+
#SBATCH --job-name="rfm_gromacs_containerized_test_0_job"
837+
#SBATCH --ntasks=1
838+
#SBATCH --ntasks-per-node=1
839+
#SBATCH --cpus-per-task=12
840+
#SBATCH --output=rfm_gromacs_containerized_test_0_job.out
841+
#SBATCH --error=rfm_gromacs_containerized_test_0_job.err
842+
#SBATCH -A csstaff
843+
#SBATCH --constraint=gpu
844+
#SBATCH --hint=nomultithread
845+
module unload PrgEnv-cray
846+
module load PrgEnv-gnu
847+
module load daint-gpu
848+
module load GROMACS
849+
curl -LJO https://github.com/victorusu/GROMACS_Benchmark_Suite/raw/1.0.0/HECBioSim/hEGFRDimerSmallerPL/benchmark.tpr
850+
srun gmx mdrun -dlb yes -ntomp 12 -npme -1 -v -nb gpu -s benchmark.tpr
851+
852+
And the containerized run:
853+
854+
.. code-block:: console
855+
856+
$ cat output/daint/gpu/gnu/gromacs_containerized_test_1/rfm_gromacs_containerized_test_1_job.sh
857+
858+
.. code-block:: bash
859+
860+
#!/bin/bash
861+
#SBATCH --job-name="rfm_gromacs_containerized_test_1_job"
862+
#SBATCH --ntasks=1
863+
#SBATCH --ntasks-per-node=1
864+
#SBATCH --cpus-per-task=12
865+
#SBATCH --output=rfm_gromacs_containerized_test_1_job.out
866+
#SBATCH --error=rfm_gromacs_containerized_test_1_job.err
867+
#SBATCH -A csstaff
868+
#SBATCH --constraint=gpu
869+
#SBATCH --hint=nomultithread
870+
module unload PrgEnv-cray
871+
module load PrgEnv-gnu
872+
module load sarus
873+
curl -LJO https://github.com/victorusu/GROMACS_Benchmark_Suite/raw/1.0.0/HECBioSim/hEGFRDimerSmallerPL/benchmark.tpr
874+
sarus pull nvcr.io/hpc/gromacs:2020
875+
srun sarus run --mount=type=bind,source="/users/user/Devel/reframe/stage/daint/gpu/gnu/gromacs_containerized_test_43",destination="/rfm_workdir" -w /rfm_workdir nvcr.io/hpc/gromacs:2020 gmx mdrun -dlb yes -ntomp 12 -npme -1 -v -nb gpu -s benchmark.tpr
876+
877+
878+
762879
Writing reusable tests
763880
----------------------
764881
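
The difference between the two generated job scripts comes down to a single decision: whether an image is set. The following standalone sketch mirrors that decision in a deliberately simplified form. The function ``run_line`` and its parameters are hypothetical, and the parallel launcher (``srun``), the mount options and the working directory option seen in the real job scripts are left out.

```python
def run_line(executable, executable_opts, image=None, runtime='sarus'):
    '''Build a simplified run line for a native or a containerized run.'''
    command = ' '.join([executable, *executable_opts])
    if image is None:
        # Native ("bare metal") path: run the executable directly
        return command

    # Containerized path: the container runtime runs the same command
    return f'{runtime} run {image} {command}'


opts = ['mdrun', '-nb', 'gpu', '-s', 'benchmark.tpr']
for image in ['nvcr.io/hpc/gromacs:2020', None]:
    print(run_line('gmx', opts, image=image))
```

The same executable and options are reused in both cases, which is why the tutorial test can build the container command from the test's ``executable`` and ``executable_opts``.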

reframe/core/containers.py

Lines changed: 57 additions & 27 deletions
@@ -6,8 +6,9 @@
66
import abc
77

88
import reframe.core.fields as fields
9+
import reframe.core.warnings as warn
10+
import reframe.utility as util
911
import reframe.utility.typecheck as typ
10-
from reframe.core.exceptions import ContainerError
1112

1213

1314
_STAGEDIR_MOUNT = '/rfm_workdir'
@@ -80,24 +81,18 @@ class ContainerPlatform(abc.ABC):
8081
#: :default: ``[]``
8182
options = fields.TypedField(typ.List[str])
8283

83-
_workdir = fields.TypedField(str, type(None))
8484
#: The working directory of ReFrame inside the container.
8585
#:
8686
#: This is the directory where the test's stage directory is mounted inside
8787
#: the container. This directory is always mounted regardless if
8888
#: :attr:`mount_points` is set or not.
8989
#:
90-
#: .. deprecated:: 3.5
91-
#: Please use the `options` field to set the working directory.
92-
#:
9390
#: :type: :class:`str`
9491
#: :default: ``/rfm_workdir``
95-
workdir = fields.DeprecatedField(
96-
_workdir,
97-
'The `workdir` field is deprecated, please use the `options` field to '
98-
'set the container working directory',
99-
fields.DeprecatedField.OP_SET, from_version='3.5.0'
100-
)
92+
#:
93+
#: .. versionchanged:: 3.12.0
94+
#: This attribute is no longer deprecated.
95+
workdir = fields.TypedField(str, type(None))
10196

10297
def __init__(self):
10398
self.image = None
@@ -106,8 +101,8 @@ def __init__(self):
106101
# NOTE: Here we set the target fields directly to avoid the deprecation
107102
# warnings
108103
self._commands = []
109-
self._workdir = _STAGEDIR_MOUNT
110104

105+
self.workdir = _STAGEDIR_MOUNT
111106
self.mount_points = []
112107
self.options = []
113108
self.pull_image = True
@@ -143,12 +138,37 @@ def launch_command(self, stagedir):
143138
:arg stagedir: The stage directory of the test.
144139
'''
145140

146-
def validate(self):
147-
if self.image is None:
148-
raise ContainerError('no image specified')
141+
@classmethod
142+
def create(cls, name):
143+
'''Factory method to create a new container by name.'''
144+
name = name.capitalize()
145+
try:
146+
return globals()[name]()
147+
except KeyError:
148+
raise ValueError(f'unknown container platform: {name}') from None
149+
150+
@classmethod
151+
def create_from(cls, name, other):
152+
new = cls.create(name)
153+
new.image = other.image
154+
new.command = other.command
155+
new.mount_points = other.mount_points
156+
new.options = other.options
157+
new.pull_image = other.pull_image
158+
new.workdir = other.workdir
159+
160+
# Update deprecated fields
161+
with warn.suppress_deprecations():
162+
new.commands = other.commands
163+
164+
return new
165+
166+
@property
167+
def name(self):
168+
return type(self).__name__
149169

150170
def __str__(self):
151-
return type(self).__name__
171+
return self.name
152172

153173
def __rfm_json_encode__(self):
154174
return str(self)
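
The behavior of the new `create()` factory can be tried out in isolation with the sketch below. It is a simplified stand-in, not the code from this commit: the classes are empty placeholders, and the lookup uses `cls.__subclasses__()` instead of the `globals()` lookup shown in the diff, so that the example does not depend on the namespace it is run in.

```python
class ContainerPlatform:
    '''Stripped-down stand-in for the real base class.'''

    @classmethod
    def create(cls, name):
        '''Create a new container platform by name.'''
        # `str.capitalize()` upper-cases the first character and lower-cases
        # the rest, so 'docker' and 'DOCKER' both become 'Docker'
        name = name.capitalize()
        registry = {sub.__name__: sub for sub in cls.__subclasses__()}
        try:
            return registry[name]()
        except KeyError:
            raise ValueError(f'unknown container platform: {name}') from None

    @property
    def name(self):
        return type(self).__name__


class Docker(ContainerPlatform):
    pass


class Sarus(ContainerPlatform):
    pass


print(ContainerPlatform.create('docker').name)
```

Note that the error message reports the normalized name, so a lookup of `'podman'` in this sketch fails with `unknown container platform: Podman`.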
@@ -165,15 +185,17 @@ def launch_command(self, stagedir):
165185
super().launch_command(stagedir)
166186
mount_points = self.mount_points + [(stagedir, _STAGEDIR_MOUNT)]
167187
run_opts = [f'-v "{mp[0]}":"{mp[1]}"' for mp in mount_points]
168-
run_opts += self.options
188+
if self.workdir:
189+
run_opts.append(f'-w {self.workdir}')
169190

191+
run_opts += self.options
170192
if self.command:
171193
return (f'docker run --rm {" ".join(run_opts)} '
172194
f'{self.image} {self.command}')
173195

174196
if self.commands:
175197
return (f"docker run --rm {' '.join(run_opts)} {self.image} "
176-
f"bash -c 'cd {self.workdir}; {'; '.join(self.commands)}'")
198+
f"bash -c '{'; '.join(self.commands)}'")
177199

178200
return f'docker run --rm {" ".join(run_opts)} {self.image}'
179201

@@ -197,7 +219,8 @@ def emit_prepare_commands(self, stagedir):
197219
# The format that Sarus uses to call the images is
198220
# <reposerver>/<user>/<image>:<tag>. If an image was loaded
199221
# locally from a tar file, the <reposerver> is 'load'.
200-
if not self.pull_image or self.image.startswith('load/'):
222+
if (not self.pull_image or not self.image or
223+
self.image.startswith('load/')):
201224
return []
202225
else:
203226
return [f'{self._command} pull {self.image}']
@@ -210,15 +233,17 @@ def launch_command(self, stagedir):
210233
if self.with_mpi:
211234
run_opts.append('--mpi')
212235

213-
run_opts += self.options
236+
if self.workdir:
237+
run_opts.append(f'-w {self.workdir}')
214238

239+
run_opts += self.options
215240
if self.command:
216241
return (f'{self._command} run {" ".join(run_opts)} {self.image} '
217242
f'{self.command}')
218243

219244
if self.commands:
220245
return (f"{self._command} run {' '.join(run_opts)} {self.image} "
221-
f"bash -c 'cd {self.workdir}; {'; '.join(self.commands)}'")
246+
f"bash -c '{'; '.join(self.commands)}'")
222247

223248
return f'{self._command} run {" ".join(run_opts)} {self.image}'
224249

@@ -232,6 +257,12 @@ def __init__(self):
232257
super().__init__()
233258
self._command = 'shifter'
234259

260+
def launch_command(self, stagedir):
261+
# Temporarily change `workdir`, since Sarus and Shifter have otherwise
262+
# the same interface
263+
with util.temp_setattr(self, 'workdir', None):
264+
return super().launch_command(stagedir)
265+
235266

236267
class Singularity(ContainerPlatform):
237268
'''Container platform backend for running containers with `Singularity
@@ -257,14 +288,17 @@ def launch_command(self, stagedir):
257288
if self.with_cuda:
258289
run_opts.append('--nv')
259290

291+
if self.workdir:
292+
run_opts.append(f'-W {self.workdir}')
293+
260294
run_opts += self.options
261295
if self.command:
262296
return (f'singularity exec {" ".join(run_opts)} '
263297
f'{self.image} {self.command}')
264298

265299
if self.commands:
266300
return (f"singularity exec {' '.join(run_opts)} {self.image} "
267-
f"bash -c 'cd {self.workdir}; {'; '.join(self.commands)}'")
301+
f"bash -c '{'; '.join(self.commands)}'")
268302

269303
return f'singularity run {" ".join(run_opts)} {self.image}'
270304

@@ -275,10 +309,6 @@ def __init__(self, *other_types):
275309

276310
def __set__(self, obj, value):
277311
if isinstance(value, str):
278-
try:
279-
value = globals()[value]()
280-
except KeyError:
281-
raise ValueError(
282-
f'unknown container platform: {value}') from None
312+
value = ContainerPlatform.create(value)
283313

284314
super().__set__(obj, value)
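
The `Shifter.launch_command()` override in this file relies on `util.temp_setattr` to unset `workdir` only for the duration of the call. The implementation of that utility is not part of this diff; the sketch below is a hypothetical re-implementation of such a context manager, with a placeholder `Platform` class, meant only to illustrate the pattern.

```python
import contextlib


@contextlib.contextmanager
def temp_setattr(obj, name, value):
    '''Temporarily set attribute `name` of `obj` to `value`.'''
    old = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield
    finally:
        # Restore the previous value even if the body raises
        setattr(obj, name, old)


class Platform:
    '''Placeholder object with a `workdir` attribute.'''
    workdir = '/rfm_workdir'


p = Platform()
with temp_setattr(p, 'workdir', None):
    inside = p.workdir

print(inside, p.workdir)
```

Because the previous value is restored in a `finally` clause, the attribute is reset even when the wrapped call fails.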
