Unit tests and integration tests are run for each PR made to IPPL on ALPS at CSCS. Currently, the (daint) gh200 nodes and (beverin) mi300a nodes are used for testing; other architectures can be added on request, subject to availability. Each tested PR receives either a green tick (pass) or a red cross (fail).
Clicking on the tick/cross takes you to the pipeline information page, where you can drill down to see:
- which stages/steps exist (and their dependencies); currently these are
- Debug
- Build all tests in debug mode to ensure all compilation succeeds
- 1 rank run of unit tests (excluding known fails)
- Release
- Build all tests in release mode to ensure all compilation succeeds
- 1 rank run of unit tests (excluding known fails)
- 4 ranks run of unit tests (excluding known fails)
- 4 ranks run of unit tests (only known fails) - This step is marked as "Allowed to fail" and can be used to check if fixing a failing test has worked and if it can be removed from the known fail list.
- which commands were run and what output was generated at each step
- which tests passed or failed, and what error messages appeared
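Conceptually, the stage layout above maps onto a GitLab CI definition along the following lines. This is a simplified, hypothetical sketch: the stage and job names, build commands, and paths are illustrative only, and the real definitions live in the yaml files under ci/cscs described later.

```yaml
# Hypothetical sketch only; job/stage names, commands, and paths are
# illustrative and are not copied from the real ci/cscs yaml files.
stages:
  - build_debug
  - test_debug
  - build_release
  - test_release

build-debug:
  stage: build_debug
  script:
    - cmake -S . -B build-debug -DCMAKE_BUILD_TYPE=Debug
    - cmake --build build-debug

unit-tests-debug-1rank:
  stage: test_debug
  needs: ["build-debug"]
  script:
    - ctest --test-dir build-debug --output-on-failure
```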
All PRs submitted directly via the IPPL GitHub repo (i.e. not via forks) are tested automatically. This is possible because only authorized GitHub accounts are allowed to push branches and create PRs from the IPPL repo directly.
PRs submitted via forks can be tested by adding a comment of one of the following forms:
Example: run multiple testing pipelines
cscs-ci run cscs-ci-gh200,cscs-ci-mi300
Example: run only the gh200 testing pipeline
cscs-ci run cscs-ci-gh200
Example: run only the mi300 testing pipeline
cscs-ci run cscs-ci-mi300
The list of people authorized to trigger pipeline runs is essentially the same as those with repo access to IPPL, but it can be maintained via the maintenance page mentioned in the Information section. Please contact a CSCS/IPPL collaborator to have names added to or removed from the authorized members list, or to change which branches are considered for testing (currently master).
The CI/CD pipelines are automatically (re-)run on every push to or modification of a qualifying PR. Appending new commits triggers a new run and a new tick/cross status.
Testing is performed using ctest, and cmake/ctest makes no distinction between tests added via add_test in a CMakeLists file. In IPPL, a unit test is intended to be a small, modular check of a specific feature or function that confirms whether something is working or not. An integration test represents a build or run of something that uses multiple modules or spans several algorithms or code blocks.
The majority of unit tests in IPPL are defined simply, as in these examples
add_ippl_test(EdgeElement)
add_ippl_test(QuadrilateralElement)
add_ippl_test(HexahedralElement)
with no parameters or other qualifiers; if they complete without error, the test passes.
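For context, a helper such as add_ippl_test is typically a thin CMake wrapper around add_executable and add_test. The following is a minimal illustrative sketch, assuming googletest-style test sources; it is not the actual IPPL implementation, and the linked target names are assumptions.

```cmake
# Illustrative sketch only: not the actual IPPL implementation.
# Builds <name> from <name>.cpp and registers it with ctest; the test
# then passes simply by exiting with status 0.
function(add_ippl_test name)
  add_executable(${name} ${name}.cpp)
  # Linking against an "ippl" target and googletest is an assumption here.
  target_link_libraries(${name} PRIVATE ippl GTest::gtest_main)
  add_test(NAME ${name} COMMAND ${name})
endfunction()
```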
An example of an integration test can be found in the alpine module. This runs a small test case of a more complex example and checks that it completes without error.
add_ippl_integration_test(LandauDamping
ARGS 16 16 16 10000000 10 FFT 0.01 LeapFrog --overallocate 2.0 --info 10
LABELS alpine integration)
Currently, no checks are performed on the validity or correctness of the results generated by tests unless something is explicitly written to perform those checks. For example, a validity check on LandauDamping would load the results created by the test and run a yes/no pass/fail check on them to see whether the values are correct (or fit some known analytic function to a certain tolerance). Such a check can be added as part of the integration test, or as a separate test.
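As an illustration of what such a validity check might look like, here is a hedged Python sketch that extracts an exponential damping rate from a field-energy time series and compares it to an expected analytic value within a tolerance. The function name, the least-squares fit, and the data format are all assumptions for illustration; no such check currently exists in the repo.

```python
# Hypothetical validity check for a LandauDamping-style run (illustrative
# only): fit log(energy) ~ 2*rate*t by least squares and compare the
# extracted damping rate against an expected analytic value.
import math

def check_damping_rate(times, energies, expected_rate, tol):
    """Return True if the fitted damping rate matches expected_rate within tol."""
    logs = [math.log(e) for e in energies]
    n = len(times)
    mean_t = sum(times) / n
    mean_l = sum(logs) / n
    # Ordinary least-squares slope of log(energy) versus time.
    slope = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs)) \
            / sum((t - mean_t) ** 2 for t in times)
    rate = slope / 2.0  # energy ~ exp(2*rate*t), so the slope is 2*rate
    return abs(rate - expected_rate) <= tol
```

A check like this could be registered as a separate ctest entry that depends on the integration test having produced its output files.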
The files needed by the CSCS CI are all placed under a single directory structure: the pipelines live under the ci/cscs root, and each architecture/configuration has a subdirectory. Editing these yaml files changes how the tests are executed, and they can be examined for information on how the pipelines work.
ippl-root
└── ci
└── cscs
├── common.yml
├── cscs-gh200.yml
├── cscs-mi300.yml
├── cuda
│ ├── build_sm90.yml
│ └── run_sm90.yml
└── rocm
├── build_rocm-6.3.yml
└── run_rocm-6.3.yml
Implementation note
Note that, due to the way CSCS mirrors the GitHub repo to a GitLab instance and then triggers GitLab runners for pipelines by invoking slurm, the build stages and test/run stages are set up slightly differently.
When building, we use a uenv-based runner to provide the base libraries needed for IPPL, but when testing, we use a baremetal runner and manually invoke srun commands with the uenv/view details as parameters. The reason is that uenv runners invoke the whole script via slurm (safe when building, since there is only one build command), but when running tests we want to execute individual tests with srun -n <ranks> rather than have the runner invoke srun -n <ranks> ctest. For this reason there is a customized ctest invocation in the run_xxx.yml files.
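To illustrate the distinction (the rank count, uenv image/view names, and test path below are placeholders assuming the CSCS uenv slurm plugin flags, not values taken from the actual run yaml files):

```shell
# A uenv runner wraps the entire job script in slurm, which would amount to:
#   srun -n 4 ctest --test-dir build --output-on-failure
#
# The baremetal runner instead keeps ctest outside slurm and launches each
# individual test with its own srun command, passing the uenv/view details:
srun -n 4 --uenv=<image> --view=<view> ./build/unit_tests/SomeUnitTest
```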
CSCS webpage documenting the CI/CD implementation: https://docs.cscs.ch/services/cicd. Contains general information on all aspects of CI/CD using GitLab runners and GitHub webhooks.
CSCS knowledge base for CI/CD: https://confluence.cscs.ch/spaces/KBTDS/pages/868812112/Continuous+Integration+Continuous+Deployment. This page has information on how to set up CI using containers or UENV builds, and how to set variables that are passed to the actual build/test jobs. For example, since the gh200 and mi300 clusters are accessed via different authentication domains, we use some variables to pass firecrest authentication keys into the runners whilst keeping them secret from the main scripts in the repo. This arrangement may change when the mi300 nodes are more closely integrated with the other CSCS clusters.
GitLab yaml syntax: https://docs.gitlab.com/ci/yaml/. Essential information on how the yaml files are structured and what the sections mean/do.
CSCS CI admin console: https://cicd-ext-mw.cscs.ch/ci/setup/ui?repo=2663791694469788. The ID 2663791694469788 is the internal project number granted to the IPPL project for submitting tasks to ALPS for testing. On this page, permissions can be set and pipeline properties can be changed.
Note that the pipeline names can be set via the admin console. The names cscs-ci-gh200 and cscs-ci-mi300 were chosen to represent the architectures they run on but are otherwise completely arbitrary; if they are changed, the adjusted names must be used when triggering pipeline checks manually as described above.
The pipeline entry points are set on the admin page to point to ci/cscs/cscs-gh200.yml and ci/cscs/cscs-mi300.yml. If these yaml files are moved or renamed, the pipeline entry points must be edited accordingly.
For information on how to set up authorization keys for firecrest launching of CSCS jobs, please consult the pages above.