Continuous Integration in Devito
Fabio Luporini edited this page Apr 12, 2021 · 40 revisions
We use GitHub Actions for Continuous Integration.
Some of the workflows, in particular CI-core, which executes all of the core Devito tests, run on VMs that GitHub Actions provides for free to open source repositories.
Some other workflows run in the devito-cluster, which comprises nodes owned by Devito Codes as well as nodes gifted by various companies.
| node | CI-gpu | CI-mpi | asv | examples-MPI | docker-publish GPU |
|---|---|---|---|---|---|
| kimogila (NVidia RTX 3070) | x[OMP] | x | x | ||
| sarlacc (NVidia RTX 3070) | x[ACC] | ||||
| nexu | x | ||||
| bantha (NVidia RTX 3090) | |||||
| rancor (NVidia RTX 3090) | |||||
| macdevito | |||||
| acca-beast | x[ACC+Docker] | x |
MacBook macOS Catalina

- Architecture: x86_64
- CPU(s): 8
- Thread(s) per core: 2
- Core(s) per socket: 4
- Socket(s): 1
- NUMA node(s): 1
- Model name: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
- Total online memory: 31G
- GPU: NVidia RTX 3070

- Architecture: x86_64
- CPU(s): 12
- Thread(s) per core: 2
- Core(s) per socket: 6
- Socket(s): 2
- Model name: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
- Total online memory: 48G
- GPU: NVidia RTX 3070

- Architecture: x86_64
- CPU(s): 12
- Thread(s) per core: 2
- Core(s) per socket: 6
- Socket(s): 1
- NUMA node(s): 1
- Model name: Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
- Total online memory: 64G
- GPU: -

- Architecture: x86_64
- CPU(s): 12
- Thread(s) per core: 2
- Core(s) per socket: 6
- Socket(s): 1
- Model name: AMD Ryzen 5 2600
- Total online memory: 32G
- GPU: NVidia RTX 3090

- Architecture: x86_64
- CPU(s): 16
- Thread(s) per core: 2
- Core(s) per socket: 8
- Socket(s): 2
- Model name: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
- Total online memory: 48G
- GPU: NVidia RTX 3090

- private node
- Move examples-mpi from kimogila to nexu
- Add test with larger MPI ranks (up to `mpirun -n 8 ...`)
- Set up a timeout to kill builds after a short period of silence (what should the metric be?)
- Restrict builds on self-hosted runners to PRs (not all branches)? (TBD)
- Script to monitor host and device memory consumption and report it in the build output? (TBD, 3rd party package?)
- Steal useful ideas for CI from other open source projects?
- Remove the now obsolete DEVITO_BACKEND env var from the workflow files
- Write documentation about how we can explicitly stop/restart the background processes (Is this the Devito daemon?)
- Review and clean up various workflows. This includes updating out of date/obsolete actions.
- Move Docker GPU workflow to another machine
- Migrate CI-mpi to our own runners? (TBD)
- More GPU testing?
- Parallelize GPU tests (`pytest -n <num_of_phys_cores> ...`) because adjoint tests are quite expensive
- Clean up the OpenACC/OpenMP install instructions
- Set up organization-level self-hosted runners so that we can use TheShed for both CI and TheMatrix. See here
- Action in the private_runners repo that "locks" a specific node for a given number of minutes, standing on a `wait(nminutes)`
- ...
- Gerard's list
  - MI50s setup: have cards, seeking server to host them
  - A100s setup: waiting on delivery
  - Install cluster management and monitoring software on TheShed
  - Set up GitHub authentication
  - Give George access to bantha
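
One of the items above asks for a timeout that kills builds after a period of silence. The following is a minimal sketch of one possible approach, not an existing Devito or GitHub Actions feature: a small Python wrapper that runs a command and kills it if it produces no output for a given number of seconds. The function name `run_with_silence_timeout` and the `silence_limit` parameter are hypothetical, and "seconds since the last line of output" is only one candidate for the metric.

```python
import subprocess
import sys
import threading
import time


def run_with_silence_timeout(cmd, silence_limit):
    """Run `cmd`; kill it if it prints nothing for `silence_limit` seconds.

    Hypothetical helper for illustration only.
    Returns (returncode, timed_out, captured_lines).
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    last_output = [time.monotonic()]  # mutable cell shared with the reader thread
    lines = []

    def pump():
        # Forward each line as it arrives and record when it was seen.
        for line in proc.stdout:
            last_output[0] = time.monotonic()
            lines.append(line)
            sys.stdout.write(line)

    reader = threading.Thread(target=pump, daemon=True)
    reader.start()

    timed_out = False
    while proc.poll() is None:
        if time.monotonic() - last_output[0] > silence_limit:
            proc.kill()  # silent for too long: give up on this build
            timed_out = True
            break
        time.sleep(0.1)

    proc.wait()
    reader.join(timeout=1.0)
    return proc.returncode, timed_out, lines
```

A call such as `run_with_silence_timeout(["pytest", "tests/"], 600)` would then abort a run that stays silent for ten minutes. Note that a child process writing to a pipe may buffer its output, so a busy but buffered process can look silent; running Python children with `-u` (or `PYTHONUNBUFFERED=1`) reduces that risk.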