Generating MFC Images and Testing Them on OSPool #935
Status Update: I overslept on this. The concept itself works as intended. The persistent hurdle is running out of disk space for the GPU images, since the NVIDIA HPC base container is roughly 5-8 GB. Clearing the cache, whether on the GH runner or on OSPool, does not help. I tried different base containers and recipe instructions, but no luck.
GPU Base Container: New Approach
Build: run the process on self-hosted Phoenix.
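Since the failure mode described above is running out of disk during the GPU image build, a pre-flight check can at least fail early with a clear message. This is a minimal sketch, not part of the PR: the function names and the 20 GiB threshold are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; names and threshold are illustrative only.

# Print the free space, in whole GiB, on the filesystem holding directory $1.
free_gib() {
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # column 4 of the second line is the available space.
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Succeed only if directory $1 has at least $2 GiB free.
has_free_gib() {
    [ "$(free_gib "$1")" -ge "$2" ]
}

# Example: ask for 20 GiB of headroom (an assumed figure that leaves room
# for a 5-8 GB base image plus build layers) before starting a GPU build.
if has_free_gib . 20; then
    echo "enough space for a GPU image build"
else
    echo "not enough space; build elsewhere or free up disk"
fi
```

A check like this does not solve the space problem, but it turns a late "no space left on device" failure into an early, readable one.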
These folks build all of nvhpc + openmpi + cuda into a Docker image using a standard GH runner, so far as I can tell: https://github.com/link89/github-action-demo/blob/cp2k-with-deepmd/cp2k/2025.1-cuda124-openmpi-avx512-psmp/build.sh. Can we just try this simpler approach first? It seems that some of the issues here come from trying to get it all done in one shot. Perhaps try something easy first, then add complexity. For example, build a simple gnu+mpi Docker container that has MFC in it. We could even use this as an example for new users so they can get up and running without worrying about dependencies on their system.
Ohh, interesting, I will try this approach and see if it can compile and run MFC on a GH runner. Converting between Docker and Singularity is nothing to worry about anyway.
Sorta figured it out with
Edit (Note to Self): e.g. Log files are used to trigger a failure if *.out/*.err contains the keyword 'Error'. Corresponding log numbers can be referenced with
Wait for each job instance using the following
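The log-based failure trigger mentioned in this comment can be sketched as follows. The exact commands from the comment did not survive, so this is an assumed reconstruction of the idea only: the function names and the single-directory layout are hypothetical, and waiting on job instances is not covered.

```shell
#!/usr/bin/env bash
# Sketch only: function names and directory layout are assumptions.

# Return 0 (true) if any *.out or *.err file in directory $1 contains 'Error'.
logs_have_error() {
    local dir="$1"
    # -q stops at the first match and reports success even if another
    # pattern matched no files; -s hides "No such file" messages.
    grep -qs 'Error' "$dir"/*.out "$dir"/*.err
}

# Example CI step: report and return non-zero when a job log has an error.
check_logs() {
    if logs_have_error "$1"; then
        echo "FAIL: 'Error' found in job logs under $1"
        return 1
    fi
    echo "OK: no 'Error' in job logs under $1"
}
```

Calling `check_logs <log dir>` as the last command of a workflow step would make the step fail whenever a job log contains the keyword.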
Closed in favor of a more efficient and streamlined Docker image build process in #971.
User description
Description
Closes #654
Generates four images: CPU, CPU_Benchmark, GPU, and GPU_Benchmark. All MFC builds occur on a GitHub runner, while testing and storage of the latest images take place on OSPool. The images can be retrieved from the CI itself; they contain pre-built MFC with pre-installed packages and can be used with simple commands.
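As a rough illustration of building the four variants, the loop below prints one `apptainer build` command per image without running it. Only `Singularity.cpu`, `mfc_cpu.sif`, and `mfc_gpu.sif` appear in this PR text; the benchmark file names and the `cpu_bench`/`gpu_bench` suffixes are assumed by analogy and may differ from the actual definition files.

```shell
#!/usr/bin/env bash
# Dry-run sketch: the benchmark variant names below are assumptions.

# Print the build command for one variant, e.g. "cpu" -> mfc_cpu.sif.
build_cmd() {
    printf 'apptainer build mfc_%s.sif Singularity.%s\n' "$1" "$1"
}

# Print the commands for all four variants without executing them.
all_build_cmds() {
    local variant
    for variant in cpu cpu_bench gpu gpu_bench; do
        build_cmd "$variant"
    done
}

all_build_cmds
```

Printing the commands first makes it easy to review them in a CI log; piping the output to `sh` would run the builds on a machine where Apptainer is installed.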
Debugging info,
To locally generate images,
apptainer build mfc_cpu.sif Singularity.cpu
To start a shell instance,
apptainer shell --fakeroot --writable-tmpfs mfc_cpu.sif
To execute specific commands directly,
apptainer exec --fakeroot --writable-tmpfs mfc_cpu.sif /bin/bash -c './mfc.sh test -a'
To download container images, install pelican, then run
pelican object get osdf:///ospool/ap40/data/<user>/<image>.sif <local dir>
e.g.
pelican object get osdf:///ospool/ap40/data/mohammed.al-mahrouqi/mfc_gpu.sif ~/Desktop
This requires login credentials from any college/institute supported by cilogon.org.
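The download command follows a fixed pattern, so a small helper can assemble it from the user name, image name, and destination. This is a sketch under the assumption that all images live under the `osdf:///ospool/ap40/data/<user>/` prefix shown in the example; the helper names are hypothetical, and the command is only printed, not executed.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the path prefix is taken from the example in this PR.

# Build the OSDF URL for image $2 (without the .sif suffix) owned by user $1.
osdf_image_url() {
    printf 'osdf:///ospool/ap40/data/%s/%s.sif\n' "$1" "$2"
}

# Print the pelican command that would fetch the image into directory $3.
fetch_image_cmd() {
    printf 'pelican object get %s %s\n' "$(osdf_image_url "$1" "$2")" "$3"
}

# Reproduces the example command from the PR description.
fetch_image_cmd mohammed.al-mahrouqi mfc_gpu "$HOME/Desktop"
```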
To-dos,
Note to Self: the current secrets are hosted in the fork; before merging, new dedicated ones should be added to the base repo. To do so, request an access point under the "GATech_Bryngelson" project, then upload a public SSH key to https://registry.cilogon.org/. Afterwards, update the secrets, which include the private SSH key and user@host.
Refs
NVIDIA Container
PR Type
Other
Description
Remove existing CI workflows and testing infrastructure
Add Singularity container image building workflow
Create four container definitions for CPU/GPU variants
Implement automated image building and testing on OSPool
Changes walkthrough 📝
17 files
Remove Frontier build script
Remove Frontier job submission script
Remove Frontier test script
Remove Phoenix benchmark script
Remove Phoenix benchmark submission script
Remove Phoenix job submission script
Remove Phoenix test script
Remove benchmark workflow
Remove code cleanliness workflow
Remove coverage check workflow
Remove documentation workflow
Remove formatting check workflow
Remove line count workflow
Remove source linting workflow
Remove toolchain linting workflow
Remove spell check workflow
Remove main test suite workflow
5 files
Add Singularity image building workflow
Add CPU container definition
Add CPU benchmark container definition
Add GPU container definition
Add GPU benchmark container definition