## Current Status

The Slurm testing workflow (`.github/workflows/slurm-tests.yml`) uses the **pitt-crc/Slurm-Test-Environment** Docker images to run automated Slurm tests in GitHub Actions.

Repository: https://github.com/pitt-crc/Slurm-Test-Environment

## How It Works

The workflow:

1. Uses pre-built Slurm Docker images from `ghcr.io/pitt-crc/test-env`.
2. Tests against multiple Slurm versions (23.02.5 and 23.11.10).
3. Starts the Slurm services automatically via the image's entrypoint.
4. Runs the asimov test suite, including real Slurm job submission.
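
Because step 3 relies on the entrypoint to bring Slurm up, a test step that starts immediately can race the daemons. The following is a minimal readiness-check sketch. It assumes `sinfo` is on the `PATH` and treats "at least one node reports `idle`" as "the cluster is usable". The function name `wait_for_slurm` and its default timeout are hypothetical, not part of the workflow or the pitt-crc images.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of asimov or the pitt-crc images):
# poll sinfo until at least one node reports "idle", or give up after a timeout.

wait_for_slurm() {
    local timeout="${1:-60}"   # seconds to wait before giving up
    local interval="${2:-2}"   # seconds between polls
    local waited=0

    while [ "$waited" -lt "$timeout" ]; do
        # -h: no header; -o "%t": compact node state only, e.g. "idle", "down*"
        if sinfo -h -o "%t" 2>/dev/null | grep -qx "idle"; then
            return 0
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done

    echo "Slurm did not report an idle node within ${timeout}s" >&2
    return 1
}

# Usage: wait_for_slurm 120 && python -m unittest tests.test_scheduler
```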

## Test Matrix

The CI tests against multiple Slurm versions to ensure compatibility:

- Slurm 23.02.5 (older stable)
- Slurm 23.11.10 (newer stable)

Additional versions can be added by updating the matrix in the workflow file.
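
To reproduce the matrix on a workstation, you can loop over the same image tags. The following is a minimal sketch under several assumptions. It assumes the test-env images provide a Python environment that can run asimov's tests from a mounted checkout. It also assumes their entrypoint starts Slurm and then runs the command passed to `docker run`. The function name `run_matrix` and the `DRY_RUN` switch are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run the scheduler tests once per Slurm version,
# mirroring the CI matrix. Set DRY_RUN=1 to print the commands instead.

SLURM_VERSIONS=("23.02.5" "23.11.10")   # keep in sync with the workflow matrix

run_matrix() {
    local version status=0
    local -a cmd
    for version in "${SLURM_VERSIONS[@]}"; do
        # Mount the current directory (assumed: asimov checkout root) at /workspace
        cmd=(docker run --rm -v "$(pwd)":/workspace -w /workspace
             "ghcr.io/pitt-crc/test-env:${version}"
             python -m unittest tests.test_scheduler)
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "${cmd[*]}"
        else
            echo "== Slurm ${version} =="
            "${cmd[@]}" || status=1   # keep going, but remember the failure
        fi
    done
    return "$status"
}

# Usage: run_matrix        (or: DRY_RUN=1 run_matrix to preview the commands)
```

Collecting the exit status instead of stopping at the first failure matches how a CI matrix reports every version independently.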

## Running Tests Locally

### Option 1: Using the Same Docker Image

```bash
# Pull the test environment
docker pull ghcr.io/pitt-crc/test-env:23.02.5

# Run interactively from the root of your asimov checkout,
# mounting it into the container at /workspace
docker run -it -v "$(pwd)":/workspace -w /workspace ghcr.io/pitt-crc/test-env:23.02.5 /bin/bash

# Inside the container, the entrypoint has already started Slurm.
# Run the tests from the mounted checkout:
cd /workspace
python -m unittest tests.test_scheduler
```

### Option 2: Using Docker Compose

Create a `docker-compose.yml`:

```yaml
version: '3'
services:
  slurm-test:
    image: ghcr.io/pitt-crc/test-env:23.02.5
    volumes:
      - .:/workspace
    working_dir: /workspace
    command: /bin/bash -c "/usr/local/bin/entrypoint.sh && /bin/bash"
    stdin_open: true
    tty: true
```

Then run:

```bash
docker-compose run slurm-test
```

### Option 3: Install Slurm on Ubuntu

For local development without Docker:

```bash
# Install Slurm
# …
sudo systemctl start slurmd
sinfo
```

### Option 4: Unit Tests Only (No Slurm Required)

The unit tests for the Slurm scheduler use mocking and don't require a real Slurm installation:

```bash
cd /path/to/asimov
python -m unittest tests.test_scheduler.SlurmSchedulerTests -v
```

## What Gets Tested

The CI workflow tests:

1. **Slurm Detection**: Verifies that `asimov init` correctly detects Slurm
2. **Scheduler Unit Tests**: All 30 unit tests for the scheduler abstraction
3. **Job Submission**: Actual Slurm job submission and monitoring
4. **DAG Translation**: HTCondor DAG to Slurm batch script conversion
5. **Integration**: End-to-end workflow with asimov commands
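
The job-submission check in item 3 can be reproduced by hand inside the container. The following is a minimal submit-and-wait sketch that uses only standard Slurm commands (`sbatch --parsable --wrap`, `squeue -h -j`). It is not asimov's own test code. The function name `submit_and_wait` and the one-second polling interval are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical smoke test (not asimov's test code): submit a trivial batch job
# and wait until it leaves the queue, using only standard Slurm CLI commands.

submit_and_wait() {
    local timeout="${1:-60}"   # seconds to wait for the job to leave the queue
    local job_id waited=0

    # --parsable prints only the job ID (plus ";cluster" on multi-cluster setups)
    job_id="$(sbatch --parsable --wrap 'hostname')" || return 1
    job_id="${job_id%%;*}"
    echo "Submitted job ${job_id}"

    # squeue -h -j prints nothing once the job is no longer pending or running
    while [ -n "$(squeue -h -j "$job_id" 2>/dev/null)" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "Job ${job_id} still queued after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "Job ${job_id} left the queue"
}

# Usage (inside the container): submit_and_wait 120
```

This only confirms that the job left the queue. Checking its exit code would need `sacct`, which works only if job accounting is configured in the image.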

## Customizing the Test Environment

To test with a different Slurm version:

1. Check the available image tags at https://github.com/pitt-crc/Slurm-Test-Environment/pkgs/container/test-env
2. Update the matrix (under the job's `strategy` key) in `.github/workflows/slurm-tests.yml`:

```yaml
strategy:
  matrix:
    slurm_version:
      - "20.11.9"
      - "22.05.11"
      - "23.02.5"
      - "23.11.10"
```
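
Before you add a version to the matrix, confirm that its tag is published. A missing tag otherwise only shows up as a failed CI job. The following is a minimal sketch using `docker manifest inspect`, which queries the registry without pulling image layers. The function name `check_tags` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which Slurm versions have a published test-env tag.
# `docker manifest inspect` contacts the registry but does not pull the image.

IMAGE="ghcr.io/pitt-crc/test-env"

check_tags() {
    local version missing=0
    for version in "$@"; do
        if docker manifest inspect "${IMAGE}:${version}" >/dev/null 2>&1; then
            echo "ok       ${IMAGE}:${version}"
        else
            echo "missing  ${IMAGE}:${version}"
            missing=1
        fi
    done
    return "$missing"   # non-zero if any requested tag was not found
}

# Usage: check_tags 20.11.9 22.05.11 23.02.5 23.11.10
```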

## Building Your Own Slurm Test Image

If you need a custom Slurm configuration, extend the test image:

```dockerfile
FROM ghcr.io/pitt-crc/test-env:23.02.5

# Add your custom Slurm configuration
COPY my-slurm.conf /etc/slurm/slurm.conf

# Add custom setup (this only copies the script; it still has to be
# invoked, e.g. from a RUN step or your own entrypoint)
COPY setup-script.sh /usr/local/bin/custom-setup.sh
RUN chmod +x /usr/local/bin/custom-setup.sh
```

Then build and push the image to your registry:

```bash
docker build -t your-org/slurm-test:custom .
docker push your-org/slurm-test:custom
```

Update the workflow to use your image:

```yaml
container:
  image: your-org/slurm-test:custom
```
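
Before pushing, a quick local check that Slurm still comes up with the custom configuration can save a CI round trip. The following is a sketch built on two assumptions: the image's entrypoint starts Slurm and then runs the supplied command, and the image provides `sh`. The function name `build_and_check` is hypothetical, and `your-org/slurm-test:custom` is the placeholder tag used above.

```bash
#!/usr/bin/env bash
# Hypothetical pre-push check: build the custom image, then retry `sinfo`
# inside a throwaway container. A non-zero exit means Slurm did not come up.

build_and_check() {
    local tag="${1:-your-org/slurm-test:custom}"
    docker build -t "$tag" . || return 1
    # Retry for up to ~60s, since the daemons may still be starting
    docker run --rm "$tag" sh -c \
        'i=0; while [ $i -lt 30 ]; do sinfo >/dev/null 2>&1 && exit 0; i=$((i+1)); sleep 2; done; exit 1' || {
        echo "sinfo never succeeded inside ${tag}; check the custom slurm.conf" >&2
        return 1
    }
}

# Usage: build_and_check your-org/slurm-test:custom && docker push your-org/slurm-test:custom
```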

## Advantages of This Approach

1. **Reliable**: Uses maintained Docker images designed specifically for CI testing
2. **Versioned**: Tests run against multiple Slurm versions
3. **Pre-configured**: Slurm services start automatically via the entrypoint
4. **No Manual Setup**: No need to start `munged`, `slurmctld`, or `slurmd` by hand
5. **Fast**: Slurm comes pre-installed and pre-configured in the image, so CI jobs skip the install and setup steps
6. **Maintained**: The pitt-crc project maintains these images

## Troubleshooting

99- - Edit ` .github/workflows/slurm-tests.yml `
100- - Change ` image: nathanhess/slurm:latest ` to ` image: your-org/slurm-test:latest `
101- - Change ` on: workflow_dispatch ` to ` on: [push, pull_request] `
158+ ### Container Fails to Start
159+
160+ Check the workflow logs for entrypoint errors. The entrypoint script should handle Slurm service startup automatically.
161+
162+ ### Slurm Commands Not Found
163+
164+ Ensure the entrypoint has been called:
165+ ` ` ` bash
166+ /usr/local/bin/entrypoint.sh
167+ ```
168+
169+ ### Jobs Stay in Pending State
170+
171+ Check node status:
172+ ``` bash
173+ sinfo -N -o " %N %t %C"
174+ ```
102175
103- ## Alternative: Use Existing Public Images
176+ If nodes are down, the entrypoint may not have completed successfully.
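
When jobs stay pending, the per-node states usually explain why. The following is a minimal sketch that reads `sinfo -N -h -o "%N %t"` output, flags nodes that are down, drained, draining, failed, or not responding, and prints the standard `scontrol update ... State=RESUME` command. The function name `report_bad_nodes` is hypothetical. Resuming a node only helps once the underlying cause (for example, `slurmd` not running) is fixed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "NODE STATE" lines (as printed by
# `sinfo -N -h -o "%N %t"`) on stdin and report nodes that cannot run jobs.

report_bad_nodes() {
    local node state status=0
    while read -r node state; do
        case "$state" in
            # compact states: down, drain (drained), drng (draining), fail;
            # a trailing "*" means the node is not responding
            down*|drain*|drng*|fail*|*"*")
                echo "${node} is ${state}; once the cause is fixed, try:"
                echo "  scontrol update NodeName=${node} State=RESUME"
                status=1
                ;;
        esac
    done
    return "$status"   # non-zero if any node needs attention
}

# Usage: sinfo -N -h -o "%N %t" | report_bad_nodes
```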

### Permission Errors

The test environment runs as root by default, so files created in a mounted workspace may end up owned by root. If you encounter permission issues, check file ownership in the workspace.

## References

- [Slurm Test Environment Repository](https://github.com/pitt-crc/Slurm-Test-Environment)
- [Available Docker Images](https://github.com/pitt-crc/Slurm-Test-Environment/pkgs/container/test-env)
- [Slurm Documentation](https://slurm.schedmd.com/)
- [GitHub Actions Container Jobs](https://docs.github.com/en/actions/using-jobs/running-jobs-in-a-container)