Commit 71599d3

refactor: deduplicate hard-coded values in test scripts
Extract configuration values to the top of each script with environment variable overrides.

run-tests.sh configuration (passed to test-integration.sh):

- RETRY_TIMES (default: 15) - cluster readiness retry attempts
- RETRY_DELAY (default: 2) - delay between retries in seconds
- JOB_RETRY_DELAY (default: 1) - delay for job state checks
- JOB_MAX_WAIT (default: 120) - maximum job wait time in seconds
- JOB_POLL_INTERVAL (default: 3) - job polling interval in seconds
- LOG_TAIL_LINES (default: 100) - lines to show in failure logs

test-integration.sh configuration (container-specific defaults):

- PLUGIN_LIBEXEC_DIR (default: /usr/libexec)
- SLURM_SYSCONFDIR (default: /etc/slurm)
- SLURM_JOB_SPOOL (default: /var/spool/slurm-jobs)
- SLURM_LOG_DIR (default: /var/log/slurm)
- SLURM_PARTITION (default: debug)

All timing parameters from run-tests.sh are passed to test-integration.sh via docker exec -e flags for consistency.
1 parent db72e07 commit 71599d3
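
The mechanism behind these defaults is Bash's `: "${VAR:=default}"` parameter expansion, which assigns the default only when the variable is unset or empty. A minimal standalone sketch of the behavior (illustrative only, not repository code; `demo.sh` is a hypothetical name):

```bash
#!/bin/bash
# ':' is a no-op builtin; evaluating "${RETRY_TIMES:=15}" as its argument
# assigns 15 to RETRY_TIMES only when it is unset or empty, so a value
# exported by the caller always wins.
: "${RETRY_TIMES:=15}"
: "${RETRY_DELAY:=2}"

echo "readiness window: $((RETRY_TIMES * RETRY_DELAY))s"
# ./demo.sh                 -> readiness window: 30s
# RETRY_TIMES=5 ./demo.sh   -> readiness window: 10s
```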

3 files changed: +106 -45

tests/runtime/README.md

Lines changed: 37 additions & 2 deletions
@@ -9,7 +9,7 @@ The runtime tests:
 2. Build and install the slurm-singularity-exec plugin
 3. Verify plugin files are installed (library and configuration)
 4. Verify plugin CLI options appear in `sbatch --help` and `srun --help`
-5. Verify SPANK plugin loads when jobs run (check slurmd logs)
+5. Verify SPANK plugin loads when jobs run (check container logs)
 6. Submit and run a containerized test job (if singularity/apptainer is available)
 
 ## Docker Compose Architecture
@@ -28,7 +28,7 @@ The test infrastructure consists of three services orchestrated by Docker Compos
 
 | Volume | Containers | Access | Purpose |
 |--------|------------|--------|---------|
-| `../..` → `/workspace` | All | Read-only (`:z`) | Source code and build scripts |
+| `../..` → `/workspace` | All | Read-write (`:z`) | Source code and build scripts |
 | `plugin-build` | All | Read-write | Shared build artifacts (plugin binaries) |
 | `slurmctld-state` | slurmctld | Read-write | Controller state persistence |
 | `slurmd-state` | slurmd | Read-write | Daemon state persistence |
@@ -58,6 +58,41 @@ The test infrastructure consists of three services orchestrated by Docker Compos
 
 All services communicate via the `slurm-net` bridge network, allowing hostname-based service discovery.
 
+## Configuration
+
+The test infrastructure uses environment variables for configuration, allowing customization without modifying scripts:
+
+### Timing Configuration (set in run-tests.sh, passed to test-integration.sh)
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `RETRY_TIMES` | 15 | Number of retry attempts for cluster readiness |
+| `RETRY_DELAY` | 2 | Delay in seconds between retry attempts |
+| `JOB_RETRY_DELAY` | 1 | Delay in seconds between job state checks |
+| `JOB_MAX_WAIT` | 120 | Maximum wait time in seconds for job completion |
+| `JOB_POLL_INTERVAL` | 3 | Interval in seconds between job status polls |
+| `LOG_TAIL_LINES` | 100 | Number of log lines to show on failure |
+
+### Container Path Configuration (test-integration.sh only)
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `PLUGIN_LIBEXEC_DIR` | `/usr/libexec` | Plugin library directory |
+| `SLURM_SYSCONFDIR` | `/etc/slurm` | Slurm configuration directory |
+| `SLURM_JOB_SPOOL` | `/var/spool/slurm-jobs` | Job output spool directory |
+| `SLURM_LOG_DIR` | `/var/log/slurm` | Slurm log directory |
+| `SLURM_PARTITION` | `debug` | Default Slurm partition name |
+
+### Example: Custom Timing
+
+```bash
+# Faster retries for local testing
+RETRY_TIMES=5 RETRY_DELAY=1 ./run-tests.sh
+
+# Longer timeouts for slow environments
+JOB_MAX_WAIT=300 ./run-tests.sh
+```
+
 ## Quick Start
 
 ```bash
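
The container-path variables follow the same override mechanism. A hedged example, mirroring the `docker compose exec` invocation that run-tests.sh uses (the partition name `batch` is hypothetical):

```bash
# Hypothetical override: run the suite against a partition named "batch";
# variables left unset fall back to the defaults in test-integration.sh.
docker compose exec -T \
    -e SLURM_PARTITION=batch \
    slurmctld /workspace/tests/runtime/test-integration.sh
```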

tests/runtime/run-tests.sh

Lines changed: 25 additions & 13 deletions
@@ -7,6 +7,14 @@ set -e
 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 cd "$SCRIPT_DIR"
 
+# Configuration - can be overridden via environment variables
+: "${RETRY_TIMES:=15}"
+: "${RETRY_DELAY:=2}"
+: "${JOB_RETRY_DELAY:=1}"
+: "${JOB_MAX_WAIT:=120}"
+: "${JOB_POLL_INTERVAL:=3}"
+: "${LOG_TAIL_LINES:=100}"
+
 echo "::group::Clean up previous containers"
 docker compose down -v 2>/dev/null || true
 echo "::endgroup::"
@@ -38,26 +46,30 @@ echo "::endgroup::"
 
 echo "::group::Wait for services"
 echo "Waiting for slurmctld to be ready..."
-# Give slurmctld up to 30 seconds to start (15 retries * 2 seconds)
-RETRIES=15
-DELAY=2
-for i in $(seq 1 $RETRIES); do
+# Give slurmctld up to RETRY_TIMES * RETRY_DELAY seconds to start
+for i in $(seq 1 $RETRY_TIMES); do
     if docker compose exec -T slurmctld scontrol ping >/dev/null 2>&1; then
-        echo "✓ Slurm cluster is ready (attempt $i/$RETRIES)"
+        echo "✓ Slurm cluster is ready (attempt $i/$RETRY_TIMES)"
         break
     fi
-    if [ $i -eq $RETRIES ]; then
-        echo "ERROR: slurmctld not ready after $((RETRIES * DELAY)) seconds"
+    if [ $i -eq $RETRY_TIMES ]; then
+        echo "ERROR: slurmctld not ready after $((RETRY_TIMES * RETRY_DELAY)) seconds"
         docker compose logs slurmctld
         exit 1
     fi
-    sleep $DELAY
+    sleep $RETRY_DELAY
 done
 echo "::endgroup::"
 
 echo "::group::Run integration tests"
 set +e # Temporarily disable exit on error
-docker compose exec -T slurmctld /workspace/tests/runtime/test-integration.sh
+docker compose exec -T \
+    -e RETRY_TIMES="$RETRY_TIMES" \
+    -e RETRY_DELAY="$RETRY_DELAY" \
+    -e JOB_RETRY_DELAY="$JOB_RETRY_DELAY" \
+    -e JOB_MAX_WAIT="$JOB_MAX_WAIT" \
+    -e JOB_POLL_INTERVAL="$JOB_POLL_INTERVAL" \
+    slurmctld /workspace/tests/runtime/test-integration.sh
 TEST_EXIT_CODE=$?
 set -e # Re-enable exit on error
 echo "::endgroup::"
@@ -76,12 +88,12 @@ fi
 
 # Show logs if tests failed
 if [ $TEST_EXIT_CODE -ne 0 ]; then
-    echo "::group::slurmctld logs (last 100 lines)"
-    docker compose logs --tail=100 slurmctld
+    echo "::group::slurmctld logs (last $LOG_TAIL_LINES lines)"
+    docker compose logs --tail="$LOG_TAIL_LINES" slurmctld
     echo "::endgroup::"
 
-    echo "::group::slurmd logs (last 100 lines)"
-    docker compose logs --tail=100 slurmd
+    echo "::group::slurmd logs (last $LOG_TAIL_LINES lines)"
+    docker compose logs --tail="$LOG_TAIL_LINES" slurmd
     echo "::endgroup::"
 
     echo "::group::Container status"
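
With the defaults above, the readiness loop still waits up to RETRY_TIMES × RETRY_DELAY = 15 × 2 = 30 seconds, matching the old hard-coded comment. The loop's shape generalizes; a minimal sketch, assuming a probe wrapper named `probe` (hypothetical):

```bash
#!/bin/bash
# Bounded retry loop, same structure as the slurmctld readiness check.
: "${RETRY_TIMES:=15}"
: "${RETRY_DELAY:=2}"
probe() { docker compose exec -T slurmctld scontrol ping; }  # hypothetical wrapper

for i in $(seq 1 "$RETRY_TIMES"); do
    if probe >/dev/null 2>&1; then
        echo "ready after $i attempt(s)"
        break
    fi
    if [ "$i" -eq "$RETRY_TIMES" ]; then
        echo "not ready after $((RETRY_TIMES * RETRY_DELAY)) seconds" >&2
        exit 1
    fi
    sleep "$RETRY_DELAY"
done
```

Note that LOG_TAIL_LINES is consumed by run-tests.sh itself (for `docker compose logs --tail`) and is therefore not forwarded into the container.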

tests/runtime/test-integration.sh

Lines changed: 44 additions & 30 deletions
@@ -4,22 +4,37 @@
 
 set -e
 
+# Configuration - can be overridden via environment variables
+: "${PLUGIN_LIBEXEC_DIR:=/usr/libexec}"
+: "${SLURM_SYSCONFDIR:=/etc/slurm}"
+: "${SLURM_JOB_SPOOL:=/var/spool/slurm-jobs}"
+: "${SLURM_LOG_DIR:=/var/log/slurm}"
+: "${SLURM_PARTITION:=debug}"
+: "${RETRY_TIMES:=30}"
+: "${RETRY_DELAY:=2}"
+: "${JOB_RETRY_DELAY:=1}"
+: "${JOB_MAX_WAIT:=120}"
+: "${JOB_POLL_INTERVAL:=3}"
+
+PLUGIN_SO="${PLUGIN_LIBEXEC_DIR}/slurm-singularity-exec.so"
+PLUGSTACK_CONF="${SLURM_SYSCONFDIR}/plugstack.conf.d/singularity-exec.conf"
+
 echo "=== Slurm Singularity Plugin Runtime Tests ==="
 echo
 
 # Test 1: Verify plugin files are installed
 echo "Test 1: Verifying plugin installation..."
-if [ -f "/usr/libexec/slurm-singularity-exec.so" ]; then
-    echo "✓ Found plugin library: /usr/libexec/slurm-singularity-exec.so"
+if [ -f "$PLUGIN_SO" ]; then
+    echo "✓ Found plugin library: $PLUGIN_SO"
 else
-    echo "✗ ERROR: Plugin library not found at /usr/libexec/slurm-singularity-exec.so"
+    echo "✗ ERROR: Plugin library not found at $PLUGIN_SO"
     exit 1
 fi
 
-if [ -f "/etc/slurm/plugstack.conf.d/singularity-exec.conf" ]; then
-    echo "✓ Found plugin config: /etc/slurm/plugstack.conf.d/singularity-exec.conf"
+if [ -f "$PLUGSTACK_CONF" ]; then
+    echo "✓ Found plugin config: $PLUGSTACK_CONF"
 else
-    echo "✗ ERROR: Plugin config not found at /etc/slurm/plugstack.conf.d/singularity-exec.conf"
+    echo "✗ ERROR: Plugin config not found at $PLUGSTACK_CONF"
     exit 1
 fi
 echo
@@ -84,7 +99,7 @@ echo
 if [ "$SKIP_CONTAINER_TEST" != "true" ]; then
     echo "Test 5: Creating a test container image..."
     # Use shared directory so container is accessible from both slurmctld and slurmd
-    TEST_CONTAINER="/var/spool/slurm-jobs/test-debian.sif"
+    TEST_CONTAINER="${SLURM_JOB_SPOOL}/test-debian.sif"
     if [ ! -f "$TEST_CONTAINER" ]; then
         # Create a minimal Debian container
         $SINGULARITY_CMD pull "$TEST_CONTAINER" docker://debian:stable-slim
@@ -102,23 +117,23 @@ fi
 
 # Test 6: Wait for Slurm to be ready
 echo "Test 6: Waiting for Slurm cluster to be ready..."
-if ! retry --times=30 --delay=2 -- scontrol ping >/dev/null 2>&1; then
+if ! retry --times="$RETRY_TIMES" --delay="$RETRY_DELAY" -- scontrol ping >/dev/null 2>&1; then
    echo "✗ ERROR: Slurm controller not responding"
    exit 1
 fi
 echo "✓ Slurm controller is responding"
 
 # Wait for node to be ready
-if ! retry --times=30 --delay=2 -- bash -c 'sinfo -h -o "%T" 2>/dev/null | grep -qE "idle|mixed|alloc"'; then
+if ! retry --times="$RETRY_TIMES" --delay="$RETRY_DELAY" -- bash -c 'sinfo -h -o "%T" 2>/dev/null | grep -qE "idle|mixed|alloc"'; then
    echo "✗ ERROR: No compute nodes are ready"
    echo "Showing sinfo output:"
    sinfo
    echo
    echo "Showing last 50 lines of slurmd logs:"
-    tail -50 /var/log/slurm/slurmd.log 2>/dev/null || echo "Could not read slurmd logs"
+    tail -50 "${SLURM_LOG_DIR}/slurmd.log" 2>/dev/null || echo "Could not read slurmd logs"
    echo
    echo "Showing last 50 lines of slurmctld logs:"
-    tail -50 /var/log/slurm/slurmctld.log 2>/dev/null || echo "Could not read slurmctld logs"
+    tail -50 "${SLURM_LOG_DIR}/slurmctld.log" 2>/dev/null || echo "Could not read slurmctld logs"
    exit 1
 fi
 echo "✓ Compute node is ready"
@@ -140,7 +155,7 @@ fi
 
 # Wait for job to complete
 echo " Waiting for job $TEST_JOB_ID to complete..."
-retry --times=30 --delay=1 -- bash -c "scontrol show job $TEST_JOB_ID 2>/dev/null | grep -qE 'JobState=(COMPLETED|FAILED|CANCELLED)'" >/dev/null 2>&1
+retry --times="$RETRY_TIMES" --delay="$JOB_RETRY_DELAY" -- bash -c "scontrol show job $TEST_JOB_ID 2>/dev/null | grep -qE 'JobState=(COMPLETED|FAILED|CANCELLED)'" >/dev/null 2>&1
 
 JOB_STATE=$(scontrol show job "$TEST_JOB_ID" 2>/dev/null | grep "JobState" | awk '{print $1}' | cut -d= -f2)
 if [ "$JOB_STATE" = "COMPLETED" ]; then
@@ -158,24 +173,24 @@ echo
 if [ "$SKIP_CONTAINER_TEST" != "true" ]; then
     echo "Test 8: Submitting a containerized test job..."
     JOB_SCRIPT=$(mktemp /tmp/test_job.XXXXXX.sh)
-    cat > "$JOB_SCRIPT" <<'JOBEOF'
+    cat > "$JOB_SCRIPT" <<JOBEOF
 #!/bin/bash
 #SBATCH --job-name=test-singularity
-#SBATCH --output=/var/spool/slurm-jobs/test_job_%j.out
-#SBATCH --error=/var/spool/slurm-jobs/test_job_%j.err
-#SBATCH --partition=debug
+#SBATCH --output=${SLURM_JOB_SPOOL}/test_job_%j.out
+#SBATCH --error=${SLURM_JOB_SPOOL}/test_job_%j.err
+#SBATCH --partition=${SLURM_PARTITION}
 #SBATCH --time=00:01:00
 #SBATCH --nodes=1
 #SBATCH --ntasks=1
 
-echo "Job started at: $(date)"
-echo "Running on node: $(hostname)"
-echo "Job ID: $SLURM_JOB_ID"
+echo "Job started at: \$(date)"
+echo "Running on node: \$(hostname)"
+echo "Job ID: \$SLURM_JOB_ID"
 
 # Test command inside container
 cat /etc/os-release | grep -i pretty
 
-echo "Job completed at: $(date)"
+echo "Job completed at: \$(date)"
 JOBEOF
 
     chmod +x "$JOB_SCRIPT"
@@ -192,39 +207,38 @@ echo
 
 # Test 9: Wait for job to complete
 echo "Test 9: Waiting for job to complete..."
-max_wait=120
 waited=0
 while true; do
     JOB_STATE=$(scontrol show job "$JOB_ID" 2>/dev/null | grep "JobState=" | sed 's/.*JobState=\([^ ]*\).*/\1/')
-
+
     if [ "$JOB_STATE" = "COMPLETED" ]; then
        echo "✓ Job completed successfully"
        break
    elif [ "$JOB_STATE" = "FAILED" ] || [ "$JOB_STATE" = "CANCELLED" ] || [ "$JOB_STATE" = "TIMEOUT" ]; then
        echo "✗ ERROR: Job failed with state: $JOB_STATE"
        scontrol show job "$JOB_ID"
        exit 1
-    elif [ $waited -ge $max_wait ]; then
-        echo "✗ ERROR: Job did not complete within ${max_wait}s"
+    elif [ $waited -ge $JOB_MAX_WAIT ]; then
+        echo "✗ ERROR: Job did not complete within ${JOB_MAX_WAIT}s"
        scontrol show job "$JOB_ID"
        scancel "$JOB_ID"
        exit 1
    fi
-
-    echo " Job state: $JOB_STATE (${waited}s/${max_wait}s)"
-    sleep 3
-    waited=$((waited + 3))
+
+    echo " Job state: $JOB_STATE (${waited}s/${JOB_MAX_WAIT}s)"
+    sleep "$JOB_POLL_INTERVAL"
+    waited=$((waited + JOB_POLL_INTERVAL))
done
echo

# Test 10: Check job output
echo "Test 10: Checking job output..."
-JOB_OUTPUT="/var/spool/slurm-jobs/test_job_${JOB_ID}.out"
+JOB_OUTPUT="${SLURM_JOB_SPOOL}/test_job_${JOB_ID}.out"
if [ -f "$JOB_OUTPUT" ]; then
    echo "Job output:"
    cat "$JOB_OUTPUT"
    echo
-
+
    if grep -q "PRETTY_NAME" "$JOB_OUTPUT"; then
        echo "✓ Job produced expected output (found PRETTY_NAME)"
    else
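
The subtlest change in this file is the heredoc delimiter: `<<'JOBEOF'` became `<<JOBEOF`, so the shell now expands `${SLURM_JOB_SPOOL}` and `${SLURM_PARTITION}` while the job script is being written, and every expansion that must happen at job runtime (`$(date)`, `$(hostname)`, `$SLURM_JOB_ID`) has to be escaped as `\$`. A minimal standalone illustration of the two delimiter styles (not repository code):

```bash
#!/bin/bash
SLURM_PARTITION=debug

# Unquoted delimiter: ${SLURM_PARTITION} expands now; \$ defers expansion
# to whoever eventually runs the generated text.
cat <<EOF
partition resolved now:     ${SLURM_PARTITION}
job id resolved at runtime: \$SLURM_JOB_ID
EOF

# Quoted delimiter: nothing expands; the body is written verbatim.
cat <<'EOF'
partition resolved now:     ${SLURM_PARTITION}
job id resolved at runtime: $SLURM_JOB_ID
EOF
```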
