@faganihajizada faganihajizada commented Oct 15, 2025

Summary

This PR enables SSH access to Slurm worker nodes with pam_slurm_adopt integration for job-based access control. This allows users to SSH into worker nodes where they have running jobs, with the PAM module preventing unauthorized access and tracking SSH connections for accounting and cleanup.

Related PR: SlinkyProject/slurm-operator#66

Key Changes:

  • Install openssh-server in worker (slurmd) images
  • Configure pam_slurm_adopt dynamically at container startup
  • Reuse existing sshd.conf/sshd.ini supervisor configurations from login images
  • Expose port 22 on worker containers
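
The PR text does not show the startup logic itself. As a rough illustration of what configuring pam_slurm_adopt at container startup could look like, the sketch below appends the account line given in the pam_slurm_adopt documentation to a PAM service file, and does so idempotently so container restarts do not duplicate it. The function name `enable_pam_slurm_adopt` and the choice of `/etc/pam.d/sshd` as the target are assumptions for illustration, not taken from this PR.

```shell
#!/bin/sh
# Hypothetical helper (not from this PR): make sure the pam_slurm_adopt
# account rule is present in the given PAM service file.
# Appending puts the rule last in the file, which matches the documented
# advice that pam_slurm_adopt be the last module in the account stack.
enable_pam_slurm_adopt() {
    pam_file="$1"
    rule='account    required    pam_slurm_adopt.so'

    # Skip if an active (non-commented) account rule for the module exists.
    if grep -Eq '^[[:space:]]*-?account[[:space:]].*pam_slurm_adopt\.so' "$pam_file" 2>/dev/null; then
        return 0
    fi

    printf '%s\n' "$rule" >> "$pam_file"
}
```

An entrypoint script might call `enable_pam_slurm_adopt /etc/pam.d/sshd` before starting sshd. Per the pam_slurm_adopt documentation, the module also depends on `PrologFlags=contain` in slurm.conf and on PAM being enabled in sshd (`UsePAM yes`), so a real setup would likely need to check those as well.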

Breaking Changes

N/A

Testing Notes

Built operator based on ...

# 1. Submit a short test job in the background
srun -p <partition> -n1 --time=5:00 sleep 300 &

# 2. Find the worker node running the job
squeue -o "%.18i %.9P %.8j %.8u %.2t %.10M %.6D %R"

# 3. SSH from login node to worker node (should succeed)
ssh <worker-pod-hostname>

# 4. Verify you're on the worker node
hostname
ps aux | grep sleep

# 5. Cancel the job
scancel <job-id>

# 6. Try to SSH again (should be denied)
ssh <worker-pod-hostname>

Verified:

  • SSH daemon starts alongside slurmd
  • Unique SSH host keys generated per pod
  • Users can SSH to nodes where they have running jobs
  • Users cannot SSH to nodes without running jobs
  • PAM configuration applied correctly

Enable SSH access to worker nodes with PAM integration to restrict access
to users with running jobs.

https://slurm.schedmd.com/pam_slurm_adopt.html