
Conversation


@jskswamy jskswamy commented Oct 3, 2025

BREAKING CHANGE: controller service now defaults to clusterIP: None

Description

This PR updates the Slurm Helm chart to default the controller service to headless and adds clear documentation and examples for when headless services are needed, helping users satisfy the StatefulSet pod DNS requirement. Existing installations are protected from the default change by a chart version bump.

Problem

Users must currently remember to configure both settings:

clusterName: my-cluster
controller:
  service:
    spec:
      clusterIP: None # Easy to forget, causes failures
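
The headless setting matters because Kubernetes only publishes per-pod DNS records for a StatefulSet when its governing Service is headless; those records follow this pattern (the example name is taken from the error output in the Testing section below):

<pod-name>.<service-name>.<namespace>.svc.cluster.local
# e.g. slurm-controller-0.slurm-controller.slurm.svc.cluster.local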

Solution

The Helm chart now defaults to headless service configuration while maintaining backward compatibility for explicit overrides.

⚠️ BREAKING CHANGE: This change modifies the default service configuration and may affect existing installations that rely on the current default ClusterIP behavior.
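
For reference, the rendered controller Service under the new default would look roughly like the sketch below. The name, selector label, and port are illustrative (6817 is the standard slurmctld port) and may differ from what the chart actually renders:

apiVersion: v1
kind: Service
metadata:
  name: slurm-controller
spec:
  clusterIP: None              # headless: DNS returns pod IPs directly
  selector:
    app: slurm-controller      # assumed label; the chart's real labels may differ
  ports:
    - name: slurmctld
      port: 6817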

Changes Made

/helm/slurm/Chart.yaml

 name: slurm
 description: Slurm Cluster
 type: application
 appVersion: "25.05"
-version: 0.4.0
+version: 0.5.0

/helm/slurm/values.yaml

 controller:
   service:
-    spec: {}
+    spec:
+      clusterIP: None  # Default to headless for Slurm StatefulSet pod DNS

No template changes required - this is a values.yaml default change only.
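
This works because the controller service template already passes controller.service.spec through into the rendered Service. A minimal sketch of that kind of pass-through pattern is shown below; it is illustrative only, not the chart's actual template (which this PR does not touch):

# templates/controller-service.yaml (illustrative pattern, not the chart's real template)
apiVersion: v1
kind: Service
metadata:
  name: {{ .Values.clusterName }}-controller
spec:
  selector:
    app: {{ .Values.clusterName }}-controller
  ports:
    - name: slurmctld
      port: 6817
  {{- with .Values.controller.service.spec }}
  {{- toYaml . | nindent 2 }}
  {{- end }}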

Testing

Before This Change

# values.yaml
clusterName: test-cluster
# Missing clusterIP configuration

# Result: DNS failures
kubectl logs slurm-worker-0 -c slurmd
# error: slurm_set_addr: Unable to resolve "slurm-controller-0.slurm-controller.slurm"
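
The root cause can be confirmed by inspecting the Service rendered without the headless setting; the command and output below are illustrative, with the slurm namespace assumed from the FQDN above:

kubectl get service slurm-controller -n slurm -o jsonpath='{.spec.clusterIP}'
# 10.96.x.x  (a regular ClusterIP: per-pod DNS records are not published)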

After This Change

# values.yaml
clusterName: test-cluster
# No manual clusterIP configuration needed

# Result: Automatic headless service
kubectl get service slurm-controller -o yaml | grep clusterIP
# clusterIP: None (headless service)

# DNS resolution works
kubectl exec slurm-worker-0 -c slurmd -- getent hosts slurm-controller-0.slurm-controller.slurm
# 10.72.1.133     slurm-controller-0.slurm-controller.slurm.svc.cluster.local
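
As an additional check, resolving the Service name itself should now return the controller pod IP(s) directly, since a headless Service has no virtual IP; the output below is illustrative:

kubectl exec slurm-worker-0 -c slurmd -- getent hosts slurm-controller.slurm
# 10.72.1.133     slurm-controller.slurm.svc.cluster.local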

Backward Compatibility

Explicit Configuration Still Works

# User can override if needed
controller:
  service:
    spec:
      clusterIP: 10.96.1.100 # Explicit override respected
      type: ClusterIP
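
Such an override can be applied on install or upgrade as usual; the release name, chart path, and values file name below are placeholders:

# my-overrides.yaml contains the controller.service.spec block shown above
helm upgrade --install slurm ./helm/slurm -f my-overrides.yaml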

Existing Deployments Unaffected

# Existing configurations continue working
controller:
  service:
    spec:
      clusterIP: None # Still works as before

Prerequisites

⚠️ Important: This PR cannot be merged until the operator fix from issue #62 is merged first. The Helm chart changes will have no effect if the operator still ignores clusterIP: None configuration.

Migration Guide

For New Deployments

No changes required - headless service is configured automatically.
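
With the new default, a minimal install only needs the cluster name; the release name and chart path below are placeholders:

helm install slurm ./helm/slurm --set clusterName=my-cluster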

For Existing Deployments

⚠️ BREAKING CHANGE IMPACT:

Version Protection: Chart version bumped from 0.4.0 → 0.5.0 to protect existing installations.

  • Existing v0.4.x users: No automatic upgrade - protected by version pinning
  • New v0.5.x users: Get the improved headless service defaults automatically
  • If you have clusterIP: None configured: No change needed in any version
  • If you have explicit ClusterIP: Your configuration is preserved in any version

Manual Upgrade Consideration:
When upgrading from v0.4.x to v0.5.x:

  • Deployments without explicit service config will change to headless service
  • If you need to preserve ClusterIP behavior, set it explicitly before upgrading (see the example and commands below):
controller:
  service:
    spec:
      clusterIP: "10.96.x.x"  # Specify explicit ClusterIP
      type: ClusterIP
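
Alternatively, the previous behavior can be kept by pinning the chart version until you are ready to adopt the headless default; the release and repository names below are placeholders:

# Stay on the 0.4.x line for now
helm upgrade slurm slurm-repo/slurm --version 0.4.0

# Or move to 0.5.x while keeping an explicit ClusterIP via an override file
helm upgrade slurm slurm-repo/slurm --version 0.5.0 -f preserve-clusterip.yaml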

Automatically set clusterIP to None for the controller service
in the Slurm Helm chart to enhance usability and prevent
DNS resolution failures for worker nodes. This change removes
the need for users to manually specify the headless service
configuration, reducing complexity and potential errors.

BREAKING CHANGE: This change modifies the default service
configuration and may affect existing deployments.

Signed-off-by: Krishnaswamy Subramanian <[email protected]>
Added notes on headless services and their requirements for StatefulSet
pod DNS resolution in the values.yaml file. This improves
documentation clarity for users configuring services.