[Feature Request] Support multiple --service-account-issuer flags for zero-downtime endpoint migration
Feature Request
Add support for configuring multiple --service-account-issuer flags in the Talos API server configuration to enable zero-downtime migration when changing cluster control plane endpoints.
Problem Statement
By default, Talos automatically sets the service-account-issuer equal to cluster.controlPlane.endpoint. Changing the issuer causes immediate authentication failures and cluster downtime. This happens when the control plane endpoint itself changes (e.g., during a load balancer migration, DNS change, or infrastructure update). It also happens when the service-account-issuer needs to differ from the control plane endpoint (e.g., when it is hosted externally). The cause is that:
- Existing service account tokens were issued by the old issuer URL
- The API server immediately switches to accepting only tokens from the new issuer URL
- All running workloads with mounted service account tokens lose authentication until pods are restarted
Current Behavior
```yaml
# Current Talos configuration
cluster:
  controlPlane:
    endpoint: https://new-endpoint.example.com:6443
```

This results in:
- Immediate authentication failures for existing workloads
- Required pod restarts across the entire cluster
- Operational downtime during endpoint changes
Proposed Solution
Enable configuration of multiple service account issuers, which the Kubernetes API server has supported natively since v1.22:
Option 1: Array configuration in controlPlane section
```yaml
cluster:
  controlPlane:
    endpoint: https://new-endpoint.example.com:6443
    serviceAccountIssuers:
      - https://new-endpoint.example.com:6443 # First in list: signs new tokens
      - https://old-endpoint.example.com:6443 # Still accepted when validating existing tokens
```

Option 2: Enhanced extraArgs support for repeatable flags
```yaml
cluster:
  apiServer:
    extraArgs:
      service-account-issuer:
        - https://new-endpoint.example.com:6443
        - https://old-endpoint.example.com:6443
```

Note: This would require Talos to extend extraArgs to detect array values and convert them into multiple flag instances (e.g., --service-account-issuer=https://new-endpoint.example.com:6443 --service-account-issuer=https://old-endpoint.example.com:6443). This approach would also benefit other repeatable Kubernetes API server flags.
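To make the Option 2 conversion concrete, here is a minimal Go sketch (Talos itself is written in Go). expandExtraArgs is a hypothetical helper, not an existing Talos function, and it assumes each extraArgs value arrives as either a string or a []string after YAML decoding.

```go
package main

import (
	"fmt"
	"sort"
)

// expandExtraArgs is a hypothetical helper (not part of Talos) that renders an
// extraArgs map as kube-apiserver flags. A string value yields one flag; a
// []string value yields one flag per element, in list order.
func expandExtraArgs(extraArgs map[string]any) ([]string, error) {
	keys := make([]string, 0, len(extraArgs))
	for k := range extraArgs {
		keys = append(keys, k)
	}
	// Sort keys so the generated command line is deterministic across runs.
	sort.Strings(keys)

	var flags []string
	for _, k := range keys {
		switch v := extraArgs[k].(type) {
		case string:
			flags = append(flags, fmt.Sprintf("--%s=%s", k, v))
		case []string:
			// Repeatable flag: emit one instance per element, keeping order,
			// because kube-apiserver treats the first issuer specially.
			for _, item := range v {
				flags = append(flags, fmt.Sprintf("--%s=%s", k, item))
			}
		default:
			return nil, fmt.Errorf("extraArgs[%q]: unsupported value type %T", k, v)
		}
	}
	return flags, nil
}

func main() {
	flags, err := expandExtraArgs(map[string]any{
		"service-account-issuer": []string{
			"https://new-endpoint.example.com:6443",
			"https://old-endpoint.example.com:6443",
		},
	})
	if err != nil {
		panic(err)
	}
	for _, f := range flags {
		fmt.Println(f)
	}
}
```

Running it prints one --service-account-issuer flag per list entry, new endpoint first. Preserving list order (rather than, say, deduplicating through a set) is the important design choice here, since the order decides which issuer signs new tokens.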
Technical Background
The Kubernetes API server has supported multiple --service-account-issuer flags since v1.22. When the flag is repeated:
- The first issuer is used to sign new tokens
- All listed issuers are accepted when validating tokens
- This enables non-disruptive issuer changes, as described in the Kubernetes documentation
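The semantics above can be modelled in a few lines of Go. This is an illustrative model only, not kube-apiserver code: issuerSet is a hypothetical type, and it covers just the issuer ("iss" claim) check, ignoring signature and key verification.

```go
package main

import (
	"errors"
	"fmt"
)

// issuerSet models the documented kube-apiserver behaviour when
// --service-account-issuer is repeated: the first value becomes the "iss"
// claim of newly issued tokens, and every value is accepted during validation.
type issuerSet []string

// SigningIssuer returns the issuer written into new tokens.
func (s issuerSet) SigningIssuer() (string, error) {
	if len(s) == 0 {
		return "", errors.New("no service account issuer configured")
	}
	return s[0], nil
}

// Accepts reports whether a token carrying the given "iss" claim would pass
// the issuer check.
func (s issuerSet) Accepts(iss string) bool {
	for _, candidate := range s {
		if candidate == iss {
			return true
		}
	}
	return false
}

func main() {
	oldIss := "https://old-endpoint.example.com:6443"
	newIss := "https://new-endpoint.example.com:6443"

	// Current Talos behaviour: a single issuer, so old tokens are rejected.
	single := issuerSet{newIss}
	fmt.Println(single.Accepts(oldIss)) // false

	// Proposed: new issuer first (signs), old issuer kept for validation.
	migrating := issuerSet{newIss, oldIss}
	signer, _ := migrating.SigningIssuer()
	fmt.Println(signer, migrating.Accepts(oldIss), migrating.Accepts(newIss))
}
```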
Use Cases
- Load Balancer Migration: Moving from one load balancer to another
- DNS Changes: Updating cluster endpoint DNS without service interruption
- Multi-Region Setup: Supporting multiple endpoint URLs for the same cluster
- Certificate Rotation: Changing endpoint certificates with different CN/SANs
- Infrastructure Migration: Moving control plane infrastructure
Benefits
- Zero-downtime endpoint migrations
- Improved operational safety during infrastructure changes
- Alignment with Kubernetes best practices
- Enhanced cluster reliability during maintenance operations
References
- Related issue: #9609 - Unable to change cluster.endpoint without downtime
- Kubernetes docs: kube-apiserver flags
- Service Account documentation: Managing Service Accounts
Implementation Considerations
- Maintain backward compatibility with current single endpoint configuration
- Ensure proper validation of issuer URLs (HTTPS, OIDC compliance)
- Consider configuration precedence (explicit vs. derived from controlPlane.endpoint)
- Integration with existing service account key management
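The precedence and validation points could look roughly like the following Go sketch. It assumes the serviceAccountIssuers field proposed in Option 1; resolveIssuers and its rules are hypothetical, not Talos code. An explicit list takes precedence, and an empty list falls back to today's endpoint-derived issuer. Each issuer must be an absolute https URL with no query or fragment, following the OpenID Connect issuer rules. Duplicates are rejected.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveIssuers is a hypothetical sketch combining the proposed
// serviceAccountIssuers list with cluster.controlPlane.endpoint.
func resolveIssuers(explicit []string, endpoint string) ([]string, error) {
	issuers := explicit
	if len(issuers) == 0 {
		// Backward-compatible default: issuer == control plane endpoint.
		issuers = []string{endpoint}
	}

	seen := make(map[string]bool, len(issuers))
	for _, iss := range issuers {
		u, err := url.Parse(iss)
		if err != nil {
			return nil, fmt.Errorf("issuer %q: %w", iss, err)
		}
		// OIDC issuers use the https scheme and carry no query or fragment.
		if u.Scheme != "https" || u.Host == "" {
			return nil, fmt.Errorf("issuer %q: must be an absolute https URL", iss)
		}
		if u.RawQuery != "" || u.Fragment != "" {
			return nil, fmt.Errorf("issuer %q: must not contain a query or fragment", iss)
		}
		if seen[iss] {
			return nil, fmt.Errorf("issuer %q: listed more than once", iss)
		}
		seen[iss] = true
	}
	return issuers, nil
}

func main() {
	endpoint := "https://new-endpoint.example.com:6443"

	// No explicit list: a single issuer derived from the endpoint.
	derived, err := resolveIssuers(nil, endpoint)
	fmt.Println(derived, err)

	// Explicit list during a migration: new issuer first, old issuer kept.
	explicit, err := resolveIssuers([]string{endpoint, "https://old-endpoint.example.com:6443"}, endpoint)
	fmt.Println(explicit, err)

	// Rejected: plain http is not a valid OIDC issuer.
	_, err = resolveIssuers([]string{"http://old-endpoint.example.com:6443"}, endpoint)
	fmt.Println(err)
}
```

Falling back to the endpoint when the list is empty keeps existing single-endpoint configurations working unchanged, which covers the backward-compatibility point.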
This feature would significantly improve operational workflows for Talos clusters by eliminating forced downtime during common infrastructure operations while following established Kubernetes patterns.