Commit 5552337

Abhishek Kumar authored and committed
fix: relax hostname constraint to prevent scheduling deadlock in multi-zone distribution
1 parent 6b02291 commit 5552337

8 files changed: 45 additions and 20 deletions
docs/user/MultiZoneDistribution.md

Lines changed: 38 additions & 13 deletions
@@ -39,10 +39,15 @@ V4_CFG_SINGLE_ZONE_FALLBACK: true

 ### Topology Spread Constraints (Balanced Approach)
 - **Zone Distribution**: `maxSkew: 1` on `topology.kubernetes.io/zone` with `DoNotSchedule`
-- Distributes pods across zones with some tolerance for imbalance
-- Ensures scheduling reliability while maintaining zone distribution
-- **Node Distribution**: `maxSkew: 1` on `kubernetes.io/hostname` with `DoNotSchedule`
-- Spreads pods across nodes when possible
+- **Strict enforcement** at zone level to prevent concentration
+- Ensures StatefulSet replicas are distributed across availability zones
+- Primary protection against zone failures (PSCLOUD-64 resolution)
+
+- **Node Distribution**: `maxSkew: 1` on `kubernetes.io/hostname` with `ScheduleAnyway`
+- **Best-effort spreading** at node level without blocking scheduling
+- Kubernetes attempts to spread pods across different nodes when possible
+- Will not prevent pod scheduling if perfect node balance cannot be achieved
+- Prevents scheduling deadlock when combined with zone-level constraints

 ### Node Affinity (Nodepool Restriction)
 - **Required Node Affinity**: Configurable nodepool label restriction (default: `workload.sas.com/class=stateful`)
@@ -125,18 +130,25 @@ Chaos testing was performed to validate multi-zone resilience by cordoning all n

 **Validation Result**: Topology constraints working as designed

+**Production Deployment Note**:
+The hostname-level constraint uses `ScheduleAnyway` (best-effort) to ensure StatefulSets
+can schedule successfully even when perfect node-level balance is not achievable. This
+prevents scheduling deadlock while maintaining strict zone-level protection. Zone-level
+distribution remains strictly enforced with `DoNotSchedule` to prevent concentration.
+
 ### Known Limitation (By Design)

 **Complete Zone Failure Behavior**:
 - When an entire availability zone becomes unavailable (all nodes cordoned/failed), affected StatefulSet pods **cannot reschedule** to remaining zones
 - Pods remain in `Pending` state until the failed zone recovers
-- This is the intended behavior with `maxSkew: 1` + `whenUnsatisfiable: DoNotSchedule`
+- This is the intended behavior with strict zone-level constraint: `maxSkew: 1` + `whenUnsatisfiable: DoNotSchedule`

 **Why This is Acceptable**:
 1. **Primary Goal Achieved**: Prevents cross-nodepool pods from concentrating in a single zone during normal operations
 2. **Rare Scenario**: Complete zone failures are uncommon (Azure/AWS/GCP multi-zone SLA > 99.99%)
 3. **Planned Maintenance**: Production zone maintenance is typically planned, allowing for graceful pod draining
 4. **Trade-off Decision**: Temporary unavailability during zone outage vs. chronic concentration risk in normal operations
+5. **Production Safety**: Hostname-level constraint uses `ScheduleAnyway` to prevent scheduling issues during normal operations while zone-level remains strict

 **Recovery**:
 Once the zone becomes available again, pods automatically reschedule and rebalance:
@@ -147,18 +159,31 @@ kubectl uncordon <zone-nodes>

 ### Alternative Constraint Options

-If complete zone failure rescheduling is required, consider:
+If different scheduling behavior is required, consider:

-**Option A: Use `ScheduleAnyway`**
+**Option A: Strict Hostname Enforcement**
 ```yaml
-whenUnsatisfiable: ScheduleAnyway # Allows scheduling during zone failure
+whenUnsatisfiable: DoNotSchedule # For both zone AND hostname
+```
+- Warning: May cause scheduling deadlock in constrained environments
+- Only recommended for clusters with abundant stateful node capacity
+
+**Option B: Relax Zone Constraint**
+```yaml
+# Zone-level
+whenUnsatisfiable: ScheduleAnyway # Allows zone concentration
+
+# Hostname-level
+whenUnsatisfiable: ScheduleAnyway # Current: best-effort spreading
 ```
-- Warning: Weakens constraint enforcement during normal operations
+- Warning: Weakens primary PSCLOUD-64 protection
+- Not recommended for production multi-zone deployments

-**Option B: Increase `maxSkew`**
+**Option C: Increase Zone maxSkew**
 ```yaml
-maxSkew: 2 # Allows 0-2-1 distribution during zone failure
+maxSkew: 2 # Allows more imbalanced zone distribution
 ```
-- Warning: Permits less balanced distribution in normal conditions
+- Warning: Permits concentration (e.g., 0-2-1 or 1-3-2 distribution)
+- Reduces protection against zone failures

-**Current Implementation**: Uses strict enforcement (`DoNotSchedule`, `maxSkew: 1`) to prioritize prevention of zone concentration during normal operations.
+**Current Implementation (Recommended)**: Uses strict zone enforcement (`DoNotSchedule`, `maxSkew: 1`) with best-effort hostname spreading (`ScheduleAnyway`, `maxSkew: 1`) to balance zone protection with reliable scheduling.
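
For orientation, the combination described in the documentation change above might look roughly like the following when both constraints sit side by side in a pod spec. This is a minimal sketch, not the contents of any transformer in this commit: the field values (`maxSkew`, `topologyKey`, `whenUnsatisfiable`, and the `app: sas-consul-server` label) are taken from the diff, while the surrounding layout and indentation are assumptions.

```yaml
# Illustrative sketch only; layout and indentation are assumed, values come from the diff.
topologySpreadConstraints:
  # Zone level: strictly enforced, pods stay Pending rather than concentrate in one zone
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: sas-consul-server
  # Hostname level: best effort, the scheduler prefers spreading but never blocks on it
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: sas-consul-server
```

With `ScheduleAnyway`, the hostname constraint only influences node scoring, so a pod that satisfies the zone constraint can still be placed even when no node would keep the per-node skew within 1.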

roles/vdm/templates/transformers/consul-zone-distribution.yaml

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ patch: |-
 app: sas-consul-server
 - maxSkew: 1
 topologyKey: kubernetes.io/hostname
-whenUnsatisfiable: DoNotSchedule
+whenUnsatisfiable: ScheduleAnyway
 labelSelector:
 matchLabels:
 app: sas-consul-server

roles/vdm/templates/transformers/data-agent-zone-distribution.yaml

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ patch: |-
 app.kubernetes.io/name: sas-data-agent-server-colocated
 - maxSkew: 1
 topologyKey: kubernetes.io/hostname
-whenUnsatisfiable: DoNotSchedule
+whenUnsatisfiable: ScheduleAnyway
 labelSelector:
 matchLabels:
 app.kubernetes.io/name: sas-data-agent-server-colocated

roles/vdm/templates/transformers/opendistro-zone-distribution.yaml

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ patch: |-
 app.kubernetes.io/name: sas-opendistro
 - maxSkew: 1
 topologyKey: kubernetes.io/hostname
-whenUnsatisfiable: DoNotSchedule
+whenUnsatisfiable: ScheduleAnyway
 labelSelector:
 matchLabels:
 app.kubernetes.io/name: sas-opendistro

roles/vdm/templates/transformers/postgres-zone-distribution.yaml

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ patch: |-
 postgres-operator.crunchydata.com/cluster: shared-postgres
 - maxSkew: 1
 topologyKey: kubernetes.io/hostname
-whenUnsatisfiable: DoNotSchedule
+whenUnsatisfiable: ScheduleAnyway
 labelSelector:
 matchLabels:
 postgres-operator.crunchydata.com/cluster: shared-postgres

roles/vdm/templates/transformers/rabbitmq-zone-distribution.yaml

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ patch: |-
 app.kubernetes.io/name: sas-rabbitmq-server
 - maxSkew: 1
 topologyKey: kubernetes.io/hostname
-whenUnsatisfiable: DoNotSchedule
+whenUnsatisfiable: ScheduleAnyway
 labelSelector:
 matchLabels:
 app.kubernetes.io/name: sas-rabbitmq-server

roles/vdm/templates/transformers/redis-zone-distribution.yaml

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ patch: |-
 app.kubernetes.io/name: sas-redis-server
 - maxSkew: 1
 topologyKey: kubernetes.io/hostname
-whenUnsatisfiable: DoNotSchedule
+whenUnsatisfiable: ScheduleAnyway
 labelSelector:
 matchLabels:
 app.kubernetes.io/name: sas-redis-server

roles/vdm/templates/transformers/workload-orchestrator-zone-distribution.yaml

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ patch: |-
 app.kubernetes.io/name: sas-workload-orchestrator
 - maxSkew: 1
 topologyKey: kubernetes.io/hostname
-whenUnsatisfiable: DoNotSchedule
+whenUnsatisfiable: ScheduleAnyway
 labelSelector:
 matchLabels:
 app.kubernetes.io/name: sas-workload-orchestrator
