Commit 29c805f

reject mis-scheduled pod
1 parent 95584e0 commit 29c805f

1 file changed

+15
-11
lines changed
  • keps/sig-storage/5381-mutable-pv-affinity

keps/sig-storage/5381-mutable-pv-affinity/README.md

@@ -280,17 +280,6 @@ This might be a good place to talk about core concepts and how they relate.
280280
It is never re-evaluated when the pod is already running.
281281
It is storage provider's responsibility to ensure that the running workload is not interrupted.
282282

283-
**Possible race condition**
284-
285-
There is a race condition between volume modification and pod scheduling:
286-
1. User modifies the volume from storage provider.
287-
3. A new Pod is created and scheduler schedules it with the old affinity.
288-
4. User sets the new affinity to the PV.
289-
5. KCM/external-attacher attaches the volume to the node, and find the affinity mismatch.
290-
291-
If this happens, the pod will be stuck in a `ContainerCreating` state.
292-
User will have to manually delete the pod, or using Kubernetes [descheduler](https://github.com/kubernetes-sigs/descheduler) or similar.
293-
294283

295284
### Risks and Mitigations
296285

@@ -315,6 +304,21 @@ required) or even code snippets. If there's any ambiguity about HOW your
315304
proposal will be implemented, this is the place to discuss them.
316305
-->
317306

307+
### Handling race condition
308+
309+
There is a race condition between volume modification and pod scheduling:
310+
1. The user modifies the volume through the storage provider.
311+
2. A new Pod is created, and the scheduler places it using the old affinity.
312+
3. The user sets the new affinity on the PV.
313+
4. KCM/external-attacher attaches the volume to the node and finds the affinity mismatch.
314+
315+
If this happens, the pod will be stuck in the `ContainerCreating` state.
316+
Kubelet should detect this condition and reject the pod.
317+
A managing controller (for example, the StatefulSet controller) is then expected to re-create the pod so that it can be scheduled to a correct node.
318+
319+
Specifically, kubelet investigates the cause of the failure by checking the status of the underlying VolumeAttachment object.
320+
If a `FailedPrecondition` error is found and the PV's nodeAffinity does not match the current node,
321+
kubelet sets the pod phase to `Failed`.
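
The rejection check described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not kubelet code: the `Requirement` and `Term` types, the helper names, and matching the error by looking for the substring `FailedPrecondition` in the attach error message are all hypothetical simplifications. Real node affinity supports more operators than the single "label value is in a set" check modeled here.

```go
package main

import (
	"fmt"
	"strings"
)

// Requirement is a hypothetical, simplified stand-in for a node selector
// requirement with the "In" operator: the node label Key must equal one of Values.
type Requirement struct {
	Key    string
	Values []string
}

// Term is a list of requirements that must all match (logical AND).
type Term []Requirement

// termMatches reports whether every requirement in the term is satisfied
// by the node labels.
func termMatches(term Term, nodeLabels map[string]string) bool {
	for _, req := range term {
		val, ok := nodeLabels[req.Key]
		if !ok {
			return false
		}
		found := false
		for _, allowed := range req.Values {
			if allowed == val {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	return true
}

// matchesNode reports whether any term matches the node (terms are ORed).
// Assumption: a PV with no terms places no constraint on the node.
func matchesNode(terms []Term, nodeLabels map[string]string) bool {
	if len(terms) == 0 {
		return true
	}
	for _, term := range terms {
		if termMatches(term, nodeLabels) {
			return true
		}
	}
	return false
}

// shouldRejectPod returns true only when both conditions from the text hold:
// the attach error indicates FailedPrecondition, and the PV's current
// affinity does not match the node the pod was scheduled to.
func shouldRejectPod(attachErrMsg string, pvAffinity []Term, nodeLabels map[string]string) bool {
	if !strings.Contains(attachErrMsg, "FailedPrecondition") {
		return false
	}
	return !matchesNode(pvAffinity, nodeLabels)
}

func main() {
	node := map[string]string{"topology.kubernetes.io/zone": "zone-a"}
	// The PV affinity was updated to zone-b after the pod was scheduled to zone-a.
	newAffinity := []Term{{{Key: "topology.kubernetes.io/zone", Values: []string{"zone-b"}}}}

	fmt.Println(shouldRejectPod("rpc error: code = FailedPrecondition desc = volume not accessible", newAffinity, node))
	fmt.Println(shouldRejectPod("rpc error: code = Internal desc = timeout", newAffinity, node))
}
```

Requiring both conditions keeps the rejection narrow: an attach failure for an unrelated reason, or a `FailedPrecondition` on a node that still satisfies the PV's affinity, leaves the pod alone so that the normal retry path can continue.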
318322

319323
### Test Plan
320324
