@@ -280,17 +280,6 @@ This might be a good place to talk about core concepts and how they relate.
It is never re-evaluated when the pod is already running.
It is the storage provider's responsibility to ensure that the running workload is not interrupted.
- **Possible race condition**
-
- There is a race condition between volume modification and pod scheduling :
- 1. User modifies the volume from storage provider.
- 3. A new Pod is created and scheduler schedules it with the old affinity.
- 4. User sets the new affinity to the PV.
- 5. KCM/external-attacher attaches the volume to the node, and find the affinity mismatch.
-
- If this happens, the pod will be stuck in a `ContainerCreating` state.
- User will have to manually delete the pod, or using Kubernetes [descheduler](https://github.com/kubernetes-sigs/descheduler) or similar.
-
### Risks and Mitigations
@@ -315,6 +304,21 @@ required) or even code snippets. If there's any ambiguity about HOW your
proposal will be implemented, this is the place to discuss them.
-->
+ ### Handling race condition
+
+ There is a race condition between volume modification and pod scheduling:
+ 1. The user modifies the volume through the storage provider.
+ 2. A new Pod is created, and the scheduler schedules it with the old affinity.
+ 3. The user sets the new affinity on the PV.
+ 4. KCM/external-attacher attaches the volume to the node and finds the affinity mismatch.
+
+ If this happens, the pod will be stuck in the `ContainerCreating` state.
+ Kubelet should detect this condition and reject the pod.
+ A higher-level controller (e.g. the StatefulSet controller) is then expected to re-create the pod, and the new pod will be scheduled to a node that matches the new affinity.
+
+ Specifically, kubelet investigates the cause of the failure by checking the status of the underlying VolumeAttachment object.
+ If a `FailedPrecondition` error is found and the PV's nodeAffinity does not match the current node,
+ kubelet will set the pod phase to `Failed`.
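The proposed kubelet check can be sketched as follows. This is a minimal illustration, not the actual kubelet implementation: the `VolumeAttachment` and `AttachError` types and the `nodeAffinityMatches` helper are simplified stand-ins for the real API objects and affinity evaluation.

```go
package main

import "fmt"

// AttachError is a simplified stand-in for the attach error reported in
// VolumeAttachment.status; ErrorCode mirrors a gRPC-style CSI error code.
type AttachError struct {
	ErrorCode string // e.g. "FailedPrecondition"
}

// VolumeAttachment is a simplified stand-in for the real API object.
type VolumeAttachment struct {
	AttachError *AttachError
}

// nodeAffinityMatches is a hypothetical helper: the real kubelet would
// evaluate the PV's spec.nodeAffinity node selector terms against the
// node's labels. Here it is reduced to exact label matching.
func nodeAffinityMatches(required, nodeLabels map[string]string) bool {
	for k, v := range required {
		if nodeLabels[k] != v {
			return false
		}
	}
	return true
}

// shouldFailPod sketches the proposed check: fail the pod only when the
// attach failed with FailedPrecondition AND the PV's node affinity no
// longer matches the current node.
func shouldFailPod(va VolumeAttachment, pvAffinity, nodeLabels map[string]string) bool {
	if va.AttachError == nil || va.AttachError.ErrorCode != "FailedPrecondition" {
		return false
	}
	return !nodeAffinityMatches(pvAffinity, nodeLabels)
}

func main() {
	va := VolumeAttachment{AttachError: &AttachError{ErrorCode: "FailedPrecondition"}}
	pvAffinity := map[string]string{"topology.kubernetes.io/zone": "us-east-1b"}
	nodeLabels := map[string]string{"topology.kubernetes.io/zone": "us-east-1a"}
	// Affinity mismatch plus FailedPrecondition: the pod should be failed.
	fmt.Println(shouldFailPod(va, pvAffinity, nodeLabels))
}
```

With this shape, a transient attach error without an affinity mismatch does not fail the pod, which keeps the check narrowly scoped to the race described above.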
### Test Plan