@@ -73,6 +73,7 @@ SIG Architecture for cross-cutting KEPs).
  - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
+   - [Handling race condition](#handling-race-condition)
  - [Test Plan](#test-plan)
    - [Prerequisite testing updates](#prerequisite-testing-updates)
    - [Unit tests](#unit-tests)
@@ -280,17 +281,6 @@ This might be a good place to talk about core concepts and how they relate.
It is never re-evaluated when the pod is already running.
It is the storage provider's responsibility to ensure that the running workload is not interrupted.

- **Possible race condition**
-
- There is a race condition between volume modification and pod scheduling:
- 1. User modifies the volume from the storage provider.
- 2. A new Pod is created and the scheduler schedules it with the old affinity.
- 3. User sets the new affinity on the PV.
- 4. KCM/external-attacher attaches the volume to the node and finds the affinity mismatch.
-
- If this happens, the pod will be stuck in the `ContainerCreating` state.
- User will have to delete the pod manually, or use the Kubernetes [descheduler](https://github.com/kubernetes-sigs/descheduler) or similar.
-

### Risks and Mitigations
@@ -315,6 +305,21 @@ required) or even code snippets. If there's any ambiguity about HOW your
proposal will be implemented, this is the place to discuss them.
-->
+ ### Handling race condition
+
+ There is a race condition between volume modification and pod scheduling:
+ 1. User modifies the volume from the storage provider.
+ 2. A new Pod is created and the scheduler schedules it with the old affinity.
+ 3. User sets the new affinity on the PV.
+ 4. KCM/external-attacher attaches the volume to the node and finds the affinity mismatch.
+
+ If this happens, the pod will be stuck in the `ContainerCreating` state.
+ The kubelet should detect this condition and reject the pod.
+ A controller managing the pod (e.g. the StatefulSet controller) is then expected to re-create it, and the new pod will be scheduled to a correct node.
+
+ Specifically, the kubelet investigates the cause of the failure by checking the status of the underlying VolumeAttachment object.
+ If a `FailedPrecondition` error is found and the PV's `nodeAffinity` does not match the current node,
+ the kubelet will set the pod phase to `Failed`.
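+
+ A minimal Go sketch of this check is below. `shouldFailPod` is a hypothetical helper, and the assumption that the CSI `FailedPrecondition` code appears in the `VolumeAttachment`'s `attachError` message is ours; the real implementation may propagate the error code differently.
+
+ ```go
+ package kubelet
+
+ import (
+ 	"strings"
+
+ 	corev1 "k8s.io/api/core/v1"
+ 	storagev1 "k8s.io/api/storage/v1"
+ 	"k8s.io/component-helpers/scheduling/corev1/nodeaffinity"
+ )
+
+ // shouldFailPod reports whether an attach failure is the affinity-mismatch
+ // race described above: the attach failed with FailedPrecondition and the
+ // PV's nodeAffinity no longer matches the node the pod was scheduled to.
+ func shouldFailPod(va *storagev1.VolumeAttachment, pv *corev1.PersistentVolume, node *corev1.Node) (bool, error) {
+ 	// No attach error recorded; nothing to investigate.
+ 	if va.Status.AttachError == nil {
+ 		return false, nil
+ 	}
+ 	// Assumption: the external-attacher surfaces the gRPC code text in the
+ 	// recorded error message.
+ 	if !strings.Contains(va.Status.AttachError.Message, "FailedPrecondition") {
+ 		return false, nil
+ 	}
+ 	// Re-evaluate the PV's nodeAffinity against the node the pod landed on.
+ 	if pv.Spec.NodeAffinity == nil || pv.Spec.NodeAffinity.Required == nil {
+ 		return false, nil
+ 	}
+ 	selector, err := nodeaffinity.NewNodeSelector(pv.Spec.NodeAffinity.Required)
+ 	if err != nil {
+ 		return false, err
+ 	}
+ 	// A mismatch means the pod can never start on this node; failing it lets
+ 	// a higher-level controller re-create and re-schedule it.
+ 	return !selector.Match(node), nil
+ }
+ ```
+
+ When `shouldFailPod` returns true, the kubelet would set the pod's `status.phase` to `Failed` so that a managing controller can replace it.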
### Test Plan