@@ -93,6 +93,8 @@ tags, and then generate with `hack/update-toc.sh`.
  - [Risks and Mitigations](#risks-and-mitigations)
  - [Design Details](#design-details)
  - [Kubernetes Changes, Access Mode](#kubernetes-changes-access-mode)
+ - [Scheduler Enforcement](#scheduler-enforcement)
+ - [Mount Enforcement](#mount-enforcement)
  - [CSI Specification Changes, Volume Capabilities](#csi-specification-changes-volume-capabilities)
  - [Test Plan](#test-plan)
  - [Validation of PersistentVolumeSpec Object](#validation-of-persistentvolumespec-object)
@@ -385,14 +387,35 @@ access mode type if the feature gate is enabled.

  This access mode will be enforced in two places:

- - First is at the time a pod is scheduled. When scheduling a pod, if another pod
-   is found using the same PVC and the PVC uses ReadWriteOncePod, then scheduling
-   will fail and the pod will be considered unresolvable.
- - As an additional precaution this will also be enforced at the time a volume is
-   mounted for filesystem devices, and at the time a volume is mapped for block
-   devices. During the mount operation, kubelet will check the actual state of
-   the world to determine if the volume is already in-use by another pod. If it
-   is, kubelet will fail mounting with an appropriate error message.
+ #### Scheduler Enforcement
+
+ The first enforcement point is pod scheduling. When scheduling a pod, if another
+ pod is found using the same PVC and the PVC uses ReadWriteOncePod, then
+ scheduling will fail and the pod will be considered unresolvable.
+
+ To determine whether a pod using a ReadWriteOncePod PVC can be scheduled, we
+ need to enumerate all pods and check whether any are already consuming this PVC.
+ This logic will take place as part of the PreFilter extension point in the
+ [volume restrictions plugin].
+
+ The [node info cache] will be extended to map the PVC name to a reference count
+ for the PVC. In the PreFilter extension point, if the pod's PVC is using
+ ReadWriteOncePod, we will query this map on each node, checking for references
+ to the PVC of the pod being scheduled. If one is found, the pod will fail
+ scheduling and be marked unresolvable.
+
+ [volume restrictions plugin]: https://github.com/kubernetes/kubernetes/blob/v1.21.0/pkg/scheduler/framework/plugins/volumerestrictions/volume_restrictions.go#L29
+ [node info cache]: https://github.com/kubernetes/kubernetes/blob/v1.21.0/pkg/scheduler/framework/types.go#L357
+
+ #### Mount Enforcement
+
+ As an additional precaution, this will also be enforced at the time a volume is
+ mounted for filesystem devices, and at the time a volume is mapped for block
+ devices. During the mount operation, kubelet will check the [actual state of the
+ world cache] to determine if the volume is already in use by another pod. If it
+ is, kubelet will fail mounting with an appropriate error message.
+
+ [actual state of the world cache]: https://github.com/kubernetes/kubernetes/blob/v1.21.0/pkg/kubelet/volumemanager/cache/actual_state_of_world.go#L46

  ### CSI Specification Changes, Volume Capabilities

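The scheduler enforcement added in the hunk above (a per-node PVC reference count consulted during PreFilter) could look roughly like the following. This is a minimal sketch, not the actual scheduler code: `NodeInfo`, `PVCRefCounts`, `AddPod`, and `PreFilterReadWriteOncePod` are hypothetical names invented for illustration, and PVCs are keyed by an assumed `namespace/name` string.

```go
package main

import "fmt"

// NodeInfo is a hypothetical, simplified stand-in for an entry in the
// scheduler's node info cache. PVCRefCounts maps a PVC key
// ("namespace/name", an assumption of this sketch) to the number of pods
// on the node that reference that PVC.
type NodeInfo struct {
	Name         string
	PVCRefCounts map[string]int
}

// AddPod records the PVCs referenced by a pod placed on this node.
func (n *NodeInfo) AddPod(pvcKeys ...string) {
	if n.PVCRefCounts == nil {
		n.PVCRefCounts = make(map[string]int)
	}
	for _, key := range pvcKeys {
		n.PVCRefCounts[key]++
	}
}

// PreFilterReadWriteOncePod checks every node for an existing reference to
// the given ReadWriteOncePod PVC. It returns an error if any node already
// has a pod using the PVC, mirroring the "unresolvable" outcome described
// in the text.
func PreFilterReadWriteOncePod(nodes []*NodeInfo, pvcKey string) error {
	for _, n := range nodes {
		// Reading a missing key (or a nil map) yields 0 in Go.
		if n.PVCRefCounts[pvcKey] > 0 {
			return fmt.Errorf("unresolvable: ReadWriteOncePod PVC %q is already in use on node %q", pvcKey, n.Name)
		}
	}
	return nil
}
```

Keeping a count per node keeps the check proportional to the number of nodes rather than the number of pods. A count, rather than a boolean, lets the cache be decremented correctly when pods are removed.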
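The mount-time check described under Mount Enforcement can be illustrated in the same spirit. The sketch below assumes a greatly simplified actual state of the world that only tracks which pod UIDs have each volume mounted. The type and its methods (`ActualStateOfWorld`, `MarkMounted`, `CheckReadWriteOncePod`) are hypothetical and do not reflect kubelet's real interfaces. Locking is omitted for brevity.

```go
package main

import "fmt"

// ActualStateOfWorld is a hypothetical, simplified stand-in for kubelet's
// actual state of the world cache. It maps a volume name to the set of pod
// UIDs that currently have the volume mounted (or mapped, for block devices).
type ActualStateOfWorld struct {
	mountedPods map[string]map[string]bool
}

// NewActualStateOfWorld returns an empty cache.
func NewActualStateOfWorld() *ActualStateOfWorld {
	return &ActualStateOfWorld{mountedPods: make(map[string]map[string]bool)}
}

// MarkMounted records that podUID has the volume mounted.
func (a *ActualStateOfWorld) MarkMounted(volume, podUID string) {
	if a.mountedPods[volume] == nil {
		a.mountedPods[volume] = make(map[string]bool)
	}
	a.mountedPods[volume][podUID] = true
}

// CheckReadWriteOncePod returns an error if the volume is already in use by
// a pod other than podUID. A remount by the same pod is allowed.
func (a *ActualStateOfWorld) CheckReadWriteOncePod(volume, podUID string) error {
	for uid := range a.mountedPods[volume] {
		if uid != podUID {
			return fmt.Errorf("volume %q uses ReadWriteOncePod and is already in use by pod %q", volume, uid)
		}
	}
	return nil
}
```

In this sketch the check would run just before the mount or map operation, so a second pod that slipped past the scheduler would still fail with a descriptive error.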