@@ -988,7 +988,7 @@ for mount behavior (if the feature gate is enabled).
This section must be completed when targeting beta to a release.
-->

- ###### How can a rollout fail? Can it impact already running workloads?
+ ###### How can a rollout or rollback fail? Can it impact already running workloads?

<!--
Try to be as paranoid as possible - e.g., what if some components will restart
@@ -1086,19 +1086,28 @@ An operator can query for PersistentVolumeClaims and PersistentVolumes in the
cluster with the ReadWriteOncePod access mode. If any exist then the feature is
in use.
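For example, a check along the following lines lists every claim and volume requesting that access mode (assuming `kubectl` access to the cluster and `jq` on the operator's machine; the output formatting is only illustrative):

```sh
# PersistentVolumeClaims requesting ReadWriteOncePod, across all namespaces.
kubectl get pvc --all-namespaces -o json \
  | jq -r '.items[]
           | select(.spec.accessModes | index("ReadWriteOncePod"))
           | "\(.metadata.namespace)/\(.metadata.name)"'

# PersistentVolumes requesting ReadWriteOncePod.
kubectl get pv -o json \
  | jq -r '.items[] | select(.spec.accessModes | index("ReadWriteOncePod")) | .metadata.name'
```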
- ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
+ ###### How can someone using this feature know that it is working for their instance?

<!--
+ For instance, if this is a pod-related feature, it should be possible to determine if the feature is functioning properly
+ for each individual pod.
Pick one more of these and delete the rest.
+ Please describe all items visible to end users below with sufficient detail so that they can verify correct enablement
+ and operation of this feature.
+ Recall that end users cannot usually observe component logs or access metrics.
-->

- - [X] Metrics
-   - Metric name: `scheduler_unschedulable_pods{plugin="VolumeRestrictions"}`
-   - [Optional] Aggregation method:
-   - Components exposing the metric:
-     - kube-scheduler
+ - [X] Other
+   - Details:
+     - Create two Pods using the same PersistentVolumeClaim with the ReadWriteOncePod access mode (see the sketch after this list)
+     - (If cluster access available) A PersistentVolume should be created with `.status.phase=Bound`
+     - A PersistentVolumeClaim should be created with `.status.phase=Bound` and have ExternalProvisioning, Provisioning, and ProvisioningSucceeded events
+     - (If cluster access available) A VolumeAttachment should be created with `.status.attached=True`
+     - One Pod should have a SuccessfulAttachVolume event and its Ready status condition set to True
+     - The other Pod should have a PodScheduled status condition set to False with reason "Unschedulable" and FailedScheduling events
+     - The successful Pod should be able to access the volume at the provided mount path
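A minimal sketch of that two-Pod check, assuming a cluster with the feature enabled and a default StorageClass capable of dynamic provisioning (the names `rwop-claim`, `pod-a`, and `pod-b` and the pause image are only illustrative):

```sh
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rwop-claim
spec:
  accessModes: ["ReadWriteOncePod"]
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-a
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: rwop-claim
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-b
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: rwop-claim
EOF

# One Pod should become Ready; the other (assumed here to be pod-b) should stay
# Pending with FailedScheduling events and a PodScheduled condition of False/Unschedulable.
kubectl get events --field-selector involvedObject.name=pod-b
```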
- ###### What are the reasonable SLOs (Service Level Objectives) for the above SLIs?
+ ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

<!--
At a high level, this usually will be in the form of "high percentile of SLI
@@ -1127,6 +1136,18 @@ kubelet.
You may also see an increase in the `csi_sidecar_operations_seconds_bucket`
metric exported by CSI sidecars if there are issues performing CSI operations.
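As a rough check, assuming these metrics are scraped into a Prometheus server reachable at a placeholder address `$PROM_URL`, a latency percentile can be watched for regressions:

```sh
# 95th-percentile latency of CSI sidecar operations over the last 5 minutes.
curl -s "$PROM_URL/api/v1/query" \
  --data-urlencode 'query=histogram_quantile(0.95, sum by (le) (rate(csi_sidecar_operations_seconds_bucket[5m])))'
```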
+ ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
+
+ <!--
+ Pick one more of these and delete the rest.
+ -->
+
+ - [X] Metrics
+   - Metric name: `scheduler_unschedulable_pods{plugin="VolumeRestrictions"}`
+   - [Optional] Aggregation method:
+   - Components exposing the metric:
+     - kube-scheduler
+
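With the same hypothetical `$PROM_URL` Prometheus endpoint as above, the gauge can be queried directly; a persistently non-zero value indicates Pods being held back by this plugin:

```sh
# Number of Pods currently unschedulable due to the VolumeRestrictions plugin.
curl -s "$PROM_URL/api/v1/query" \
  --data-urlencode 'query=scheduler_unschedulable_pods{plugin="VolumeRestrictions"}'
```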
###### Are there any missing metrics that would be useful to have to improve observability of this feature?

<!--
@@ -1263,6 +1284,20 @@ Think through this both in small and large cases, again with respect to the

No, the solution will involve using the same ActualStateOfWorld cache in kubelet.

+ ###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
+
+ <!--
+ Focus not just on happy cases, but primarily on more pathological cases
+ (e.g. probes taking a minute instead of milliseconds, failed pods consuming resources, etc.).
+ If any of the resources can be exhausted, how this is mitigated with the existing limits
+ (e.g. pods per node) or new limits added by this KEP?
+
+ Are there any tests that were run/should be run to understand performance characteristics better
+ and validate the declared limits?
+ -->
+
+ No.
+

### Troubleshooting

<!--