@@ -801,6 +801,9 @@ A rollback might be considered if the metric `scheduler_pending_pods{queue="gate
high watermark for a long time. Unless this is intentional, it may reveal that some controllers forget
to remove the Pods' scheduling gates, which keeps them in the pending state.
+ Another indicator for rollback is the 90th-percentile value of the metric `scheduler_plugin_execution_duration_seconds{plugin="SchedulingGates"}`
+ steadily exceeding 100ms.
+
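The 90th-percentile indicator above can be checked offline from one scrape of the histogram buckets. The following is a minimal Python sketch and not part of the KEP. It assumes cumulative `(upper_bound_seconds, count)` buckets, as Prometheus histograms expose them, and interpolates linearly inside the target bucket, similar in spirit to PromQL's `histogram_quantile`. The function names and the sample bucket values are hypothetical; only the 100ms threshold comes from the text.

```python
import math

# Threshold from the text above: a p90 steadily above 100ms suggests rollback.
ROLLBACK_P90_SECONDS = 0.1


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Assumes non-negative observations (durations) and a final +Inf bucket.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be within [0, 1]")
    buckets = sorted(buckets)
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least two buckets, the last one being +Inf")

    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations recorded yet

    rank = q * total
    # Find the first bucket whose cumulative count reaches the target rank.
    for i, (upper, cumulative) in enumerate(buckets):
        if cumulative >= rank:
            break

    if math.isinf(upper):
        # The rank falls into the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]

    lower, previous = (0.0, 0) if i == 0 else buckets[i - 1]
    in_bucket = cumulative - previous
    if in_bucket == 0:
        return lower
    # Linear interpolation inside the selected bucket.
    return lower + (upper - lower) * (rank - previous) / in_bucket


def exceeds_rollback_threshold(buckets, threshold=ROLLBACK_P90_SECONDS):
    """Return True if the estimated p90 of this snapshot is above the threshold."""
    return estimate_quantile(0.9, buckets) > threshold


# Hypothetical snapshot of SchedulingGates plugin duration buckets.
sample = [(0.001, 50), (0.01, 80), (0.1, 95), (1.0, 99), (math.inf, 100)]
print(exceeds_rollback_threshold(sample))
```

A single snapshot cannot show that the threshold is exceeded steadily, so in practice the check would be repeated over a window of scrapes before considering a rollback.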
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
<!--
@@ -836,6 +839,9 @@ Node to host the Pod
- `scheduler_pending_pods{queue="gated"}` (new): the scheduler respects the Pod's present `schedulingGates`
  and hence does not schedule it
+ The metric `scheduler_plugin_execution_duration_seconds{plugin="SchedulingGates"}` provides a histogram
+ from which the Nth-percentile execution duration of the SchedulingGates plugin can be derived.
+
Moreover, to explicitly indicate a Pod's scheduling-unready state, a condition
`{type:PodScheduled, reason:SchedulingGated}` is introduced.
@@ -848,6 +854,7 @@ logs or events for this purpose.
-->
- observe a non-zero value for the metric `scheduler_pending_pods{queue="gated"}`
+ - observe entries for the metric `scheduler_plugin_execution_duration_seconds{plugin="SchedulingGates"}`
- observe a non-empty value in a Pod's `.spec.schedulingGates` field
###### How can someone using this feature know that it is working for their instance?
@@ -901,6 +908,7 @@ Pick one more of these and delete the rest.
- [x] Metrics
- Metric name: scheduler_pending_pods{queue="gated"}
+ - Metric name: scheduler_plugin_execution_duration_seconds{plugin="SchedulingGates"}
- Components exposing the metric: kube-scheduler
###### Are there any missing metrics that would be useful to have to improve observability of this feature?