@@ -52,20 +52,20 @@ for a general explanation of feature gates and how to enable and
disable them. The name of the feature gate for APF is
"APIPriorityAndFairness". This feature also involves an {{<
glossary_tooltip term_id="api-group" text="API Group" >}} with: (a) a
- `v1alpha1` version, disabled by default, and (b) `v1beta1` and
- `v1beta2` versions, enabled by default. You can disable the feature
- gate and API group beta versions by adding the following
- command-line flags to your `kube-apiserver` invocation:
+ `v1alpha1` version and a `v1beta1` version, disabled by default, and
+ (b) `v1beta2` and `v1beta3` versions, enabled by default. You can
+ disable the feature gate and API group beta versions by adding the
+ following command-line flags to your `kube-apiserver` invocation:

```shell
kube-apiserver \
--feature-gates=APIPriorityAndFairness=false \
- --runtime-config=flowcontrol.apiserver.k8s.io/v1beta1=false,flowcontrol.apiserver.k8s.io/v1beta2=false \
+ --runtime-config=flowcontrol.apiserver.k8s.io/v1beta2=false,flowcontrol.apiserver.k8s.io/v1beta3=false \
 # …and other flags as usual
```

- Alternatively, you can enable the v1alpha1 version of the API group
- with `--runtime-config=flowcontrol.apiserver.k8s.io/v1alpha1=true`.
+ Alternatively, you can enable the v1alpha1 and v1beta1 versions of the API group
+ with `--runtime-config=flowcontrol.apiserver.k8s.io/v1alpha1=true,flowcontrol.apiserver.k8s.io/v1beta1=true`.

The command-line flag `--enable-priority-and-fairness=false` will disable the
API Priority and Fairness feature, even if other flags have enabled it.
@@ -89,14 +89,21 @@ Without APF enabled, overall concurrency in the API server is limited by the
defined by these flags are summed and then the sum is divided up among a
configurable set of _priority levels_. Each incoming request is assigned to a
single priority level, and each priority level will only dispatch as many
- concurrent requests as its configuration allows.
+ concurrent requests as its particular limit allows.

The default configuration, for example, includes separate priority levels for
leader-election requests, requests from built-in controllers, and requests from
Pods. This means that an ill-behaved Pod that floods the API server with
requests cannot prevent leader election or actions by the built-in controllers
from succeeding.

+ The concurrency limits of the priority levels are periodically
+ adjusted, allowing under-utilized priority levels to temporarily lend
+ concurrency to heavily-utilized levels. These limits are based on
+ nominal limits and bounds on how much concurrency a priority level may
+ lend and how much it may borrow, all derived from the configuration
+ objects mentioned below.
+
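As a purely illustrative example (not drawn from any default configuration): if one priority level is allotted 100 seats but is using only a few of them while another level has requests waiting in its queues, some of the first level's idle seats can be lent to the busy level for a while, and they are reclaimed once demand at the lending level reappears.
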
### Seats Occupied by a Request

The above description of concurrency management is the baseline story.
@@ -187,15 +194,38 @@ A PriorityLevelConfiguration represents a single priority level. Each
PriorityLevelConfiguration has an independent limit on the number of outstanding
requests, and limitations on the number of queued requests.

- Concurrency limits for PriorityLevelConfigurations are not specified in absolute
- number of requests, but rather in "concurrency shares." The total concurrency
- limit for the API Server is distributed among the existing
- PriorityLevelConfigurations in proportion with these shares. This allows a
- cluster administrator to scale up or down the total amount of traffic to a
- server by restarting `kube-apiserver` with a different value for
- `--max-requests-inflight` (or `--max-mutating-requests-inflight`), and all
- PriorityLevelConfigurations will see their maximum allowed concurrency go up (or
- down) by the same fraction.
+ The nominal concurrency limit for a PriorityLevelConfiguration is not
+ specified in an absolute number of seats, but rather in "nominal
+ concurrency shares." The total concurrency limit for the API Server is
+ distributed among the existing PriorityLevelConfigurations in
+ proportion to these shares, to give each level its nominal limit in
+ terms of seats. This allows a cluster administrator to scale up or
+ down the total amount of traffic to a server by restarting
+ `kube-apiserver` with a different value for `--max-requests-inflight`
+ (or `--max-mutating-requests-inflight`), and all
+ PriorityLevelConfigurations will see their maximum allowed concurrency
+ go up (or down) by the same fraction.
+
+ {{< caution >}}
+ In the versions before `v1beta3` the relevant
+ PriorityLevelConfiguration field is named "assured concurrency shares"
+ rather than "nominal concurrency shares". Also, in Kubernetes release
+ 1.25 and earlier there were no periodic adjustments: the
+ nominal/assured limits were always applied without adjustment.
+ {{< /caution >}}
+
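To make the proportional distribution concrete, here is a minimal sketch in Go. It is not the apiserver's actual code; the function name, the example shares, and the exact rounding are illustrative assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// nominalLimits splits a server-wide concurrency limit (in seats) among
// priority levels in proportion to their nominal concurrency shares.
// Illustrative only: the real apiserver applies its own rounding and
// special cases (for example, the exempt priority level).
func nominalLimits(totalSeats int, shares map[string]int) map[string]int {
	sum := 0
	for _, s := range shares {
		sum += s
	}
	limits := make(map[string]int, len(shares))
	for name, s := range shares {
		limits[name] = int(math.Ceil(float64(totalSeats) * float64(s) / float64(sum)))
	}
	return limits
}

func main() {
	// Hypothetical flags: --max-requests-inflight=400 and
	// --max-mutating-requests-inflight=200; with APF enabled the two
	// limits are summed into a single pool of seats.
	totalSeats := 400 + 200

	// Hypothetical nominal concurrency shares for three priority levels.
	shares := map[string]int{"leader-election": 10, "workload-high": 40, "workload-low": 100}

	fmt.Println(nominalLimits(totalSeats, shares))
	// map[leader-election:40 workload-high:160 workload-low:400]
}
```

Restarting the server with a larger or smaller total simply rescales every level's nominal limit by the same factor, which is the property described above.
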
+ The bounds on how much concurrency a priority level may lend and how
+ much it may borrow are expressed in the PriorityLevelConfiguration as
+ percentages of the level's nominal limit. These are resolved to
+ absolute numbers of seats by multiplying with the nominal limit /
+ 100.0 and rounding. The dynamically adjusted concurrency limit of a
+ priority level is constrained to lie between (a) a lower bound of its
+ nominal limit minus its lendable seats and (b) an upper bound of its
+ nominal limit plus the seats it may borrow. At each adjustment the
+ dynamic limits are derived by each priority level reclaiming any lent
+ seats for which demand recently appeared and then jointly fairly
+ responding to the recent seat demand on the priority levels, within
+ the bounds just described.
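
As a sketch of the arithmetic just described, using illustrative names and nearest-seat rounding rather than the apiserver's actual implementation, the bounds and the clamping of an adjusted limit could look like this:

```go
package main

import (
	"fmt"
	"math"
)

// bounds derives a priority level's lower and upper concurrency bounds
// from its nominal limit (in seats) and the percentages of that limit
// it may lend and borrow, as described above.
func bounds(nominal, lendablePercent, borrowingLimitPercent int) (lower, upper int) {
	lendable := int(math.Round(float64(nominal) * float64(lendablePercent) / 100.0))
	borrowable := int(math.Round(float64(nominal) * float64(borrowingLimitPercent) / 100.0))
	return nominal - lendable, nominal + borrowable
}

// clamp constrains a proposed dynamic concurrency limit to [lower, upper].
func clamp(proposed, lower, upper int) int {
	if proposed < lower {
		return lower
	}
	if proposed > upper {
		return upper
	}
	return proposed
}

func main() {
	// Hypothetical level: nominal limit of 160 seats, may lend up to 25%
	// of it and borrow up to 50% of it.
	lower, upper := bounds(160, 25, 50)
	fmt.Println(lower, upper)             // 120 240
	fmt.Println(clamp(300, lower, upper)) // 240: cannot borrow beyond the upper bound
	fmt.Println(clamp(90, lower, upper))  // 120: never drops below nominal minus lendable
}
```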

{{< caution >}}
With the Priority and Fairness feature enabled, the total concurrency limit for
@@ -606,10 +636,55 @@ poorly-behaved workloads that may be harming system health.
to increase that PriorityLevelConfiguration's concurrency shares.
{{< /note >}}

- * `apiserver_flowcontrol_request_concurrency_limit` is a gauge vector
- holding the computed concurrency limit (based on the API server's
- total concurrency limit and PriorityLevelConfigurations' concurrency
- shares), broken down by the label `priority_level`.
+ * `apiserver_flowcontrol_request_concurrency_limit` is the same as
+ `apiserver_flowcontrol_nominal_limit_seats`. Before the
+ introduction of concurrency borrowing between priority levels, this
+ was always equal to `apiserver_flowcontrol_current_limit_seats`
+ (which did not exist as a distinct metric).
+
+ * `apiserver_flowcontrol_nominal_limit_seats` is a gauge vector
+ holding each priority level's nominal concurrency limit, computed
+ from the API server's total concurrency limit and the priority
+ level's configured nominal concurrency shares.
+
+ * `apiserver_flowcontrol_lower_limit_seats` is a gauge vector holding
+ the lower bound on each priority level's dynamic concurrency limit.
+
+ * `apiserver_flowcontrol_upper_limit_seats` is a gauge vector holding
+ the upper bound on each priority level's dynamic concurrency limit.
+
+ * `apiserver_flowcontrol_demand_seats` is a histogram vector counting
+ observations, at the end of every nanosecond, of each priority
+ level's ratio of (seat demand) / (nominal concurrency limit). A
+ priority level's seat demand is the sum, over both queued requests
+ and those in the initial phase of execution, of the maximum of the
+ number of seats occupied in the request's initial and final
+ execution phases.
+
+ * `apiserver_flowcontrol_demand_seats_high_watermark` is a gauge vector
+ holding, for each priority level, the maximum seat demand seen
+ during the last concurrency borrowing adjustment period.
+
+ * `apiserver_flowcontrol_demand_seats_average` is a gauge vector
+ holding, for each priority level, the time-weighted average seat
+ demand seen during the last concurrency borrowing adjustment period.
+
+ * `apiserver_flowcontrol_demand_seats_stdev` is a gauge vector
+ holding, for each priority level, the time-weighted population
+ standard deviation of seat demand seen during the last concurrency
+ borrowing adjustment period.
+
+ * `apiserver_flowcontrol_target_seats` is a gauge vector holding, for
+ each priority level, the concurrency target going into the borrowing
+ allocation problem.
+
+ * `apiserver_flowcontrol_seat_fair_frac` is a gauge holding the fair
+ allocation fraction determined in the last borrowing adjustment.
+
+ * `apiserver_flowcontrol_current_limit_seats` is a gauge vector
+ holding, for each priority level, the dynamic concurrency limit
+ derived in the last adjustment.
+

* `apiserver_flowcontrol_request_wait_duration_seconds` is a histogram
vector of how long requests spent queued, broken down by the labels