
Commit 3801b57

Merge pull request #38016 from MikeSpreitzer/add-borrowing-metrics
Update APF doc to track introduction of borrowing
2 parents a00215d + 5ce3dcf commit 3801b57


content/en/docs/concepts/cluster-administration/flow-control.md

Lines changed: 96 additions & 21 deletions
@@ -52,20 +52,20 @@ for a general explanation of feature gates and how to enable and
 disable them. The name of the feature gate for APF is
 "APIPriorityAndFairness". This feature also involves an {{<
 glossary_tooltip term_id="api-group" text="API Group" >}} with: (a) a
-`v1alpha1` version, disabled by default, and (b) `v1beta1` and
-`v1beta2` versions, enabled by default. You can disable the feature
-gate and API group beta versions by adding the following
-command-line flags to your `kube-apiserver` invocation:
+`v1alpha1` version and a `v1beta1` version, disabled by default, and
+(b) `v1beta2` and `v1beta3` versions, enabled by default. You can
+disable the feature gate and API group beta versions by adding the
+following command-line flags to your `kube-apiserver` invocation:
 
 ```shell
 kube-apiserver \
 --feature-gates=APIPriorityAndFairness=false \
---runtime-config=flowcontrol.apiserver.k8s.io/v1beta1=false,flowcontrol.apiserver.k8s.io/v1beta2=false \
+--runtime-config=flowcontrol.apiserver.k8s.io/v1beta2=false,flowcontrol.apiserver.k8s.io/v1beta3=false \
 # …and other flags as usual
 ```
 
-Alternatively, you can enable the v1alpha1 version of the API group
-with `--runtime-config=flowcontrol.apiserver.k8s.io/v1alpha1=true`.
+Alternatively, you can enable the v1alpha1 and v1beta1 versions of the API group
+with `--runtime-config=flowcontrol.apiserver.k8s.io/v1alpha1=true,flowcontrol.apiserver.k8s.io/v1beta1=true`.
 
 The command-line flag `--enable-priority-and-fairness=false` will disable the
 API Priority and Fairness feature, even if other flags have enabled it.
@@ -89,14 +89,21 @@ Without APF enabled, overall concurrency in the API server is limited by the
 defined by these flags are summed and then the sum is divided up among a
 configurable set of _priority levels_. Each incoming request is assigned to a
 single priority level, and each priority level will only dispatch as many
-concurrent requests as its configuration allows.
+concurrent requests as its particular limit allows.
 
 The default configuration, for example, includes separate priority levels for
 leader-election requests, requests from built-in controllers, and requests from
 Pods. This means that an ill-behaved Pod that floods the API server with
 requests cannot prevent leader election or actions by the built-in controllers
 from succeeding.
 
+The concurrency limits of the priority levels are periodically
+adjusted, allowing under-utilized priority levels to temporarily lend
+concurrency to heavily-utilized levels. These limits are based on
+nominal limits and bounds on how much concurrency a priority level may
+lend and how much it may borrow, all derived from the configuration
+objects mentioned below.
+
 ### Seats Occupied by a Request
 
 The above description of concurrency management is the baseline story.
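To make the apportionment in this hunk concrete, here is a minimal Go sketch of dividing a server's total concurrency limit among priority levels in proportion to their shares. The names, the example numbers, and the truncating rounding are illustrative assumptions, not the actual kube-apiserver code, which also ensures the rounded per-level seats sum back to the total.

```go
package main

import "fmt"

// nominalLimits splits the server's total concurrency limit among
// priority levels in proportion to their nominal concurrency shares.
func nominalLimits(totalSeats int, shares map[string]int) map[string]int {
	sum := 0
	for _, s := range shares {
		sum += s
	}
	limits := make(map[string]int, len(shares))
	for name, s := range shares {
		limits[name] = totalSeats * s / sum // truncating division, for illustration
	}
	return limits
}

func main() {
	// Hypothetical server with 600 total seats and three levels.
	fmt.Println(nominalLimits(600, map[string]int{
		"leader-election": 10,
		"workload-high":   40,
		"workload-low":    100,
	}))
	// map[leader-election:40 workload-high:160 workload-low:400]
}
```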
@@ -187,15 +194,38 @@ A PriorityLevelConfiguration represents a single priority level. Each
 PriorityLevelConfiguration has an independent limit on the number of outstanding
 requests, and limitations on the number of queued requests.
 
-Concurrency limits for PriorityLevelConfigurations are not specified in absolute
-number of requests, but rather in "concurrency shares." The total concurrency
-limit for the API Server is distributed among the existing
-PriorityLevelConfigurations in proportion with these shares. This allows a
-cluster administrator to scale up or down the total amount of traffic to a
-server by restarting `kube-apiserver` with a different value for
-`--max-requests-inflight` (or `--max-mutating-requests-inflight`), and all
-PriorityLevelConfigurations will see their maximum allowed concurrency go up (or
-down) by the same fraction.
+The nominal concurrency limit for a PriorityLevelConfiguration is not
+specified in an absolute number of seats, but rather in "nominal
+concurrency shares." The total concurrency limit for the API Server is
+distributed among the existing PriorityLevelConfigurations in
+proportion to these shares, to give each level its nominal limit in
+terms of seats. This allows a cluster administrator to scale up or
+down the total amount of traffic to a server by restarting
+`kube-apiserver` with a different value for `--max-requests-inflight`
+(or `--max-mutating-requests-inflight`), and all
+PriorityLevelConfigurations will see their maximum allowed concurrency
+go up (or down) by the same fraction.
+
+{{< caution >}}
+In the versions before `v1beta3` the relevant
+PriorityLevelConfiguration field is named "assured concurrency shares"
+rather than "nominal concurrency shares". Also, in Kubernetes release
+1.25 and earlier there were no periodic adjustments: the
+nominal/assured limits were always applied without adjustment.
+{{< /caution >}}
+
+The bounds on how much concurrency a priority level may lend and how
+much it may borrow are expressed in the PriorityLevelConfiguration as
+percentages of the level's nominal limit. These are resolved to
+absolute numbers of seats by multiplying with the nominal limit /
+100.0 and rounding. The dynamically adjusted concurrency limit of a
+priority level is constrained to lie between (a) a lower bound of its
+nominal limit minus its lendable seats and (b) an upper bound of its
+nominal limit plus the seats it may borrow. At each adjustment the
+dynamic limits are derived by each priority level reclaiming any lent
+seats for which demand recently appeared and then jointly fairly
+responding to the recent seat demand on the priority levels, within
+the bounds just described.
 
 {{< caution >}}
 With the Priority and Fairness feature enabled, the total concurrency limit for
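The lend/borrow arithmetic added in this hunk can be sketched in a few lines of Go. This assumes the percentages arrive as integers and uses round-half-up; the function names are hypothetical, the parameters merely echo the percentages described above, and this is not the apiserver's actual implementation.

```go
package main

import "fmt"

// seatsFromPercent resolves a percentage of the nominal limit to an
// absolute number of seats, as the text describes: multiply by the
// nominal limit / 100.0 and round.
func seatsFromPercent(nominal, percent int) int {
	return int(float64(nominal)*float64(percent)/100.0 + 0.5)
}

// dynamicLimitBounds returns the lower and upper bounds between which
// a priority level's dynamically adjusted concurrency limit must lie:
// nominal minus lendable seats, and nominal plus borrowable seats.
func dynamicLimitBounds(nominal, lendablePercent, borrowingLimitPercent int) (lower, upper int) {
	lendable := seatsFromPercent(nominal, lendablePercent)
	borrowable := seatsFromPercent(nominal, borrowingLimitPercent)
	return nominal - lendable, nominal + borrowable
}

func main() {
	// Hypothetical level: nominal limit of 30 seats, allowed to lend up
	// to 50% of them and to borrow up to 100% more.
	lower, upper := dynamicLimitBounds(30, 50, 100)
	fmt.Printf("dynamic limit must stay in [%d, %d]\n", lower, upper) // [15, 60]
}
```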
@@ -606,10 +636,55 @@ poorly-behaved workloads that may be harming system health.
 to increase that PriorityLevelConfiguration's concurrency shares.
 {{< /note >}}
 
-* `apiserver_flowcontrol_request_concurrency_limit` is a gauge vector
-  holding the computed concurrency limit (based on the API server's
-  total concurrency limit and PriorityLevelConfigurations' concurrency
-  shares), broken down by the label `priority_level`.
+* `apiserver_flowcontrol_request_concurrency_limit` is the same as
+  `apiserver_flowcontrol_nominal_limit_seats`. Before the
+  introduction of concurrency borrowing between priority levels, this
+  was always equal to `apiserver_flowcontrol_current_limit_seats`
+  (which did not exist as a distinct metric).
+
+* `apiserver_flowcontrol_nominal_limit_seats` is a gauge vector
+  holding each priority level's nominal concurrency limit, computed
+  from the API server's total concurrency limit and the priority
+  level's configured nominal concurrency shares.
+
+* `apiserver_flowcontrol_lower_limit_seats` is a gauge vector holding
+  the lower bound on each priority level's dynamic concurrency limit.
+
+* `apiserver_flowcontrol_upper_limit_seats` is a gauge vector holding
+  the upper bound on each priority level's dynamic concurrency limit.
+
+* `apiserver_flowcontrol_demand_seats` is a histogram vector counting
+  observations, at the end of every nanosecond, of each priority
+  level's ratio of (seat demand) / (nominal concurrency limit). A
+  priority level's seat demand is the sum, over both queued requests
+  and those in the initial phase of execution, of the maximum of the
+  number of seats occupied in the request's initial and final
+  execution phases.
+
+* `apiserver_flowcontrol_demand_seats_high_watermark` is a gauge vector
+  holding, for each priority level, the maximum seat demand seen
+  during the last concurrency borrowing adjustment period.
+
+* `apiserver_flowcontrol_demand_seats_average` is a gauge vector
+  holding, for each priority level, the time-weighted average seat
+  demand seen during the last concurrency borrowing adjustment period.
+
+* `apiserver_flowcontrol_demand_seats_stdev` is a gauge vector
+  holding, for each priority level, the time-weighted population
+  standard deviation of seat demand seen during the last concurrency
+  borrowing adjustment period.
+
+* `apiserver_flowcontrol_target_seats` is a gauge vector holding, for
+  each priority level, the concurrency target going into the borrowing
+  allocation problem.
+
+* `apiserver_flowcontrol_seat_fair_frac` is a gauge holding the fair
+  allocation fraction determined in the last borrowing adjustment.
+
+* `apiserver_flowcontrol_current_limit_seats` is a gauge vector
+  holding, for each priority level, the dynamic concurrency limit
+  derived in the last adjustment.
+
 
 * `apiserver_flowcontrol_request_wait_duration_seconds` is a histogram
   vector of how long requests spent queued, broken down by the labels
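As a rough illustration of what the `apiserver_flowcontrol_demand_seats_average` and `apiserver_flowcontrol_demand_seats_stdev` gauges added above report, here is a Go sketch of a time-weighted average and population standard deviation over one adjustment period. The `sample` type and the numbers are invented for the example; the real metrics are computed inside kube-apiserver from its own fine-grained observations.

```go
package main

import (
	"fmt"
	"math"
)

// sample is one interval during which seat demand held a constant value
// (a hypothetical type for this illustration).
type sample struct {
	demand   float64 // seat demand during the interval
	duration float64 // interval length, in seconds
}

// timeWeightedStats returns the time-weighted average and population
// standard deviation of seat demand over one adjustment period.
func timeWeightedStats(samples []sample) (avg, stdev float64) {
	var total, weighted float64
	for _, s := range samples {
		total += s.duration
		weighted += s.demand * s.duration
	}
	avg = weighted / total
	var variance float64
	for _, s := range samples {
		d := s.demand - avg
		variance += s.duration * d * d
	}
	variance /= total
	return avg, math.Sqrt(variance)
}

func main() {
	// Demand held at 4 seats for 8s, then spiked to 20 seats for 2s.
	avg, stdev := timeWeightedStats([]sample{{4, 8}, {20, 2}})
	fmt.Printf("average=%.1f stdev=%.1f\n", avg, stdev) // average=7.2 stdev=6.4
}
```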
