@@ -52,20 +52,20 @@ for a general explanation of feature gates and how to enable and
disable them. The name of the feature gate for APF is
"APIPriorityAndFairness". This feature also involves an {{<
glossary_tooltip term_id="api-group" text="API Group" >}} with: (a) a
- `v1alpha1` version, disabled by default, and (b) `v1beta1` and
- `v1beta2` versions, enabled by default. You can disable the feature
- gate and API group beta versions by adding the following
- command-line flags to your `kube-apiserver` invocation:
+ `v1alpha1` version and a `v1beta1` version, disabled by default, and
+ (b) `v1beta2` and `v1beta3` versions, enabled by default. You can
+ disable the feature gate and API group beta versions by adding the
+ following command-line flags to your `kube-apiserver` invocation:

```shell
kube-apiserver \
--feature-gates=APIPriorityAndFairness=false \
- --runtime-config=flowcontrol.apiserver.k8s.io/v1beta1=false,flowcontrol.apiserver.k8s.io/v1beta2=false \
+ --runtime-config=flowcontrol.apiserver.k8s.io/v1beta2=false,flowcontrol.apiserver.k8s.io/v1beta3=false \
 # …and other flags as usual
```

- Alternatively, you can enable the v1alpha1 version of the API group
- with `--runtime-config=flowcontrol.apiserver.k8s.io/v1alpha1=true`.
+ Alternatively, you can enable the v1alpha1 and v1beta1 versions of the API group
+ with `--runtime-config=flowcontrol.apiserver.k8s.io/v1alpha1=true,flowcontrol.apiserver.k8s.io/v1beta1=true`.

The command-line flag `--enable-priority-and-fairness=false` will disable the
API Priority and Fairness feature, even if other flags have enabled it.
@@ -89,14 +89,21 @@ Without APF enabled, overall concurrency in the API server is limited by the
defined by these flags are summed and then the sum is divided up among a
configurable set of _priority levels_. Each incoming request is assigned to a
single priority level, and each priority level will only dispatch as many
- concurrent requests as its configuration allows.
+ concurrent requests as its particular limit allows.

The default configuration, for example, includes separate priority levels for
leader-election requests, requests from built-in controllers, and requests from
Pods. This means that an ill-behaved Pod that floods the API server with
requests cannot prevent leader election or actions by the built-in controllers
from succeeding.

+ The concurrency limits of the priority levels are periodically
+ adjusted, allowing under-utilized priority levels to temporarily lend
+ concurrency to heavily-utilized levels. These limits are based on
+ nominal limits and bounds on how much concurrency a priority level may
+ lend and how much it may borrow, all derived from the configuration
+ objects mentioned below.
+
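As a purely illustrative example (not drawn from any default configuration): if one priority level is allotted 100 seats but is using only a few of them while another level has requests waiting in its queues, some of the first level's idle seats can be lent to the busy level for a while, and they are reclaimed once demand at the lending level reappears.
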
### Seats Occupied by a Request

The above description of concurrency management is the baseline story.
@@ -187,15 +194,38 @@ A PriorityLevelConfiguration represents a single priority level. Each
PriorityLevelConfiguration has an independent limit on the number of outstanding
requests, and limitations on the number of queued requests.

- Concurrency limits for PriorityLevelConfigurations are not specified in absolute
- number of requests, but rather in "concurrency shares." The total concurrency
- limit for the API Server is distributed among the existing
- PriorityLevelConfigurations in proportion with these shares. This allows a
- cluster administrator to scale up or down the total amount of traffic to a
- server by restarting `kube-apiserver` with a different value for
- `--max-requests-inflight` (or `--max-mutating-requests-inflight`), and all
- PriorityLevelConfigurations will see their maximum allowed concurrency go up (or
- down) by the same fraction.
+ The nominal concurrency limit for a PriorityLevelConfiguration is not
+ specified in an absolute number of seats, but rather in "nominal
+ concurrency shares." The total concurrency limit for the API Server is
+ distributed among the existing PriorityLevelConfigurations in
+ proportion to these shares, to give each level its nominal limit in
+ terms of seats. This allows a cluster administrator to scale up or
+ down the total amount of traffic to a server by restarting
+ `kube-apiserver` with a different value for `--max-requests-inflight`
+ (or `--max-mutating-requests-inflight`), and all
+ PriorityLevelConfigurations will see their maximum allowed concurrency
+ go up (or down) by the same fraction.
+
+ {{< caution >}}
+ In the versions before `v1beta3` the relevant
+ PriorityLevelConfiguration field is named "assured concurrency shares"
+ rather than "nominal concurrency shares". Also, in Kubernetes release
+ 1.25 and earlier there were no periodic adjustments: the
+ nominal/assured limits were always applied without adjustment.
+ {{< /caution >}}
+
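To make the proportional distribution concrete, here is a minimal sketch in Go. It is not the apiserver's actual code; the function name, the example shares, and the exact rounding are illustrative assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// nominalLimits splits a server-wide concurrency limit (in seats) among
// priority levels in proportion to their nominal concurrency shares.
// Illustrative only: the real apiserver applies its own rounding and
// special cases (for example, the exempt priority level).
func nominalLimits(totalSeats int, shares map[string]int) map[string]int {
	sum := 0
	for _, s := range shares {
		sum += s
	}
	limits := make(map[string]int, len(shares))
	for name, s := range shares {
		limits[name] = int(math.Ceil(float64(totalSeats) * float64(s) / float64(sum)))
	}
	return limits
}

func main() {
	// Hypothetical flags: --max-requests-inflight=400 and
	// --max-mutating-requests-inflight=200; with APF enabled the two
	// limits are summed into a single pool of seats.
	totalSeats := 400 + 200

	// Hypothetical nominal concurrency shares for three priority levels.
	shares := map[string]int{"leader-election": 10, "workload-high": 40, "workload-low": 100}

	fmt.Println(nominalLimits(totalSeats, shares))
	// map[leader-election:40 workload-high:160 workload-low:400]
}
```

Restarting the server with a larger or smaller total simply rescales every level's nominal limit by the same factor, which is the property described above.
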
+ The bounds on how much concurrency a priority level may lend and how
+ much it may borrow are expressed in the PriorityLevelConfiguration as
+ percentages of the level's nominal limit. These are resolved to
+ absolute numbers of seats by multiplying with the nominal limit /
+ 100.0 and rounding. The dynamically adjusted concurrency limit of a
+ priority level is constrained to lie between (a) a lower bound of its
+ nominal limit minus its lendable seats and (b) an upper bound of its
+ nominal limit plus the seats it may borrow. At each adjustment the
+ dynamic limits are derived by each priority level reclaiming any lent
+ seats for which demand recently appeared and then jointly fairly
+ responding to the recent seat demand on the priority levels, within
+ the bounds just described.
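
As a sketch of the arithmetic just described, using illustrative names and nearest-seat rounding rather than the apiserver's actual implementation, the bounds and the clamping of an adjusted limit could look like this:

```go
package main

import (
	"fmt"
	"math"
)

// bounds derives a priority level's lower and upper concurrency bounds
// from its nominal limit (in seats) and the percentages of that limit
// it may lend and borrow, as described above.
func bounds(nominal, lendablePercent, borrowingLimitPercent int) (lower, upper int) {
	lendable := int(math.Round(float64(nominal) * float64(lendablePercent) / 100.0))
	borrowable := int(math.Round(float64(nominal) * float64(borrowingLimitPercent) / 100.0))
	return nominal - lendable, nominal + borrowable
}

// clamp constrains a proposed dynamic concurrency limit to [lower, upper].
func clamp(proposed, lower, upper int) int {
	if proposed < lower {
		return lower
	}
	if proposed > upper {
		return upper
	}
	return proposed
}

func main() {
	// Hypothetical level: nominal limit of 160 seats, may lend up to 25%
	// of it and borrow up to 50% of it.
	lower, upper := bounds(160, 25, 50)
	fmt.Println(lower, upper)             // 120 240
	fmt.Println(clamp(300, lower, upper)) // 240: cannot borrow beyond the upper bound
	fmt.Println(clamp(90, lower, upper))  // 120: never drops below nominal minus lendable
}
```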

{{< caution >}}
With the Priority and Fairness feature enabled, the total concurrency limit for
@@ -606,10 +636,55 @@ poorly-behaved workloads that may be harming system health.
to increase that PriorityLevelConfiguration's concurrency shares.
{{< /note >}}

- * `apiserver_flowcontrol_request_concurrency_limit` is a gauge vector
- holding the computed concurrency limit (based on the API server's
- total concurrency limit and PriorityLevelConfigurations' concurrency
- shares), broken down by the label `priority_level`.
+ * `apiserver_flowcontrol_request_concurrency_limit` is the same as
+ `apiserver_flowcontrol_nominal_limit_seats`. Before the
+ introduction of concurrency borrowing between priority levels, this
+ was always equal to `apiserver_flowcontrol_current_limit_seats`
+ (which did not exist as a distinct metric).
+
+ * `apiserver_flowcontrol_nominal_limit_seats` is a gauge vector
+ holding each priority level's nominal concurrency limit, computed
+ from the API server's total concurrency limit and the priority
+ level's configured nominal concurrency shares.
+
+ * `apiserver_flowcontrol_lower_limit_seats` is a gauge vector holding
+ the lower bound on each priority level's dynamic concurrency limit.
+
+ * `apiserver_flowcontrol_upper_limit_seats` is a gauge vector holding
+ the upper bound on each priority level's dynamic concurrency limit.
+
+ * `apiserver_flowcontrol_demand_seats` is a histogram vector counting
+ observations, at the end of every nanosecond, of each priority
+ level's ratio of (seat demand) / (nominal concurrency limit). A
+ priority level's seat demand is the sum, over both queued requests
+ and those in the initial phase of execution, of the maximum of the
+ number of seats occupied in the request's initial and final
+ execution phases.
+
+ * `apiserver_flowcontrol_demand_seats_high_watermark` is a gauge vector
+ holding, for each priority level, the maximum seat demand seen
+ during the last concurrency borrowing adjustment period.
+
+ * `apiserver_flowcontrol_demand_seats_average` is a gauge vector
+ holding, for each priority level, the time-weighted average seat
+ demand seen during the last concurrency borrowing adjustment period.
+
+ * `apiserver_flowcontrol_demand_seats_stdev` is a gauge vector
+ holding, for each priority level, the time-weighted population
+ standard deviation of seat demand seen during the last concurrency
+ borrowing adjustment period.
+
+ * `apiserver_flowcontrol_target_seats` is a gauge vector holding, for
+ each priority level, the concurrency target going into the borrowing
+ allocation problem.
+
+ * `apiserver_flowcontrol_seat_fair_frac` is a gauge holding the fair
+ allocation fraction determined in the last borrowing adjustment.
+
+ * `apiserver_flowcontrol_current_limit_seats` is a gauge vector
+ holding, for each priority level, the dynamic concurrency limit
+ derived in the last adjustment.
+

* `apiserver_flowcontrol_request_wait_duration_seconds` is a histogram
vector of how long requests spent queued, broken down by the labels