Commit 0290d41

Update APF for borrowing by exempt priority levels
1 parent e7b7edb commit 0290d41

keps/sig-api-machinery/1040-priority-and-fairness/README.md

Lines changed: 206 additions & 74 deletions
@@ -641,20 +641,24 @@ queue has a chance of eventually getting useful work done.
641641

642642
Requests of an exempt priority are never held up in a queue; they are
643643
always dispatched immediately. Following is how the other requests
644-
are dispatched at a given apiserver.
644+
are dispatched at a given apiserver. Note that the dispatching of
645+
exempt requests can affect the dispatching of non-exempt requests,
646+
through borrowing of concurrency allocations.
645647

646648
As mentioned [above](#non-goals), the functionality described here
647649
operates independently in each apiserver.
648650

649-
The concurrency limit of an apiserver is divided among the non-exempt
650-
priority levels, and they can do a limited amount of borrowing from
651-
each other.
651+
The concurrency limit of an apiserver is divided among all the
652+
priority levels, exempt as well as non-exempt. There is a nominal
653+
division according to configuration, and a limited amount of dynamic
654+
borrowing between priority levels that responds to recent load.
652655

653656
Two fields of `LimitedPriorityLevelConfiguration`, introduced in the
654-
midst of the `v1beta2` lifetime, limit the borrowing. The fields are
655-
added in all the versions (`v1alpha1`, `v1beta1`, and `v1beta2`). The
656-
following display shows the new fields along with the updated
657-
description for the `AssuredConcurrencyShares` field, in `v1beta2`.
657+
midst of the `v1beta2` lifetime, limit the borrowing by and from
658+
non-exempt priority levels. The fields are added in all the versions
659+
(`v1alpha1`, `v1beta1`, `v1beta2`, and `v1beta3`). The following
660+
display shows the new fields along with the updated description for
661+
the `AssuredConcurrencyShares` field, in `v1beta2`.
658662

659663
```go
660664
type LimitedPriorityLevelConfiguration struct {
@@ -687,7 +691,7 @@ type LimitedPriorityLevelConfiguration struct {
687691
//
688692
// +optional
689693
LendablePercent int32
690-
694+
691695
// `borrowingLimitPercent`, if present, specifies a limit on how many seats
692696
// this priority level can borrow from other priority levels. The limit
693697
// is known as this level's BorrowingConcurrencyLimit (BorrowingCL) and
@@ -713,6 +717,74 @@ existing systems will be more continuous if we keep the meaning of
713717
`AssuredConcurrencyShares` has been renamed to
714718
`NominalConcurrencyShares`.
715719

720+
In the midst of the `v1beta3` lifetime a field was added to
721+
`PriorityLevelConfigurationSpec` to make it possible to specify the
722+
`NominalConcurrencyShares` and `LendablePercent` of exempt priority
723+
levels. That field is shown next. Also, the definition of `sum_acs`
724+
in `LimitedPriorityLevelConfiguration` was updated to sum over all
725+
priority levels rather than just the non-exempt ones. Before this
726+
change, the exempt priority levels did not get any nominal concurrency
727+
allocation nor lending limit and did not participate in borrowing ---
728+
they simply had unlimited dispatching and that had no relation to
729+
the dispatching for non-exempt priority levels.
730+
731+
```go
732+
// `exempt` specifies how requests are handled for an exempt priority level.
733+
// This field MUST be empty if `type` is `"Limited"`.
734+
// This field MAY be non-empty if `type` is `"Exempt"`.
735+
// If empty and `type` is `"Exempt"` then the default values
736+
// for `ExemptPriorityLevelConfiguration` apply.
737+
// +optional
738+
Exempt *ExemptPriorityLevelConfiguration
739+
```
740+
741+
At the same time, the relevant new datatype was added. It is shown below.
742+
743+
```go
744+
type ExemptPriorityLevelConfiguration struct {
745+
// `nominalConcurrencyShares` (NCS) contributes to the computation of the
746+
// NominalConcurrencyLimit (NominalCL) of this level.
747+
// This is the number of execution seats nominally reserved for this priority level.
748+
// This DOES NOT limit the dispatching from this priority level
749+
// but affects the other priority levels through the borrowing mechanism.
750+
// The server's concurrency limit (ServerCL) is divided among all the
751+
// priority levels in proportion to their NCS values:
752+
//
753+
// NominalCL(i) = ceil( ServerCL * NCS(i) / sum_ncs )
754+
// sum_ncs = sum[priority level k] NCS(k)
755+
//
756+
// Bigger numbers mean a larger nominal concurrency limit,
757+
// at the expense of every other Limited priority level.
758+
// This field has a default value of 30.
759+
// +optional
760+
NominalConcurrencyShares int32
761+
762+
// `lendablePercent` prescribes the fraction of the level's NominalCL that
763+
// can be borrowed by other priority levels. The value of this
764+
// field must be between 0 and 100, inclusive, and it defaults to 0.
765+
// The number of seats that other levels can borrow from this level, known
766+
// as this level's LendableConcurrencyLimit (LendableCL), is defined as follows.
767+
//
768+
// LendableCL(i) = round( NominalCL(i) * lendablePercent(i)/100.0 )
769+
//
770+
// +optional
771+
LendablePercent int32
772+
773+
// The `BorrowingCL` of an Exempt priority level is implicitly `ServerCL`.
774+
// In other words, an exempt priority level
775+
// has no meaningful limit on how much it borrows.
776+
// There is no explicit representation of that here.
777+
}
778+
```
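The two formulas in the comments above can be checked with a small calculation. The following is a minimal sketch, not the apiserver's implementation: the function names `nominalCL` and `lendableCL` and the input values are hypothetical, and `round` is assumed to mean rounding half away from zero (the behavior of Go's `math.Round`), which the text does not specify.

```go
package main

import (
	"fmt"
	"math"
)

// nominalCL applies NominalCL(i) = ceil( ServerCL * NCS(i) / sum_ncs ).
// sumNCS is the sum of NCS over all priority levels, exempt and non-exempt.
func nominalCL(serverCL int, ncs, sumNCS int32) int {
	return int(math.Ceil(float64(serverCL) * float64(ncs) / float64(sumNCS)))
}

// lendableCL applies LendableCL(i) = round( NominalCL(i) * lendablePercent(i)/100.0 ).
// Assumption: "round" is taken to be math.Round (half away from zero).
func lendableCL(nominal int, lendablePercent int32) int {
	return int(math.Round(float64(nominal) * float64(lendablePercent) / 100.0))
}

func main() {
	// Hypothetical inputs: ServerCL = 600, one level holding 30 of 100 total shares
	// and lending 50% of its nominal limit.
	nominal := nominalCL(600, 30, 100)
	lendable := lendableCL(nominal, 50)
	fmt.Printf("NominalCL=%d LendableCL=%d\n", nominal, lendable)
}
```

Because of the `ceil`, the NominalCL values can sum to slightly more than ServerCL when the shares do not divide evenly.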
779+
780+
The fields of `ExemptPriorityLevelConfiguration` limit the borrowing
781+
from exempt priority levels. This type and its use are added in all
782+
the versions (`v1alpha1`, `v1beta1`, `v1beta2`, and `v1beta3`). In
783+
the next version, the common fields of
784+
`LimitedPriorityLevelConfiguration` and
785+
`ExemptPriorityLevelConfiguration` will move to their common ancestor
786+
`PriorityLevelConfigurationSpec`.
787+
716788
The limits on borrowing are two-sided: a given priority level has a
717789
limit on how much it may borrow and a limit on how much may be
718790
borrowed from it. The latter is a matter of protection, the former is
@@ -728,11 +800,11 @@ may continue to do so, but there will always remain the possibility
728800
that some class of requests is much "heavier" than the APF code
729801
estimates; for those, a deliberate jail is useful.
730802

731-
The following table shows the current default non-exempt priority
732-
levels and a proposal for their new configuration.
803+
The following table shows the values for the non-exempt priority
804+
levels in the default configuration.
733805

734-
| Name | Assured Shares | Proposed Lendable | Proposed Borrowing Limit |
735-
| ---- | -------------: | ----------------: | -----------------------: |
806+
| Name | Nominal Shares | Lendable | Proposed Borrowing Limit |
807+
| ---- | -------------: | -------: | -----------------------: |
736808
| leader-election | 10 | 0% | none |
737809
| node-high | 40 | 25% | none |
738810
| system | 30 | 33% | none |
@@ -741,14 +813,23 @@ levels and a proposal for their new configuration.
741813
| global-default | 20 | 50% | none |
742814
| catch-all | 5 | 0% | none |
743815

744-
Each non-exempt priority level `i` has two concurrency limits: its
816+
The following table shows the `ExemptPriorityLevelConfiguration`
817+
introduced for the exempt priority levels in the default
818+
configuration.
819+
820+
| Name | Nominal Shares | Lendable |
821+
| ---- | -------------- | -------- |
822+
| exempt | 30 | 50% |
823+
824+
Every priority level `i` has two concurrency limits: its
745825
NominalConcurrencyLimit (`NominalCL(i)`) as defined above by
746-
configuration, and a CurrentConcurrencyLimit (`CurrentCL(i)`) that is
747-
used in dispatching requests. The CurrentCLs are adjusted
748-
periodically, based on configuration, the current situation at
749-
adjustment time, and recent observations. The "borrowing" resides in
750-
the differences between CurrentCL and NominalCL. There are upper and lower
751-
bound on each non-exempt priority level's CurrentCL, as follows.
826+
configuration, and a CurrentConcurrencyLimit (`CurrentCL(i)`) ---
827+
which, for non-exempt priority levels, is used in dispatching
828+
requests. The CurrentCLs are adjusted periodically, based on
829+
configuration, the current situation at adjustment time, and recent
830+
observations. The "borrowing" resides in the differences between
831+
CurrentCL and NominalCL. There are upper and lower bounds on each
832+
non-exempt priority level's CurrentCL, as follows.
752833

753834
```
754835
MaxCL(i) = NominalCL(i) + BorrowingCL(i)
@@ -762,13 +843,15 @@ CurrentCLs is always equal to the server's concurrency limit
762843
the NominalCLs and plus or minus a little for rounding in the
763844
adjustment algorithm below.
764845

765-
Dispatching is done independently for each priority level. Whenever
766-
(1) a non-exempt priority level's number of occupied seats is zero or
767-
below the level's CurrentCL and (2) that priority level has a
768-
non-empty queue, it is time to consider dispatching another request
769-
for service. The Fair Queuing for Server Requests algorithm below is
770-
used to pick a non-empty queue at that priority level. Then the
771-
request at the head of that queue is dispatched if possible.
846+
Dispatching is done independently for each priority level.
847+
Dispatching for an exempt priority level is never held up. For a
848+
non-exempt priority level: whenever (1) that priority level's number
849+
of occupied seats is zero or below the level's CurrentCL and (2) that
850+
priority level has a non-empty queue, it is time to consider
851+
dispatching another request for service. The Fair Queuing for Server
852+
Requests algorithm below is used to pick a non-empty queue at that
853+
priority level. Then the request at the head of that queue is
854+
dispatched if possible.
772855

773856
Every 10 seconds, all the CurrentCLs are adjusted. We do smoothing on
774857
the inputs to the adjustment logic in order to dampen control
@@ -779,18 +862,28 @@ high watermark `HighSeatDemand(i)`, time-weighted average
779862
`StDevSeatDemand(i)` of each priority level `i`'s seat demand over the
780863
just-concluded adjustment period. A priority level's seat demand at
781864
any given moment is the sum of its occupied seats and the number of
782-
seats in the queued requests. We also define `EnvelopeSeatDemand(i) =
783-
AvgSeatDemand(i) + StDevSeatDemand(i)`. The adjustment logic is
784-
driven by a quantity called smoothed seat demand
785-
(`SmoothSeatDemand(i)`), which does an exponential averaging of
865+
seats in the queued requests (this second term is necessarily zero for
866+
an exempt priority level). We also define a quantity
867+
`EnvelopeSeatDemand` as follows.
868+
869+
```
870+
EnvelopeSeatDemand(i) = AvgSeatDemand(i) + StDevSeatDemand(i)
871+
```
872+
873+
The adjustment logic is driven by a quantity called smoothed seat
874+
demand (`SmoothSeatDemand(i)`), which does an exponential averaging of
786875
EnvelopeSeatDemand values using a coefficient A in the range (0,1) and
787876
immediately tracks EnvelopeSeatDemand when it exceeds
788877
SmoothSeatDemand. The rule for updating priority level `i`'s
789-
SmoothSeatDemand at the end of an adjustment period is
790-
`SmoothSeatDemand(i) := max( EnvelopeSeatDemand(i),
791-
A*SmoothSeatDemand(i) + (1-A)*EnvelopeSeatDemand(i) )`. The value of
792-
`A` is fixed at 0.977 in the code, which means that the half-life of
793-
the exponential decay is about 5 minutes.
878+
SmoothSeatDemand at the end of an adjustment period is as follows.
879+
880+
```
881+
SmoothSeatDemand(i) := max( EnvelopeSeatDemand(i),
882+
A*SmoothSeatDemand(i) + (1-A)*EnvelopeSeatDemand(i) )
883+
```
884+
885+
The value of `A` is fixed at 0.977 in the code, which means that the
886+
half-life of the exponential decay is about 5 minutes.
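The update rule and the half-life claim can be illustrated as follows. This is a minimal sketch under stated assumptions: the function names `nextSmoothSeatDemand` and `periodsToHalve` are hypothetical, and demand is assumed to drop to zero and stay there while the decay is measured.

```go
package main

import (
	"fmt"
	"math"
)

// a is the smoothing coefficient; the text says the code fixes it at 0.977.
const a = 0.977

// nextSmoothSeatDemand applies the update rule from the text:
// jump up to the envelope immediately, decay toward it exponentially otherwise.
func nextSmoothSeatDemand(smooth, envelope float64) float64 {
	return math.Max(envelope, a*smooth+(1-a)*envelope)
}

// periodsToHalve counts adjustment periods until the smoothed value falls to
// half of start or below, assuming EnvelopeSeatDemand stays at zero.
func periodsToHalve(start float64) int {
	periods := 0
	for s := start; s > start/2; s = nextSmoothSeatDemand(s, 0) {
		periods++
	}
	return periods
}

func main() {
	// A rise in demand is tracked in a single step.
	fmt.Println(nextSmoothSeatDemand(0, 100))

	// A drop in demand decays slowly; each period is 10 seconds.
	n := periodsToHalve(100)
	fmt.Printf("%d periods, about %d seconds\n", n, n*10)
}
```

The asymmetry is the point of the `max`: pressure is recognized at once, while relief is only believed gradually.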
794887

795888
Adjustment is also done on configuration change, when a priority level
796889
is introduced or removed or its NominalCL, LendableCL, or BorrowingCL
@@ -803,54 +896,93 @@ SmoothSeatDemand to a higher value would risk creating an illusion of
803896
pressure that decays only slowly; initializing to zero is safe because
804897
the arrival of actual pressure gets a quick response.
805898

806-
For adjusting the CurrentCL values, each non-exempt priority level `i`
807-
has a lower bound (`MinCurrentCL(i)`) for the new value. It is simply
808-
HighSeatDemand clipped by the configured concurrency limits:
809-
`MinCurrentCL(i) = max( MinCL(i), min( NominalCL(i), HighSeatDemand(i)
810-
) )`.
899+
For adjusting the CurrentCL values, each priority level `i` has a
900+
lower bound (`MinCurrentCL(i)`) for the new value. It is
901+
HighSeatDemand clipped by the configured lower, and upper if
902+
non-exempt, limit. The more aggressive setting for exempt priority
903+
levels gives them precedence when borrowing: they get all they want,
904+
and the remainder is available to the non-exempt levels.
905+
906+
```
907+
MinCurrentCL(i) = max( MinCL(i), min( NominalCL(i), HighSeatDemand(i) ) ) -- if non-exempt
908+
MinCurrentCL(i) = max( MinCL(i), HighSeatDemand(i) ) -- if exempt
909+
```
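A minimal sketch of the two cases follows. The `level` struct and the function name `minCurrentCL` are hypothetical, introduced only for illustration; the values are floating-point, as the following logic assumes.

```go
package main

import (
	"fmt"
	"math"
)

// level holds the per-priority-level inputs used by the lower bound.
// This type is hypothetical; it is not part of the API shown above.
type level struct {
	exempt         bool
	minCL          float64
	nominalCL      float64
	highSeatDemand float64
}

// minCurrentCL returns the lower bound on the new CurrentCL.
// Non-exempt levels are capped at NominalCL; exempt levels are not.
func minCurrentCL(l level) float64 {
	if l.exempt {
		return math.Max(l.minCL, l.highSeatDemand)
	}
	return math.Max(l.minCL, math.Min(l.nominalCL, l.highSeatDemand))
}

func main() {
	// Same inputs, differing only in the exempt flag.
	limited := level{exempt: false, minCL: 10, nominalCL: 40, highSeatDemand: 70}
	exempt := level{exempt: true, minCL: 10, nominalCL: 40, highSeatDemand: 70}
	fmt.Println(minCurrentCL(limited), minCurrentCL(exempt))
}
```

With identical demand, the exempt level's lower bound follows its high watermark past NominalCL, which is how it takes precedence when borrowing.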
910+
911+
For the following logic we let the CurrentCL values be floating-point
912+
numbers, not necessarily integers.
811913

812-
If `MinCurrentCL(i) = NominalCL(i)` for every non-exempt priority
813-
level `i` then there is no wiggle room. In this situation, no
814-
priority level is willing to lend any seats. The new CurrentCL values
815-
must equal the NominalCL values. Otherwise there is wiggle room and
816-
the adjustment proceeds as follows. For the following logic we let
817-
the CurrentCL values be floating-point numbers, not necessarily
818-
integers.
819914

820-
The priority levels would all be fairly happy if we set CurrentCL =
821-
SmoothSeatDemand for each. We clip that by the lower bound just shown
822-
and define `Target(i)` as follows, taking it as a first-order target
823-
for each non-exempt priority level `i`.
915+
If `MinCurrentCL(i) = NominalCL(i)` for every priority level `i` then
916+
no adjustment is needed: the new CurrentCL values are set to the
917+
NominalCL values. Otherwise adjustment is in order and proceeds as
918+
follows.
919+
920+
For each exempt priority level, `CurrentCL` is set to `MinCurrentCL`.
921+
Not that this matters much, because dispatching for those is not
922+
actually limited. The sum of those limits, however, is subtracted
923+
from `ServerCL` to produce a value called `RemainingServerCL` that is
924+
used in computing the allocations for the non-exempt priority levels.
925+
If `RemainingServerCL` is zero or negative then all the non-exempt
926+
priority levels get `CurrentCL = 0`. Otherwise, the computation
927+
proceeds as follows.
928+
929+
Because of the borrowing by exempt priority levels, lower bounds could
930+
be problematic. Define `LowerBoundSum` as follows.
931+
932+
```
933+
LowerBoundSum = sum[non-exempt priority level i] MinCurrentCL(i)
934+
```
935+
936+
If `LowerBoundSum = RemainingServerCL` then there is no wiggle room:
937+
each non-exempt priority level gets `CurrentCL = MinCurrentCL`.
938+
939+
If `LowerBoundSum > RemainingServerCL` then the problem is
940+
over-constrained. The solution taken is to reduce all the lower
941+
bounds in the same proportion, to the point where their sum is
942+
feasible. At that point, there is no wiggle room. Thus, in this case
943+
the settings are as follows.
944+
945+
```
946+
CurrentCL(i) = MinCurrentCL(i) * RemainingServerCL / LowerBoundSum
947+
```
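The exempt step and the cases without wiggle room can be sketched as below. This is an illustration under stated assumptions, not the apiserver's code: the function name `allocateWithoutWiggleRoom` and its signature are hypothetical, the inputs are precomputed `MinCurrentCL` values, and the initial check for every `MinCurrentCL(i)` equaling `NominalCL(i)` is left out.

```go
package main

import "fmt"

// allocateWithoutWiggleRoom returns the CurrentCL values of the non-exempt
// levels for the cases that need no borrowing computation. settled is false
// when LowerBoundSum < RemainingServerCL, meaning there is wiggle room and
// the result must come from the borrowing computation instead.
func allocateWithoutWiggleRoom(serverCL float64, exemptMinCurrentCL, nonExemptMinCurrentCL []float64) (currentCL []float64, settled bool) {
	// Exempt levels get CurrentCL = MinCurrentCL; what is left goes to the rest.
	remaining := serverCL
	for _, m := range exemptMinCurrentCL {
		remaining -= m
	}

	currentCL = make([]float64, len(nonExemptMinCurrentCL))
	if remaining <= 0 {
		return currentCL, true // every non-exempt level gets CurrentCL = 0
	}

	lowerBoundSum := 0.0
	for _, m := range nonExemptMinCurrentCL {
		lowerBoundSum += m
	}
	if lowerBoundSum < remaining {
		return nil, false // wiggle room
	}
	if lowerBoundSum == remaining {
		copy(currentCL, nonExemptMinCurrentCL) // exactly feasible
		return currentCL, true
	}
	// Over-constrained: shrink all lower bounds in the same proportion.
	for i, m := range nonExemptMinCurrentCL {
		currentCL[i] = m * remaining / lowerBoundSum
	}
	return currentCL, true
}

func main() {
	// Hypothetical numbers: ServerCL = 100, one exempt level demanding 40,
	// two non-exempt levels whose lower bounds are 60 each.
	cl, settled := allocateWithoutWiggleRoom(100, []float64{40}, []float64{60, 60})
	fmt.Println(cl, settled)
}
```

In the over-constrained case the scaled values sum to `RemainingServerCL` by construction, so nothing is left to distribute.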
948+
949+
Finally, when `LowerBoundSum < RemainingServerCL` there _is_ wiggle
950+
room and the borrowing computation proceeds as follows.
951+
952+
The non-exempt priority levels would all be fairly happy if we set
953+
CurrentCL = SmoothSeatDemand for each. We clip that by the lower
954+
bound just shown and define `Target(i)` as follows, taking it as a
955+
first-order target for each non-exempt priority level `i`.
824956

825957
```
826958
Target(i) = max( MinCurrentCL(i), SmoothSeatDemand(i) )
827959
```
828960

829961
Sadly, the sum of the Target values --- let's name that TargetSum ---
830-
is not necessarily equal to ServerCL. However, if `TargetSum <
831-
ServerCL` then all the Targets could be scaled up in the same
832-
proportion `FairProp = ServerCL / TargetSum` (if that did not violate
833-
any upper bound) to get the new concurrency limits `CurrentCL(i) :=
834-
FairProp * Target(i)` for each non-exempt priority level `i`.
835-
Similarly, if `TargetSum > ServerCL` then all the Targets could be
836-
scaled down in the same proportion (if that did not violate any lower
837-
bound) to get the new concurrency limits. This shares the wealth or
838-
the pain proportionally among the priority levels (but note: the upper
839-
bound does not affect the target, lest the pain of not achieving a
840-
high SmoothSeatDemand be distorted, while the lower bound _does_
841-
affect the target, so that merely achieving the lower bound is not
842-
considered a gain). The following computation generalizes this idea
843-
to respect the relevant bounds.
962+
is not necessarily equal to `RemainingServerCL`. However, if
963+
`TargetSum < RemainingServerCL` then all the Targets could be scaled
964+
up in the same proportion `FairProp = RemainingServerCL / TargetSum`
965+
(if that did not violate any upper bound) to get the new concurrency
966+
limits `CurrentCL(i) := FairProp * Target(i)` for each non-exempt
967+
priority level `i`. Similarly, if `TargetSum > RemainingServerCL`
968+
then all the Targets could be scaled down in the same proportion (if
969+
that did not violate any lower bound) to get the new concurrency
970+
limits. This shares the wealth or the pain proportionally among the
971+
priority levels (but note: the upper bound does not affect the target,
972+
lest the pain of not achieving a high SmoothSeatDemand be distorted,
973+
while the lower bound _does_ affect the target, so that merely
974+
achieving the lower bound is not considered a gain). The following
975+
computation generalizes this idea to respect the relevant bounds.
844976

845977
We cannot necessarily scale all the Targets by the same factor ---
846978
because that might violate some upper or lower bounds. The problem is
847-
to find a proportion `FairProp` that can be shared by all the priority
848-
levels except those with a bound that forbids it. This means to find
849-
a value of `FairProp` that simultaneously solves all the following
850-
conditions, for the non-exempt priority levels `i`, and also makes the
851-
CurrentCL values sum to ServerCL. In some cases there are many
852-
satisfactory values of `FairProp` --- and that is OK, because they all
853-
produce the same CurrentCL values.
979+
to find a proportion `FairProp` that can be shared by all the
980+
non-exempt priority levels except those with a bound that forbids it.
981+
This means to find a value of `FairProp` that simultaneously solves
982+
all the following conditions, for the non-exempt priority levels `i`,
983+
and also makes the CurrentCL values sum to `RemainingServerCL`. In
984+
some cases there are many satisfactory values of `FairProp` --- and
985+
that is OK, because they all produce the same CurrentCL values.
854986

855987
```
856988
CurrentCL(i) = min( MaxCL(i), max( MinCurrentCL(i), FairProp * Target(i) ))
@@ -1916,7 +2048,7 @@ spec:
19162048
match:
19172049
- and: [ ] # match everything
19182050
```
1919-
2051+
19202052
Following is a FlowSchema that might be used for the requests by the
19212053
aggregated apiservers of
19222054
https://github.com/MikeSpreitzer/kube-examples/tree/add-kos/staging/kos
