@@ -639,12 +639,15 @@ Requests of an exempt priority are never held up in a queue; they are
always dispatched immediately. Following is how the other requests
are dispatched at a given apiserver.

+ As mentioned [above](#non-goals), the functionality described here
+ operates independently in each apiserver.
+
The concurrency limit of an apiserver is divided among the non-exempt
priority levels, and they can do a limited amount of borrowing from
each other.

- One field of `LimitedPriorityLevelConfiguration`, introduced in the
- midst of the `v1beta2` lifetime, limits the borrowing. The field is
+ Two fields of `LimitedPriorityLevelConfiguration`, introduced in the
+ midst of the `v1beta2` lifetime, limit the borrowing. The fields are
added in all the versions (`v1alpha1`, `v1beta1`, and `v1beta2`). The
following display shows the new fields along with the updated
description for the `AssuredConcurrencyShares` field, in `v1beta2`.
@@ -657,9 +660,8 @@ type LimitedPriorityLevelConfiguration struct {
// This is the number of execution seats available at this priority level.
// This is used both for requests dispatched from
// this priority level as well as requests dispatched from other priority
- // levels borrowing seats from this level. This does not limit dispatching from
- // this priority level that borrows seats from other priority levels (those other
- // levels do that). The server's concurrency limit (ServerCL) is divided among the
+ // levels borrowing seats from this level.
+ // The server's concurrency limit (ServerCL) is divided among the
// Limited priority levels in proportion to their ACS values:
//
// NominalCL(i) = ceil( ServerCL * ACS(i) / sum_acs )
@@ -671,53 +673,87 @@ type LimitedPriorityLevelConfiguration struct {
// +optional
AssuredConcurrencyShares int32

- // `borrowablePercent` prescribes the fraction of the level's NominalCL that
+ // `lendablePercent` prescribes the fraction of the level's NominalCL that
// can be borrowed by other priority levels. The value of this
// field must be between 0 and 100, inclusive, and it defaults to 0.
// The number of seats that other levels can borrow from this level, known
- // as this level's BorrowableConcurrencyLimit (BorrowableCL), is defined as follows.
+ // as this level's LendableConcurrencyLimit (LendableCL), is defined as follows.
//
- // BorrowableCL(i) = round( NominalCL(i) * borrowablePercent(i)/100.0 )
+ // LendableCL(i) = round( NominalCL(i) * lendablePercent(i)/100.0 )
//
// +optional
- BorrowablePercent int32
+ LendablePercent int32
+
+ // `borrowingLimitPercent`, if present, specifies a limit on how many seats
+ // this priority level can borrow from other priority levels. The limit
+ // is known as this level's BorrowingConcurrencyLimit (BorrowingCL) and
+ // is a limit on the total number of seats that this level may borrow
+ // at any one time. When this field is non-nil, it must hold a non-negative
+ // integer and the limit is calculated as follows.
+ //
+ // BorrowingCL(i) = round( NominalCL(i) * borrowingLimitPercent(i)/100.0 )
+ //
+ // When this field is left `nil`, the limit is effectively infinite.
+ // +optional
+ BorrowingLimitPercent *int32
}
```

Prior to the introduction of borrowing, the `assuredConcurrencyShares`
field had two meanings that amounted to the same thing: the total
- shares of the level, and the non-borrowable shares of the level.
+ shares of the level, and the non-lendable shares of the level.
While it is somewhat unnatural to keep the meaning of "total shares"
for a field named "assured" shares, rolling out the new behavior into
existing systems will be more continuous if we keep the meaning of
"total shares" for the existing field. In the next version we should
rename the `AssuredConcurrencyShares` to `NominalConcurrencyShares`.

+ The limits on borrowing are two-sided: a given priority level has a
+ limit on how much it may borrow and a limit on how much may be
+ borrowed from it. The latter is a matter of protection; the former is
+ a matter of restraint. Introducing just the protection side is not
+ enough. If APF gained borrowing without giving the cluster
+ administrators a way to limit how much a given priority level borrows,
+ then the administrators would not be able to do something they could
+ do before the introduction of borrowing: use a priority level as a
+ deliberate jail for some class of traffic that APF is not limiting
+ well. APF dispatches requests based on approximate estimates of how
+ much work they involve. We have been improving these estimates, and
+ may continue to do so, but there will always remain the possibility
+ that some class of requests is much "heavier" than the APF code
+ estimates; for those, a deliberate jail is useful.
+
The following table shows the current default non-exempt priority
levels and a proposal for their new configuration.

- | Name | Assured Shares | Proposed Borrowable Percent |
- | ---- | -------------: | --------------------------: |
- | leader-election | 10 | 0 |
- | node-high | 40 | 25 |
- | system | 30 | 33 |
- | workload-high | 40 | 50 |
- | workload-low | 100 | 90 |
- | global-default | 20 | 50 |
- | catch-all | 5 | 0 |
+ | Name | Assured Shares | Proposed Lendable | Proposed Borrowing Limit |
+ | ---- | -------------: | ----------------: | -----------------------: |
+ | leader-election | 10 | 0% | none |
+ | node-high | 40 | 25% | none |
+ | system | 30 | 33% | none |
+ | workload-high | 40 | 50% | none |
+ | workload-low | 100 | 90% | none |
+ | global-default | 20 | 50% | none |
+ | catch-all | 5 | 0% | none |

Each non-exempt priority level `i` has two concurrency limits: its
NominalConcurrencyLimit (`NominalCL(i)`) as defined above by
configuration, and a CurrentConcurrencyLimit (`CurrentCL(i)`) that is
used in dispatching requests. The CurrentCLs are adjusted
periodically, based on configuration, the current situation at
adjustment time, and recent observations. The "borrowing" resides in
- the differences between CurrentCL and NominalCL. There is a lower
- bound on each non-exempt priority level's CurrentCL: `MinCL(i) =
- NominalCL(i) - BorrowableCL(i)`; the upper limit is imposed only by
- how many seats are available for borrowing from other priority levels.
- The sum of the CurrentCLs is always equal to the server's concurrency
- limit (ServerCL) plus or minus a little for rounding in the adjustment
+ the differences between CurrentCL and NominalCL. There are upper and lower
+ bounds on each non-exempt priority level's CurrentCL, as follows.
+
+ ```
+ MaxCL(i) = NominalCL(i) + BorrowingCL(i)
+ MinCL(i) = NominalCL(i) - LendableCL(i)
+ ```
+
+ Naturally the CurrentCL values are also limited by how many seats are
+ available for borrowing from other priority levels. The sum of the
+ CurrentCLs is always equal to the server's concurrency limit
+ (ServerCL) plus or minus a little for rounding in the adjustment
algorithm below.

Dispatching is done independently for each priority level. Whenever
@@ -751,12 +787,13 @@ line flag `--seat-demand-history-fraction` with a default value of 0.9
configures A.

Adjustment is also done on configuration change, when a priority level
- is introduced or removed or its NominalCL or BorrowableCL changes. At
- such a time, the current adjustment period comes to an early end and
- the regular adjustment logic runs; the adjustment timer is reset to
- next fire 10 seconds later. For a newly introduced priority level, we
- set HighSeatDemand, AvgSeatDemand, and SmoothSeatDemand to
- NominalCL-BorrowableSD/2 and StDevSeatDemand to zero.
+ is introduced or removed or its NominalCL, LendableCL, or BorrowingCL
+ changes. At such a time, the current adjustment period comes to an
+ early end and the regular adjustment logic runs; the adjustment timer
+ is reset to next fire 10 seconds later. For a newly introduced
+ priority level, we set HighSeatDemand, AvgSeatDemand, and
+ SmoothSeatDemand to NominalCL-LendableSD/2 and StDevSeatDemand to
+ zero.

For adjusting the CurrentCL values, each non-exempt priority level `i`
has a lower bound (`MinCurrentCL(i)`) for the new value. It is simply
@@ -773,36 +810,46 @@ the CurrentCL values be floating-point numbers, not necessarily
integers.

The priority levels would all be fairly happy if we set CurrentCL =
- SmoothSeatDemand for each. We clip that by the lower bound just
- shown, taking `Target(i) = max(SmoothSeatDemand(i), MinCurrentCL(i))`
- as a first-order target for each non-exempt priority level `i`.
+ SmoothSeatDemand for each. We clip that by the lower bound just shown
+ and define `Target(i)` as follows, taking it as a first-order target
+ for each non-exempt priority level `i`.
+
+ ```
+ Target(i) = max( MinCurrentCL(i), SmoothSeatDemand(i) )
+ ```

Sadly, the sum of the Target values --- let's name that TargetSum ---
- is not necessarily equal to ServerCL. However, if `TargetSum <=
- ServerCL` then all the Targets can be scaled up in the same proportion
- `FairProp = ServerCL / TargetSum` to get the new concurrency limits.
- That is, `CurrentCL(i) := FairProp * Target(i)` for each non-exempt
- priority level `i`. This shares the wealth proportionally among the
- priority levels. Also note, the following computation produces the
- same result.
-
- If `TargetSum > ServerCL` then we can not necessarily scale all the
- Targets down by the same factor --- because that might violate some
- lower bounds. The problem is to find a proportion `FairProp`, which
- we know must lie somewhere in the range (0,1) when `TargetSum >
- ServerCL`, that can be shared by all the priority levels except those
- whose lower bound forbids that. This means to find the one value of
- `FairProp` that solves the following conditions, for all the
- non-exempt priority levels `i`, and also makes the CurrentCL values
- sum to ServerCL.
-
- ```
- CurrentCL(i) = FairProp * Target(i) if FairProp * Target(i) >= MinCurrentCL(i)
- CurrentCL(i) = MinCurrentCL(i) if FairProp * Target(i) <= MinCurrentCL(i)
- ```
-
- This is the mirror image of the max-min fairness problem and can be
- solved with the same sort of algorithm, taking O(N log N) time and
+ is not necessarily equal to ServerCL. However, if `TargetSum <
+ ServerCL` then all the Targets could be scaled up in the same
+ proportion `FairProp = ServerCL / TargetSum` (if that did not violate
+ any upper bound) to get the new concurrency limits `CurrentCL(i) :=
+ FairProp * Target(i)` for each non-exempt priority level `i`.
+ Similarly, if `TargetSum > ServerCL` then all the Targets could be
+ scaled down in the same proportion (if that did not violate any lower
+ bound) to get the new concurrency limits. This shares the wealth or
+ the pain proportionally among the priority levels (but note: the upper
+ bound does not affect the target, lest the pain of not achieving a
+ high SmoothSeatDemand be distorted, while the lower bound _does_
+ affect the target, so that merely achieving the lower bound is not
+ considered a gain). The following computation generalizes this idea
+ to respect the relevant bounds.
+
+ We cannot necessarily scale all the Targets by the same factor,
+ because that might violate some upper or lower bounds. The problem is
+ to find a proportion `FairProp` that can be shared by all the priority
+ levels except those with a bound that forbids it. This means finding
+ a value of `FairProp` that simultaneously solves all the following
+ conditions, for the non-exempt priority levels `i`, and also makes the
+ CurrentCL values sum to ServerCL. In some cases there are many
+ satisfactory values of `FairProp` --- and that is OK, because they all
+ produce the same CurrentCL values.
+
+ ```
+ CurrentCL(i) = min( MaxCL(i), max( MinCurrentCL(i), FairProp * Target(i) ))
+ ```
+
+ This is similar to the max-min fairness problem and can be solved
+ using sorting and then a greedy algorithm, taking O(N log N) time and
O(N) space.

After finding the floating point CurrentCL solutions, each one is
@@ -2017,6 +2064,7 @@ This KEP adds the following metrics.
- apiserver_service_duration (histogram, broken down by priority, FlowSchema)
- `apiserver_flowcontrol_request_concurrency_limit` (gauge of NominalCL, broken down by priority)
- `apiserver_flowcontrol_request_min_concurrency_limit` (gauge of MinCL, broken down by priority)
+ - `apiserver_flowcontrol_request_max_concurrency_limit` (gauge of MaxCL, broken down by priority)
- `apiserver_flowcontrol_request_current_concurrency_limit` (gauge of CurrentCL, broken down by priority)
- `apiserver_flowcontrol_demand_seats` (timing ratio histogram of seat demand / NominalCL, broken down by priority)
- `apiserver_flowcontrol_demand_seats_high_water_mark` (gauge of HighSeatDemand, broken down by priority)