@@ -283,8 +283,10 @@ In short, this proposal is about generalizing the existing
max-in-flight request handler in apiservers to add more discriminating
handling of requests. The overall approach is that each request is
categorized to a priority level and a queue within that priority
- level; each priority level dispatches to its own isolated concurrency
- pool; within each priority level queues compete with even fairness.
+ level; each priority level dispatches to its own concurrency pool and,
+ according to a configured limit, unused concurrency borrowed from
+ lower priority levels; within each priority level queues compete with
+ even fairness.

### Request Categorization

@@ -638,24 +640,173 @@ always dispatched immediately. Following is how the other requests
are dispatched at a given apiserver.

The concurrency limit of an apiserver is divided among the non-exempt
- priority levels in proportion to their assured concurrency shares.
- This produces the assured concurrency value (ACV) for each non-exempt
- priority level:
+ priority levels, and they can do a limited amount of borrowing from
+ each other.

- ```
- ACV(l) = ceil( SCL * ACS(l) / ( sum[priority levels k] ACS(k) ) )
+ One field of `LimitedPriorityLevelConfiguration`, introduced in the
+ midst of the `v1beta2` lifetime, limits the borrowing. The field is
+ added in all the versions (`v1alpha1`, `v1beta1`, and `v1beta2`). The
+ following display shows the new field along with the updated
+ description for the `AssuredConcurrencyShares` field, in `v1beta2`.
+
+ ```go
+ type LimitedPriorityLevelConfiguration struct {
+ ...
+ // `assuredConcurrencyShares` (ACS) contributes to the computation of the
+ // NominalConcurrencyLimit (NominalCL) of this level.
+ // This is the number of execution seats available at this priority level.
+ // This is used both for requests dispatched from
+ // this priority level as well as requests dispatched from other priority
+ // levels borrowing seats from this level. This does not limit dispatching from
+ // this priority level that borrows seats from other priority levels (those other
+ // levels do that). The server's concurrency limit (ServerCL) is divided among the
+ // Limited priority levels in proportion to their ACS values:
+ //
+ // NominalCL(i) = ceil( ServerCL * ACS(i) / sum_acs )
+ // sum_acs = sum[limited priority level k] ACS(k)
+ //
+ // Bigger numbers mean a larger nominal concurrency limit, at the expense
+ // of every other Limited priority level.
+ // This field has a default value of 30.
+ // +optional
+ AssuredConcurrencyShares int32
+
+ // `borrowablePercent` prescribes the fraction of the level's NominalCL that
+ // can be borrowed by other priority levels. The value of this
+ // field must be between 0 and 100, inclusive, and it defaults to 0.
+ // The number of seats that other levels can borrow from this level, known
+ // as this level's BorrowableConcurrencyLimit (BorrowableCL), is defined as follows.
+ //
+ // BorrowableCL(i) = round( NominalCL(i) * borrowablePercent(i)/100.0 )
+ //
+ // +optional
+ BorrowablePercent int32
+ }
```

- where SCL is the apiserver's concurrency limit and ACS(l) is the
- AssuredConcurrencyShares for priority level l.
+ Prior to the introduction of borrowing, the `assuredConcurrencyShares`
+ field had two meanings that amounted to the same thing: the total
+ shares of the level, and the non-borrowable shares of the level.
+ While it is somewhat unnatural to keep the meaning of "total shares"
+ for a field named "assured" shares, rolling out the new behavior into
+ existing systems will be more continuous if we keep the meaning of
+ "total shares" for the existing field. In the next version we should
+ rename the `AssuredConcurrencyShares` field to `NominalConcurrencyShares`.
+
+ The following table shows the current default non-exempt priority
+ levels and a proposal for their new configuration.
+
+ | Name | Assured Shares | Proposed Borrowable Percent |
+ | ---- | -------------: | --------------------------: |
+ | leader-election | 10 | 0 |
+ | node-high | 40 | 25 |
+ | system | 30 | 33 |
+ | workload-high | 40 | 50 |
+ | workload-low | 100 | 90 |
+ | global-default | 20 | 50 |
+ | catch-all | 5 | 0 |
+
+ Each non-exempt priority level `i` has two concurrency limits: its
+ NominalConcurrencyLimit (`NominalCL(i)`) as defined above by
+ configuration, and a CurrentConcurrencyLimit (`CurrentCL(i)`) that is
+ used in dispatching requests. The CurrentCLs are adjusted
+ periodically, based on configuration, the current situation at
+ adjustment time, and recent observations. The "borrowing" resides in
+ the differences between CurrentCL and NominalCL. There is a lower
+ bound on each non-exempt priority level's CurrentCL: `MinCL(i) =
+ NominalCL(i) - BorrowableCL(i)`; the upper limit is imposed only by
+ how many seats are available for borrowing from other priority levels.
+ The sum of the CurrentCLs is always equal to the server's concurrency
+ limit (ServerCL) plus or minus a little for rounding in the adjustment
+ algorithm below.
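As a worked illustration of the NominalCL, BorrowableCL, and MinCL formulas, the Go sketch below evaluates them for the default levels in the table above. It assumes a ServerCL of 600 purely for the sake of the example, computes the ceiling with integer arithmetic, and uses `math.Round` (which rounds halves away from zero) for `round`; that tie-breaking rule is an assumption, since the text does not state one. The `levelConfig` type and the helper functions are hypothetical names, not the apiserver's code.

```go
package main

import (
	"fmt"
	"math"
)

// levelConfig is a hypothetical stand-in for the relevant fields of a
// LimitedPriorityLevelConfiguration.
type levelConfig struct {
	name              string
	acs               int // AssuredConcurrencyShares
	borrowablePercent int // BorrowablePercent
}

// nominalCL computes ceil(serverCL * acs / sumACS) using integer
// arithmetic, which avoids floating-point error in the ceiling.
func nominalCL(serverCL, acs, sumACS int) int {
	return (serverCL*acs + sumACS - 1) / sumACS
}

// borrowableCL computes round(nominal * borrowablePercent / 100.0).
// math.Round rounds halves away from zero (an assumed tie rule).
func borrowableCL(nominal, borrowablePercent int) int {
	return int(math.Round(float64(nominal*borrowablePercent) / 100.0))
}

func main() {
	const serverCL = 600 // assumed server concurrency limit, for illustration only
	levels := []levelConfig{
		{"leader-election", 10, 0},
		{"node-high", 40, 25},
		{"system", 30, 33},
		{"workload-high", 40, 50},
		{"workload-low", 100, 90},
		{"global-default", 20, 50},
		{"catch-all", 5, 0},
	}
	sumACS := 0
	for _, l := range levels {
		sumACS += l.acs
	}
	totalNominal := 0
	for _, l := range levels {
		n := nominalCL(serverCL, l.acs, sumACS)
		b := borrowableCL(n, l.borrowablePercent)
		totalNominal += n
		// MinCL(i) = NominalCL(i) - BorrowableCL(i) is the floor on CurrentCL(i).
		fmt.Printf("%-16s NominalCL=%3d BorrowableCL=%3d MinCL=%3d\n", l.name, n, b, n-b)
	}
	fmt.Println("sum of NominalCL:", totalNominal)
}
```

Under these assumptions `workload-low` has a NominalCL of 245 but a MinCL of only 24, because it may lend 221 seats, while `leader-election` and `catch-all` cannot lend at all. The NominalCLs add up to 602 rather than 600 because each level's share is rounded up.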
Dispatching is done independently for each priority level. Whenever
- (1) a non-exempt priority level's number of running requests is zero
- or below the level's assured concurrency value and (2) that priority
- level has a non-empty queue, it is time to dispatch another request
+ (1) a non-exempt priority level's number of occupied seats is zero or
+ below the level's CurrentCL and (2) that priority level has a
+ non-empty queue, it is time to consider dispatching another request
for service. The Fair Queuing for Server Requests algorithm below is
used to pick a non-empty queue at that priority level. Then the
- request at the head of that queue is dispatched.
+ request at the head of that queue is dispatched if possible.
+
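The dispatch trigger described above amounts to a simple predicate. The Go sketch below encodes only conditions (1) and (2); the function and parameter names are hypothetical, and the later "if possible" check on the chosen request is deliberately left out because the text does not spell it out. One consequence worth noticing is that the "is zero" clause lets an idle level dispatch even when its CurrentCL is 0, which could happen, for instance, to a level whose MinCL is 0 and whose recent seat demand was zero.

```go
package main

import "fmt"

// mayConsiderDispatch reports whether a non-exempt priority level should
// consider dispatching another request: (1) its occupied seats are zero or
// below its CurrentCL, and (2) it has at least one non-empty queue.
func mayConsiderDispatch(occupiedSeats, currentCL, nonEmptyQueues int) bool {
	return (occupiedSeats == 0 || occupiedSeats < currentCL) && nonEmptyQueues > 0
}

func main() {
	fmt.Println(mayConsiderDispatch(3, 5, 2)) // below CurrentCL with work queued
	fmt.Println(mayConsiderDispatch(5, 5, 2)) // at CurrentCL: wait for seats to free up
	fmt.Println(mayConsiderDispatch(0, 0, 1)) // idle level with CurrentCL 0 still makes progress
}
```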
+ Every 10 seconds, all the CurrentCLs are adjusted. We do smoothing on
+ the inputs to the adjustment logic in order to dampen control
+ gyrations, in a way that lets a priority level reclaim lent seats at
+ the nearest adjustment time. The adjustments take into account the
+ high watermark `HighSeatDemand(i)`, time-weighted average
+ `AvgSeatDemand(i)`, and time-weighted population standard deviation
+ `StDevSeatDemand(i)` of each priority level `i`'s seat demand over the
+ just-concluded adjustment period. A priority level's seat demand at
+ any given moment is the sum of its occupied seats and the number of
+ seats in the queued requests. We also define `EnvelopeSeatDemand(i) =
+ AvgSeatDemand(i) + StDevSeatDemand(i)`. The adjustment logic is
+ driven by a quantity called smoothed seat demand
+ (`SmoothSeatDemand(i)`), which does an exponential averaging of
+ EnvelopeSeatDemand values using a coefficient A in the range (0,1) and
+ immediately tracks EnvelopeSeatDemand when it exceeds
+ SmoothSeatDemand. The rule for updating priority level `i`'s
+ SmoothSeatDemand at the end of an adjustment period is
+ `SmoothSeatDemand(i) := max( EnvelopeSeatDemand(i),
+ A* SmoothSeatDemand(i) + (1-A)* EnvelopeSeatDemand(i) )`. The command
+ line flag `--seat-demand-history-fraction` with a default value of 0.9
+ configures A.
+
+ Adjustment is also done on configuration change, when a priority level
+ is introduced or removed or its NominalCL or BorrowableCL changes. At
+ such a time, the current adjustment period comes to an early end and
+ the regular adjustment logic runs; the adjustment timer is reset to
+ fire again 10 seconds later. For a newly introduced priority level, we
+ set HighSeatDemand, AvgSeatDemand, and SmoothSeatDemand to
+ NominalCL-BorrowableCL/2 and StDevSeatDemand to zero.
+
+ For adjusting the CurrentCL values, each non-exempt priority level `i`
+ has a lower bound (`MinCurrentCL(i)`) for the new value. It is simply
+ HighSeatDemand clipped by the configured concurrency limits:
+ `MinCurrentCL(i) = max( MinCL(i), min( NominalCL(i), HighSeatDemand(i)
+ ) )`.
+
+ If `MinCurrentCL(i) = NominalCL(i)` for every non-exempt priority
+ level `i` then there is no wiggle room. In this situation, no
+ priority level is willing to lend any seats. The new CurrentCL values
+ must equal the NominalCL values. Otherwise there is wiggle room and
+ the adjustment proceeds as follows. For the following logic we let
+ the CurrentCL values be floating-point numbers, not necessarily
+ integers.
+
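The lower bound and the wiggle-room test translate directly into code. The sketch uses `float64` because the adjustment that follows treats CurrentCL as a floating-point number; the function names are hypothetical. Since MinCL(i) never exceeds NominalCL(i), MinCurrentCL(i) never exceeds NominalCL(i) either, so the wiggle-room test only needs to look for a level where MinCurrentCL(i) is strictly below NominalCL(i).

```go
package main

import (
	"fmt"
	"math"
)

// minCurrentCL clips HighSeatDemand into [MinCL, NominalCL]:
// MinCurrentCL(i) = max(MinCL(i), min(NominalCL(i), HighSeatDemand(i))).
func minCurrentCL(minCL, nominalCL, highSeatDemand float64) float64 {
	return math.Max(minCL, math.Min(nominalCL, highSeatDemand))
}

// hasWiggleRoom reports whether at least one level is willing to lend,
// i.e. whether some MinCurrentCL(i) is below NominalCL(i).
func hasWiggleRoom(minCurrent, nominal []float64) bool {
	for i := range minCurrent {
		if minCurrent[i] < nominal[i] {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical level with NominalCL 245 and MinCL 24.
	for _, high := range []float64{10, 100, 300} {
		fmt.Println(high, "->", minCurrentCL(24, 245, high))
	}
	fmt.Println(hasWiggleRoom([]float64{25, 100}, []float64{25, 245})) // second level can lend
	fmt.Println(hasWiggleRoom([]float64{25, 245}, []float64{25, 245})) // nobody can lend
}
```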
+ The priority levels would all be fairly happy if we set CurrentCL =
+ SmoothSeatDemand for each. We clip that by the lower bound just
+ shown, taking `Target(i) = max(SmoothSeatDemand(i), MinCurrentCL(i))`
+ as a first-order target for each non-exempt priority level `i`.
+
+ Sadly, the sum of the Target values (let's name that TargetSum)
+ is not necessarily equal to ServerCL. However, if `TargetSum <=
+ ServerCL` then all the Targets can be scaled up in the same proportion
+ `FairProp = ServerCL / TargetSum` to get the new concurrency limits.
+ That is, `CurrentCL(i) := FairProp * Target(i)` for each non-exempt
+ priority level `i`. This shares the wealth proportionally among the
+ priority levels. Note also that the computation described next for
+ the case `TargetSum > ServerCL` produces the same result when applied here.
+
+ If `TargetSum > ServerCL` then we cannot necessarily scale all the
+ Targets down by the same factor, because that might violate some
+ lower bounds. The problem is to find a proportion `FairProp`, which
+ we know must lie somewhere in the range (0,1) when `TargetSum >
+ ServerCL`, that can be shared by all the priority levels except those
+ whose lower bound forbids that. This means finding the one value of
+ `FairProp` that solves the following conditions for all the
+ non-exempt priority levels `i` and also makes the CurrentCL values
+ sum to ServerCL.
+
+ ```
+ CurrentCL(i) = FairProp * Target(i) if FairProp * Target(i) >= MinCurrentCL(i)
+ CurrentCL(i) = MinCurrentCL(i) if FairProp * Target(i) <= MinCurrentCL(i)
+ ```
+
+ This is the mirror image of the max-min fairness problem and can be
+ solved with the same sort of algorithm, taking O(N log N) time and
+ O(N) space.
+
+ After finding the floating point CurrentCL solutions, each one is
+ rounded to the nearest integer to use in subsequent dispatching.
### Fair Queuing for Server Requests

@@ -1790,7 +1941,7 @@ others, at any given time this may compute for some priority level(s)
an assured concurrency value that is lower than the number currently
executing. In these situations the total number allowed to execute
will temporarily exceed the apiserver's configured concurrency limit
- (`SCL`) and will settle down to the configured limit as requests
+ (`ServerCL`) and will settle down to the configured limit as requests
complete their service.

### Default Behavior
@@ -1864,6 +2015,17 @@ This KEP adds the following metrics.
- apiserver_dispatched_requests (count, broken down by priority, FlowSchema)
- apiserver_wait_duration (histogram, broken down by priority, FlowSchema)
- apiserver_service_duration (histogram, broken down by priority, FlowSchema)
+ - `apiserver_flowcontrol_request_concurrency_limit` (gauge of NominalCL, broken down by priority)
+ - `apiserver_flowcontrol_request_min_concurrency_limit` (gauge of MinCL, broken down by priority)
+ - `apiserver_flowcontrol_request_current_concurrency_limit` (gauge of CurrentCL, broken down by priority)
+ - `apiserver_flowcontrol_demand_seats` (timing ratio histogram of seat demand / NominalCL, broken down by priority)
+ - `apiserver_flowcontrol_demand_seats_high_water_mark` (gauge of HighSeatDemand, broken down by priority)
+ - `apiserver_flowcontrol_demand_seats_average` (gauge of AvgSeatDemand, broken down by priority)
+ - `apiserver_flowcontrol_demand_seats_stdev` (gauge of StDevSeatDemand, broken down by priority)
+ - `apiserver_flowcontrol_envelope_seats` (gauge of EnvelopeSeatDemand, broken down by priority)
+ - `apiserver_flowcontrol_smoothed_demand_seats` (gauge of SmoothSeatDemand, broken down by priority)
+ - `apiserver_flowcontrol_target_seats` (gauge of Target, broken down by priority)
+ - `apiserver_flowcontrol_seat_fair_frac` (gauge of FairProp)

### Testing
