## Good practices for using API Priority and Fairness

When a given priority level exceeds its permitted concurrency, requests can
experience increased latency or be dropped with an HTTP 429 (Too Many Requests)
error. To prevent these side effects of APF, you can modify your workload or
tweak your APF settings to ensure there are sufficient seats available to serve
your requests.

To detect whether requests are being rejected due to APF, check the following
metrics:

- `apiserver_flowcontrol_rejected_requests_total`: the total number of requests
  rejected per FlowSchema and PriorityLevelConfiguration.
- `apiserver_flowcontrol_current_inqueue_requests`: the current number of requests
  queued per FlowSchema and PriorityLevelConfiguration.
- `apiserver_flowcontrol_request_wait_duration_seconds`: the latency added to
  requests waiting in queues.
- `apiserver_flowcontrol_priority_level_seat_utilization`: the seat utilization
  per PriorityLevelConfiguration.

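One way to inspect the rejection counter is to scrape the kube-apiserver's `/metrics` endpoint (for example via `kubectl get --raw /metrics`) and aggregate it by priority level. The snippet below is an illustrative sketch: the sample payload and its values are made up, but the metric name and its `flow_schema`, `priority_level`, and `reason` labels match what the API server exposes.

```python
import re

# Sample Prometheus text-format output, as scraped from a kube-apiserver.
# The label values and counts here are illustrative, not real cluster output.
SAMPLE = """\
apiserver_flowcontrol_rejected_requests_total{flow_schema="service-accounts",priority_level="workload-low",reason="timeout"} 5
apiserver_flowcontrol_rejected_requests_total{flow_schema="global-default",priority_level="global-default",reason="queue-full"} 12
"""

LINE = re.compile(
    r'^apiserver_flowcontrol_rejected_requests_total\{([^}]*)\}\s+(\S+)$'
)

def rejected_by_priority_level(text):
    """Sum rejected-request counts per priority_level label."""
    totals = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        labels = dict(kv.split("=", 1) for kv in m.group(1).split(","))
        level = labels["priority_level"].strip('"')
        totals[level] = totals.get(level, 0.0) + float(m.group(2))
    return totals

print(rejected_by_priority_level(SAMPLE))
```

A non-empty result means APF is rejecting requests at those priority levels, which is a signal to look at the workload or settings changes described below.
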
### Workload modifications {#good-practice-workload-modifications}

To prevent requests from queuing and adding latency or being dropped due to APF,
you can optimize your requests by:

- Reducing the rate at which requests are executed. Fewer requests over a fixed
  period will result in fewer seats being needed at a given time.
- Avoiding issuing a large number of expensive requests concurrently. Requests can
  be optimized to use fewer seats or have lower latency so that these requests
  hold those seats for a shorter duration. List requests can occupy more than 1
  seat depending on the number of objects fetched during the request. Restricting
  the number of objects retrieved in a list request, for example by using
  pagination, will use fewer total seats over a shorter period. Furthermore,
  replacing list requests with watch requests will require lower total concurrency
  shares, as a watch request only occupies 1 seat during its initial burst of
  notifications. If using streaming lists in versions 1.27 and later, a watch
  request will occupy the same number of seats as a list request for its initial
  burst of notifications, because the entire state of the collection has to be
  streamed. Note that in both cases, a watch request will not hold any seats after
  this initial phase.
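
The effect of pagination on seat usage can be sketched with a back-of-the-envelope model. The estimator and its numbers below are hypothetical (the API server uses its own internal width estimation), but they illustrate why smaller list responses need fewer seats at a time:

```python
import math

def estimated_list_seats(objects_returned, objects_per_seat=100, max_seats=10):
    """Hypothetical seat estimate for a list request: roughly one seat per
    objects_per_seat objects returned, capped at max_seats. This is an
    illustration, not the apiserver's actual seat accounting."""
    return min(max(1, math.ceil(objects_returned / objects_per_seat)), max_seats)

# One unpaginated list of 5,000 objects vs. the same data fetched
# in pages of 500 objects each:
unpaginated = estimated_list_seats(5000)  # large list hits the cap
per_page = estimated_list_seats(500)      # each page needs fewer seats
print(unpaginated, per_page)
```

Each page holds fewer seats, and those seats are released between pages, so concurrent heavy lists are less likely to exhaust a priority level's concurrency all at once.
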

Keep in mind that queuing or rejection of requests by APF could be induced by
either an increase in the number of requests or an increase in latency for
existing requests. For example, if requests that normally take 1s to execute
start taking 60s, it is possible that APF will start rejecting requests because
requests are occupying seats for a longer duration than normal due to this
increase in latency. If APF starts rejecting requests across multiple priority
levels without a significant change in workload, it is possible there is an
underlying issue with control plane performance rather than the workload or APF
settings.

### Priority and fairness settings {#good-practice-apf-settings}

You can also modify the default FlowSchema and PriorityLevelConfiguration
objects or create new objects of these types to better accommodate your
workload.

APF settings can be modified to:

- Give more seats to high priority requests.
- Isolate non-essential or expensive requests that would starve a concurrency
  level if it was shared with other flows.

#### Give more seats to high priority requests

1. If possible, the number of seats available across all priority levels for a
   particular `kube-apiserver` can be increased by increasing the values for the
   `max-requests-inflight` and `max-mutating-requests-inflight` flags. Alternatively,
   horizontally scaling the number of `kube-apiserver` instances will increase the
   total concurrency per priority level across the cluster, assuming there is
   sufficient load balancing of requests.
1. You can create a new FlowSchema which references a PriorityLevelConfiguration
   with a larger concurrency level. This new PriorityLevelConfiguration could be an
   existing level or a new level with its own set of nominal concurrency shares.
   For example, a new FlowSchema could be introduced to change the
   PriorityLevelConfiguration for your requests from global-default to workload-low
   to increase the number of seats available to your user. Creating a new
   PriorityLevelConfiguration will reduce the number of seats designated for
   existing levels. Recall that editing a default FlowSchema or
   PriorityLevelConfiguration will require setting the
   `apf.kubernetes.io/autoupdate-spec` annotation to false.
1. You can also increase the NominalConcurrencyShares for the
   PriorityLevelConfiguration which is serving your high priority requests.
   Alternatively, for versions 1.26 and later, you can increase the LendablePercent
   for competing priority levels so that the given priority level has a higher pool
   of seats it can borrow.
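
As a sketch of the second option above, the FlowSchema below maps requests from a user to the existing workload-low priority level instead of global-default. The object name and the subject `example-user` are placeholders, and on clusters older than 1.29 the apiVersion would be `flowcontrol.apiserver.k8s.io/v1beta3`:

```yaml
apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: FlowSchema
metadata:
  name: example-user-workload-low    # hypothetical name
spec:
  priorityLevelConfiguration:
    name: workload-low               # existing priority level with more seats
  matchingPrecedence: 8000           # lower value matches before the defaults
  distinguisherMethod:
    type: ByUser
  rules:
    - subjects:
        - kind: User
          user:
            name: example-user       # hypothetical user whose requests to re-map
      resourceRules:
        - verbs: ["*"]
          apiGroups: ["*"]
          resources: ["*"]
          namespaces: ["*"]
          clusterScope: true
```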

#### Isolate non-essential requests from starving other flows

For request isolation, you can create a FlowSchema whose subject matches the
user making these requests, or create a FlowSchema that matches what the request
is (corresponding to the resourceRules). Next, you can map this FlowSchema to a
PriorityLevelConfiguration with a low share of seats.

For example, suppose list event requests from Pods running in the default namespace
are using 10 seats each and execute for 1 minute. To prevent these expensive
requests from impacting requests from other Pods using the existing service-accounts
FlowSchema, you can apply the following FlowSchema to isolate these list calls
from other requests.

Example FlowSchema object to isolate list event requests:

{{% code file="priority-and-fairness/list-events-default-service-account.yaml" %}}

- This FlowSchema captures all list event calls made by the default service
  account in the default namespace. The matching precedence 8000 is lower than the
  value of 9000 used by the existing service-accounts FlowSchema, so these list
  event calls will match list-events-default-service-account rather than
  service-accounts.
- The catch-all PriorityLevelConfiguration is used to isolate these requests.
  The catch-all priority level has a very small concurrency share and does not
  queue requests.
898
897
899
## {{% heading "whatsnext" %}}
898
900
899
-
900
901
For background information on design details for API priority and fairness, see
901
902
the [ enhancement proposal] ( https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/1040-priority-and-fairness ) .
902
- You can make suggestions and feature requests via [ SIG API Machinery] ( https://github.com/kubernetes/community/tree/master/sig-api-machinery )
903
+ You can make suggestions and feature requests via [ SIG API Machinery] ( https://github.com/kubernetes/community/tree/master/sig-api-machinery )
903
904
or the feature's [ slack channel] ( https://kubernetes.slack.com/messages/api-priority-and-fairness ) .