set k8s registry QPS to match MAX_PODS #1801
msporleder-work wants to merge 1 commit into awslabs:main
    if [[ "$USE_MAX_PODS" = "true" ]]; then
        echo "$(jq ".maxPods=$MAX_PODS" $KUBELET_CONFIG)" > $KUBELET_CONFIG
        # set registryPullQPS to match MAX_PODS to prevent startup problems when nodes failover
        echo "$(jq --arg MAX_PODS $MAX_PODS '.+= {"registryPullQPS":$MAX_PODS}' $KUBELET_CONFIG)" > $KUBELET_CONFIG
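One detail worth checking in the added line: jq's `--arg` always binds its value as a string, so the merged config would carry `"registryPullQPS": "110"` rather than a number. A minimal sketch of a numeric variant using `--argjson`, run against a temporary file instead of the real kubelet config (the temp file, its contents, and the value 110 are illustrative assumptions, not part of this PR):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-in for the real kubelet config; path and contents are illustrative only.
KUBELET_CONFIG="$(mktemp)"
echo '{"kind":"KubeletConfiguration"}' > "$KUBELET_CONFIG"
MAX_PODS=110

# --argjson parses the value as JSON, so both fields are written as numbers.
updated="$(jq --argjson qps "$MAX_PODS" '.maxPods = $qps | .registryPullQPS = $qps' "$KUBELET_CONFIG")"
echo "$updated" > "$KUBELET_CONFIG"

# Print the JSON type of the field that was just written.
jq -r '.registryPullQPS | type' "$KUBELET_CONFIG"
```

Capturing the jq output in a variable before redirecting avoids truncating the file while jq is still reading it, which is the same reason the original wraps the call in `echo "$(...)"`.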
I think it makes sense to increase this beyond 5; but I don't think we want to go all the way to MAX_PODS -- that's a very large value on many instance types. Have you tested a more moderate increase, something like 10+15burst (up from 5+10burst)?
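For reference, the moderate increase suggested here could look roughly like the sketch below. `registryPullQPS` and `registryBurst` are the KubeletConfiguration fields behind the 5 QPS / 10 burst defaults; the temporary file and its starting contents are assumptions made so the snippet can run outside a node.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-in for the real kubelet config; path and contents are illustrative only.
KUBELET_CONFIG="$(mktemp)"
echo '{"kind":"KubeletConfiguration","maxPods":110}' > "$KUBELET_CONFIG"

# Raise the image pull rate limit to 10 QPS with a burst of 15.
updated="$(jq '.registryPullQPS = 10 | .registryBurst = 15' "$KUBELET_CONFIG")"
echo "$updated" > "$KUBELET_CONFIG"

cat "$KUBELET_CONFIG"
```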
I actually just set it to 0 (disabled). containerd commonly times out for me when starting 50-ish pods at once, so I've also started to bump runtimeRequestTimeout.
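A rough sketch of that combination, for anyone wanting to try it: setting `registryPullQPS` to 0 removes the pull rate limit, and `runtimeRequestTimeout` takes a duration string (the kubelet default is 2m). The `5m` value and the temporary file are assumptions for illustration; the actual timeout used is not stated in this thread.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-in for the real kubelet config; path and contents are illustrative only.
KUBELET_CONFIG="$(mktemp)"
echo '{"kind":"KubeletConfiguration","registryPullQPS":5}' > "$KUBELET_CONFIG"

# 0 disables the registry pull rate limit; "5m" is an assumed, illustrative timeout.
updated="$(jq '.registryPullQPS = 0 | .runtimeRequestTimeout = "5m"' "$KUBELET_CONFIG")"
echo "$updated" > "$KUBELET_CONFIG"

cat "$KUBELET_CONFIG"
```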
I definitely think this number should be dynamic. I suggested MAX_PODS as a proxy because, if a node is "full" and it fails, the next one to boot up will potentially get all of those pods assigned.
This pull request is stale because it has been open for 60 days with no activity. Remove the stale label or comment to avoid closure in 14 days.
Issue #, if available:
Description of changes:
When nodes fail over in EKS, regardless of their size, the default registryPullQPS of 5 severely limits their ability to start up cleanly in a cluster running more than a few pods.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Testing Done
Node failovers/drains on my clusters constantly fail. Bumping the QPS with user-data solves it, but this, I think, is a better default.
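The user-data workaround is not shown in this PR; one plausible shape, assuming the AMI's `/etc/eks/bootstrap.sh` and its `--kubelet-extra-args` option, is sketched below. The cluster name `my-cluster` and the values 10 and 15 are placeholders, and `--registry-qps` / `--registry-burst` are the kubelet command-line equivalents of `registryPullQPS` / `registryBurst`.

```shell
#!/bin/bash
# Hypothetical EC2 user data for an EKS worker node.
# "my-cluster" is a placeholder cluster name; 10 and 15 are illustrative values.
/etc/eks/bootstrap.sh my-cluster \
  --kubelet-extra-args '--registry-qps=10 --registry-burst=15'
```

This only runs meaningfully on a node built from the EKS AMI, so treat it as a configuration fragment rather than a standalone script.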
See this guide for recommended testing for PRs. Some tests may not apply. Completing tests and providing additional validation steps are not required, but it is recommended and may reduce review time and time to merge.