[Granular resource limits] Add support for granular resource quotas #8662
Conversation
Hi @norbertcyran. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
FYI: not ready for review yet
Force-pushed from 9a781ed to 48db6ad
Force-pushed from 475cac9 to e313251
Force-pushed from e313251 to 080fd15
Ready now
elmiko left a comment
the code is looking good to me, i have a couple questions. i like the tests too.
```go
	continue
}

if limitsLeft < resourceDelta*int64(nodeDelta) {
```
i'm not following the math here, could you explain what resourceDelta*int64(nodeDelta) is calculating?
i might be confused about nodeDelta
nodeDelta is the number of nodes (of the same shape) to be added to the cluster, and resourceDelta is the quantity of a specific resource in a node of that shape. For instance, if we want to add 3 nodes with 4 CPUs each, resourceDelta*int64(nodeDelta) will evaluate to 12. This condition basically checks whether adding 12 CPUs to the cluster would exceed the limit.
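To make the arithmetic concrete, here's a minimal, self-contained sketch; the variable names follow the PR, but the function itself is purely illustrative:

```go
package main

import "fmt"

// wouldExceedQuota is an illustrative stand-in for the check above:
// limitsLeft is how much of a resource (e.g. CPU cores) the quota still
// allows, resourceDelta is the amount of that resource in a single node
// of the considered shape, and nodeDelta is how many such nodes the
// scale-up would add.
func wouldExceedQuota(limitsLeft, resourceDelta int64, nodeDelta int) bool {
	return limitsLeft < resourceDelta*int64(nodeDelta)
}

func main() {
	// 3 nodes with 4 CPUs each need 12 CPUs in total; with only 10 CPUs
	// left under the quota, the scale-up would exceed the limit.
	fmt.Println(wouldExceedQuota(10, 4, 3)) // true
}
```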
Perhaps it would be cleaner to call these nodesToBeAdded and resourcesToBeAdded or something similar. However, I'm thinking about adding support for negative deltas later on to remove duplication in the scale down logic (https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/core/scaledown/planner/planner.go#L164, https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/core/scaledown/resource/limits.go).
I can add some comments to clarify what deltas mean, unless you have other suggestions?
this is great, thank you for the explanation. it makes sense to me now.

> Perhaps it would be cleaner to call these nodesToBeAdded and resourcesToBeAdded or something similar.

i like this, perhaps names that are more descriptive with what is planned next, but this would definitely help with readability.

> I can add some comments to clarify what deltas mean, unless you have other suggestions?

i think changing the variable names would help, and i also like having more comments here. i think even something as brief as what you described here would be helpful.
Done, wrapped resourceDelta*int64(nodeDelta) in a resourcesNeeded variable, added a comment, and expanded the docs for the CheckDelta function.
I didn't rename the params, as I still think that I'll handle scale down limits with that code. If it turns out to be infeasible, I'll get back to this code for sure
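For readability, the change might look roughly like this (a sketch only; the real CheckDelta lives in the new quotas package and its exact signature isn't reproduced here):

```go
// checkDeltaSketch illustrates the described refactor: the product is
// pulled out into a named, commented variable instead of being read inline.
func checkDeltaSketch(limitsLeft, resourceDelta int64, nodeDelta int) bool {
	// resourcesNeeded is the total amount of this resource consumed by
	// adding nodeDelta nodes that each provide resourceDelta of it.
	resourcesNeeded := resourceDelta * int64(nodeDelta)
	return resourcesNeeded <= limitsLeft // true if the scale-up still fits the quota
}
```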
Force-pushed from 779cfa9 to 8f42eba
```go
// NewQuotasTracker calculates resources used by the nodes for every
// quota returned by the Provider. Then, based on usages and limits it calculates
// how many resources can be still added to the cluster. Returns a Tracker object.
func (f *TrackerFactory) NewQuotasTracker(autoscalingCtx *context.AutoscalingContext, nodes []*corev1.Node) (*Tracker, error) {
```
Is the idea that this will be recreated every loop?
for now yes, see #8662 (comment)
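For context, a rough caller-side sketch under that assumption (the package name and all the glue code are hypothetical; only NewQuotasTracker and its parameters come from the PR):

```go
package quotas // assumed package name for the new code, for illustration only

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/autoscaler/cluster-autoscaler/context"
)

// runIterationSketch shows the tracker being rebuilt from the current node
// list on every autoscaling iteration, so quota usage always reflects the
// latest cluster state. Everything but the NewQuotasTracker call is
// hypothetical glue code.
func runIterationSketch(f *TrackerFactory, autoscalingCtx *context.AutoscalingContext, nodes []*corev1.Node) error {
	tracker, err := f.NewQuotasTracker(autoscalingCtx, nodes)
	if err != nil {
		return err
	}
	_ = tracker // later PRs would consult the tracker during scale-up and scale-down
	return nil
}
```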
Force-pushed from 82a115f to 2637bc6
Force-pushed from 2637bc6 to d05d8df
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: BigDarkClown, norbertcyran

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR is part of the granular resource limits initiative (#8703). It implements the foundation for the new resource quotas system. The legacy system supports only cluster-wide resource limits coming from the cloud provider. This PR introduces the possibility to provide multiple quotas that can apply to different subsets of nodes.
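To illustrate the idea only (this is a purely hypothetical shape, not the API introduced by this PR; see the proposal in #8702 for the actual design): instead of one cluster-wide limit, several quotas, each scoped to a subset of nodes.

```go
package example

import corev1 "k8s.io/api/core/v1"

// exampleQuota is a hypothetical illustration: a quota applies only to the
// nodes matched by its selector and carries per-resource limits.
type exampleQuota struct {
	appliesTo func(node *corev1.Node) bool // which nodes count against this quota
	limits    map[string]int64             // e.g. "cpu" -> max total cores across matched nodes
}

// Example: a tighter quota for GPU nodes, a broader one for everything else.
var exampleQuotas = []exampleQuota{
	{
		appliesTo: func(n *corev1.Node) bool { return n.Labels["accelerator"] == "gpu" },
		limits:    map[string]int64{"cpu": 64},
	},
	{
		appliesTo: func(n *corev1.Node) bool { return n.Labels["accelerator"] != "gpu" },
		limits:    map[string]int64{"cpu": 1000},
	},
}
```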
For now, the new package is not integrated with the rest of the codebase. This is done on purpose to safely ship the new system in smaller chunks. Therefore, this PR does not introduce any user-facing changes.
Which issue(s) this PR fixes:
Part of #8703.
Special notes for your reviewer:
This PR ended up larger than I expected. Caching of node deltas, support for storage and ephemeral storage, and integration with scale up and scale down will be implemented in the next PRs. See the proposal #8702 for more details.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: