Replies: 34 comments
-
We've had some conversation about this among the maintainers. IMO, this feature basically comes down to: should we consolidate based on limits? If you apply a more restrictive limit to your NodePool, does that imply that the NodePool should deprovision nodes until it is back in compliance with its limits? IMO, this strikes me as an intuitive desired-state mechanism -- you have set a new desired state on your NodePool, implying that you no longer support a given capacity. Now comes the more difficult question: should Karpenter force application pods off of your nodes unsafely if you have enforced stricter limits on your NodePool and those pods have nowhere else to schedule? This breaks our current assumptions around the safety of disruption -- that is, when we disrupt a node (unless it is due to spot interruption), we assume we can reschedule the existing pods on the node onto some other capacity (either existing or new). This feature would have us force-delete pods regardless of whether they can schedule or not -- which starts to look a bit scary.
I know you mentioned that you can't delete the NodePool to spin down nodes, but I'm curious what you mean by "controlled environment". Wouldn't updating the limits also cause similar changes to your cluster that I assume would also be subject to this "controlled environment"?
-
Yes, I believe this is what is being implied. If the cpu limit is set to 0, that would mean that we want to deprovision existing nodes, similar to setting the min/max/desired values to 0 for an ASG. Even something similar to an ASG Scheduled Action, where I could create a configuration inside the NodePool to deprovision existing nodes and not spin up any additional ones, would work. A flaw we've uncovered with our current approach of using a lambda to patch the cpu limit to 0 and then delete existing Karpenter-provisioned nodes is that if a node was provisioned right before the cpu limit was set and is now in the "NotReady" state, it will not get cleaned up, as it is not yet recognized as an active node, and will remain running. We're having to come up with a solution to rerun the lambda multiple times to make sure nodes get cleaned up when this happens. Not only do we have to delete the finalizer from the node before deleting it from the cluster, we also have to terminate the node in AWS.
Yes. This is the behavior that currently happens for ASGs. Our pods will stay in a Pending state until the next workday, when the ASG min/max/desired settings are updated to their previous work-hour values. With no nodes running during non-work hours, our savings are pretty significant.
By controlled environment we mean that certain changes to the environment will require going through change control (testing the change, creating a change request, verifying test results, getting approvals to implement said request, implementing the change, verifying the change). Doing this daily is not feasible IMO. Yes, technically patching the limit is subject to the "controlled environment", but based on our current process it's easier to patch the cpu limit with a scheduled lambda function than to delete an entire k8s resource and have to go through the steps mentioned above in order to kick off a pipeline to get the resource re-applied. That's why the ask here is to have this feature built into Karpenter. If designed properly, IMO, this would be a huge win.
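For anyone replicating the workaround, the patch itself is a one-liner; this is only an illustrative sketch (the NodePool name is a placeholder, and older Karpenter releases used a Provisioner instead of a NodePool):

```sh
# Zero the CPU limit so the pool stops launching new capacity
# ("default" is a placeholder NodePool name).
kubectl patch nodepool default --type merge -p '{"spec":{"limits":{"cpu":"0"}}}'
```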
-
You can use the yaml below to delete and re-create the Karpenter nodes. The logic is to delete the nodepool on Friday and re-create it on Sunday. I have tested this in non-prod and it has been running without any issues for a while.
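The original yaml wasn't captured in this thread; a minimal sketch of that approach might look like the following, assuming a ServiceAccount with RBAC to manage nodepools and a ConfigMap holding the NodePool manifest (names, image, and schedules are illustrative):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nodepool-scale-down
spec:
  schedule: "0 20 * * 5"          # Friday 20:00: remove the NodePool
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: nodepool-scheduler
          restartPolicy: OnFailure
          containers:
            - name: kubectl
              image: bitnami/kubectl:1.30   # any image with kubectl works
              command: ["kubectl", "delete", "nodepool", "default", "--ignore-not-found"]
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nodepool-scale-up
spec:
  schedule: "0 6 * * 0"           # Sunday 06:00: re-apply the NodePool
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: nodepool-scheduler
          restartPolicy: OnFailure
          containers:
            - name: kubectl
              image: bitnami/kubectl:1.30
              command: ["sh", "-c", "kubectl apply -f /manifests/nodepool.yaml"]
              volumeMounts:
                - name: manifest
                  mountPath: /manifests
          volumes:
            - name: manifest
              configMap:
                name: nodepool-manifest
```

Deleting the NodePool relies on Karpenter's finalizer to drain and terminate the nodes it owns, which is why this works without touching EC2 directly.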
-
Unfortunately deleting and reapplying nodepool resources is not an option for us. What would be ideal, IMO, would be to have something like the disruption budget schedule that we could set to basically scale down all instances provisioned by a given nodepool.
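For context, the schedule being referred to looks roughly like this on current (v1) NodePools; note that budgets only cap how many nodes Karpenter may voluntarily disrupt during a window, they don't force nodes away, which is exactly the gap this request is about:

```yaml
# Fragment of a Karpenter v1 NodePool (other required fields omitted).
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    budgets:
      - nodes: "10%"                 # allowed disruption outside the window
      - nodes: "0"                   # no voluntary disruption during work hours
        schedule: "0 9 * * mon-fri"
        duration: 8h
```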
-
I've stumbled upon this issue after doing the same thing @cp1408 suggested. My cronjob does:
I think the ideal scenario is something like this:
-
Has anyone stumbled across a decent solution for partially shutting down Karpenter-provisioned nodes when the definitions of Karpenter and its NodePools are managed by a GitOps tool like Argo CD with self-healing and automated sync enabled? If I delete a NodePool, Argo CD will re-sync/re-create the NodePool object as it is defined in a GitHub repository.
One scenario we have considered is terminating Argo CD prior to deleting or patching the NodePools. Another option would be to automate commits to our upstream GitHub repositories to comment out the NodePool specification, but we were hoping to avoid this, as it would flood our GitHub repository with daily shutdown and startup commits.
Finally, we considered scaling down Deployments/StatefulSets/Jobs so that Karpenter automatically removes the nodes, but again, the majority of the workloads are deployed via Argo CD, which will reconcile the replica state (most of our consumers define a hard-coded replica count in their GitHub repository). We would be left with the same problem as above, where we would either have to terminate Argo CD so it doesn't re-sync the workloads, or force all of our users to stop defining a replica count in their workloads and rely on things like HPAs.
The most intuitive option seems to be directly committing changes to the GitHub repository that Argo CD watches, but I was wondering if anyone has faced similar issues and has any suggestions for alternative approaches to enable granular shutdown of Karpenter-provisioned nodes.
-
Check ArgoCD's sync windows. We're currently using them to avoid the GitOps reconciliation when scaling down the Deployments off-hours like you mention, but you could also use them to prevent NodePool recreation if you handle those via GitOps.
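Sync windows are configured on the AppProject; a hedged example (project name, application name, and schedule are assumptions) that blocks automated sync of the NodePool app over the weekend:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: platform
  namespace: argocd
spec:
  sourceRepos:
    - '*'
  destinations:
    - namespace: '*'
      server: '*'
  syncWindows:
    - kind: deny                 # block syncs during the window
      schedule: "0 20 * * 5"     # starts Friday 20:00
      duration: 58h              # runs until Monday 06:00
      applications:
        - karpenter-nodepools    # assumed Application name
      manualSync: true           # still allow manual syncs if needed
```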
-
Hello @ronberna, am I right that it first sets the cpu limit on the provisioner to 0, then deletes the nodes, and then deletes the EC2 instances in AWS?
-
@Pilotindream I can't speak for @ronberna, but I do this as well and yes, this is the right order. I can bring you the shell script tomorrow. In my case, I run it inside my kubernetes cluster as a cronjob, since I have a pair of nodes not managed by karpenter.
-
@felipewnp, thanks for your reply. It would be nice if you could share an example of the script. Will wait for your reply.
-
@olsib wrote a great blog post on how to scale down to zero (for now) on staging environments: https://aircall.io/blog/tech-team-stories/scale-karpenter-zero-optimize-costs/
-
@Pilotindream the link provided by @barryib is right, you can go from there!
-
Thanks so much for this, great read! I wonder how you would deal with making sure the CPU limit stays in sync with git (especially if it is updated, say, every few days)? We did a quick test and saw that the cpu limit is never synced back to git (as expected). Is using namespace resource quotas enough in your use case?
-
If you use gitops, in the script where you change the karpenter nodepool cpu limit, you could commit the changes to your git repo as well. |
-
@wa20221001 if you use ArgoCD you can use
-
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
-
/remove-lifecycle stale
-
Could this be achieved by temporarily updating all NodePools to have a NoExecute taint, so that every deployment gets evicted and Karpenter will eventually delete the nodes? You'd need drift enabled, but other than that (and no non-daemonset resources that tolerate all taints) I think there are no requirements. That's how I am thinking of doing this until there is a native solution from Karpenter.
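A fragment of what that might look like on the NodePool template (the taint key is illustrative); with drift enabled, existing nodes no longer match the spec and get replaced, and nothing without a matching toleration can land on nodes from this pool:

```yaml
# Karpenter v1 NodePool fragment (other required fields omitted).
spec:
  template:
    spec:
      taints:
        - key: offhours-shutdown   # illustrative key
          value: "true"
          effect: NoExecute
```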
-
We have resources with NoExecute tolerations, so this approach wouldn't work for our use case. |
-
I humbly request that future comments consider how Karpenter could support scaling to zero natively, rather than proposing workarounds. A native approach would make this easier to maintain and better supported in Karpenter itself.
-
How do you solve the chicken-and-egg problem of Karpenter now being unschedulable because it has no nodes to run on? How do you bring the cluster back?
-
Karpenter never manages 100% of the nodes in the cluster, and this "problem" is also relevant to initial cluster setup: how do you spin up nodes if the karpenter controller cannot reach a running state? Personally, we use EKS and always have one (auto-scaled), very small, managed node that is tainted and can host around 8 system-critical pods, the karpenter controller being one of them.
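A hedged sketch of that pattern with eksctl (cluster name, region, instance type, and sizes are placeholders); the Karpenter controller's Deployment then needs a matching toleration, which the Helm chart exposes via its tolerations value:

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster          # placeholder
  region: us-east-1         # placeholder
managedNodeGroups:
  - name: system            # tiny node group for critical system pods
    instanceType: t3.medium
    minSize: 1
    maxSize: 2
    desiredCapacity: 1
    taints:
      - key: CriticalAddonsOnly
        value: "true"
        effect: NoSchedule
```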
-
Found this issue after a bit of a rabbit hole looking to replace our AWS ASG scaling (cluster autoscaler) with karpenter. We want to turn off our worker nodes to save on costs when people are sleeping in our non-prod environments. As others note, this has to be external to k8s to avoid the chicken-and-egg issue (which AWS ASGs deal with via scheduled actions). I'm probably going with an AWS EventBridge (or another scheduler) + AWS Lambda approach that is deployed in the same VPC and has a custom role mapped to the aws-auth configmap to update the nodepools of interest and set the CPU to 0, then find all EC2 instances in the VPC that match the label from the nodepool and issue a termination. I think we'll still have a minimal EC2 node(s) deployed via an ASG which we turn off with our existing pattern.
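The commands such a Lambda would effectively run might look like this (a sketch only; the NodePool name is a placeholder, and the tag key is the one Karpenter's AWS provider currently applies to the instances it launches):

```sh
# Stop new provisioning from the pool.
kubectl patch nodepool default --type merge -p '{"spec":{"limits":{"cpu":"0"}}}'

# Terminate the instances Karpenter launched for that pool.
aws ec2 describe-instances \
  --filters "Name=tag:karpenter.sh/nodepool,Values=default" \
            "Name=instance-state-name,Values=running" \
  --query 'Reservations[].Instances[].InstanceId' --output text |
  xargs -r aws ec2 terminate-instances --instance-ids
```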
-
If you want to try a different approach: instead of reducing the number of nodes, you can downscale all of your workloads and let consolidation remove the emptied nodes. Right now I am testing the Kube Downscaler here. It pauses jobs and reduces the number of replicas of your workloads to 0 during your defined period or hours. It is an active project and it may be worth checking it out. Regards
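For reference, the downscaler is driven by annotations on the workloads; an illustrative example (check the project's docs for the exact annotation keys in the version you deploy) that keeps a Deployment up only during working hours:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  annotations:
    downscaler/uptime: "Mon-Fri 07:00-19:00 America/Chicago"  # scaled to 0 outside this window
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: app
          image: nginx:1.27
```

Once the replicas hit 0, Karpenter's consolidation can remove the emptied nodes on its own.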
-
I've done this.
I wrote a script that (rough sketch below):
- Sets all karpenter nodepool limits to 0.
- Force-removes all pods on all karpenter nodes.
- Removes all karpenter nodes.
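Not the original script, but a rough sketch of that sequence, assuming the cluster uses v1 NodePools and that Karpenter labels its nodes with karpenter.sh/nodepool:

```sh
#!/usr/bin/env bash
set -euo pipefail

# 1. Stop new capacity: zero the CPU limit on every NodePool.
for np in $(kubectl get nodepools -o name); do
  kubectl patch "$np" --type merge -p '{"spec":{"limits":{"cpu":"0"}}}'
done

# 2. Force-remove the pods still running on Karpenter-managed nodes.
for node in $(kubectl get nodes -l karpenter.sh/nodepool -o jsonpath='{.items[*].metadata.name}'); do
  kubectl get pods -A --field-selector "spec.nodeName=${node}" \
    -o jsonpath='{range .items[*]}{.metadata.namespace} {.metadata.name}{"\n"}{end}' |
  while read -r ns pod; do
    kubectl delete pod -n "$ns" "$pod" --force --grace-period=0 || true
  done
done

# 3. Delete the nodes; Karpenter's finalizer terminates the EC2 instances.
kubectl delete node -l karpenter.sh/nodepool --wait=false
```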
-
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
-
/remove-lifecycle stale
-
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
-
/remove-lifecycle stale
-
Declarative, schedule-based scaling targets aren't something we currently consider in scope for the project. There have been a number of discussions about this in other issues and in working group meetings, and the consensus has been that the correct approach is to drive scale-down via workloads rather than via the node orchestrator. This previous comment summarizes that approach well:
Since we don't currently consider this in scope, we're going to convert this to a discussion. We believe this is a better format for gathering data to justify a feature than a linear issue.
-
Description
What problem are you trying to solve?
We've recently begun the migration from ASGs (AutoScaling Groups) and CAS (Cluster Autoscaler) to Karpenter. With ASGs, as part of our cost-saving measures, our EKS clusters are scaled down during off hours and weekends in lower environments, and then scaled back up during office hours. This was performed by running a lambda at a scheduled time to set the min/max/desired settings of the ASG to 0. The current values of the min/max/desired settings before the update to 0 are captured and stored in SSM. For the scale-up, the lambda reads this SSM parameter to set the ASG min/max/desired values. With Karpenter, this is not possible.
As a workaround, we have a lambda that will patch the cpu limit of the nodepool and set it to 0 so that no new Karpenter nodes will be provisioned. The lambda will then take care of deleting the previously provisioned Karpenter nodes. We have a mix of workloads running in the cluster with some using HPA and some not, so trying to scale down all of the deployments to remove the Karpenter provisioned nodes will not work. It has also been suggested to delete the nodepool and reapply it via a cronjob. This option will also not work since some of our clusters are in a controlled environment.
The ask here is to introduce a feature in Karpenter that handles scaling all Karpenter-provisioned nodes down/up on demand, either via a flag or possibly via the update of the cpu limit, so that Karpenter will not provision any new nodes and will also clean up previously provisioned nodes, without having to introduce additional cronjobs, lambdas, or nodepool deletions.
How important is this feature to you?
This feature is important as it will help with AWS cost savings by not having EC2 instances running during off hours, and by not having to add additional components (lambdas, cronjobs, etc.) to aid with scaling Karpenter-provisioned instances.