-
Notifications
You must be signed in to change notification settings - Fork 130
[CASCL-647] Add a subcommand to kubectl plugin to uninstall Karpenter #2424
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
be38b1c to
03da607
Compare
|
@codex review |
…c/CASCL-647_karpenter_uninstaller
03da607 to
f208c10
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…c/CASCL-647_karpenter_uninstaller
87a978b to
13a542e
Compare
13a542e to
f580791
Compare
|
@codex review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f580791782
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4c5bbded96
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review |
|
Codex Review: Didn't find any major issues. 👍 ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
8257bb2 to
a2f039c
Compare
…c/CASCL-647_karpenter_uninstaller
|
@codex review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9881734887
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if configFlags.Context != nil { | ||
| kubeContext = *configFlags.Context | ||
| } | ||
|
|
||
| restClientGetter := kube.GetConfig(kubeConfig, kubeContext, namespace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Respect full kubeconfig overrides for Helm config
This uses kube.GetConfig(kubeConfig, kubeContext, namespace), which only honors the kubeconfig path and context and ignores other ConfigFlags overrides like --server, --token, --certificate-authority, --insecure-skip-tls-verify, or impersonation flags. In environments where users rely on those flags (e.g., CI or ephemeral clusters without a kubeconfig file), the Helm install/uninstall will silently target the default kubeconfig context or fail to authenticate, causing operations to hit the wrong cluster or error. Consider passing the ConfigFlags RESTClientGetter directly so all overrides are respected.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| { | ||
| Name: awssdk.String("tag-key"), | ||
| Values: []string{"karpenter.sh/nodepool"}, | ||
| }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Scope EC2 wait to managed Karpenter nodepools
The EC2 filter matches any instance with the karpenter.sh/nodepool tag in the cluster, regardless of whether the nodepool was created by this tool. If the cluster has other Karpenter nodepools not labeled for this uninstall, the wait loop will continue counting those nodes and likely time out, yet the uninstall still proceeds to remove Helm/CloudFormation resources, potentially leaving unrelated nodepools orphaned. The wait should be limited to the nodepools this command deletes (e.g., filter by tag value list) or explicitly handle non-managed pools.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| found := false | ||
| updatedRoles := make([]RoleMapping, 0, len(roles)) | ||
| for _, role := range roles { | ||
| if role.RoleArn == roleArn { | ||
| found = true | ||
| continue | ||
| } | ||
| updatedRoles = append(updatedRoles, role) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could use slices.DeleteFunc:
oldLen := len(roles)
roles = slices.DeleteFunc(roles, func(role RoleMapping) bool { return role.RoleArn == roleArn })
found := oldLen != len(roles)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That’s much better! Thanks!
Implemented in f8616f7#diff-75e34b5bb02db7ae8e1635b717b48c7090c87a7f1a2d2a63b955847f3db9acd3
| { | ||
| name: "single line", | ||
| lines: []string{"Hello"}, | ||
| expected: "╭───────╮\n" + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could line them up nicely via empty string prefix:
expected: "" +
"╭──────────╮\n" +
"│ Hello 🎉 │\n" +
"│ World │\n" +
"╰──────────╯\n",There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea!
Implemented in f8616f7#diff-774caa6680aa940302346b336b4cb70b818f61b82b1cab7ecd107ff245fc540a
| if apierrors.IsNotFound(err) { | ||
| log.Printf("%s %s not found, skipping deletion.", object.GetObjectKind().GroupVersionKind().Kind, object.GetName()) | ||
| return nil | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it a good idea to mask 404? E.g. update bubbles it up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is inside the Delete function the goal of which is to ensure that the given object is eventually not existing anymore.
If the given object cannot be found, it is already not existing, then the system is already in the expected state.
The promise of the Delete function, which is: “The object will eventually be deleted” is already kept.
Such a situation shouldn’t be considered as an error.
The question might then be: why would we try to delete an object that doesn’t exist?
Well, this might happen if a user tries to manually delete some objects concurrently to the execution of this script.
One could argue that it isn’t a good idea to do manual actions that might conflict with a running script.
But if we can handle this case gracefully, that’s still better.
| // Accumulate errors from cleanup steps - continue on failure to clean up as much as possible | ||
| var errs []error | ||
|
|
||
| if err = deleteKarpenterNodePools(ctx, cli); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to proceed in case of error? Maybe its better to bail out an let user resolve some steps manually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as cleanup is concerned, I preferred to have a best-effort approach, trying to clean up as much things as possible.
The errors won’t be lost and they will eventually be logged at the end.
I wanted to avoid that a failure on one step prevents other steps.
Let’s imagine a situation where the user has broken their Kubernetes cluster so that the API server isn’t reachable anymore and they want to destroy their broken cluster and to cleanup everything.
If the Kubernetes cluster is so broken that the API server isn’t reachable, we won’t be able to list and delete the Karpenter node pools.
This isn’t a big deal because the user will destroy the whole cluster anyway.
Yet, we still need to destroy the CloudFormation stacks that we created.
If our “uninstall” command exits at the first error, it means that it won’t be helpful to clean AWS objects when the K8s cluster isn’t reachable.
We can also imagine a situation where a user first deletes their Kubernetes cluster and then realizes afterwards that kubectl datadog autoscaling cluster uninstall still needs to be executed to clean AWS objects up.
Then, this command should be able to delete the AWS objects even if the K8s cluster isn’t reachable anymore.
That’s why I think that a failure at one cleanup step shouldn’t prevent the other steps from running.
Otherwise, there will be some situations where the user won’t be able to use that script to clean things up.
| func displayResourceSummary(ctx context.Context, cmd *cobra.Command, cli *clients.Clients, clusterName string) []string { | ||
| cmd.Println("\nThis will delete:") | ||
|
|
||
| if nodePools, err := listKarpenterNodePools(ctx, cli); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this a good sign that something is off? Maybe its better to bail out if we can not list stuff to delete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially changed the behavior in f8616f7#diff-a42350cb037bad7adef8e1396310575c39180e73ea1181e182fa3179a7464f32 to make the script abort if it’s unable to list stuff to delete.
But the more I think about it, the more I think it’s a mistake.
As explained in my above comment, I want this uninstall command to still be useful for users to clean up the AWS objects even if they deleted their Kubernetes cluster first.
|
@codex review |
|
Codex Review: Didn't find any major issues. More of your lovely PRs please. ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
|
@codex review |
|
Codex Review: Didn't find any major issues. Chef's kiss. ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
What does this PR do?
Add a subcommand to the kubectl plugin to uninstall Karpenter on a cluster:
Motivation
#2301 added a subcommand to install Karpenter.
This PR introduce a subcommend to do the reverse operation.
Additional Notes
Minimum Agent Versions
Are there minimum versions of the Datadog Agent and/or Cluster Agent required?
No
Describe your test plan
kubectl datadog autoscaling cluster install;kubectl datadog autoscaling cluster uninstall.Checklist
bug,enhancement,refactoring,documentation,tooling, and/ordependenciesqa/skip-qalabel