@L3n41c L3n41c commented Dec 22, 2025

What does this PR do?

Add a subcommand to the kubectl plugin to uninstall Karpenter on a cluster:

kubectl datadog autoscaling cluster uninstall

Motivation

#2301 added a subcommand to install Karpenter.
This PR introduces a subcommand to do the reverse operation.

Additional Notes

Minimum Agent Versions

Are there minimum versions of the Datadog Agent and/or Cluster Agent required?
No

Describe your test plan

  • Create an EKS cluster;
  • Install Karpenter on it with kubectl datadog autoscaling cluster install;
  • Then validate that it can be uninstalled with kubectl datadog autoscaling cluster uninstall.

Checklist

  • PR has at least one valid label: bug, enhancement, refactoring, documentation, tooling, and/or dependencies
  • PR has a milestone or the qa/skip-qa label

@L3n41c L3n41c added this to the v1.23.0 milestone Dec 22, 2025
@L3n41c L3n41c added the enhancement, wip, and component/plugin labels Dec 22, 2025
codecov-commenter commented Dec 22, 2025

Codecov Report

❌ Patch coverage is 2.55906% with 495 lines in your changes missing coverage. Please review.
✅ Project coverage is 38.05%. Comparing base (43cb394) to head (ea85b71).
⚠️ Report is 16 commits behind head on main.

Files with missing lines                                 Patch %   Lines
...datadog/autoscaling/cluster/uninstall/uninstall.go    0.00%     307 Missing ⚠️
...adog/autoscaling/cluster/common/clients/clients.go    0.00%     51 Missing ⚠️
...ctl-datadog/autoscaling/cluster/install/install.go    0.00%     46 Missing ⚠️
...g/autoscaling/cluster/common/aws/cloudformation.go    0.00%     31 Missing ⚠️
...datadog/autoscaling/cluster/common/aws/aws-auth.go    0.00%     26 Missing ⚠️
...tl-datadog/autoscaling/cluster/common/helm/helm.go    0.00%     21 Missing ⚠️
...l-datadog/autoscaling/cluster/common/k8s/object.go    0.00%     10 Missing ⚠️
cmd/kubectl-datadog/autoscaling/cluster/cluster.go       0.00%     1 Missing ⚠️
...og/autoscaling/cluster/install/k8s/ec2nodeclass.go    0.00%     1 Missing ⚠️
...atadog/autoscaling/cluster/install/k8s/nodepool.go    0.00%     1 Missing ⚠️

❌ Your patch status has failed because the patch coverage (2.55%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2424      +/-   ##
==========================================
- Coverage   38.15%   38.05%   -0.10%     
==========================================
  Files         292      303      +11     
  Lines       24822    27123    +2301     
==========================================
+ Hits         9470    10323     +853     
- Misses      14638    16029    +1391     
- Partials      714      771      +57     
Flag Coverage Δ
unittests 38.05% <2.55%> (-0.10%) ⬇️

Files with missing lines                                 Coverage Δ
...adog/autoscaling/cluster/common/display/display.go    100.00% <100.00%> (ø)
cmd/kubectl-datadog/autoscaling/cluster/cluster.go       0.00% <0.00%> (ø)
...og/autoscaling/cluster/install/k8s/ec2nodeclass.go    0.00% <0.00%> (ø)
...atadog/autoscaling/cluster/install/k8s/nodepool.go    0.00% <0.00%> (ø)
...l-datadog/autoscaling/cluster/common/k8s/object.go    0.00% <0.00%> (ø)
...tl-datadog/autoscaling/cluster/common/helm/helm.go    0.00% <0.00%> (ø)
...datadog/autoscaling/cluster/common/aws/aws-auth.go    34.69% <0.00%> (ø)
...g/autoscaling/cluster/common/aws/cloudformation.go    0.00% <0.00%> (ø)
...ctl-datadog/autoscaling/cluster/install/install.go    16.82% <0.00%> (+3.91%) ⬆️
...adog/autoscaling/cluster/common/clients/clients.go    0.00% <0.00%> (ø)
... and 1 more

... and 31 files with indirect coverage changes


Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@L3n41c L3n41c force-pushed the lenaic/CASCL-647_karpenter_uninstaller branch from be38b1c to 03da607 Compare December 22, 2025 16:21
L3n41c commented Dec 22, 2025

@codex review

@L3n41c L3n41c force-pushed the lenaic/CASCL-647_karpenter_uninstaller branch from 03da607 to f208c10 Compare December 22, 2025 16:24
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

L3n41c commented Dec 23, 2025

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

@L3n41c L3n41c force-pushed the lenaic/CASCL-647_karpenter_uninstaller branch from 87a978b to 13a542e Compare January 12, 2026 06:51
@L3n41c L3n41c force-pushed the lenaic/CASCL-647_karpenter_uninstaller branch from 13a542e to f580791 Compare January 12, 2026 06:53
L3n41c commented Jan 12, 2026

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f580791782

L3n41c commented Jan 12, 2026

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4c5bbded96

L3n41c commented Jan 12, 2026

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. 👍

@L3n41c L3n41c force-pushed the lenaic/CASCL-647_karpenter_uninstaller branch from 8257bb2 to a2f039c Compare January 12, 2026 12:58
L3n41c commented Jan 13, 2026

@codex review

@L3n41c L3n41c removed the wip label Jan 13, 2026
@L3n41c L3n41c marked this pull request as ready for review January 13, 2026 11:18
@L3n41c L3n41c requested review from a team as code owners January 13, 2026 11:18
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9881734887

Comment on lines 166 to 170
if configFlags.Context != nil {
	kubeContext = *configFlags.Context
}

restClientGetter := kube.GetConfig(kubeConfig, kubeContext, namespace)

P2: Respect full kubeconfig overrides for Helm config

This uses kube.GetConfig(kubeConfig, kubeContext, namespace), which only honors the kubeconfig path and context and ignores other ConfigFlags overrides like --server, --token, --certificate-authority, --insecure-skip-tls-verify, or impersonation flags. In environments where users rely on those flags (e.g., CI or ephemeral clusters without a kubeconfig file), the Helm install/uninstall will silently target the default kubeconfig context or fail to authenticate, causing operations to hit the wrong cluster or error. Consider passing the ConfigFlags RESTClientGetter directly so all overrides are respected.

Comment on lines 423 to 426
{
	Name:   awssdk.String("tag-key"),
	Values: []string{"karpenter.sh/nodepool"},
},

P2: Scope EC2 wait to managed Karpenter nodepools

The EC2 filter matches any instance with the karpenter.sh/nodepool tag in the cluster, regardless of whether the nodepool was created by this tool. If the cluster has other Karpenter nodepools not labeled for this uninstall, the wait loop will continue counting those nodes and likely time out, yet the uninstall still proceeds to remove Helm/CloudFormation resources, potentially leaving unrelated nodepools orphaned. The wait should be limited to the nodepools this command deletes (e.g., filter by tag value list) or explicitly handle non-managed pools.
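
One way to narrow the wait could look like the sketch below. It assumes the command already knows the names of the node pools it deletes. In the EC2 API, a tag-key filter matches any instance carrying the key, while a tag:<key> filter matches on the tag's values. The filter struct is a local stand-in for the AWS SDK type, and managedNodePoolFilter and the node pool name are hypothetical, not taken from this PR.

```go
package main

import "fmt"

// filter is a local stand-in for the AWS SDK's EC2 filter type, so that
// this sketch compiles without the SDK. It is not the real type.
type filter struct {
	Name   string
	Values []string
}

// managedNodePoolFilter (hypothetical helper) matches only instances whose
// karpenter.sh/nodepool tag value is one of the given node pool names.
func managedNodePoolFilter(nodePoolNames []string) filter {
	return filter{
		Name: "tag:karpenter.sh/nodepool",
		// Copy the names so later changes by the caller do not affect the filter.
		Values: append([]string(nil), nodePoolNames...),
	}
}

func main() {
	// "example-nodepool" is a made-up name used only for illustration.
	f := managedNodePoolFilter([]string{"example-nodepool"})
	fmt.Printf("%s=%v\n", f.Name, f.Values)
}
```

If the command deletes no node pools, there is nothing to wait for, so a caller would presumably skip the wait rather than send an empty value list.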

Comment on lines 75 to 83
found := false
updatedRoles := make([]RoleMapping, 0, len(roles))
for _, role := range roles {
	if role.RoleArn == roleArn {
		found = true
		continue
	}
	updatedRoles = append(updatedRoles, role)
}
Contributor

nit: could use slices.DeleteFunc:

oldLen := len(roles)
roles = slices.DeleteFunc(roles, func(role RoleMapping) bool { return role.RoleArn == roleArn })
found := oldLen != len(roles)
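
For reference, here is a self-contained version of that suggestion. RoleMapping is reduced to the one field visible in the snippet, and removeRole and the ARNs are hypothetical names used only for this sketch. Note that slices.DeleteFunc (Go 1.21+) edits the slice in place, so the original backing array is modified.

```go
package main

import (
	"fmt"
	"slices"
)

// RoleMapping is a simplified stand-in for the type used in the PR;
// only the RoleArn field is taken from the snippet above.
type RoleMapping struct {
	RoleArn string
}

// removeRole drops every mapping whose ARN equals roleArn and reports
// whether anything was removed. The caller must use the returned slice,
// because slices.DeleteFunc shortens the slice in place.
func removeRole(roles []RoleMapping, roleArn string) ([]RoleMapping, bool) {
	oldLen := len(roles)
	roles = slices.DeleteFunc(roles, func(role RoleMapping) bool {
		return role.RoleArn == roleArn
	})
	return roles, oldLen != len(roles)
}

func main() {
	// Made-up ARNs for illustration.
	roles := []RoleMapping{
		{RoleArn: "arn:aws:iam::111111111111:role/node"},
		{RoleArn: "arn:aws:iam::111111111111:role/karpenter"},
	}
	roles, found := removeRole(roles, "arn:aws:iam::111111111111:role/karpenter")
	fmt.Println(len(roles), found)
}
```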

{
	name: "single line",
	lines: []string{"Hello"},
	expected: "╭───────╮\n" +
Contributor

nit: could line them up nicely via empty string prefix:

			expected: "" +
				"╭──────────╮\n" +
				"│ Hello 🎉 │\n" +
				"│ World    │\n" +
				"╰──────────╯\n",

Comment on lines +66 to +69
if apierrors.IsNotFound(err) {
	log.Printf("%s %s not found, skipping deletion.", object.GetObjectKind().GroupVersionKind().Kind, object.GetName())
	return nil
}
Contributor

Is it a good idea to mask 404? E.g. update bubbles it up.

Member Author

This is inside the Delete function, whose goal is to ensure that the given object eventually no longer exists.
If the given object cannot be found, it already doesn’t exist, so the system is already in the expected state.
The promise of the Delete function, “The object will eventually be deleted”, is already kept.
Such a situation shouldn’t be considered an error.

The question might then be: why would we try to delete an object that doesn’t exist?
Well, this might happen if a user manually deletes some objects while this script is running.
One could argue that it isn’t a good idea to take manual actions that might conflict with a running script.
But if we can handle this case gracefully, that’s still better.
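
The same idea as a minimal standalone sketch, with no Kubernetes client involved: errNotFound, store, and ensureDeleted are hypothetical names standing in for the API server's 404 and the PR's Delete function. Only the "not found" case is swallowed; any other error still propagates.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel standing in for the API server's 404.
var errNotFound = errors.New("not found")

// store is a toy in-memory object store used only for illustration.
type store map[string]struct{}

// remove deletes the named object, or returns errNotFound if it is absent.
func (s store) remove(name string) error {
	if _, ok := s[name]; !ok {
		return fmt.Errorf("object %q: %w", name, errNotFound)
	}
	delete(s, name)
	return nil
}

// ensureDeleted treats "not found" as success, because the desired end
// state (the object does not exist) already holds. Other errors are returned.
func ensureDeleted(s store, name string) error {
	err := s.remove(name)
	if errors.Is(err, errNotFound) {
		fmt.Printf("%s not found, skipping deletion.\n", name)
		return nil
	}
	return err
}

func main() {
	s := store{"nodepool-a": {}}
	fmt.Println(ensureDeleted(s, "nodepool-a")) // first call deletes the object
	fmt.Println(ensureDeleted(s, "nodepool-a")) // second call finds nothing and still succeeds
}
```

Calling the function twice gives the same result both times, which is what makes it safe against concurrent manual deletions.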

// Accumulate errors from cleanup steps - continue on failure to clean up as much as possible
var errs []error

if err = deleteKarpenterNodePools(ctx, cli); err != nil {
Contributor

Does it make sense to proceed in case of error? Maybe it's better to bail out and let the user resolve some steps manually.

Member Author

As far as cleanup is concerned, I preferred a best-effort approach that cleans up as many things as possible.
The errors won’t be lost: they are accumulated and logged at the end.

I wanted to avoid a failure at one step preventing the other steps from running.

Let’s imagine a situation where the user has broken their Kubernetes cluster so that the API server isn’t reachable anymore, and they want to destroy the broken cluster and clean everything up.

If the Kubernetes cluster is so broken that the API server isn’t reachable, we won’t be able to list and delete the Karpenter node pools.
This isn’t a big deal because the user will destroy the whole cluster anyway.
Yet, we still need to destroy the CloudFormation stacks that we created.
If our “uninstall” command exits at the first error, it means that it won’t be helpful to clean AWS objects when the K8s cluster isn’t reachable.

We can also imagine a situation where a user first deletes their Kubernetes cluster and then realizes afterwards that kubectl datadog autoscaling cluster uninstall still needs to be executed to clean AWS objects up.
Then, this command should be able to delete the AWS objects even if the K8s cluster isn’t reachable anymore.

That’s why I think that a failure at one cleanup step shouldn’t prevent the other steps from running.
Otherwise, there will be some situations where the user won’t be able to use that script to clean things up.

func displayResourceSummary(ctx context.Context, cmd *cobra.Command, cli *clients.Clients, clusterName string) []string {
	cmd.Println("\nThis will delete:")

	if nodePools, err := listKarpenterNodePools(ctx, cli); err != nil {
Contributor

Isn't this a good sign that something is off? Maybe it's better to bail out if we cannot list the resources to delete.

Member Author

I initially changed the behavior in f8616f7#diff-a42350cb037bad7adef8e1396310575c39180e73ea1181e182fa3179a7464f32 to make the script abort if it’s unable to list stuff to delete.

But the more I think about it, the more I think it’s a mistake.
As explained in my comment above, I want this uninstall command to remain useful for cleaning up the AWS objects even if users deleted their Kubernetes cluster first.

L3n41c commented Jan 16, 2026

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. More of your lovely PRs please.

L3n41c commented Jan 16, 2026

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Chef's kiss.
