@AndiDog
Contributor

@AndiDog AndiDog commented Jun 11, 2025

(reopened from #5173 because I didn't have enough permissions to force-push)

What type of PR is this?

What this PR does / why we need it:

Changing any relevant spec.* field of an AWSMachinePool rolls the nodes via an ASG instance refresh. If another change happens shortly afterwards, it has to wait until the first rollout is done and then triggers a second instance refresh. Rolling all worker nodes twice in such a case is neither necessary nor desired, and it is much slower. Instead, this PR cancels the pending instance refresh, waits until a new one can be started, and then applies the latest change as soon as possible with the second instance refresh.
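
The cancel, wait, restart flow described above can be sketched roughly as follows. This is only an illustration and not the code from this PR: the `asgClient` interface, `reconcileRefresh`, and `fakeASG` are hypothetical names invented for the example, and the status strings are assumed to match the ASG instance refresh states `Pending`, `InProgress`, `Cancelling`, and `Cancelled`.

```go
package main

import "errors"

// refreshStatus holds an ASG instance refresh state (assumed subset).
type refreshStatus string

const (
	statusPending    refreshStatus = "Pending"
	statusInProgress refreshStatus = "InProgress"
	statusCancelling refreshStatus = "Cancelling"
	statusCancelled  refreshStatus = "Cancelled"
)

// asgClient is a hypothetical abstraction over the ASG API,
// not the interface used by the actual controller.
type asgClient interface {
	LatestRefreshStatus(asgName string) (refreshStatus, error)
	CancelRefresh(asgName string) error
	StartRefresh(asgName string) error
}

// reconcileRefresh returns requeue=true while the caller still has to
// wait before the latest change can be rolled out.
func reconcileRefresh(c asgClient, asgName string, specChanged bool) (bool, error) {
	if !specChanged {
		return false, nil // nothing to roll out
	}
	status, err := c.LatestRefreshStatus(asgName)
	if err != nil {
		return false, err
	}
	switch status {
	case statusPending, statusInProgress:
		// An older rollout is still active: cancel it instead of waiting
		// for it to finish, then try again later.
		if err := c.CancelRefresh(asgName); err != nil {
			return false, err
		}
		return true, nil
	case statusCancelling:
		// Cancellation has not completed yet, so a new refresh cannot start.
		return true, nil
	default:
		// No active refresh: roll out the latest change right away.
		return false, c.StartRefresh(asgName)
	}
}

// fakeASG is an in-memory stand-in that records which calls were made.
type fakeASG struct {
	status refreshStatus
	calls  []string
}

func (f *fakeASG) LatestRefreshStatus(string) (refreshStatus, error) {
	return f.status, nil
}

func (f *fakeASG) CancelRefresh(string) error {
	f.calls = append(f.calls, "cancel")
	f.status = statusCancelling
	return nil
}

func (f *fakeASG) StartRefresh(string) error {
	switch f.status {
	case statusPending, statusInProgress, statusCancelling:
		return errors.New("an instance refresh is already active")
	}
	f.calls = append(f.calls, "start")
	f.status = statusPending
	return nil
}
```

Calling `reconcileRefresh` three times against a `fakeASG` (first while a refresh is in progress, then while it is cancelling, then after it has been cancelled) should record exactly one cancel followed by one start, so the nodes are rolled only once with the latest change.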

This change has been running fine in Giant Swarm's CAPA fork for almost a year at the time of opening this PR.

Checklist:

  • squashed commits
  • includes documentation
  • includes emoji in title
  • adds unit tests
  • adds or updates e2e tests

Release note:

Cancel instance refresh on any relevant change to ASG instead of blocking until previous one is finished

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 11, 2025
@k8s-ci-robot k8s-ci-robot added needs-priority size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 11, 2025
@AndiDog AndiDog force-pushed the cancel-instance-refresh branch from 56f83a4 to a99b8ca Compare June 11, 2025 15:23
@AndiDog
Contributor Author

AndiDog commented Jun 12, 2025

/test pull-cluster-api-provider-aws-e2e
/test pull-cluster-api-provider-aws-e2e-eks

@AndiDog AndiDog force-pushed the cancel-instance-refresh branch from a99b8ca to af59d86 Compare June 24, 2025 07:02
@AndiDog AndiDog force-pushed the cancel-instance-refresh branch 2 times, most recently from 615c57c to d4d3662 Compare July 2, 2025 14:41
@AndiDog
Contributor Author

AndiDog commented Jul 16, 2025

@richardcase I think you have some machine pool experience and could maybe review this?

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 29, 2025
…king until previous one is finished (which may have led to failing nodes due to outdated join token)
@AndiDog AndiDog force-pushed the cancel-instance-refresh branch from d4d3662 to baf3527 Compare August 28, 2025 14:15
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 28, 2025
@AndiDog
Contributor Author

AndiDog commented Aug 28, 2025

Rebased onto AWS SDK Go v2 changes

/test pull-cluster-api-provider-aws-e2e
/test pull-cluster-api-provider-aws-e2e-eks

@AndiDog
Contributor Author

AndiDog commented Aug 28, 2025

/test pull-cluster-api-provider-aws-e2e
/test pull-cluster-api-provider-aws-e2e-eks

@AndiDog
Contributor Author

AndiDog commented Aug 30, 2025

Rate limit errors in the test, trying again

/test pull-cluster-api-provider-aws-e2e

Contributor

@fiunchinho fiunchinho left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 18, 2025
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 524f880dcd8a5eff0e7f608006e4e86f3fc3786c

@richardcase
Member

@AndiDog - do you think there is any benefit in adding an e2e that covers this scenario?

@AndiDog
Contributor Author

AndiDog commented Oct 1, 2025

> @AndiDog - do you think there is any benefit in adding an e2e that covers this scenario?

@richardcase I'd say the feature is not critical enough to require an E2E test. The behavior is well covered in exp/controllers/awsmachinepool_controller_test.go (and I'm saying that as a huge fan of those fast, mock-based unit tests because they're much easier to write).

@AndiDog
Contributor Author

AndiDog commented Oct 8, 2025

@richardcase Do you see the E2E test as required? Or do you have other concerns about merging?

Btw, this feature has been working in our fork for well over a year, so I'm very confident.

@Ankitasw
Member

/approve

If an E2E test is needed, we can add it in a follow-up PR.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Ankitasw

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 15, 2025
@k8s-ci-robot k8s-ci-robot merged commit 15a2d14 into kubernetes-sigs:main Oct 15, 2025
21 checks passed

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/machinepool cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

5 participants