@mergify mergify bot commented Apr 8, 2025

What is the problem this PR solves?

elastic/kibana#212744 adds retry logic to the task that automatically upgrades agents. Agents upgraded through this task have their new upgrade_attempts property populated, but there is currently no way to clear this property when the upgrade completes successfully.

How does this PR solve the problem?

This PR clears upgrade_attempts when the agent's upgrade details reach the UPG_WATCHING state and are processed in handleCheckin.
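As a rough illustration of the behaviour described above, here is a minimal sketch, not the actual fleet-server code. The `UpgradeDetails` type, the `checkinUpdate` helper, and the `UPG_FAILED` state name used for contrast are assumptions made for this example; only `upgrade_attempts`, `UPG_WATCHING`, and `handleCheckin` come from the PR description. How a check-in without upgrade details is handled is also an assumption here.

```go
package main

import "fmt"

// UpgradeDetails is a hypothetical, trimmed-down stand-in for the upgrade
// details an agent reports on check-in.
type UpgradeDetails struct {
	State string
}

// stateWatching is the state named in the PR description.
const stateWatching = "UPG_WATCHING"

// checkinUpdate is a hypothetical helper that builds the partial document
// update applied to the agent on check-in. When the reported state is
// UPG_WATCHING, upgrade_attempts is explicitly set to nil so that it is
// written as null. For any other state the key is left out, so the stored
// value is untouched.
func checkinUpdate(details *UpgradeDetails) map[string]any {
	fields := map[string]any{}
	if details == nil {
		// Assumption: nothing to do when no upgrade details are reported.
		return fields
	}
	if details.State == stateWatching {
		fields["upgrade_attempts"] = nil
	}
	return fields
}

func main() {
	for _, state := range []string{"UPG_WATCHING", "UPG_FAILED"} {
		update := checkinUpdate(&UpgradeDetails{State: state})
		_, cleared := update["upgrade_attempts"]
		fmt.Printf("state=%s clears upgrade_attempts: %t\n", state, cleared)
	}
}
```

The sketch distinguishes "key present with a nil value" (clear the field) from "key absent" (leave the field alone), which is why a map is used rather than a struct with a zero value.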

How to test this PR locally

This should be tested alongside elastic/kibana#212744 (or after it is merged - this is fine, since automatic upgrades are currently behind the enableAutomaticAgentUpgrades feature flag). With this change, agents upgraded through the automatic upgrade task should have their upgrade_attempts property set to null when the upgrade is successful.

Testing should also validate that upgrade_attempts stays set if the upgrade failed, e.g. after requesting an upgrade to an invalid version.
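When checking the two cases above by hand, a small helper like the following can confirm whether a fetched agent document has upgrade_attempts cleared. This is only a sketch: the `agentDoc` type and `attemptsCleared` function are hypothetical, and the sample documents (including the timestamp format of the attempts) are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// agentDoc is a hypothetical view of an agent document that keeps only the
// field under test. Decoding into `any` leaves the value nil both when the
// field is JSON null and when it is absent.
type agentDoc struct {
	UpgradeAttempts any `json:"upgrade_attempts"`
}

// attemptsCleared reports whether upgrade_attempts is null or missing in
// the given raw JSON document.
func attemptsCleared(raw []byte) (bool, error) {
	var doc agentDoc
	if err := json.Unmarshal(raw, &doc); err != nil {
		return false, err
	}
	return doc.UpgradeAttempts == nil, nil
}

func main() {
	// Illustrative documents only; real agent documents have many more fields.
	samples := map[string]string{
		"after successful upgrade": `{"upgrade_attempts": null}`,
		"after failed upgrade":     `{"upgrade_attempts": ["2025-04-08T09:00:00Z"]}`,
	}
	for name, raw := range samples {
		cleared, err := attemptsCleared([]byte(raw))
		if err != nil {
			fmt.Println(name, "error:", err)
			continue
		}
		fmt.Printf("%s: cleared=%t\n", name, cleared)
	}
}
```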

Design Checklist

  • I have ensured my design is stateless and will work when multiple fleet-server instances are behind a load balancer.
  • I have or intend to scale test my changes, ensuring it will work reliably with 100K+ agents connected.
  • I have included fail safe mechanisms to limit the load on fleet-server: rate limiting, circuit breakers, caching, load shedding, etc.

Checklist

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool

Related issues

Relates https://github.com/elastic/ingest-dev/issues/4720


This is an automatic backport of pull request #4528 done by Mergify.

* Clear agent.upgrade_attempts on upgrade complete

* This actually works

* Silence nolintlint error in handleCheckin.go

* Remove nolint comment altogether

* Add changelog

* Update handleCheckin unit test

* Change approach

* Revert unit test change

* This seems needed

* Run make generate

* Remove internal link

* add unit test

* reduce complexity

* return nil if action is nil

---------

Co-authored-by: Julia Bardi <[email protected]>
Co-authored-by: Julia Bardi <[email protected]>
(cherry picked from commit 2b40416)
@mergify mergify bot requested a review from a team as a code owner April 8, 2025 09:09
@mergify mergify bot added the backport label Apr 8, 2025
@mergify mergify bot requested review from andrzej-stencel and swiatekm April 8, 2025 09:09
@github-actions github-actions bot added the enhancement and Team:Elastic-Agent-Control-Plane labels Apr 8, 2025
@juliaElastic juliaElastic merged commit fa543ec into 8.x Apr 8, 2025
10 checks passed
@juliaElastic juliaElastic deleted the mergify/bp/8.x/pr-4528 branch April 8, 2025 10:35
juliaElastic pushed a commit that referenced this pull request Apr 8, 2025
* Clear agent.upgrade_attempts on upgrade complete


Co-authored-by: Jill Guyonnet <[email protected]>
juliaElastic added a commit that referenced this pull request Apr 9, 2025
* Clear upgrade_attempts on handleAck (#4762)

* clear upgrade_attempts on handleAck

* clear upgrade_attempts if upgrade_details is missing

* added unit test

(cherry picked from commit fb093cc)

* Clear agent.upgrade_attempts on upgrade complete (#4528) (#4777)

* Clear agent.upgrade_attempts on upgrade complete


Co-authored-by: Jill Guyonnet <[email protected]>

---------

Co-authored-by: Julia Bardi <[email protected]>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Jill Guyonnet <[email protected]>