@mergify mergify bot commented Apr 8, 2025

What is the problem this PR solves?

upgrade_attempts was not cleared correctly when an agent doesn't have upgrade_details (for example, with horde agents or older agent versions).

How does this PR solve the problem?

Clear upgrade_attempts when the upgrade is acked (at the same time as the upgrade_started_at field is cleared).

How to test this PR locally

Test with an agent policy that has an auto upgrade config and a few horde agents enrolled. Verify that upgrade_attempts is set to null after the upgrade completes.
The upgrade_attempts field is only cleared if there is no upgrade_details. This was also tested with a real agent upgraded to a non-existent version, which puts the agent in the UPG_FAILED state.
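
As a rough illustration of the behavior described above, here is a minimal sketch in Go. It is not the fleet-server implementation: the `agentDoc` type and the `ackUpgradeFields` helper are hypothetical names invented for this example, and only the field names (`upgrade_started_at`, `upgrade_attempts`, `upgrade_details`) come from this PR.

```go
package main

// agentDoc is a hypothetical, trimmed-down view of an agent document.
// Only the upgrade_details field matters for this sketch.
type agentDoc struct {
	// UpgradeDetails is nil when the agent does not report upgrade_details
	// (for example, horde agents or older agent versions).
	UpgradeDetails map[string]any
}

// ackUpgradeFields is a hypothetical helper that builds the partial update
// applied when an upgrade action is acked. A nil value stands for
// "set this field to null".
func ackUpgradeFields(agent agentDoc) map[string]any {
	fields := map[string]any{
		// Cleared on every upgrade ack.
		"upgrade_started_at": nil,
	}
	// Assumption taken from the testing notes: upgrade_attempts is only
	// cleared here when the agent has no upgrade_details.
	if agent.UpgradeDetails == nil {
		fields["upgrade_attempts"] = nil
	}
	return fields
}
```

Under these assumptions, an agent without upgrade_details gets both fields set to null on ack, while an agent that reports upgrade_details (such as the one in the UPG_FAILED state) keeps its upgrade_attempts value on this path.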

Design Checklist

  • I have ensured my design is stateless and will work when multiple fleet-server instances are behind a load balancer.
  • I have or intend to scale test my changes, ensuring it will work reliably with 100K+ agents connected.
  • I have included fail safe mechanisms to limit the load on fleet-server: rate limiting, circuit breakers, caching, load shedding, etc.

Checklist

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool

Related issues

Relates #4528
Relates elastic/kibana#212744


This is an automatic backport of pull request #4762 done by [Mergify](https://mergify.com).

Rebased #4777 as this PR depends on it.

* clear upgrade_attempts on handleAck

* clear upgrade_attempts if upgrade_details is missing

* added unit test

(cherry picked from commit fb093cc)
@mergify mergify bot requested a review from a team as a code owner April 8, 2025 09:09
@mergify mergify bot added the backport label Apr 8, 2025
@mergify mergify bot requested review from michalpristas and pchila April 8, 2025 09:09
@github-actions github-actions bot added the enhancement and Team:Elastic-Agent-Control-Plane labels Apr 8, 2025
* Clear agent.upgrade_attemps on upgrade complete

* This actually works

* Silence nolintlint error in handleCheckin.go

* Remove nolint comment altogether

* Add changelog

* Update handleCheckin unit test

* Change approach

* Revert unit test change

* This seems needed

* Run make generate

* Remove internal link

* add unit test

* reduce complexity

* return nil if action is nil

---------

Co-authored-by: Julia Bardi <[email protected]>
(cherry picked from commit 2b40416)

Co-authored-by: Jill Guyonnet <[email protected]>
@juliaElastic

@michalpristas Hey, could you review this backport?

@juliaElastic juliaElastic merged commit 561841c into 8.x Apr 9, 2025
9 checks passed
@juliaElastic juliaElastic deleted the mergify/bp/8.x/pr-4762 branch April 9, 2025 11:38