Support manual rollback for Fleet-managed agents #11143
This pull request does not have a backport label. Could you fix it @pchila? 🙏
This pull request is now in conflicts. Could you fix it? 🙏
💛 Build succeeded, but was flaky
cc @pchila
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
swiatekm left a comment
I haven't tried to manually test this, but the logic looks good to me.
What does this PR do?
Introduces manual rollback for managed agents (requires elastic/fleet-server#5975 on fleet-server side)
Why is it important?
To implement the manual rollback feature for Fleet-managed agents, as was already done for standalone agents in #9643.
Checklist
`./changelog/fragments` using the changelog tool
Disruptive User Impact
How to test this PR locally
Prerequisites
In order to test this PR we need to use a fleet-server that contains the changes of PR elastic/fleet-server#5975.
If the fleet-server PR isn't merged yet, we can build a Docker image (to create a stack on the ECH CFT region or using `elastic-package`) and a zip/tar.gz that includes a fleet-server artifact from a local build as follows (`MANIFEST_URL`, `PLATFORMS`, and the `AGENT_DROP_PATH` location can be changed as needed)

Alternatively, if fleet-server is already available in the manifest pointed at by `.package-version`, we only need to create 2 elastic-agent archives with a simpler command

`beats/elastic-agent` directory tree under `build/distributions` (could be a different root folder if so preferred)
`beats/elastic-agent`
`cp elastic-agent-9.3.0+build20251125-SNAPSHOT-linux-x86_64.tar.gz* beats/elastic-agent`
`0001-DO-NOT-MERGE-Test-commit-to-skip-verifying-upgrade-p.patch` or set up an alternative PGP key to sign and verify the packages produced from this PR
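The archive-staging step above (the `cp ... beats/elastic-agent` command) could also be scripted. The following is only a minimal Python sketch of that copy: the function name `stage_archives`, its parameters, and the default glob pattern are hypothetical and not part of this PR or of the elastic-agent tooling.

```python
import shutil
from pathlib import Path


def stage_archives(src_dir, dist_root, pattern="elastic-agent-*.tar.gz*"):
    """Copy agent archives into <dist_root>/beats/elastic-agent.

    The trailing '*' in the pattern mirrors the `cp` command above, so files
    that sit next to the archive with the same prefix are copied as well.
    Returns the list of destination paths.
    """
    dest = Path(dist_root) / "beats" / "elastic-agent"
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for archive in sorted(Path(src_dir).glob(pattern)):
        if archive.is_file():
            # shutil.copy2 returns the path of the newly created file
            copied.append(Path(shutil.copy2(archive, dest)))
    return copied
```

For example, `stage_archives(".", "build/distributions")` would populate `build/distributions/beats/elastic-agent`; pass a different `dist_root` if another root folder is preferred.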
Testing
`integration_server.config.docker_image` value)
Agent binary download
`9.3.0-SNAPSHOT`
`.fleet-agents` from the Dev Console
`...` in the example below for brevity), notice the key `available_rollbacks`

    {
      "_index": ".fleet-agents-7",
      "_id": "8432a3dc-0b75-43f2-9fac-87f8701b82a2",
      "_score": 0.6931471,
      "_source": {
        "access_api_key_id": "OLrfxZoBUWrzn3OjJDDo",
        "action_seq_no": [ -1 ],
        "active": true,
        "agent": {
          "id": "8432a3dc-0b75-43f2-9fac-87f8701b82a2",
          "version": "9.3.0"
        },
        "enrolled_at": "2025-11-27T15:12:06Z",
        "local_metadata": { ... },
        "namespaces": [ "default" ],
        "policy_id": "6eeca4e4-79eb-477f-8f60-73b4c04b6be2",
        "type": "PERMANENT",
        "outputs": { ... },
        "policy_revision_idx": 2,
        "updated_at": "2025-11-27T15:12:53Z",
        "available_rollbacks": [],
        "components": [ ... ],
        "last_checkin_message": "Running",
        "last_checkin_status": "online",
        "last_checkin": "2025-11-27T15:12:44Z",
        "unhealthy_reason": null,
        "last_known_status": "online"
      }
    }

`9.3.0+build20251125-SNAPSHOT` (or whatever version has been used for repackaging the agent) via the Fleet UI and wait until the agent restarts with the new version and is in state `Upgrade monitoring`
`.fleet-agents`

    {
      "_index": ".fleet-agents-7",
      "_id": "8432a3dc-0b75-43f2-9fac-87f8701b82a2",
      "_score": 0.6931471,
      "_ignored": [
        "local_metadata.elastic.agent.version.keyword",
        "upgrade_details.target_version.keyword"
      ],
      "_source": {
        "access_api_key_id": "OLrfxZoBUWrzn3OjJDDo",
        "action_seq_no": [ 1 ],
        "active": true,
        "agent": {
          "id": "8432a3dc-0b75-43f2-9fac-87f8701b82a2",
          "version": "9.3.0+build20251125"
        },
        "enrolled_at": "2025-11-27T15:12:06Z",
        "local_metadata": { ... },
        "namespaces": [ "default" ],
        "policy_id": "6eeca4e4-79eb-477f-8f60-73b4c04b6be2",
        "type": "PERMANENT",
        "outputs": { ... },
        "policy_revision_idx": 4,
        "updated_at": "2025-11-27T15:35:03Z",
        "available_rollbacks": [
          {
            "valid_until": "2025-11-27T15:48:59Z",
            "version": "9.3.0-SNAPSHOT"
          }
        ],
        "components": [],
        "last_checkin_message": "Running",
        "last_checkin_status": "online",
        "last_checkin": "2025-11-27T15:35:02Z",
        "unhealthy_reason": [ "output" ],
        "last_known_status": "online",
        "upgrade_started_at": null,
        "upgraded_at": "2025-11-27T15:34:00Z",
        "upgrade_details": {
          "metadata": { "download_percent": 1 },
          "action_id": "e26c5b33-5ab3-44a1-9f4a-163832603b0f",
          "state": "UPG_WATCHING",
          "target_version": "9.3.0+build20251125"
        },
        "upgrade_status": null
      }
    }

`available_rollbacks` shows `9.3.0-SNAPSHOT` as a possible rollback target.
`valid_until` attribute), let's manually roll back the agent
`.fleet-agents`

    {
      "_index": ".fleet-agents-7",
      "_id": "8432a3dc-0b75-43f2-9fac-87f8701b82a2",
      "_score": 0.53899646,
      "_source": {
        "access_api_key_id": "OLrfxZoBUWrzn3OjJDDo",
        "action_seq_no": [ 3 ],
        "active": true,
        "agent": {
          "id": "8432a3dc-0b75-43f2-9fac-87f8701b82a2",
          "version": "9.3.0"
        },
        "enrolled_at": "2025-11-27T15:12:06Z",
        "local_metadata": { ... },
        "namespaces": [ "default" ],
        "policy_id": "6eeca4e4-79eb-477f-8f60-73b4c04b6be2",
        "type": "PERMANENT",
        "outputs": { ... },
        "policy_revision_idx": 4,
        "updated_at": "2025-11-27T15:45:23Z",
        "available_rollbacks": [],
        "components": [],
        "last_checkin_message": "Running",
        "last_checkin_status": "online",
        "last_checkin": "2025-11-27T15:45:14Z",
        "unhealthy_reason": [ "output" ],
        "last_known_status": "online",
        "upgrade_started_at": null,
        "upgraded_at": "2025-11-27T15:45:12Z",
        "upgrade_details": {
          "metadata": {
            "reason": "manual rollback requested to version 9.3.0-SNAPSHOT"
          },
          "action_id": "f46e5d9e-3418-48dc-a18b-f7a8bfae8cd9",
          "state": "UPG_ROLLBACK",
          "target_version": "9.3.0-SNAPSHOT"
        },
        "upgrade_status": null
      }
    }

`available_rollbacks` is again empty and `upgrade_details` reports the state `UPG_ROLLBACK` with a specific `reason`
`Upgrade failed` message (with correct message `` on the `i` tooltip which is incredibly difficult to screenshot)
The same info can be found in the `.fleet-agents` document in the `upgrade_details` section
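The checks performed by eye on the `.fleet-agents` documents above can also be expressed in a few lines of Python. This is a minimal sketch, not part of the PR: the helper names `parse_ts`, `usable_rollbacks`, and `rolled_back` are hypothetical, and the timestamp format is assumed to match the second-precision `...Z` values shown in the examples. The input is the `_source` object of a document, already loaded as a `dict`.

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse a timestamp such as 2025-11-27T15:48:59Z into an aware datetime.

    Assumption: second precision with a literal 'Z' suffix, as in the examples.
    """
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def usable_rollbacks(source, now=None):
    """Return the versions in available_rollbacks whose valid_until is still in the future."""
    now = now or datetime.now(timezone.utc)
    return [
        entry["version"]
        for entry in source.get("available_rollbacks") or []
        if parse_ts(entry["valid_until"]) > now
    ]


def rolled_back(source):
    """Return True when upgrade_details reports the UPG_ROLLBACK state."""
    details = source.get("upgrade_details") or {}
    return details.get("state") == "UPG_ROLLBACK"


# Trimmed copy of the document captured while the agent was in UPG_WATCHING
watching = {
    "available_rollbacks": [
        {"valid_until": "2025-11-27T15:48:59Z", "version": "9.3.0-SNAPSHOT"}
    ],
    "upgrade_details": {"state": "UPG_WATCHING"},
}
now = datetime(2025, 11, 27, 15, 40, tzinfo=timezone.utc)
print(usable_rollbacks(watching, now))  # ['9.3.0-SNAPSHOT']
print(rolled_back(watching))            # False
```

Passing `now` explicitly keeps the check reproducible; when it is omitted, the current UTC time is used, which is what matters when deciding whether a rollback can still be requested.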
Related issues
Questions to ask yourself