Skip to content

Commit 2f55098

Browse files
authored
Preserve upgrade marker when rolling back and add rollback reason to upgrade details (#8407)
A new reason string field has been added in UpgradeDetails struct. Such field should contain a human-readable description of the cause of the last state transition. In the specific case this is used to distinguish how elastic-agent got to the UPG_ROLLBACK state: the only possible reason at the moment is an automatic rollback initiated by the elastic-agent watcher, but this could be used for any state transition This PR will allow to to distinguish if the cause of an UPG_ROLLBACK status is: - the new agent misbehaving (automatic rollback) - a manual rollback (will be introduced with "Handle rollback command/action in upgrade flow" #6889) * Add reason for rolled back upgrades * Don't immediately remove update marker in case of rollback * Extract repackaging utility functions * Fix assertion about upgrade marker surviving rollback * Watcher should recognize terminal upgrade states * test watch command Add some unit tests for watchCmd function, covering the most common cases. This has required some refactoring to properly assert side effects and interactions. * Add managed agent rollback integration test
1 parent 99daceb commit 2f55098

37 files changed

+1533
-586
lines changed

.mockery.yaml

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -29,3 +29,11 @@ packages:
2929
github.com/elastic/elastic-agent/internal/pkg/agent/application/info:
3030
interfaces:
3131
Agent:
32+
github.com/elastic/elastic-agent/internal/pkg/agent/cmd:
33+
interfaces:
34+
agentWatcher:
35+
config:
36+
mockname: "AgentWatcher"
37+
installationModifier:
38+
config:
39+
mockname: "InstallationModifier"

NOTICE-fips.txt

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1254,11 +1254,11 @@ SOFTWARE
12541254

12551255
--------------------------------------------------------------------------------
12561256
Dependency : github.com/elastic/elastic-agent-libs
1257-
Version: v0.20.1
1257+
Version: v0.21.0
12581258
Licence type (autodetected): Apache-2.0
12591259
--------------------------------------------------------------------------------
12601260

1261-
Contents of probable licence file $GOMODCACHE/github.com/elastic/elastic-agent-libs@v0.20.1/LICENSE:
1261+
Contents of probable licence file $GOMODCACHE/github.com/elastic/elastic-agent-libs@v0.21.0/LICENSE:
12621262

12631263
Apache License
12641264
Version 2.0, January 2004

NOTICE.txt

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1254,11 +1254,11 @@ SOFTWARE
12541254

12551255
--------------------------------------------------------------------------------
12561256
Dependency : github.com/elastic/elastic-agent-libs
1257-
Version: v0.20.1
1257+
Version: v0.21.0
12581258
Licence type (autodetected): Apache-2.0
12591259
--------------------------------------------------------------------------------
12601260

1261-
Contents of probable licence file $GOMODCACHE/github.com/elastic/elastic-agent-libs@v0.20.1/LICENSE:
1261+
Contents of probable licence file $GOMODCACHE/github.com/elastic/elastic-agent-libs@v0.21.0/LICENSE:
12621262

12631263
Apache License
12641264
Version 2.0, January 2004
Lines changed: 34 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,34 @@
1+
# Kind can be one of:
2+
# - breaking-change: a change to previously-documented behavior
3+
# - deprecation: functionality that is being removed in a later release
4+
# - bug-fix: fixes a problem in a previous version
5+
# - enhancement: extends functionality but does not break or fix existing behavior
6+
# - feature: new functionality
7+
# - known-issue: problems that we are aware of in a given version
8+
# - security: impacts on the security of a product or a user’s deployment.
9+
# - upgrade: important information for someone upgrading from a prior version
10+
# - other: does not fit into any of the other categories
11+
kind: feature
12+
13+
# Change summary; a 80ish characters long description of the change.
14+
summary: Preserve upgrade marker when rolling back upgrade and add rollback reason
15+
16+
# Long description; in case the summary is not enough to describe the change
17+
# this field accommodate a description without length limits.
18+
# NOTE: This field will be rendered only for breaking-change and known-issue kinds at the moment.
19+
description: Upgrade marker is now preserved by the watcher when performing a rollback and a new `reason`
20+
field is added to the upgrade details structure. The reason for keeping the upgrade marker when rolling back is
21+
to allow the rolled back agent to read the rollback reason and communicate that to user/Fleet.
22+
23+
# Affected component; usually one of "elastic-agent", "fleet-server", "filebeat", "metricbeat", "auditbeat", "all", etc.
24+
component: elastic-agent
25+
26+
# PR URL; optional; the PR number that added the changeset.
27+
# If not present is automatically filled by the tooling finding the PR where this changelog fragment has been added.
28+
# NOTE: the tooling supports backports, so it's able to fill the original PR number instead of the backport PR number.
29+
# Please provide it if you are adding a fragment for a different PR.
30+
pr: https://github.com/elastic/elastic-agent/8407
31+
32+
# Issue URL; optional; the GitHub issue related to this changeset (either closes or is part of).
33+
# If not present is automatically filled by the tooling with the issue linked to the PR number.
34+
#issue: https://github.com/owner/repo/1234

control_v2.proto

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -265,6 +265,10 @@ message UpgradeDetailsMetadata {
265265
// The deadline until when a retryable upgrade step, e.g. the download
266266
// step, will be retried.
267267
string retry_until = 6;
268+
269+
// Reason is a string that may give out more information about transitioning to the current state.
270+
// It has been introduced initially to distinguish between manual and automatic rollbacks
271+
string reason = 7;
268272
}
269273

270274
// DiagnosticFileResult is a file result from a diagnostic result.

go.mod

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -18,7 +18,7 @@ require (
1818
github.com/elastic/cloud-on-k8s/v2 v2.0.0-20250327073047-b624240832ae
1919
github.com/elastic/elastic-agent-autodiscover v0.9.2
2020
github.com/elastic/elastic-agent-client/v7 v7.17.2
21-
github.com/elastic/elastic-agent-libs v0.20.1
21+
github.com/elastic/elastic-agent-libs v0.21.0
2222
github.com/elastic/elastic-agent-system-metrics v0.11.16
2323
github.com/elastic/elastic-transport-go/v8 v8.7.0
2424
github.com/elastic/go-elasticsearch/v8 v8.18.1

go.sum

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -522,8 +522,8 @@ github.com/elastic/elastic-agent-autodiscover v0.9.2 h1:eBmru2v66HRRHOFf89rDl9OZ
522522
github.com/elastic/elastic-agent-autodiscover v0.9.2/go.mod h1:RNaHnOTYfNptSTQUyZYnjypxmrR5AaE6BIap/175F5c=
523523
github.com/elastic/elastic-agent-client/v7 v7.17.2 h1:Cl2TeABqWZgW40t5fchGWT/sRk4MDDLWA0d8iHHOxLA=
524524
github.com/elastic/elastic-agent-client/v7 v7.17.2/go.mod h1:5irRFqp6HLqtu1S+OeY0jg8x7K6PLL+DW+PwVk1vJnk=
525-
github.com/elastic/elastic-agent-libs v0.20.1 h1:M7ZME7yctVhI9349OiG0VQ8a+RsuParA/ZUgCuctwBE=
526-
github.com/elastic/elastic-agent-libs v0.20.1/go.mod h1:xSeIP3NtOIT4N2pPS4EyURmS1Q8mK0lWZ8Wd1Du6q3w=
525+
github.com/elastic/elastic-agent-libs v0.21.0 h1:lt2Xc87Si0mea0BgRKZGZA30j8LEx57k7GAyiKmZP/8=
526+
github.com/elastic/elastic-agent-libs v0.21.0/go.mod h1:xSeIP3NtOIT4N2pPS4EyURmS1Q8mK0lWZ8Wd1Du6q3w=
527527
github.com/elastic/elastic-agent-system-metrics v0.11.16 h1:cLjuO8pE5cUwPGWUHmy1VOERmJVDaep8gY+U4YRQ5vs=
528528
github.com/elastic/elastic-agent-system-metrics v0.11.16/go.mod h1:qiZC5p1hd8te4XVnhh7FkXdcYhxFnl5i9GJpROtf6zo=
529529
github.com/elastic/elastic-transport-go/v8 v8.7.0 h1:OgTneVuXP2uip4BA658Xi6Hfw+PeIOod2rY3GVMGoVE=

internal/pkg/agent/application/upgrade/details/details.go

Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -55,6 +55,10 @@ type Metadata struct {
5555
// the Fail() method of UpgradeDetails to correctly record details when
5656
// an upgrade fails.
5757
ErrorMsg string `json:"error_msg,omitempty" yaml:"error_msg,omitempty"`
58+
59+
// Reason is a string that may give out more information about transitioning to the current state. It has been
60+
// introduced initially to distinguish between manual and automatic rollbacks
61+
Reason string `json:"reason,omitempty" yaml:"reason,omitempty"`
5862
}
5963

6064
func NewDetails(targetVersion string, initialState State, actionID string) *Details {
@@ -87,6 +91,27 @@ func (d *Details) SetState(s State) {
8791
d.notifyObservers()
8892
}
8993

94+
// SetStateWithReason is a convenience method to set the state of the upgrade, the metadata.reason and
95+
// notify all observers.
96+
// Do NOT call SetStateWithReason with StateFailed; call the Fail method instead.
97+
func (d *Details) SetStateWithReason(s State, reason string) {
98+
d.mu.Lock()
99+
defer d.mu.Unlock()
100+
101+
d.State = s
102+
d.Metadata.Reason = reason
103+
104+
// If State is something other than StateFailed, make sure to clear
105+
// Metadata.FailedState and Metadata.ErrorMsg as those two fields
106+
// should be set when State is set to StateFailed. See the Fail method.
107+
if s != StateFailed {
108+
d.Metadata.ErrorMsg = ""
109+
d.Metadata.FailedState = ""
110+
}
111+
112+
d.notifyObservers()
113+
}
114+
90115
// SetDownloadProgress is a convenience method to set the download percent
91116
// and download rate when the upgrade is in UPG_DOWNLOADING state.
92117
func (d *Details) SetDownloadProgress(percent, rateBytesPerSecond float64) {

internal/pkg/agent/application/upgrade/details/details_test.go

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,7 @@ import (
1111
"testing"
1212
"time"
1313

14+
"github.com/stretchr/testify/assert"
1415
"github.com/stretchr/testify/require"
1516
)
1617

@@ -30,6 +31,15 @@ func TestDetailsSetState(t *testing.T) {
3031
require.Equal(t, StateDownloading, det.State)
3132
}
3233

34+
func TestDetailsSetStateWithReason(t *testing.T) {
35+
det := NewDetails("99.999.9999", StateWatching, "test_action_id")
36+
require.Equal(t, StateWatching, det.State)
37+
38+
det.SetStateWithReason(StateRollback, ReasonWatchFailed)
39+
assert.Equal(t, StateRollback, det.State)
40+
assert.Equal(t, ReasonWatchFailed, det.Metadata.Reason)
41+
}
42+
3343
func TestDetailsFail(t *testing.T) {
3444
det := NewDetails("99.999.9999", StateRequested, "test_action_id")
3545
require.Equal(t, StateRequested, det.State)

internal/pkg/agent/application/upgrade/details/state.go

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -6,9 +6,9 @@ package details
66

77
type State string
88

9-
// The values of these State* constants should match those enumerated for
10-
// upgrade_details.state in https://github.com/elastic/fleet-server/blob/main/model/openapi.yml
119
const (
10+
// The values of these State* constants should match those enumerated for
11+
// upgrade_details.state in https://github.com/elastic/fleet-server/blob/main/model/openapi.yml
1212
StateRequested State = "UPG_REQUESTED"
1313
StateScheduled State = "UPG_SCHEDULED"
1414
StateDownloading State = "UPG_DOWNLOADING"
@@ -19,4 +19,7 @@ const (
1919
StateRollback State = "UPG_ROLLBACK"
2020
StateCompleted State = "UPG_COMPLETED"
2121
StateFailed State = "UPG_FAILED"
22+
23+
// List of well-known reasons for state transitions
24+
ReasonWatchFailed = "watch failed"
2225
)

0 commit comments

Comments
 (0)