
Commit bdbc2ef

Authored by jillguyonnet, kibanamachine, juliaElastic, elasticmachine
[Fleet] Add retry logic to automatic agent upgrades (#212744)
## Summary

Relates elastic/ingest-dev#4720

This PR adds retry logic to the task that handles automatic agent upgrades, originally implemented in #211019.

Complementary fleet-server change, which sets the agent's `upgrade_attempts` to `null` once the upgrade is complete: elastic/fleet-server#4528

### Approach

- A new `upgrade_attempts` property is added to agents and stored in the agent doc (ES mapping update in elastic/elasticsearch#123256).
- When a bulk upgrade action is sent from the automatic upgrade task, it pushes the timestamp of the upgrade to the affected agents' `upgrade_attempts`.
- The default retry delays are `['30m', '1h', '2h', '4h', '8h', '16h', '24h']` and can be overridden with the new `xpack.fleet.autoUpgrades.retryDelays` setting.
- On every run, the automatic upgrade task first processes retries and then queries more agents if necessary (cf. elastic/ingest-dev#4720 (comment)).
- Once an agent has attempted and failed the maximum number of retries defined by the retry delays array, it is no longer retried.

### Testing

The ES query for fetching agents with existing `upgrade_attempts` needs the updated mappings, so it might be necessary to pull the latest `main` in the `elasticsearch` repo and run `yarn es source` instead of `yarn es snapshot` (requires an up-to-date Java environment, currently 23).

To test that `upgrade_attempts` is set to `null` when the upgrade is complete, fleet-server should be run in dev using the change in elastic/fleet-server#4528.

### Checklist

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
- [x] The PR description includes the appropriate Release Notes section, and the correct `release_note:*` label is applied per the [guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

### Identify risks

Low-probability risk of incorrectly triggering agent upgrades.
This feature is currently behind the `enableAutomaticAgentUpgrades` feature flag.

---------

Co-authored-by: kibanamachine <[email protected]>
Co-authored-by: Julia Bardi <[email protected]>
Co-authored-by: Elastic Machine <[email protected]>
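The retry rule described under "Approach" can be sketched roughly as follows. This is not the code from the PR: `delayToMs`, `decideRetry`, and the `RetryDecision` type are hypothetical names, the delay format (integer plus `s`/`m`/`h`/`d`) is an assumption, and so is the reading that the n-th recorded attempt must be followed by a wait of at least the n-th delay. The latest timestamp is taken with `Math.max` because the ordering of `upgrade_attempts` is not stated in the description.

```typescript
// Default delays as listed in the PR description.
const DEFAULT_RETRY_DELAYS = ['30m', '1h', '2h', '4h', '8h', '16h', '24h'];

// Hypothetical parser: converts '30m', '2h', ... to milliseconds (assumed format).
function delayToMs(delay: string): number {
  const match = /^(\d+)([smhd])$/.exec(delay);
  if (!match) {
    throw new Error(`Unsupported delay format: ${delay}`);
  }
  const amount = Number(match[1]);
  switch (match[2]) {
    case 's':
      return amount * 1000;
    case 'm':
      return amount * 60 * 1000;
    case 'h':
      return amount * 60 * 60 * 1000;
    default:
      return amount * 24 * 60 * 60 * 1000; // 'd'
  }
}

type RetryDecision = 'not_attempted' | 'wait' | 'retry' | 'exhausted';

// Hypothetical decision function for a single agent.
function decideRetry(
  upgradeAttempts: string[] | null | undefined,
  now: Date,
  retryDelays: string[] = DEFAULT_RETRY_DELAYS
): RetryDecision {
  // null or empty: no automatic upgrade attempt is recorded for this agent.
  if (!upgradeAttempts || upgradeAttempts.length === 0) {
    return 'not_attempted';
  }
  // Assumption: after n attempts, the next retry waits for the n-th delay.
  const delay = retryDelays[upgradeAttempts.length - 1];
  if (delay === undefined) {
    return 'exhausted'; // more attempts than configured delays: stop retrying
  }
  const lastAttempt = Math.max(...upgradeAttempts.map((t) => Date.parse(t)));
  return now.getTime() - lastAttempt >= delayToMs(delay) ? 'retry' : 'wait';
}
```

Under these assumptions, an agent with a single attempt at 00:00 would be in `wait` at 00:20 and become eligible for `retry` from 00:30 onward.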
1 parent 1531849 commit bdbc2ef

16 files changed: +352 -96 lines changed


oas_docs/bundle.json

Lines changed: 21 additions & 0 deletions
@@ -17675,6 +17675,13 @@
     "nullable": true,
     "type": "array"
   },
+  "upgrade_attempts": {
+    "items": {
+      "type": "string"
+    },
+    "nullable": true,
+    "type": "array"
+  },
   "upgrade_details": {
     "additionalProperties": false,
     "nullable": true,
@@ -19722,6 +19729,13 @@
     "nullable": true,
     "type": "array"
   },
+  "upgrade_attempts": {
+    "items": {
+      "type": "string"
+    },
+    "nullable": true,
+    "type": "array"
+  },
   "upgrade_details": {
     "additionalProperties": false,
     "nullable": true,
@@ -20206,6 +20220,13 @@
     "nullable": true,
     "type": "array"
   },
+  "upgrade_attempts": {
+    "items": {
+      "type": "string"
+    },
+    "nullable": true,
+    "type": "array"
+  },
   "upgrade_details": {
     "additionalProperties": false,
     "nullable": true,

oas_docs/bundle.serverless.json

Lines changed: 21 additions & 0 deletions
@@ -17675,6 +17675,13 @@
     "nullable": true,
     "type": "array"
   },
+  "upgrade_attempts": {
+    "items": {
+      "type": "string"
+    },
+    "nullable": true,
+    "type": "array"
+  },
   "upgrade_details": {
     "additionalProperties": false,
     "nullable": true,
@@ -19722,6 +19729,13 @@
     "nullable": true,
     "type": "array"
   },
+  "upgrade_attempts": {
+    "items": {
+      "type": "string"
+    },
+    "nullable": true,
+    "type": "array"
+  },
   "upgrade_details": {
     "additionalProperties": false,
     "nullable": true,
@@ -20206,6 +20220,13 @@
     "nullable": true,
     "type": "array"
   },
+  "upgrade_attempts": {
+    "items": {
+      "type": "string"
+    },
+    "nullable": true,
+    "type": "array"
+  },
   "upgrade_details": {
     "additionalProperties": false,
     "nullable": true,

oas_docs/output/kibana.serverless.yaml

Lines changed: 15 additions & 0 deletions
@@ -18992,6 +18992,11 @@ paths:
       type: string
     nullable: true
     type: array
+  upgrade_attempts:
+    items:
+      type: string
+    nullable: true
+    type: array
   upgrade_details:
     additionalProperties: false
     nullable: true
@@ -19447,6 +19452,11 @@ paths:
       type: string
     nullable: true
     type: array
+  upgrade_attempts:
+    items:
+      type: string
+    nullable: true
+    type: array
   upgrade_details:
     additionalProperties: false
     nullable: true
@@ -19790,6 +19800,11 @@ paths:
       type: string
     nullable: true
     type: array
+  upgrade_attempts:
+    items:
+      type: string
+    nullable: true
+    type: array
   upgrade_details:
     additionalProperties: false
     nullable: true

oas_docs/output/kibana.yaml

Lines changed: 15 additions & 0 deletions
@@ -21113,6 +21113,11 @@ paths:
       type: string
     nullable: true
     type: array
+  upgrade_attempts:
+    items:
+      type: string
+    nullable: true
+    type: array
   upgrade_details:
     additionalProperties: false
     nullable: true
@@ -21565,6 +21570,11 @@ paths:
       type: string
     nullable: true
     type: array
+  upgrade_attempts:
+    items:
+      type: string
+    nullable: true
+    type: array
   upgrade_details:
     additionalProperties: false
     nullable: true
@@ -21907,6 +21917,11 @@ paths:
       type: string
     nullable: true
     type: array
+  upgrade_attempts:
+    items:
+      type: string
+    nullable: true
+    type: array
   upgrade_details:
     additionalProperties: false
     nullable: true

x-pack/platform/plugins/shared/fleet/common/constants/index.ts

Lines changed: 2 additions & 0 deletions
@@ -58,3 +58,5 @@ export const FLEET_ENROLLMENT_API_PREFIX = 'fleet-enrollment-api-keys';
 export const REQUEST_DIAGNOSTICS_TIMEOUT_MS = 3 * 60 * 60 * 1000; // 3 hours;

 export * from './mappings';
+
+export const AUTO_UPGRADE_DEFAULT_RETRIES = ['30m', '1h', '2h', '4h', '8h', '16h', '24h'];
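To get a feel for how long the default schedule spans, the delays can be summed. The sketch below assumes each delay is measured from the previous attempt and that a retry fires exactly when it becomes due; in practice a retry can only happen when the task next runs, so these are lower bounds. `toMinutes` and `cumulativeMinutes` are hypothetical helpers, not part of the commit.

```typescript
const AUTO_UPGRADE_DEFAULT_RETRIES = ['30m', '1h', '2h', '4h', '8h', '16h', '24h'];

// Hypothetical parser: minutes for strings such as '30m' or '2h'.
function toMinutes(delay: string): number {
  const match = /^(\d+)([mh])$/.exec(delay);
  if (!match) {
    throw new Error(`Unsupported delay format: ${delay}`);
  }
  const amount = Number(match[1]);
  return match[2] === 'h' ? amount * 60 : amount;
}

// Minutes elapsed since the first attempt at which each retry becomes due.
function cumulativeMinutes(delays: string[]): number[] {
  let elapsed = 0;
  return delays.map((delay) => {
    elapsed += toMinutes(delay);
    return elapsed;
  });
}

const schedule = cumulativeMinutes(AUTO_UPGRADE_DEFAULT_RETRIES);
console.log(schedule); // [30, 90, 210, 450, 930, 1890, 3330]
```

With the defaults, the last retry becomes due no earlier than 3330 minutes (55.5 hours) after the first attempt.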

x-pack/platform/plugins/shared/fleet/common/constants/mappings.ts

Lines changed: 3 additions & 0 deletions
@@ -363,6 +363,9 @@ export const AGENT_MAPPINGS = {
       },
     },
   },
+  upgrade_attempts: {
+    type: 'date',
+  },
   // added to allow validation on status field
   status: {
     type: 'keyword',

x-pack/platform/plugins/shared/fleet/common/types/index.ts

Lines changed: 3 additions & 0 deletions
@@ -84,6 +84,9 @@ export interface FleetConfigType {
     };
   };
   createArtifactsBulkBatchSize?: number;
+  autoUpgrades?: {
+    retryDelays?: string[];
+  };
 }

 // Calling Object.entries(PackagesGroupedByStatus) gave `status: string`

x-pack/platform/plugins/shared/fleet/common/types/models/agent.ts

Lines changed: 5 additions & 0 deletions
@@ -99,6 +99,7 @@ interface AgentBase {
   upgraded_at?: string | null;
   upgrade_started_at?: string | null;
   upgrade_details?: AgentUpgradeDetails;
+  upgrade_attempts?: string[] | null;
   access_api_key_id?: string;
   default_api_key?: string;
   default_api_key_id?: string;
@@ -275,6 +276,10 @@ export interface FleetServerAgent {
    * Upgrade state of the Elastic Agent
    */
   upgrade_details?: AgentUpgradeDetails;
+  /**
+   * List of timestamps of attempts of Elastic Agent automatic upgrades
+   */
+  upgrade_attempts?: string[] | null;
   access_api_key_id?: string;
   agent?: FleetServerAgentMetadata;
   /**

x-pack/platform/plugins/shared/fleet/public/applications/fleet/components/search_bar.test.tsx

Lines changed: 1 addition & 1 deletion
@@ -174,7 +174,7 @@ describe('SearchBar', () => {

   describe('getFieldSpecs', () => {
     it('returns fieldSpecs for Fleet agents', () => {
-      expect(getFieldSpecs(AGENTS_INDEX, AGENTS_PREFIX)).toHaveLength(73);
+      expect(getFieldSpecs(AGENTS_INDEX, AGENTS_PREFIX)).toHaveLength(74);
     });

     it('returns fieldSpecs for Fleet enrollment tokens', () => {

x-pack/platform/plugins/shared/fleet/server/config.ts

Lines changed: 5 additions & 0 deletions
@@ -282,6 +282,11 @@ export const config: PluginConfigDescriptor = {
           min: 400,
         })
       ),
+      autoUpgrades: schema.maybe(
+        schema.object({
+          retryDelays: schema.maybe(schema.arrayOf(schema.string())),
+        })
+      ),
     },
     {
       validate: (configToValidate) => {
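Since both `autoUpgrades` and `retryDelays` are optional in the schema, a consumer has to fall back to the defaults when the setting is absent. A minimal sketch of that fallback, assuming a hypothetical `resolveRetryDelays` helper and a pared-down `FleetConfigSketch` type that mirrors only the `autoUpgrades` part of `FleetConfigType`:

```typescript
// Default delays, as defined in common/constants/index.ts in this commit.
const AUTO_UPGRADE_DEFAULT_RETRIES = ['30m', '1h', '2h', '4h', '8h', '16h', '24h'];

// Hypothetical, reduced view of the Fleet config: only the part added in this commit.
interface FleetConfigSketch {
  autoUpgrades?: {
    retryDelays?: string[];
  };
}

// Hypothetical helper: use the configured delays if present, otherwise the defaults.
function resolveRetryDelays(config: FleetConfigSketch): string[] {
  return config.autoUpgrades?.retryDelays ?? AUTO_UPGRADE_DEFAULT_RETRIES;
}

// No override configured: the defaults apply.
console.log(resolveRetryDelays({}).length); // 7

// Override, e.g. from `xpack.fleet.autoUpgrades.retryDelays` in kibana.yml.
console.log(resolveRetryDelays({ autoUpgrades: { retryDelays: ['10m', '20m'] } })); // ['10m', '20m']
```

Note that the schema only checks that each entry is a string, so a sketch like this does not validate the delay format; whether the actual task validates it elsewhere is not shown in this part of the diff.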
