
Commit 3e41391

jillguyonnet, kibanamachine, juliaElastic, and elasticmachine committed
[Fleet] Add retry logic to automatic agent upgrades (elastic#212744)
Relates elastic/ingest-dev#4720

This PR adds retry logic to the task that handles automatic agent upgrades, originally implemented in elastic#211019.

Complementary fleet-server change, which sets the agent's `upgrade_attempts` to `null` once the upgrade is complete: elastic/fleet-server#4528

- A new `upgrade_attempts` property is added to agents and stored in the agent doc (ES mapping update in elastic/elasticsearch#123256).
- When the automatic upgrade task sends a bulk upgrade action, it prepends the timestamp of the upgrade to the affected agents' `upgrade_attempts`, so the most recent attempt comes first.
- The default retry delays are `['30m', '1h', '2h', '4h', '8h', '16h', '24h']` and can be overridden with the new `xpack.fleet.autoUpgrades.retryDelays` setting.
- On every run, the automatic upgrade task first processes retries and then queries more agents if necessary (cf. elastic/ingest-dev#4720 (comment)).
- Once an agent has failed every retry defined by the retry delays array, it is no longer retried.

The ES query that fetches agents with existing `upgrade_attempts` needs the updated mappings, so it might be necessary to pull the latest `main` in the `elasticsearch` repo and run `yarn es source` instead of `yarn es snapshot` (this requires an up-to-date Java environment, currently Java 23).

To test that `upgrade_attempts` is set to `null` when the upgrade is complete, run fleet-server in dev mode with the change from elastic/fleet-server#4528.

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
- [x] The PR description includes the appropriate Release Notes section, and the correct `release_note:*` label is applied per the [guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

Low-probability risk of incorrectly triggering agent upgrades. This feature is currently behind the `enableAutomaticAgentUpgrades` feature flag.

---------

Co-authored-by: kibanamachine
Co-authored-by: Julia Bardi
Co-authored-by: Elastic Machine
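The retry rules above can be illustrated with a small decision function. This is a hypothetical sketch, not the code from this commit: the names `getRetryDecision` and `RetryDecision`, and the rule that the delay before retry N is the N-th entry of the delay array, are assumptions. It relies only on `upgrade_attempts` holding ISO timestamps with the most recent attempt first, which matches how the `upgrade_action_runner.ts` change prepends each new timestamp.

```typescript
// Hypothetical sketch of the retry decision for one agent.
// Assumption: the delay before retry N is retryDelaysMs[N - 1], so an agent gets
// one initial attempt plus one retry per configured delay.
type RetryDecision =
  | { kind: 'not_attempted' }
  | { kind: 'max_retries_reached' }
  | { kind: 'wait'; retryAt: Date }
  | { kind: 'retry_now' };

function getRetryDecision(
  upgradeAttempts: string[] | null | undefined,
  retryDelaysMs: number[],
  now: Date
): RetryDecision {
  // null/undefined/empty: no recorded automatic attempt (fleet-server sets the
  // field to null once an upgrade completes), so this is not a retry candidate.
  if (!upgradeAttempts || upgradeAttempts.length === 0) {
    return { kind: 'not_attempted' };
  }
  // Every configured delay has already been used for a retry.
  if (upgradeAttempts.length > retryDelaysMs.length) {
    return { kind: 'max_retries_reached' };
  }
  // upgradeAttempts[0] is the latest attempt because new timestamps are prepended.
  const lastAttempt = new Date(upgradeAttempts[0]).getTime();
  const retryAt = lastAttempt + retryDelaysMs[upgradeAttempts.length - 1];
  return now.getTime() >= retryAt
    ? { kind: 'retry_now' }
    : { kind: 'wait', retryAt: new Date(retryAt) };
}

// The default delays ['30m', '1h', '2h', '4h', '8h', '16h', '24h'] in milliseconds.
const MINUTE = 60 * 1000;
const delays = [30, 60, 120, 240, 480, 960, 1440].map((m) => m * MINUTE);

const decision = getRetryDecision(
  ['2025-03-01T10:00:00.000Z'],
  delays,
  new Date('2025-03-01T10:45:00.000Z')
);
console.log(decision.kind); // prints retry_now (45 minutes elapsed, first delay is 30 minutes)
```

Returning a tagged union rather than a boolean keeps the "wait until" time available, which a task scheduler could use to decide when to look at the agent again.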
1 parent e9908f2 commit 3e41391

12 files changed (+292 / -96 lines)


oas_docs/output/kibana.serverless.yaml

Lines changed: 15 additions & 0 deletions
@@ -19330,6 +19330,11 @@ paths:
           type: string
         nullable: true
         type: array
+      upgrade_attempts:
+        items:
+          type: string
+        nullable: true
+        type: array
       upgrade_details:
         additionalProperties: false
         type: object
@@ -19794,6 +19799,11 @@ paths:
           type: string
         nullable: true
         type: array
+      upgrade_attempts:
+        items:
+          type: string
+        nullable: true
+        type: array
       upgrade_details:
         additionalProperties: false
         type: object
@@ -20137,6 +20147,11 @@ paths:
           type: string
         nullable: true
         type: array
+      upgrade_attempts:
+        items:
+          type: string
+        nullable: true
+        type: array
       upgrade_details:
         additionalProperties: false
         type: object

x-pack/platform/plugins/shared/fleet/common/constants/index.ts

Lines changed: 2 additions & 0 deletions
@@ -58,3 +58,5 @@ export const FLEET_ENROLLMENT_API_PREFIX = 'fleet-enrollment-api-keys';
 export const REQUEST_DIAGNOSTICS_TIMEOUT_MS = 3 * 60 * 60 * 1000; // 3 hours;
 
 export * from './mappings';
+
+export const AUTO_UPGRADE_DEFAULT_RETRIES = ['30m', '1h', '2h', '4h', '8h', '16h', '24h'];
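The values in `AUTO_UPGRADE_DEFAULT_RETRIES` are duration strings, so they must be converted to a numeric duration before they can be compared with the timestamps stored in `upgrade_attempts`. The diff does not show how Fleet parses them; `parseRetryDelay` below is a hypothetical helper, and its accepted format (a whole number followed by `s`, `m`, `h`, or `d`) is an assumption.

```typescript
// Hypothetical helper: converts retry delay strings such as '30m' or '24h'
// into milliseconds. Only whole numbers with an s/m/h/d suffix are accepted.
const UNIT_TO_MS: Record<string, number> = {
  s: 1000,
  m: 60 * 1000,
  h: 60 * 60 * 1000,
  d: 24 * 60 * 60 * 1000,
};

function parseRetryDelay(delay: string): number {
  const match = /^(\d+)([smhd])$/.exec(delay.trim());
  if (!match) {
    // Reject anything else (e.g. '1.5h' or '30 minutes') instead of guessing.
    throw new Error(`Invalid retry delay: "${delay}"`);
  }
  return Number(match[1]) * UNIT_TO_MS[match[2]];
}

// Same values as AUTO_UPGRADE_DEFAULT_RETRIES in the diff above.
const AUTO_UPGRADE_DEFAULT_RETRIES = ['30m', '1h', '2h', '4h', '8h', '16h', '24h'];
const defaultDelaysMs = AUTO_UPGRADE_DEFAULT_RETRIES.map(parseRetryDelay);

console.log(parseRetryDelay('30m')); // 1800000
```

Failing loudly on malformed values matters here because the config schema only checks that each entry is a string.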

x-pack/platform/plugins/shared/fleet/common/constants/mappings.ts

Lines changed: 3 additions & 0 deletions
@@ -363,6 +363,9 @@ export const AGENT_MAPPINGS = {
         },
       },
     },
+    upgrade_attempts: {
+      type: 'date',
+    },
     // added to allow validation on status field
     status: {
       type: 'keyword',

x-pack/platform/plugins/shared/fleet/common/types/index.ts

Lines changed: 3 additions & 0 deletions
@@ -83,6 +83,9 @@ export interface FleetConfigType {
     };
   };
   createArtifactsBulkBatchSize?: number;
+  autoUpgrades?: {
+    retryDelays?: string[];
+  };
 }
 
 // Calling Object.entries(PackagesGroupedByStatus) gave `status: string`

x-pack/platform/plugins/shared/fleet/common/types/models/agent.ts

Lines changed: 5 additions & 0 deletions
@@ -96,6 +96,7 @@ interface AgentBase {
   upgraded_at?: string | null;
   upgrade_started_at?: string | null;
   upgrade_details?: AgentUpgradeDetails;
+  upgrade_attempts?: string[] | null;
   access_api_key_id?: string;
   default_api_key?: string;
   default_api_key_id?: string;
@@ -268,6 +269,10 @@ export interface FleetServerAgent {
    * Upgrade state of the Elastic Agent
    */
   upgrade_details?: AgentUpgradeDetails;
+  /**
+   * List of timestamps of attempts of Elastic Agent automatic upgrades
+   */
+  upgrade_attempts?: string[] | null;
   access_api_key_id?: string;
   agent?: FleetServerAgentMetadata;
   /**

x-pack/platform/plugins/shared/fleet/public/applications/fleet/components/search_bar.test.tsx

Lines changed: 1 addition & 1 deletion
@@ -174,7 +174,7 @@ describe('SearchBar', () => {
 
   describe('getFieldSpecs', () => {
     it('returns fieldSpecs for Fleet agents', () => {
-      expect(getFieldSpecs(AGENTS_INDEX, AGENTS_PREFIX)).toHaveLength(73);
+      expect(getFieldSpecs(AGENTS_INDEX, AGENTS_PREFIX)).toHaveLength(74);
     });
 
     it('returns fieldSpecs for Fleet enrollment tokens', () => {

x-pack/platform/plugins/shared/fleet/server/config.ts

Lines changed: 5 additions & 0 deletions
@@ -283,6 +283,11 @@ export const config: PluginConfigDescriptor = {
           min: 400,
         })
       ),
+      autoUpgrades: schema.maybe(
+        schema.object({
+          retryDelays: schema.maybe(schema.arrayOf(schema.string())),
+        })
+      ),
     },
     {
       validate: (configToValidate) => {
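Because both `autoUpgrades` and `retryDelays` are wrapped in `schema.maybe`, the setting can be absent, and a consumer has to fall back to the defaults. The sketch below shows one way to do that; `resolveRetryDelays` and `FleetConfigSubset` are hypothetical names, and the interface only mirrors the `autoUpgrades` shape added to `FleetConfigType` in this commit.

```typescript
// Trimmed-down, hypothetical view of the Fleet config: only the new setting.
interface FleetConfigSubset {
  autoUpgrades?: {
    retryDelays?: string[];
  };
}

// Same values as AUTO_UPGRADE_DEFAULT_RETRIES in common/constants/index.ts.
const AUTO_UPGRADE_DEFAULT_RETRIES = ['30m', '1h', '2h', '4h', '8h', '16h', '24h'];

function resolveRetryDelays(config: FleetConfigSubset | undefined): string[] {
  // Optional chaining covers a missing config, a missing `autoUpgrades` block,
  // and a missing `retryDelays` key; `??` then falls back to the defaults.
  return config?.autoUpgrades?.retryDelays ?? AUTO_UPGRADE_DEFAULT_RETRIES;
}

console.log(resolveRetryDelays(undefined).length); // 7
console.log(resolveRetryDelays({ autoUpgrades: { retryDelays: ['10m', '1h'] } })); // [ '10m', '1h' ]
```

With `??`, an explicitly configured empty array is kept as-is, which would mean "no retries"; whether Fleet intends an empty array to behave that way is an assumption of this sketch.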

x-pack/platform/plugins/shared/fleet/server/services/agents/helpers.ts

Lines changed: 1 addition & 0 deletions
@@ -68,6 +68,7 @@ export function searchHitToAgent(
     upgraded_at: hit._source?.upgraded_at,
     upgrade_started_at: hit._source?.upgrade_started_at,
     upgrade_details: hit._source?.upgrade_details,
+    upgrade_attempts: hit._source?.upgrade_attempts,
     access_api_key_id: hit._source?.access_api_key_id,
     default_api_key_id: hit._source?.default_api_key_id,
     policy_id: hit._source?.policy_id,

x-pack/platform/plugins/shared/fleet/server/services/agents/upgrade.ts

Lines changed: 19 additions & 0 deletions
@@ -125,3 +125,22 @@ export async function sendUpgradeAgentsActions(
 
   return await upgradeBatch(esClient, givenAgents, outgoingErrors, options, currentSpaceId);
 }
+
+export async function sendAutomaticUpgradeAgentsActions(
+  soClient: SavedObjectsClientContract,
+  esClient: ElasticsearchClient,
+  options: {
+    agents: Agent[];
+    version: string;
+    upgradeDurationSeconds?: number;
+  }
+): Promise<{ actionId: string }> {
+  const currentSpaceId = getCurrentNamespace(soClient);
+  return await upgradeBatch(
+    esClient,
+    options.agents,
+    {},
+    { ...options, isAutomatic: true },
+    currentSpaceId
+  );
+}

x-pack/platform/plugins/shared/fleet/server/services/agents/upgrade_action_runner.ts

Lines changed: 6 additions & 0 deletions
@@ -9,11 +9,13 @@ import type { ElasticsearchClient } from '@kbn/core/server';
 
 import { v4 as uuidv4 } from 'uuid';
 import moment from 'moment';
+import semverGte from 'semver/functions/gte';
 
 import {
   getRecentUpgradeInfoForAgent,
   getNotUpgradeableMessage,
   isAgentUpgradeableToVersion,
+  AGENT_UPGARDE_DETAILS_SUPPORTED_VERSION,
 } from '../../../common/services';
 
 import type { Agent } from '../../types';
@@ -168,6 +170,10 @@ export async function upgradeBatch(
         data: {
           upgraded_at: null,
           upgrade_started_at: now,
+          ...(options.isAutomatic &&
+          semverGte(agent.agent?.version ?? '0.0.0', AGENT_UPGARDE_DETAILS_SUPPORTED_VERSION)
+            ? { upgrade_attempts: [now, ...(agent.upgrade_attempts ?? [])] }
+            : {}),
         },
       })),
       errors
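The conditional spread in `upgradeBatch` records an attempt only for automatic upgrades of agents whose version is at least `AGENT_UPGARDE_DETAILS_SUPPORTED_VERSION`, and it prepends the new timestamp. The standalone sketch below reproduces that logic outside Kibana under stated assumptions: `buildUpgradeData` and `versionGte` are hypothetical names, `versionGte` is a simplified stand-in for `semver`'s `gte` that compares only `major.minor.patch` and ignores pre-release tags (so `8.12.0-SNAPSHOT` counts as `8.12.0` here, unlike in semver), and `'8.12.0'` below is just an example minimum version, not necessarily the value of the real constant.

```typescript
// Simplified stand-in for semver's gte: compares major.minor.patch only.
function versionGte(a: string, b: string): boolean {
  const pa = a.split('-')[0].split('.').map(Number);
  const pb = b.split('-')[0].split('.').map(Number);
  for (let i = 0; i < 3; i++) {
    const x = pa[i] ?? 0;
    const y = pb[i] ?? 0;
    if (x !== y) return x > y;
  }
  return true; // all three components are equal
}

interface UpgradeData {
  upgraded_at: null;
  upgrade_started_at: string;
  upgrade_attempts?: string[];
}

function buildUpgradeData(
  agentVersion: string | undefined,
  previousAttempts: string[] | null | undefined,
  isAutomatic: boolean,
  minVersion: string,
  now: string
): UpgradeData {
  return {
    upgraded_at: null,
    upgrade_started_at: now,
    // Same shape as the diff: an unknown version defaults to '0.0.0' (never
    // eligible), and the new timestamp goes first so index 0 is the latest attempt.
    ...(isAutomatic && versionGte(agentVersion ?? '0.0.0', minVersion)
      ? { upgrade_attempts: [now, ...(previousAttempts ?? [])] }
      : {}),
  };
}

const now = '2025-03-01T11:00:00.000Z';
const data = buildUpgradeData('8.18.0', ['2025-03-01T10:00:00.000Z'], true, '8.12.0', now);
console.log(data.upgrade_attempts?.join(', '));
// 2025-03-01T11:00:00.000Z, 2025-03-01T10:00:00.000Z
```

Leaving `upgrade_attempts` out entirely (rather than setting it to `[]`) for manual upgrades keeps any existing attempt history in the agent doc untouched by a partial update.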
