articles/operator-service-manager/safe-upgrade-practices.md
Lines changed: 17 additions & 16 deletions
@@ -3,15 +3,15 @@ title: Get started with Azure Operator Service Manager Safe Upgrade Practices
3
3
description: Safely execute complex upgrades of CNF workloads on Azure Operator Nexus
4
4
author: msftadam
5
5
ms.author: adamdor
6
-
ms.date: 08/30/2024
6
+
ms.date: 02/19/2024
7
7
ms.topic: upgrade-and-migration-article
8
8
ms.service: azure-operator-service-manager
9
9
---
10
10
11
11
# Get started with safe upgrade practices
12
12
This article introduces Azure Operator Service Manager (AOSM) safe upgrade practices (SUP). This feature set enables an end user to safely execute complex upgrades of Container Network Function (CNF) workloads hosted on Azure Operator Nexus, in compliance with partner In Service Software Upgrade (ISSU) requirements, where applicable. Look for future articles in these services to expand on SUP features and capabilities.
13
13
14
-
## Introduction
14
+
## Introduction to safe upgrades
15
15
A given network service supported by Azure Operator Service Manager is composed of one or more container-based network functions (CNFs) which, over time, require software updates. Each update requires one or more helm operations that upgrade dependent network function applications (NfApps) in a particular order, in the manner that least impacts the network service. In Azure Operator Service Manager, safe upgrade practices (SUP) are a set of features that automate the CNF operations required to update a network service on Azure Operator Nexus.
16
16
17
17
* SNS Reput update - Execute helm upgrade operation across all NfApps in NFDV.
@@ -26,7 +26,7 @@ A given network service supported by Azure Operator Service Manager will be comp
26
26
* Execute NF-level Rollback On Failure - Based on flag, rollback all completed NfApps on failure.
27
27
* Image Preloading - Ability to preload images to edge repository.
28
28
29
-
## Upgrade approach
29
+
## Safe upgrade approach
30
30
To update an existing Azure Operator Service Manager site network service (SNS), the Operator executes a reput update request against the deployed SNS resource. Where the SNS contains CNFs with multiple NfApps, the request is fanned out across all NfApps defined in the network function definition version (NFDV). By default, the NfApps are processed in the order in which they appear, or optionally in the order defined by the UpdateDependsOn parameter.
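The ordering rule described above can be sketched in a few lines of Python. This is a minimal illustration only, not AOSM code: the function name `update_order` and the shape of the `update_depends_on` mapping (NfApp name to the names that must be updated first) are assumptions made for the example.

```python
def update_order(nf_apps, update_depends_on=None):
    """Return a serial processing order for NfApps.

    nf_apps: NfApp names in the order they appear in the NFDV.
    update_depends_on: hypothetical mapping of an NfApp name to the
        names that must be processed before it (assumed shape).
    """
    deps = update_depends_on or {}
    ordered, done = [], set()
    remaining = list(nf_apps)
    while remaining:
        # Pick the first NfApp, in declared order, whose dependencies are done.
        for name in remaining:
            if all(dep in done for dep in deps.get(name, [])):
                ordered.append(name)
                done.add(name)
                remaining.remove(name)
                break
        else:
            # No NfApp could be scheduled: a cycle or an unknown dependency.
            raise ValueError("unsatisfiable UpdateDependsOn ordering")
    return ordered
```

With no dependencies the declared order is returned unchanged. With dependencies, each NfApp is placed after the NfApps it depends on, and ties keep the declared order.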
31
31
32
32
For each NfApp, the reput update request supports increasing a helm chart version, adding/removing helm values and/or adding/removing any NfApps. Time-outs can be set per NfApp, based on known allowable runtimes, but NfApps can only be processed in serial order, one after the other. The reput update implements the following processing logic:
@@ -41,12 +41,13 @@ For each NfApp, the reput update request supports increasing a helm chart versio
41
41
To ensure outcomes, NfApp testing is supported using helm, either helm upgrade pre/post tests, or standalone helm tests. For pre/post test failures, the atomic parameter is honored. With atomic/true, the failed chart is rolled back. With atomic/false, no rollback is executed. For standalone helm test failures, the rollbackOnTestFailure parameter is honored. With rollbackOnTestFailure/true, the failed chart is rolled back. With rollbackOnTestFailure/false, no rollback is executed.
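The rollback rules above reduce to a small decision function. The following Python sketch only restates those rules. The function name and the `"upgrade_hook"` and `"standalone"` labels are hypothetical and are not AOSM or helm identifiers.

```python
def rolls_back_on_test_failure(test_kind, atomic=False,
                               rollback_on_test_failure=False):
    """Return True if a failed test leads to a rollback of the chart.

    test_kind: "upgrade_hook" for helm upgrade pre/post tests, or
        "standalone" for standalone helm tests (labels are illustrative).
    """
    if test_kind == "upgrade_hook":
        # Pre/post test failures honor the atomic parameter.
        return bool(atomic)
    if test_kind == "standalone":
        # Standalone test failures honor rollbackOnTestFailure.
        return bool(rollback_on_test_failure)
    raise ValueError(f"unknown test kind: {test_kind!r}")
```

Note that each parameter affects only its own test type: setting `atomic` has no effect on standalone test failures in this sketch, and the reverse also holds.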
42
42
43
43
## Considerations for in-service upgrades
44
-
Azure Operator Service Manager generally supports in service upgrades, an upgrade method which advances a deployment version without interrupting the running service. Some considerations are neccesary to ensure the proper behavior of AOSM during ISSU operations.
45
-
* Where AOSM is performaning an upgrade against an ordered set of multiple nfApps, AOSM will first upgrade or create all new nfApps, then delete all old nfApps. This ensures service is not impacted until all new nfApps are ready but requires extra platform capacity for transiet hosting of both old and new nfApps.
46
-
* Where AOSM is upgrading an NfApp with multiple replica, AOSM will honor the deployment profile settings for rolling or recreate option. Where rolling is used, it's recommended to expose the values `maxUnavailable` and `maxSurge` as parameters, which can then be set via operator CGV at run-time.
44
+
Azure Operator Service Manager generally supports in service upgrades, an upgrade method which advances a deployment version without interrupting the running service. Some considerations are necessary to ensure the proper behavior of AOSM during ISSU operations.
45
+
* Where AOSM performs an upgrade against an ordered set of multiple nfApps, AOSM first upgrades or creates all new nfApps, then deletes all old nfApps. This approach ensures service is not impacted until all new nfApps are ready but requires extra platform capacity for transient hosting of both old and new nfApps.
46
+
* Where AOSM upgrades an NfApp with multiple replicas, AOSM honors the deployment profile settings for the rolling or recreate option. Where rolling is used, expose the values `maxUnavailable` and `maxSurge` as CGS parameters, which can then be set via operator CGV at run-time.
47
+
47
48
Ultimately, the ability for a given service to be upgraded without interruption is a feature of the service itself. Consult further with the service publisher to understand the in-service upgrade capabilities and ensure they are aligned with the proper AOSM behavioral options.
48
49
49
-
## Upgrade prerequisites
50
+
## Safe upgrade prerequisites
50
51
When planning for an upgrade using Azure Operator Service Manager, address the following requirements in advance of upgrade execution to optimize the time spent attempting the upgrade.
51
52
52
53
- Onboard updated artifacts using publisher and/or designer workflows.
@@ -65,7 +66,7 @@ When planning for an upgrade using Azure Operator Service Manager, address the f
65
66
- Update templates to ensure that upgrade parameters are set based on confidence in the upgrade and desired failure behavior.
66
67
- Settings used for production may suppress failures details, while settings used for debugging, or testing, may choose to expose these details.
67
68
68
-
## Upgrade procedure
69
+
## Safe upgrade procedure
69
70
Use the following process to trigger an upgrade with Azure Operator Service Manager.
70
71
71
72
### Create new NFDV resource
@@ -86,7 +87,7 @@ With onboarding complete, the reput operation is submitted. Depending on the num
86
87
### Examine reput results
87
88
If the reput is reporting a successful result, the upgrade is complete and the user should validate the state and availability of the service. If the reput is reporting a failure, follow the steps in the upgrade failure recovery section to continue.
88
89
89
-
## Retry procedure
90
+
## Safe upgrade retry procedure
90
91
In cases where a reput update fails, the following process can be followed to retry the operation.
91
92
92
93
### Diagnose failed NfApp
@@ -98,7 +99,7 @@ After fixing the failed NfApp, but before attempting an upgrade retry, consider
98
99
### Issue SNS reput retry (repeat until success)
99
100
By default, the reput retries NfApps in the declared update order, unless they are skipped using the applicationEnablement flag.
100
101
101
-
## How skip nfApps using applicationEnablement
102
+
## Skip nfApps using applicationEnablement
102
103
In the NFDV resource, under deployParametersMappingRuleProfile, there is the property applicationEnablement of type enum, which takes the values Unknown, Enabled, or Disabled. It can be used to exclude NfApp operations during network function (NF) deployment.
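As a rough sketch of how such a flag could select which NfApps are processed, consider the Python function below. It is illustrative only. The function name, the dictionary shapes, the fallback to `"Enabled"` when no value is given, and the choice to exclude only `"Disabled"` (leaving `"Unknown"` included) are all assumptions, not documented AOSM behavior.

```python
def apps_to_process(declared_order, nfdv_defaults, overrides=None):
    """Return the NfApps that are not excluded by applicationEnablement.

    declared_order: NfApp names in update order.
    nfdv_defaults: NfApp name -> default enablement set by the publisher.
    overrides: NfApp name -> enablement supplied by the operator at
        run-time; takes precedence over the default (assumed precedence).
    """
    overrides = overrides or {}
    selected = []
    for name in declared_order:
        # Assumption: an NfApp with no value anywhere is treated as Enabled.
        enablement = overrides.get(name, nfdv_defaults.get(name, "Enabled"))
        if enablement != "Disabled":
            selected.append(name)
    return selected
```

For example, an operator retrying a failed upgrade could override one NfApp to `"Disabled"` so that the retry skips it while the remaining NfApps keep their declared order.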
103
104
104
105
### Publisher changes
@@ -164,7 +165,7 @@ The NFDV is used by publisher to set default values for applicationEnablement.
164
165
```
165
166
166
167
#### Sample configuration group schema (CGS) resource
167
-
The CGS is used by the publisher to require a roleOverrideValues variable to be provided by Operator at run-time. RoleOverrideValues can include non-default settings for applicationEnablement.
168
+
The CGS is used by the publisher to require a roleOverrideValues variable to be provided by Operator at run-time. RoleOverrideValues can include nondefault settings for applicationEnablement.
168
169
169
170
```json
170
171
{
@@ -224,7 +225,7 @@ The CGS is used by the publisher to require a roleOverrideValues variable to be
224
225
Operators inherit default applicationEnablement values as defined by the NFDV. If applicationEnablement is parameterized in CGS, then it must be passed through the deploymentValues property at runtime.
225
226
226
227
#### Sample configuration group value (CGV) resource
227
-
The CGV is used by the operator to set the roleOverrideValues variable at run-time. RoleOverrideValues include non-default settings for applicationEnablement.
228
+
The CGV is used by the operator to set the roleOverrideValues variable at run-time. RoleOverrideValues include nondefault settings for applicationEnablement.
228
229
229
230
```json
230
231
{
@@ -243,7 +244,7 @@ The CGV is used by the operator to set the roleOverrideValues variable at run-ti
243
244
```
244
245
245
246
#### Sample NF ARM template
246
-
The NF ARM template is used by operator to submit the roleOverrideValues variable(s), set by CGV, to the resource provider (RP). The operator can change the applicationEnablement setting in CGV, as needed, and resubmit the same NF ARM template, to alter behavior between iterations.
247
+
The NF ARM template is used by operator to submit the roleOverrideValues variable, set by CGV, to the resource provider (RP). The operator can change the applicationEnablement setting in CGV, as needed, and resubmit the same NF ARM template, to alter behavior between iterations.
247
248
248
249
```json
249
250
{
@@ -305,10 +306,10 @@ The NF ARM template is used by operator to submit the roleOverrideValues variabl
305
306
}
306
307
```
307
308
308
-
## How to skip NfApps which have no change
309
+
## Skip NfApps which have no change
309
310
The SkipUpgrade feature is designed to optimize the time taken for CNF upgrades. When the publisher enables this flag in the `RoleOverrideValues` under `UpgradeOptions`, the AOSM service layer performs certain prechecks, to determine whether an upgrade for a specific `NFApplication` can be skipped. If all precheck criteria are met, the upgrade is skipped for that application. Otherwise, an upgrade is executed at the cluster level.
310
311
311
-
### PreCheck Criteria
312
+
### Precheck Criteria
312
313
An upgrade can be skipped if all the following conditions are met:
313
314
1. The `NFApplication` provisioning state is Succeeded.
314
315
2. There is no change in the Helm chart name or version.
@@ -341,7 +342,7 @@ To enable the SkipUpgrade feature via `RoleOverrideValues`, refer to the followi
341
342
```
342
343
#### Explanation of the Example
343
344
-**NfApplication: `hellotest`**
344
-
- The `skipUpgrade` flag is enabled. If the upgrade request for `hellotest` meets the precheck criteria, the upgrade will be skipped at the service level.
345
+
- The `skipUpgrade` flag is enabled. If the upgrade request for `hellotest` meets the precheck criteria, the upgrade is skipped.
345
346
-**NfApplication: `runnerTest`**
346
347
- The `skipUpgrade` flag is not specified. Therefore, `runnerTest` executes a traditional Helm upgrade at the cluster level, even if the precheck criteria are met.
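The behavior of the two NfApplications can be summarized with a short Python sketch. It is a simplified model, not the AOSM implementation. The `NfAppState` type and the function name are hypothetical, only the two precheck criteria visible in this diff are modeled explicitly, and any remaining criteria are folded into the `remaining_prechecks_met` argument.

```python
from dataclasses import dataclass


@dataclass
class NfAppState:
    """Hypothetical snapshot of a deployed NFApplication."""
    provisioning_state: str
    chart_name: str
    chart_version: str


def should_skip_upgrade(skip_upgrade, current, requested_chart_name,
                        requested_chart_version, *, remaining_prechecks_met):
    """Return True if the upgrade for this NFApplication is skipped."""
    if not skip_upgrade:
        # Flag not set: a Helm upgrade runs even if the prechecks pass.
        return False
    return (
        current.provisioning_state == "Succeeded"
        and current.chart_name == requested_chart_name
        and current.chart_version == requested_chart_version
        and remaining_prechecks_met
    )
```

Under this model, `hellotest` is skipped only when the flag is set and every precheck passes, while `runnerTest`, which has no flag, always proceeds to a Helm upgrade.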