This article introduces Azure Operator Service Manager (AOSM) safe upgrade practices (SUP). This feature set enables an end user to safely execute complex upgrades of container network function (CNF) workloads hosted on Azure Operator Nexus, in compliance with partner In-Service Software Upgrade (ISSU) requirements, where applicable. Future articles will expand on advanced SUP features and capabilities.

## Introduction to safe upgrades
A network service supported by AOSM is composed of one or more CNFs, which require software upgrades over time. Each upgrade requires one or more Helm operations that update dependent network function applications (nfApps) in a particular order, in a way that least impacts the network service. AOSM SUP is a set of features that enables safe automation of these operations on Azure Operator Nexus.

* SNS Reput Support - Execute a Helm upgrade operation across all nfApps in the network function definition version (NFDV).
* Nexus Platform - Support SNS reput operations on Nexus platform targets.
* Operation Time-outs - Ability to set operational time-outs for each nfApp operation.
* Synchronous Operations - Ability to run one serial nfApp operation at a time.
* Control Upgrade Order - Define different nfApp sequences for install and upgrade.
* Pause On Failure - By default, processing pauses after an nfApp operation failure.
* Rollback On Failure - Optionally, roll back all nfApps completed before an operation failure.
* Single Chart Test Validation - Run a Helm test operation after a create or update.
* Skip nfApp on No Change - Skip processing of nfApps where no changes result.
* Image Preloading - Ability to preload images to an edge repository.

## Safe upgrade approach
To update an existing Azure Operator Service Manager site network service (SNS), the operator executes a reput update request against the deployed SNS resource. Where the SNS contains CNFs with multiple nfApps, the request is fanned out across all nfApps defined in the network function definition version (NFDV). By default, nfApps are processed in the order in which they appear; optionally, they're processed in the order defined by the `updateDependsOn` parameter.

For each nfApp, the reput update request supports increasing a Helm chart version, adding or removing Helm values, and adding or removing nfApps. Time-outs can be set per nfApp, based on known allowable runtimes, but nfApps can only be processed serially, one after the other. The reput update implements the following processing logic:

* nfApps are processed following either `updateDependsOn` ordering or the sequential order in which they appear.
* nfApps with the `applicationEnablement` parameter set to `Disabled` are skipped.
* nfApps with the `skipUpgrade` parameter enabled are skipped if no changes are detected.
* nfApps that are common to the old and new NFDV are upgraded.
* nfApps that are only in the new NFDV are installed.
* nfApps that are deployed but not referenced by the new NFDV are deleted.
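
The processing rules above can be sketched in Python. This is a minimal, hypothetical model, not the AOSM implementation: the `NfApp` class and `plan_reput` function are illustrative names, the no-change check (comparing chart version and values) is a simplified stand-in for the actual `skipUpgrade` precheck criteria, and deletes are emitted last to reflect the create-or-upgrade-then-delete behavior described for in-service upgrades.

```python
from dataclasses import dataclass, field


@dataclass
class NfApp:
    """Hypothetical, simplified view of one nfApp entry in an NFDV."""
    name: str
    chart_version: str
    values: dict = field(default_factory=dict)
    application_enablement: str = "Enabled"  # "Unknown", "Enabled", or "Disabled"
    skip_upgrade: bool = False


def plan_reput(deployed, new_apps, order=None):
    """Return the (action, nfApp name) steps a reput would take under the rules above.

    deployed: dict of nfApp name -> NfApp currently running.
    new_apps: NfApp entries of the new NFDV, in declared order.
    order: optional list of names, for example derived from updateDependsOn.
    """
    by_name = {app.name: app for app in new_apps}
    names = order if order is not None else [app.name for app in new_apps]
    steps = []
    for name in names:
        app = by_name[name]
        old = deployed.get(name)
        if app.application_enablement == "Disabled":
            steps.append(("skip-disabled", name))
        elif old is None:
            steps.append(("install", name))  # only in the new NFDV
        elif app.skip_upgrade and (old.chart_version, old.values) == (app.chart_version, app.values):
            steps.append(("skip-unchanged", name))  # simplified no-change check
        else:
            steps.append(("upgrade", name))  # common to old and new NFDV
    # Deletes come last: deployed nfApps that the new NFDV no longer references.
    steps.extend(("delete", name) for name in deployed if name not in by_name)
    return steps


deployed = {
    "db": NfApp("db", "1.0.0"),
    "api": NfApp("api", "1.0.0", {"replicas": 2}, skip_upgrade=True),
    "legacy": NfApp("legacy", "0.9.0"),
}
new_apps = [
    NfApp("db", "1.1.0"),
    NfApp("api", "1.0.0", {"replicas": 2}, skip_upgrade=True),
    NfApp("cache", "2.0.0"),
    NfApp("metrics", "1.0.0", application_enablement="Disabled"),
]
for step in plan_reput(deployed, new_apps):
    print(step)
```

Passing an `order` list (for example, one computed from `updateDependsOn`) makes the loop follow that order instead of the declared one.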

To validate outcomes, nfApp testing is supported using Helm, either through Helm upgrade pre/post tests or through standalone Helm tests. For pre/post test failures, the `atomic` parameter is honored: with `atomic` set to `true`, the failed chart is rolled back; with `atomic` set to `false`, no rollback is executed. For more information on standalone Helm testing, see [Run tests after install or upgrade](safe-upgrades-helm-test.md).
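
In plain Helm CLI terms, the `atomic` behavior resembles passing `--atomic` to `helm upgrade`, which rolls a release back when the upgrade, including its pre/post hooks, fails; standalone tests correspond to `helm test`. The following sketch only builds illustrative command lines. It assumes nothing about how AOSM invokes Helm internally, and the release name, chart reference, and timeout values are hypothetical.

```python
def helm_upgrade_argv(release, chart, version, atomic, timeout="30m"):
    """Illustrative `helm upgrade` argv for one nfApp step (not AOSM internals)."""
    argv = ["helm", "upgrade", release, chart, "--version", version, "--timeout", timeout]
    if atomic:
        # atomic=true: Helm rolls the release back if the upgrade or its hooks fail.
        argv.append("--atomic")
    # atomic=false: a failed upgrade is left in place, marked failed, with no rollback.
    return argv


def helm_test_argv(release, timeout="10m"):
    """Illustrative standalone test run against an already upgraded release."""
    return ["helm", "test", release, "--timeout", timeout]


print(" ".join(helm_upgrade_argv("nfapp-a", "example/nfapp-a", "1.2.0", atomic=True)))
print(" ".join(helm_test_argv("nfapp-a")))
```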

## Considerations for in-service upgrades
Azure Operator Service Manager generally supports in-service upgrades, an upgrade method that advances a deployment version without interrupting the running service. The following considerations help ensure proper AOSM behavior during ISSU operations.
* Where AOSM performs an upgrade against an ordered set of multiple nfApps, AOSM first upgrades or creates all new nfApps, then deletes all old nfApps. This approach ensures that the service isn't impacted until all new nfApps are ready, but it requires extra platform capacity to host both old and new nfApps transiently.
* Where AOSM upgrades an nfApp with multiple replicas, AOSM honors the deployment profile setting for the rolling or recreate option. Where rolling is used, expose the values `maxUnavailable` and `maxSurge` as CGS parameters so that the operator can set them through the CGV at run-time.
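
When choosing `maxSurge` and `maxUnavailable` values for the CGV, it helps to know the pod-count window a Kubernetes rolling update stays within: at most `replicas + maxSurge` pods exist, and at least `replicas - maxUnavailable` remain available. Percentages resolve against the replica count, with `maxSurge` rounded up and `maxUnavailable` rounded down. The sketch below computes that window; the function names are hypothetical, and the 25% defaults match the Kubernetes Deployment defaults.

```python
import math


def resolve_count(value, replicas, round_up):
    """Convert an absolute count or a percentage string such as "25%" to pods."""
    if isinstance(value, str) and value.endswith("%"):
        raw = replicas * int(value[:-1]) / 100
        return math.ceil(raw) if round_up else math.floor(raw)
    return int(value)


def rolling_window(replicas, max_surge="25%", max_unavailable="25%"):
    """Return (minimum available pods, maximum total pods) during a rolling update."""
    surge = resolve_count(max_surge, replicas, round_up=True)
    unavailable = resolve_count(max_unavailable, replicas, round_up=False)
    if surge == 0 and unavailable == 0:
        raise ValueError("maxSurge and maxUnavailable cannot both be zero")
    return replicas - unavailable, replicas + surge


print(rolling_window(4))        # 25% of 4 replicas: 1 surge pod, 1 unavailable pod
print(rolling_window(10))       # 2.5 pods: surge rounds up to 3, unavailable down to 2
print(rolling_window(3, 1, 0))  # never drop below 3 available pods
```

Setting `maxUnavailable` to 0 keeps full serving capacity during the upgrade at the cost of extra transient capacity, which echoes the capacity trade-off described above for nfApp-level upgrades.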

Ultimately, the ability of a given service to be upgraded without interruption is a feature of the service itself. Consult the service publisher to understand the service's in-service upgrade capabilities and align them with the appropriate AOSM behavioral options.

## Safe upgrade prerequisites
When planning for an upgrade using Azure Operator Service Manager, address the following requirements in advance of upgrade execution to optimize the time spent attempting the upgrade.

- Onboard updated artifacts using publisher and/or designer workflows.
  - Publisher, artifact store, network service design group (NSDG), and network function definition group (NFDG) are immutable.
    - Changing any of these resources requires deploying a new NF through a put operation.
  - A new artifact manifest is needed to store the new charts and images.
    - For more information on uploading new charts and images, see the [onboarding documentation](how-to-manage-artifacts-nexus.md).
  - A new NFDV, and optionally a new network service design version (NSDV), is needed.
    - Basic changes to the NFDV are covered in the step-by-step section.
    - A new NSDV is only required if a new configuration group schema (CGS) version is being introduced.
  - If necessary, a new CGS.
    - Required if an upgrade introduces new exposed configuration parameters.

> [!NOTE]
> NSDVs and NFDVs with different major versions can be supported in the same NSDG and NFDG.

- Create updated artifacts using the operator workflow.
  - If necessary, create new configuration group values (CGVs) based on the new CGS.
  - Reuse and craft the payload by confirming the existing site and site network service objects.

Use the following process to trigger an upgrade with Azure Operator Service Manager.

### Create new NFDV resource
The new NFDV version must be in a valid SemVer format. The new version can be an upgrade (a greater value than the deployed version) or a downgrade (a lower value than the deployed version), and it can differ from the deployed version in its major, minor, or patch values.
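
A quick way to check a candidate NFDV version against the deployed one is sketched below. It assumes plain `MAJOR.MINOR.PATCH` versions (pre-release and build suffixes are rejected for simplicity) and uses a hypothetical `classify_nfdv_change` helper; it reflects only the rules stated above, not any AOSM validation code.

```python
import re

# Plain MAJOR.MINOR.PATCH without leading zeros (SemVer core only).
CORE_SEMVER = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def parse_version(version):
    match = CORE_SEMVER.fullmatch(version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    return tuple(int(part) for part in match.groups())


def classify_nfdv_change(deployed, candidate):
    """Describe a candidate NFDV version relative to the deployed one."""
    old, new = parse_version(deployed), parse_version(candidate)
    if new == old:
        return "same version"
    # Tuples compare component by component, matching SemVer core precedence.
    direction = "upgrade" if new > old else "downgrade"
    first_diff = next(i for i in range(3) if old[i] != new[i])
    return f"{('major', 'minor', 'patch')[first_diff]} {direction}"


for candidate in ["2.0.1", "2.1.0", "1.0.0", "3.0.0"]:
    print("2.0.0 ->", candidate, ":", classify_nfdv_change("2.0.0", candidate))
```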

### Update new NFDV parameters
Helm chart versions can be updated, and Helm values can be updated or parameterized as necessary. New nfApps that don't exist in the deployed version can also be added.

### Update NFDV for desired nfApp order
`updateDependsOn` is an NFDV parameter used to specify the ordering of nfApps during update operations. If `updateDependsOn` isn't provided, nfApps are processed serially in the order in which they appear in the NFDV.
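
The sketch below shows one way to derive an update order from the declared order plus dependency lists. It assumes `updateDependsOn` lists, for each nfApp, the nfApps that must be updated before it, and when several nfApps are ready it picks the one declared earliest. That tie-breaking rule, the input shape, and the `update_order` helper are assumptions for illustration, not documented AOSM behavior.

```python
def update_order(declared, update_depends_on=None):
    """Order nfApps for an update (illustrative only).

    declared: nfApp names in the order they appear in the NFDV.
    update_depends_on: assumed mapping of nfApp name -> names to update first.
    With no dependencies, the declared order is returned unchanged.
    """
    update_depends_on = update_depends_on or {}
    deps = {name: set(update_depends_on.get(name, ())) for name in declared}
    done, order = set(), []
    while len(order) < len(declared):
        # Pick the earliest-declared nfApp whose dependencies are all updated.
        ready = next((n for n in declared if n not in done and deps[n] <= done), None)
        if ready is None:
            raise ValueError("updateDependsOn has a cycle or names an unknown nfApp")
        order.append(ready)
        done.add(ready)
    return order


print(update_order(["frontend", "backend", "database"]))
print(update_order(["frontend", "backend", "database"],
                   {"frontend": ["backend"], "backend": ["database"]}))
```

Because nfApps are processed serially, a single total order like this is all the reput needs; the cycle check guards against dependency lists that can never be satisfied.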

### Update ARM template for desired upgrade behavior
Make sure to set any desired CNF application `timeout`, the `atomic` parameter, and the `rollbackOnTestFailure` parameter. It can be useful to change these parameters over time as you gain more confidence in the upgrade.
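
One way to keep these settings reviewable is to generate the `roleOverrideValues` entries rather than hand-editing escaped JSON. The sketch assumes each entry is a JSON-encoded string and that the nesting `deployParametersMappingRuleProfile` > `helmMappingRuleProfile` > `options` > `upgradeOptions` matches what your NFDV and CGS expect; confirm the exact structure, value types, and `timeout` units against them. Only `atomic`, `timeout`, `skipUpgrade`, and `applicationEnablement` are shown, and the `role_override_entry` helper is hypothetical.

```python
import json


def role_override_entry(nfapp_name, upgrade_options, application_enablement=None):
    """Build one roleOverrideValues entry for an nfApp (assumed structure).

    ASSUMPTION: the nesting below must match what your NFDV and CGS expect;
    verify key names, value types, and timeout units before use.
    """
    profile = {"helmMappingRuleProfile": {"options": {"upgradeOptions": upgrade_options}}}
    if application_enablement is not None:
        profile["applicationEnablement"] = application_enablement
    entry = {"name": nfapp_name, "deployParametersMappingRuleProfile": profile}
    # Each roleOverrideValues element is assumed to be a JSON-encoded string.
    return json.dumps(entry, separators=(",", ":"))


role_override_values = [
    role_override_entry("hellotest", {"atomic": "true", "timeout": "30", "skipUpgrade": "true"}),
    role_override_entry("runnerTest", {"atomic": "false", "timeout": "60"}, "Enabled"),
]
print(json.dumps({"roleOverrideValues": role_override_values}, indent=2))
```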

### Issue SNS reput
With onboarding complete, submit the reput operation. Depending on the number, size, and complexity of the nfApps, the reput operation can take multiple hours to complete.

### Examine reput results
If the reput reports a successful result, the upgrade is complete; validate the state and availability of the service. If the reput reports a failure, follow the steps in [Safe upgrade retry procedure](#safe-upgrade-retry-procedure) to continue.

## Safe upgrade retry procedure
If a reput update fails, use the following process to retry the operation.

### Diagnose failed nfApp
Resolve the root cause of the nfApp failure by analyzing logs and other debugging information.

### Manually skip completed charts
After fixing the failed nfApp, but before attempting an upgrade retry, consider changing the `applicationEnablement` parameter to accelerate the retry. Set this parameter to `Disabled` for any nfApp that should be skipped, such as an nfApp that doesn't require an upgrade.

### Issue SNS reput retry (repeat until success)
By default, the reput retries nfApps in the declared update order, unless they're skipped using the `applicationEnablement` flag.
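
The retry procedure can be modeled as a loop in which each attempt disables (sets `applicationEnablement` to `Disabled` for) the nfApps that already succeeded and pauses at the first failure. In the sketch, `run_nfapp` is a hypothetical stand-in for the reput processing one nfApp; in practice, the root cause is fixed between attempts and each attempt is a new SNS reput.

```python
def retry_until_success(update_order, run_nfapp, max_attempts=3):
    """Model the retry procedure: skip nfApps that already succeeded.

    run_nfapp(name, attempt) is a hypothetical stand-in for the reput
    processing one nfApp; it returns True on success.
    """
    completed = set()
    for attempt in range(1, max_attempts + 1):
        # Before each retry, completed nfApps get applicationEnablement "Disabled".
        enablement = {n: "Disabled" if n in completed else "Enabled" for n in update_order}
        for name in update_order:
            if enablement[name] == "Disabled":
                continue
            if not run_nfapp(name, attempt):
                break  # default behavior: pause after the failed nfApp
            completed.add(name)
        else:
            return attempt  # every enabled nfApp succeeded
    raise RuntimeError(f"upgrade still failing after {max_attempts} attempts")


calls = []


def fake_run(name, attempt):
    calls.append((attempt, name))
    return not (name == "b" and attempt == 1)  # "b" fails once, then is fixed


attempts = retry_until_success(["a", "b", "c"], fake_run)
print(attempts, calls)
```

On the second attempt, `a` is skipped and processing resumes at `b`, which is the time saving the manual skip is meant to provide.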

## Skip nfApps using applicationEnablement
In the NFDV resource, under `deployParametersMappingRuleProfile`, the `applicationEnablement` property is an enum that takes the values `Unknown`, `Enabled`, or `Disabled`. It can be used to exclude nfApp operations during network function (NF) deployment.

### Publisher changes
For the `applicationEnablement` property, the publisher has two options: either provide a default value or parameterize it.

#### Sample NFDV
The NFDV is used by the publisher to set default values for `applicationEnablement`.

```json
...
```

#### Sample configuration group schema (CGS) resource
The CGS is used by the publisher to require a `roleOverrideValues` variable to be provided by the operator at run-time. `roleOverrideValues` can include nondefault settings for `applicationEnablement`.

```json
{
  ...
}
```

### Operator changes
Operators inherit the default `applicationEnablement` values defined by the NFDV. If `applicationEnablement` is parameterized in the CGS, it must be passed through the `deploymentValues` property at run-time.

#### Sample configuration group value (CGV) resource
The CGV is used by the operator to set the `roleOverrideValues` variable at run-time. `roleOverrideValues` includes nondefault settings for `applicationEnablement`.

```json
{
  ...
}
```

#### Sample NF ARM template
The NF ARM template is used by the operator to submit the `roleOverrideValues` variable, set by the CGV, to the resource provider (RP). The operator can change the `applicationEnablement` setting in the CGV as needed and resubmit the same NF ARM template to alter behavior between iterations.

```json
{
  ...
}
```

## Skip nfApps which have no change
The `skipUpgrade` feature is designed to reduce the time taken for CNF upgrades. When the publisher enables this flag in `roleOverrideValues` under `upgradeOptions`, the AOSM service layer performs certain prechecks to determine whether the upgrade for a specific `nfApplication` can be skipped. If all precheck criteria are met, the upgrade is skipped for that application. Otherwise, an upgrade is executed at the cluster level.

### Precheck Criteria
An upgrade can be skipped if all the following conditions are met:

...

```json
{
  ...
}
```
#### Explanation of the Example
- **nfApplication: `hellotest`**
  - The `skipUpgrade` flag is enabled. If the upgrade request for `hellotest` meets the precheck criteria, the upgrade is skipped.
- **nfApplication: `runnerTest`**
  - The `skipUpgrade` flag isn't specified. Therefore, `runnerTest` executes a traditional Helm upgrade at the cluster level, even if the precheck criteria are met.