|
1 | 1 | ---
|
2 |
| -title: Rollback on upgrade failure using Azure Operator Service Manager |
3 |
| -description: Revert all prior completed operations during safe upgrade failure. |
| 2 | +title: Control upgrade failure behavior with Azure Operator Service Manager |
| 3 | +description: Learn about recovery behaviors including pause on failure and rollback on failure. |
4 | 4 | author: msftadam
|
5 | 5 | ms.author: adamdor
|
6 |
| -ms.date: 08/28/2024 |
| 6 | +ms.date: 08/30/2024 |
7 | 7 | ms.topic: upgrade-and-migration-article
|
8 | 8 | ms.service: azure-operator-service-manager
|
9 | 9 | ---
|
10 | 10 |
|
11 |
| -# Rollback on upgrade failure |
12 |
| -This guide describes the Azure Operator Service Manager (AOSM) optional rollback on failure feature for container network functions (CNFs). This feature, as part of the AOSM safe upgrade practices initiative, reduces the service impact of unexpected upgrade failures for network functions (NFs) where comprehensive forward and backward version network function application (NfApp) compatibility is not available. |
| 11 | +# Control upgrade failure behavior |
| 12 | + |
| 13 | +## Overview |
| 14 | +This guide describes the Azure Operator Service Manager (AOSM) upgrade failure behavior features for container network functions (CNFs). These features, as part of the AOSM safe upgrade practices initiative, offer a choice between faster retries, with pause on failure, and a return to the starting point, with rollback on failure. |
13 | 15 |
|
14 | 16 | ## Pause on failure
|
15 |
| -In the case of an unexpected failure during an upgrade, historically AOSM supports the pause on failure approach. This method remains the default and implements the following workflow logic; |
16 |
| -* The NfApps are created or upgraded following either updateDependsOn ordering, if provided, or in the sequential order they appear. |
17 |
| -* NfApps with parameter "applicationEnabled" disabled are skipped. |
18 |
| -* NFApps present before upgrade, but not referenced by the new network function definition version (NFDV) are deleted. |
19 |
| -* The execution is paused if any of the NfApp upgrades fail. |
| 17 | +Any upgrade using AOSM starts with a site network service (SNS) reput operation. The reput operation processes the network function applications (NfApps) found in the network function definition version (NFDV). The reput operation implements the following default logic: |
| 18 | +* NfApps are processed following either updateDependsOn ordering, or in the sequential order they appear. |
| 19 | +* NfApps with parameter "applicationEnabled" set to disable are skipped. |
| 20 | +* NfApps present, but not referenced by the new NFDV, are deleted. |
| 21 | +* The execution sequence is paused if any NfApp upgrade fails, and a rollback is considered. |
20 | 22 | * The failure leaves the NF resource in a failed state.
|
21 | 23 |
|
22 |
| -With pause on failure, AOSM rolls back the failed NfApp, via the testOptions, installOptions, or upgradeOptions parameters. This method allows the end user to troubleshoot the failed NfApp and then restart the upgrade from that point forward. As the default behavior, this method is the most efficient upgrade method, but may cause network function (NF) inconsistencies while in a mixed version state. |
| 24 | +With pause on failure, AOSM rolls back only the failed NfApp, via the testOptions, installOptions, or upgradeOptions parameters. No action is taken on any NfApps that precede the failed NfApp. This method allows the end user to troubleshoot the failed NfApp and then restart the upgrade from that point forward. As the default behavior, this method is the most efficient, but it may cause network function (NF) inconsistencies while in a mixed version state. |
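The pause on failure sequence described above can be sketched as a small simulation. This is a minimal illustration, not AOSM code: the `NfApp` and `ReputResult` classes, the `should_fail` test hook, and the `reput_pause_on_failure` function are all hypothetical names, and the timing of the delete step relative to the upgrades is an assumption, since the guide does not state it.

```python
from dataclasses import dataclass, field


@dataclass
class NfApp:
    """Hypothetical stand-in for one NfApp entry in the new NFDV."""
    name: str
    enabled: bool = True        # stands in for the "applicationEnabled" parameter
    should_fail: bool = False   # test hook that simulates an upgrade failure


@dataclass
class ReputResult:
    """Hypothetical summary of what one reput operation did."""
    state: str = "Succeeded"    # becomes "Failed" when the sequence pauses
    upgraded: list = field(default_factory=list)
    skipped: list = field(default_factory=list)
    deleted: list = field(default_factory=list)
    rolled_back: list = field(default_factory=list)
    pending: list = field(default_factory=list)


def reput_pause_on_failure(new_apps, installed_names):
    """Process NfApps in list order and pause at the first failure."""
    result = ReputResult()
    referenced = {app.name for app in new_apps}
    # NfApps present but not referenced by the new NFDV are deleted.
    # Assumption: the sketch records deletions before the upgrades run.
    result.deleted = [n for n in installed_names if n not in referenced]

    for index, app in enumerate(new_apps):
        if not app.enabled:
            result.skipped.append(app.name)
            continue
        if app.should_fail:
            # Only the failed NfApp is rolled back; earlier NfApps are untouched.
            result.rolled_back.append(app.name)
            result.pending = [a.name for a in new_apps[index + 1:] if a.enabled]
            result.state = "Failed"
            return result
        result.upgraded.append(app.name)
    return result
```

In this sketch, a failure on the third of four NfApps leaves the first one upgraded, rolls back only the third, and leaves the fourth pending, which is the mixed version state the paragraph above warns about.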
23 | 25 |
|
24 | 26 | ## Rollback on failure
|
25 |
| -To address risk of mismatched NfApp versions, AOSM now supports NF level rollback on failure. With this option enabled, if an NfApp upgrade fails, both the failed NfApp, and all prior completed NfApps, are rolled back to initial version state. This method minimizes, or eliminates, the amount of time the NF is exposed to NfApp version mismatches. The optional rollback on failure feature works as follows: |
26 |
| -* A user initiates an upgrade and enables the rollback on failure feature. |
| 27 | +To address the risk of mismatched NfApp versions, AOSM now supports NF-level rollback on failure. With this option enabled, if an NfApp operation fails, both the failed NfApp and all prior completed NfApps can be rolled back to their initial version state. This method minimizes, or eliminates, the amount of time the NF is exposed to NfApp version mismatches. The optional rollback on failure feature works as follows: |
| 28 | +* A user initiates an SNS reput operation and enables rollback on failure. |
27 | 29 | * A snapshot of the current NfApp versions is captured and stored.
|
28 | 30 | * The snapshot is used to determine the individual NfApp actions taken to reverse actions that completed successfully.
|
29 | 31 | - "helm install" action on deleted components,
|
@@ -52,7 +54,6 @@ AOSM returns the following operational status and messages, given the respective
|
52 | 54 | - Provisioning State: Failed
|
53 | 55 | - Message: Application(<ComponentName>) : <Failure reason>; Rollback Failed (<RollbackComponentName>) : <Rollback Failure reason>
|
54 | 56 | ```
|
55 |
| - |
56 | 57 | ## How to configure rollback on failure
|
57 | 58 | The most flexible method to control failure behavior is to extend a new configuration group schema (CGS) parameter, rollbackEnabled, to allow for configuration group value (CGV) control via roleOverrideValues in the NF payload. First, define the CGS parameter:
|
58 | 59 | ```
|
|