Commit be1e306

authored
Update safe-upgrades-nf-level-rollback.md
1 parent 5c870db commit be1e306

1 file changed

+24
-26
lines changed

articles/operator-service-manager/safe-upgrades-nf-level-rollback.md

Lines changed: 24 additions & 26 deletions
Original file line number | Diff line number | Diff line change
@@ -14,48 +14,46 @@ ms.service: azure-operator-service-manager
1414
This guide describes the Azure Operator Service Manager (AOSM) upgrade failure behavior features for container network functions (CNFs). These features, as part of the AOSM safe upgrade practices initiative, offer a choice between faster retries, with pause on failure, versus return to starting point, with rollback on failure.
1515

1616
## Pause on failure
17-
Any upgrade using AOSM starts with a site network service (SNS) reput operation. The reput operation processes the network function applications (NfApps) found in the network function design version (NFDV). The reput operation implements the following default logic:
18-
* NfApps are processed following either updateDependsOn ordering, or in the sequential order they appear.
19-
* NfApps with parameter "applicationEnabled" set to disable are skipped.
20-
* NFApps present, but not referenced by the new NFDV are deleted.
21-
* The execution sequence is paused if any of the NfApp upgrades fail and a rollback is considered.
17+
Any upgrade using AOSM starts with a site network service (SNS) reput operation. The reput operation processes the network function applications (nfApps) found in the network function design version (NFDV). The reput operation implements the following default logic:
18+
* nfApps are processed following either `updateDependsOn` ordering, or in the sequential order they appear.
19+
* nfApps with parameter `applicationEnabled` set to disable are skipped.
20+
* nfApps present, but not referenced by the new NFDV are deleted.
21+
* The execution sequence is paused if any of the nfApp upgrades fail and an atomic rollback is considered.
2222
* The failure leaves the NF resource in a failed state.
2323

24-
With pause on failure, AOSM rolls back only the failed NfApp, via the testOptions, installOptions, or upgradeOptions parameters. No action is taken on any NfApps which proceed the failed NfApp. This method allows the end user to troubleshoot the failed NfApp and then restart the upgrade from that point forward. As the default behavior, this method is the most efficient method, but may cause network function (NF) inconsistencies while in a mixed version state.
24+
With pause on failure, AOSM rolls back only the failed nfApp, via the `testOptions`, `installOptions`, or `upgradeOptions` parameters. No action is taken on any nfApps that precede the failed nfApp. This method allows the end user to troubleshoot the failed nfApp and then restart the upgrade from that point forward. As the default behavior, this method is the most efficient, but may cause network function (NF) inconsistencies while in a mixed version state.
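
The pause on failure flow can be pictured with a minimal Python sketch. This is an illustration only, not AOSM code: the `run_upgrade` function, the `enabled` flag, and the `upgrade` callback are hypothetical names, and the sketch assumes simple sequential ordering rather than `updateDependsOn` ordering.

```python
def run_upgrade(nf_apps, upgrade):
    """Process nfApps in order and pause at the first failure.

    nf_apps: list of dicts with 'name' and 'enabled' keys (hypothetical shape).
    upgrade: callable taking an nfApp name and returning True on success.
    """
    completed = []
    for index, app in enumerate(nf_apps):
        if not app["enabled"]:
            continue  # disabled nfApps are skipped
        if not upgrade(app["name"]):
            # Pause: earlier nfApps stay upgraded, later ones stay untouched.
            pending = [a["name"] for a in nf_apps[index + 1:] if a["enabled"]]
            return {"state": "Failed", "completed": completed,
                    "failed": app["name"], "pending": pending}
        completed.append(app["name"])
    return {"state": "Succeeded", "completed": completed,
            "failed": None, "pending": []}


apps = [
    {"name": "nfApp1", "enabled": True},
    {"name": "nfApp2", "enabled": False},
    {"name": "nfApp3", "enabled": True},
    {"name": "nfApp4", "enabled": True},
]
# Simulate a failure on nfApp3.
result = run_upgrade(apps, lambda name: name != "nfApp3")
print(result)
```

In this sketch, `nfApp1` remains on the new version while `nfApp4` remains on the old one, which is the mixed version state the paragraph above warns about.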
2525

2626
## Rollback on failure
27-
To address risk of mismatched NfApp versions, AOSM now supports NF level rollback on failure. With this option enabled, if an NfApp operation fails, both the failed NfApp, and all prior completed NfApps, can be rolled back to initial version state. This method minimizes, or eliminates, the amount of time the NF is exposed to NfApp version mismatches. The optional rollback on failure feature works as follows:
28-
* A user initiates an sSNS reput operation and enables rollback on failure.
29-
* A snapshot of the current NfApp versions is captured and stored.
30-
* The snapshot is used to determine the individual NfApp actions taken to reverse actions that completed successfully.
31-
- "helm install" action on deleted components,
32-
- "helm rollback" action on upgraded components,
33-
- "helm delete" action on newly installed components
34-
* NfApp failure occurs, AOSM restores the NfApps to the snapshot version state before the upgrade, with most recent actions reverted first.
27+
To address risk of mismatched nfApp versions, AOSM now supports NF level rollback on failure. With this option enabled, if an nfApp operation fails, both the failed nfApp, and all prior completed nfApps, can be rolled back to initial version state. This method minimizes, or eliminates, the amount of time the NF is exposed to nfApp version mismatches. The optional rollback on failure feature works as follows:
28+
* A user initiates an SNS reput operation and enables rollback on failure.
29+
* A snapshot of the current nfApp versions is captured and stored.
30+
* The snapshot is used to determine the individual nfApp actions taken to reverse actions that completed successfully.
31+
- `helm install` action on deleted components,
32+
- `helm rollback` action on upgraded components,
33+
- `helm delete` action on newly installed components
34+
* If an nfApp failure occurs, AOSM restores the nfApps to the snapshot version state before the upgrade, with the most recent actions reverted first.
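
The steps above can be sketched in Python as follows. This is a hedged illustration of the described logic, not the AOSM implementation: the `plan_rollback` function and the data shapes (a snapshot dict of nfApp name to version, and a list of completed changes) are assumptions made for this example.

```python
def plan_rollback(snapshot, completed):
    """Derive reverse helm actions from a pre-upgrade snapshot.

    snapshot:  {nfApp name: version} captured before the upgrade (assumed shape).
    completed: list of (nfApp name, new version or None) in completion order,
               where None means the nfApp was deleted by the upgrade.
    Returns (helm action, nfApp name) pairs, most recent change first.
    """
    plan = []
    for name, new_version in reversed(completed):
        old_version = snapshot.get(name)
        if old_version is None and new_version is not None:
            plan.append(("helm delete", name))    # newly installed component
        elif old_version is not None and new_version is None:
            plan.append(("helm install", name))   # deleted component
        elif old_version != new_version:
            plan.append(("helm rollback", name))  # upgraded component
    return plan


snapshot = {"nfApp1": "1.0.0", "nfApp3": "1.0.0"}
completed = [("nfApp1", "2.0.0"), ("nfApp2", "2.0.0"), ("nfApp3", None)]
for action, name in plan_rollback(snapshot, completed):
    print(f"{action}: {name}")
```

The failed nfApp is deliberately absent from `completed`, since its own rollback is governed by the `testOptions`, `installOptions`, or `upgradeOptions` parameters.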
3535

3636
> [!NOTE]
3737
> * AOSM doesn't create a snapshot if a user doesn't enable rollback on failure.
38-
> * A rollback on failure only applies to the successfully completed NFApps.
39-
> - Use the testOptions, installOptions, or upgradeOptions parameters to control rollback of the failed NfApp.
38+
> * A rollback on failure only applies to the successfully completed nfApps.
39+
> - Use the `testOptions`, `installOptions`, or `upgradeOptions` parameters to control rollback of the failed nfApp.
4040
4141
AOSM returns the following operational status and messages, given the respective results:
4242
```
4343
- Upgrade Succeeded
4444
- Provisioning State: Succeeded
4545
- Message: <empty>
46-
```
47-
```
46+
4847
- Upgrade Failed, Rollback Succeeded
4948
- Provisioning State: Failed
5049
- Message: Application(<ComponentName>) : <Failure Reason>; Rollback succeeded
51-
```
52-
```
50+
5351
- Upgrade Failed, Rollback Failed
5452
- Provisioning State: Failed
5553
- Message: Application(<ComponentName>) : <Failure reason>; Rollback Failed (<RollbackComponentName>) : <Rollback Failure reason>
5654
```
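
As a rough illustration of how the three outcomes above map to a provisioning state and message, the following Python sketch formats them from the templates shown. The `status_message` helper and its tuple arguments are hypothetical and exist only for this example.

```python
def status_message(failure=None, rollback_failure=None):
    """Return (provisioning state, message) for an upgrade result.

    failure / rollback_failure: optional (component name, reason) tuples.
    """
    if failure is None:
        return "Succeeded", ""
    message = f"Application({failure[0]}) : {failure[1]}; "
    if rollback_failure is None:
        return "Failed", message + "Rollback succeeded"
    detail = f"Rollback Failed ({rollback_failure[0]}) : {rollback_failure[1]}"
    return "Failed", message + detail


print(status_message())
print(status_message(("nfApp2", "timed out")))
print(status_message(("nfApp2", "timed out"), ("nfApp1", "release not found")))
```

Note that the provisioning state is `Failed` in both failure cases, so the message text is what distinguishes a successful rollback from a failed one.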
5755
## How to configure rollback on failure
58-
The most flexible method to control failure behavior is to extend a new configuration group schema (CGS) parameter, rollbackEnabled, to allow for configuration group value (CGV) control via roleOverrideValues in the NF payload. First, define the CGS parameter:
56+
The most flexible method to control failure behavior is to extend a new configuration group schema (CGS) parameter, `rollbackEnabled`, to allow for configuration group value (CGV) control via `roleOverrideValues` in the NF payload. First, define the CGS parameter:
5957
```
6058
{
6159
"description": "NF configuration",
@@ -76,9 +74,9 @@ The most flexible method to control failure behavior is to extend a new configur
7674
}
7775
```
7876
> [!NOTE]
79-
> * If the nfConfiguration isn't provided through the roleOverrideValues parameter, by default the rollback is disabled.
77+
> * If the `nfConfiguration` isn't provided through the `roleOverrideValues` parameter, by default the rollback is disabled.
8078
81-
With the new rollbackEnable parameter defined, the Operator can now provide a run time value, under roleOverrideValues, as part of NF reput payload.
79+
With the new `rollbackEnabled` parameter defined, the Operator can now provide a runtime value, under `roleOverrideValues`, as part of the NF reput payload.
8280
```
8381
example:
8482
{
@@ -89,14 +87,14 @@ example:
8987
"{\"nfConfiguration\":{\"rollbackEnabled\":true}}",
9088
"{\"name\":\"nfApp1\",\"deployParametersMappingRuleProfile\":{\"applicationEnablement\" : \"Disabled\"}}",
9189
"{\"name\":\"nfApp2\",\"deployParametersMappingRuleProfile\":{\"applicationEnablement\" : \"Disabled\"}}",
92-
//... other nfapps overrides
90+
//... other nfApps overrides
9391
]
9492
}
9593
}
9694
```
9795
> [!NOTE]
98-
> * Each roleOverrideValues entry overrides the default behavior of the NfAapps.
99-
> * If multiple entries of nfConfiguration are found in the roleOverrideValues, then the NF reput is returned as a bad request.
96+
> * Each `roleOverrideValues` entry overrides the default behavior of the nfApps.
97+
> * If multiple entries of `nfConfiguration` are found in the `roleOverrideValues`, then the NF reput is returned as a bad request.
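
Because each `roleOverrideValues` entry is a JSON document encoded as a string, hand-escaping the quotes is error prone. The Python sketch below builds the entries with `json.dumps` and checks that only one `nfConfiguration` entry is present before the payload is sent. The helper names are hypothetical, and the rest of the NF payload is omitted because it isn't shown in full here.

```python
import json


def build_role_override_values(rollback_enabled, disabled_nf_apps=()):
    """Build roleOverrideValues entries as JSON-encoded strings."""
    entries = [json.dumps({"nfConfiguration": {"rollbackEnabled": rollback_enabled}})]
    for name in disabled_nf_apps:
        entries.append(json.dumps({
            "name": name,
            "deployParametersMappingRuleProfile": {"applicationEnablement": "Disabled"},
        }))
    return entries


def count_nf_configuration(entries):
    """Count entries that carry an nfConfiguration key."""
    return sum(1 for entry in entries if "nfConfiguration" in json.loads(entry))


entries = build_role_override_values(True, ["nfApp1", "nfApp2"])
# More than one nfConfiguration entry is rejected as a bad request.
if count_nf_configuration(entries) > 1:
    raise ValueError("roleOverrideValues contains multiple nfConfiguration entries")
print(json.dumps(entries, indent=2))
```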
10098
10199
## How to troubleshoot rollback on failure
102100
### Understand pod states
