Commit c600f74

Merge pull request #285920 from msftadam/patch-13
Update safe-upgrades-nf-level-rollback.md
2 parents aa3b4ca + 5f57f00 commit c600f74

File tree

4 files changed

+26
-22
lines changed

articles/operator-service-manager/best-practices-onboard-deploy.md

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -293,7 +293,10 @@ Delete publisher resources in the following order to make sure no orphaned resou
293293

294294
## Considerations if your NF runs cert-manager
295295

296-
With release 1.0.2728-50 and later , AOSM now uses cert-manager to store and rotate certificates. As part of this change, AOSM deploys a cert-manager operator, and associate CRDs, in the azurehybridnetwork namespace. Since having multiple cert-manager operators, even deployed in separate namespaces, will watch across all namespaces, only one cert-manager can be effectively run on the cluster.
296+
> [!IMPORTANT]
297+
> This guidance applies only to certain releases. Check your version for proper behavior.
298+
299+
From release 1.0.2728-50 to release 2.0.2777-132, AOSM uses cert-manager to store and rotate certificates. As part of this change, AOSM deploys a cert-manager operator, and associated CRDs, in the azurehybridnetwork namespace. Because cert-manager operators watch across all namespaces, even when deployed in separate namespaces, only one cert-manager can effectively run on the cluster.
297300

298301
Any user trying to install cert-manager on the cluster, as part of a workload deployment, will get a deployment failure with an error that the CRD “exists and cannot be imported into the current release.” To avoid this error, skip installing cert-manager and instead take a dependency on the cert-manager operator and CRDs already installed by AOSM.
299302

articles/operator-service-manager/release-notes.md

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
---
2-
title: Azure Operator Service Manager Release Notes
3-
description: Tracking of major and minor releases of Azure Operator Service Manager.
2+
title: Release notes for Azure Operator Service Manager
3+
description: Official documentation and tracking for major and minor releases.
44
author: msftadam
55
ms.author: adamdor
66
ms.date: 08/14/2024
@@ -149,7 +149,7 @@ The following bug fixes, or other defect resolutions, are delivered with this re
149149

150150
None
151151

152-
## Release Version 2.0.2788-135
152+
## Release 2.0.2788-135
153153

154154
Document Revision 1.1
155155

@@ -160,7 +160,7 @@ Azure Operator Service Manager is a cloud orchestration service that enables aut
160160
* Release Version: Version 2.0.2788-135
161161
* Release Date: August 21, 2024
162162
* Is NFO update required: YES, Update only
163-
* Dependency Versions: Go/1.22.4 Helm/3.15.2
163+
* Dependency Versions: Go/1.22.4 Helm/3.15.2
164164

165165
### Release Installation
166166
This release can be installed as an update on top of release 2.0.2783-134.

articles/operator-service-manager/safe-upgrade-practices.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,9 @@
11
---
2-
title: Safe Upgrade Practices for CNFs
3-
description: Safely execute complex upgrades of workloads with Azure Operator Service Manager.
2+
title: Get started with Azure Operator Service Manager Safe Upgrade Practices
3+
description: Safely execute complex upgrades of CNF workloads on Azure Operator Nexus
44
author: msftadam
55
ms.author: adamdor
6-
ms.date: 08/16/2024
6+
ms.date: 08/30/2024
77
ms.topic: upgrade-and-migration-article
88
ms.service: azure-operator-service-manager
99
---

articles/operator-service-manager/safe-upgrades-nf-level-rollback.md

Lines changed: 15 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -1,29 +1,31 @@
11
---
2-
title: Rollback on upgrade failure using Azure Operator Service Manager
3-
description: Revert all prior completed operations during safe upgrade failure.
2+
title: Control upgrade failure behavior with Azure Operator Service Manager
3+
description: Learn about recovery behaviors including pause on failure and rollback on failure.
44
author: msftadam
55
ms.author: adamdor
6-
ms.date: 08/28/2024
6+
ms.date: 08/30/2024
77
ms.topic: upgrade-and-migration-article
88
ms.service: azure-operator-service-manager
99
---
1010

11-
# Rollback on upgrade failure
12-
This guide describes the Azure Operator Service Manager (AOSM) optional rollback on failure feature for container network functions (CNFs). This feature, as part of the AOSM safe upgrade practices initiative, reduces the service impact of unexpected upgrade failures for network functions (NFs) where comprehensive forward and backward version network function application (NfApp) compatibility is not available.
11+
# Control upgrade failure behavior
12+
13+
## Overview
14+
This guide describes the Azure Operator Service Manager (AOSM) upgrade failure behavior features for container network functions (CNFs). These features, as part of the AOSM safe upgrade practices initiative, offer a choice between faster retries, with pause on failure, and a return to the starting point, with rollback on failure.
1315

1416
## Pause on failure
15-
In the case of an unexpected failure during an upgrade, historically AOSM supports the pause on failure approach. This method remains the default and implements the following workflow logic;
16-
* The NfApps are created or upgraded following either updateDependsOn ordering, if provided, or in the sequential order they appear.
17-
* NfApps with parameter "applicationEnabled" disabled are skipped.
18-
* NFApps present before upgrade, but not referenced by the new network function definition version (NFDV) are deleted.
19-
* The execution is paused if any of the NfApp upgrades fail.
17+
Any upgrade using AOSM starts with a site network service (SNS) reput operation. The reput operation processes the network function applications (NfApps) found in the network function definition version (NFDV). The reput operation implements the following default logic:
18+
* NfApps are processed following either updateDependsOn ordering, or in the sequential order they appear.
19+
* NfApps with parameter "applicationEnabled" set to disabled are skipped.
20+
* NfApps that are present, but not referenced by the new NFDV, are deleted.
21+
* The execution sequence is paused if any NfApp upgrade fails, and a rollback is considered.
2022
* The failure leaves the NF resource in a failed state.
2123

22-
With pause on failure, AOSM rolls back the failed NfApp, via the testOptions, installOptions, or upgradeOptions parameters. This method allows the end user to troubleshoot the failed NfApp and then restart the upgrade from that point forward. As the default behavior, this method is the most efficient upgrade method, but may cause network function (NF) inconsistencies while in a mixed version state.
24+
With pause on failure, AOSM rolls back only the failed NfApp, via the testOptions, installOptions, or upgradeOptions parameters. No action is taken on any NfApps which precede the failed NfApp. This method allows the end user to troubleshoot the failed NfApp and then restart the upgrade from that point forward. As the default behavior, this method is the most efficient method, but may cause network function (NF) inconsistencies while in a mixed version state.
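
The pause on failure flow described above can be pictured with a small simulation. This is only an illustrative sketch, not AOSM code: the `NfApp` and `ReputResult` types, the `reput_pause_on_failure` function, the `upgrade` callback, and the "Succeeded" state label are all hypothetical names chosen for this example. It assumes the NfApps are already sorted into their processing order, and it leaves out deletion of NfApps that the new NFDV no longer references.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class NfApp:
    """Hypothetical stand-in for one network function application."""
    name: str
    enabled: bool = True


@dataclass
class ReputResult:
    """Hypothetical summary of one simulated reput operation."""
    upgraded: List[str]
    skipped: List[str]
    rolled_back: List[str]
    failed: Optional[str]
    state: str


def reput_pause_on_failure(
    nf_apps: List[NfApp], upgrade: Callable[[NfApp], bool]
) -> ReputResult:
    """Simulate pause on failure over NfApps that are already in order."""
    upgraded: List[str] = []
    skipped: List[str] = []
    for app in nf_apps:
        if not app.enabled:
            # Disabled NfApps are skipped, not upgraded.
            skipped.append(app.name)
            continue
        if upgrade(app):
            upgraded.append(app.name)
            continue
        # Pause here: only the failed NfApp is rolled back. NfApps that
        # completed earlier stay on the new version, and later NfApps
        # are never attempted.
        return ReputResult(upgraded, skipped, [app.name], app.name, "Failed")
    return ReputResult(upgraded, skipped, [], None, "Succeeded")


# Usage: the third NfApp fails, so the sequence pauses there.
apps = [NfApp("a"), NfApp("b", enabled=False), NfApp("c"), NfApp("d")]
result = reput_pause_on_failure(apps, lambda app: app.name != "c")
print(result)
```

In this run, `a` stays upgraded, `b` is skipped, `c` is rolled back, and `d` is never attempted, which is the mixed version state the paragraph above warns about.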
2325

2426
## Rollback on failure
25-
To address risk of mismatched NfApp versions, AOSM now supports NF level rollback on failure. With this option enabled, if an NfApp upgrade fails, both the failed NfApp, and all prior completed NfApps, are rolled back to initial version state. This method minimizes, or eliminates, the amount of time the NF is exposed to NfApp version mismatches. The optional rollback on failure feature works as follows:
26-
* A user initiates an upgrade and enables the rollback on failure feature.
27+
To address the risk of mismatched NfApp versions, AOSM now supports NF level rollback on failure. With this option enabled, if an NfApp operation fails, both the failed NfApp, and all prior completed NfApps, can be rolled back to their initial version state. This method minimizes, or eliminates, the amount of time the NF is exposed to NfApp version mismatches. The optional rollback on failure feature works as follows:
28+
* A user initiates an SNS reput operation and enables rollback on failure.
2729
* A snapshot of the current NfApp versions is captured and stored.
2830
* The snapshot is used to determine the individual NfApp actions taken to reverse actions that completed successfully.
2931
- "helm install" action on deleted components,
@@ -52,7 +54,6 @@ AOSM returns the following operational status and messages, given the respective
5254
- Provisioning State: Failed
5355
- Message: Application(<ComponentName>) : <Failure reason>; Rollback Failed (<RollbackComponentName>) : <Rollback Failure reason>
5456
```
55-
5657
## How to configure rollback on failure
5758
The most flexible method to control failure behavior is to extend a new configuration group schema (CGS) parameter, rollbackEnabled, to allow for configuration group value (CGV) control via roleOverrideValues in the NF payload. First, define the CGS parameter:
5859
```
