Commit 7769fe6

Merge pull request #299870 from msftadam/patch-73
Update safe-upgrade-practices.md
2 parents cc13467 + 885595b commit 7769fe6

1 file changed

+140
-34
lines changed

articles/operator-service-manager/safe-upgrade-practices.md

Lines changed: 140 additions & 34 deletions
Original file line number | Diff line number | Diff line change
@@ -53,7 +53,7 @@ When planning for an upgrade using Azure Operator Service Manager, address the f
5353
- In most cases, use the existing publisher to host new version artifacts.
5454
- Using an existing publisher supports `helm upgrade` to update an SNS to a different version.
5555
- Using a new publisher requires a `helm delete` on the current SNS and then a `helm install` for the new SNS version.
56-
- Artifact store, network service design group (NSDG), and network function design group (NFDG) are immutable and cannot change.
56+
- Artifact store, network service design group (NSDG), and network function design group (NFDG) are immutable and can't change.
5757
- Changing one of these resources requires deployment of a new SNS.
5858
- A new artifact manifest is needed to store the new charts and images.
5959
- See [onboarding documentation](how-to-manage-artifacts-nexus.md) for details on uploading new charts and images.
@@ -69,50 +69,63 @@ When planning for an upgrade using Azure Operator Service Manager, address the f
6969
- Create updated artifacts using Operator workflow.
7070
- If necessary, create new configuration group values (CGVs) based on new CGS.
7171
- Reuse and craft payload by confirming the existing site and site network service objects.
72-
7372
- Update templates to ensure that upgrade parameters are set based on confidence in the upgrade and desired failure behavior.
7473
- Settings used for production may suppress failure details, while settings used for debugging or testing may expose these details.
7574

7675
## Safe upgrade procedure
7776
Use the following process to trigger an upgrade with Azure Operator Service Manager.
7877

79-
### Create new NFDV resource
80-
For new NFDV versions, it must be in a valid SemVer format. The new version can be an upgrade, a greater value versus the deployed version, or a downgrade, a lower value versus the deployed version. The new version can differ by major, minor, or patch values.
81-
82-
### Update new NFDV parameters
83-
Helm chart versions can be updated, or Helm values can be updated or parameterized as necessary. New nfApps can also be added where they didn't exist in deployed version.
84-
85-
### Update NFDV for desired nfApp order
86-
UpdateDependsOn is an NFDV parameter used to specify ordering of nfApps during update operations. If `updateDependsOn` isn't provided, serial ordering of CNF applications, as appearing in the NFDV is used.
87-
88-
### Update ARM template for desired upgrade behavior
89-
Make sure to set any desired CNF application `timeout`, the `atomic` parameter, and `rollbackOnTestFailure` parameter. It may be useful to change these parameters over time as more confidence is gained in the upgrade.
90-
91-
### Issue SNS reput
92-
With onboarding complete, the reput operation is submitted. Depending on the number, size and complexity of the nfApps, the reput operation could take some time to complete (multiple hours).
93-
94-
### Examine reput results
95-
If the reput is reporting a successful result, the upgrade is complete and the user should validate the state and availability of the service. If the reput is reporting a failure, follow the steps in the upgrade failure recovery section to continue.
78+
* Create new NFDV resource
79+
* The new NFDV version must be in a valid SemVer format. The new version can be an upgrade (a greater value than the deployed version) or a downgrade (a lower value than the deployed version). The new version can differ by major, minor, or patch values.
80+
* Update new NFDV parameters
81+
* Helm chart versions can be updated, and Helm values can be updated or parameterized as necessary. New nfApps that didn't exist in the deployed version can also be added.
82+
* Update NFDV for desired nfApp order
83+
* `updateDependsOn` is an NFDV parameter that specifies the ordering of nfApps during update operations. If `updateDependsOn` isn't provided, the CNF applications are updated serially, in the order they appear in the NFDV.
84+
* Update ARM template for desired upgrade behavior
85+
* Set the desired CNF application `timeout`, `atomic`, and `rollbackOnTestFailure` parameters. It may be useful to change these parameters over time as more confidence is gained in the upgrade.
86+
* Issue SNS reput
87+
* With onboarding complete, the reput operation is submitted. Depending on the number, size, and complexity of the nfApps, the reput operation can take multiple hours to complete.
88+
* Examine reput results
89+
* If the reput is reporting a successful result, the upgrade is complete and the user should validate the state and availability of the service. If the reput is reporting a failure, follow the steps in the upgrade failure recovery section to continue.
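
The SemVer rule in the first step can be checked before a new NFDV is published. The following Python sketch is illustrative only and isn't part of Azure Operator Service Manager: the helper names `parse_semver` and `classify_change` are hypothetical, and the pattern covers only the core `MAJOR.MINOR.PATCH` form, not pre-release or build metadata.

```python
import re

# Core SemVer form only: MAJOR.MINOR.PATCH, numeric parts without leading zeros.
# Pre-release and build metadata are intentionally out of scope for this sketch.
_SEMVER_CORE = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def parse_semver(version: str) -> tuple[int, ...]:
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of integers, or raise ValueError."""
    match = _SEMVER_CORE.fullmatch(version)
    if match is None:
        raise ValueError(f"not a valid MAJOR.MINOR.PATCH version: {version!r}")
    return tuple(int(part) for part in match.groups())


def classify_change(deployed: str, target: str) -> str:
    """Classify the target version relative to the deployed version."""
    old, new = parse_semver(deployed), parse_semver(target)
    if new > old:
        return "upgrade"
    if new < old:
        return "downgrade"
    return "same"
```

Comparing the parsed tuples, rather than the raw strings, keeps the ordering numeric, so `1.10.0` is treated as greater than `1.9.0`.
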
9690

9791
## Safe upgrade retry procedure
9892
In cases where a reput update fails, the following process can be followed to retry the operation.
9993

100-
### Diagnose failed nfApp
101-
Resolve the root cause for nfApp failure by analyzing logs and other debugging information.
94+
* Diagnose failed nfApp
95+
* Resolve the root cause for nfApp failure by analyzing logs and other debugging information.
96+
* Manually skip completed charts
97+
* After fixing the failed nfApp, but before attempting an upgrade retry, consider changing the `applicationEnablement` parameter to accelerate retry behavior. This parameter can be set to false where an nfApp should be skipped, which is useful where an nfApp doesn't require an upgrade.
98+
* Issue SNS reput retry (repeat until success)
99+
* By default, the reput retries nfApps in the declared update order, unless they're skipped using the `applicationEnablement` flag.
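
The effect of skipping nfApps on a retry can be pictured with a small Python sketch. It only models the behavior described above and doesn't call any Azure API: `plan_retry` and the boolean `enablement` map are hypothetical stand-ins for the declared update order and the per-nfApp `applicationEnablement` setting.

```python
def plan_retry(update_order, enablement):
    """Return the nfApps a retry would process, in the declared update order.

    update_order: nfApp names in the declared update order.
    enablement: hypothetical map of nfApp name to bool; nfApps missing from
        the map are treated as enabled, mirroring the default retry behavior.
    """
    return [name for name in update_order if enablement.get(name, True)]


# Example: hellotest already completed, so it's skipped on the retry.
order = ["hellotest", "hellotest1", "runnerTest"]
retry_plan = plan_retry(order, {"hellotest": False})
```

In this example, `retry_plan` keeps `hellotest1` and `runnerTest` in their original relative order.
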
102100

103-
### Manually skip completed charts
104-
After fixing the failed nfApp, but before attempting an upgrade retry, consider changing the `applicationEnablement` parameter to accelerate retry behavior. This parameter can be set false, where an nfApp should be skipped. This parameter can be useful where an nfApp doesn't require an upgraded.
101+
## Control timeouts with installOptions and upgradeOptions
102+
When an SNS operation starts either a helm install or a helm upgrade, a 27-minute default timeout value is used. This value can be customized at the global NF level, but we recommend customizing it at the component NF level by defining override values in the NF payload template. Further, the values in the NF payload template can be exposed as operator values, allowing final customization at run-time. The following example demonstrates supported `installOptions` and `upgradeOptions` parameters applied to a single nfApp component:
105103

106-
### Issue SNS reput retry (repeat until success)
107-
By default, the reput retries nfApps in the declared update order, unless they're skipped using `applicationEnablement` flag.
104+
```
105+
"roleOverrideValues": ["{
106+
"name": "hellotest",
107+
"deployParametersMappingRuleProfile": {
108+
"helmMappingRuleProfile": {
109+
"options": {
110+
"installOptions": {
111+
"atomic": true,
112+
"wait": true,
113+
"timeout": "1" },
114+
"upgradeOptions": {
115+
"atomic": true,
116+
"wait": true,
117+
"timeout": "2" }
118+
} } } }"
119+
]
120+
```
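
In the example above, the override for `hellotest` is wrapped in quotes, which suggests that each `roleOverrideValues` element is a JSON document serialized into a string. Assuming that reading is correct, the following Python sketch builds such an element with `json.dumps` so that the quoting and escaping don't have to be done by hand. The function name `build_override_entry` is hypothetical, and the timeout values simply repeat the ones shown above.

```python
import json


def build_override_entry(nfapp_name, install_timeout, upgrade_timeout):
    """Serialize one nfApp override into a string for roleOverrideValues."""
    entry = {
        "name": nfapp_name,
        "deployParametersMappingRuleProfile": {
            "helmMappingRuleProfile": {
                "options": {
                    "installOptions": {
                        "atomic": True,
                        "wait": True,
                        "timeout": str(install_timeout),
                    },
                    "upgradeOptions": {
                        "atomic": True,
                        "wait": True,
                        "timeout": str(upgrade_timeout),
                    },
                }
            }
        },
    }
    # Assumption: each array element is a string that itself contains JSON.
    return json.dumps(entry)


payload = {"roleOverrideValues": [build_override_entry("hellotest", 1, 2)]}
```

Generating the string this way guarantees that the embedded document is valid JSON, which is easy to get wrong when nesting quotes manually.
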
108121

109122
## Skip nfApps using applicationEnablement
110123
In the NFDV resource, under `deployParametersMappingRuleProfile`, there's a supported property `applicationEnablement` of type enum, which takes values of Unknown, Enabled, or Disabled. It can be used to manually exclude nfApp operations during network function (NF) deployment. The following example demonstrates a generic method to parameterize `applicationEnablement` as an included value in the `roleOverrideValues` property.
111124

112125
### Template changes
113-
While no NFDV changes are necessarily required, optionally the publisher can use the NFDV to set a default value for the `applicationEnablement` property. The default value will be used, unless its changed via `roleOverrideValues`.
126+
While no NFDV changes are strictly required, the publisher can optionally use the NFDV to set a default value for the `applicationEnablement` property. The default value is used unless it's changed via `roleOverrideValues`.
114127

115-
#### NFDV Template
128+
#### NFDV template
116129
Use the NFDV template to set a default value for `applicationEnablement`. The following example sets the `enabled` state as the default value for the `hellotest` network function application.
117130

118131
```json
@@ -130,10 +143,10 @@ Use the NFDV template to set a default value for `applicationEnablement`. The fo
130143
}
131144
```
132145

133-
To manage the `applicationEnablement` value more dynamically, the Operator can pass a realtime value using the NF template `roleOverrideValues` property. While it's possible for the operator to manipulate the NF template directly, instead it's suggested to parameterize the `roleOverrideValues`, so that values can be passed via a CGV template at runtime. This requires the following modifications to the CGS, NF templates and finally the CGV.
146+
To manage the `applicationEnablement` value more dynamically, the operator can pass a real-time value using the NF template `roleOverrideValues` property. While the operator can manipulate the NF template directly, it's better to parameterize `roleOverrideValues` so that values can be passed via a CGV template at run-time. The following examples demonstrate the needed modifications to the CGS, the NF template, and finally the CGV.
134147

135-
#### CGS Template
136-
The CGS template must be updated to include one variable declaration for each line to parameterize under `roleOverrideValues`. The below example demonstrates three override values, one to for nfConfiguration [0] and two for nfApplication options [1,2].
148+
#### CGS template
149+
The CGS template must be updated to include one variable declaration for each line to parameterize under `roleOverrideValues`. The following example demonstrates three override values.
137150

138151
```json
139152
"roleOverrideValues0": {
@@ -147,8 +160,8 @@ The CGS template must be updated to include one variable declaration for each li
147160
}
148161
```
149162

150-
#### NF Template
151-
The NF template must be update three ways. First, the implicit config parameter must be defined as type object. Second, roleOverrideValues0, roleOverrideValues1 and roleOverrideValues2 must be declared as variables mapped to config parameter. Third, roleOverrideValues0, roleOverrideValues1 and roleOverrideValues2 must be referenced for substitution under `roleOverrideValues` in proper order and following proper syntax.
163+
#### NF payload template
164+
The NF template must be updated in three ways. First, the implicit config parameter must be defined as type object. Second, `roleOverrideValues0`, `roleOverrideValues1`, and `roleOverrideValues2` must be declared as variables mapped to the config parameter. Third, `roleOverrideValues0`, `roleOverrideValues1`, and `roleOverrideValues2` must be referenced for substitution under `roleOverrideValues` in the proper order and following proper syntax.
152165

153166
```json
154167
"parameters": {
@@ -173,8 +186,8 @@ The NF template must be update three ways. First, the implicit config parameter
173186
}
174187
```
175188

176-
#### CGV Template
177-
The CGV template can now be updated to include the content for each variable to be substituted into `roleOverrideValues` property at run-time. The below example sets `rollbackEnabled` to true, followed by override sets for `hellotest` and `hellotest1` nfApplications.
189+
#### CGV template
190+
The CGV template can now be updated to include the content for each variable to be substituted into the `roleOverrideValues` property at run-time. The following example sets `rollbackEnabled` to true, followed by override sets for the `hellotest` and `hellotest1` nfApplications.
178191

179192
```json
180193
{
@@ -224,3 +237,96 @@ To enable the SkipUpgrade feature via `roleOverrideValues`, refer to the followi
224237
- The `skipUpgrade` flag is enabled. If the upgrade request for `hellotest` meets the precheck criteria, the upgrade is skipped.
225238
- **nfApplication: `runnerTest`**
226239
- The `skipUpgrade` flag isn't specified. Therefore, `runnerTest` executes a traditional Helm upgrade at the cluster level, even if the precheck criteria are met.
240+
241+
242+
243+
## Complete roleOverrideValues option reference
244+
Bringing together all examples in this and other articles, the following reference demonstrates all presently supported install and upgrade options available through the `roleOverrideValues` mechanism.
245+
246+
```json
247+
{
248+
"roleOverrideValues": [
249+
{
250+
"nfConfiguration": {
251+
"rollbackEnabled": "true"
252+
}
253+
},
254+
{
255+
"name": "nfApplication1",
256+
"deployParametersMappingRuleProfile": {
257+
"helmMappingRuleProfile": {
258+
"options": {
259+
"installOptions": {
260+
"atomic": "true",
261+
"wait": "true",
262+
"timeout": "1",
263+
"testOptions": {
264+
"enable": "true",
265+
"timeout": "true",
266+
"rollbackOnTestFailure": "true",
267+
"filter": [
268+
"test1",
269+
"test2"
270+
]
271+
}
272+
},
273+
"upgradeOptions": {
274+
"atomic": "true",
275+
"wait": "true",
276+
"timeout": "1",
277+
"skipUpgrade": "true",
278+
"testOptions": {
279+
"enable": "true",
280+
"timeout": "true",
281+
"rollbackOnTestFailure": "true",
282+
"filter": [
283+
"test1",
284+
"test2"
285+
]
286+
}
287+
}
288+
}
289+
}
290+
}
291+
},
292+
{
293+
"name": "nfApplication2",
294+
"deployParametersMappingRuleProfile": {
295+
"helmMappingRuleProfile": {
296+
"options": {
297+
"installOptions": {
298+
"atomic": "true",
299+
"wait": "true",
300+
"timeout": "1",
301+
"testOptions": {
302+
"enable": "true",
303+
"timeout": "true",
304+
"rollbackOnTestFailure": "true",
305+
"filter": [
306+
"test1",
307+
"test2"
308+
]
309+
}
310+
},
311+
"upgradeOptions": {
312+
"atomic": "true",
313+
"wait": "true",
314+
"timeout": "1",
315+
"skipUpgrade": "true",
316+
"testOptions": {
317+
"enable": "true",
318+
"timeout": "true",
319+
"rollbackOnTestFailure": "true",
320+
"filter": [
321+
"test1",
322+
"test2"
323+
]
324+
}
325+
}
326+
}
327+
}
328+
}
329+
}
330+
]
331+
}
332+
```
