Commit 9558b17

Merge pull request #107828 from g0r1v3r4/patch-1
added section "cluster deploy and hardware validation"
2 parents e4c7391 + 16da481 commit 9558b17

1 file changed: +64 -2 lines changed

articles/operator-nexus/howto-configure-cluster.md

Lines changed: 64 additions & 2 deletions
@@ -70,7 +70,7 @@ You can instead create a Cluster with ARM template/parameter files in
 | Parameter name | Description |
 | ------------------------- | --------------------------------------------------------------------------------------------------------------------- |
 | CLUSTER_NAME | Resource Name of the Cluster |
-| LOCATION | The Azure Region where the Cluster is deployed |
+| LOCATION | The Azure Region where the Cluster is deployed |
 | CL_NAME | The Cluster Manager Custom Location from Azure portal |
 | CLUSTER_RG | The cluster resource group name |
 | LAW_ID | Log Analytics Workspace ID for the Cluster |
@@ -155,7 +155,69 @@ az networkcloud cluster deploy \
   --no-wait --debug
 ```
 
-This command runs synchronously. If you wish to skip waiting for the command to complete, specify the `--no-wait --debug` options. For more information, see [how to track asynchronous operations](howto-track-async-operations-cli.md).
+> [!TIP]
+> To check the status of the `az networkcloud cluster deploy` command, run it with the `--debug` flag.
+> The debug output includes the `Azure-AsyncOperation` or `Location` header, which is used to query the `operationStatuses` resource.
+> See the section [Cluster Deploy Failed](#cluster-deploy-failed) for more detailed steps.
+> Optionally, run the command asynchronously with the `--no-wait` flag.
+
+### Cluster Deploy with hardware validation
+
+During a Cluster deploy process, one of the steps executed is hardware validation.
+The hardware validation procedure runs various tests and checks against the machines
+provided through the Cluster's rack definition. Based on the results of these checks
+and any user-skipped machines, a determination is made on whether sufficient nodes
+passed and/or are available to meet the thresholds necessary for deployment to continue.
+
+#### Cluster Deploy Action with skipping specific bare metal machines
+
+The deploy command accepts a parameter that lists the names of
+bare metal machines in the cluster that should be skipped during hardware validation.
+Skipped nodes aren't validated and aren't added to the node pool.
+Additionally, skipped nodes don't count against the total used by threshold calculations.
+
+```azurecli
+az networkcloud cluster deploy \
+  --name "$CLUSTER_NAME" \
+  --resource-group "$CLUSTER_RESOURCE_GROUP" \
+  --subscription "$SUBSCRIPTION_ID" \
+  --skip-validations-for-machines "$COMPX_SVRY_SERVER_NAME"
+```
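The statement that skipped nodes don't count against the threshold total can be illustrated with a small calculation. The following Bash sketch assumes a percentage-based threshold. The function name `meets_threshold`, the machine counts, and the 85 percent value are hypothetical and aren't taken from the service, which applies its own thresholds during deployment.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the Azure CLI.
# Succeeds (exit 0) when the share of validated machines that passed
# reaches the required percentage. Assumes a percentage-based threshold.
# Arguments: total machines, skipped machines, passed machines, required percent.
meets_threshold() {
  local total="$1" skipped="$2" passed="$3" required_percent="$4"
  # Skipped machines are excluded from the total used in the calculation.
  local eligible=$(( total - skipped ))
  # With no machines left to validate, treat the threshold as not met.
  if [ "$eligible" -le 0 ]; then
    return 1
  fi
  # Integer form of: passed / eligible >= required_percent / 100
  [ $(( passed * 100 )) -ge $(( required_percent * eligible )) ]
}

# Made-up numbers: 10 machines, 8 pass validation, 85 percent required.
if meets_threshold 10 0 8 85; then echo "no skips: met"; else echo "no skips: not met"; fi
# Skipping one machine shrinks the total to 9, so 8 of 9 (about 88.9%) is enough.
if meets_threshold 10 1 8 85; then echo "one skip: met"; else echo "one skip: not met"; fi
```

With these made-up numbers the first check reports "not met" (8 of 10 is 80 percent) and the second reports "met", because the skipped machine is removed from the denominator rather than counted as a failure.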
+
+#### Cluster Deploy failed
+
+To track the status of an asynchronous operation, run the command with the `--debug` flag.
+When `--debug` is specified, the progress of the request can be monitored.
+To find the operation status URL, examine the debug output for the
+`Azure-AsyncOperation` or `Location` header on the HTTP response to the creation request.
+These headers provide the `OPERATION_ID` value used in the HTTP API call.
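As a minimal sketch of that last step, the following Bash function splits an operation status URL, as it might appear in the `Azure-AsyncOperation` header, into the region and the operation ID. It assumes the URL has the shape `.../locations/<region>/operationStatuses/<id>?api-version=...`. The function name `parse_operation_url` and the sample identifiers are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the Azure CLI.
# Sets LOCATION and OPERATION_ID from an operationStatuses URL.
parse_operation_url() {
  local url="$1"
  # Drop the query string (for example "?api-version=...").
  local path="${url%%\?*}"
  # Everything after ".../operationStatuses/" is the operation ID.
  OPERATION_ID="${path##*/operationStatuses/}"
  # The region sits between ".../locations/" and "/operationStatuses/".
  local before="${path%%/operationStatuses/*}"
  LOCATION="${before##*/locations/}"
}

# Example header value with made-up subscription and operation identifiers.
HEADER_URL='https://management.azure.com/subscriptions/00000000-0000-0000-0000-000000000000/providers/Microsoft.NetworkCloud/locations/eastus/operationStatuses/12312312-1231-1231-1231-123123123123*ABCDEF?api-version=2022-12-12-preview'
parse_operation_url "$HEADER_URL"
echo "LOCATION=$LOCATION"
echo "OPERATION_ID=$OPERATION_ID"
```

Keep the values quoted when reusing them, because the operation ID can contain a `*` character that the shell would otherwise try to expand.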
+
+```azurecli
+OPERATION_ID="12312312-1231-1231-1231-123123123123*99399E995..."
+az rest -m GET -u "https://management.azure.com/subscriptions/${SUBSCRIPTION_ID}/providers/Microsoft.NetworkCloud/locations/${LOCATION}/operationStatuses/${OPERATION_ID}?api-version=2022-12-12-preview"
+```
+
+The output of the `az rest` call is similar to the following JSON example. When the error code is
+`HardwareValidationThresholdFailed`, the error message contains a list of bare
+metal machines that failed the hardware validation (for example, `COMP0_SVR0_SERVER_NAME`,
+`COMP1_SVR1_SERVER_NAME`). Use these names to search the logs for further details.
+
+```json
+{
+  "endTime": "2023-03-24T14:56:59.0510455Z",
+  "error": {
+    "code": "HardwareValidationThresholdFailed",
+    "message": "HardwareValidationThresholdFailed error hardware validation threshold for cluster layout plan is not met for cluster $CLUSTER_NAME in namespace nc-system with listed failed devices $COMP0_SVR0_SERVER_NAME, $COMP1_SVR1_SERVER_NAME"
+  },
+  "id": "/subscriptions/$SUBSCRIPTION_ID/providers/Microsoft.NetworkCloud/locations/$LOCATION/operationStatuses/12312312-1231-1231-1231-123123123123*99399E995...",
+  "name": "12312312-1231-1231-1231-123123123123*99399E995...",
+  "resourceId": "/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$CLUSTER_RESOURCE_GROUP/providers/Microsoft.NetworkCloud/clusters/$CLUSTER_NAME",
+  "startTime": "2023-03-24T14:56:26.6442125Z",
+  "status": "Failed"
+}
+```
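Extracting the failed machine names from the error message can be scripted. The Bash sketch below assumes the message ends with the phrase `with listed failed devices` followed by a comma-separated list, as in the sample output above. The real message format may differ, and the function name `failed_machines` and the machine names are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the Azure CLI.
# Prints one failed machine name per line, or nothing if the
# marker phrase isn't present in the message.
failed_machines() {
  local message="$1"
  # Without the marker phrase there is nothing to extract.
  case "$message" in
    *"with listed failed devices "*) ;;
    *) return 0 ;;
  esac
  # Keep only the text after the marker phrase.
  local list="${message##*with listed failed devices }"
  # Split on commas, trim surrounding spaces, and drop empty lines.
  printf '%s\n' "$list" | tr ',' '\n' | sed -e 's/^ *//' -e 's/ *$//' -e '/^$/d'
}

# Sample message with made-up cluster and machine names.
MESSAGE='HardwareValidationThresholdFailed error hardware validation threshold for cluster layout plan is not met for cluster mycluster in namespace nc-system with listed failed devices rack1compute01, rack1compute02'
failed_machines "$MESSAGE"
```

If a JSON-aware tool such as `jq` is installed, the message could be read from a saved response with `jq -r '.error.message'` and passed to the function.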
+
+See the article [Tracking Asynchronous Operations Using Azure CLI](./howto-track-async-operations-cli.md) for another example.
 
 ## Cluster deployment validation
 