Commit 1fa29cc

edit pass: azure-operator-nexus-cluster-and-bmm
1 parent 611636f commit 1fa29cc

6 files changed, with 28 additions and 22 deletions.
Lines changed: 9 additions & 9 deletions
@@ -1,6 +1,6 @@
 ---
-title: "Azure Operator Nexus: Accepted Cluster"
-description: Learn how to troubleshoot accepted Cluster resources.
+title: "Azure Operator Nexus: Accepted cluster"
+description: Learn how to troubleshoot accepted cluster resources.
 author: matternst7258
 ms.author: matthewernst
 ms.service: azure-operator-nexus
@@ -10,13 +10,13 @@ ms.date: 10/30/2024
 # ms.custom: template-include
 ---
 
-# Troubleshoot accepted Cluster resources
+# Troubleshoot accepted cluster resources
 
-Operator Nexus relies on mirroring, or hydrating, resources from the on-premises cluster to Azure. When this process is interrupted, the Cluster resource can move to the `Accepted` state.
+Azure Operator Nexus relies on mirroring, or hydrating, resources from the on-premises cluster to Azure. When this process is interrupted, the cluster resource can move to the `Accepted` state.
 
 ## Diagnosis
 
-The Cluster status is viewed via the Azure portal or the Azure CLI.
+The cluster status is viewed via the Azure portal or the Azure CLI.
 
 ```bash
 az networkcloud cluster show --resource-group <RESOURCE_GROUP> --name <CLUSTER_NAME>
@@ -28,7 +28,7 @@ Follow these steps for mitigation.
 
 ### Trigger the resource sync
 
-1. From the Cluster resource page in the Azure portal, add a tag to the Cluster resource.
+1. From the cluster resource page in the Azure portal, add a tag to the cluster resource.
 1. The resource moves out of the `Accepted` state.
 
 ```bash
@@ -39,16 +39,16 @@ az resource tag --tags exampleTag=exampleValue --name <CLUSTER> --resource-group
 
 ## Verification
 
-After the tag is applied, the Cluster moves to the `Running` state.
+After the tag is applied, the cluster moves to the `Running` state.
 
 ```bash
 az networkcloud cluster show --resource-group <RESOURCE_GROUP> --name <CLUSTER_NAME>
 ```
 
-If the Cluster resource maintains the state after more than five minutes, contact Microsoft support.
+If the cluster resource maintains the state for more than five minutes, contact Microsoft support.
 
 ## Related content
 
 - For more information about how resources are hydrated, see [Azure Arc-enabled Kubernetes](/azure/azure-arc/kubernetes/overview).
-- If you still have questions, [contact Azure support](https://portal.azure.com/?#blade/Microsoft_Azure_Support/HelpAndSupportBlade).
+- If you still have questions, contact [Azure support](https://portal.azure.com/?#blade/Microsoft_Azure_Support/HelpAndSupportBlade).
 - For more information about support plans, see [Azure support plans](https://azure.microsoft.com/support/plans/response/).

articles/operator-nexus/troubleshoot-bare-metal-machine-provisioning.md

Lines changed: 4 additions & 2 deletions
@@ -250,5 +250,7 @@ racadm -r $BMC_IP -u $BMC_USER -p $CURRENT_PASSWORD set iDRAC.Users.2.Password
 
 After the hardware is fixed, run the BMM `replace` action by following the instructions in [Manage the lifecycle of bare metal machines](howto-baremetal-functions.md).
 
-If you still have questions, [contact Support](https://portal.azure.com/?#blade/Microsoft_Azure_Support/HelpAndSupportBlade).
-For more information about support plans, see [Azure Support plans](https://azure.microsoft.com/support/plans/response/).
+## Related content
+
+- If you still have questions, contact [Azure support](https://portal.azure.com/?#blade/Microsoft_Azure_Support/HelpAndSupportBlade).
+- For more information about support plans, see [Azure support plans](https://azure.microsoft.com/support/plans/response/).

articles/operator-nexus/troubleshoot-control-plane-quorum.md

Lines changed: 6 additions & 4 deletions
@@ -29,7 +29,7 @@ Follow the steps in this troubleshooting article when multiple control plane nod
 
 ## Procedure
 
-1. Identify the Nexus Management Node:
+1. Identify the Azure Operator Nexus management nodes:
 - To identify the management nodes, run `az networkcloud baremetalmachine list -g <ResourceGroup_Name>`.
 - Sign in to the identified server.
 - Ensure that the ironic-conductor service is present on this node by using `crictl ps -a |grep -i ironic-conductor`. Here's example output:
@@ -39,7 +39,7 @@ Follow the steps in this troubleshooting article when multiple control plane nod
 <id> <id> 6 hours ago Running ironic-conductor 0 <id>
 ~~~
 
-1. Determine the Dell remote access controller (iDRAC) IP of the server:
+1. Determine the integrated Dell remote access controller (iDRAC) IP of the server:
 - Run the command `az networkcloud cluster list -g <RG_Name>`.
 - The output of the command is JSON with the iDRAC IP.
 
@@ -68,5 +68,7 @@ Follow the steps in this troubleshooting article when multiple control plane nod
 
 The servers should now be restored.
 
-If you still have questions, [contact Azure support](https://portal.azure.com/?#blade/Microsoft_Azure_Support/HelpAndSupportBlade).
-For more information about support plans, see [Azure support plans](https://azure.microsoft.com/support/plans/response/).
+## Related content
+
+- If you still have questions, contact [Azure support](https://portal.azure.com/?#blade/Microsoft_Azure_Support/HelpAndSupportBlade).
+- For more information about support plans, see [Azure support plans](https://azure.microsoft.com/support/plans/response/).

articles/operator-nexus/troubleshoot-hardware-validation-failure.md

Lines changed: 6 additions & 4 deletions
@@ -9,7 +9,7 @@ author: vnikolin
 ms.author: vanjanikolin
 ---
 
-# Troubleshoot hardware validation failure in a Nexus cluster
+# Troubleshoot hardware validation failure in an Azure Operator Nexus cluster
 
 This article describes how to troubleshoot a failed server hardware validation (HWV). HWV is run as part of a cluster deploy action and a bare metal `replace` action. HWV validates a bare metal machine (BMM) by executing test cases against the baseboard management controller (BMC). The Azure Operator Nexus platform is deployed on Dell servers. Dell servers use the integrated Dell remote access controller (iDRAC), which is the equivalent of a BMC.
 
@@ -480,7 +480,7 @@ This section discusses troubleshooting for problems you might encounter.
 }
 ```
 
-* Allow-listed critical alarms and warning alarms are logged as informational starting with Nexus release 3.14.
+* Allow-listed critical alarms and warning alarms are logged as informational starting with Azure Operator Nexus release 3.14.
 
 ```yaml
 {
@@ -696,5 +696,7 @@ This section discusses troubleshooting for problems you might encounter.
 
 After the hardware is fixed, run the BMM `replace` action by following the instructions in [Manage the lifecycle of bare metal machines](howto-baremetal-functions.md).
 
-If you still have questions, [contact Azure support](https://portal.azure.com/?#blade/Microsoft_Azure_Support/HelpAndSupportBlade).
-For more information about support plans, see [Azure support plans](https://azure.microsoft.com/support/plans/response/).
+## Related content
+
+- If you still have questions, contact [Azure support](https://portal.azure.com/?#blade/Microsoft_Azure_Support/HelpAndSupportBlade).
+- For more information about support plans, see [Azure support plans](https://azure.microsoft.com/support/plans/response/).

articles/operator-nexus/troubleshoot-lacp-bonding.md

Lines changed: 2 additions & 2 deletions
@@ -15,7 +15,7 @@ On physical host startup, the two Mellanox cards are bonded to a pair of Arista
 
 ## Diagnosis
 
-If LACP isn't negotiated correctly, traffic loss can occur. But traffic can pass for some flows too. This behavior can manifest itself as a virtual machine that can't get on the network, or even as oam/storage outages.
+If LACP isn't negotiated correctly, traffic loss can occur. But traffic can pass for some flows too. This behavior can manifest itself as a virtual machine that can't get on the network, or even as object attribute memory (OAM) or storage outages.
 
 ## Check LACP bonding
 
@@ -48,5 +48,5 @@ The most common causes for these LACP issues are host or switch miswiring or mis
 
 ## Related content
 
-- If you still have questions, [contact Azure support](https://portal.azure.com/?#blade/Microsoft_Azure_Support/HelpAndSupportBlade).
+- If you still have questions, contact [Azure support](https://portal.azure.com/?#blade/Microsoft_Azure_Support/HelpAndSupportBlade).
 - For more information about support plans, see [Azure support plans](https://azure.microsoft.com/support/plans/response/).

articles/operator-nexus/troubleshoot-memory-limits.md

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ Learn about troubleshooting for container memory limits in this article.
 
 ## Alerts for memory limits
 
-We recommend that you have alerts set up for the Operator Nexus cluster to look for Kubernetes pods that restart from `OOMKill` errors. These alerts let you know if a component on a server is working appropriately.
+We recommend that you have alerts set up for the Azure Operator Nexus cluster to look for Kubernetes pods that restart from `OOMKill` errors. These alerts let you know if a component on a server is working appropriately.
 
 The following table lists the metrics that are exposed to identify memory limits.
 
