Commit 99108ef

resolved various feedback comments
1 parent dbbf089 commit 99108ef

5 files changed

+22
-26
lines changed

articles/operator-nexus/TOC.yml

Lines changed: 2 additions & 2 deletions
@@ -242,7 +242,7 @@
 href: howto-baremetal-bmm-ssh.md
 - name: BareMetal BMC Access Setup
 href: howto-baremetal-bmc-ssh.md
-- name: BareMetal Functions
+- name: BareMetal BMM Platform Commands
 href: howto-baremetal-functions.md
 - name: BareMetal Run-Read Execution
 href: howto-baremetal-run-read.md
@@ -352,7 +352,7 @@
 - name: Cluster or BMM
 expanded: false
 items:
-- name: Troubleshoot Bare Metal Machine
+- name: Troubleshoot Bare Metal Machine Server Problems
 href: troubleshoot-reboot-reimage-replace.md
 - name: Troubleshoot Bare Metal Machine Provisioning
 href: troubleshoot-bare-metal-machine-provisioning.md

articles/operator-nexus/howto-baremetal-best-practices.md

Lines changed: 14 additions & 16 deletions
@@ -37,14 +37,13 @@ For this reason, it's essential to understand the available options well when tr
 - Attempt to identify the root cause of the failure to avoid repeating the same mistake.
 Perform retry attempts in incremental steps to isolate and address specific issues.
 - Wait for Az CLI commands to run to completion and validate the state of the BMM resource before executing other steps.
-- Keep an eye on system logs to detect any anomalies during the retry process.
-- Verify that the firmware and software versions are up-to-date to prevent compatibility issues and ensure compatibility between hardware and software versions.
-- Always back up critical data to prevent data loss during the recovery or replacement process.
+- Verify that the firmware and software versions are up-to-date before a new greenfield deployment to prevent compatibility issues between hardware and software versions.
+For more information about firmware compatibility, see [Operator Nexus Platform Prerequisites](./howto-platform-prerequisites.md).
 - Ensure stable network connectivity to avoid interruptions during the process.
 Validate that there are no active network stability issues with the network fabric.
 Ignoring network stability could make operations fail to complete successfully and leave a BMM in an unknown state.
 
-## Best Practices for BMM Reimage
+## Best Practices for a BMM Reimage
 
 The BMM `reimage` action is explained in [BMM Lifecycle Management Commands] and scenario procedures described in [Troubleshoot Azure Operator Nexus Server Problems].
 
@@ -53,33 +52,31 @@ The BMM `reimage` action is explained in [BMM Lifecycle Management Commands] and
 You can restore the operating system runtime version on a BMM by executing the `reimage` operation.
 A BMM `reimage` can be both time-saving and reliable for resolving issues or restoring the operating system software to a known-good state.
 This process **redeploys** the runtime image on the target BMM and executes the steps to rejoin the cluster with the same identifiers.
-The `reimage` action doesn't affect the tenant workload files on the BMM under normal circumstances.
+The `reimage` action is designed to interact with the operating system partition, leaving virtual machine's local storage unchanged.
 
 > [!IMPORTANT]
-> Avoid write or edit actions performed on the node via BMM access.
+> Avoid manual or automated changes to the BMM's file system (also known as "break glass").
 > The `reimage` action is required to restore Microsoft support and any changes done to the BMM are lost while restoring the node to its expected state.
 
-### Preconditions and Validations Before BMM Reimage
+### Preconditions and Validations Before a BMM Reimage
 
 Before initiating any `reimage` operation, ensure the following preconditions are met:
 
 - Ensure the BMM is in `poweredState` set to `On` and `readyState` set to `True`.
 - Make sure the BMM's workloads are drained using the [`cordon`](./howto-baremetal-functions.md#make-a-bmm-unschedulable-cordon) command with the parameter `evacuate` set to `True`.
 - Perform high level checks covered in the article [Troubleshoot Bare Metal Machine Provisioning].
-- Evaluate any BMM warnings or degraded conditions which could indicate the need to resolve hardware, network, or server configuration problems before a `replace` operation.
+- Evaluate any BMM warnings or degraded conditions which could indicate the need to resolve hardware, network, or server configuration problems before a `reimage` operation.
 For more information, read [Troubleshoot Degraded Status Errors on Bare Metal Machines] and [Troubleshoot Bare Metal Machine Warning Status].
-- Ensure to resolve any BMM hardware validation failures.
-Read article [Troubleshoot Hardware Validation Failure](./troubleshoot-hardware-validation-failure.md) to understand hardware validation results.
-- Validate that there are no running firmware upgrade jobs through the BMC before initiating a `replace` operation.
+- Validate that there are no running firmware upgrade jobs through the BMC before initiating a `reimage` operation.
 The BMM has `provisioningStatus` in the `Preparing` state. Interrupting an ongoing firmware upgrade can leave the BMM in an inconsistent state.
 
-## Best Practices for BMM Replace
+## Best Practices for a BMM Replace
 
 The BMM `replace` action is explained in [BMM Lifecycle Management Commands] and scenario procedures described in [Troubleshoot Azure Operator Nexus Server Problems].
 
 [!INCLUDE [warning-donot-run-multiple-actions](./includes/baremetal-machines/warning-donot-run-multiple-actions.md)]
 
-Hardware failures are an expected occurrence over the natural lifecycle of a server.
+Hardware failures are a normal occurrence over the life of a server.
 Component replacements might be necessary to restore functionality and ensure continued operation.
 In cases where one or more hardware components fail on the server, it's necessary to perform a BMM `replace` operation.
 The `replace` operation should be executed after any hardware maintenance event. Multiple maintenance events should be done as multiple `replace` operations.
@@ -90,14 +87,15 @@ The `replace` operation should be executed after any hardware maintenance event.
 
 ### Resolve Hardware Validation Issues
 
-When a BMM is marked with failed hardware validation, it indicates that physical repairs are needed. It's crucial to identify and address these repairs before performing a BMM `replace`.
+When a BMM is marked with failed hardware validation, it might indicate that physical repairs are needed.
+It's crucial to identify and address these repairs before performing a BMM `replace`.
 A hardware validation process is invoked, as part of the `replace` operation, to ensure the physical host's integrity before deploying the OS image.
-If the BMM continues to have hardware validation failures, then the BMM won't provision successfully meaning it fails to complete the necessary setup steps to become operational and won't join the cluster.
+If the BMM continues to have hardware validation failures, then the BMM can't provision successfully meaning it fails to complete the necessary setup steps to become operational and join the cluster.
 Ensure **all hardware validation issues** are cleared before the next `replace` action.
 
 To understand hardware validation result, read through the article [Troubleshoot Hardware Validation Failure](./troubleshoot-hardware-validation-failure.md).
 
-### Preconditions and Validations Before BMM Replace
+### Preconditions and Validations Before a BMM Replace
 
 Before initiating any `replace` operation, ensure the following preconditions are met:
 
articles/operator-nexus/howto-baremetal-functions.md

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
 ---
-title: "Azure Operator Nexus: Platform functions for bare metal machines"
+title: "Azure Operator Nexus: BareMetal Machine Platform Commands"
 description: Learn how to manage bare metal machines (BMM).
 author: eak13
 ms.author: ekarandjeff
@@ -9,7 +9,7 @@ ms.date: 07/19/2024
 ms.custom: template-how-to, devx-track-azurecli
 ---
 
-# BareMetal Machine (BMM) Lifecycle Management Commands
+# BareMetal BMM Platform Commands
 
 This article describes how to perform lifecycle management operations on bare metal machines (BMM).
 These steps should be used for troubleshooting purposes to recover from failures or when taking maintenance actions.

articles/operator-nexus/troubleshoot-bare-metal-machine-warning.md

Lines changed: 3 additions & 5 deletions
@@ -26,15 +26,13 @@ The Detailed status message of the Bare Metal Machine (Operator Nexus) resource
 
 ## Troubleshooting
 
+Evaluate the current status of all BMMs in the specified resource group.
+Any active _Warning_ conditions are visible in the Detailed Status Message, as seen in the following example.
+
 To check for any Bare Metal Machines (BMMs) which are reporting _Warning_ messages, run:
 
 ```azurecli
 az networkcloud baremetalmachine list -g <ResourceGroup_Name> -o table
-```
-
-This command shows the current status of all BMMs in the specified resource group. Any active _Warning_ conditions are visible in the Detailed Status Message, as seen in the following example.
-
-```shell
 Name ResourceGroup DetailedStatus DetailedStatusMessage
 -------------- ---------------------------------- ---------------- -------------------------------------------------------------------------------------------
 rack1control01 cluster-1-HostedResources-3EA53DF9 Provisioned The OS is provisioned to the machine.

articles/operator-nexus/troubleshoot-reboot-reimage-replace.md

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ author: eak13
 ms.author: ekarandjeff
 ---
 
-# Troubleshoot Azure Operator Nexus Server Problems
+# Troubleshoot Bare Metal Machine Server Problems
 
 This article describes how to troubleshoot server problems by using `restart`, `reimage`, and `replace` actions on Azure Operator Nexus BareMetal Machines (BMM).
 These operations are performed for maintenance on your servers and cause a disruption to the specific BMM.
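
Reviewer note: the first `reimage` precondition changed in howto-baremetal-best-practices.md (power state `On`, `readyState` `True`) could be scripted as a gate before the operation. The following is a minimal sketch, not part of this commit. The helper name `bmm_ready_for_reimage` is hypothetical, and the `powerState` and `readyState` query paths in the comments are assumptions about the resource's JSON output that should be confirmed against the CLI reference before use.

```shell
#!/bin/sh
# Hypothetical helper (not part of the commit): succeeds only when both
# values match the documented reimage precondition.
#   $1 = reported power state (expected "On")
#   $2 = reported ready state (expected "True")
bmm_ready_for_reimage() {
    power_state="$1"
    ready_state="$2"
    [ "$power_state" = "On" ] && [ "$ready_state" = "True" ]
}

# Example wiring, left commented out because it needs a live subscription.
# The --query property names below are assumptions, and the <...> values are
# placeholders to fill in.
#
# power=$(az networkcloud baremetalmachine show \
#     --name "<BMM_Name>" --resource-group "<ResourceGroup_Name>" \
#     --query "powerState" -o tsv)
# ready=$(az networkcloud baremetalmachine show \
#     --name "<BMM_Name>" --resource-group "<ResourceGroup_Name>" \
#     --query "readyState" -o tsv)
# if bmm_ready_for_reimage "$power" "$ready"; then
#     echo "Power and ready state preconditions met"
# else
#     echo "BMM is not ready for reimage" >&2
# fi
```

The check covers only the power and ready states. The remaining preconditions (draining workloads with `cordon`, reviewing warnings and degraded conditions, and confirming that no firmware upgrade job is running) still need to be verified separately.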
