Commit 7e20d2a

Update troubleshoot-reboot-reimage-replace.md
1 parent f07c76a commit 7e20d2a

1 file changed: +19 -19 lines changed

articles/operator-nexus/troubleshoot-reboot-reimage-replace.md

Lines changed: 19 additions & 19 deletions
@@ -11,7 +11,7 @@ ms.author: ekarandjeff
 
 # Troubleshoot Azure Operator Nexus Bare Metal Machine server problems
 
-This article describes how to troubleshoot server problems by using restart, reimage, and replace actions on Azure Operator Nexus bare metal machines (BMMs). You might need to take these actions on your server for maintenance reasons, which may cause a brief disruption to specific BMMs.
+This article describes how to troubleshoot server problems by using Restart, Reimage, and Replace actions on Azure Operator Nexus Bare Metal Machines (BMMs). You might need to take these actions on your server for maintenance reasons, which may cause a brief disruption to specific BMMs.
 
 The time required to complete each of these actions is similar. Restarting is the fastest, whereas replacing takes slightly longer. All three actions are simple and efficient methods for troubleshooting.
 
@@ -37,9 +37,9 @@ The time required to complete each of these actions is similar. Restarting is th
 
 When troubleshooting a BMM for failures and determining the most appropriate corrective action, it is essential to understand the available options. This article provides a systematic approach to troubleshoot Azure Operator Nexus server problems using these three methods:
 
-1. **Restart** - Least invasive method, best for temporary glitches or unresponsive VMs
-2. **Reimage** - Intermediate solution, restores OS to known-good state without affecting data
-3. **Replace** - Most significant action, required for hardware component failures
+- **Restart** - Least invasive method, best for temporary glitches or unresponsive VMs
+- **Reimage** - Intermediate solution, restores OS to known-good state without affecting data
+- **Replace** - Most significant action, required for hardware component failures
 
 ### Troubleshooting decision tree
 
@@ -63,9 +63,9 @@ The restart typically is the starting point for mitigating a problem.
 ### Restart workflow
 
 1. **Assess impact** - Determine if restarting the BMM will impact critical workloads.
-1. **Power off** - If needed, power off the BMM (optional).
-1. **Start or restart** - Either start a powered-off BMM or restart a running BMM.
-1. **Verify status** - Check if the BMM is back online and functioning properly.
+2. **Power off** - If needed, power off the BMM (optional).
+3. **Start or restart** - Either start a powered-off BMM or restart a running BMM.
+4. **Verify status** - Check if the BMM is back online and functioning properly.
 
 > [!NOTE]
 > The restart operation is the fastest recovery method but may not resolve issues related to OS corruption or hardware failures.
@@ -118,9 +118,9 @@ A reimage action is the best practice for lowest operational risk to ensure the
 ### Reimage workflow
 
 1. **Verify running workloads** - Before reimaging, check what workloads are running on the BMM.
-1. **Cordon and evacuate workloads** - Drain the BMM of workloads.
-1. **Perform reimage** - Execute the reimage operation.
-1. **Uncordon** - Make the BMM schedulable again after reimage completes.
+2. **Cordon and evacuate workloads** - Drain the BMM of workloads.
+3. **Perform reimage** - Execute the reimage operation.
+4. **Uncordon** - Make the BMM schedulable again after reimage completes.
 
 > [!WARNING]
 > Running more than one `baremetalmachine replace` or `reimage` command at the same time, or running a `replace`
@@ -182,10 +182,10 @@ A hardware validation process is invoked to ensure the integrity of the physical
 ### Replace workflow
 
 1. **Cordon and evacuate** - Remove workloads from the BMM before physical repair.
-1. **Perform physical repairs** - Replace hardware components as needed.
-1. **Execute replace command** - Run the replace command with required parameters.
-1. **Uncordon** - Make the BMM schedulable again after replacement completes.
-1. **Verify status** - Check that the BMM is properly functioning.
+2. **Perform physical repairs** - Replace hardware components as needed.
+3. **Execute replace command** - Run the replace command with required parameters.
+4. **Uncordon** - Make the BMM schedulable again after replacement completes.
+5. **Verify status** - Check that the BMM is properly functioning.
 
 **The following Azure CLI command will `cordon` the specified bareMetalMachineName.**
 
@@ -256,11 +256,11 @@ Restarting, reimaging, and replacing are effective troubleshooting methods for a
 
 ### Best practices
 
-1. **Always follow the escalation path**: Start with restart, then reimage, then replace unless the issue clearly indicates otherwise.
-1. **Verify workloads before action**: Use the provided commands to identify running workloads before any disruptive action.
-1. **Cordon with evacuation**: When performing reimage or replace actions, always use `cordon` with `evacuate="True"` to safely move workloads.
-1. **Never run multiple operations simultaneously**: Ensure one operation completes before starting another to prevent server issues.
-1. **Verify resolution**: After performing any action, verify the BMM status and that the original issue is resolved.
+- **Always follow the escalation path**: Start with restart, then reimage, then replace unless the issue clearly indicates otherwise.
+- **Verify workloads before action**: Use the provided commands to identify running workloads before any disruptive action.
+- **Cordon with evacuation**: When performing reimage or replace actions, always use `cordon` with `evacuate="True"` to safely move workloads.
+- **Never run multiple operations simultaneously**: Ensure one operation completes before starting another to prevent server issues.
+- **Verify resolution**: After performing any action, verify the BMM status and that the original issue is resolved.
 
 More details about the BMM actions can be found in the [BMM actions](howto-baremetal-functions.md) article.
 
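
For readers of this change, the reimage workflow renumbered above (cordon and evacuate, reimage, uncordon, verify) could be sketched as the sequence of Azure CLI calls below. This is a dry-run sketch, not the article's own script: it only prints the commands, one at a time, in line with the "never run multiple operations simultaneously" practice. The values `example-bmm`, `example-rg`, and `example-subscription`, along with the helper name `reimage_workflow`, are hypothetical placeholders, and the exact flags should be checked against the article and the `az networkcloud` reference before anything is executed.

```shell
#!/bin/sh
# Dry-run sketch: prints the Azure CLI calls for the reimage workflow in order.
# All three values below are hypothetical placeholders (assumptions).
BMM_NAME="${BMM_NAME:-example-bmm}"
RESOURCE_GROUP="${RESOURCE_GROUP:-example-rg}"
SUBSCRIPTION="${SUBSCRIPTION:-example-subscription}"

# Arguments shared by every command in the sequence.
COMMON="--name $BMM_NAME --resource-group $RESOURCE_GROUP --subscription $SUBSCRIPTION"

# reimage_workflow is a hypothetical helper: it echoes commands, it does not run them.
reimage_workflow() {
  # Step 1: cordon the BMM and evacuate its workloads.
  echo "az networkcloud baremetalmachine cordon --evacuate \"True\" $COMMON"
  # Step 2: reimage the BMM (one disruptive operation at a time).
  echo "az networkcloud baremetalmachine reimage $COMMON"
  # Step 3: make the BMM schedulable again.
  echo "az networkcloud baremetalmachine uncordon $COMMON"
  # Step 4: inspect the BMM resource to verify its status.
  echo "az networkcloud baremetalmachine show $COMMON"
}

reimage_workflow
```

Printing rather than executing keeps the sketch safe to run anywhere; each printed command can then be reviewed and run by hand, waiting for one operation to complete before starting the next.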
