Commit a45497a

Merge pull request #285622 from DanCrank/danielcrank/nexusctl
[operator-nexus] Documentation for nexusctl #1470882
2 parents 7134de9 + ebc1213 commit a45497a

3 files changed: 118 additions and 11 deletions

articles/operator-nexus/TOC.yml

Lines changed: 11 additions & 10 deletions
```diff
@@ -29,7 +29,7 @@
 - name: Network Fabric read-only commands
   href: concepts-network-fabric-read-only-commands.md
 - name: Network Fabric read write commands
-  href: concepts-network-fabric-read-write-commands.md
+  href: concepts-network-fabric-read-write-commands.md
 - name: Disable Border Gateway Protocol neighbors
   href: concepts-disable-border-gateway-protocol-neighbors.md
 - name: Isolation Domains
@@ -46,14 +46,14 @@
 - name: Nexus Kubernetes
   expanded: false
   items:
-  - name: Overview
-    href: concepts-nexus-kubernetes-cluster.md
-  - name: Resource Placement
-    href: concepts-nexus-kubernetes-placement.md
-  - name: Networking
-    href: concepts-nexus-networking.md
-  - name: Workload Network Types
-    href: concepts-nexus-workload-network-types.md
+  - name: Overview
+    href: concepts-nexus-kubernetes-cluster.md
+  - name: Resource Placement
+    href: concepts-nexus-kubernetes-placement.md
+  - name: Networking
+    href: concepts-nexus-networking.md
+  - name: Workload Network Types
+    href: concepts-nexus-workload-network-types.md
 - name: Observability
   expanded: false
   items:
@@ -290,6 +290,8 @@
   href: howto-baremetal-run-read.md
 - name: BareMetal Run-Data-Extract Execution
   href: howto-baremetal-run-data-extract.md
+- name: Running BareMetal actions directly with nexusctl
+  href: howto-baremetal-nexusctl.md
 - name: Troubleshoot Control Plane Quorum
   href: troubleshoot-control-plane-quorum.md
 - name: Troubleshoot Bare Metal Machine Provisioning
@@ -358,4 +360,3 @@
   items:
   - name: 2404.2
     href: release-notes-2404.2.md
-
```

articles/operator-nexus/howto-baremetal-functions.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -26,7 +26,7 @@ This article describes how to perform lifecycle management operations on bare me
 - **Replace the BMM**
 
 > [!IMPORTANT]
-> Disruptive command requests against a Kubernetes Control Plane (KCP) node are rejected if there is another disruptive action command already running against another KCP node or if the full KCP is not available. This check is done to maintain the integrity of the Nexus instance and ensure multiple KCP nodes don't go down at once due to simultaneous disruptive actions. If multiple nodes go down, it will break the healthy quorum threshold of the Kubernetes Control Plane.
+> Disruptive command requests against a Kubernetes Control Plane (KCP) node are rejected if there is another disruptive action command already running against another KCP node or if the full KCP is not available. This check is done to maintain the integrity of the Nexus instance and ensure multiple KCP nodes don't become non-operational at once due to simultaneous disruptive actions. If multiple nodes become non-operational, it will break the healthy quorum threshold of the Kubernetes Control Plane.
 >
 > The bolded actions in the above list are considered disruptive (Power off, Restart, Reimage, Replace). Cordon without evacuate is not considered disruptive. Cordon with evacuate is considered disruptive.
 >
```
Lines changed: 106 additions & 0 deletions
@@ -0,0 +1,106 @@
---
title: "Azure Operator Nexus: Running bare metal actions directly with nexusctl"
description: Learn how to bypass Azure and run bare metal actions directly in an emergency using nexusctl.
author: DanCrank
ms.author: danielcrank
ms.service: azure-operator-nexus
ms.topic: how-to
ms.date: 08/26/2024
ms.custom: template-how-to, devx-track-azurecli
---

# Run emergency bare metal actions outside of Azure using nexusctl

This article describes the `nexusctl` utility, which you can use in break-glass (emergency) situations to run simple actions on bare metal machines without going through the Azure portal or the Azure command-line interface (CLI).

> [!CAUTION]
> Don't perform any action against management servers without first consulting Microsoft support personnel. Doing so could affect the integrity of the Operator Nexus Cluster.

> [!IMPORTANT]
> Disruptive command requests against a Kubernetes Control Plane (KCP) node are rejected if another disruptive action command is already running against another KCP node or if the full KCP isn't available. This check maintains the integrity of the Nexus instance by ensuring that multiple KCP nodes don't become non-operational at once due to simultaneous disruptive actions. If multiple nodes become non-operational, the Kubernetes Control Plane drops below its healthy quorum threshold.
>
> Powering off a KCP node is the only `nexusctl` action considered disruptive in the context of this check.

## Prerequisites

- A [BareMetalMachineKeySet](./howto-baremetal-bmm-ssh.md) must be available to allow SSH access to the bare metal machines. The user must have the superuser privilege level.
- The platform Kubernetes cluster must be up and running on site.

## Overview

`nexusctl` is a standalone program that you can run with `nc-toolbox` from an SSH session on any control-plane or management-plane node. Because `nexusctl` is packaged in the `nc-toolbox-breakglass` container image rather than installed directly on the host, run it with a command line of the following form:

```
sudo nc-toolbox nc-toolbox-breakglass nexusctl <command> [subcommand] [options]
```

`nc-toolbox` must always be run as root or with `sudo`.

As with most command-line programs, you can add the `--help` option to any command or subcommand to see more information, for example:

```
sudo nc-toolbox nc-toolbox-breakglass nexusctl --help
sudo nc-toolbox nc-toolbox-breakglass nexusctl baremetal --help
sudo nc-toolbox nc-toolbox-breakglass nexusctl baremetal power-off --help
```
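Every invocation needs the same `sudo nc-toolbox nc-toolbox-breakglass` prefix, so during a long break-glass session it can help to define a shell function that adds it. The following is a minimal bash sketch; the function name `nexusctl_bg` is hypothetical and isn't part of Operator Nexus or `nc-toolbox`.

```shell
# Hypothetical convenience wrapper, not shipped with Operator Nexus.
# "nexusctl_bg <args>" runs "sudo nc-toolbox nc-toolbox-breakglass nexusctl <args>",
# passing every argument through unchanged.
nexusctl_bg() {
  sudo nc-toolbox nc-toolbox-breakglass nexusctl "$@"
}

# Example: nexusctl_bg baremetal power-off --help
```

The function exists only in the shell where it's defined, so it has to be redefined in each new SSH session.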

> [!NOTE]
> There is no bulk execution against multiple machines. Commands are executed on a machine-by-machine basis.
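Because each command targets a single machine, acting on several machines means issuing one command per machine. The bash sketch below uses a hypothetical function name, `print_per_machine_commands`, and only prints the per-machine commands instead of running them. That way each command can be reviewed and run individually, which matters for disruptive actions on KCP nodes: those requests are rejected while another disruptive action is still running.

```shell
# Hypothetical helper, not part of nexusctl. nexusctl has no bulk mode, so this
# prints one command per machine for review instead of executing anything.
# Usage: print_per_machine_commands <action> <machine name>...
print_per_machine_commands() {
  local action="$1"
  shift
  local machine
  for machine in "$@"; do
    printf 'sudo nc-toolbox nc-toolbox-breakglass nexusctl baremetal %s --name %s\n' \
      "$action" "$machine"
  done
}

# Example: print_per_machine_commands unmanage <machine name 1> <machine name 2>
```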

## Power off a bare metal machine

To power off a single bare metal machine, connect to a control-plane or management-plane node via SSH and run the following command:

```
sudo nc-toolbox nc-toolbox-breakglass nexusctl baremetal power-off --name <machine name>
```

If the command is accepted, `nexusctl` responds with another command line that you can use to view the status of the long-running operation. Prefix that command with `sudo nc-toolbox nc-toolbox-breakglass`, as follows:

```
sudo nc-toolbox nc-toolbox-breakglass nexusctl baremetal power-off --status --name <machine name> --operation-id <operation-id>
```

The status is blank until the operation completes and reaches either a "succeeded" or "failed" state. While the status is blank, assume that the operation is still in progress.
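Because the status stays blank until the operation reaches a terminal state, you might prefer to poll the status command rather than rerun it by hand. The bash sketch below assumes that, once the operation finishes, the status output contains the word "succeeded" or "failed" in some letter case; the exact output format isn't documented in this article, so verify the matching against real output before relying on it. The function name `wait_for_operation` is hypothetical, and the interval and attempt limit in the example are illustrative, not recommended values.

```shell
# Hypothetical helper, not part of nexusctl: reruns a status command until its
# output mentions a terminal state, or until the attempt limit is reached.
# ASSUMPTION: a finished operation's status output contains "succeeded" or
# "failed" (any letter case); blank or other output means "still in progress".
# Usage: wait_for_operation <interval-seconds> <max-attempts> <status command...>
wait_for_operation() {
  local interval="$1" max_attempts="$2"
  shift 2
  local attempt=1 output
  while [ "$attempt" -le "$max_attempts" ]; do
    # Run the status command and lowercase its output for matching.
    output="$("$@" 2>&1 | tr '[:upper:]' '[:lower:]')"
    case "$output" in
      *succeeded*) echo "succeeded"; return 0 ;;
      *failed*) echo "failed"; return 1 ;;
    esac
    # Wait before the next attempt, but not after the last one.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$interval"
    fi
    attempt=$((attempt + 1))
  done
  echo "timed out"
  return 2
}

# Example, with the placeholders taken from the command nexusctl printed:
# wait_for_operation 30 40 sudo nc-toolbox nc-toolbox-breakglass nexusctl \
#   baremetal power-off --status --name <machine name> --operation-id <operation-id>
```

The same helper works unchanged for any other long-running `nexusctl` operation that reports status in this way, because it treats the status command as opaque arguments.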

## Start a bare metal machine

To start a single bare metal machine, connect to a control-plane or management-plane node via SSH and run the following command:

```
sudo nc-toolbox nc-toolbox-breakglass nexusctl baremetal start --name <machine name>
```

If the command is accepted, `nexusctl` responds with another command line that you can use to view the status of the long-running operation. Prefix that command with `sudo nc-toolbox nc-toolbox-breakglass`, as follows:

```
sudo nc-toolbox nc-toolbox-breakglass nexusctl baremetal start --status --name <machine name> --operation-id <operation-id>
```

The status is blank until the operation completes and reaches either a "succeeded" or "failed" state. While the status is blank, assume that the operation is still in progress.

## Unmanage a bare metal machine (set to unmanaged state)

To switch a single bare metal machine to an unmanaged state, connect to a control-plane or management-plane node via SSH and run the following command:

```
sudo nc-toolbox nc-toolbox-breakglass nexusctl baremetal unmanage --name <machine name>
```

While a machine is in an unmanaged state, no actions are permitted on it except returning it to a managed state (see the next section). You can use this function to keep a bare metal machine powered off if it's stuck in a reboot crash loop.

`unmanage` isn't a long-running command, so there's no associated command to check operation status.

## Manage a bare metal machine (set to managed state)

To switch a single bare metal machine back to a managed state, connect to a control-plane or management-plane node via SSH and run the following command:

```
sudo nc-toolbox nc-toolbox-breakglass nexusctl baremetal manage --name <machine name>
```

`manage` isn't a long-running command, so there's no associated command to check operation status.
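Because no actions are permitted on an unmanaged machine, a machine that was unmanaged to stop a crash loop has to be returned to a managed state before it can be started again. The bash sketch below chains the two commands. It assumes that `nexusctl` exits with a non-zero status when a command is rejected, and that the machine accepts `start` as soon as `manage` has returned successfully; neither behavior is confirmed in this article. The function name `return_to_service` is hypothetical.

```shell
# Hypothetical helper, not part of nexusctl: returns an unmanaged machine to service.
# "manage" must come first, because no other action is permitted while unmanaged.
# ASSUMPTION: nexusctl exits non-zero on rejection, so "start" runs only if
# "manage" succeeded.
return_to_service() {
  local machine="$1"
  sudo nc-toolbox nc-toolbox-breakglass nexusctl baremetal manage --name "$machine" &&
    sudo nc-toolbox nc-toolbox-breakglass nexusctl baremetal start --name "$machine"
}

# Example: return_to_service <machine name>
```

`start` is still a long-running operation, so track it with the status command that `nexusctl` prints, as described in the section on starting a bare metal machine.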
