Commit 76e3b09

revert most secondary copilot changes
1 parent f2f694d commit 76e3b09

6 files changed, with 52 additions and 61 deletions

articles/operator-nexus/concepts-compute.md

Lines changed: 2 additions & 2 deletions
@@ -45,7 +45,7 @@ Huge page usage in workloads refers to the utilization of large memory pages, ty

 Workloads that involve large data sets or intensive memory operations such as network packet processing, can benefit from huge page usage because it enhances memory performance and reduces memory-related bottlenecks. As a result, users see improved throughput and reduced latency.

-All virtual machines created on Azure Operator Nexus are backed by 1GiB(1G) hugepages for the requested memory. The kernel running inside the VM can manage these available memory anyway it likes, including the allocation of memory to support hugepages (2M or 1G).
+All virtual machines created on Azure Operator Nexus are backed by 1GiB(1G) hugepages for the requested memory. The kernel running inside the VM can manage these available memory anyway it likes, including the allocation of memory to support hugepages (2M or 1G).

 ### Dual-stack support

@@ -88,7 +88,7 @@ The following properties reflect the operational state of a BMM:
 - `Control plane`: These BMM runs the Kubernetes control plane agents for Nexus platform cluster.
 - `Management plane`: The BMM runs the Nexus platform agents including controllers and extensions.
 - `Compute plane`: The BMM responsible for running actual tenant workloads including Nexus Kubernetes Clusters and Virtual Machines.
-
+
 Refer this [link](reference-near-edge-baremetal-machine-roles.md) for more details on Machine Roles.

 ## BMM operations

articles/operator-nexus/concepts-rack-resiliency.md

Lines changed: 9 additions & 12 deletions
@@ -71,26 +71,23 @@ To maintain Kubernetes control plane (KCP) quorum, Operator Nexus provides autom

 Here are the triggers for automated remediation:

-- For all servers (Compute, Management and KCP): if a server fails to provision successfully after six hours, automated remediation occurs. This check includes provisioning a new Bare Metal Machine (BMM) at initial deployment time or provisioning during a Replace action.
-- For all servers (Compute, Management and KCP): if a running node is stuck in a read only root file system mode for 10 minutes, automated remediation occurs.
-- For KCP and Management Plane servers only, if a Kubernetes node is in an Unknown state for 30 minutes, automated remediation occurs.
+* For all servers (Compute, Management and KCP): if a server fails to provision successfully after six hours, automated remediation occurs. This check includes provisioning a new Bare Metal Machine (BMM) at initial deployment time or provisioning during a Replace action.
+* For all servers (Compute, Management and KCP): if a running node is stuck in a read only root file system mode for 10 minutes, automated remediation occurs.
+* For KCP and Management Plane servers only, if a Kubernetes node is in an Unknown state for 30 minutes, automated remediation occurs.

 ### Remediation process

-- Remediation of a Compute node is now one reprovisioning attempt. If the reprovisioning fails, the node is marked Unhealthy. Reprovisioning no longer continues to retry infinitely, and the Bare Metal Machine is powered off.
-- Remediation of a Management Plane node is to attempt one reboot and then one reprovisioning attempt. If those steps fail, the node is marked Unhealthy.
-- Remediation of a KCP node is to attempt one reboot. If the reboot fails, the node is marked Unhealthy and Nexus triggers the immediate provisioning of the spare KCP node. This process is outlined in the [KCP remediation details](#kcp-remediation-details) section.
-- In all instances, when the Bare Metal Machine is marked unhealthy, the BMM's `detailedStatusMessage` is updated to read `Warning: BMM Node is unhealthy and may require hardware replacement.` The Bare Metal Machine's node is removed from the Kubernetes Cluster, which triggers a node drain. Users need to run a BMM Replace action to return the BMM into service and have it rejoin the Kubernetes Cluster.
-
-> [!TIP]
-> When you run a BMM Replace to remediate an unhealthy node, you can monitor progress and steps in the Azure portal JSON view under `properties.actionStates` (Operator Nexus 2509.1+ and API 2025-07-01-preview+). See [Monitor status in Bare Metal Machine JSON properties](./howto-bare-metal-best-practices.md#monitor-status-in-bare-metal-machine-json-properties).
+* Remediation of a Compute node is now one reprovisioning attempt. If the reprovisioning fails, the node is marked Unhealthy. Reprovisioning no longer continues to retry infinitely, and the Bare Metal Machine is powered off. 
+* Remediation of a Management Plane node is to attempt one reboot and then one reprovisioning attempt. If those steps fail, the node is marked Unhealthy.
+* Remediation of a KCP node is to attempt one reboot. If the reboot fails, the node is marked Unhealthy and Nexus triggers the immediate provisioning of the spare KCP node. This process is outlined in the [KCP remediation details](#kcp-remediation-details) section.
+* In all instances, when the Bare Metal Machine is marked unhealthy, the BMM's `detailedStatusMessage` is updated to read `Warning: BMM Node is unhealthy and may require hardware replacement.` The Bare Metal Machine's node is removed from the Kubernetes Cluster, which triggers a node drain. Users need to run a BMM Replace action to return the BMM into service and have it rejoin the Kubernetes Cluster.

 ### KCP remediation details

 Ongoing control plane resiliency requires a spare KCP node. When KCP node fails remediation and is marked Unhealthy, a deprovisioning of the node occurs. The unhealthy KCP node is exchanged with a suitable healthy Management Plane server. This Management Plane server becomes the new spare KCP node. The failed KCP node is updated and labeled as a Management Plane node. Once the label changes, an attempt to provision the newly labeled management plane node occurs. If it fails to provision, the management plane remediation process takes over. If it fails provisioning or doesn't run successfully, the machine's status remains unhealthy, and the user must fix. The unhealthy condition surfaces to the Bare Metal Machine's (BMM) `detailedStatus` and `detailedStatusMessage` fields in Azure and clears through a BMM Replace action.

-> [!NOTE]
-> The provisioning retry process doesn't execute on compute and management node pool nodes for systems running the 4.1 NetworkCloud runtime. This capability is available when the Nexus Cluster is updated to the 4.4 runtime.
+> [!NOTE]
+>The provisioning retry process doesn't execute on compute and management node pool nodes for systems running the 4.1 NetworkCloud runtime. This capability is available when the Nexus Cluster is updated to the 4.4 runtime.

 ## Related links
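
The per-role remediation rules in the hunk above can be read as a small decision procedure. The following Python is a minimal sketch for illustration only, not Operator Nexus code: `Role`, `Outcome`, `REMEDIATION_STEPS`, `remediate`, and the `attempt` callback are hypothetical names introduced here. Only the order of attempts, the Unhealthy marking, and the spare KCP provisioning come from the documented text.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Role(Enum):
    COMPUTE = "compute"
    MANAGEMENT = "management"
    KCP = "kcp"


# Ordered remediation attempts per role, each tried at most once
# (taken from the "Remediation process" bullets; names are illustrative).
REMEDIATION_STEPS = {
    Role.COMPUTE: ["reprovision"],
    Role.MANAGEMENT: ["reboot", "reprovision"],
    Role.KCP: ["reboot"],
}


@dataclass
class Outcome:
    healthy: bool
    attempted: List[str] = field(default_factory=list)
    # True only when a KCP node fails remediation and the spare is provisioned.
    provision_spare_kcp: bool = False


def remediate(role: Role, attempt: Callable[[str], bool]) -> Outcome:
    """Run the role's steps in order; stop at the first step that succeeds.

    `attempt` is a hypothetical callback that performs one step
    ("reboot" or "reprovision") and returns True on success.
    """
    attempted: List[str] = []
    for step in REMEDIATION_STEPS[role]:
        attempted.append(step)
        if attempt(step):
            return Outcome(healthy=True, attempted=attempted)
    # Every step failed: the node is marked Unhealthy and needs a BMM Replace.
    return Outcome(
        healthy=False,
        attempted=attempted,
        provision_spare_kcp=(role is Role.KCP),
    )
```

For example, `remediate(Role.MANAGEMENT, lambda step: step == "reprovision")` models a Management Plane node whose reboot fails but whose single reprovisioning attempt succeeds, so the node is not marked Unhealthy.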

articles/operator-nexus/concepts-resource-types.md

Lines changed: 8 additions & 11 deletions
@@ -43,7 +43,7 @@ You can manage the lifecycle of a Network Fabric via Azure using any of the supp

 ### Network racks

-Network Rack resource is a representation of your on-premises racks from the networking perspective. The number of network racks in an Operator Nexus instance depends on the Network Fabric SKU that was chosen during creation.
+Network Rack resource is a representation of your on-premises racks from the networking perspective. The number of network racks in an Operator Nexus instance depends on the Network Fabric SKU that was chosen during creation.

 Each network rack consists of Network Devices that are part of that rack. For example - Customer Edge (CE) routers, Top of Rack (ToR) Switches, Management Switches, and Network Packet Brokers (NPB).

@@ -63,25 +63,25 @@ The lifecycle of the Network Device resources depends on the network rack resour

 ### Isolation domains

-Isolation Domains enable east-west or north-south connectivity across Operator Nexus instance. They provide the required network connectivity between infrastructure components and also workload components. In principle, there are two types of networks that are established by isolation domains - management network and workload or tenant network.
+Isolation Domains enable east-west or north-south connectivity across Operator Nexus instance. They provide the required network connectivity between infrastructure components and also workload components. In principle, there are two types of networks that are established by isolation domains - management network and workload or tenant network.

 A management network provides private connectivity that enables communication between the Network Fabric instance that is deployed on-premises and Azure Virtual Network. You can create workload or tenant networks to enable communication between the workloads that are deployed across the Operator Nexus instance.

 Each isolation domain is associated with a specific Network Fabric resource and has the option to be enabled/disabled. Only when an isolation domain is enabled, it's configured on the network devices, and the configuration is removed once the isolation domain is removed.

 Primarily, there are two types of isolation domains:

-- Layer 2 or L2 Isolation Domains
-- Layer 3 or L3 Isolation Domains
+* Layer 2 or L2 Isolation Domains
+* Layer 3 or L3 Isolation Domains

 Layer 2 isolation domains enable your infrastructure and workloads communicate with each other within or across racks over a Layer 2 network. Layer 2 networks enable east-west communication within your Operator Nexus instance. You can configure an L2 isolation domain with a desired Vlan ID and MTU size, see [Nexus Limits and Quotas](./reference-limits-and-quotas.md) for MTU limits.

 Layer 3 isolation domains enable your infrastructure and workloads communicate with each other within or across racks over a Layer 3 network. Layer 3 networks enable east-west and north-south communication within and outside your Operator Nexus instance.

 There are two types of Layer 3 networks that you can create:

-- Internal Network
-- External Network
+* Internal Network
+* External Network

 Internal networks enable layer 3 east-west connectivity across racks within the Operator Nexus instance and external networks enable layer 3 north-south connectivity from the Operator Nexus instance to networks outside the instance. A Layer 3 isolation domain must be configured with at least one internal network; external networks are optional.

@@ -110,17 +110,14 @@ Storage Appliances represent storage arrays used for persistent data storage in
 Bare Metal Machines represent the physical servers in a rack. They are lifecycle managed by the Cluster Manager.
 Bare Metal Machines are used by workloads to host Virtual Machines and Kubernetes clusters.

-> [!NOTE]
-> Recent or in-progress lifecycle actions for a Bare Metal Machine (for example, Replace, Reimage, Restart) appear in the Azure portal JSON view under `properties.actionStates` (Operator Nexus 2509.1+ and API 2025-07-01-preview+). See [Monitor status in Bare Metal Machine JSON properties](./howto-bare-metal-best-practices.md#monitor-status-in-bare-metal-machine-json-properties).
-
 ## Workload components

 Workload components are resources that you use in hosting your workloads.

 ### Network resources

-The Network resources represent the virtual networking in support of your workloads hosted on VMs or Kubernetes clusters.
-There are four Network resource types that represent a network attachment to an underlying isolation-domain.
+The Network resources represent the virtual networking in support of your workloads hosted on VMs or Kubernetes clusters.
+There are four Network resource types that represent a network attachment to an underlying isolation-domain.

 - **Cloud Services Network Resource**: provides VMs/Kubernetes clusters access to cloud services such as DNS, NTP, and user-specified Azure PaaS services. You must create at least one Cloud Services Network (CSN) in each of your Operator Nexus instances. Each CSN can be reused by many VMs and/or tenant clusters.

articles/operator-nexus/troubleshoot-bare-metal-machine-degraded.md

Lines changed: 0 additions & 3 deletions
@@ -47,9 +47,6 @@ az networkcloud baremetalmachine list \
 --query "[].{name:name,powerState:powerState,provisioningState:provisioningState,readyState:readyState,cordonStatus:cordonStatus,detailedStatus:detailedStatus,detailedStatusMessage:detailedStatusMessage}"
 ```

-> [!NOTE]
-> If you trigger a corrective action such as Reimage or Replace, you can monitor its status in the Azure portal JSON view under `properties.actionStates` (requires Operator Nexus 2509.1+ and API 2025-07-01-preview+). See [Monitor status in Bare Metal Machine JSON properties](./howto-bare-metal-best-practices.md#monitor-status-in-bare-metal-machine-json-properties).
-
 **Example Azure CLI output**

 This example shows a deployment with two currently degraded BMMs (`compute01` and `compute04`), and two cordoned BMMs (`compute02` and `compute04`).
