Commit 140fdfe

Troubleshooting docs
1 parent ba1a3d9 commit 140fdfe

5 files changed, +70 -5 lines changed

articles/operator-nexus/TOC.yml

Lines changed: 6 additions & 0 deletions
@@ -407,6 +407,12 @@
 items:
 - name: Due To Bare Metal Machine Power Failure
 href: troubleshoot-kubernetes-cluster-stuck-workloads-due-to-power-failure.md
+- name: Storage Appliance
+expanded: false
+items:
+- name: Troubleshoot Multiple Storage Appliances
+href: troubleshoot-multiple-storage-appliances.md
+
 - name: FAQ
 href: azure-operator-nexus-faq.md
 - name: Reference

articles/operator-nexus/concepts-storage-multiple-appliances.md

Lines changed: 4 additions & 2 deletions
@@ -72,7 +72,7 @@ status:
 phase: Bound
 ```

-`storageApplianceName` must match the Azure resource name of the storage appliance resource managed by your Azure Operator Nexus cluster on which you want to create the volume backing your PVC. If there's no `storageApplianceName`, or if the `storageApplianceName` doesn't match a storage appliance resource managed by your Azure Operator Nexus cluster, Azure Operator Nexus places the volume on the first storage appliance.
+`storageApplianceName` must match the Azure resource name of the storage appliance, managed by your Azure Operator Nexus cluster, on which you want to create the volume that backs your PVC. If there's no `storageApplianceName` annotation, Azure Operator Nexus places the volume on the first storage appliance. If a `storageApplianceName` annotation is present but doesn't match the Azure resource name of a storage appliance managed by your Azure Operator Nexus cluster, PVC creation fails.

 #### Nexus-volume limitations

@@ -83,7 +83,9 @@ status:

 Azure Operator Nexus provides a shared filesystem storage solution for containerized workloads: the *nexus-shared* storage class. This storage class provides a highly available shared storage solution by enabling multiple pods in the same Nexus Kubernetes cluster to concurrently access and share the same volume. The *nexus-shared* storage class is backed by a highly available storage service. This service is deployed and managed by the Cloud Service Network (CSN) resource and is in turn backed by volumes on a storage appliance. Individual PVCs consume storage from the CSN-managed storage service, rather than directly from the storage appliance.

-You can create the shared storage service on either storage appliance when the CSN is created. All nexus-shared PVCs using that shared storage service consume storage from the storage appliance backing the shared service. You can't place a specific nexus-shared PVC on a specific storage appliance. If no storage appliance configuration is provided at CSN creation time, or if the configuration doesn't match a storage appliance, the shared storage service uses the first storage appliance.
+You can create the shared storage service on either storage appliance when the CSN is created. The storage appliance choice applies to the whole shared storage service: all nexus-shared PVCs that use the shared storage service provided by the CSN consume storage from the same storage appliance.
+
+If no storage appliance configuration is provided at CSN creation time, the shared storage service uses the first storage appliance. If the configuration is present but doesn't match a storage appliance, CSN creation fails.

 See [Prerequisites for deploying tenant workloads](/quickstarts-tenant-workload-prerequisites.md#create-a-cloud-services-network) for instructions on creating the shared storage service on a specific storage appliance.

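The placement rule documented in this change for both nexus-volume PVCs and the CSN shared storage service can be sketched in Python. This is a minimal illustration under stated assumptions, not Azure Operator Nexus code: `resolve_storage_appliance` and `StorageApplianceNotFound` are hypothetical names, the appliance names are made up, the first entry of the list is assumed to be the "first storage appliance", and the exact, case-sensitive comparison is an assumption.

```python
from typing import Optional, Sequence


class StorageApplianceNotFound(Exception):
    """Hypothetical error standing in for a failed PVC or CSN creation."""


def resolve_storage_appliance(requested: Optional[str], appliances: Sequence[str]) -> str:
    """Return the storage appliance that backs a nexus-volume PVC or a CSN.

    requested:  value of the storageApplianceName annotation (PVC) or Azure
                resource tag (CSN), or None when it is absent.
    appliances: Azure resource names of the storage appliances managed by the
                cluster; index 0 is assumed to be the "first" storage appliance.
    """
    if not appliances:
        raise ValueError("the cluster manages no storage appliances")
    if requested is None:
        # No annotation or tag: fall back to the first storage appliance.
        return appliances[0]
    if requested in appliances:
        # Exact, case-sensitive match is an assumption of this sketch.
        return requested
    # Present but unmatched: creation fails; there is no fallback.
    raise StorageApplianceNotFound(
        f"{requested!r} does not match any of {list(appliances)}"
    )


if __name__ == "__main__":
    names = ["sa-1", "sa-2"]  # hypothetical storage appliance names
    print(resolve_storage_appliance(None, names))
    print(resolve_storage_appliance("sa-2", names))
    try:
        resolve_storage_appliance("sa-3", names)
    except StorageApplianceNotFound as err:
        print("creation fails:", err)
```

The point the sketch makes explicit is that a missing value and a wrong value are handled differently: only a missing value falls back to the first storage appliance, while a value that names no managed appliance makes creation fail.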
articles/operator-nexus/howto-platform-prerequisites.md

Lines changed: 0 additions & 3 deletions
@@ -537,9 +537,6 @@ Interface: net1, via: LLDP, RID: 1, Time: 0 day, 20:28:36
 >[!NOTE]
 > This section is optional. You only need to execute it if you are deploying an Azure Operator Nexus instance with two storage appliances. For more information, including restrictions on supported hardware, see [Azure Operator Nexus multiple storage appliances](/concepts-storage-multiple-appliances.md).

->[!IMPORTANT]
->Deployment of a Nexus instance with two storage appliances requires a manual step after [Network Fabric creation](/howto-configure-network-fabric.md#create-a-network-fabric) and before [Network Fabric provisioning](/howto-configure-network-fabric.md#provision-a-network-fabric). The network fabric requires manual enablement of the ports that connect to the second storage device. This step can only be performed by Microsoft support; users should [raise a support ticket to request assistance](https://portal.azure.com/?#blade/Microsoft_Azure_Support/HelpAndSupportBlade).
-
 1. Operator needs to install the storage array hardware as specified by the BOM and rack elevation
 within the Aggregation Rack. The hardware must go in the second storage appliance rack slot in the aggregator rack.
 2. Operator needs to provide the storage array Technician with information, in order for the storage

articles/operator-nexus/list-logs-available.md

Lines changed: 3 additions & 0 deletions
@@ -35,6 +35,9 @@ Logs emitted by Nexus Resources provide insight in the detailed operations of Ne
 | Storage Appliance audits | Audit Logs from Storage Appliance |
 | Storage Appliance alerts | Alert logs from Storage Appliance |

+>[!NOTE]
+> Storage appliance audit and alert logs are specific to a single storage appliance. Nexus instances with multiple storage appliances have a separate table for each storage appliance. System logs have a single table for all storage appliances in the Nexus instance.
+
 ## Cluster Manager

 | Log categories | Description |
Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
+---
+title: Troubleshooting common issues with multiple Storage Appliances
+description: Troubleshooting common issues with multiple Storage Appliances
+ms.service: azure-operator-nexus
+ms.custom: troubleshooting
+ms.topic: troubleshooting
+ms.date: 08/12/2024
+ms.author: peterwhiting
+author: pjw711
+---
+
+# Troubleshooting common issues with multiple Storage Appliances
+
+This guide documents common issues encountered in Azure Operator Nexus environments with multiple storage appliances.
+
+## Failure to create the storage appliance
+
+Several common misconfigurations prevent the second storage appliance from deploying successfully. Symptoms include:
+
+- The [cluster creation](/howto-configure-cluster.md#create-a-cluster) step fails.
+- The cluster creation step succeeds, but only a single storage appliance resource is created.
+
+If you see these issues, perform these checks:
+
+- Confirm that you correctly configured the prerequisites for both storage appliances. The initial IP address configuration is different for each storage appliance. See [the platform prerequisites](./howto-platform-prerequisites.md) for the correct configuration.
+- Confirm that the Network Fabric Controller and Network Fabric are successfully provisioned.
+- Confirm that you opened a support ticket to enable the Network Fabric ports that connect to the second storage appliance, and that the ticket is closed.
+- Check that the Azure CLI command you ran included the configuration for the second storage appliance and specified an aggregator rack SKU that supports a second storage appliance. See [cluster creation with multiple storage appliances](/howto-configure-cluster.md#create-the-cluster-using-azure-cli---multiple-storage-appliances) for details.
+
+If any of the configuration was incorrect, delete the Nexus cluster, apply the correct initial storage appliance configuration, open a support ticket for Network Fabric port enablement if you haven't already, and then recreate the cluster with the correct configuration.
+
+## Nexus-volume Persistent Volume Claim (PVC) on the wrong storage appliance
+
+PVCs that use the nexus-volume storage class can select the storage appliance that provides their backing storage by using the `storageApplianceName` annotation. If this annotation isn't present, the PVC uses the first storage appliance. To check which storage appliance a PVC uses, run `kubectl get pvc <pvcName> -o yaml` and look for the `storageApplianceName` annotation under `metadata.annotations`. The annotation value is the storage appliance the PVC uses; if the annotation is absent, the PVC uses the first storage appliance.
+
+To place the PVC on the other storage appliance, delete the PVC and recreate it with the correct annotation. Moving the volumes consumed by a PVC between storage appliances isn't supported.
+
+## Failure to create nexus-volume PVC
+
+PVC creation fails if the `storageApplianceName` annotation is present but doesn't match the Azure resource name of a storage appliance managed by the Nexus cluster. To check that the `storageApplianceName` annotation is correct:
+
+1. Open the Cluster (Operator Nexus) resource in the Azure portal.
+1. Select Rack definitions in the resource menu.
+1. Go to the aggregator rack and select Storage Appliance definitions.
+
+The `storageApplianceName` annotation must match one of the storage appliances in the Storage Appliance definitions list. To resolve this issue, delete the PVC and recreate it with the correct annotation.
+
+## CSN fails to create
+
+CSN creation fails if the `storageApplianceName` Azure resource tag is present but doesn't match the Azure resource name of a storage appliance managed by the Nexus cluster. To check that the `storageApplianceName` Azure resource tag is correct:
+
+1. Open the Cluster (Operator Nexus) resource in the Azure portal.
+1. Select Rack definitions in the resource menu.
+1. Go to the aggregator rack and select Storage Appliance definitions.
+
+The `storageApplianceName` Azure resource tag must match one of the storage appliances in the Storage Appliance definitions list. To resolve this issue, delete the CSN and recreate it with the correct Azure resource tag.
+

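The PVC and CSN checks in the troubleshooting article added by this commit both compare a single `storageApplianceName` value against the Storage Appliance definitions list. A minimal Python sketch of that comparison follows, assuming you have saved the output of `kubectl get pvc <pvcName> -o json` for the PVC and the resource JSON for the CSN, and assuming the CSN JSON carries its tags in a top-level `tags` object as Azure Resource Manager resources generally do. The helper names `pvc_storage_appliance`, `csn_storage_appliance`, and `diagnose`, and all resource and appliance names, are hypothetical; the list of valid names still has to be copied from the portal by hand.

```python
import json
from typing import Any, Dict, Optional, Sequence

# Annotation key (PVC) and Azure resource tag key (CSN) named in this document.
KEY = "storageApplianceName"


def pvc_storage_appliance(pvc_json: str) -> Optional[str]:
    """Extract the annotation from `kubectl get pvc <pvcName> -o json` output."""
    pvc: Dict[str, Any] = json.loads(pvc_json)
    annotations = (pvc.get("metadata") or {}).get("annotations") or {}
    return annotations.get(KEY)


def csn_storage_appliance(csn_json: str) -> Optional[str]:
    """Extract the tag from a CSN resource's JSON (assumed top-level `tags`)."""
    csn: Dict[str, Any] = json.loads(csn_json)
    tags = csn.get("tags") or {}
    return tags.get(KEY)


def diagnose(value: Optional[str], appliance_names: Sequence[str]) -> str:
    """Compare a value with the names from Storage Appliance definitions."""
    if value is None:
        # Assumes the first listed name is the "first storage appliance".
        first = appliance_names[0] if appliance_names else "<unknown>"
        return f"not set: uses the first storage appliance ({first})"
    if value in appliance_names:
        return f"ok: uses {value}"
    return (
        f"mismatch: {value!r} is not in {list(appliance_names)}; "
        "delete the resource and recreate it with a matching value"
    )


if __name__ == "__main__":
    # Hypothetical names copied by hand from the portal's Storage Appliance definitions.
    names = ["sa-1", "sa-2"]
    pvc = '{"metadata": {"name": "my-pvc", "annotations": {"storageApplianceName": "sa-3"}}}'
    csn = '{"name": "my-csn", "tags": {"storageApplianceName": "sa-2"}}'
    print("PVC:", diagnose(pvc_storage_appliance(pvc), names))
    print("CSN:", diagnose(csn_storage_appliance(csn), names))
```

Parsing the JSON rather than the default `kubectl get pvc` table matters here, because the default table output does not show annotations.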