Commit c33d423

Merge pull request #16222 from sethmanheim/mc11-1
Sync release-ash-2501 with main, fix merge
2 parents 6cf0ceb + 19bae96 commit c33d423

5 files changed: +102 -8 lines changed

azure-managed-lustre/TOC.yml

Lines changed: 4 additions & 0 deletions
@@ -54,3 +54,7 @@
 items:
 - name: Recover from a regional outage
 href: amlfs-region-recovery.md
+- name: Troubleshooting
+items:
+- name: Troubleshoot cluster deployment failures
+href: troubleshoot-deployment.md
Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
+---
+title: Troubleshoot Azure Managed Lustre cluster deployment issues
+description: Learn how to troubleshoot common cluster deployment issues in Azure Managed Lustre
+author: pauljewellmsft
+ms.author: pauljewell
+ms.service: azure-managed-lustre
+ms.topic: troubleshooting-general
+ms.date: 11/01/2024
+
+---
+
+# Troubleshoot Azure Managed Lustre deployment issues
+
+In this article, you learn how to troubleshoot common issues that you might encounter when deploying an Azure Managed Lustre file system.
+
+## Cluster deployment fails due to incorrect network configuration
+
+In this section, we cover the following causes:
+
+- [Cause 1: Network ports are blocked](#cause-1-network-ports-are-blocked)
+- [Cause 2: Resources within the subnet are incompatible](#cause-2-resources-within-the-subnet-are-incompatible)
+- [Cause 3: Network security group rules aren't configured correctly](#cause-3-network-security-group-rules-arent-configured-correctly)
+
+### Cause 1: Network ports are blocked
+
+Port 988 and port 22 must be open within the subnet for the cluster to communicate with the Azure Managed Lustre service. If either port is blocked, the deployment fails.
+
+### Solution: Verify the network configuration
+
+Allow inbound and outbound access between hosts within the Azure Managed Lustre subnet. For example, access to TCP port 22 (SSH) is necessary for cluster deployment.
+
+Your network security group (NSG) must allow inbound and outbound access on port 988 and ports 1019-1023. No other services can reserve or use these ports on your Lustre clients. If you use the `ypbind` daemon on your clients to maintain Network Information Services (NIS) binding information, you must ensure that `ypbind` doesn't reserve port 988.
+
+Make sure that the virtual network, subnet, and NSG meet the requirements for Azure Managed Lustre. To learn more, see [Network prerequisites](amlfs-prerequisites.md#network-prerequisites).
+
+### Cause 2: Resources within the subnet are incompatible
+
+Azure Managed Lustre and Azure NetApp Files resources can't share a subnet. The deployment fails if you try to create an Azure Managed Lustre file system in a subnet that currently contains, or has previously contained, Azure NetApp Files resources.
+
+### Solution: Verify the subnet configuration
+
+If you use the Azure NetApp Files service, you must create your Azure Managed Lustre file system in a separate subnet. To learn more, see [Network prerequisites](amlfs-prerequisites.md#network-prerequisites).
+
+### Cause 3: Network security group rules aren't configured correctly
+
+If you're using a network security group to filter network traffic between Azure resources in an Azure virtual network, the security rules that allow or deny inbound and outbound network traffic must be properly configured. If the network security group rules aren't correctly configured for Azure Managed Lustre file system support, the deployment fails.
+
+### Solution: Verify the network security group configuration
+
+For detailed guidance about configuring inbound and outbound security rules to support Azure Managed Lustre file systems, see [Configure network security group rules](configure-network-security-group.md#configure-network-security-group-rules).
+
+## Cluster deployment fails due to incorrect blob container configuration
+
+In this section, we cover the following causes:
+
+- [Cause 1: Blob container allows public access](#cause-1-blob-container-allows-public-access)
+- [Cause 2: Blob container can't be accessed by the file system](#cause-2-blob-container-cant-be-accessed-by-the-file-system)
+
+### Cause 1: Blob container allows public access
+
+To comply with security requirements, the blob container anonymous access level must be set to private. If the blob container is set to public, the deployment fails.
+
+### Solution: Set the blob container access level to private
+
+Configure the blob container to allow private access only. You can disallow public access at the storage account level, or you can configure access at the container level. To learn more, see [About anonymous read access](/azure/storage/blobs/anonymous-read-access-configure#about-anonymous-read-access).
+
+### Cause 2: Blob container can't be accessed by the file system
+
+If the file system can't access the blob container, the deployment fails. You must add role assignments at the storage account scope or higher to allow the file system to access the container.
+
+### Solution: Authorize access to the storage account
+
+To authorize access to the storage account, add the following role assignments to the service principal **HPC Cache Resource Provider**:
+
+- [Storage Account Contributor](/azure/role-based-access-control/built-in-roles#storage-account-contributor)
+- [Storage Blob Data Contributor](/azure/role-based-access-control/built-in-roles#storage-blob-data-contributor)
+
+To learn more, see [Access role for blob integration](amlfs-prerequisites.md#access-roles-for-blob-integration).
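
The port requirements in the troubleshooting article above (TCP port 22, port 988, and ports 1019-1023) can be turned into a quick offline check. The following Python sketch is illustrative only and isn't part of this commit. It assumes you have already collected the port specifications that your NSG allow rules cover as plain strings such as `"988"`, `"1019-1023"`, or `"*"`. The helper names `expand_port_range` and `missing_ports` are hypothetical, and the sketch ignores rule priority, direction, deny rules, and address prefixes, all of which a real NSG evaluation has to consider.

```python
# Ports named in the troubleshooting article: 22 (SSH), 988, and 1019-1023.
REQUIRED_PORTS = frozenset({22, 988, *range(1019, 1024)})


def expand_port_range(spec: str) -> set[int]:
    """Expand a port spec such as "988", "1019-1023", or "*" into a set of ports."""
    spec = spec.strip()
    if spec == "*":
        return set(range(0, 65536))
    if "-" in spec:
        low, high = spec.split("-", 1)
        return set(range(int(low), int(high) + 1))
    return {int(spec)}


def missing_ports(allowed_specs, required=REQUIRED_PORTS) -> list[int]:
    """Return the required ports that no allowed spec covers, in ascending order."""
    allowed: set[int] = set()
    for spec in allowed_specs:
        allowed |= expand_port_range(spec)
    return sorted(set(required) - allowed)


# Hypothetical input: the allow rules cover the Lustre ports but not SSH.
print(missing_ports(["988", "1019-1023"]))
```

An empty result only means that the listed specifications cover the required ports. It doesn't prove that traffic actually flows, so treat it as a first sanity check before you review the full NSG configuration.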

azure-stack/operator/app-service-rotate-certificates.md

Lines changed: 7 additions & 4 deletions
@@ -3,7 +3,7 @@ title: Rotate App Service on Azure Stack Hub secrets and certificates
 description: Learn how to rotate secrets and certificates used by Azure App Service on Azure Stack Hub.
 author: sethmanheim
 ms.topic: article
-ms.date: 08/19/2024
+ms.date: 10/31/2024
 ms.author: sethm
 ms.reviewer: anwestg
 ms.lastreviewed: 04/09/2020
@@ -165,13 +165,16 @@ To rotate the system credentials used within Azure App Service on Azure Stack Hu

 1. Go to the **Secrets** menu option.

-1. Select the **Rotate** button in the System Credentials section.
+1. Select the **Rotate** button in the **System Credentials** section.

-1. Select the **Scope** of the System Credential you're rotating. Operators can choose to rotate the system credentials for all roles or individual roles.
+> [!IMPORTANT]
+> If the scope you select is **All** or **Management Server**, the credential for the controllers is also updated with the specified new username and password.
+
+1. Select the **Scope** of the system credential you're rotating. Operators can choose to rotate the system credentials for all roles, or for individual roles.

 1. Specify a **new Local Admin User Name** and a new **Password**. Then confirm the **Password** and select **OK**.

-1. The credential(s) are rotated as required throughout the corresponding Azure App Service on Azure Stack Hub role instance. Operators can check the status of the procedure using the **Status** button.
+1. The credentials are rotated as required throughout the corresponding Azure App Service on Azure Stack Hub role instance. Operators can check the status of the procedure using the **Status** button.

 ## Next steps

azure-stack/operator/known-issues.md

Lines changed: 7 additions & 2 deletions
@@ -3,7 +3,7 @@ title: Azure Stack Hub known issues
 description: Learn about known issues in Azure Stack Hub releases.
 author: sethmanheim
 ms.topic: article
-ms.date: 10/30/2024
+ms.date: 10/31/2024
 ms.author: sethm
 ms.reviewer: rtiberiu
 ms.lastreviewed: 11/30/2023
@@ -72,7 +72,12 @@ To access known issues for a different version, use the version selector dropdow
 ::: moniker-end

 ::: moniker range="azs-2408"
-<!-- ## Update -->
+## Update
+
+- Applicable: This issue applies to release 2408.
+- Cause: An internal failure of Live Update forces the update method to use FRU instead, which significantly extends the overall update period. Due to this issue, each node update takes an additional 5 hours to complete (approximately).
+- Remediation: If you have more than 8 node stamps, it's advised that you delay your updates if possible, until a hotfix/inline fix is released.
+- Occurrence: Common.

 <!-- ## Networking -->
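
To put the update-duration figure from the known issue above into perspective, the following Python sketch estimates the extra time for a stamp of a given size. It isn't part of this commit. It assumes that nodes are updated one at a time, so that the approximate 5 extra hours per node add up linearly. The known issue doesn't state that sequencing, so treat the result as a rough estimate. The function name `extra_update_hours` is hypothetical.

```python
# Approximate extra time per node, taken from the known issue text.
EXTRA_HOURS_PER_NODE = 5


def extra_update_hours(node_count: int, per_node_hours: float = EXTRA_HOURS_PER_NODE) -> float:
    """Estimate the added update time, assuming nodes are updated sequentially."""
    if node_count < 0:
        raise ValueError("node_count must not be negative")
    return node_count * per_node_hours


# Under the sequential-update assumption, an 8-node stamp adds about 40 hours.
for nodes in (4, 8, 16):
    print(f"{nodes} nodes: about {extra_update_hours(nodes)} extra hours")
```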

azure-stack/operator/release-notes.md

Lines changed: 6 additions & 2 deletions
@@ -3,7 +3,7 @@ title: Azure Stack Hub release notes
 description: Release notes for Azure Stack Hub integrated systems, including updates and bug fixes.
 author: sethmanheim
 ms.topic: article
-ms.date: 10/30/2024
+ms.date: 10/31/2024
 ms.author: sethm
 ms.reviewer: rtiberiu
 ms.lastreviewed: 04/22/2024
@@ -137,7 +137,11 @@ For more information about update build types, see [Manage updates in Azure Stac

 ### What's new

-<!-- ### Improvements -->
+> [!IMPORTANT]
+> [See this known 2408 update issue](known-issues.md#update).
+
+- With the 2408 update, we are introducing the ESv3 and DSv3 VM SKUs. These new SKUs are designed to provide higher IOPS for both OS and data disks. For more information, see [Azure Stack Hub VM SKUs](../user/azure-stack-vm-sizes.md).
+- We are also introducing [two new VM SKUs to support the L40s GPUs](../user/gpu-vms-about.md#nc_l40s-v4).

 ### Changes
