Commit 1ec3310

ekpgh authored and jswoodward committed
hpc-cache: add NAS troubleshooting article
1 parent 82cccb0 commit 1ec3310

4 files changed: +150 -3 lines changed

articles/hpc-cache/hpc-cache-prereqs.md

Lines changed: 11 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -4,7 +4,7 @@ description: Prerequisites for using Azure HPC Cache
44
author: ekpgh
55
ms.service: hpc-cache
66
ms.topic: conceptual
7-
ms.date: 02/12/2020
7+
ms.date: 02/20/2020
88
ms.author: rohogue
99
---
1010

@@ -90,7 +90,9 @@ If using an NFS storage system (for example, an on-premises hardware NAS system)
9090
> [!NOTE]
9191
> Storage target creation will fail if the cache has insufficient access to the NFS storage system.
9292
93-
* **Network connectivity:** The Azure HPC Cache needs high-bandwidth network access between the cache subnet and the NFS system's data center. [ExpressRoute](https://docs.microsoft.com/azure/expressroute/) or similar access is recommended. If using a VPN, you might need to configure it to clamp TCP MSS at 1350 to make sure large packets are not blocked.
93+
More information is included in [Troubleshoot NAS configuration and NFS storage target issues](troubleshoot-nas.md).
94+
95+
* **Network connectivity:** The Azure HPC Cache needs high-bandwidth network access between the cache subnet and the NFS system's data center. [ExpressRoute](https://docs.microsoft.com/azure/expressroute/) or similar access is recommended. If using a VPN, you might need to configure it to clamp TCP MSS at 1350 to make sure large packets are not blocked. Read [VPN packet size restrictions](troubleshoot-nas.md#adjust-vpn-packet-size-restrictions) for additional help troubleshooting VPN settings.
9496

9597
* **Port access:** The cache needs access to specific TCP/UDP ports on your storage system. Different types of storage have different port requirements.
9698

@@ -104,6 +106,8 @@ If using an NFS storage system (for example, an on-premises hardware NAS system)
104106
rpcinfo -p <storage_IP> |egrep "100000\s+4\s+tcp|100005\s+3\s+tcp|100003\s+3\s+tcp|100024\s+1\s+tcp|100021\s+4\s+tcp"| awk '{print $4 "/" $3 " " $5}'|column -t
105107
```
106108

109+
Make sure that all of the ports returned by the ``rpcinfo`` query allow unrestricted traffic from the Azure HPC Cache's subnet.
110+
107111
* In addition to the ports returned by the `rpcinfo` command, make sure that these commonly used ports allow inbound and outbound traffic:
108112
109113
| Protocol | Port | Service |
@@ -116,17 +120,21 @@ If using an NFS storage system (for example, an on-premises hardware NAS system)
116120
117121
* Check firewall settings to be sure that they allow traffic on all of these required ports. Be sure to check firewalls used in Azure as well as on-premises firewalls in your data center.
118122
119-
* **Directory access:** Enable the `showmount` command on the storage system. Azure HPC Cache uses this command to check that your storage target configuration points to a valid export, and also to make sure that multiple mounts don't access the same subdirectories (which risks file collisions).
123+
* **Directory access:** Enable the `showmount` command on the storage system. Azure HPC Cache uses this command to check that your storage target configuration points to a valid export, and also to make sure that multiple mounts don't access the same subdirectories (a risk for file collision).
120124

121125
> [!NOTE]
122126
> If your NFS storage system uses NetApp's ONTAP 9.2 operating system, **do not enable `showmount`**. [Contact Microsoft Service and Support](hpc-cache-support-ticket.md) for help.
123127
128+
Learn more about directory listing access in the NFS storage target [troubleshooting article](troubleshoot-nas.md#enable-export-listing).
129+
124130
* **Root access:** The cache connects to the back-end system as user ID 0. Check these settings on your storage system:
125131
126132
* Enable `no_root_squash`. This option ensures that the remote root user can access files owned by root.
127133
128134
* Check export policies to make sure they do not include restrictions on root access from the cache's subnet.
129135

136+
* If your storage has any exports that are subdirectories of another export, make sure the cache has root access to the shallowest export in the path (the one closest to the root). Read [Root access on directory paths](troubleshoot-nas.md#allow-root-access-on-directory-paths) in the NFS storage target troubleshooting article for details.
137+
130138
* NFS back-end storage must be a compatible hardware/software platform. Contact the Azure HPC Cache team for details.
131139

132140
## Next steps

articles/hpc-cache/index.yml

Lines changed: 4 additions & 0 deletions
@@ -59,7 +59,11 @@ landingContent:
5959
url: hpc-cache-edit-storage.md
6060
- text: Work around firewall settings to create Blob storage targets
6161
url: hpc-cache-blob-firewall-fix.md
62+
- text: Troubleshoot NFS storage target creation
63+
url: troubleshoot-nas.md
6264
- linkListType: concept
6365
links:
6466
- text: Recover from a regional outage
6567
url: hpc-region-recovery.md
68+
- text: Use Azure NetApp Files storage targets
69+
url: hpc-cache-netapp.md

articles/hpc-cache/toc.yml

Lines changed: 2 additions & 0 deletions
@@ -46,6 +46,8 @@
4646
href: hpc-cache-support-ticket.md
4747
- name: Work around Blob storage account firewall settings
4848
href: hpc-cache-blob-firewall-fix.md
49+
- name: Troubleshoot NFS storage target creation
50+
href: troubleshoot-nas.md
4951
- name: Recover from a regional outage
5052
href: hpc-region-recovery.md
5153
- name: Use Azure NetApp Files with Azure HPC Cache

articles/hpc-cache/troubleshoot-nas.md

Lines changed: 133 additions & 0 deletions
@@ -0,0 +1,133 @@
1+
---
2+
title: Troubleshoot Azure HPC Cache NFS storage targets
3+
description: Tips to avoid and fix configuration errors and other problems that can cause failure when creating an NFS storage target
4+
author: ekpgh
5+
ms.service: hpc-cache
6+
ms.topic: conceptual
7+
ms.date: 02/20/2020
8+
ms.author: rohogue
9+
---
10+
11+
# Troubleshoot NAS configuration and NFS storage target issues
12+
13+
This article gives solutions for some common configuration errors and other issues that might prevent Azure HPC Cache from adding an NFS storage system as a storage target.
14+
15+
If your problem is not included here, please [open a support ticket](hpc-cache-support-ticket.md) so that Microsoft Service and Support can investigate and solve the problem.
16+
17+
Before using this guide, read and check the [prerequisites for NFS storage targets](hpc-cache-prereqs.md#nfs-storage-requirements).
18+
19+
This article includes details about how to check ports and how to enable root access, plus information about other issues that might cause NFS storage target creation to fail.
20+
21+
## Check port settings
22+
23+
Azure HPC Cache needs read/write access to several UDP/TCP ports on the back-end storage system. Make sure these ports are accessible on the NAS system and that traffic is permitted through its firewalls. You might need to work with firewall and network administrators for your data center to verify this configuration.
24+
25+
The ports are different for storage systems from different vendors, so check your system's requirements when setting up a storage target.
26+
27+
In general, the cache needs access to these ports:
28+
29+
| Protocol | Port | Service |
30+
|----------|-------|----------|
31+
| TCP/UDP | 111 | rpcbind |
32+
| TCP/UDP | 2049 | NFS |
33+
| TCP/UDP | 4045 | nlockmgr |
34+
| TCP/UDP | 4046 | mountd |
35+
| TCP/UDP | 4047 | status |
36+
37+
To learn the specific ports needed for your system, use the following ``rpcinfo`` command. The command lists the ports and formats the relevant results in a table. (Use your system's IP address in place of the *<storage_IP>* term.)
38+
39+
You can issue this command from any Linux client that has NFS infrastructure installed. If you use a client inside the cluster subnet, it can also help verify connectivity between the subnet and the storage system.
40+
41+
```bash
42+
rpcinfo -p <storage_IP> |egrep "100000\s+4\s+tcp|100005\s+3\s+tcp|100003\s+3\s+tcp|100024\s+1\s+tcp|100021\s+4\s+tcp"| awk '{print $4 "/" $3 " " $5}'|column -t
43+
```
44+
45+
Make sure that all of the ports returned by the ``rpcinfo`` query allow unrestricted traffic from the Azure HPC Cache's subnet.
46+
47+
Check these settings both on the NAS itself and on any firewalls between the storage system and the cache subnet.
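
As a rough illustration of what the ``rpcinfo`` pipeline above selects, the sketch below applies the same program/version/protocol filter with a single ``awk`` call. The function names and the sample output are hypothetical and exist only so that the example runs without a storage system; real output from your NAS will differ.

```bash
# Keep only the TCP registrations that the rpcinfo query above matches.
# Reads `rpcinfo -p` output on stdin and prints "port/proto service".
filter_nfs_ports() {
  awk '
    $3 != "tcp" { next }
    ($1 == 100000 && $2 == 4) ||
    ($1 == 100005 && $2 == 3) ||
    ($1 == 100003 && $2 == 3) ||
    ($1 == 100024 && $2 == 1) ||
    ($1 == 100021 && $2 == 4) { print $4 "/" $3 " " $5 }
  '
}

# Hypothetical sample of `rpcinfo -p` output (not taken from a real system).
sample_rpcinfo() {
  cat <<'EOF'
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100005    3   tcp   4046  mountd
    100003    3   tcp   2049  nfs
    100003    4   tcp   2049  nfs
    100024    1   tcp   4047  status
    100021    4   tcp   4045  nlockmgr
EOF
}

sample_rpcinfo | filter_nfs_ports
```

To use the filter against a real system, pipe the output of ``rpcinfo -p`` for your storage IP address into ``filter_nfs_ports`` instead of the sample function.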
48+
49+
## Check root access
50+
51+
Azure HPC Cache needs access to your storage system's exports to create the storage target. Specifically, it mounts the exports as user ID 0.
52+
53+
Different storage systems use different methods to enable this access:
54+
55+
* Linux servers generally add ``no_root_squash`` to the exported path in ``/etc/exports``.
56+
* NetApp and EMC systems typically control access with export rules that are tied to specific IP addresses or networks.
57+
58+
If using export rules, remember that the cache can use multiple different IP addresses from the cache subnet. Allow access from the full range of possible subnet IP addresses.
59+
60+
Work with your NAS storage vendor to enable the right level of access for the cache.
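
For a Linux NFS server, an entry in ``/etc/exports`` that grants root access to the whole cache subnet might look like the line this sketch prints. The export path and subnet are hypothetical placeholders, and the option list is a minimal assumption rather than a recommended configuration; your server might need additional options.

```bash
# Build an /etc/exports entry that allows root access from a whole subnet.
# Arguments: the export path, then the cache subnet in CIDR notation.
exports_entry() {
  export_path="$1"
  subnet="$2"
  printf '%s %s(rw,no_root_squash)\n' "$export_path" "$subnet"
}

# Hypothetical values; substitute your own export path and cache subnet.
exports_entry /export/data 10.0.0.0/24
```

Using the subnet in CIDR form, instead of listing individual addresses, covers every IP address that the cache might use. After editing ``/etc/exports``, Linux servers typically apply the change with ``exportfs -ra``.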
61+
62+
### Allow root access on directory paths
63+
<!-- linked in prereqs article -->
64+
65+
For NAS systems that export hierarchical directories, Azure HPC Cache needs root access to each export level.
66+
67+
For example, a system might show three exports like these, where the ``/ifs/accounting/payroll`` export is a child of the export ``/ifs/accounting``:
68+
69+
```bash
70+
/ifs
71+
/ifs/accounting
72+
/ifs/accounting/payroll
73+
```
74+
75+
If you add the ``payroll`` export as an Azure HPC Cache storage target, the cache actually mounts ``/ifs`` and accesses the payroll directory from there. So Azure HPC Cache needs root access to ``/ifs`` in order to access the ``/ifs/accounting/payroll`` export.
76+
77+
This requirement is related to the way the cache indexes files and avoids file collisions. A NAS system with hierarchical exports can give clients different file handles for the same file, based on which export was used. The storage system aliases the file handles internally, but Azure HPC Cache cannot tell which file handles in its index reference the same item. As a result, the cache might hold different cached writes for the same file and apply them incorrectly because it does not know that the handles refer to the same file.
78+
79+
To avoid this possible file collision for files in multiple exports, Azure HPC Cache automatically mounts the shallowest available export in the path (``/ifs`` in the example) and uses the file handle given from that export. If multiple exports use the same base path, Azure HPC Cache needs root access to that path.
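
To work out which export needs root access for a given storage target, you can apply the same "shallowest export in the path" rule to your own export list. The sketch below is only an illustration of that rule, not the cache's actual implementation, and the function name is hypothetical.

```bash
# Print the shallowest export that contains the target path.
# Arguments: the target export, followed by the full list of exports.
shallowest_export() {
  target="$1"
  shift
  best=""
  for candidate in "$@"; do
    # A candidate contains the target if it is a whole-directory prefix of it.
    case "$target/" in
      "${candidate%/}/"*)
        if [ -z "$best" ] || [ "${#candidate}" -lt "${#best}" ]; then
          best="$candidate"
        fi
        ;;
    esac
  done
  printf '%s\n' "$best"
}

# Exports from the example above.
shallowest_export /ifs/accounting/payroll /ifs /ifs/accounting /ifs/accounting/payroll
```

With the three example exports, the function prints ``/ifs``, which is the export that needs to allow root access from the cache subnet.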
80+
81+
## Enable export listing
82+
<!-- link in prereqs article -->
83+
84+
The NAS must list its exports when the Azure HPC Cache queries it.
85+
86+
On most NFS storage systems, you can test this by sending the following query from a Linux client: ``showmount -e <storage IP address>``
87+
88+
Use a Linux client from the same virtual network as your cache, if possible.
89+
90+
If that command doesn't list the exports, the cache will have trouble connecting to your storage system. Work with your NAS vendor to enable export listing.
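
If you want to script this check, one option is to search the ``showmount -e`` output for the export that you plan to use as a storage target. The sketch below assumes the usual layout of that output (a header line followed by one export per line); the function names and sample listing are hypothetical.

```bash
# Succeed if the given export appears in `showmount -e` output read from stdin.
export_is_listed() {
  awk -v want="$1" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'
}

# Hypothetical sample of `showmount -e` output (not taken from a real system).
sample_showmount() {
  cat <<'EOF'
Export list for 10.0.0.5:
/ifs                    10.0.0.0/24
/ifs/accounting         10.0.0.0/24
EOF
}

if sample_showmount | export_is_listed /ifs/accounting; then
  echo "export is listed"
fi
```

To check a real system, pipe the output of ``showmount -e`` for your storage IP address into ``export_is_listed`` instead of the sample function.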
91+
92+
## Adjust VPN packet size restrictions
93+
<!-- link in prereqs article -->
94+
95+
If you have a VPN between the cache and your NAS device, the VPN might block full-sized 1500-byte Ethernet packets. You might have this problem if large exchanges between the NAS and the Azure HPC Cache instance do not complete, but smaller updates work as expected.
96+
97+
There isn't a simple way to tell whether or not your system has this problem, but here are a few methods to diagnose it.
98+
99+
* Use packet sniffers on both sides of the VPN to detect which packets transfer successfully.
100+
* If your VPN allows ping commands, you can test sending a full-sized packet.
101+
102+
Run a ping command over the VPN to the NAS with these options. (Use your storage system's IP address in place of the *<storage_IP>* value.)
103+
104+
```bash
105+
ping -M do -s 1472 -c 1 <storage_IP>
106+
```
107+
108+
These are the options in the command:
109+
110+
* ``-M do`` - Do not fragment
111+
* ``-c 1`` - Send only one packet
112+
* ``-s 1472`` - Set the size of the payload to 1472 bytes. This is the maximum size payload for a 1500-byte packet after accounting for the 20-byte IP header and the 8-byte ICMP header.
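
The payload size follows from simple arithmetic on the packet size, so you can compute other values to probe where packets start to be dropped. This sketch assumes an IPv4 header without options (20 bytes) and an ICMP echo header (8 bytes); the function name is hypothetical.

```bash
# Largest ICMP payload that fits in one unfragmented IPv4 packet of the given size.
# Assumes a 20-byte IPv4 header (no options) and an 8-byte ICMP echo header.
ping_payload_for_mtu() {
  echo $(( $1 - 20 - 8 ))
}

ping_payload_for_mtu 1500
```

If the full-sized test fails, repeating the ping with the payload computed for smaller packet sizes can help narrow down the largest size that the VPN passes.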
113+
114+
A successful response looks like this:
115+
116+
```bash
117+
PING 10.54.54.11 (10.54.54.11) 1472(1500) bytes of data.
118+
1480 bytes from 10.54.54.11: icmp_seq=1 ttl=64 time=2.06 ms
119+
```
120+
121+
If the ping fails with 1472 bytes, you might need to configure MSS clamping on the VPN to make the remote system properly detect the maximum frame size. Read the [VPN Gateway IPsec/IKE parameters documentation](../vpn-gateway/vpn-gateway-about-vpn-devices.md#ipsec) for more information.
122+
123+
## Check for ACL security style
124+
125+
Some NAS systems use a hybrid security style that combines access control lists (ACLs) with traditional POSIX or UNIX security.
126+
127+
If your system reports its security style as UNIX or POSIX without including the acronym "ACL", this issue does not affect you.
128+
129+
For systems that use ACLs, Azure HPC Cache needs to track additional user-specific values in order to control file access. This is done by enabling an access cache. There isn't a user-facing control to turn on the access cache, but you can open a support ticket to request that it be enabled for the affected storage targets on your cache system.
130+
131+
## Next steps
132+
133+
If you have a problem that was not addressed in this article, [open a support ticket](hpc-cache-support-ticket.md) to get expert help.
