| 1 | +--- |
| 2 | +title: Troubleshoot Azure HPC Cache NFS storage targets |
| 3 | +description: Tips to avoid and fix configuration errors and other problems that can cause failure when creating an NFS storage target |
| 4 | +author: ekpgh |
| 5 | +ms.service: hpc-cache |
| 6 | +ms.topic: conceptual |
| 7 | +ms.date: 02/20/2020 |
| 8 | +ms.author: rohogue |
| 9 | +--- |
| 10 | + |
| 11 | +# Troubleshoot NAS configuration and NFS storage target issues |
| 12 | + |
| 13 | +This article gives solutions for some common configuration errors and other issues that might prevent Azure HPC Cache from adding an NFS storage system as a storage target. |
| 14 | + |
| 15 | +If your problem is not included here, please [open a support ticket](hpc-cache-support-ticket.md) so that Microsoft Service and Support can investigate and solve the problem. |
| 16 | + |
| 17 | +Before using this guide, read and check the [prerequisites for NFS storage targets](hpc-cache-prereqs.md#nfs-storage-requirements). |
| 18 | + |
| 19 | +This article includes details about how to check ports and how to enable root access, plus information about other issues that might cause NFS storage target creation to fail. |
| 20 | + |
| 21 | +## Check port settings |
| 22 | + |
| 23 | +Azure HPC Cache needs read/write access to several UDP/TCP ports on the back-end storage system. Make sure these ports are accessible on the NAS system and that traffic is permitted through its firewalls. You might need to work with firewall and network administrators for your data center to verify this configuration. |
| 24 | + |
| 25 | +The ports are different for storage systems from different vendors, so check your system's requirements when setting up a storage target. |
| 26 | + |
| 27 | +In general, the cache needs access to these ports: |
| 28 | + |
| 29 | +| Protocol | Port | Service | |
| 30 | +|----------|-------|----------| |
| 31 | +| TCP/UDP | 111 | rpcbind | |
| 32 | +| TCP/UDP | 2049 | NFS | |
| 33 | +| TCP/UDP | 4045 | nlockmgr | |
| 34 | +| TCP/UDP | 4046 | mountd | |
| 35 | +| TCP/UDP | 4047 | status | |
| 36 | + |
| 37 | +To learn the specific ports needed for your system, use the following ``rpcinfo`` command. It lists the ports and formats the relevant results in a table. (Use your system's IP address in place of the *<storage_IP>* term.) |
| 38 | + |
| 39 | +You can issue this command from any Linux client that has NFS infrastructure installed. If you use a client inside the cluster subnet, it also can help verify connectivity between the subnet and the storage system. |
| 40 | + |
| 41 | +```bash |
| 42 | +rpcinfo -p <storage_IP> | grep -E "100000\s+4\s+tcp|100005\s+3\s+tcp|100003\s+3\s+tcp|100024\s+1\s+tcp|100021\s+4\s+tcp" | awk '{print $4 "/" $3 " " $5}' | column -t |
| 43 | +``` |
| 44 | + |
| 45 | +Make sure that all of the ports returned by the ``rpcinfo`` query allow unrestricted traffic from the Azure HPC Cache's subnet. |
| 46 | + |
| 47 | +Check these settings both on the NAS itself and on any firewalls between the storage system and the cache subnet. |
| 48 | + |
| 49 | +## Check root access |
| 50 | + |
| 51 | +Azure HPC Cache needs access to your storage system's exports to create the storage target. Specifically, it mounts the exports as user ID 0. |
| 52 | + |
| 53 | +Different storage systems use different methods to enable this access: |
| 54 | + |
| 55 | +* Linux servers generally add ``no_root_squash`` to the exported path in ``/etc/exports``. |
| 56 | +* NetApp and EMC systems typically control access with export rules that are tied to specific IP addresses or networks. |
| 57 | + |
| 58 | +If you use export rules, remember that the cache can use multiple IP addresses from the cache subnet. Allow access from the full range of possible subnet IP addresses. |
| 59 | + |
| 60 | +Work with your NAS storage vendor to enable the right level of access for the cache. |
| 61 | + |
| 62 | +### Allow root access on directory paths |
| 63 | +<!-- linked in prereqs article --> |
| 64 | + |
| 65 | +For NAS systems that export hierarchical directories, Azure HPC Cache needs root access to each export level. |
| 66 | + |
| 67 | +For example, a system might show three exports like these, where the ``/ifs/accounting/payroll`` export is a child of the export ``/ifs/accounting``: |
| 68 | + |
| 69 | +```bash |
| 70 | +/ifs |
| 71 | +/ifs/accounting |
| 72 | +/ifs/accounting/payroll |
| 73 | +``` |
| 74 | + |
| 75 | +If you add the ``payroll`` export as an Azure HPC Cache storage target, the cache actually mounts ``/ifs`` and accesses the payroll directory from there. So Azure HPC Cache needs root access to ``/ifs`` in order to access the ``/ifs/accounting/payroll`` export. |
| 76 | + |
| 77 | +This requirement is related to the way the cache indexes files and avoids file collisions. A NAS system with hierarchical exports can give clients different file handles for the same file, based on which export was used. The storage system aliases the file handles internally, but Azure HPC Cache cannot tell which file handles in its index reference the same item. As a result, the cache might hold different cached writes for the same file and apply them incorrectly, because it does not know that the handles refer to the same file. |
| 78 | + |
| 79 | +To avoid this possible file collision for files in multiple exports, Azure HPC Cache automatically mounts the shallowest available export in the path (``/ifs`` in the example) and uses the file handle given from that export. If multiple exports use the same base path, Azure HPC Cache needs root access to that path. |
| 80 | + |
| 81 | +## Enable export listing |
| 82 | +<!-- link in prereqs article --> |
| 83 | + |
| 84 | +The NAS must list its exports when the Azure HPC Cache queries it. |
| 85 | + |
| 86 | +On most NFS storage systems, you can test this by sending the following query from a Linux client: ``showmount -e <storage IP address>`` |
| 87 | + |
| 88 | +Use a Linux client from the same virtual network as your cache, if possible. |
| 89 | + |
| 90 | +If that command doesn't list the exports, the cache will have trouble connecting to your storage system. Work with your NAS vendor to enable export listing. |
| 91 | + |
| 92 | +## Adjust VPN packet size restrictions |
| 93 | +<!-- link in prereqs article --> |
| 94 | + |
| 95 | +If you have a VPN between the cache and your NAS device, the VPN might block full-sized 1500-byte Ethernet packets. You might have this problem if large exchanges between the NAS and the Azure HPC Cache instance do not complete, but smaller updates work as expected. |
| 96 | + |
| 97 | +There isn't a simple way to tell whether or not your system has this problem, but here are a few methods to diagnose it. |
| 98 | + |
| 99 | +* Use packet sniffers on both sides of the VPN to detect which packets transfer successfully. |
| 100 | +* If your VPN allows ping commands, you can test sending a full-sized packet. |
| 101 | + |
| 102 | + Run a ping command over the VPN to the NAS with these options. (Use your storage system's IP address in place of the *<storage_IP>* value.) |
| 103 | + |
| 104 | + ```bash |
| 105 | + ping -M do -s 1472 -c 1 <storage_IP> |
| 106 | + ``` |
| 107 | + |
| 108 | + These are the options in the command: |
| 109 | + |
| 110 | + * ``-M do`` - Do not fragment |
| 111 | + * ``-c 1`` - Send only one packet |
| 112 | + * ``-s 1472`` - Set the size of the payload to 1472 bytes. This is the maximum size payload for a 1500-byte packet after accounting for the 20-byte IPv4 header and the 8-byte ICMP header. |
| 113 | + |
| 114 | + A successful response looks like this: |
| 115 | + |
| 116 | + ```bash |
| 117 | + PING 10.54.54.11 (10.54.54.11) 1472(1500) bytes of data. |
| 118 | + 1480 bytes from 10.54.54.11: icmp_seq=1 ttl=64 time=2.06 ms |
| 119 | + ``` |
| 120 | + |
| 121 | + If the ping fails with 1472 bytes, you might need to configure MSS clamping on the VPN to make the remote system properly detect the maximum frame size. Read the [VPN Gateway IPsec/IKE parameters documentation](../vpn-gateway/vpn-gateway-about-vpn-devices#ipsec) for more information. |
| 122 | + |
| 123 | +## Check for ACL security style |
| 124 | + |
| 125 | +Some NAS systems use a hybrid security style that combines access control lists (ACLs) with traditional POSIX or UNIX security. |
| 126 | + |
| 127 | +If your system reports its security style as UNIX or POSIX without including the acronym "ACL", this issue does not affect you. |
| 128 | + |
| 129 | +For systems that use ACLs, Azure HPC Cache needs to track additional user-specific values in order to control file access. This is done by enabling an access cache. There isn't a user-facing control to turn on the access cache, but you can open a support ticket to request that it be enabled for the affected storage targets on your cache system. |
| 130 | + |
| 131 | +## Next steps |
| 132 | + |
| 133 | +If you have a problem that was not addressed in this article, [open a support ticket](hpc-cache-support-ticket.md) to get expert help. |
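As a supplement to the VPN packet size check described above, the following is a minimal sketch of how the ping test could be repeated at decreasing sizes to estimate the largest packet that crosses the VPN unfragmented. It assumes a Linux client with the iputils version of ``ping`` (which provides ``-M do`` and ``-W``). The function names, the 1200-byte lower bound, and the 10-byte step are hypothetical choices made for illustration; they do not come from Azure HPC Cache documentation.

```bash
#!/usr/bin/env bash
# Sketch only: helper names, the 1200-byte floor, and the 10-byte step
# are illustrative assumptions, not documented Azure HPC Cache values.

# ICMP payload size that fills an IPv4 packet of the given total size:
# subtract the 20-byte IPv4 header and the 8-byte ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 20 - 8 ))
}

# Inverse calculation: total IPv4 packet size for a given ICMP payload.
mtu_for_icmp_payload() {
    echo $(( $1 + 20 + 8 ))
}

# Send do-not-fragment pings, starting at a given packet size (default 1500)
# and stepping down by 10 bytes until one succeeds or the floor is reached.
# Prints the largest packet size that got a reply; returns 1 if none did.
probe_path_mtu() {
    local target=$1 mtu=${2:-1500} floor=${3:-1200} payload
    while [ "$mtu" -ge "$floor" ]; do
        payload=$(icmp_payload_for_mtu "$mtu")
        if ping -M do -s "$payload" -c 1 -W 2 "$target" >/dev/null 2>&1; then
            echo "$mtu"
            return 0
        fi
        mtu=$(( mtu - 10 ))
    done
    return 1
}

# Example (replace <storage_IP> with your storage system's IP address):
#   probe_path_mtu <storage_IP>
```

If the probe reports a value lower than 1500, that would be consistent with the VPN dropping full-sized packets, and the MSS clamping guidance above might apply. A failed probe can also mean that the VPN blocks ping traffic entirely, so treat the result as a hint rather than a diagnosis.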