|
1 | 1 | ---
|
2 | 2 | title: Understand path lengths in Azure NetApp Files
|
3 |
| -description: Learn about the supported languages and character sets with NFS, SMB, and dual-protocol configurations in Azure NetApp Files. |
| 3 | +description: Learn how file path limits and lengths are calculated in Azure NetApp Files. |
4 | 4 | services: azure-netapp-files
|
5 | 5 | author: b-ahibbard
|
6 | 6 | ms.service: azure-netapp-files
|
@@ -43,4 +43,162 @@ mkdir: cannot create directory ‘256charsaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
|
43 | 43 |
|
44 | 44 | The Linux utility [`uniutils`](https://billposer.org/Software/unidesc.html) can be used to find the byte size of Unicode characters by passing multiple instances of the character to `uniname` and viewing the “byte” column.
|
45 | 45 |
|
46 |
| -The Latin capital A increments by 1 byte each time it is used (using a single hex value of 41, which is in the 0-255 range of ASCII characters). |
| 46 | +**Example 1:** The Latin capital A increments by 1 byte each time it's used (it has a single hex value of 41, which is in the 0-127 range of ASCII characters). |
| 47 | + |
| 48 | +``` |
| 49 | +# printf %b 'AAA' | uniname |
| 50 | +character byte UTF-32 encoded as glyph name |
| 51 | + 0 0 000041 41 A LATIN CAPITAL LETTER A |
| 52 | + 1 1 000041 41 A LATIN CAPITAL LETTER A |
| 53 | + 2 2 000041 41 A LATIN CAPITAL LETTER A |
| 54 | +
|
| 55 | +``` |
| 56 | + |
| 57 | +**Result 1:** The name AAA uses 3 bytes out of 255. |
| 58 | + |
| 59 | +**Example 2:** The Japanese character 字 increments by 3 bytes for each instance. This can also be calculated from the 3 separate hex code values (E5 AD 97) under “encoded as.” Each hex value represents 1 byte: |
| 60 | + |
| 61 | +``` |
| 62 | +# printf %b '字字字' | uniname |
| 63 | +character byte UTF-32 encoded as glyph name |
| 64 | + 0 0 005B57 E5 AD 97 字 CJK character Nelson 1281 |
| 65 | + 1 3 005B57 E5 AD 97 字 CJK character Nelson 1281 |
| 66 | + 2 6 005B57 E5 AD 97 字 CJK character Nelson 1281 |
| 67 | +``` |
| 68 | + |
| 69 | +**Result 2:** A file named 字字字 uses 9 bytes out of 255. |
| 70 | + |
| 71 | +**Example 3:** The letter Ä with diaeresis uses two bytes per instance (C3 + 84). |
| 72 | + |
| 73 | +``` |
| 74 | +# printf %b 'ÄÄÄ' | uniname |
| 75 | +character byte UTF-32 encoded as glyph name |
| 76 | + 0 0 0000C4 C3 84 Ä LATIN CAPITAL LETTER A WITH DIAERESIS |
| 77 | + 1 2 0000C4 C3 84 Ä LATIN CAPITAL LETTER A WITH DIAERESIS |
| 78 | + 2 4 0000C4 C3 84 Ä LATIN CAPITAL LETTER A WITH DIAERESIS |
| 79 | +``` |
| 80 | + |
| 81 | +**Result 3:** A file named ÄÄÄ uses 6 bytes out of 255. |
| 82 | + |
| 83 | +**Example 4:** A special character, such as the 😃 emoji, falls into an “undefined” range that exceeds the 1-3 bytes generally used for Unicode characters. In UTF-16, it's encoded as a surrogate pair. In UTF-8, each instance of the character uses 4 bytes. |
| 84 | + |
| 85 | +``` |
| 86 | +# printf %b '😃😃😃' | uniname |
| 87 | +character byte UTF-32 encoded as glyph name |
| 88 | + 0 0 01F603 F0 9F 98 83 😃 Character in undefined range |
| 89 | + 1 4 01F603 F0 9F 98 83 😃 Character in undefined range |
| 90 | + 2 8 01F603 F0 9F 98 83 😃 Character in undefined range |
| 91 | +``` |
| 92 | + |
| 93 | +**Result 4:** A file named 😃😃😃 uses 12 bytes out of 255. |
| 94 | + |
| 95 | +Most emojis use 4 bytes, while others can use up to 7 bytes. Of the more than one thousand standard emojis, approximately 180 are in the [Basic Multilingual Plane (BMP)](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane), which means they can be displayed as text or emoji in Azure NetApp Files, depending on the client’s support for the language type. |
| 96 | + |
| 97 | +For more detailed information on the BMP and other Unicode planes, see [Understand volume languages in Azure NetApp Files](understand-volume-languages.md). |
| 98 | + |
| 99 | +## Character byte impact on path lengths |
| 100 | + |
| 101 | +Although a path length is generally thought to be the number of characters in a file or folder name, it's actually the _size_ in bytes of the characters in the path. Since each character adds its own byte size to a name, different character sets in different languages support different file name lengths. |
| 102 | + |
| 103 | +Consider the following scenarios: |
| 104 | + |
| 105 | +- **A file or folder repeats the Latin alphabet character “A” for its file name.** (for example, AAAAAAAA) |
| 106 | + |
| 107 | + Since “A” uses 1 byte and 255 bytes is the path component size limit, then 255 instances of “A” would be allowed in a file name. |
| 108 | + |
| 109 | +- **A file or folder repeats the Japanese character 字 in its name.** |
| 110 | + |
| 111 | +  Since “字” has a size of 3 bytes, the file name length limit would be 85 instances of 字 (3 bytes * 85 = 255 bytes), or a total of 85 characters. |
| 112 | + |
| 113 | +- **A file or folder repeats the grinning face emoji (😃) in its name.** |
| 114 | + |
| 115 | +  A grinning face emoji (😃) uses 4 bytes, which means that a file name with only that emoji would allow a total of 63 characters (255 bytes/4 bytes, rounded down). |
| 116 | + |
| 117 | +- **A file or folder uses a combination of different characters.** (for example, Name字😃) |
| 118 | + |
| 119 | +  When different characters with different byte sizes are used in a file or folder name, each character’s byte size factors into the file or folder name length. A file or folder name of Name字😃 would use 1+1+1+1+3+4 bytes (11 bytes) of the total 255-byte length. |
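The scenarios above can be checked with a short Python sketch. It assumes that names are stored as UTF-8 and that the 255-byte component limit applies as described. The helper names (`component_bytes`, `fits`, `max_repeats`) are hypothetical and aren't part of any Azure NetApp Files tooling.

```python
# Minimal sketch: measure file or folder names in UTF-8 bytes, not characters.
# Assumption: the 255-byte path component limit described in this article.
MAX_COMPONENT_BYTES = 255


def component_bytes(name: str) -> int:
    """Return the UTF-8 encoded size of a single path component."""
    return len(name.encode("utf-8"))


def fits(name: str) -> bool:
    """Return True if the name is within the 255-byte component limit."""
    return component_bytes(name) <= MAX_COMPONENT_BYTES


def max_repeats(char: str) -> int:
    """How many times one character can repeat in a name (rounded down)."""
    return MAX_COMPONENT_BYTES // component_bytes(char)


# Escapes are used so the exact code points are unambiguous:
# U+5B57 is the CJK character, U+1F603 is the emoji from the examples.
print(max_repeats("A"))                          # 1-byte character
print(max_repeats("\u5b57"))                     # 3-byte character
print(max_repeats("\U0001F603"))                 # 4-byte character
print(component_bytes("Name\u5b57\U0001F603"))   # mixed name
```

Because the limit is counted in whole bytes, the division rounds down: a 4-byte character can repeat only as many times as fit completely within 255 bytes.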
| 120 | + |
| 121 | +### Special emoji concepts |
| 122 | + |
| 123 | +Special emojis, such as a flag emoji, are composed of more than one character. The emoji renders as text or an image depending on client support. When a client doesn't support the image designation, it instead uses regional text-based designations. |
| 124 | + |
| 125 | +For instance, the [United States flag](https://emojipedia.org/flag-united-states) uses the characters "us" (which resemble the Latin characters U and S, but are actually special characters that use different encodings). `uniname` shows the differences between the characters. |
| 126 | + |
| 127 | +``` |
| 128 | +# printf %b 'US' | uniname |
| 129 | +character byte UTF-32 encoded as glyph name |
| 130 | + 0 0 000055 55 U LATIN CAPITAL LETTER U |
| 131 | + 1 1 000053 53 S LATIN CAPITAL LETTER S |
| 132 | +
|
| 133 | +
|
| 134 | +# printf %b '🇺🇸' | uniname |
| 135 | +character byte UTF-32 encoded as glyph name |
| 136 | + 0 0 01F1FA F0 9F 87 BA 🇺 Character in undefined range |
| 137 | + 1 4 01F1F8 F0 9F 87 B8 🇸 Character in undefined range |
| 138 | +``` |
| 139 | + |
| 140 | +Characters designated for the flag emojis translate to flag images in supported systems, but remain as text values in unsupported systems. These characters use 4 bytes per character for a total of 8 bytes when a flag emoji is used. As such, a total of 31 flag emojis are allowed in a file name (255 bytes/8 bytes). |
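As a rough illustration of the flag arithmetic, the following Python sketch inspects the two characters that make up the United States flag emoji. It relies only on the standard `unicodedata` module. The 255-byte limit is taken from this article, and the variable names are illustrative.

```python
import unicodedata

# The United States flag is two code points: U+1F1FA followed by U+1F1F8.
flag = "\U0001F1FA\U0001F1F8"

# Show each code point, its Unicode name, and its UTF-8 size in bytes.
for ch in flag:
    size = len(ch.encode("utf-8"))
    print(f"U+{ord(ch):06X} {unicodedata.name(ch)} ({size} bytes)")

# One flag is the sum of both characters' sizes.
flag_bytes = len(flag.encode("utf-8"))

# Assumption: the 255-byte path component limit described in this article.
max_flags = 255 // flag_bytes
print(flag_bytes, max_flags)
```

Although a client might draw the flag as a single image, Python reports `len(flag)` as 2 because the name contains two separate code points, each of which counts toward the byte limit.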
| 141 | + |
| 142 | + |
| 143 | +## SMB path limits |
| 144 | + |
| 145 | +By default, Windows servers and clients support path lengths up to 260 bytes, but the actual file path lengths are shorter due to extra metadata added to Windows paths, such as [the <NUL> value](/windows/win32/fileio/maximum-file-path-limitation?tabs=registry), domain information, etc. |
| 146 | + |
| 147 | +When a path limit is exceeded in Windows, a dialog box appears: |
| 148 | + |
| 149 | +:::image type="content" source="./media/understand-path-lengths/path-length-warning.png" alt-text="Screenshot of path length dialog warning."::: |
| 150 | + |
| 151 | + |
| 152 | +SMB path lengths can be extended when using Windows 10/Windows Server 2016, version 1607 or later, by changing a registry value as covered in [Maximum Path Length Limitation](/windows/win32/fileio/maximum-file-path-limitation?tabs=registry). When this value is changed, path lengths can extend up to 32,767 bytes (minus metadata values). |
| 153 | + |
| 154 | +:::image type="content" source="./media/understand-path-lengths/path-group-policy-management.png" alt-text="Screenshot of Group Policy Management window."::: |
| 155 | + |
| 156 | +:::image type="content" source="./media/understand-path-lengths/enable-long-paths.png" alt-text="Screenshot of window to enable long file paths."::: |
| 157 | + |
| 158 | +Once this feature is enabled, the SMB share needs to be accessed using `\\?\` in the path to allow longer path lengths. This method doesn't support UNC paths, so the SMB share needs to be mapped to a drive letter. |
| 159 | + |
| 160 | +:::image type="content" source="./media/understand-path-lengths/dialog-cannot-find.png" alt-text="Screenshot of dialog window with undiscoverable path."::: |
| 161 | + |
| 162 | +Using `\\?\Z:` instead allows access and supports longer file paths. |
| 163 | + |
| 164 | +:::image type="content" source="./media/understand-path-lengths/longer-path-name-directory.png" alt-text="Screenshot of a directory with a long name."::: |
| 165 | + |
| 166 | +>[!NOTE] |
| 167 | +>The Windows CMD does not currently support the use of `\\?\`. |
| 168 | +
|
| 169 | +### Workaround if the max path length cannot be increased |
| 170 | + |
| 171 | +If the max path length cannot be increased in the Windows environment, or the Windows client versions are too old to support the setting, there is a workaround. Mounting the SMB share deeper into the directory structure reduces the queried path length. |
| 172 | + |
| 173 | +For example, rather than mapping `\\NAS-SHARE\AzureNetAppFiles` to `Z:`, map `\\NAS-SHARE\AzureNetAppFiles\folder1\folder2\folder3\folder4` to `Z:`. |
| 174 | + |
| 175 | +## NFS path limits |
| 176 | + |
| 177 | +NFS path limits with Azure NetApp Files volumes have the same 255-byte limit for individual path components. Each component, however, is evaluated one at a time, and a client can process up to 4,096 bytes per request, which allows a nearly limitless total path length. For instance, if each path component is 255 bytes, an NFS client can evaluate up to 15 components per request (including `/` characters). As such, a `cd` request to a path over the 4,096-byte limit yields a "File name too long" error message. |
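The 15-component figure can be reproduced with a little arithmetic. The Python sketch below assumes a Linux NFS client where `PATH_MAX` is 4,096 bytes and includes the terminating NUL byte, and where each component is followed or preceded by one `/` separator. The function names are hypothetical.

```python
import os

PATH_MAX = 4096  # bytes per request; assumed to include the terminating NUL
NAME_MAX = 255   # bytes per path component


def max_full_components(path_max: int = PATH_MAX, name_max: int = NAME_MAX) -> int:
    """Number of maximum-size components that fit in one request."""
    usable = path_max - 1               # reserve 1 byte for the NUL terminator
    return usable // (name_max + 1)     # each component carries one '/' byte


def path_fits(path: str, path_max: int = PATH_MAX) -> bool:
    """True if the UTF-8 encoded path plus its NUL fits in one request."""
    return len(path.encode("utf-8")) < path_max


def query_path_max(mountpoint: str) -> int:
    """Ask the client for the limit, similar to `getconf PATH_MAX <dir>`."""
    return os.pathconf(mountpoint, "PC_PATH_MAX")


print(max_full_components())
```

Under these assumptions, 15 components of 255 bytes plus separators use 3,840 bytes, while a sixteenth component would reach 4,096 bytes and leave no room for the terminator.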
| 178 | + |
| 179 | +For names that use only ASCII characters, each character is 1 byte, so the 4,096-byte limit corresponds to 4,096 characters. If a character is larger than 1 byte in size, then the path length is less than 4,096 characters, because multibyte characters count more against the byte limit than 1-byte characters. |
| 180 | + |
| 181 | +The path length max can be queried using the `getconf PATH_MAX /NFSmountpoint` command. |
| 182 | + |
| 183 | +>[!NOTE] |
| 184 | +>The limit is defined in the `limits.h` file on the NFS client. You should not adjust these limits. |
| 185 | +
|
| 186 | +## Dual-protocol volume considerations |
| 187 | + |
| 188 | +When using Azure NetApp Files for dual-protocol access (SMB and NFS on the same datasets), the difference in how path lengths are handled in those protocols can create incompatibilities across files and folders. For instance, Windows SMB supports up to 32,767 characters in a path (provided the long path feature is enabled on the SMB client), but NFS support can exceed that amount. As such, if a path created over NFS exceeds the length that SMB supports, SMB clients can't access the data beyond that maximum. In those cases, either consider the lower path length limits across protocols when creating file and folder names (and folder path depth), or map SMB shares closer to the desired folder path to reduce the path length. |
| 189 | + |
| 190 | +Instead of mapping the SMB share to the top level of the volume to navigate down to a path of `\\share\folder1\folder2\folder3\folder4`, consider mapping the SMB share to the entire path of `\\share\folder1\folder2\folder3\folder4`. As a result, a drive letter mapping to `Z:` lands in the desired folder and reduces the path length from `Z:\folder1\folder2\folder3\folder4\file` to `Z:\file`. |
| 191 | + |
| 192 | +### Special character considerations |
| 193 | + |
| 194 | +Azure NetApp Files volumes use a volume language type of [C.UTF-8](/cpp/build/reference/utf-8-set-source-and-executable-character-sets-to-utf-8?view=msvc-170), which covers many countries and languages including German, Cyrillic, Hebrew, most Chinese/Japanese/Korean (CJK), and others. Most common text characters in Unicode are 3 bytes or less. Special characters, such as emojis, musical symbols, and mathematical symbols are often larger than 3 bytes in size. Some may use [UTF-16 surrogate pair logic](/windows/win32/intl/surrogates-and-supplementary-characters). |
| 195 | + |
| 196 | +If you use a character that Azure NetApp Files doesn't support, you might see a warning requesting a different file name. |
| 197 | + |
| 198 | +:::image type="content" source="./media/understand-path-lengths/dialog-cannot-find.png" alt-text="Screenshot of an invalid file name warning."::: |
| 199 | + |
| 200 | +Rather than the name being too long, the error actually results from the character byte size being too large for the Azure NetApp Files volume to use over SMB. There is no workaround in Azure NetApp Files for this limitation. For more information on special character handling in Azure NetApp Files, see [Protocol behavior with special character sets](understand-volume-languages.md#protocol-behaviors-with-special-character-sets). |
| 201 | + |
| 202 | +## Next steps |
| 203 | + |
| 204 | +* [Understand volume languages](understand-volume-languages.md) |