Commit 24b7c8d

committed
file path lengths topic
1 parent 2556b75 commit 24b7c8d

File tree

9 files changed

+168
-4
lines changed

articles/azure-netapp-files/TOC.yml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -53,6 +53,8 @@
5353
href: understand-file-locks.md
5454
- name: Understand volume languages
5555
href: understand-volume-languages.md
56+
- name: Understand file path lengths
57+
href: understand-path-lengths.md
5658
- name: Azure NetApp Files essentials
5759
items:
5860
- name: Storage hierarchy of Azure NetApp Files

articles/azure-netapp-files/understand-path-lengths.md

Lines changed: 160 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
---
22
title: Understand path lengths in Azure NetApp Files
3-
description: Learn about the supported languages and character sets with NFS, SMB, and dual-protocol configurations in Azure NetApp Files.
3+
description: Learn how file path limits and lengths are calculated in Azure NetApp Files.
44
services: azure-netapp-files
55
author: b-ahibbard
66
ms.service: azure-netapp-files
@@ -43,4 +43,162 @@ mkdir: cannot create directory ‘256charsaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
4343

4444
The Linux utility [`uniutils`](https://billposer.org/Software/unidesc.html) can be used to find the byte size of Unicode characters by typing multiple instances of the character and viewing the “byte” field.
4545

46-
The Latin capital A increments by 1 byte each time it is used (using a single hex value of 41, which is in the 0-255 range of ASCII characters).
46+
**Example 1:** The Latin capital A increments by 1 byte each time it is used (using a single hex value of 41, which is in the 0-127 range of ASCII characters).
47+
48+
```
49+
# printf %b 'AAA' | uniname
50+
character byte UTF-32 encoded as glyph name
51+
0 0 000041 41 A LATIN CAPITAL LETTER A
52+
1 1 000041 41 A LATIN CAPITAL LETTER A
53+
2 2 000041 41 A LATIN CAPITAL LETTER A
54+
55+
```
56+
57+
**Result 1:** The name AAA uses 3 bytes out of 255.
58+
59+
**Example 2:** The Japanese character 字 increments by 3 bytes with each instance. This can also be calculated from the three separate hex code values (E5 AD 97) under “encoded as.” Each hex value represents 1 byte:
60+
61+
```
62+
# printf %b '字字字' | uniname
63+
character byte UTF-32 encoded as glyph name
64+
0 0 005B57 E5 AD 97 字 CJK character Nelson 1281
65+
1 3 005B57 E5 AD 97 字 CJK character Nelson 1281
66+
2 6 005B57 E5 AD 97 字 CJK character Nelson 1281
67+
```
68+
69+
**Result 2:** A file named 字字字 uses 9 bytes out of 255.
70+
71+
**Example 3:** The letter Ä with diaeresis uses two bytes per instance (C3 + 84).
72+
73+
```
74+
# printf %b 'ÄÄÄ' | uniname
75+
character byte UTF-32 encoded as glyph name
76+
0 0 0000C4 C3 84 Ä LATIN CAPITAL LETTER A WITH DIAERESIS
77+
1 2 0000C4 C3 84 Ä LATIN CAPITAL LETTER A WITH DIAERESIS
78+
2 4 0000C4 C3 84 Ä LATIN CAPITAL LETTER A WITH DIAERESIS
79+
```
80+
81+
**Result 3:** A file named ÄÄÄ uses 6 bytes out of 255.
82+
83+
**Example 4:** A special character, such as the 😃 emoji, falls outside the Basic Multilingual Plane, which `uniname` reports as an “undefined” range. It exceeds the 1-3 bytes that UTF-8 uses for characters in the BMP. In UTF-16, it's encoded as a surrogate pair; in UTF-8, each instance of the character uses 4 bytes.
84+
85+
```
86+
# printf %b '😃😃😃' | uniname
87+
character byte UTF-32 encoded as glyph name
88+
0 0 01F603 F0 9F 98 83 😃 Character in undefined range
89+
1 4 01F603 F0 9F 98 83 😃 Character in undefined range
90+
2 8 01F603 F0 9F 98 83 😃 Character in undefined range
91+
```
92+
93+
**Result 4:** A file named 😃😃😃 uses 12 bytes out of 255.
94+
95+
Most emojis use 4 bytes, while emojis built from more than one code point can use 7 bytes or more. Of the more than one thousand standard emojis, approximately 180 are in the [Basic Multilingual Plane (BMP)](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane), which means they can be displayed as text or emoji in Azure NetApp Files, depending on the client’s support for the language type.
96+
97+
For more detailed information on the BMP and other Unicode planes, see [Understand volume languages in Azure NetApp Files](understand-volume-languages.md).
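
The byte counts in the `uniname` examples above can also be reproduced without the utility. The following Python sketch is illustrative only: the function names `utf8_breakdown` and `utf8_size` are hypothetical, and it assumes the names are encoded as UTF-8, as the examples show.

```python
def utf8_breakdown(name: str) -> list:
    """Return (character index, starting byte offset, code point, UTF-8 bytes) per character."""
    rows = []
    offset = 0
    for index, char in enumerate(name):
        encoded = char.encode("utf-8")
        rows.append((index, offset, f"{ord(char):06X}", encoded.hex(" ").upper()))
        offset += len(encoded)
    return rows


def utf8_size(name: str) -> int:
    """Total size of the name in bytes when encoded as UTF-8."""
    return len(name.encode("utf-8"))


samples = [
    "AAA",
    "字字字",
    "\u00c4" * 3,      # Ä as the single precomposed code point U+00C4
    "\U0001F603" * 3,  # the 😃 emoji, U+1F603
]
for sample in samples:
    for row in utf8_breakdown(sample):
        print(*row)
    print("total bytes:", utf8_size(sample))
```

Each printed row mirrors the `character`, `byte`, `UTF-32`, and `encoded as` columns of the `uniname` output.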
98+
99+
## Character byte impact on path lengths
100+
101+
Although a path length is generally thought to be the number of characters in a file or folder name, it's actually the _size_ in bytes of the characters in the path. Since each character adds its byte size to a name, different character sets in different languages support different file name lengths.
102+
103+
Consider the following scenarios:
104+
105+
- **A file or folder repeats the Latin alphabet character “A” for its file name.** (for example, AAAAAAAA)
106+
107+
Since “A” uses 1 byte and 255 bytes is the path component size limit, then 255 instances of “A” would be allowed in a file name.
108+
109+
- **A file or folder repeats the Japanese character 字 in its name.**
110+
111+
Since “字” has a size of 3 bytes, the file name length limit would be 85 instances of 字 (3 bytes * 85 = 255 bytes), or a total of 85 characters.
112+
113+
- **A file or folder repeats the grinning face emoji (😃) in its name.**
114+
115+
A grinning face emoji (😃) uses 4 bytes, which means that a file name with only that emoji would allow a total of 63 characters (255 bytes/4 bytes, rounded down; 63 * 4 = 252 bytes).
116+
117+
- **A file or folder uses a combination of different characters.** (for example, Name字😃)
118+
119+
When characters with different byte sizes are used in a file or folder name, each character’s byte size counts toward the file or folder name length. A file or folder name of Name字😃 would use 1+1+1+1+3+4 bytes (11 bytes) of the 255-byte limit.
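
The arithmetic in the scenarios above can be sketched in a few lines of Python. This is a minimal illustration, not an Azure NetApp Files tool: `max_repeats` and `bytes_used` are hypothetical helper names, and the 255-byte component limit is taken from this article.

```python
NAME_MAX_BYTES = 255  # per-component limit described in this article


def max_repeats(char: str, limit: int = NAME_MAX_BYTES) -> int:
    """How many whole copies of `char` fit in one path component."""
    return limit // len(char.encode("utf-8"))


def bytes_used(name: str) -> int:
    """UTF-8 size of a file or folder name in bytes."""
    return len(name.encode("utf-8"))


for char in ("A", "字", "😃"):
    # Print the code point rather than the glyph so the output is plain ASCII.
    print(f"U+{ord(char):04X}", "fits", max_repeats(char), "times")

mixed = "Name字😃"
print(bytes_used(mixed), "of", NAME_MAX_BYTES, "bytes used")
```

Floor division is used because a partial character can't be stored: any bytes left over after the last whole character stay unused.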
120+
121+
### Special emoji concepts
122+
123+
Special emojis, such as flag emojis, are built from more than one character: the emoji renders as text or an image depending on client support. When a client doesn't support the image designation, it instead uses regional text-based designations.
124+
125+
For instance, the [United States flag](https://emojipedia.org/flag-united-states) uses the characters "US" (which resemble the Latin characters U and S, but are actually special characters that use different encodings). `uniname` shows the differences between the characters.
126+
127+
```
128+
# printf %b 'US' | uniname
129+
character byte UTF-32 encoded as glyph name
130+
0 0 000055 55 U LATIN CAPITAL LETTER U
131+
1 1 000053 53 S LATIN CAPITAL LETTER S
132+
133+
134+
# printf %b '🇺🇸' | uniname
135+
character byte UTF-32 encoded as glyph name
136+
0 0 01F1FA F0 9F 87 BA 🇺 Character in undefined range
137+
1 4 01F1F8 F0 9F 87 B8 🇸 Character in undefined range
138+
```
139+
140+
Characters designated for the flag emojis translate to flag images in supported systems, but remain as text values in unsupported systems. These characters use 4 bytes per character for a total of 8 bytes when a flag emoji is used. As such, a total of 31 flag emojis are allowed in a file name (255 bytes/8 bytes).
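
As a rough illustration of the flag emoji sizing above, the following Python sketch builds a flag from its two regional indicator characters and measures it. The helper name `flag_emoji` is hypothetical; the code point U+1F1E6 for the regional indicator "A" comes from the Unicode standard, and the 255-byte limit is the one used in this article.

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A
NAME_MAX_BYTES = 255            # per-component limit described in this article


def flag_emoji(country_code: str) -> str:
    """Build a flag sequence from a two-letter code such as "US"."""
    if len(country_code) != 2 or not country_code.isascii() or not country_code.isalpha():
        raise ValueError("expected a two-letter ASCII country code")
    return "".join(
        chr(REGIONAL_INDICATOR_A + ord(letter) - ord("A"))
        for letter in country_code.upper()
    )


flag = flag_emoji("US")
flag_bytes = len(flag.encode("utf-8"))
print([f"U+{ord(char):05X}" for char in flag])  # the two code points in the flag
print(flag_bytes, "bytes per flag;", NAME_MAX_BYTES // flag_bytes, "flags fit in one name")
```

The two code points printed match the `UTF-32` column in the `uniname` output for the flag.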
141+
142+
143+
## SMB path limits
144+
145+
By default, Windows servers and clients support path lengths of up to 260 characters, but the usable file path length is shorter because of extra information added to Windows paths, such as [the `<NUL>` value](/windows/win32/fileio/maximum-file-path-limitation?tabs=registry) and domain information.
146+
147+
When a path limit is exceeded in Windows, a dialog box appears:
148+
149+
:::image type="content" source="./media/understand-path-lengths/path-length-warning.png" alt-text="Screenshot of path length dialog warning.":::
150+
151+
152+
SMB path lengths can be extended on Windows 10 version 1607 or later and Windows Server 2016 or later by changing a registry value, as covered in [Maximum Path Length Limitation](/windows/win32/fileio/maximum-file-path-limitation?tabs=registry). When this value is changed, path lengths can extend up to 32,767 characters (minus metadata values).
153+
154+
:::image type="content" source="./media/understand-path-lengths/path-group-policy-management.png" alt-text="Screenshot of Group Policy Management window.":::
155+
156+
:::image type="content" source="./media/understand-path-lengths/enable-long-paths.png" alt-text="Screenshot of window to enable long file paths.":::
157+
158+
Once this feature is enabled, the SMB share needs to be accessed using `\\?\` in the path to allow longer path lengths. This method doesn't support UNC paths, so the SMB share needs to be mapped to a drive letter.
159+
160+
:::image type="content" source="./media/understand-path-lengths/dialog-cannot-find.png" alt-text="Screenshot of dialog window with undiscoverable path.":::
161+
162+
Using `\\?\Z:` instead allows access and supports longer file paths.
163+
164+
:::image type="content" source="./media/understand-path-lengths/longer-path-name-directory.png" alt-text="Screenshot of a directory with a long name.":::
165+
166+
>[!NOTE]
167+
>The Windows CMD does not currently support the use of `\\?\`.
168+
169+
### Workaround if the max path length cannot be increased
170+
171+
If the max path length cannot be increased in the Windows environment, or the Windows client versions are too old to allow it, there is a workaround. You can mount the SMB share deeper in the directory structure to reduce the queried path length.
172+
173+
For example, rather than mapping `\\NAS-SHARE\AzureNetAppFiles` to `Z:`, map `\\NAS-SHARE\AzureNetAppFiles\folder1\folder2\folder3\folder4` to `Z:`.
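
To see how much path length the deeper mapping saves, consider this minimal Python sketch. It only manipulates strings and doesn't map any drives; `mapped_path` is a hypothetical helper, and `file.txt` is a made-up file name added for illustration.

```python
def mapped_path(unc_path: str, share_root: str, drive: str = "Z:") -> str:
    """Rewrite a UNC path as it would appear under a drive mapped to share_root."""
    root = share_root.rstrip("\\")
    remainder = unc_path[len(root):]
    under_root = unc_path.lower().startswith(root.lower())
    # The remainder must start at a folder boundary (or be empty).
    if not under_root or (remainder and not remainder.startswith("\\")):
        raise ValueError("path is not under the mapped share root")
    return drive + remainder


full = r"\\NAS-SHARE\AzureNetAppFiles\folder1\folder2\folder3\folder4\file.txt"
shallow = mapped_path(full, r"\\NAS-SHARE\AzureNetAppFiles")
deep = mapped_path(full, r"\\NAS-SHARE\AzureNetAppFiles\folder1\folder2\folder3\folder4")

print(shallow, len(shallow))
print(deep, len(deep))
```

The second mapping produces a much shorter client-side path for the same file, which leaves more room under the Windows path limit.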
174+
175+
## NFS path limits
176+
177+
NFS path limits with Azure NetApp Files volumes have the same 255-byte limit for individual path components. Each component, however, is evaluated one at a time, and an NFS client can process up to 4,096 bytes per request, with a near-limitless total path length. For instance, if each path component is 255 bytes, an NFS client can evaluate up to 15 components per request (including `/` characters). As such, a `cd` request to a path over the 4,096-byte limit yields a "File name too long" error message.
178+
179+
When a path uses only 1-byte characters (such as ASCII characters), the 4,096-byte limit corresponds to 4,096 characters. If a path contains characters larger than 1 byte, it can hold fewer than 4,096 characters, because each of those characters counts for more against the byte limit.
180+
181+
The path length max can be queried using the `getconf PATH_MAX /NFSmountpoint` command.
182+
183+
>[!NOTE]
184+
>The limit is defined in the `limits.h` file on the NFS client. You should not adjust these limits.
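
The two NFS limits described in this section can be checked before a path is created. The following Python sketch is one possible approach under stated assumptions: `nfs_path_problems` is a hypothetical helper, the 255-byte and 4,096-byte values are taken from this article rather than queried from the client, and the path is assumed to be UTF-8 encoded.

```python
NAME_MAX = 255   # bytes per path component (from this article)
PATH_MAX = 4096  # bytes per request (from this article)


def nfs_path_problems(path: str, name_max: int = NAME_MAX, path_max: int = PATH_MAX) -> list:
    """Return a list of human-readable limit violations for a POSIX-style path."""
    problems = []
    total = len(path.encode("utf-8"))
    if total > path_max:
        problems.append(f"path is {total} bytes; limit is {path_max}")
    for component in path.split("/"):
        size = len(component.encode("utf-8"))
        if size > name_max:
            problems.append(f"component of {size} bytes exceeds {name_max}")
    return problems


ok_path = "/" + "/".join(["a" * 255] * 15)    # 15 components, 3,840 bytes
long_path = "/" + "/".join(["a" * 255] * 17)  # 17 components, 4,352 bytes
wide_name = "/mnt/" + "字" * 86                # one 258-byte component

for candidate in (ok_path, long_path, wide_name):
    print(len(candidate.encode("utf-8")), "bytes:", nfs_path_problems(candidate) or "ok")
```

On a real client, the values reported by `getconf` should take precedence over the constants assumed here.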
185+
186+
## Dual-protocol volume considerations
187+
188+
When using Azure NetApp Files for dual-protocol access (SMB and NFS on the same datasets), the difference in how the protocols handle path lengths can create incompatibilities across files and folders. For instance, Windows SMB supports up to 32,767 characters in a path (provided the long path feature is enabled on the SMB client), but NFS paths can exceed that amount. As such, if a path created over NFS exceeds what SMB supports, SMB clients can't access data beyond the point where the path length maximum is reached. To avoid this, either plan file and folder names (and folder path depth) around the lowest path length limit across protocols, or map SMB shares closer to the desired folder path to reduce the path length.
189+
190+
Instead of mapping the SMB share to the top level of the volume to navigate down to a path of `\\share\folder1\folder2\folder3\folder4`, consider mapping the SMB share to the entire path of `\\share\folder1\folder2\folder3\folder4`. As a result, a drive letter mapping to `Z:` lands in the desired folder and reduces the path length from `Z:\folder1\folder2\folder3\folder4\file` to `Z:\file`.
191+
192+
### Special character considerations
193+
194+
Azure NetApp Files volumes use a volume language type of [C.UTF-8](/cpp/build/reference/utf-8-set-source-and-executable-character-sets-to-utf-8?view=msvc-170), which covers many countries and languages including German, Cyrillic, Hebrew, most Chinese/Japanese/Korean (CJK), and others. Most common text characters in Unicode are 3 bytes or less. Special characters, such as emojis, musical symbols, and mathematical symbols are often larger than 3 bytes in size. Some may use [UTF-16 surrogate pair logic](/windows/win32/intl/surrogates-and-supplementary-characters).
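
To illustrate the surrogate pair logic mentioned above, the following Python sketch compares the UTF-8 bytes and UTF-16 code units of a few characters. The helper names are hypothetical, and the sketch shows only how the two encodings differ, not how Azure NetApp Files processes them.

```python
def utf8_bytes(text: str) -> int:
    """Size of the text in bytes when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def utf16_code_units(text: str) -> list:
    """Return the 16-bit code units of the text when encoded as UTF-16."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


for char in ("A", "字", "😃"):
    units = [f"{unit:04X}" for unit in utf16_code_units(char)]
    print(f"U+{ord(char):04X}", "UTF-8 bytes:", utf8_bytes(char), "UTF-16 units:", units)
```

Characters in the BMP use a single UTF-16 code unit, while a character outside the BMP, such as 😃, is split into a high and a low surrogate.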
195+
196+
If you use a character that Azure NetApp Files doesn't support, you might see a warning requesting a different file name.
197+
198+
:::image type="content" source="./media/understand-path-lengths/dialog-cannot-find.png" alt-text="Screenshot of an invalid file name warning.":::
199+
200+
Rather than the name being too long, the error actually results from the character byte size being too large for the Azure NetApp Files volume to use over SMB. There is no workaround in Azure NetApp Files for this limitation. For more information on special character handling in Azure NetApp Files, see [Protocol behavior with special character sets](understand-volume-languages.md#protocol-behaviors-with-special-character-sets).
201+
202+
## Next steps
203+
204+
* [Understand volume languages](understand-volume-languages.md)

articles/azure-netapp-files/understand-volume-languages.md

Lines changed: 6 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -27,7 +27,7 @@ In an Azure NetApp Files file sharing environment, file and folder names are rep
2727
For instance, the Japanese character for data is 資. Since this character can't be represented in ASCII, a client using ASCII encoding shows a “?” instead of 資.
2828

2929
<!-- needs link to new page-->
30-
[ASCII supports only 95 printable characters](https://en.wikipedia.org/wiki/ASCII#Printable_characters), principally those found in the English language. Each of those characters uses 1 byte, which is factored into the [total file path length]() on an Azure NetApp Files volume. This limits the internationalization of datasets, since file names may have a variety of characters not recognized by ASCII, from Japanese to Cyrillic to emoji. An international standard ([ISO/IEC 8859](https://en.wikipedia.org/wiki/ISO/IEC_8859)) further attempted to support more international characters, but also had its [limitations](). Most modern clients send and receive characters using some form of Unicode.
30+
[ASCII supports only 95 printable characters](https://en.wikipedia.org/wiki/ASCII#Printable_characters), principally those found in the English language. Each of those characters uses 1 byte, which is factored into the [total file path length](understand-path-lengths.md) on an Azure NetApp Files volume. This limits the internationalization of datasets, since file names may have a variety of characters not recognized by ASCII, from Japanese to Cyrillic to emoji. An international standard ([ISO/IEC 8859](https://en.wikipedia.org/wiki/ISO/IEC_8859)) further attempted to support more international characters, but also had its [limitations](). Most modern clients send and receive characters using some form of Unicode.
3131

3232
### Unicode
3333

@@ -604,4 +604,8 @@ When using special characters or characters outside of the standard [Basic Multi
604604
- Avoid using special characters outside of the BMP in file names, especially when using NFSv4.1 or dual-protocol volumes.
605605
- For character sets not in the BMP, UTF-8 encoding should allow display of the characters in Azure NetApp Files when using a single file protocol (SMB only or NFS only). However, dual-protocol volumes aren't able to accommodate these character sets in most cases.
606606
- Nonstandard encoding (such as Shift-JIS) isn't supported on Azure NetApp Files volumes.
607-
- Surrogate pair characters (such as emoji) are supported on Azure NetApp Files volumes.
607+
- Surrogate pair characters (such as emoji) are supported on Azure NetApp Files volumes.
608+
609+
## Next steps
610+
611+
* [Understand path lengths in Azure NetApp Files](understand-path-lengths.md)
