
Commit 3652104

Merge pull request #293397 from khdownie/kendownie012225
improve enumeration latencies
2 parents: f5ea023 + 5235477 · commit 3652104

File tree: 2 files changed (+65 -14 lines)

articles/storage/files/TOC.yml

Lines changed: 1 addition & 1 deletion
@@ -251,7 +251,7 @@
   href: smb-performance.md
 - name: NFS performance
   href: nfs-performance.md
-- name: NFS large directory best practices
+- name: Large directory best practices
   href: nfs-large-directories.md
 - name: Scalability and performance targets
   href: storage-files-scale-targets.md

articles/storage/files/nfs-large-directories.md

Lines changed: 64 additions & 13 deletions
@@ -1,33 +1,33 @@
 ---
-title: Work with large directories in NFS Azure file shares
-description: Learn recommendations for working with large directories in NFS Azure file shares mounted on Linux clients, including mount options, commands, and operations.
+title: Work with large directories in Azure file shares
+description: Learn recommendations for working with large directories in Azure file shares mounted on Linux clients, including mount options, commands, and operations.
 author: khdownie
 ms.service: azure-file-storage
 ms.custom: linux-related-content
 ms.topic: conceptual
-ms.date: 05/09/2024
+ms.date: 01/22/2025
 ms.author: kendownie
 ---

-# Recommendations for working with large directories in NFS Azure file shares
+# Optimize file share performance when accessing large directories from Linux clients

-This article provides recommendations for working with NFS directories that contain large numbers of files. It's usually a good practice to reduce the number of files in a single directory by spreading the files over multiple directories. However, there are situations in which large directories can't be avoided. Consider the following suggestions when working with large directories on NFS Azure file shares that are mounted on Linux clients.
+This article provides recommendations for working with directories that contain large numbers of files. It's usually a good practice to reduce the number of files in a single directory by spreading the files over multiple directories. However, there are situations in which large directories can't be avoided. Consider the following suggestions when working with large directories on Azure file shares that are mounted on Linux clients.

 ## Applies to

 | File share type | SMB | NFS |
 |-|:-:|:-:|
-| Standard file shares (GPv2), LRS/ZRS | ![No, this article doesn't apply to standard SMB Azure file shares LRS/ZRS.](../media/icons/no-icon.png) | ![NFS shares are only available in premium Azure file shares.](../media/icons/no-icon.png) |
-| Standard file shares (GPv2), GRS/GZRS | ![No, this article doesn't apply to standard SMB Azure file shares GRS/GZRS.](../media/icons/no-icon.png) | ![NFS is only available in premium Azure file shares.](../media/icons/no-icon.png) |
-| Premium file shares (FileStorage), LRS/ZRS | ![No, this article doesn't apply to premium SMB Azure file shares.](../media/icons/no-icon.png) | ![Yes, this article applies to premium NFS Azure file shares.](../media/icons/yes-icon.png) |
+| Standard file shares (GPv2), LRS/ZRS | ![Yes, this article applies to standard SMB Azure file shares LRS/ZRS.](../media/icons/yes-icon.png) | ![NFS shares are only available in premium Azure file shares.](../media/icons/no-icon.png) |
+| Standard file shares (GPv2), GRS/GZRS | ![Yes, this article applies to standard SMB Azure file shares GRS/GZRS.](../media/icons/yes-icon.png) | ![NFS is only available in premium Azure file shares.](../media/icons/no-icon.png) |
+| Premium file shares (FileStorage), LRS/ZRS | ![Yes, this article applies to premium SMB Azure file shares.](../media/icons/yes-icon.png) | ![Yes, this article applies to premium NFS Azure file shares.](../media/icons/yes-icon.png) |

 ## Recommended mount options

 The following mount options are specific to enumeration and can reduce latency when working with large directories.

 ### actimeo

-Specifying `actimeo` sets all of `acregmin`, `acregmax`, `acdirmin`, and `acdirmax` to the same value. If `actimeo` isn't specified, the NFS client uses the defaults for each of these options.
+Specifying `actimeo` sets all of `acregmin`, `acregmax`, `acdirmin`, and `acdirmax` to the same value. If `actimeo` isn't specified, the client uses the defaults for each of these options.

 We recommend setting `actimeo` between 30 and 60 seconds when working with large directories. Setting a value in this range keeps attributes valid for a longer period in the client's attribute cache, allowing operations to get file attributes from the cache instead of fetching them over the wire. This can reduce latency in situations where the cached attributes expire while the operation is still running.
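As an illustration, an NFS mount that applies this recommendation might look like the following sketch; the storage account, share, and mount path are placeholders:

```bash
# Mount the share with a 60-second attribute cache timeout (actimeo) so that
# repeated enumerations can reuse cached attributes instead of refetching them.
sudo mkdir -p /mnt/<share-name>
sudo mount -t nfs -o vers=4,minorversion=1,sec=sys,actimeo=60 \
    <storage-account>.file.core.windows.net:/<storage-account>/<share-name> /mnt/<share-name>
```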

@@ -68,17 +68,68 @@ The following chart compares the time it takes to output results using unaliased

 :::image type="content" source="media/nfs-large-directories/sorted-versus-unsorted-ls.png" alt-text="Graph comparing the total time in seconds to complete a sorted ls operation versus unsorted." border="false":::

+### Increase the number of hash buckets
+
+The total amount of RAM on the system performing the enumeration influences the internal workings of file system protocols like NFS and SMB. Even if users aren't experiencing high memory usage, the amount of available memory determines the number of hash buckets the system allocates, which affects enumeration performance for large directories. You can increase the number of hash buckets to reduce the hash collisions that can occur during large enumeration workloads.
+
+To do this, modify your boot configuration by adding a kernel command-line parameter that increases the number of hash buckets at boot. Follow these steps.
+
+1. Using a text editor, edit the `/etc/default/grub` file.
+
+```bash
+sudo vim /etc/default/grub
+```
+
+2. Add the following text to the `/etc/default/grub` file. This parameter sets aside 128 MB for the hash table, increasing system memory consumption by at most 128 MB.
+
+```bash
+GRUB_CMDLINE_LINUX="ihash_entries=16777216"
+```
+
+If `GRUB_CMDLINE_LINUX` already exists, append `ihash_entries=16777216` separated by a space, like this:
+
+```bash
+GRUB_CMDLINE_LINUX="<previous commands> ihash_entries=16777216"
+```
+
+3. To apply the changes, run the following command. If your distribution doesn't provide `update-grub2`, see the note after these steps.
+
+```bash
+sudo update-grub2
+```
+
+4. Restart the system:
+
+```bash
+sudo reboot
+```
+
+5. After the system reboots, verify that the change has taken effect by checking the kernel command line:
+
+```bash
+cat /proc/cmdline
+```
+
+If `ihash_entries` is visible, the system has applied the setting, and enumeration performance for large directories should improve significantly.
+
+You can also check the `dmesg` output to confirm that the kernel command-line parameter was applied:
+
+```bash
+dmesg | grep "Inode-cache hash table"
+Inode-cache hash table entries: 16777216 (order: 15, 134217728 bytes, linear)
+```
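Note on step 3: RHEL-based distributions typically don't ship `update-grub2`. On those systems, the GRUB configuration is usually regenerated with `grub2-mkconfig` instead; the output path below is typical for BIOS systems and can differ on UEFI systems.

```bash
# Regenerate the GRUB configuration on RHEL-based distributions.
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
```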

 ## File copy and backup operations

-When copying data from an NFS file share or backing up from NFS file shares to another location, for optimal performance we recommend using a share snapshot as the source instead of the live file share with active I/O. Backup applications should run commands on the snapshot directly. For more information, see [NFS file share snapshots](storage-files-how-to-mount-nfs-shares.md#nfs-file-share-snapshots).
+When copying data from a file share or backing up from file shares to another location, for optimal performance we recommend using a share snapshot as the source instead of the live file share with active I/O. Backup applications should run commands on the snapshot directly. For more information, see [Use share snapshots with Azure Files](storage-snapshots-files.md).
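As an illustration for the NFS case, assuming the share is mounted at `/mnt/<share-name>` and its snapshots are exposed through the hidden `.snapshot` directory at the share root, a backup job might read from a snapshot instead of the live tree. The snapshot and destination names below are placeholders:

```bash
# List the share snapshots visible to this client.
ls /mnt/<share-name>/.snapshot

# Copy from a specific snapshot rather than the live share so the backup
# doesn't compete with active I/O on the live directory tree.
cp -a "/mnt/<share-name>/.snapshot/<snapshot-name>/data" /backup/data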

 ## Application-level recommendations

-When developing applications that use large directories with NFS file shares, follow these recommendations.
+When developing applications that use large directories, follow these recommendations.

 - **Skip file attributes.** If the application only needs the file name and not file attributes like file type or last modified time, you can use multiple calls to system calls such as `getdents64` with a good buffer size. This gets the entries in the specified directory without the file type, making the operation faster by avoiding extra operations that aren't needed.

-- **Interleave stat calls.** If the application needs attributes and the file name, we recommend interleaving the stat calls along with `getdents64` instead of getting all entries until end of file with `getdents64` and then doing a statx on all entries returned. Interleaving the stat calls instructs the NFS client to request both the file and its attributes at once, reducing the number of calls to the server. When combined with a high `actimeo` value, this can significantly improve performance. For example, instead of `[ getdents64, getdents64, ... , getdents64, statx (entry1), ... , statx(n) ]`, place the statx calls after each `getdents64` like this: `[ getdents64, (statx, statx, ... , statx), getdents64, (statx, statx, ... , statx), ... ]`.
+- **Interleave stat calls.** If the application needs attributes and the file name, we recommend interleaving the stat calls along with `getdents64` instead of getting all entries until end of file with `getdents64` and then doing a statx on all entries returned. Interleaving the stat calls instructs the client to request both the file and its attributes at once, reducing the number of calls to the server. When combined with a high `actimeo` value, this can significantly improve performance. For example, instead of `[ getdents64, getdents64, ... , getdents64, statx (entry1), ... , statx(n) ]`, place the statx calls after each `getdents64` like this: `[ getdents64, (statx, statx, ... , statx), getdents64, (statx, statx, ... , statx), ... ]`.

 - **Increase I/O depth.** If possible, we suggest configuring `nconnect` to a non-zero value (greater than 1) and distributing the operation among multiple threads, or using asynchronous I/O. This enables operations that can be asynchronous to benefit from multiple concurrent connections to the file share. A shell-level sketch combining these recommendations follows this list.
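As a shell-level sketch of these ideas for the NFS case, an enumeration-heavy job might mount with `nconnect` and avoid unnecessary attribute fetches as shown below. The mount target, directory paths, and parallelism level are illustrative assumptions, and GNU coreutils is assumed on the client.

```bash
# Mount with several TCP connections (nconnect) and a longer attribute cache
# timeout so concurrent enumeration work can fan out and reuse cached attributes.
sudo mount -t nfs -o vers=4,minorversion=1,sec=sys,nconnect=4,actimeo=60 \
    <storage-account>.file.core.windows.net:/<storage-account>/<share-name> /mnt/<share-name>

# Enumerate names only, unsorted, so no per-entry attribute lookups are issued.
ls -1f /mnt/<share-name>/large-directory > /tmp/file-list.txt

# If attributes are needed afterwards, spread the stat calls across multiple processes.
xargs -a /tmp/file-list.txt -P 8 -I{} stat --format='%n %Y' "/mnt/<share-name>/large-directory/{}" > /tmp/attributes.txt
```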

@@ -87,4 +138,4 @@ When developing applications that use large directories with NFS file shares, fo
 ## See also

 - [Improve NFS Azure file share performance](nfs-performance.md)
-- [NFS file shares in Azure Files](files-nfs-protocol.md)
+- [Improve SMB Azure file share performance](smb-performance.md)

0 commit comments
