Merge pull request #293397 from khdownie/kendownie012225

v-dirichards · web-flow · commit 3652104274d6 · 2025-01-22T14:55:28.000-06:00
improve enumeration latencies
diff --git a/articles/storage/files/TOC.yml b/articles/storage/files/TOC.yml
@@ -251,7 +251,7 @@
     href: smb-performance.md
   - name: NFS performance
     href: nfs-performance.md
-  - name: NFS large directory best practices
+  - name: Large directory best practices
     href: nfs-large-directories.md
   - name: Scalability and performance targets
     href: storage-files-scale-targets.md
diff --git a/articles/storage/files/nfs-large-directories.md b/articles/storage/files/nfs-large-directories.md
@@ -1,33 +1,33 @@
 ---
-title: Work with large directories in NFS Azure file shares
-description: Learn recommendations for working with large directories in NFS Azure file shares mounted on Linux clients, including mount options, commands, and operations.
+title: Work with large directories in Azure file shares
+description: Learn recommendations for working with large directories in Azure file shares mounted on Linux clients, including mount options, commands, and operations.
 author: khdownie
 ms.service: azure-file-storage
 ms.custom: linux-related-content
 ms.topic: conceptual
-ms.date: 05/09/2024
+ms.date: 01/22/2025
 ms.author: kendownie
 ---
 
-# Recommendations for working with large directories in NFS Azure file shares
+# Optimize file share performance when accessing large directories from Linux clients
 
-This article provides recommendations for working with NFS directories that contain large numbers of files. It's usually a good practice to reduce the number of files in a single directory by spreading the files over multiple directories. However, there are situations in which large directories can't be avoided. Consider the following suggestions when working with large directories on NFS Azure file shares that are mounted on Linux clients.
+This article provides recommendations for working with directories that contain large numbers of files. It's usually a good practice to reduce the number of files in a single directory by spreading the files over multiple directories. However, there are situations in which large directories can't be avoided. Consider the following suggestions when working with large directories on Azure file shares that are mounted on Linux clients.
 
 ## Applies to
 
 | File share type | SMB | NFS |
 |-|:-:|:-:|
-| Standard file shares (GPv2), LRS/ZRS | ![No, this article doesn't apply to standard SMB Azure file shares LRS/ZRS.](../media/icons/no-icon.png) | ![NFS shares are only available in premium Azure file shares.](../media/icons/no-icon.png) |
-| Standard file shares (GPv2), GRS/GZRS | ![No, this article doesn't apply to standard SMB Azure file shares GRS/GZRS.](../media/icons/no-icon.png) | ![NFS is only available in premium Azure file shares.](../media/icons/no-icon.png) |
-| Premium file shares (FileStorage), LRS/ZRS | ![No, this article doesn't apply to premium SMB Azure file shares.](../media/icons/no-icon.png) | ![Yes, this article applies to premium NFS Azure file shares.](../media/icons/yes-icon.png) |
+| Standard file shares (GPv2), LRS/ZRS | ![Yes, this article applies to standard SMB Azure file shares LRS/ZRS.](../media/icons/yes-icon.png) | ![NFS shares are only available in premium Azure file shares.](../media/icons/no-icon.png) |
+| Standard file shares (GPv2), GRS/GZRS | ![Yes, this article applies to standard SMB Azure file shares GRS/GZRS.](../media/icons/yes-icon.png) | ![NFS is only available in premium Azure file shares.](../media/icons/no-icon.png) |
+| Premium file shares (FileStorage), LRS/ZRS | ![Yes, this article applies to premium SMB Azure file shares.](../media/icons/yes-icon.png) | ![Yes, this article applies to premium NFS Azure file shares.](../media/icons/yes-icon.png) |
 
 ## Recommended mount options
 
 The following mount options are specific to enumeration and can reduce latency when working with large directories.
 
 ### actimeo
 
-Specifying `actimeo` sets all of `acregmin`, `acregmax`, `acdirmin`, and `acdirmax` to the same value. If `actimeo` isn't specified, the NFS client uses the defaults for each of these options.
+Specifying `actimeo` sets all of `acregmin`, `acregmax`, `acdirmin`, and `acdirmax` to the same value. If `actimeo` isn't specified, the client uses the defaults for each of these options.
 
 We recommend setting `actimeo` between 30 and 60 seconds when working with large directories. Setting a value in this range makes the attributes remain valid for a longer time period in the client's attribute cache, allowing operations to get file attributes from the cache instead of fetching them over the wire. This can reduce latency in situations where the cached attributes expire while the operation is still running.
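+
+As a minimal sketch of what `actimeo` implies, the following snippet expands a single value into the four attribute cache options it sets. The `expand_actimeo` function name is hypothetical and exists only for illustration; it doesn't change any mount settings.
+
+```bash
+# Hypothetical helper: print the four attribute cache options that
+# a single actimeo value (in seconds) is equivalent to.
+expand_actimeo() {
+  secs="$1"
+  echo "acregmin=${secs},acregmax=${secs},acdirmin=${secs},acdirmax=${secs}"
+}
+
+# Example: the low end of the recommended 30-60 second range.
+expand_actimeo 30
+```
+
+In practice you would pass `actimeo=30` directly in the mount options rather than listing the four options individually.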
 
@@ -68,17 +68,68 @@ The following chart compares the time it takes to output results using unaliased
 
 :::image type="content" source="media/nfs-large-directories/sorted-versus-unsorted-ls.png" alt-text="Graph comparing the total time in seconds to complete a sorted ls operation versus unsorted." border="false":::
 
+### Increase the number of hash buckets
+
+The total amount of RAM present on the system doing the enumeration influences the internal working of filesystem protocols like NFS and SMB. Even if users aren't experiencing high memory usage, the amount of memory available determines the number of hash buckets the system has, which affects enumeration performance for large directories. You can increase the number of hash buckets to reduce the hash collisions that can occur during large enumeration workloads.
+
+To do this, modify your boot configuration by adding a kernel command-line parameter that takes effect during boot and increases the number of hash buckets. Follow these steps.
+
+1. Using a text editor, edit the `/etc/default/grub` file.
+
+   ```bash
+   sudo vim /etc/default/grub
+   ```
+
+2. Add the following text to the `/etc/default/grub` file. This parameter sets aside 128 MB for the inode-cache hash table, increasing system memory consumption by a maximum of 128 MB.
+
+   ```bash
+   GRUB_CMDLINE_LINUX="ihash_entries=16777216"
+   ```
+
+   If `GRUB_CMDLINE_LINUX` already exists, add `ihash_entries=16777216` separated by a space, like this:
+
+   ```bash
+   GRUB_CMDLINE_LINUX="<previous commands> ihash_entries=16777216"
+   ```
+
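+3. To apply the changes, regenerate the GRUB configuration. The following command is for Debian-based distributions such as Ubuntu; other distributions might use a different command: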
+
+   ```bash
+   sudo update-grub2
+   ```
+
+4. Restart the system:
+
+   ```bash
+   sudo reboot
+   ```
+
+5. After the system reboots, verify that the change has taken effect by checking the kernel command line:
+
+   ```bash
+   cat /proc/cmdline
+   ```
+
+   If `ihash_entries` appears in the output, the system has applied the setting, and enumeration performance for large directories should improve.
+
+   You can also check the `dmesg` output to confirm that the kernel applied the setting. The second line in the following block is example output:
+
+   ```bash
+   dmesg | grep "Inode-cache hash table"
+   Inode-cache hash table entries: 16777216 (order: 15, 134217728 bytes, linear)
+   ```
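+
+As a rough sanity check on the 128 MB figure, the following sketch multiplies the number of hash buckets by an assumed 8 bytes per bucket (one pointer on a 64-bit kernel), which is consistent with the `dmesg` output above. The `hash_table_mib` function name is hypothetical and used only for illustration.
+
+```bash
+# Hypothetical helper: estimate the inode-cache hash table size in MiB.
+# Assumption: each bucket takes 8 bytes (a pointer on a 64-bit kernel).
+hash_table_mib() {
+  entries="$1"
+  bytes_per_bucket=8
+  echo $(( entries * bytes_per_bucket / 1024 / 1024 ))
+}
+
+# Estimate the memory needed for the ihash_entries value used in this article.
+hash_table_mib 16777216
+```
+
+You can use the same arithmetic to estimate the memory cost before choosing a different `ihash_entries` value.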
+
 ## File copy and backup operations
 
-When copying data from an NFS file share or backing up from NFS file shares to another location, for optimal performance we recommend using a share snapshot as the source instead of the live file share with active I/O. Backup applications should run commands on the snapshot directly. For more information, see [NFS file share snapshots](storage-files-how-to-mount-nfs-shares.md#nfs-file-share-snapshots).
+When copying data from a file share or backing up from file shares to another location, for optimal performance we recommend using a share snapshot as the source instead of the live file share with active I/O. Backup applications should run commands on the snapshot directly. For more information, see [Use share snapshots with Azure Files](storage-snapshots-files.md).
 
 ## Application-level recommendations
 
-When developing applications that use large directories with NFS file shares, follow these recommendations.
+When developing applications that use large directories, follow these recommendations.
 
 - **Skip file attributes.** If the application only needs the file name and not file attributes like file type or last modified time, you can use multiple calls to system calls such as `getdents64` with a good buffer size. This will get the entries in the specified directory without the file type, making the operation faster by avoiding extra operations that aren't needed.  
 
-- **Interleave stat calls.** If the application needs attributes and the file name, we recommend interleaving the stat calls along with `getdents64` instead of getting all entries until end of file with `getdents64` and then doing a statx on all entries returned. Interleaving the stat calls instructs the NFS client to request both the file and its attributes at once, reducing the number of calls to the server. When combined with a high `actimeo` value, this can significantly improve performance. For example, instead of `[ getdents64, getdents64, ... , getdents64, statx (entry1), ... , statx(n) ]`, place the statx calls after each `getdents64` like this: `[ getdents64, (statx, statx, ... , statx), getdents64, (statx, statx, ... , statx), ... ]`.
+- **Interleave stat calls.** If the application needs attributes and the file name, we recommend interleaving the stat calls along with `getdents64` instead of getting all entries until end of file with `getdents64` and then doing a statx on all entries returned. Interleaving the stat calls instructs the client to request both the file and its attributes at once, reducing the number of calls to the server. When combined with a high `actimeo` value, this can significantly improve performance. For example, instead of `[ getdents64, getdents64, ... , getdents64, statx (entry1), ... , statx(n) ]`, place the statx calls after each `getdents64` like this: `[ getdents64, (statx, statx, ... , statx), getdents64, (statx, statx, ... , statx), ... ]`.
 
 - **Increase I/O depth.** If possible, we suggest configuring `nconnect` to a non-zero value (greater than 1) and distributing the operation among multiple threads, or using asynchronous I/O. This will enable operations that can be asynchronous to benefit from multiple concurrent connections to the file share.
 
@@ -87,4 +138,4 @@ When developing applications that use large directories with NFS file shares, fo
 ## See also
 
 - [Improve NFS Azure file share performance](nfs-performance.md)
-- [NFS file shares in Azure Files](files-nfs-protocol.md)
+- [Improve SMB Azure file share performance](smb-performance.md)