articles/storage/files/nfs-large-directories.md
Lines changed: 64 additions & 13 deletions
@@ -1,33 +1,33 @@
1
1
---
2
-
title: Work with large directories in NFS Azure file shares
3
-
description: Learn recommendations for working with large directories in NFS Azure file shares mounted on Linux clients, including mount options, commands, and operations.
2
+
title: Work with large directories in Azure file shares
3
+
description: Learn recommendations for working with large directories in Azure file shares mounted on Linux clients, including mount options, commands, and operations.
4
4
author: khdownie
5
5
ms.service: azure-file-storage
6
6
ms.custom: linux-related-content
7
7
ms.topic: conceptual
8
-
ms.date: 05/09/2024
8
+
ms.date: 01/22/2025
9
9
ms.author: kendownie
10
10
---
11
11
12
-
# Recommendations for working with large directories in NFS Azure file shares
12
+
# Optimize file share performance when accessing large directories from Linux clients
13
13
14
-
This article provides recommendations for working with NFS directories that contain large numbers of files. It's usually a good practice to reduce the number of files in a single directory by spreading the files over multiple directories. However, there are situations in which large directories can't be avoided. Consider the following suggestions when working with large directories on NFS Azure file shares that are mounted on Linux clients.
14
+
This article provides recommendations for working with directories that contain large numbers of files. It's usually a good practice to reduce the number of files in a single directory by spreading the files over multiple directories. However, there are situations in which large directories can't be avoided. Consider the following suggestions when working with large directories on Azure file shares that are mounted on Linux clients.
15
15
16
16
## Applies to
17
17
18
18
| File share type | SMB | NFS |
19
19
|-|:-:|:-:|
20
-
| Standard file shares (GPv2), LRS/ZRS |||
21
-
| Standard file shares (GPv2), GRS/GZRS |||
22
-
| Premium file shares (FileStorage), LRS/ZRS |||
20
+
| Standard file shares (GPv2), LRS/ZRS |||
21
+
| Standard file shares (GPv2), GRS/GZRS |||
22
+
| Premium file shares (FileStorage), LRS/ZRS |||
23
23
24
24
## Recommended mount options
25
25
26
26
The following mount options are specific to enumeration and can reduce latency when working with large directories.
27
27
28
28
### actimeo
29
29
30
-
Specifying `actimeo` sets all of `acregmin`, `acregmax`, `acdirmin`, and `acdirmax` to the same value. If `actimeo` isn't specified, the NFS client uses the defaults for each of these options.
30
+
Specifying `actimeo` sets all of `acregmin`, `acregmax`, `acdirmin`, and `acdirmax` to the same value. If `actimeo` isn't specified, the client uses the defaults for each of these options.
31
31
32
32
We recommend setting `actimeo` between 30 and 60 seconds when working with large directories. Setting a value in this range makes the attributes remain valid for a longer time period in the client's attribute cache, allowing operations to get file attributes from the cache instead of fetching them over the wire. This can reduce latency in situations where the cached attributes expire while the operation is still running.
33
33
@@ -68,17 +68,68 @@ The following chart compares the time it takes to output results using unaliased
68
68
69
69
:::image type="content" source="media/nfs-large-directories/sorted-versus-unsorted-ls.png" alt-text="Graph comparing the total time in seconds to complete a sorted ls operation versus unsorted." border="false":::
70
70
71
+
### Increase the number of hash buckets
72
+
73
+
The total amount of RAM on the system doing the enumeration influences the internal workings of file system protocols like NFS and SMB. Even if users aren't experiencing high memory usage, the amount of available memory influences the number of hash buckets the system has, which affects enumeration performance for large directories. You can increase the number of hash buckets to reduce the hash collisions that can occur during large enumeration workloads.
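As a rough illustration of how the bucket count relates to memory, the following sketch estimates the hash table size for a given number of entries. It assumes each bucket is a single 8-byte pointer on a 64-bit kernel; that figure is an assumption and isn't stated in this article. The helper name `hash_table_mib` is hypothetical.

```bash
#!/usr/bin/env bash
# Estimate the memory used by the inode hash table for a given
# number of hash buckets (the ihash_entries value).
# ASSUMPTION: each bucket is one 8-byte pointer (64-bit kernel).
hash_table_mib() {
    local entries=$1
    local bytes_per_bucket=8   # assumed, not taken from this article
    echo $(( entries * bytes_per_bucket / 1024 / 1024 ))
}

# 16777216 is the value used in the steps in this section.
hash_table_mib 16777216   # prints 128

# On many distributions the current bucket count appears in the boot log.
# This line is left commented out because it may need elevated privileges:
#   sudo dmesg | grep -i 'inode-cache hash table'
```

Under that assumption, 16,777,216 entries correspond to 128 MiB, which is the memory cost of increasing the number of hash buckets.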
74
+
75
+
To increase the number of hash buckets, modify your boot configuration by adding a kernel command-line parameter that takes effect at boot. Follow these steps.
76
+
77
+
1. Using a text editor, edit the `/etc/default/grub` file.
78
+
79
+
```bash
80
+
sudo vim /etc/default/grub
81
+
```
82
+
83
+
2. Add the following text to the `/etc/default/grub` file. This setting allocates 128 MB for the hash table, increasing system memory consumption by a maximum of 128 MB.
84
+
85
+
```bash
86
+
GRUB_CMDLINE_LINUX="ihash_entries=16777216"
87
+
```
88
+
89
+
If `GRUB_CMDLINE_LINUX` already exists, add `ihash_entries=16777216` separated by a space, like this:
When copying data from an NFS file share or backing up from NFS file shares to another location, for optimal performance we recommend using a share snapshot as the source instead of the live file share with active I/O. Backup applications should run commands on the snapshot directly. For more information, see [NFS file share snapshots](storage-files-how-to-mount-nfs-shares.md#nfs-file-share-snapshots).
124
+
When copying data from a file share or backing up from file shares to another location, for optimal performance we recommend using a share snapshot as the source instead of the live file share with active I/O. Backup applications should run commands on the snapshot directly. For more information, see [Use share snapshots with Azure Files](storage-snapshots-files.md).
74
125
75
126
## Application-level recommendations
76
127
77
-
When developing applications that use large directories with NFS file shares, follow these recommendations.
128
+
When developing applications that use large directories, follow these recommendations.
78
129
79
130
- **Skip file attributes.** If the application needs only the file name and not file attributes like file type or last modified time, you can make repeated calls to a system call such as `getdents64` with a sufficiently large buffer. This gets the entries in the specified directory without the file type, making the operation faster by avoiding extra operations that aren't needed.
80
131
81
-
- **Interleave stat calls.** If the application needs attributes and the file name, we recommend interleaving the stat calls along with `getdents64` instead of getting all entries until end of file with `getdents64` and then doing a statx on all entries returned. Interleaving the stat calls instructs the NFS client to request both the file and its attributes at once, reducing the number of calls to the server. When combined with a high `actimeo` value, this can significantly improve performance. For example, instead of `[ getdents64, getdents64, ... , getdents64, statx (entry1), ... , statx(n) ]`, place the statx calls after each `getdents64` like this: `[ getdents64, (statx, statx, ... , statx), getdents64, (statx, statx, ... , statx), ... ]`.
132
+
- **Interleave stat calls.** If the application needs both attributes and the file name, we recommend interleaving the stat calls with `getdents64` instead of calling `getdents64` until end of file and then calling `statx` on every entry returned. Interleaving the stat calls instructs the client to request both the file and its attributes at once, reducing the number of calls to the server. When combined with a high `actimeo` value, this can significantly improve performance. For example, instead of `[ getdents64, getdents64, ... , getdents64, statx(entry1), ... , statx(entryN) ]`, place the `statx` calls after each `getdents64` like this: `[ getdents64, (statx, statx, ... , statx), getdents64, (statx, statx, ... , statx), ... ]`.
82
133
83
134
- **Increase I/O depth.** If possible, we suggest setting `nconnect` to a value greater than 1 and distributing the operation among multiple threads, or using asynchronous I/O. This enables operations that can run asynchronously to benefit from multiple concurrent connections to the file share.
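
The idea of spreading attribute lookups across concurrent workers can be sketched from the shell. This is a minimal illustration, not a tuned implementation: it uses parallel worker processes rather than threads, assumes GNU `find`, `xargs`, `stat`, and `awk` are available, and assumes the share is already mounted with `nconnect` greater than 1. The function name `parallel_dir_size`, the default of 8 workers, and the batch size of 500 are hypothetical choices.

```bash
#!/usr/bin/env bash
# Count the entries directly under a directory and sum their sizes,
# distributing the stat calls across several worker processes.
# Each worker prints one short "count bytes" line per batch, and the
# final awk adds those partial results together.
parallel_dir_size() {
    local dir=$1
    local workers=${2:-8}   # hypothetical default, tune per workload
    find "$dir" -mindepth 1 -maxdepth 1 -print0 |
        xargs -0 -r -P "$workers" -n 500 sh -c '
            stat --format=%s "$@" | awk "{ n++; s += \$1 } END { print n, s }"
        ' _ |
        awk '{ n += $1; s += $2 } END { print n + 0, s + 0 }'
}

# Example (the path is a placeholder):
#   parallel_dir_size /mnt/share/large-directory 16
```

Because each worker emits a single summary line per batch, the partial results don't interleave mid-line, and the order in which workers finish doesn't affect the totals.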
84
135
@@ -87,4 +138,4 @@ When developing applications that use large directories with NFS file shares, fo