Commit 85bbeff

Merge pull request #297345 from wmgries/consistent-media-tiers-1
Clean up nfs-large-directories
2 parents 543e43b + 5e4e634 commit 85bbeff

1 file changed

+22
-17
lines changed

articles/storage/files/nfs-large-directories.md

Lines changed: 22 additions & 17 deletions
@@ -14,26 +14,32 @@ ms.author: kendownie
1414
This article provides recommendations for working with directories that contain large numbers of files. It's usually a good practice to reduce the number of files in a single directory by spreading the files over multiple directories. However, there are situations in which large directories can't be avoided. Consider the following suggestions when working with large directories on Azure file shares that are mounted on Linux clients.
1515

1616
## Applies to
17-
18-
| File share type | SMB | NFS |
19-
|-|:-:|:-:|
20-
| Standard file shares (GPv2), LRS/ZRS | ![Yes, this article applies to standard SMB Azure file shares LRS/ZRS.](../media/icons/yes-icon.png) | ![NFS shares are only available in premium Azure file shares.](../media/icons/no-icon.png) |
21-
| Standard file shares (GPv2), GRS/GZRS | ![Yes, this article applies to standard SMB Azure file shares GRS/GZRS.](../media/icons/yes-icon.png) | ![NFS is only available in premium Azure file shares.](../media/icons/no-icon.png) |
22-
| Premium file shares (FileStorage), LRS/ZRS | ![Yes, this article applies to premium SMB Azure file shares.](../media/icons/yes-icon.png) | ![Yes, this article applies to premium NFS Azure file shares.](../media/icons/yes-icon.png) |
17+
| Management model | Billing model | Media tier | Redundancy | SMB | NFS |
18+
|-|-|-|-|:-:|:-:|
19+
| Microsoft.Storage | Provisioned v2 | HDD (standard) | Local (LRS) | ![No](../media/icons/no-icon.png) | ![No](../media/icons/no-icon.png) |
20+
| Microsoft.Storage | Provisioned v2 | HDD (standard) | Zone (ZRS) | ![No](../media/icons/no-icon.png) | ![No](../media/icons/no-icon.png) |
21+
| Microsoft.Storage | Provisioned v2 | HDD (standard) | Geo (GRS) | ![No](../media/icons/no-icon.png) | ![No](../media/icons/no-icon.png) |
22+
| Microsoft.Storage | Provisioned v2 | HDD (standard) | GeoZone (GZRS) | ![No](../media/icons/no-icon.png) | ![No](../media/icons/no-icon.png) |
23+
| Microsoft.Storage | Provisioned v1 | SSD (premium) | Local (LRS) | ![No](../media/icons/no-icon.png) | ![Yes](../media/icons/yes-icon.png) |
24+
| Microsoft.Storage | Provisioned v1 | SSD (premium) | Zone (ZRS) | ![No](../media/icons/no-icon.png) | ![Yes](../media/icons/yes-icon.png) |
25+
| Microsoft.Storage | Pay-as-you-go | HDD (standard) | Local (LRS) | ![No](../media/icons/no-icon.png) | ![No](../media/icons/no-icon.png) |
26+
| Microsoft.Storage | Pay-as-you-go | HDD (standard) | Zone (ZRS) | ![No](../media/icons/no-icon.png) | ![No](../media/icons/no-icon.png) |
27+
| Microsoft.Storage | Pay-as-you-go | HDD (standard) | Geo (GRS) | ![No](../media/icons/no-icon.png) | ![No](../media/icons/no-icon.png) |
28+
| Microsoft.Storage | Pay-as-you-go | HDD (standard) | GeoZone (GZRS) | ![No](../media/icons/no-icon.png) | ![No](../media/icons/no-icon.png) |
2329

2430
## Increase the number of hash buckets
2531

2632
The total amount of RAM present on the system doing the enumeration influences the internal workings of filesystem protocols like NFS and SMB. Even if users aren't experiencing high memory usage, the amount of memory available determines the number of inode hash buckets the system has, which affects enumeration performance for large directories. You can increase the number of inode hash buckets to reduce the hash collisions that can occur during large enumeration workloads.
2733

28-
To do this, you'll need to modify your boot configuration settings by providing an additional kernel command that takes effect during boot to increase the number of inode hash buckets. Follow these steps.
34+
To increase the number of inode hash buckets, modify your boot configuration settings:
2935

3036
1. Using a text editor, edit the `/etc/default/grub` file.
3137

3238
```bash
3339
sudo vim /etc/default/grub
3440
```
3541

36-
2. Add the following text to the `/etc/default/grub` file. This command will set apart 128MB as the inode hash table size, increasing system memory consumption by a maximum of 128MB.
42+
2. Add the following text to the `/etc/default/grub` file. This command sets 128MB as the inode hash table size, increasing system memory consumption by a maximum of 128MB.
3743

3844
```bash
3945
GRUB_CMDLINE_LINUX="ihash_entries=16777216"
@@ -57,7 +63,7 @@ To do this, you'll need to modify your boot configuration settings by providing
5763
sudo reboot
5864
```
5965

60-
5. To verify that the changes have taken effect, once the system reboots, check the kernel cmdline commands:
66+
5. To verify that the changes are effective after reboot, check the kernel cmdline commands:
6167

6268
```bash
6369
cat /proc/cmdline
@@ -88,9 +94,8 @@ The following graph compares the total time it takes to finish different operati
8894

8995
:::image type="content" source="media/nfs-large-directories/default-mount-versus-actimeo.png" alt-text="Graph comparing the time to finish different operations with default mount versus setting an actimeo value of 30 for a workload with 1 million files." border="false":::
9096

91-
### nconnect
92-
93-
`Nconnect` is a client-side mount option for NFS file shares that allows you to use multiple TCP connections between the client and the Azure Premium Files service for NFSv4.1. We recommend the optimal setting of `nconnect=4` to reduce latency and improve performance. `Nconnect` can be especially useful for workloads that use asynchronous or synchronous I/O from multiple threads. [Learn more](nfs-performance.md#nconnect).
97+
### NFS nconnect
98+
NFS nconnect is a client-side mount option for NFS file shares that allows you to use multiple TCP connections between the client and your NFS file share. We recommend the optimal setting of `nconnect=4` to reduce latency and improve performance. The nconnect feature can be especially useful for workloads that use asynchronous or synchronous I/O from multiple threads. [Learn more](nfs-performance.md#nconnect).
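
As a rough illustration of the `nconnect=4` recommendation, the following sketch builds a mount command and prints it instead of running it. The storage account name, share name, and mount point are hypothetical placeholders, and the endpoint format and the other mount options (`vers=4,minorversion=1,sec=sys`) are assumptions based on a typical NFSv4.1 mount of an Azure file share. Check them against the mount instructions for your own share before use.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical placeholders: replace with your own values.
storage_account="mystorageaccount"
share_name="myshare"
mount_point="/mnt/${share_name}"

# Assumed NFSv4.1 options, with four TCP connections via nconnect=4.
mount_opts="vers=4,minorversion=1,sec=sys,nconnect=4"

mount_cmd="sudo mount -t nfs ${storage_account}.file.core.windows.net:/${storage_account}/${share_name} ${mount_point} -o ${mount_opts}"

# Dry run: print the command for review instead of executing it.
echo "$mount_cmd"
```

Printing the command first makes it easy to review the options before mounting. Note that `nconnect` requires client kernel support, so confirm that your distribution provides it.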
9499

95100
## Commands and operations
96101

@@ -103,7 +108,7 @@ The way commands and operations are specified can also affect performance. Listi
103108

104109
In some Linux distributions, the shell automatically sets default options for the `ls` command such as `ls --color=auto`. This changes how `ls` works over the wire and adds more operations to the `ls` execution. To avoid performance degradation, we recommend using unaliased `ls`. You can do this in one of three ways:
105110

106-
- Remove the alias by using the command `unalias ls`. This is only a temporary solution for the current session.
111+
- As a temporary workaround that only impacts the current session, you can remove the alias by using the command `unalias ls`.
107112

108113
- For a permanent change, you can edit the `ls` alias in the user's `bashrc/bash_aliases` file. In Ubuntu, edit `~/.bashrc` to remove the alias for `ls`.
109114

@@ -113,7 +118,7 @@ In some Linux distributions, the shell automatically sets default options for th
113118

114119
When using `ls` with other commands, you can improve performance by preventing `ls` from sorting its output in situations where you don't care about the order that `ls` returns the files. Sorting the output adds significant overhead.
115120

116-
Instead of running `ls -l | wc -l` to get the total number of files, you can use the `-f` or `-U` options with `ls` to prevent the output from being sorted. The difference is that `-f` will also show hidden files, and `-U` won't.
121+
Instead of running `ls -l | wc -l` to get the total number of files, you can use the `-f` or `-U` options with `ls` to prevent the output from being sorted. The difference is that `-f` also shows hidden files, and `-U` doesn't.
117122

118123
For example, if you're directly calling the `ls` binary in Ubuntu, you would run `/usr/bin/ls -1f | wc -l` or `/usr/bin/ls -1U | wc -l`.
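
The difference between the two options can be sketched on a small scratch directory. This is a minimal illustration that assumes GNU `ls`. The temporary directory and file names are hypothetical stand-ins for a large directory on a mounted share, and `command ls` is used here as one way to bypass any shell alias.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch directory standing in for a large directory on a share.
dir="$(mktemp -d)"
touch "$dir"/file{1..100} "$dir"/.hidden

# -U: unsorted output, hidden entries omitted.
count_u="$(command ls -1U "$dir" | wc -l)"

# -f: unsorted output, including hidden entries as well as "." and "..".
count_f="$(command ls -1f "$dir" | wc -l)"

echo "ls -1U: ${count_u} entries"
echo "ls -1f: ${count_f} entries"

rm -rf "$dir"
```

The `-f` count is higher than the `-U` count because it also includes the hidden file and the `.` and `..` entries, so account for that if you need an exact file count.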
119124

@@ -129,11 +134,11 @@ When copying data from a file share or backing up from file shares to another lo
129134

130135
When developing applications that use large directories, follow these recommendations.
131136

132-
- **Skip file attributes.** If the application only needs the file name and not file attributes like file type or last modified time, you can use multiple calls to system calls such as `getdents64` with a good buffer size. This will get the entries in the specified directory without the file type, making the operation faster by avoiding extra operations that aren't needed.
137+
- **Skip file attributes.** If the application only needs the file name and not file attributes like file type or last modified time, you can use multiple calls to system calls such as `getdents64` with a good buffer size to get the entries in the specified directory without the file type, making the operation faster by avoiding extra operations that aren't needed.
133138

134-
- **Interleave stat calls.** If the application needs attributes and the file name, we recommend interleaving the stat calls along with `getdents64` instead of getting all entries until end of file with `getdents64` and then doing a statx on all entries returned. Interleaving the stat calls instructs the client to request both the file and its attributes at once, reducing the number of calls to the server. When combined with a high `actimeo` value, this can significantly improve performance. For example, instead of `[ getdents64, getdents64, ... , getdents64, statx (entry1), ... , statx(n) ]`, place the statx calls after each `getdents64` like this: `[ getdents64, (statx, statx, ... , statx), getdents64, (statx, statx, ... , statx), ... ]`.
139+
- **Interleave stat calls.** If the application needs attributes and the file name, we recommend interleaving the stat calls along with `getdents64` instead of getting all entries until end of file with `getdents64` and then doing a statx on all entries returned. Interleaving the stat calls instructs the client to request both the file and its attributes at once, reducing the number of calls to the server. When combined with a high `actimeo` value, interleaving stat calls can significantly improve performance. For example, instead of `[ getdents64, getdents64, ... , getdents64, statx (entry1), ... , statx(n) ]`, place the statx calls after each `getdents64` like this: `[ getdents64, (statx, statx, ... , statx), getdents64, (statx, statx, ... , statx), ... ]`.
135140

136-
- **Increase I/O depth.** If possible, we suggest configuring `nconnect` to a non-zero value (greater than 1) and distributing the operation among multiple threads, or using asynchronous I/O. This will enable operations that can be asynchronous to benefit from multiple concurrent connections to the file share.
141+
- **Increase I/O depth.** If possible, we suggest configuring `nconnect` to a non-zero value (greater than 1) and distributing the operation among multiple threads, or using asynchronous I/O. This enables operations that can be asynchronous to benefit from multiple concurrent connections to the file share.
137142

138143
- **Force-use cache.** If the application is querying the file attributes on a file share that only one client has mounted, use the statx system call with the `AT_STATX_DONT_SYNC` flag. This flag ensures that the cached attributes are retrieved from the cache without synchronizing with the server, avoiding extra network round trips to get the latest data.
139144
