articles/azure-netapp-files/performance-benchmarks-linux.md (4 additions, 4 deletions)
@@ -19,15 +19,15 @@ This section describes performance benchmarks of Linux workload throughput and w
### Linux workload throughput
- The graph below represents a 64-kibibyte (KiB) sequential workload and a 1 TiB working set. It shows that a single Azure NetApp Files volume can handle between ~1,600 MiB/s pure sequential writes and ~4,500 MiB/s pure sequential reads.
+ This graph represents a 64-kibibyte (KiB) sequential workload and a 1 TiB working set. It shows that a single Azure NetApp Files volume can handle between ~1,600 MiB/s pure sequential writes and ~4,500 MiB/s pure sequential reads.
The graph illustrates decreases of 10% at a time, from pure read to pure write. It demonstrates what you can expect when using varying read/write ratios (100%:0%, 90%:10%, 80%:20%, and so on).
- The following graph represents a 4-kibibyte (KiB) random workload and a 1 TiB working set. The graph shows that an Azure NetApp Files volume can handle between ~130,000 pure random writes and ~460,000 pure random reads.
+ The following graph represents a 4-KiB random workload and a 1 TiB working set. The graph shows that an Azure NetApp Files volume can handle between ~130,000 pure random writes and ~460,000 pure random reads.
This graph illustrates decreases of 10% at a time, from pure read to pure write. It demonstrates what you can expect when using varying read/write ratios (100%:0%, 90%:10%, 80%:20%, and so on).
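The exact FIO job definitions aren't included here; a minimal sketch of a comparable 64-KiB sequential workload with a mixed read/write ratio, assuming a client with the volume mounted at the hypothetical path `/mnt/anf`, might look like the following. Sweeping `--rwmixread` from 100 down to 0 in steps of 10 walks through the same ratio progression the graphs describe.

```bash
# Hypothetical FIO job: 64-KiB sequential workload at an 80%:20% read/write mix.
# The mount path, working-set size, and concurrency values are assumptions, not
# the parameters used for the published benchmarks.
fio --name=seq-mix-80-20 --directory=/mnt/anf \
    --rw=rw --rwmixread=80 --bs=64k \
    --direct=1 --ioengine=libaio --iodepth=16 --numjobs=8 \
    --size=32g --time_based --runtime=120 --group_reporting
```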
@@ -37,7 +37,7 @@ This graph illustrates decreases in 10% at a time, from pure read to pure write.
The graphs in this section show the validation testing results for the client-side mount option with NFSv3. For more information, see the [`nconnect` section of Linux mount options](performance-linux-mount-options.md#nconnect).
- The graphs compare the advantages of `nconnect` to a non-`connected` mounted volume. In the graphs, FIO generated the workload from a single D32s_v4 instance in the us-west2 Azure region using a 64-KiB sequential workload – the largest I/O size supported by Azure NetApp Files at the time of the testing represented here. Azure NetApp Files now supports larger I/O sizes. For more details, see [`rsize` and `wsize` section of Linux mount options](performance-linux-mount-options.md#rsize-and-wsize).
+ The graphs compare the advantages of `nconnect` to a volume mounted without `nconnect`. In the graphs, FIO generated the workload from a single D32s_v4 instance in the us-west2 Azure region using a 64-KiB sequential workload – the largest I/O size supported by Azure NetApp Files at the time of the testing represented here. Azure NetApp Files now supports larger I/O sizes. For more information, see the [`rsize` and `wsize` section of Linux mount options](performance-linux-mount-options.md#rsize-and-wsize).
### Linux read throughput
@@ -47,7 +47,7 @@ The following graphs show 64-KiB sequential reads of ~3,500 MiB/s reads with `nc
### Linux write throughput
- The following graphs show sequential writes. They indicate that `nconnect` has no noticeable benefit for sequential writes. 1,500 MiB/s is roughly both the sequential write volume upper limit and the D32s_v4 instance egress limit.
+ The following graphs show sequential writes. They indicate that `nconnect` has no noticeable benefit for sequential writes. The sequential write volume upper limit is approximately 1,500 MiB/s; the D32s_v4 instance egress limit is also approximately 1,500 MiB/s.
articles/azure-netapp-files/performance-linux-direct-io.md (5 additions, 5 deletions)
@@ -15,13 +15,13 @@ This article helps you understand direct I/O best practices for Azure NetApp Fil
## Direct I/O
- The most common parameter used in storage performance benchmarking is direct I/O. It is supported by FIO and Vdbench. DISKSPD offers support for the similar construct of memory-mapped I/O. With direct I/O, the filesystem cache is bypassed, operations for direct memory access copy are avoided, and storage tests are made fast and simple.
+ The most common parameter used in storage performance benchmarking is direct I/O. It's supported by FIO and Vdbench. DISKSPD offers support for the similar construct of memory-mapped I/O. With direct I/O, the filesystem cache is bypassed, operations for direct memory access copy are avoided, and storage tests are made fast and simple.
- Using the direct I/O parameter makes storage testing easy. No data is read from the filesystem cache on the client. As such, the test is truly stressing the storage protocol and service itself, rather than memory access speeds. Also, without the DMA memory copies, read and write operations are efficient from a processing perspective.
+ Using the direct I/O parameter makes storage testing easy. No data is read from the filesystem cache on the client. As such, the test is truly stressing the storage protocol and service itself, rather than memory access speeds. Without the DMA memory copies, read and write operations are efficient from a processing perspective.
- Take the Linux `dd` command as an example workload. Without the optional `odirect` flag, all I/O generated by `dd` is served from the Linux buffer cache. Reads with the blocks already in memory are not retrieved from storage. Reads resulting in a buffer cache miss end up being read from storage using NFS read-ahead with varying results, depending on factors as mount `rsize` and client read-ahead tunables. When writes are sent through the buffer cache, they use a write-behind mechanism, which is untuned and uses a significant amount of parallelism to send the data to the storage device. You might attempt to run two independent streams of I/O, one `dd` for reads and one `dd` for writes. But in fact, the operating system, untuned, favors writes over reads and uses more parallelism of it.
+ Take the Linux `dd` command as an example workload. Without the optional `iflag=direct`/`oflag=direct` flags, all I/O generated by `dd` is served from the Linux buffer cache. Reads with the blocks already in memory aren't retrieved from storage. Reads resulting in a buffer cache miss end up being read from storage using NFS read-ahead, with varying results depending on such factors as the mount `rsize` and client read-ahead tunables. When writes are sent through the buffer cache, they use a write-behind mechanism, which is untuned and uses a significant amount of parallelism to send the data to the storage device. You might attempt to run two independent streams of I/O, one `dd` for reads and one `dd` for writes. But in fact, the untuned operating system favors writes over reads and applies more parallelism to them.
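As a hedged illustration of that difference with GNU `dd` (the file path and sizes below are assumptions):

```bash
# Buffered I/O (default): reads can be served from the Linux buffer cache, and
# writes are absorbed by the write-behind mechanism described above.
dd if=/mnt/anf/testfile of=/dev/null bs=64k count=16384

# Direct I/O: bypass the buffer cache so every read is served by storage.
dd if=/mnt/anf/testfile of=/dev/null bs=64k count=16384 iflag=direct

# Direct I/O writes; conv=fsync forces the data to stable storage before exit.
dd if=/dev/zero of=/mnt/anf/testfile bs=64k count=16384 oflag=direct conv=fsync
```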
- Aside from database, few applications use direct I/O. Instead, they choose to leverage the advantages of a large memory cache for repeated reads and a write behind cache for asynchronous writes. In short, using direct I/O turns the test into a micro benchmark *if* the application being synthesized uses the filesystem cache.
+ Aside from databases, few applications use direct I/O. Instead, they leverage the advantages of a large memory cache for repeated reads and a write-behind cache for asynchronous writes. In short, using direct I/O turns the test into a microbenchmark *if* the application being synthesized uses the filesystem cache.
The following are some databases that support direct I/O:
@@ -34,7 +34,7 @@ The following are some databases that support direct I/O:
## Best practices
- Testing with `directio` is an excellent way to understand the limits of the storage service and client. To get a better understanding for how the application itself will behave (if the application doesn't use `directio`), you should also run tests through the filesystem cache.
+ Testing with `directio` is an excellent way to understand the limits of the storage service and client. To better understand how the application behaves (if the application doesn't use `directio`), you should also run tests through the filesystem cache.
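A minimal sketch of that pairing with FIO, assuming a volume mounted at the hypothetical path `/mnt/anf`: run the same job once with `--direct=1` to characterize the storage service, and once with `--direct=0` to include the filesystem cache.

```bash
# Cache bypassed: stresses the storage protocol and service itself
fio --name=direct-read --directory=/mnt/anf --rw=read --bs=64k \
    --size=8g --direct=1 --ioengine=libaio --iodepth=16

# Cache in play: closer to how most applications behave
fio --name=cached-read --directory=/mnt/anf --rw=read --bs=64k \
    --size=8g --direct=0 --ioengine=psync
```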
articles/azure-netapp-files/performance-linux-filesystem-cache.md (13 additions, 13 deletions)
@@ -11,60 +11,60 @@ ms.author: anfdocs
---
# Linux filesystem cache best practices for Azure NetApp Files
- This article helps you understand filesystem cache best practices for Azure NetApp Files.
+ This article helps you understand filesystem cache best practices for Azure NetApp Files.
## Filesystem cache tunables
You need to understand the following factors about filesystem cache tunables:
- * Flushing a dirty buffer leaves the data in a clean state usable for future reads until memory pressure leads to eviction.
+ * Flushing a dirty buffer leaves the data in a clean state usable for future reads until memory pressure leads to eviction.
* There are three triggers for an asynchronous flush operation:
    * Time based: When a buffer reaches the age defined by this tunable, it must be marked for cleaning (that is, flushing, or writing to storage).
    * Memory pressure: See [`vm.dirty_ratio | vm.dirty_bytes`](#vmdirty_ratio--vmdirty_bytes) for details.
    * Close: When a file handle is closed, all dirty buffers are asynchronously flushed to storage.
- These factors are controlled by four tunables. Each tunable can be tuned dynamically and persistently using `tuned` or `sysctl` in the `/etc/sysctl.conf` file. Tuning these variables improves performance for applications.
+ These factors are controlled by four tunables. Each tunable can be tuned dynamically and persistently using `tuned` or `sysctl` in the `/etc/sysctl.conf` file. Tuning these variables improves performance for applications.
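As a quick sketch of those mechanics (the tunables themselves are discussed in the sections that follow; the value shown is illustrative):

```bash
# Inspect the current values; two of the tunables are ratio/bytes pairs
sysctl vm.dirty_ratio vm.dirty_bytes \
       vm.dirty_background_ratio vm.dirty_background_bytes \
       vm.dirty_expire_centisecs vm.dirty_writeback_centisecs

# Set a value dynamically; it takes effect immediately
sudo sysctl -w vm.dirty_expire_centisecs=300

# Persist the change across reboots
echo 'vm.dirty_expire_centisecs = 300' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
```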
> [!NOTE]
- > Information discussed in this article was uncovered during SAS GRID and SAS Viya validation exercises. As such, the tunables are based on lessons learned from the validation exercises. Many applications will similarly benefit from tuning these parameters.
+ > Information discussed in this article was uncovered during SAS GRID and SAS Viya validation exercises. As such, the tunables are based on lessons learned from the validation exercises. Many applications similarly benefit from tuning these parameters.
### `vm.dirty_ratio | vm.dirty_bytes`
- These two tunables define the amount of RAM made usable for data modified but not yet written to stable storage. Whichever tunable is set automatically sets the other tunable to zero; RedHat advises against manually setting either of the two tunables to zero. The option `vm.dirty_ratio` (the default of the two) is set by Redhat to either 20% or 30% of physical memory depending on the OS, which is a significant amount considering the memory footprint of modern systems. Consideration should be given to setting `vm.dirty_bytes` instead of `vm.dirty_ratio` for a more consistent experience regardless of memory size. For example, ongoing work with SAS GRID determined 30 MiB an appropriate setting for best overall mixed workload performance.
+ These two tunables define the amount of RAM made usable for data modified but not yet written to stable storage. Whichever tunable is set automatically sets the other tunable to zero; Red Hat advises against manually setting either of the two tunables to zero. The option `vm.dirty_ratio` (the default of the two) is set by Red Hat to either 20% or 30% of physical memory depending on the OS, which is a significant amount considering the memory footprint of modern systems. Consideration should be given to setting `vm.dirty_bytes` instead of `vm.dirty_ratio` for a more consistent experience regardless of memory size. For example, ongoing work with SAS GRID determined 30 MiB to be an appropriate setting for best overall mixed workload performance.
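For example, a sketch of the 30-MiB `vm.dirty_bytes` setting mentioned above:

```bash
# 30 MiB expressed in bytes (30 * 1024 * 1024), per the SAS GRID finding above.
# Setting vm.dirty_bytes automatically zeroes vm.dirty_ratio.
sudo sysctl -w vm.dirty_bytes=31457280
```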
### `vm.dirty_background_ratio | vm.dirty_background_bytes`

- These tunables define the starting point where the Linux write-back mechanism begins flushing dirty blocks to stable storage. Redhat defaults to 10% of physical memory, which, on a large memory system, is a significant amount of data to start flushing. Taking SAS GRID for example, historically the recommendation has been to set `vm.dirty_background` to 1/5 size of `vm.dirty_ratio`or `vm.dirty_bytes`. Considering how aggressively the `vm.dirty_bytes` setting is set for SAS GRID, no specific value is being set here.
+ These tunables define the starting point where the Linux write-back mechanism begins flushing dirty blocks to stable storage. Red Hat defaults to 10% of physical memory, which, on a large memory system, is a significant amount of data to start flushing. Taking SAS GRID for example, historically the recommendation was to set `vm.dirty_background` to 1/5 the size of `vm.dirty_ratio` or `vm.dirty_bytes`. Considering how aggressively the `vm.dirty_bytes` setting is set for SAS GRID, no specific value is being set here.
### `vm.dirty_expire_centisecs`
- This tunable defines how old a dirty buffer can be before it must be tagged for asynchronously writing out. Take SAS Viya’s CAS workload for example. An ephemeral write-dominant workload found that setting this value to 300 centiseconds (3 seconds) was optimal, with 3000 centiseconds (30 seconds) being the default.
+ This tunable defines how old a dirty buffer can be before it must be tagged for asynchronously writing out. Take SAS Viya’s CAS workload for example. Testing of this ephemeral, write-dominant workload found that setting this value to 300 centiseconds (3 seconds) was optimal, with 3,000 centiseconds (30 seconds) being the default.
- SAS Viya shares CAS data into multiple small chunks of a few megabytes each. Rather than closing these file handles after writing data to each shard, the handles are left open and the buffers within are memory-mapped by the application. Without a close, there will be no flush until either memory pressure or 30 seconds has passed. Waiting for memory pressure proved suboptimal as did waiting for a long timer to expire. Unlike SAS GRID, which looked for the best overall throughput, SAS Viya looked to optimize write bandwidth.
+ SAS Viya shards CAS data into multiple small chunks of a few megabytes each. Rather than closing these file handles after writing data to each shard, the handles are left open and the buffers within are memory-mapped by the application. Without a close, there's no flush until either memory pressure or 30 seconds has passed. Waiting for memory pressure proved suboptimal, as did waiting for a long timer to expire. Unlike SAS GRID, which looked for the best overall throughput, SAS Viya looked to optimize write bandwidth.
### `vm.dirty_writeback_centisecs`
- The kernel flusher thread is responsible for asynchronously flushing dirty buffers between each flush thread sleeps. This tunable defines the amount spent sleeping between flushes. Considering the 3-second `vm.dirty_expire_centisecs` value used by SAS Viya, SAS set this tunable to 100 centiseconds (1 second) rather than the 500 centiseconds (5 seconds) default to find the best overall performance.
+ The kernel flusher thread is responsible for asynchronously flushing dirty buffers; between flushes, the flusher thread sleeps. This tunable defines the amount of time spent sleeping between flushes. Considering the 3-second `vm.dirty_expire_centisecs` value used by SAS Viya, SAS set this tunable to 100 centiseconds (1 second) rather than the 500 centiseconds (5 seconds) default to find the best overall performance.
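Taken together, a sketch of the two SAS Viya-derived timer values discussed in this section:

```bash
# Tag dirty buffers for write-out after 3 seconds rather than the 30-second default
sudo sysctl -w vm.dirty_expire_centisecs=300

# Wake the flusher thread every 1 second rather than the 5-second default
sudo sysctl -w vm.dirty_writeback_centisecs=100
```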
## Impact of an untuned filesystem cache
- Considering the default virtual memory tunables and the amount of RAM in modern systems, write-back potentially slows down other storage-bound operations from the perspective of the specific client driving this mixed workload. The following symptoms may be expected from an untuned, write-heavy, cache-laden Linux machine.
+ Considering the default virtual memory tunables and the amount of RAM in modern systems, write-back potentially slows down other storage-bound operations from the perspective of the specific client driving this mixed workload. The following symptoms may be expected from an untuned, write-heavy, cache-laden Linux machine.
* Directory listings (`ls`) take so long that the command appears unresponsive.
* Read throughput against the filesystem decreases significantly in comparison to write throughput.
* `nfsiostat` reports write latencies **in seconds or higher**.
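A minimal way to observe that last symptom, assuming the volume is mounted at the hypothetical path `/mnt/anf`:

```bash
# Report NFS client I/O and latency statistics for the mount every 5 seconds
nfsiostat 5 /mnt/anf
```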
- You might experience this behavior only by *the Linux machine* performing the mixed write-heavy workload. Further, the experience is degraded against all NFS volumes mounted against a single storage endpoint. If the mounts come from two or more endpoints, only the volumes sharing an endpoint exhibit this behavior.
+ You might experience this behavior only on *the Linux machine* performing the mixed write-heavy workload. Further, the experience is degraded against all NFS volumes mounted against a single storage endpoint. If the mounts come from two or more endpoints, only the volumes sharing an endpoint exhibit this behavior.
Setting the filesystem cache parameters as described in this section has been shown to address the issues.
## Monitoring virtual memory
- To understand what is going with virtual memory and the write-back, consider the following code snippet and output. *Dirty* represents the amount dirty memory in the system, and *writeback* represents the amount of memory actively being written to storage.
+ To understand what is going on with virtual memory and the write-back, consider the following code snippet and output. *Dirty* represents the amount of dirty memory in the system, and *writeback* represents the amount of memory actively being written to storage.
`# while true; do echo "###"; date; egrep "^Cached:|^Dirty:|^Writeback:|file" /proc/meminfo; sleep 5; done`
- The following output comes from an experiment where the `vm.dirty_ratio` and the `vm.dirty_background` ratio were set to 2% and 1% of physical memory respectively. In this case, flushing began at 3.8 GiB, 1% of the 384-GiB memory system. Writeback closely resembled the write throughput to NFS.
+ The following output comes from an experiment where `vm.dirty_ratio` and `vm.dirty_background_ratio` were set to 2% and 1% of physical memory, respectively. In this case, flushing began at 3.8 GiB, 1% of the 384-GiB memory system. Writeback closely resembled the write throughput to NFS.