
Commit b26cb97

acrolinx
1 parent c753f48 commit b26cb97

4 files changed: +44 -44 lines changed

articles/azure-netapp-files/performance-benchmarks-linux.md

Lines changed: 4 additions & 4 deletions
@@ -19,15 +19,15 @@ This section describes performance benchmarks of Linux workload throughput and w
### Linux workload throughput

-The graph below represents a 64-kibibyte (KiB) sequential workload and a 1 TiB working set. It shows that a single Azure NetApp Files volume can handle between ~1,600 MiB/s pure sequential writes and ~4,500 MiB/s pure sequential reads.
+This graph represents a 64-kibibyte (KiB) sequential workload and a 1 TiB working set. It shows that a single Azure NetApp Files volume can handle between ~1,600 MiB/s pure sequential writes and ~4,500 MiB/s pure sequential reads.

The graph illustrates decreases in 10% at a time, from pure read to pure write. It demonstrates what you can expect when using varying read/write ratios (100%:0%, 90%:10%, 80%:20%, and so on).

![Linux workload throughput](./media/performance-benchmarks-linux/performance-benchmarks-linux-workload-throughput.png)

### Linux workload IOPS

-The following graph represents a 4-kibibyte (KiB) random workload and a 1 TiB working set. The graph shows that an Azure NetApp Files volume can handle between ~130,000 pure random writes and ~460,000 pure random reads.
+The following graph represents a 4-KiB random workload and a 1 TiB working set. The graph shows that an Azure NetApp Files volume can handle between ~130,000 pure random writes and ~460,000 pure random reads.

This graph illustrates decreases in 10% at a time, from pure read to pure write. It demonstrates what you can expect when using varying read/write ratios (100%:0%, 90%:10%, 80%:20%, and so on).
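
As a rough sketch of how such workloads are commonly generated (these are not the exact job definitions behind the graphs), FIO can drive the 64-KiB sequential and 4-KiB random patterns described above. The mount point `/mnt/anf`, file sizes, and queue depths below are illustrative assumptions:

```
fio --name=seq-read-64k --directory=/mnt/anf --rw=read --bs=64k --size=16g \
    --numjobs=8 --iodepth=16 --ioengine=libaio --direct=1 \
    --time_based --runtime=120 --group_reporting

fio --name=rand-read-4k --directory=/mnt/anf --rw=randread --bs=4k --size=16g \
    --numjobs=8 --iodepth=16 --ioengine=libaio --direct=1 \
    --time_based --runtime=120 --group_reporting
```

A mixed ratio such as 90% read/10% write can be approximated with `--rw=randrw --rwmixread=90`.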

@@ -37,7 +37,7 @@ This graph illustrates decreases in 10% at a time, from pure read to pure write.

The graphs in this section show the validation testing results for the client-side mount option with NFSv3. For more information, see [`nconnect` section of Linux mount options](performance-linux-mount-options.md#nconnect).

-The graphs compare the advantages of `nconnect` to a non-`connected` mounted volume. In the graphs, FIO generated the workload from a single D32s_v4 instance in the us-west2 Azure region using a 64-KiB sequential workload – the largest I/O size supported by Azure NetApp Files at the time of the testing represented here. Azure NetApp Files now supports larger I/O sizes. For more details, see [`rsize` and `wsize` section of Linux mount options](performance-linux-mount-options.md#rsize-and-wsize).
+The graphs compare the advantages of `nconnect` to a volume mounted without `nconnect`. In the graphs, FIO generated the workload from a single D32s_v4 instance in the us-west2 Azure region using a 64-KiB sequential workload – the largest I/O size supported by Azure NetApp Files at the time of the testing represented here. Azure NetApp Files now supports larger I/O sizes. For more information, see [`rsize` and `wsize` section of Linux mount options](performance-linux-mount-options.md#rsize-and-wsize).
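
For illustration only, the two mount styles being compared might look like the following on a Linux NFSv3 client; the server address, export path, and `nconnect` value are placeholders rather than settings taken from this article:

```
# Baseline: a single TCP connection per mount.
sudo mount -t nfs -o rw,hard,vers=3,tcp,rsize=65536,wsize=65536 10.0.0.4:/myvolume /mnt/anf

# With nconnect: multiple TCP connections per mount (requires a kernel that supports nconnect).
sudo mount -t nfs -o rw,hard,vers=3,tcp,rsize=65536,wsize=65536,nconnect=8 10.0.0.4:/myvolume /mnt/anf
```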

### Linux read throughput

@@ -47,7 +47,7 @@ The following graphs show 64-KiB sequential reads of ~3,500 MiB/s reads with `nc

### Linux write throughput

-The following graphs show sequential writes. They indicate that `nconnect` has no noticeable benefit for sequential writes. 1,500 MiB/s is roughly both the sequential write volume upper limit and the D32s_v4 instance egress limit.
+The following graphs show sequential writes. They indicate that `nconnect` has no noticeable benefit for sequential writes. The sequential write volume upper limit is approximately 1,500 MiB/s; the D32s_v4 instance egress limit is also approximately 1,500 MiB/s.

![Linux write throughput](./media/performance-benchmarks-linux/performance-benchmarks-linux-write-throughput.png)

articles/azure-netapp-files/performance-linux-direct-io.md

Lines changed: 5 additions & 5 deletions
@@ -15,13 +15,13 @@ This article helps you understand direct I/O best practices for Azure NetApp Fil

## Direct I/O

-The most common parameter used in storage performance benchmarking is direct I/O. It is supported by FIO and Vdbench. DISKSPD offers support for the similar construct of memory-mapped I/O. With direct I/O, the filesystem cache is bypassed, operations for direct memory access copy are avoided, and storage tests are made fast and simple.
+The most common parameter used in storage performance benchmarking is direct I/O. It's supported by FIO and Vdbench. DISKSPD offers support for the similar construct of memory-mapped I/O. With direct I/O, the filesystem cache is bypassed, operations for direct memory access copy are avoided, and storage tests are made fast and simple.

-Using the direct I/O parameter makes storage testing easy. No data is read from the filesystem cache on the client. As such, the test is truly stressing the storage protocol and service itself, rather than memory access speeds. Also, without the DMA memory copies, read and write operations are efficient from a processing perspective.
+Using the direct I/O parameter makes storage testing easy. No data is read from the filesystem cache on the client. As such, the test is truly stressing the storage protocol and service itself, rather than memory access speeds. Without the DMA memory copies, read and write operations are efficient from a processing perspective.

-Take the Linux `dd` command as an example workload. Without the optional `odirect` flag, all I/O generated by `dd` is served from the Linux buffer cache. Reads with the blocks already in memory are not retrieved from storage. Reads resulting in a buffer cache miss end up being read from storage using NFS read-ahead with varying results, depending on factors as mount `rsize` and client read-ahead tunables. When writes are sent through the buffer cache, they use a write-behind mechanism, which is untuned and uses a significant amount of parallelism to send the data to the storage device. You might attempt to run two independent streams of I/O, one `dd` for reads and one `dd` for writes. But in fact, the operating system, untuned, favors writes over reads and uses more parallelism of it.
+Take the Linux `dd` command as an example workload. Without the optional `odirect` flag, all I/O generated by `dd` is served from the Linux buffer cache. Reads with the blocks already in memory aren't retrieved from storage. Reads resulting in a buffer cache miss end up being read from storage using NFS read-ahead with varying results, depending on factors such as mount `rsize` and client read-ahead tunables. When writes are sent through the buffer cache, they use a write-behind mechanism, which is untuned and uses a significant amount of parallelism to send the data to the storage device. You might attempt to run two independent streams of I/O, one `dd` for reads and one `dd` for writes. But in fact, the operating system, untuned, favors writes over reads and gives them more parallelism.
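
As an aside on the `dd` example above: in GNU coreutils, direct I/O is requested with `iflag=direct` and `oflag=direct`, which open the file with `O_DIRECT`. The mount point and sizes in this sketch are assumptions:

```
# Buffered (default): reads can be served from the client buffer cache,
# and writes go through the write-behind mechanism described above.
dd if=/mnt/anf/testfile of=/dev/null bs=64k count=16384
dd if=/dev/zero of=/mnt/anf/testfile bs=64k count=16384

# Direct I/O: bypass the client filesystem cache.
dd if=/mnt/anf/testfile of=/dev/null bs=64k count=16384 iflag=direct
dd if=/dev/zero of=/mnt/anf/testfile bs=64k count=16384 oflag=direct
```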

-Aside from database, few applications use direct I/O. Instead, they choose to leverage the advantages of a large memory cache for repeated reads and a write behind cache for asynchronous writes. In short, using direct I/O turns the test into a micro benchmark *if* the application being synthesized uses the filesystem cache.
+Aside from databases, few applications use direct I/O. Instead, they leverage the advantages of a large memory cache for repeated reads and a write-behind cache for asynchronous writes. In short, using direct I/O turns the test into a micro benchmark *if* the application being synthesized uses the filesystem cache.

The following are some databases that support direct I/O:

@@ -34,7 +34,7 @@ The following are some databases that support direct I/O:

## Best practices

-Testing with `directio` is an excellent way to understand the limits of the storage service and client. To get a better understanding for how the application itself will behave (if the application doesn't use `directio`), you should also run tests through the filesystem cache.
+Testing with `directio` is an excellent way to understand the limits of the storage service and client. To better understand how the application behaves (if the application doesn't use `directio`), you should also run tests through the filesystem cache.
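
A minimal sketch of that practice is to run the same FIO job twice, once with direct I/O and once through the filesystem cache; the mount point and job parameters are illustrative assumptions:

```
fio --name=direct-read --directory=/mnt/anf --rw=read --bs=64k --size=8g \
    --ioengine=libaio --iodepth=16 --direct=1 --group_reporting

fio --name=buffered-read --directory=/mnt/anf --rw=read --bs=64k --size=8g \
    --ioengine=libaio --iodepth=16 --direct=0 --group_reporting
```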

## Next steps

articles/azure-netapp-files/performance-linux-filesystem-cache.md

Lines changed: 13 additions & 13 deletions
@@ -11,60 +11,60 @@ ms.author: anfdocs
---
# Linux filesystem cache best practices for Azure NetApp Files

-This article helps you understand filesystem cache best practices for Azure NetApp Files.
+This article helps you understand filesystem cache best practices for Azure NetApp Files.

## Filesystem cache tunables

You need to understand the following factors about filesystem cache tunables:

-* Flushing a dirty buffer leaves the data in a clean state usable for future reads until memory pressure leads to eviction.
+* Flushing a dirty buffer leaves the data in a clean state usable for future reads until memory pressure leads to eviction.
* There are three triggers for an asynchronous flush operation:
    * Time based: When a buffer reaches the age defined by this tunable, it must be marked for cleaning (that is, flushing, or writing to storage).
    * Memory pressure: See [`vm.dirty_ratio | vm.dirty_bytes`](#vmdirty_ratio--vmdirty_bytes) for details.
    * Close: When a file handle is closed, all dirty buffers are asynchronously flushed to storage.

-These factors are controlled by four tunables. Each tunable can be tuned dynamically and persistently using `tuned` or `sysctl` in the `/etc/sysctl.conf` file. Tuning these variables improves performance for applications.
+These factors are controlled by four tunables. Each tunable can be tuned dynamically and persistently using `tuned` or `sysctl` in the `/etc/sysctl.conf` file. Tuning these variables improves performance for applications.
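
As a minimal sketch of those two approaches, the following sets a tunable dynamically with `sysctl` and persistently in `/etc/sysctl.conf`; the value is only an example taken from the SAS Viya discussion later in this article:

```
# Dynamic: takes effect immediately, lost on reboot.
sudo sysctl -w vm.dirty_expire_centisecs=300

# Persistent: add the setting to /etc/sysctl.conf (or a file under /etc/sysctl.d/), then reload.
echo "vm.dirty_expire_centisecs = 300" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
```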

> [!NOTE]
-> Information discussed in this article was uncovered during SAS GRID and SAS Viya validation exercises. As such, the tunables are based on lessons learned from the validation exercises. Many applications will similarly benefit from tuning these parameters.
+> Information discussed in this article was uncovered during SAS GRID and SAS Viya validation exercises. As such, the tunables are based on lessons learned from the validation exercises. Many applications similarly benefit from tuning these parameters.

### `vm.dirty_ratio | vm.dirty_bytes`

-These two tunables define the amount of RAM made usable for data modified but not yet written to stable storage. Whichever tunable is set automatically sets the other tunable to zero; RedHat advises against manually setting either of the two tunables to zero. The option `vm.dirty_ratio` (the default of the two) is set by Redhat to either 20% or 30% of physical memory depending on the OS, which is a significant amount considering the memory footprint of modern systems. Consideration should be given to setting `vm.dirty_bytes` instead of `vm.dirty_ratio` for a more consistent experience regardless of memory size. For example, ongoing work with SAS GRID determined 30 MiB an appropriate setting for best overall mixed workload performance.
+These two tunables define the amount of RAM made usable for data modified but not yet written to stable storage. Whichever tunable is set automatically sets the other tunable to zero; Red Hat advises against manually setting either of the two tunables to zero. The option `vm.dirty_ratio` (the default of the two) is set by Red Hat to either 20% or 30% of physical memory depending on the OS, which is a significant amount considering the memory footprint of modern systems. Consideration should be given to setting `vm.dirty_bytes` instead of `vm.dirty_ratio` for a more consistent experience regardless of memory size. For example, ongoing work with SAS GRID determined 30 MiB an appropriate setting for best overall mixed workload performance.

### `vm.dirty_background_ratio | vm.dirty_background_bytes`

-These tunables define the starting point where the Linux write-back mechanism begins flushing dirty blocks to stable storage. Redhat defaults to 10% of physical memory, which, on a large memory system, is a significant amount of data to start flushing. Taking SAS GRID for example, historically the recommendation has been to set `vm.dirty_background` to 1/5 size of `vm.dirty_ratio` or `vm.dirty_bytes`. Considering how aggressively the `vm.dirty_bytes` setting is set for SAS GRID, no specific value is being set here.
+These tunables define the starting point where the Linux write-back mechanism begins flushing dirty blocks to stable storage. Red Hat defaults to 10% of physical memory, which, on a large memory system, is a significant amount of data to start flushing. Taking SAS GRID for example, historically the recommendation was to set `vm.dirty_background` to 1/5 the size of `vm.dirty_ratio` or `vm.dirty_bytes`. Considering how aggressively the `vm.dirty_bytes` setting is set for SAS GRID, no specific value is being set here.

### `vm.dirty_expire_centisecs`

-This tunable defines how old a dirty buffer can be before it must be tagged for asynchronously writing out. Take SAS Viya’s CAS workload for example. An ephemeral write-dominant workload found that setting this value to 300 centiseconds (3 seconds) was optimal, with 3000 centiseconds (30 seconds) being the default.
+This tunable defines how old a dirty buffer can be before it must be tagged for asynchronous write-out. Take SAS Viya’s CAS workload for example. For this ephemeral write-dominant workload, setting this value to 300 centiseconds (3 seconds) proved optimal, with 3000 centiseconds (30 seconds) being the default.

-SAS Viya shares CAS data into multiple small chunks of a few megabytes each. Rather than closing these file handles after writing data to each shard, the handles are left open and the buffers within are memory-mapped by the application. Without a close, there will be no flush until either memory pressure or 30 seconds has passed. Waiting for memory pressure proved suboptimal as did waiting for a long timer to expire. Unlike SAS GRID, which looked for the best overall throughput, SAS Viya looked to optimize write bandwidth.
+SAS Viya shares CAS data into multiple small chunks of a few megabytes each. Rather than closing these file handles after writing data to each shard, the handles are left open and the buffers within are memory-mapped by the application. Without a close, there's no flush until either memory pressure builds or 30 seconds pass. Waiting for memory pressure proved suboptimal, as did waiting for a long timer to expire. Unlike SAS GRID, which looked for the best overall throughput, SAS Viya looked to optimize write bandwidth.

### `vm.dirty_writeback_centisecs`

-The kernel flusher thread is responsible for asynchronously flushing dirty buffers between each flush thread sleeps. This tunable defines the amount spent sleeping between flushes. Considering the 3-second `vm.dirty_expire_centisecs` value used by SAS Viya, SAS set this tunable to 100 centiseconds (1 second) rather than the 500 centiseconds (5 seconds) default to find the best overall performance.
+The kernel flusher threads are responsible for asynchronously flushing dirty buffers; this tunable defines the amount of time the flusher threads sleep between flushes. Considering the 3-second `vm.dirty_expire_centisecs` value used by SAS Viya, SAS set this tunable to 100 centiseconds (1 second) rather than the 500 centiseconds (5 seconds) default to find the best overall performance.
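
Pulling the values cited in this article together, a hypothetical `/etc/sysctl.conf` fragment for a SAS-style workload might look like the following. These are the article's example values, not general recommendations, and `vm.dirty_background_bytes` is deliberately left unset here because the article doesn't prescribe one:

```
# 30 MiB of dirty data before writers are throttled (setting this zeroes vm.dirty_ratio).
vm.dirty_bytes = 31457280
# Tag dirty buffers for write-out after 3 seconds.
vm.dirty_expire_centisecs = 300
# Wake the flusher threads every 1 second.
vm.dirty_writeback_centisecs = 100
# vm.dirty_background_bytes: size for your own workload.
```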

## Impact of an untuned filesystem cache

-Considering the default virtual memory tunables and the amount of RAM in modern systems, write-back potentially slows down other storage-bound operations from the perspective of the specific client driving this mixed workload. The following symptoms may be expected from an untuned, write-heavy, cache-laden Linux machine.
+Considering the default virtual memory tunables and the amount of RAM in modern systems, write-back potentially slows down other storage-bound operations from the perspective of the specific client driving this mixed workload. The following symptoms may be expected from an untuned, write-heavy, cache-laden Linux machine.

* Directory listings (`ls`) take so long that they appear unresponsive.
* Read throughput against the filesystem decreases significantly in comparison to write throughput.
* `nfsiostat` reports write latencies **in seconds or higher**.
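
As an illustrative way to observe the `nfsiostat` symptom above (the mount point and interval are assumptions):

```
# Report NFS client statistics for /mnt/anf every 5 seconds;
# watch the avg RTT and avg exe columns for write operations climbing into seconds.
nfsiostat 5 /mnt/anf
```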

-You might experience this behavior only by *the Linux machine* performing the mixed write-heavy workload. Further, the experience is degraded against all NFS volumes mounted against a single storage endpoint. If the mounts come from two or more endpoints, only the volumes sharing an endpoint exhibit this behavior.
+You might experience this behavior only by *the Linux machine* performing the mixed write-heavy workload. Further, the experience is degraded against all NFS volumes mounted against a single storage endpoint. If the mounts come from two or more endpoints, only the volumes sharing an endpoint exhibit this behavior.

Setting the filesystem cache parameters as described in this section has been shown to address the issues.

## Monitoring virtual memory

-To understand what is going with virtual memory and the write-back, consider the following code snippet and output. *Dirty* represents the amount dirty memory in the system, and *writeback* represents the amount of memory actively being written to storage.
+To understand what is going on with virtual memory and the write-back, consider the following code snippet and output. *Dirty* represents the amount of dirty memory in the system, and *writeback* represents the amount of memory actively being written to storage.

`# while true; do echo "###" ;date ; egrep "^Cached:|^Dirty:|^Writeback:|file" /proc/meminfo; sleep 5; done`

-The following output comes from an experiment where the `vm.dirty_ratio` and the `vm.dirty_background` ratio were set to 2% and 1% of physical memory respectively. In this case, flushing began at 3.8 GiB, 1% of the 384-GiB memory system. Writeback closely resembled the write throughput to NFS.
+The following output comes from an experiment where the `vm.dirty_ratio` and the `vm.dirty_background` ratio were set to 2% and 1% of physical memory respectively. In this case, flushing began at 3.8 GiB, 1% of the 384-GiB memory system. Writeback closely resembled the write throughput to NFS.
```
Cons
