
Commit bf0c9ce

Final tweaks.
1 parent 9ceb36c commit bf0c9ce

4 files changed (+55, -27 lines)

content/learning-paths/servers-and-cloud-computing/disk-io-benchmark/_index.md

Lines changed: 2 additions & 2 deletions
@@ -3,10 +3,10 @@ title: Microbenchmark storage performance with fio on Arm
 
 minutes_to_complete: 30
 
-who_is_this_for: This is an introductory topic for developers who want to optimize storage performance, reduce costs, identify bottlenecks, and evaluate storage options when migrating applications across platforms.
+who_is_this_for: This is an introductory topic for developers looking to optimize storage performance, reduce costs, identify bottlenecks, and evaluate storage options when migrating applications across platforms.
 
 learning_objectives:
-- Describe how data flows through storage devices.
+- Describe data flow through storage devices.
 - Monitor storage performance using tools like iostat, iotop, and pidstat.
 - Run fio to microbenchmark a block storage device.

content/learning-paths/servers-and-cloud-computing/disk-io-benchmark/characterising-workload.md

Lines changed: 9 additions & 5 deletions
@@ -1,12 +1,12 @@
 ---
-title: Characterizing a workload
+title: Analyzing I/O behavior with real workloads
 weight: 3
 
 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---
 
-## Basic attributes
+## Workload attributes
 
 The basic attributes of a given workload are the following:

@@ -16,13 +16,15 @@ The basic attributes of a given workload are the following:
 - Read-to-write ratio.
 - Random vs. sequential access.
 
-While characteristics like latency are important, this section focuses on the high-level metrics listed above.
+While latency is also an important factor, this section focuses on these high-level metrics to establish a foundational understanding.
 
 ## Run an example workload
 
 Connect to an Arm-based server or cloud instance.
 
-As an example workload, use the media manipulation tool, FFMPEG on an AWS `t4g.medium` instance. This is an Arm-based (AWS Graviton2) virtual machine with two vCPUs and 4 GiB of memory, designed for general-purpose workloads with a balance of compute, memory, and network resources.
+As an example workload, use the media manipulation tool, FFMPEG on an AWS `t4g.medium` instance.
+
+This is an Arm-based (AWS Graviton2) virtual machine with two vCPUs and 4 GiB of memory, designed for general-purpose workloads with a balance of compute, memory, and network resources.
 
 First, install the required tools:

@@ -39,7 +41,9 @@ mkdir src && cd src
 wget http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/BigBuckBunny.mp4
 ```
 
-Run the following command to begin transcoding the video and audio using the `H.264` and `aac` transcoders respectively. The `-flush_packets` flag forces FFMPEG to write each chunk of video data from memory to storage immediately, rather than buffering it in memory. This reduces the risk of data loss in case of a crash and allows disk write activity to be more observable during monitoring, making it easier to study write behavior in real-time.
+Run the following command to begin transcoding the video and audio using the `H.264` and `aac` transcoders respectively. The `-flush_packets` flag forces FFMPEG to write each chunk of video data from memory to storage immediately, rather than buffering it in memory.
+
+This reduces the risk of data loss in case of a crash and allows disk write activity to be more observable during monitoring, making it easier to study write behavior in real-time.
 
 ```bash
 ffmpeg -i BigBuckBunny.mp4 -c:v libx264 -preset fast -crf 23 -c:a aac -b:a 128k -flush_packets 1 output_video.mp4

content/learning-paths/servers-and-cloud-computing/disk-io-benchmark/introduction.md

Lines changed: 22 additions & 11 deletions
@@ -10,17 +10,27 @@ layout: learningpathall
 
 Performance-sensitive application data - such as frequently-accessed configuration files, logs, or transactional state - should ideally reside in system memory (RAM) or CPU cache, where data access latency is measured in nanoseconds to microseconds. These are the fastest tiers in the memory hierarchy, enabling rapid read and write operations that reduce latency and improve throughput.
 
-However, random-access memory (RAM) is volatile (data is lost on power down), limited in capacity, and more expensive per gigabyte than other storage types. Due to these constraints, most applications also rely on solid-state drives (SSDs) or hard disk drives (HDDs).
+However, random-access memory (RAM) has the following constraints:
+
+* It is volatile - data is lost on power down.
+* It is limited in capacity.
+* It is more expensive per gigabyte than other storage types.
+
+For these reasons, most applications also rely on solid-state drives (SSDs) or hard disk drives (HDDs).
 
 ## High-level view of data flow
 
 The diagram below shows a high-level view of how data moves to and from storage in a multi-disk I/O architecture. Each disk (Disk 1 to Disk N) has its own I/O queue and optional disk cache, communicating with a central CPU through a disk controller.
 
-While memory is not shown, it plays a central role in providing fast temporary access between the CPU and persistent storage. Likewise, file systems (not depicted) run in the OS kernel and manage metadata, access permissions, and user-friendly file abstractions.
+While memory is not shown, it plays a central role in providing fast temporary access between the CPU and persistent storage. Likewise, file systems (also not depicted) run in the OS kernel and manage metadata, access permissions, and user-friendly file abstractions.
 
-This architecture enables parallelism in I/O operations, improves throughput, and supports scalability across multiple storage devices.
+This architecture has the following benefits:
 
-![disk i/o](./diskio.jpeg)
+* It enables parallelism in I/O operations.
+* It improves throughput.
+* It supports scalability across multiple storage devices.
+
+![disk i/o alt-text#center](./diskio.jpeg "A high-level view of data flow in a multi-disk I/O architecture.")
 
 ## Key Terms
 
@@ -33,9 +43,9 @@ This architecture enables parallelism in I/O operations, improves throughput, and supports scalability across multiple storage devices.
 
 #### Input/Output Operations per Second (IOPS)
 
-IOPS measures how many random read/write requests your storage system can perform per second. It depends on the block size or device type. For example, AWS does not show IOPS values for traditional HDD volumes.
+IOPS measures how many random read/write requests your storage system can perform per second. It depends on the block size or device type. For example, AWS does not show IOPS values for traditional HDD volumes, as shown in the image below:
 
-![iops_hdd](./IOPS.png)
+![iops_hdd alt-text#center](./IOPS.png "Example where IOPS values are not provided.")
 
 #### Throughput and bandwidth

@@ -49,14 +59,15 @@ You can calculate storage `throughput as IOPS × block size`.
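As a quick worked example of the `throughput = IOPS × block size` relationship in the hunk context above (illustrative numbers, not taken from the commit): a volume sustaining 4,000 IOPS at a 64 KiB block size delivers roughly 4,000 × 64 KiB = 256,000 KiB/s, or about 250 MiB/s.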
 
 *Queue depth* is the number of I/O operations a device can process concurrently. Consumer SSDs typically support a queue depth of 32–64, while enterprise-class NVMe drives can support hundreds to thousands of concurrent requests per queue. Higher queue depths allow more operations to be handled simultaneously, which can significantly boost throughput on high-performance drives — especially NVMe SSDs with advanced queuing capabilities.
 
-#### I/O Engine
+#### I/O engine
 
 The I/O engine is the software layer in Linux that manages I/O requests between applications and storage. For example, the Linux kernel's block I/O scheduler queues and dispatches requests to device drivers, using multiple queues to optimize disk access. Benchmarking tools like `fio` let you choose different I/O engines:
 
-* `sync`- synchronous I/O.
-* `libaio` - Linux native asynchronous I/O.
-* `io_uring` - a newer async I/O interface in newer Linux kernels.
+* `sync` – Performs blocking I/O operations using standard system calls. Simple and portable, but less efficient under high concurrency.
+* `libaio` – Uses Linux's native asynchronous I/O interface (`io_submit`/`io_getevents`) for non-blocking operations with lower overhead than `sync`.
+* `io_uring` – A modern, high-performance async I/O API introduced in Linux 5.1. It minimizes syscalls and context switches, and supports advanced features like buffer selection and batched submissions.

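The three engines added above can be compared directly with fio itself. A minimal sketch, assuming fio is installed and using a throwaway test file (the file path, size, and job name are illustrative assumptions, not part of the commit; `io_uring` needs Linux 5.1 or later):

```bash
# Run the same random-read job once per engine and compare the reported IOPS and latency.
for engine in sync libaio io_uring; do
  fio --name=engine_test --filename=/tmp/fio-engine.test --size=256M \
      --rw=randread --bs=4k --runtime=10 --time_based --ioengine="$engine"
done
```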
+
 
-#### I/O Wait
+#### I/O wait
 
 I/O wait is the time a CPU core spends waiting for I/O operations to complete. Tools like `pidstat`, `top`, and `iostat` can help identify storage-related CPU bottlenecks.

content/learning-paths/servers-and-cloud-computing/disk-io-benchmark/using-fio.md

Lines changed: 22 additions & 9 deletions
@@ -10,28 +10,35 @@ layout: learningpathall
 
 You can use the same `t4g.medium` instance from the previous section with two different types of SSD-based block storage devices as shown in the console screenshot below.
 
+### Attach and Identify Block Devices
 To add the required EBS volumes to your EC2 instance:
 
 1. In the AWS Console, navigate to **EC2** > **Volumes** > **Create Volume**.
+
 2. Create a volume with the following settings:
 - Volume Type: io2 (Provisioned IOPS SSD).
 - Size: 8 GiB.
 - IOPS: 400.
 - Availability Zone: The same as your EC2 instance
+
 3. Create another volume with the following settings:
 - Volume Type: gp2 (General Purpose SSD).
 - Size: 8 GiB.
 - Availability Zone: The same as your EC2 instance.
-4. Once created, select each volume and choose **Actions** > **Attach Volume**
-5. Select your t4g.medium instance from the dropdown and attach each volume
+
+4. Once created, select each volume and choose **Actions** > **Attach Volume**.
+
+5. Select your t4g.medium instance from the dropdown and attach each volume.
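The console steps above can also be scripted. A minimal sketch using the AWS CLI, shown only as an illustration (the Availability Zone, volume ID, instance ID, and device name are placeholders to substitute, not values from the commit):

```bash
# Create the two volumes in the same Availability Zone as the instance.
aws ec2 create-volume --volume-type io2 --size 8 --iops 400 --availability-zone us-east-1a
aws ec2 create-volume --volume-type gp2 --size 8 --availability-zone us-east-1a

# Attach each volume to the t4g.medium instance (repeat for the second volume).
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
    --instance-id i-0123456789abcdef0 --device /dev/sdf
```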
 
 Both block devices have the same 8 GiB capacity, but the `io2` is optimized for throughput, while `gp2` is general-purpose.
 
-![EBS](./EBS.png)
+![EBS alt-text#center](./EBS.png "Multi-volume storage information.")
 
 In this section, you’ll measure real-world performance to help guide your storage selection.
 
-Flexible I/O (fio) is a command-line tool to generate a synthetic workload with specific I/O characteristics. This serves as a simpler alternative to full record and replay testing. Fio is available through most Linux distribution packages, please refer to the [documentation](https://github.com/axboe/fio) for package availability.
+Flexible I/O (fio) is a command-line tool to generate a synthetic workload with specific I/O characteristics. This serves as a simpler alternative to full record and replay testing.
+
+Fio is available through most Linux distribution packages, see the [documentation](https://github.com/axboe/fio) for package availability.
 
 ```bash
 sudo apt update
@@ -50,7 +57,7 @@ The version is printed:
 fio-3.37
 ```
 
-## Locate device
+## Identify Device Names for Benchmarking
 
 Fio allows you to microbenchmark either the block device or a mounted filesystem. Use the disk free, `df` command to confirm your EBS volumes are not mounted. Writing to drives containing critical data can result in data loss. In this tutorial, you're writing to blank, unmounted block devices.
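One way to double-check the point above before running fio (that the new volumes are visible but not mounted) is sketched below; the device names `nvme1n1` and `nvme2n1` follow the filenames used later in this commit, and yours may differ:

```bash
lsblk            # the new 8 GiB volumes should show no mountpoint
df -h            # neither /dev/nvme1n1 nor /dev/nvme2n1 should appear here
sudo nvme list   # maps NVMe device names to the attached EBS volumes
```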
@@ -76,15 +83,18 @@ If you have more than one block volume attached to an instance, the `sudo nvme l
 
 ## Generating a synthetic workload
 
-Suppose you want to simulate a fictional logging application with the following characteristics observed using the tools from the previous section.
+Let’s define a synthetic workload that mimics the behavior of a logging application, using metrics observed earlier.
 
 {{% notice Workload%}}
 This workload involves light, sequential reads and writes. The system write throughput per thread is 5 MB/s with 83% writes. There are infrequent bursts of reads for approximately 5 seconds, operating at up to 16MB/s per thread. The workload can scale the infrequent reads and writes to use up to 16 threads each. The block size for the writes and reads are 64KiB and 256KiB respectively (as opposed to the standard 4KiB Page size).
 
 Further, the application is latency sensitive and given it holds critical information, needs to write directly to non-volatile storage through direct IO.
 {{% /notice %}}
 
-The fio tool uses simple configuration `jobfiles` to describe the characteristics of your synthetic workload. Parameters under the `[global]` option are shared among jobs. From the example below, you can create 2 jobs to represent the steady write and infrequent reads. Please refer to the official [documentation](https://fio.readthedocs.io/en/latest/fio_doc.html#job-file-format) for more details.
+The fio tool uses simple configuration files - called `jobfiles` - to describe the characteristics of your synthetic workload. Parameters under the `[global]` option are shared among jobs. From the example below, you can create 2 jobs to represent the steady write and infrequent reads. Please refer to the official [documentation](https://fio.readthedocs.io/en/latest/fio_doc.html#job-file-format) for more details.
+
+### Create fio Job Files
+
 
 Create two job files, one for each device, by copying the configuration below and adjusting the filename parameter (`/dev/nvme1n1` or `/dev/nvme2n1`):
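The configuration referred to as "below" sits outside the lines shown in this hunk. As a rough sketch of the general shape such a jobfile takes (parameter values here are assumptions based on the workload notice above, not the Learning Path's actual file), it could be written with a heredoc:

```bash
cat > nvme1.fio <<'EOF'
[global]
ioengine=libaio          ; asynchronous I/O (assumption; io_uring also works)
direct=1                 ; bypass the page cache, per the latency-sensitive requirement
filename=/dev/nvme1n1    ; switch to /dev/nvme2n1 for the second job file
numjobs=${NUM_JOBS}      ; taken from the environment when fio is invoked
iodepth=${IO_DEPTH}
runtime=30
time_based

[steady_write]
rw=write
bs=64k                   ; 64 KiB writes, as described in the workload notice
rate=5m                  ; cap each thread at roughly 5 MB/s

[burst_read]
rw=read
bs=256k                  ; 256 KiB reads
startdelay=25            ; 5-second read burst at the end of the 30-second window
runtime=5
EOF
```

Quoting the heredoc delimiter (`'EOF'`) keeps `${NUM_JOBS}` and `${IO_DEPTH}` literal in the file, so fio, not the shell, expands them at run time.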
@@ -111,17 +121,20 @@ bs=64k ; Block size of 64KiB (default block size of 4 KiB)
 name=burst_read
 rw=read
 bs=256k ; Block size of 256KiB for reads (default is 4KiB)
-startdelay=25 ; simulate infrequent reads (5 seconds out 30)
+startdelay=25 ; simulate a 5-second read burst at the end of a 30-second window
 runtime=5
 ; -- end job file including.fio --
 ```
+## Run the Benchmarks
 
 
 {{% notice Note %}}
 Running fio directly on block devices requires root privileges (hence the use of `sudo`). Be careful: writing to the wrong device can result in data loss. Always ensure you are targeting a blank, unmounted device.
 {{% /notice %}}
 
-Run the following commands to run each test back to back.
+
+
+Run the following commands to execute each test sequentially:
 
 ```bash
 sudo NUM_JOBS=16 IO_DEPTH=64 fio nvme1.fio

0 commit comments
