diff --git a/content/learning-paths/cross-platform/filesystem-cache-hit/_index.md b/content/learning-paths/cross-platform/filesystem-cache-hit/_index.md
new file mode 100644
index 0000000000..9b3c172534
--- /dev/null
+++ b/content/learning-paths/cross-platform/filesystem-cache-hit/_index.md
@@ -0,0 +1,42 @@
+---
+title: Improve file system cache hit rate with the posix_fadvise function
+
+minutes_to_complete: 15
+
+who_is_this_for: Developers who want to boost the performance of applications that are limited by file system cache misses.
+
+learning_objectives:
+    - Describe how Linux uses main memory for the file system cache
+    - Measure cache miss rates with the perf tool
+    - Use the posix_fadvise() function to provide hints to the kernel about file access patterns
+
+prerequisites:
+    - A basic understanding of C++ and Linux
+    - An understanding of file systems and memory usage
+
+author: Kieran Hejmadi
+
+### Tags
+skilllevels: Introductory
+subjects: Runbook
+armips:
+    - Neoverse
+tools_software_languages:
+    - C++
+operatingsystems:
+    - Linux
+
+further_reading:
+    - resource:
+        title: posix_fadvise documentation
+        link: https://man7.org/linux/man-pages/man2/posix_fadvise.2.html
+        type: documentation
+
+
+
+### FIXED, DO NOT MODIFY
+# ================================================================================
+weight: 1                       # _index.md always has weight of 1 to order correctly
+layout: "learningpathall"       # All files under learning paths have this same wrapper
+learning_path_main_page: "yes"  # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
+---
diff --git a/content/learning-paths/cross-platform/filesystem-cache-hit/_next-steps.md b/content/learning-paths/cross-platform/filesystem-cache-hit/_next-steps.md
new file mode 100644
index 0000000000..c3db0de5a2
--- /dev/null
+++ b/content/learning-paths/cross-platform/filesystem-cache-hit/_next-steps.md
@@ -0,0 +1,8 @@
+---
+# ================================================================================
+# FIXED, DO NOT MODIFY THIS FILE
+# ================================================================================
+weight: 21                  # Set to always be larger than the content in this path to be at the end of the navigation.
+title: "Next Steps"         # Always the same, html page title.
+layout: "learningpathall"   # All files under learning paths have this same wrapper for Hugo processing.
+---
diff --git a/content/learning-paths/cross-platform/filesystem-cache-hit/intro.md b/content/learning-paths/cross-platform/filesystem-cache-hit/intro.md
new file mode 100644
index 0000000000..e52d789b9a
--- /dev/null
+++ b/content/learning-paths/cross-platform/filesystem-cache-hit/intro.md
@@ -0,0 +1,54 @@
+---
+title: Recap of File System Caching and Memory
+weight: 2
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## Recap of File Systems
+
+A file system is the method an operating system uses to store, organize, and manage data on storage devices such as hard drives or SSDs. Linux uses a virtual file system (VFS) layer to provide a uniform interface to different file system types (for example, `ext4`, `xfs`, and `btrfs`), allowing programs to interact with them in a consistent way.
+
+Developers typically interact with the file system through system calls or standard library functions, for example, using `open()`, `read()`, and `write()` in C to access files. Reading a configuration file at `/etc/myapp/config.json`, for instance, involves navigating the file system hierarchy and accessing the file's contents through these interfaces. To speed up access to such files, the operating system maintains a file system cache, which is managed by the kernel and resides in main memory (RAM). This cache temporarily stores recently accessed file data and metadata to speed up future reads and reduce disk I/O.
+
+## Recap of Memory Usage
+
+The hardware and the operating system are responsible for managing the memory usage of the system.
+
+The well-known command `free -wh` provides a snapshot of memory usage. The `cache` column shows the portion of memory used by the file system cache.
+
+```output
+               total        used        free      shared     buffers       cache   available
+Mem:           7.6Gi       1.1Gi       5.8Gi       960Ki        22Mi       832Mi       6.5Gi
+Swap:             0B          0B          0B
+```
+
+The command summarises memory usage in the following columns:
+
+- `total`: the total installed memory.
+- `used`: memory in use, excluding cache and buffers.
+- `free`: unused memory.
+- `shared`: memory used by tmpfs and shared between processes.
+- `buffers`: memory used by kernel buffers.
+- `cache`: memory used by the file system cache.
+- `available`: an estimate of the memory available to a new process, including both free memory and memory that can be reclaimed from the cache.
+
+### What is the posix_fadvise syscall?
+
+`posix_fadvise` is a Linux system call that allows a program to give the kernel hints about its expected file access patterns, such as sequential or random reads. These hints help the kernel optimize caching and I/O performance, but they are purely advisory: the kernel may choose to ignore them. The call is particularly useful for tuning performance when working with large files or when you want to avoid unnecessary caching.
+
+### When could the posix_fadvise syscall be useful?
+
+You should consider using `posix_fadvise` when:
+
+- You are reading or writing large files that will not fit entirely in RAM.
+- You want to avoid polluting the cache with data you will not reuse.
+- You know the access pattern ahead of time and can optimize accordingly.
+
+The syscall does not guarantee any particular behavior, but it influences how the kernel allocates caching resources.
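+
+As a quick illustration of what a hint looks like in practice, the sketch below opens a file and advises the kernel that it will be read sequentially before the reads begin. The file name is only a placeholder and error handling is kept to a minimum; the rest of this Learning Path builds a measurable example around this pattern.
+
+```cpp
+#include <fcntl.h>   // open(), posix_fadvise(), POSIX_FADV_* constants
+#include <unistd.h>  // close()
+#include <iostream>
+
+int main() {
+    // Placeholder file name, used only for illustration
+    int fd = open("./example-data.bin", O_RDONLY);
+    if (fd == -1) {
+        std::cerr << "Error opening file\n";
+        return 1;
+    }
+
+    // Advisory only: the kernel may use this hint to read ahead more
+    // aggressively, or it may ignore it entirely.
+    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
+
+    // ... read and process the file here ...
+
+    close(fd);
+    return 0;
+}
+```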
diff --git a/content/learning-paths/cross-platform/filesystem-cache-hit/with_hint.md b/content/learning-paths/cross-platform/filesystem-cache-hit/with_hint.md
new file mode 100644
index 0000000000..64462d6f4e
--- /dev/null
+++ b/content/learning-paths/cross-platform/filesystem-cache-hit/with_hint.md
@@ -0,0 +1,95 @@
+---
+title: With Hint
+weight: 4
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## Providing hints with posix_fadvise
+
+The `posix_fadvise()` function is a wrapper around the `fadvise64` Linux system call (syscall). This syscall gives the Linux kernel hints about expected file access patterns so that it can optimize I/O, for example by preemptively reading ahead when a file is accessed sequentially. The function takes the following arguments:
+
+- `fd`: the file descriptor of the file.
+- `offset`: the offset at which the advice starts (0 = the beginning of the file).
+- `len`: the number of bytes the advice applies to (0 = through to the end of the file).
+- `advice`: the expected access pattern (`POSIX_FADV_SEQUENTIAL` suggests sequential reading).
+
+For more information on all the available advice values, see the [official documentation](https://man7.org/linux/man-pages/man2/posix_fadvise.2.html).
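+
+Note that, unlike most syscall wrappers, `posix_fadvise()` does not set `errno`: it returns 0 on success and an error number directly on failure. A minimal defensive pattern is sketched below, assuming `fd` is a file descriptor you have already opened.
+
+```cpp
+#include <fcntl.h>   // posix_fadvise(), POSIX_FADV_SEQUENTIAL
+#include <cstring>   // strerror()
+#include <iostream>
+
+// Apply a sequential-access hint to an open file descriptor and report failures.
+// posix_fadvise() returns the error number directly instead of setting errno.
+bool advise_sequential(int fd) {
+    int rc = posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
+    if (rc != 0) {
+        std::cerr << "posix_fadvise failed: " << std::strerror(rc) << "\n";
+        return false;
+    }
+    return true;
+}
+```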
+
+Copy and paste the code sample below into a new file called `with_hint.cpp`. It performs the same work as the example on the previous page, but opens the file with `open()` so that a file descriptor is available for the `posix_fadvise()` call.
+
+```cpp
+#include <fcntl.h>    // open(), posix_fadvise()
+#include <unistd.h>   // read(), close(), usleep()
+#include <cstdlib>    // system()
+#include <iostream>
+#include <vector>
+
+int drop_cache() {
+    int result = system("sync && echo 3 > /proc/sys/vm/drop_caches");
+    if (result != 0) {
+        std::cerr << "Failed to drop caches. Are you running as root?\n";
+    }
+    return result;
+}
+
+int main() {
+    drop_cache();
+    int fd = open("./smallfile.bin", O_RDONLY);
+    if (fd == -1) {
+        std::cerr << "Error opening file\n";
+        return 1;
+    }
+
+    // Hint that the file will be read sequentially from start to finish
+    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
+
+    const size_t bufferSize = 4096; // 4 KB
+    std::vector<char> buffer(bufferSize);
+
+    ssize_t bytesRead;
+    while ((bytesRead = read(fd, buffer.data(), bufferSize)) > 0) {
+        volatile char temp = buffer[0];
+        (void)temp;
+        usleep(10); // Simulate processing delay
+    }
+
+    close(fd);
+    return 0;
+}
+```
+
+Compile with the following command:
+
+```bash
+g++ with_hint.cpp -o with_hint
+```
+
+Again, run the `perf stat` command to observe the rate of cache misses:
+
+```bash
+sudo perf stat -e cache-references,cache-misses,minor-faults,major-faults -r 9 ./with_hint
+```
+
+```output
+ Performance counter stats for './with_hint' (9 runs):
+
+         108313825      cache-references                                             ( +-  0.28% )
+           4189713      cache-misses              #  3.87% of all cache refs        ( +-  0.68% )
+               227      minor-faults                                                 ( +-  0.36% )
+                 8      major-faults                                                 ( +-  8.10% )
+
+```
+
+{{% notice Tip %}}
+If you are using `posix_fadvise()` in your own application and want to observe which system calls are issued, consider using the system call tracer `strace`, with a command such as `strace -ttT -e trace=<syscall> ./<application>`, to see what is behind the reduction in cache misses.
+{{% /notice %}}
+
+### Results
+
+On this run, a single additional line of code reduces the cache miss rate from approximately 4.8% to 3.9%. This can translate into more efficient and performant software, especially if your program is synchronous and has to wait for disk accesses.
+
+{{% notice Note %}}
+Because this is advice to the operating system, the kernel is free to ignore it. The real-world impact also depends on factors such as memory pressure and overall memory usage, so the behaviour on your own system may be different.
+{{% /notice %}}
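+
+As a possible follow-on experiment (not part of the original example), you could also try `POSIX_FADV_WILLNEED`, which asks the kernel to start reading a region of the file into the file system cache before you actually need it. The sketch below adds this hint to the read loop; the 1 MiB look-ahead window is an arbitrary choice, and the kernel may still ignore the advice.
+
+```cpp
+#include <fcntl.h>
+#include <unistd.h>
+#include <iostream>
+#include <vector>
+
+int main() {
+    int fd = open("./smallfile.bin", O_RDONLY);
+    if (fd == -1) {
+        std::cerr << "Error opening file\n";
+        return 1;
+    }
+
+    const size_t bufferSize = 4096;        // 4 KB read size
+    const off_t prefetchWindow = 1 << 20;  // ask for 1 MiB of read-ahead (arbitrary)
+    std::vector<char> buffer(bufferSize);
+
+    off_t offset = 0;
+    ssize_t bytesRead;
+    while ((bytesRead = read(fd, buffer.data(), bufferSize)) > 0) {
+        // Hint that the next window will be needed soon; the kernel may start
+        // populating the file system cache asynchronously, or do nothing at all.
+        posix_fadvise(fd, offset + bytesRead, prefetchWindow, POSIX_FADV_WILLNEED);
+
+        volatile char temp = buffer[0];
+        (void)temp;
+        offset += bytesRead;
+        usleep(10); // Simulate processing delay
+    }
+
+    close(fd);
+    return 0;
+}
+```
+
+You can compile and measure this variant with the same `g++` and `perf stat` commands as above to see whether it changes the miss rate on your system.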
diff --git a/content/learning-paths/cross-platform/filesystem-cache-hit/without_hint.md b/content/learning-paths/cross-platform/filesystem-cache-hit/without_hint.md
new file mode 100644
index 0000000000..51bececd1b
--- /dev/null
+++ b/content/learning-paths/cross-platform/filesystem-cache-hit/without_hint.md
@@ -0,0 +1,121 @@
+---
+title: Example without Hint
+weight: 3
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## Setup
+
+For this demonstration, connect to an Arm-based AWS `c7g.xlarge` instance running Ubuntu 24.04. Results may vary depending on the instance type and kernel version you are using. At the time of writing, kernel version `6.8.0-1024-aws` was used.
+
+First, install the Linux performance analysis tool, `perf`. Follow the [installation guide](https://learn.arm.com/install-guides/perf/) for your system.
+
+Additionally, install a C++ compiler with the following command:
+
+```bash
+sudo apt update && sudo apt install g++ -y
+```
+
+Next, look at the current cache configuration. Run the following command to see the cache structure:
+
+```bash
+lscpu | grep cache
+```
+
+```output
+L1d cache:  256 KiB (4 instances)
+L1i cache:  256 KiB (4 instances)
+L2 cache:   4 MiB (4 instances)
+L3 cache:   32 MiB (1 instance)
+```
+
+As the output above shows, each of the 4 cores has 64 KiB of level 1 data cache and 64 KiB of level 1 instruction cache, along with 1 MiB of slower, unified level 2 cache. Finally, there is 32 MiB of level 3 cache shared among the 4 CPU cores. This information is useful to make sure that the working set cannot fit entirely within the on-CPU caches.
+
+Next, check the memory usage of the idle system with the `free -h` command:
+
+```output
+               total        used        free      shared  buff/cache   available
+Mem:           7.6Gi       779Mi       6.4Gi       952Ki       597Mi       6.8Gi
+Swap:             0B          0B          0B
+```
+
+As the output above shows, this instance has 7.6 GiB of total memory, with 779 MiB actively used by user and kernel processes. It might look confusing that the `free` and `available` columns show different values: `free` is memory that is completely unused (6.4 GiB), whereas `available` also includes reclaimable cache and buffers (6.8 GiB), showing what is ready for new processes if the file system cache is reclaimed.
+
+## Example
+
+First, create a file on the file system. Run the command below to create a file filled with random bytes in the current working directory. The command writes 64 blocks of 1 MB each to a file named `smallfile.bin`. Importantly, at 64 MB this file is larger than the roughly 36 MiB of combined on-CPU cache, so it cannot be held entirely on chip.
+
+```bash
+dd if=/dev/urandom of=smallfile.bin bs=1M count=64
+```
+
+Next, copy and paste the following C++ code into a new file called `no_hints.cpp`. The sample continuously reads the random binary file created above into a 4 KiB buffer. The first byte of each chunk is then read, and processing is simulated with a short delay.
+
+Importantly, the `drop_cache()` function drops the file system cache each time the program runs, so that every run starts from a cold cache.
+
+```cpp
+#include <fstream>
+#include <iostream>
+#include <vector>
+#include <unistd.h>  // for usleep
+#include <cstdlib>   // for system
+
+int drop_cache() {
+    int result = system("sync && echo 3 > /proc/sys/vm/drop_caches");
+    if (result != 0) {
+        std::cerr << "Failed to drop caches. Are you running as root?\n";
+    }
+    return result;
+}
+
+int main() {
+    drop_cache();
+    std::ifstream file("./smallfile.bin", std::ios::binary);
+    if (!file) {
+        std::cerr << "Error opening file\n";
+        return 1;
+    }
+
+    const size_t bufferSize = 4096; // 4 KB
+    std::vector<char> buffer(bufferSize);
+
+    while (file.read(buffer.data(), bufferSize) || file.gcount()) {
+        volatile char temp = buffer[0];
+        (void)temp;
+        usleep(10); // Simulate processing delay
+    }
+
+    file.close();
+    return 0;
+}
+```
+
+Compile without any optimisations using the following command:
+
+```bash
+g++ no_hints.cpp -o no_hints
+```
+
+To observe the cache miss ratio, use the `perf stat` command, which prints the cache miss statistics. Repeating the run multiple times shows the variation in performance:
+
+```bash
+sudo perf stat -e cache-references,cache-misses,minor-faults,major-faults -r 9 ./no_hints
+```
+
+```output
+ Performance counter stats for './no_hints' (9 runs):
+
+         113047762      cache-references                                             ( +-  0.37% )
+           5407145      cache-misses              #  4.78% of all cache refs        ( +-  0.23% )
+               229      minor-faults                                                 ( +-  0.39% )
+                 9      major-faults                                                 ( +-  9.80% )
+
+           1.70040 +- 0.00224 seconds time elapsed  ( +-  0.13% )
+```
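+
+If you want to confirm that `drop_cache()` really does start each run from a cold file system cache, one way to check (not part of the original example) is to map the file and ask the kernel with `mincore()` how many of its pages are currently resident in memory. The sketch below is a minimal version of this check for `smallfile.bin`; run it before and after `./no_hints` to see the cache being populated.
+
+```cpp
+#include <fcntl.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <unistd.h>
+#include <iostream>
+#include <vector>
+
+int main() {
+    int fd = open("./smallfile.bin", O_RDONLY);
+    if (fd == -1) {
+        std::cerr << "Error opening file\n";
+        return 1;
+    }
+
+    struct stat st{};
+    if (fstat(fd, &st) == -1 || st.st_size == 0) {
+        std::cerr << "Error reading file size\n";
+        close(fd);
+        return 1;
+    }
+
+    // Map the file without touching its pages, then ask the kernel which of
+    // those pages are currently resident in the file system cache.
+    void* addr = mmap(nullptr, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
+    if (addr == MAP_FAILED) {
+        std::cerr << "mmap failed\n";
+        close(fd);
+        return 1;
+    }
+
+    const long pageSize = sysconf(_SC_PAGESIZE);
+    const size_t pages = (st.st_size + pageSize - 1) / pageSize;
+    std::vector<unsigned char> vec(pages);
+
+    if (mincore(addr, st.st_size, vec.data()) == 0) {
+        size_t resident = 0;
+        for (unsigned char v : vec) {
+            resident += v & 1; // bit 0 set means the page is in memory
+        }
+        std::cout << resident << " of " << pages << " pages of smallfile.bin are resident in the cache\n";
+    } else {
+        std::cerr << "mincore failed\n";
+    }
+
+    munmap(addr, st.st_size);
+    close(fd);
+    return 0;
+}
+```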