diff --git a/content/learning-paths/cross-platform/cpp-loop-size-context/Example.md b/content/learning-paths/cross-platform/cpp-loop-size-context/Example.md
new file mode 100644
index 0000000000..285460da55
--- /dev/null
+++ b/content/learning-paths/cross-platform/cpp-loop-size-context/Example.md
@@ -0,0 +1,65 @@
+---
+title: Example
+weight: 3
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## Example
+
+The following `C++` snippet takes the loop size from user input, so `max_loop_size` is only known at runtime. It initialises an array of size `max_loop_size`, setting each element to its index position. The function `foo` then loops over the elements and prints their sum.
+
+Copy the snippet below into a file named `no_context.cpp`.
+
+```cpp
+#include <iostream>
+#include <chrono>
+
+void foo(const int* x, int max_loop_size)
+{
+    int sum = 0;
+    for (int k = 0; k < max_loop_size; k++) {
+        sum += x[k];
+    }
+    std::cout << "Sum: " << sum << std::endl;
+}
+
+int main() {
+    int max_loop_size;
+    std::cout << "Enter a value for max_loop_size (must be a multiple of 4): ";
+    std::cin >> max_loop_size;
+
+    int x[max_loop_size]; // Variable-length array: a GCC extension, not standard C++
+    // Initialise test data
+    for(int i = 0; i < max_loop_size; ++i) x[i] = i;
+
+    // Start timing
+    auto start = std::chrono::high_resolution_clock::now();
+    foo(x, max_loop_size);
+    // Stop timing
+    auto end = std::chrono::high_resolution_clock::now();
+
+    // Calculate and display the elapsed time
+    auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
+    std::cout << "Time taken by foo: " << duration << " nanoseconds" << std::endl;
+
+    return 0;
+}
+```
+
+Compile using the following command.
+
+```bash
+g++ -O3 -march=armv8-a+simd no_context.cpp -o no_context
+```
+
+Running the example with the number 40000 gives the following results. The measured time will vary depending on the platform you run this on.
+
+```output
+./no_context
+Enter a value for max_loop_size (must be a multiple of 4): 40000
+Sum: 799980000
+Time taken by foo: 138100 nanoseconds
+```
+
diff --git a/content/learning-paths/cross-platform/cpp-loop-size-context/Introduction.md b/content/learning-paths/cross-platform/cpp-loop-size-context/Introduction.md
new file mode 100644
index 0000000000..6be3f649c5
--- /dev/null
+++ b/content/learning-paths/cross-platform/cpp-loop-size-context/Introduction.md
@@ -0,0 +1,25 @@
+---
+title: Setup
+weight: 2
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## Introduction
+
+Often, the programmer has deeper insight into their software's behavior and its inputs than the compiler does. For instance, if a loop's size is determined at runtime, the compiler must conservatively handle every possible size, which can limit optimization opportunities. However, a developer might know more about the application's runtime characteristics, such as the fact that the loop size always adheres to specific constraints, like being a multiple of a particular number.
+
+To illustrate how you can explicitly provide this valuable context to the compiler, we'll walk through a simple C++ example.
+
+## Setup
+
+In this learning path, I will be demonstrating the examples using an Arm-based `r7g.large` instance from AWS; however, you're welcome to follow along using any Arm-based machine that suits your environment or preference.
+
+To get started, you'll first need to install the `g++` compiler on your system. Use the following commands as a guide, adjusting them for the operating system or distribution you're working with.
+
+```bash
+sudo apt update
+sudo apt install g++
+```
+
diff --git a/content/learning-paths/cross-platform/cpp-loop-size-context/_index.md b/content/learning-paths/cross-platform/cpp-loop-size-context/_index.md
new file mode 100644
index 0000000000..d4bdf5007d
--- /dev/null
+++ b/content/learning-paths/cross-platform/cpp-loop-size-context/_index.md
@@ -0,0 +1,41 @@
+---
+title: Learn to Optimize C++ Loops with Size Context
+
+minutes_to_complete: 15
+
+who_is_this_for: C++ developers who want to improve the runtime of for loops using prior knowledge of the loop size
+
+learning_objectives:
+    - Learn how to give the compiler preexisting knowledge of loop sizes in for loops
+
+prerequisites:
+    - Access to an Arm-based machine / instance
+    - Basic understanding of C++
+
+author: Kieran Hejmadi
+
+### Tags
+skilllevels: Introductory
+subjects: ML
+armips:
+    - Neoverse
+tools_software_languages:
+    - C++
+operatingsystems:
+    - Linux
+
+
+
+further_reading:
+    - resource:
+        title: PLACEHOLDER MANUAL
+        link: PLACEHOLDER MANUAL LINK
+        type: documentation
+
+
+### FIXED, DO NOT MODIFY
+# ================================================================================
+weight: 1 # _index.md always has weight of 1 to order correctly
+layout: "learningpathall" # All files under learning paths have this same wrapper
+learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
+---
diff --git a/content/learning-paths/cross-platform/cpp-loop-size-context/_next-steps.md b/content/learning-paths/cross-platform/cpp-loop-size-context/_next-steps.md
new file mode 100644
index 0000000000..c3db0de5a2
--- /dev/null
+++ b/content/learning-paths/cross-platform/cpp-loop-size-context/_next-steps.md
@@ -0,0 +1,8 @@
+---
+# ================================================================================
+# FIXED, DO NOT MODIFY THIS FILE
+# ================================================================================
+weight: 21 # Set to always be larger than the content in this path to be at the end of the navigation.
+title: "Next Steps" # Always the same, html page title.
+layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing.
+---
diff --git a/content/learning-paths/cross-platform/cpp-loop-size-context/providing-inside-knowledge.md b/content/learning-paths/cross-platform/cpp-loop-size-context/providing-inside-knowledge.md
new file mode 100644
index 0000000000..bac5012d63
--- /dev/null
+++ b/content/learning-paths/cross-platform/cpp-loop-size-context/providing-inside-knowledge.md
@@ -0,0 +1,85 @@
+---
+title: Adding Inside Knowledge
+weight: 4
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## Adding Inside Knowledge
+
+To explicitly inform the compiler that our input will always be a multiple of 4, we can rewrite the loop size calculation as follows:
+
+```cpp
+((max_loop_size/4)*4)
+```
+
+At first glance, this calculation might seem mathematically redundant. However, because `(max_loop_size/4)` is an integer division, it truncates the result, which guarantees that `(max_loop_size/4)*4` always yields a number divisible by 4. The compiler can pick up on this information and optimise accordingly.
+
+A slightly easier-to-read method that avoids confusion when passing arguments is to divide the variable and rename it before it is passed in. For example:
+
+```cpp
+(max_loop_size_div_4 * 4)
+```
+
+## Improved Example
+
+Copy the snippet below and paste it into a file named `context.cpp`.
+
+```cpp
+#include <iostream>
+#include <chrono>
+
+void foo(const int* x, int max_loop_size_div_4)
+{
+    int sum = 0;
+    for (int k = 0; k < max_loop_size_div_4 * 4; k++) {
+        sum += x[k];
+    }
+    std::cout << "Sum: " << sum << std::endl;
+}
+
+int main() {
+    int max_loop_size;
+    std::cout << "Enter a value for max_loop_size (must be a multiple of 4): ";
+    std::cin >> max_loop_size;
+
+    int max_loop_size_div_4 = max_loop_size / 4;
+    int x[max_loop_size]; // Variable-length array: a GCC extension, not standard C++
+    // Initialise test data
+    for(int i = 0; i < max_loop_size; ++i) x[i] = i;
+
+    // Start timing
+    auto start = std::chrono::high_resolution_clock::now();
+    foo(x, max_loop_size_div_4);
+    // Stop timing
+    auto end = std::chrono::high_resolution_clock::now();
+
+    // Calculate and display the elapsed time
+    auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
+    std::cout << "Time taken by foo: " << duration << " nanoseconds" << std::endl;
+
+    return 0;
+}
+```
+
+Compile again with the same compiler flags.
+
+```bash
+g++ -O3 -march=armv8-a+simd context.cpp -o context
+```
+
+```output
+./context
+Enter a value for max_loop_size (must be a multiple of 4): 40000
+Sum: 799980000
+Time taken by foo: 24650 nanoseconds
+```
+In this particular run, the time taken fell from 138100 to 24650 nanoseconds compared with the previous example.
+
+## Comparison
+
+To compare the two versions, view the generated assembly in [Compiler Explorer](https://godbolt.org/z/nvx4j1vTK).
+
+As the assembly shows, there are fewer lines of assembly for the function `foo` when the context is added. This is because the compiler can remove the conditional checks and clean-up code that it would otherwise need for loop sizes that are not a multiple of 4.
+
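The "clean up code" mentioned in the comparison can be pictured with a hand-written sketch. This is an illustration only, not the compiler's actual output: the function names `sum_general` and `sum_multiple_of_4` are hypothetical, and the 4-element blocking is an assumption chosen to loosely mirror a 4-lane SIMD sum.

```cpp
// Hypothetical illustration (not generated by the compiler):
// a 4-wide loop over an arbitrary n needs a separate tail loop.
int sum_general(const int* x, int n)
{
    int sum = 0;
    int k = 0;
    // Main body: process four elements per iteration.
    for (; k + 3 < n; k += 4) {
        sum += x[k] + x[k + 1] + x[k + 2] + x[k + 3];
    }
    // Clean-up (tail) loop: handles the 0 to 3 leftover elements.
    for (; k < n; k++) {
        sum += x[k];
    }
    return sum;
}

// Hypothetical illustration: the trip count n_div_4 * 4 is a multiple
// of 4 by construction, so the tail loop is unnecessary.
int sum_multiple_of_4(const int* x, int n_div_4)
{
    int sum = 0;
    for (int k = 0; k < n_div_4 * 4; k += 4) {
        sum += x[k] + x[k + 1] + x[k + 2] + x[k + 3];
    }
    return sum;
}
```

Calling `sum_general(x, n)` and `sum_multiple_of_4(x, n / 4)` on the same array gives the same result whenever `n` is a multiple of 4. The second form simply has less work to describe, which is roughly the simplification the compiler can make when it is given the same guarantee.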