Commit c9b3031

author
Your Name
committed
initial commit
1 parent 120c7c4 commit c9b3031

5 files changed

+228
-0
lines changed

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
1+
---
2+
title: Example
3+
weight: 3
4+
5+
### FIXED, DO NOT MODIFY
6+
layout: learningpathall
7+
---
8+
9+
## Example
10+
11+
The following `C++` snippet reads the loop size from user input, so the loop size, `max_loop_size`, is only known at runtime. It initialises an array of size `max_loop_size`, setting each element to its index position. The function `foo` loops over each element and prints the sum of all elements.
12+
13+
Copy the snippet below into a file named `no-context.cpp`.
14+
15+
```cpp
#include <iostream>
#include <chrono>

// Sums the first max_loop_size elements of x and prints the result.
void foo(const int* x, int max_loop_size)
{
    int sum = 0;
    for (int k = 0; k < max_loop_size; k++) {
        sum += x[k];
    }
    std::cout << "Sum: " << sum << std::endl;
}

int main() {
    int max_loop_size;
    std::cout << "Enter a value for max_loop_size (must be a multiple of 4): ";
    std::cin >> max_loop_size;

    // Variable-length array: a GCC/Clang extension, not standard C++
    int x[max_loop_size];
    // Initialise test data
    for (int i = 0; i < max_loop_size; ++i) x[i] = i;

    // Start timing
    auto start = std::chrono::high_resolution_clock::now();
    foo(x, max_loop_size);
    // Stop timing
    auto end = std::chrono::high_resolution_clock::now();

    // Calculate and display the elapsed time
    auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
    std::cout << "Time taken by foo: " << duration << " nanoseconds" << std::endl;

    return 0;
}
```
50+
51+
Compile using the following command.
52+
53+
```bash
g++ -O3 -march=armv8-a+simd no-context.cpp -o no_context
```
56+
57+
Running the example with the number 40000 leads to the following results. Naturally, you will see variability depending on which platform you run this on.
58+
59+
```output
./no_context
Enter a value for max_loop_size (must be a multiple of 4): 40000
Sum: 799980000
Time taken by foo: 138100 nanoseconds
```
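
Because a single run can be noisy, one way to reduce this variability is to repeat the measurement and keep the fastest run. The following is a minimal sketch under stated assumptions, not part of the original example: `sum_elements` and `best_time_ns` are hypothetical helper names, printing is left out of the timed region, and `std::chrono::steady_clock` is swapped in for `high_resolution_clock` because it is guaranteed to be monotonic.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <limits>

// Hypothetical helper: sums the first n elements of x (no printing).
int sum_elements(const int* x, int n)
{
    int sum = 0;
    for (int k = 0; k < n; k++) {
        sum += x[k];
    }
    return sum;
}

// Hypothetical helper: times sum_elements `repeats` times and returns the
// shortest observed duration in nanoseconds.
// If repeats <= 0, the maximum int64 value is returned unchanged.
std::int64_t best_time_ns(const int* x, int n, int repeats)
{
    volatile int sink = 0;  // keeps the compiler from discarding the result
    std::int64_t best = std::numeric_limits<std::int64_t>::max();
    for (int r = 0; r < repeats; ++r) {
        auto start = std::chrono::steady_clock::now();
        sink = sum_elements(x, n);
        auto end = std::chrono::steady_clock::now();
        auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
        best = std::min<std::int64_t>(best, ns);
    }
    (void)sink;
    return best;
}
```

Calling `best_time_ns(x, max_loop_size, 10)` from `main`, for instance, would report the fastest of ten runs instead of a single sample.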
65+
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
1+
---
2+
title: Setup
3+
weight: 2
4+
5+
### FIXED, DO NOT MODIFY
6+
layout: learningpathall
7+
---
8+
9+
## Introduction
10+
11+
Often the programmer has a better understanding of their software and its inputs than the compiler does. For example, if the loop size is calculated at runtime, the compiler has to account for any possible size. However, a developer may know more about the runtime profile, for example that the loop size is always a multiple of a specific number.
12+
13+
To provide this context to the compiler we will use a simple example written in C++.
14+
15+
## Setup
16+
17+
In this learning path I will be using an Arm-based `r7g.large` instance from AWS, but any Arm-based machine can be used.
18+
19+
Install the `g++` compiler with the following commands. Adjust to the appropriate commands for your operating system.
20+
21+
```bash
sudo apt update
sudo apt install g++
```
25+
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
1+
---
2+
title: Learn to improve for loop run time with loop size context
3+
4+
minutes_to_complete: 15
5+
6+
who_is_this_for: C++ developers
7+
8+
learning_objectives:
9+
- Learn how to add preexisting knowledge of loop sizes to for loops
10+
11+
prerequisites:
12+
- Access to an Arm-based machine / instance
13+
- Basic understanding of C++
14+
15+
author: Kieran Hejmadi
16+
17+
### Tags
18+
skilllevels: Introductory
19+
subjects: C++
20+
armips:
21+
- Neoverse
22+
tools_software_languages:
23+
- C++
24+
operatingsystems:
25+
- Linux
26+
27+
28+
29+
further_reading:
30+
- resource:
31+
title: PLACEHOLDER MANUAL
32+
link: PLACEHOLDER MANUAL LINK
33+
type: documentation
34+
35+
36+
### FIXED, DO NOT MODIFY
37+
# ================================================================================
38+
weight: 1 # _index.md always has weight of 1 to order correctly
39+
layout: "learningpathall" # All files under learning paths have this same wrapper
40+
learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
41+
---
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
1+
---
2+
# ================================================================================
3+
# FIXED, DO NOT MODIFY THIS FILE
4+
# ================================================================================
5+
weight: 21 # Set to always be larger than the content in this path to be at the end of the navigation.
6+
title: "Next Steps" # Always the same, html page title.
7+
layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing.
8+
---
Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
1+
---
2+
title: Adding Inside Knowledge
3+
weight: 4
4+
5+
### FIXED, DO NOT MODIFY
6+
layout: learningpathall
7+
---
8+
9+
## Adding Inside Knowledge
10+
11+
To make the compiler aware that the input will be a multiple of 4, we will rewrite our loop size as follows.
12+
13+
```output
((max_loop_size/4)*4)
```
16+
17+
Mathematically this may seem redundant. However, since `(max_loop_size/4)` is truncated to an integer, `(max_loop_size/4)*4` is guaranteed to be a multiple of 4.
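
The truncation can be checked at compile time. The following is a minimal sketch, where `round_down_to_multiple_of_4` is a hypothetical helper name introduced only for illustration and non-negative inputs are assumed.

```cpp
// Hypothetical helper (not part of the original example): rounds a
// non-negative loop size down to the nearest multiple of 4.
constexpr int round_down_to_multiple_of_4(int n)
{
    // Integer division discards the remainder, so (n / 4) * 4 <= n
    // and the result is always divisible by 4.
    return (n / 4) * 4;
}

static_assert(round_down_to_multiple_of_4(40000) == 40000, "multiples of 4 are unchanged");
static_assert(round_down_to_multiple_of_4(40003) == 40000, "other values are rounded down");
static_assert(round_down_to_multiple_of_4(3) == 0, "values below 4 give 0");
```

If the input is not a multiple of 4, up to three trailing elements are skipped by such a loop bound, which is why the prompt asks for a multiple of 4.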
18+
19+
A slightly easier-to-read method, which avoids confusion when arguments are passed in, is to divide the variable before passing it in. For example:
20+
21+
```output
22+
(max_loop_size_div_4 * 4)
23+
```
24+
25+
## Adding Insider Knowledge
26+
27+
```cpp
#include <iostream>
#include <chrono>

// Sums the first max_loop_size_div_4 * 4 elements of x and prints the result.
// Writing the bound this way tells the compiler it is a multiple of 4.
void foo(const int* x, int max_loop_size_div_4)
{
    int sum = 0;
    for (int k = 0; k < max_loop_size_div_4 * 4; k++) {
        sum += x[k];
    }
    std::cout << "Sum: " << sum << std::endl;
}

int main() {
    int max_loop_size;
    std::cout << "Enter a value for max_loop_size (must be a multiple of 4): ";
    std::cin >> max_loop_size;

    // Divide before the call; foo multiplies by 4 again in its loop bound
    int max_loop_size_div_4 = max_loop_size / 4;
    // Variable-length array: a GCC/Clang extension, not standard C++
    int x[max_loop_size];
    // Initialise test data
    for (int i = 0; i < max_loop_size; ++i) x[i] = i;

    // Start timing
    auto start = std::chrono::high_resolution_clock::now();
    foo(x, max_loop_size_div_4);
    // Stop timing
    auto end = std::chrono::high_resolution_clock::now();

    // Calculate and display the elapsed time
    auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
    std::cout << "Time taken by foo: " << duration << " nanoseconds" << std::endl;

    return 0;
}
```
63+
64+
Compile again with the same compiler flags, here assuming the snippet is saved as `context.cpp`.
65+
66+
```bash
g++ -O3 -march=armv8-a+simd context.cpp -o context
```
69+
70+
```output
./context
Enter a value for max_loop_size (must be a multiple of 4): 40000
Sum: 799980000
Time taken by foo: 24650 nanoseconds
```
76+
77+
## Comparison
78+
79+
To compare, we will use Compiler Explorer to look at the assembly.
80+
81+
First, look at the example without context [here](https://godbolt.org/z/qPaW5Kjxa).
82+
Second, look at the example with context [here](https://godbolt.org/z/rhj65Pe4v).
83+
84+
85+
[Here](https://godbolt.org/z/nvx4j1vTK).
86+
87+
As the assembly shows, there are fewer lines of assembly for the function `foo` because, given the insider knowledge, less setup code is needed to account for a variable loop size.
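
One plausible way to picture the difference is a loop that handles four elements per step. The following is a conceptual sketch only, written by hand in C++ and not the compiler's actual output; `sum_unknown_size` and `sum_multiple_of_4` are hypothetical names. Without knowledge of the size, a remainder loop is needed for the 0 to 3 leftover elements. With the size expressed as a multiple of 4, that extra code can be dropped.

```cpp
// Conceptual sketch only: this is NOT what the compiler emits.
// Size unknown: a second, scalar loop handles the 0-3 leftover elements.
int sum_unknown_size(const int* x, int n)
{
    int sum = 0;
    int k = 0;
    for (; k + 4 <= n; k += 4) {   // main body, 4 elements per step
        sum += x[k] + x[k + 1] + x[k + 2] + x[k + 3];
    }
    for (; k < n; k++) {           // remainder ("tail") loop
        sum += x[k];
    }
    return sum;
}

// Size known to be n_div_4 * 4: no remainder loop is required.
int sum_multiple_of_4(const int* x, int n_div_4)
{
    int sum = 0;
    for (int k = 0; k < n_div_4 * 4; k += 4) {
        sum += x[k] + x[k + 1] + x[k + 2] + x[k + 3];
    }
    return sum;
}
```

Both functions return the same sum whenever the element count is a multiple of 4; only the first needs the extra loop and the checks around it.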
88+
89+
