content/learning-paths/servers-and-cloud-computing/cplusplus_compilers_flags/2.md
11 additions & 41 deletions
@@ -41,55 +41,25 @@ You see the following output:
aarch64
```

-## Enable environment modules
+## Install Different Versions of the GNU Compiler Collection

-Environment modules is a tool to quickly modify your shell configuration and environment variables. For this activity, it allows you to quickly switch between different compiler versions to demonstrate potential improvements.
+An effective way to improve performance on Arm may come not only from optimal flag use, but also from using a recent compiler version.
+Older compilers might not fully leverage the latest hardware features, particularly when targeting cutting-edge Arm processors, resulting in less optimized code.

-First, you need to install the environment modules package.
-
-In your terminal, run the following command:
+In your terminal, run the following command to install an older version of the GNU Compiler Collection, version 9:

```bash
sudo apt update
-sudo apt install environment-modules
-```
-
-Load environment modules after the package is installed:
-
-```bash
-sudo chmod 755 /usr/share/modules/init/bash
-source /usr/share/modules/init/bash
-```
-
-Reload your shell configuration:
-
-```bash
-source ~/.bashrc
-```
-
-Install multiple compiler versions on your Ubuntu system. For this example you can install GCC version 9 to demonstrate potential improvements your application could achieve.
content/learning-paths/servers-and-cloud-computing/cplusplus_compilers_flags/3.md
3 additions & 11 deletions
@@ -6,19 +6,11 @@ weight: 4
layout: learningpathall
---

-You may want to find out which Neoverse processor a cloud instance uses.
+## Finding Neoverse Version

-You can learn the history of each cloud service provider, but as time progresses it becomes more complex to summarize.
+Most cloud service providers offer a variety of Arm-based instances based on Neoverse. For example, Graviton3 was announced in 2021 and its instance types include M7g, C7g, and R7g. Graviton3 offers up to 2x better floating-point performance, up to 2x faster crypto performance, and up to 3x better ML performance compared to Graviton2. Graviton3 uses Arm Neoverse V1 cores. In 2023, AWS announced Graviton4, based on Neoverse V2 cores. Graviton4 increases the core count to 96 and is first available in the R8g instance type. Graviton4 provides 30% better compute performance, 50% more cores, and 75% more memory bandwidth than Graviton3.

-For example, in 2019 AWS announced Graviton2 processors. The Graviton2 instance types include M6g, C6g, R6g, and T4g. AWS advertises 40% better price performance over the same generation of x86 instances. Graviton2 instances include up to 64 vCPUs. Graviton2 uses Arm Neoverse N1 cores.
-
-Graviton3 was announced in 2021 and instance types include M7g, C7g, R7g. Graviton3 offers up to 2x better floating-point performance, up to 2x faster crypto performance, and up to 3x better ML performance compared to Graviton2. Graviton3 uses Arm Neoverse V1 cores.
-
-In 2023, AWS announced Graviton4, based on Neoverse V2 cores. Graviton4 increases the core count to 96 and will be first available in the R8g instance type. Graviton4 provides 30% better compute performance, 50% more cores, and 75% more memory bandwidth than Graviton3.
-
-There are more than 150 instance types with Graviton processors.
-
-Alternatively, if you have access to the instance, you can run the `lscpu` command and observe the underlying Neoverse architecture under the `Model name` row.
+Using the same AWS instance created in the previous section, you can run the `lscpu` command and observe the underlying Neoverse architecture under the `Model name` row.
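A quick sketch of that check; the exact model string depends on the instance type:

```shell
# Print the CPU model row; on a Graviton3 instance this reports a
# Neoverse-V1 core, and on Graviton4 a Neoverse-V2 core.
lscpu | grep "Model name"
```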
content/learning-paths/servers-and-cloud-computing/cplusplus_compilers_flags/4.md
26 additions & 10 deletions
@@ -22,9 +22,10 @@ Use an editor to copy and paste the C++ code below into a file named `vectorizab
The code initializes a vector of 1 million elements, doubles each element, and stores the result in the same vector. This is repeated 5 times to calculate the average runtime.

-```c++
+```cpp
#include <vector>
#include <chrono>
+#include <iostream>
#include <unistd.h> // for getpid()

void vectorizable_loop(std::vector<int>& data) {
@@ -64,7 +65,7 @@ int main() {
Compare compiler versions by building `vectorizable_loop.cpp` with the same arguments, but different compiler versions.

-Run the commands below to use version 13 and then version 9 on the same code:
+Run the commands below to use the default g++ compiler (13.3) and then the g++ version 9 installed earlier on the same code with the same flags.
@@ -127,30 +128,45 @@ Average elapsed time: 0.0420332 seconds
Average elapsed time: 0.0155661 seconds
```

-You see that optimization level impacts performance.
+Here you can observe a notable performance speed-up from using higher levels of optimisation.

-## Understanding optimizations
+Please note: to understand which lower-level optimisations are enabled by `-O1`, `-O2`, and `-O3`, you can use the `g++ <optimisation level> -Q --help=optimizers` command.
-Naturally, the next question is to understand which part of your source code was optimized. Full optimization reports generated by compilers like GCC provide a detailed tree of reports through various stages of the optimization process. These reports can be overwhelming due to the sheer volume of information they contain, covering every aspect of the code's transformation and optimization.
+### Understanding what was optimised

-For a more manageable overview, you can enable basic optimization reports using specific arguments such as `-fopt-info-vec`, which focuses on vectorization optimizations. The `-fopt-info` flag can be customized by changing the info bit to target different types of optimizations, making it easier to pinpoint specific areas of interest without being inundated with data.
+Naturally, the next question is to understand which part of your source code was optimized between the outputs above. Full optimization reports generated by compilers like GCC provide a detailed tree of reports through various stages of the optimization process. For beginners, these reports can be overwhelming due to the sheer volume of information they contain, covering every aspect of the code's transformation and optimization.

-Applying this to the following command we can see that there is no vector optimization with the `-O1` optimization level.
+For a more manageable overview, you can enable basic optimization information (`opt-info`) reports using specific arguments such as `-fopt-info-vec`, which focuses on vectorization optimizations. The `-fopt-info` flag can be customized by changing the info bit to target different types of optimizations, making it easier to pinpoint specific areas of interest.

-Run the compiler with the arguments shown:
+First, to see which part of the source code was optimised between levels 1 and 2, run the following commands to check whether the vectorisable loop was indeed vectorised.
This time the `-O2` flag enables the loop to be vectorised, as can be seen from the output below.

```output
vectorizable_loop.cpp:13:30: optimized: loop vectorized using 16 byte vectors
/usr/include/c++/13/bits/stl_algobase.h:930:22: optimized: loop vectorized using 16 byte vectors
```

-However the same command with the `-O2` optimization level we observe line 13, column 30 of our source code was optimized.
+To see which optimisations were performed and missed between level 2 and level 3, you can direct the terminal output from all optimisations (`-fopt-info`) to a text file with the commands below.
+
+```bash
+g++ -O2 vectorizable_loop.cpp -o level_2 -fopt-info 2>&1 | tee level2.txt
+g++ -O3 vectorizable_loop.cpp -o level_3 -fopt-info 2>&1 | tee level3.txt
+```
+
+Comparing the outputs between different levels can highlight where opportunities to optimise your source code were missed, for example with the `diff` command. This can help you write source code that is more likely to be optimised. However, source code modifications are out of scope for this learning path and we will leave it to the reader to dive into the differences if they wish to learn more.
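A minimal sketch of that comparison; the two report files here are illustrative stand-ins for the real `-fopt-info` output, which varies by compiler version:

```shell
# Illustrative stand-in reports; in the learning path these files come from
# the -fopt-info builds above.
printf 'loop vectorized using 16 byte vectors\n' > level2.txt
printf 'loop vectorized using 16 byte vectors\nloop unrolled 4 times\n' > level3.txt

# diff exits non-zero when the files differ, so guard it in scripts.
diff level2.txt level3.txt || true
```

Lines prefixed `>` in the diff output are optimisations reported only at the higher level.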