
Commit b8107ba

Merge pull request #1654 from kieranhejmadi01/cplusplus_flag_JA_comments
Cplusplus_flag_JA_comments
2 parents 628c338 + 1dd22cc commit b8107ba

File tree

3 files changed

+40
-62
lines changed
  • content/learning-paths/servers-and-cloud-computing/cplusplus_compilers_flags


content/learning-paths/servers-and-cloud-computing/cplusplus_compilers_flags/2.md

Lines changed: 11 additions & 41 deletions
Original file line number | Diff line number | Diff line change
@@ -41,55 +41,25 @@ You see the following output:
4141
aarch64
4242
```
4343

44-
## Enable environment modules
44+
## Install Different Versions of the GNU Compiler Collection
4545

46-
Environment modules is a tool to quickly modify your shell configuration and environment variables. For this activity, it allows you to quickly switch between different compiler versions to demonstrate potential improvements.
46+
Performance gains on Arm can come not only from choosing optimal flags, but also from using a recent compiler version.
47+
Older compilers might not fully leverage the latest hardware features, particularly when targeting cutting-edge Arm processors, resulting in less optimized code.
4748

48-
First, you need to install the environment modules package.
49-
50-
In your terminal, run the following command:
49+
In your terminal, run the following commands to install an older version of the GNU Compiler Collection, version 9:
5150

5251
```bash
5352
sudo apt update
54-
sudo apt install environment-modules
55-
```
56-
57-
Load environment modules after the package is installed:
58-
59-
```bash
60-
sudo chmod 755 /usr/share/modules/init/bash
61-
source /usr/share/modules/init/bash
62-
```
63-
64-
Reload your shell configuration:
65-
66-
```bash
67-
source ~/.bashrc
68-
```
69-
70-
Install multiple compiler versions on your Ubuntu system. For this example you can install GCC version 9 to demonstrate potential improvements your application could achieve.
71-
72-
Install GCC version 9:
73-
74-
```bash
7553
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
7654
sudo apt update
7755
sudo apt install gcc-9 g++-9 -y
7856
```
7957

80-
Create a module file for each compiler installed.
81-
82-
```bash
83-
mkdir -p ~/modules/gcc
84-
```
85-
86-
Use a text editor to modify the file `~/modules/gcc/9`
87-
88-
Copy and paste the text below into the file and save it.
89-
90-
```console
91-
#%Module1.0
92-
prepend-path PATH /usr/bin/gcc-9
93-
prepend-path PATH /usr/bin/g++-9
94-
```
58+
Run the `g++-9 --version` command directly and confirm that the output matches the following.
9559

60+
```output
61+
g++-9 (Ubuntu 9.5.0-6ubuntu2) 9.5.0
62+
Copyright (C) 2019 Free Software Foundation, Inc.
63+
This is free software; see the source for copying conditions. There is NO
64+
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
65+
```

content/learning-paths/servers-and-cloud-computing/cplusplus_compilers_flags/3.md

Lines changed: 3 additions & 11 deletions
Original file line number | Diff line number | Diff line change
@@ -6,19 +6,11 @@ weight: 4
66
layout: learningpathall
77
---
88

9-
You may want to find out which Neoverse processor a cloud instance uses.
9+
## Finding Neoverse Version
1010

11-
You can learn the history of each cloud service providers, but as time progresses it becomes more complex to summarize.
11+
Most cloud service providers offer a variety of Arm-based instances based on Neoverse. For example, AWS announced Graviton3 in 2021, with instance types including M7g, C7g, and R7g. Graviton3 offers up to 2x better floating-point performance, up to 2x faster crypto performance, and up to 3x better ML performance compared to Graviton2. Graviton3 uses Arm Neoverse V1 cores. In 2023, AWS announced Graviton4, based on Neoverse V2 cores. Graviton4 increases core count to 96 and will be first available in the R8g instance type. Graviton4 provides 30% better compute performance, 50% more cores, and 75% more memory bandwidth than Graviton3.
1212

13-
For example, in 2019 AWS announced Graviton2 processors. The Graviton2 instance types include M6g, C6g, R6g, and T4g. AWS advertises 40% better price performance over the same generation of x86 instances. Graviton2 instances include up to 64 vCPUs. Graviton2 uses Arm Neoverse N1 cores.
14-
15-
Graviton3 was announced in 2021 and instance types include M7g, C7g, R7g. Graviton3 offers up to 2x better floating-point performance, up to 2x faster crypto performance, and up to 3x better ML performance compared to Graviton2. Graviton3 uses Arm Neoverse V1 cores.
16-
17-
In 2023, AWS announced Graviton4, based on Neoverse V2 cores. Graviton4 increases core count to 96 and will be first available in the R8g instance type. Graviton4 provides 30% better compute performance, 50% more cores, and 75% more memory bandwidth than Graviton3.
18-
19-
There are more than 150 instance types with Graviton processors.
20-
21-
Alternatively, if you have access to the instance, you can run the `lscpu` command and observe the underlying Neoverse Architecture under the `Model Name` row.
13+
Using the same AWS instance we created in the previous section, you can run the `lscpu` command and observe the underlying Neoverse Architecture under the `Model Name` row.
2214

2315
For example, on the `r8g.xlarge` instance run:
2416

content/learning-paths/servers-and-cloud-computing/cplusplus_compilers_flags/4.md

Lines changed: 26 additions & 10 deletions
Original file line number | Diff line number | Diff line change
@@ -22,9 +22,10 @@ Use an editor to copy and paste the C++ code below into a file named `vectorizab
2222

2323
The code initializes a vector of 1 million elements, doubles each element, and stores the result in the same vector. This is repeated 5 times to calculate the average runtime.
2424

25-
```c++
25+
```cpp
2626
#include <vector>
2727
#include <chrono>
28+
#include <iostream>
2829
#include <unistd.h> // for getpid()
2930

3031
void vectorizable_loop(std::vector<int>& data) {
@@ -64,7 +65,7 @@ int main() {
6465
6566
Compare compiler versions by building `vectorizable_loop.cpp` with the same arguments, but different compiler versions.
6667
67-
Run the commands below to use version 13 and then version 9 on the same code:
68+
Run the commands below to build the same code with the same flags, first with the default g++ compiler (13.3) and then with the g++ version 9 installed earlier.
6869
6970
```bash
7071
g++ vectorizable_loop.cpp -o vectorizable_loop_gcc_13
@@ -127,30 +128,45 @@ Average elapsed time: 0.0420332 seconds
127128
Average elapsed time: 0.0155661 seconds
128129
```
129130

130-
You see that optimization level impacts performance.
131+
Here you can observe a notable speedup from using higher optimisation levels.
131132

132-
## Understanding optimizations
133+
Note: To see which lower-level optimisations are enabled by `-O1`, `-O2` and `-O3`, use the `g++ <optimisation level> -Q --help=optimizers` command.
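As a rough sketch of how that command could be used, the snippet below compares the optimisation flags reported as enabled at `-O1` and `-O2`. The helper name `enabled_flags` and the output file names `O1_flags.txt` and `O2_flags.txt` are assumptions made for illustration and are not part of GCC or the original learning path. The exact flag list depends on your compiler version.

```bash
# Hypothetical helper (not part of GCC): keep only the flag names that the
# "-Q --help=optimizers" report marks as [enabled].
enabled_flags() {
    grep -F '[enabled]' | awk '{print $1}'
}

# Run the comparison only where g++ is installed.
if command -v g++ >/dev/null 2>&1; then
    g++ -O1 -Q --help=optimizers | enabled_flags > O1_flags.txt
    g++ -O2 -Q --help=optimizers | enabled_flags > O2_flags.txt
    # Lines prefixed with ">" are flags enabled at -O2 but not at -O1.
    # diff exits non-zero when the files differ, so "|| true" keeps the
    # snippet safe in scripts that use "set -e".
    diff O1_flags.txt O2_flags.txt || true
fi
```

The same pattern should work for comparing `-O2` with `-O3` by changing the two optimisation levels.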
133134

134-
Naturally, the next question is to understand which part of your source code was optimized. Full optimization reports generated by compilers like GCC provide a detailed tree of reports through various stages of the optimization process. These reports can be overwhelming due to the sheer volume of information they contain, covering every aspect of the code's transformation and optimization.
135135

136-
For a more manageable overview, you can enable basic optimization reports using specific arguments such as `-fopt-info-vec`, which focuses on vectorization optimizations. The `-fopt-info` flag can be customized by changing the info bit to target different types of optimizations, making it easier to pinpoint specific areas of interest without being inundated with data.
136+
### Understanding what was optimised
137137

138-
Applying this to the following command we can see that there is no vector optimization with the `-O1` optimization level.
138+
Naturally, the next question is to understand which part of your source code was optimized between the outputs above. Full optimization reports generated by compilers like GCC provide a detailed tree of reports through various stages of the optimization process. For beginners, these reports can be overwhelming due to the sheer volume of information they contain, covering every aspect of the code's transformation and optimization.
139139

140-
Run the compiler with the arguments shown:
140+
For a more manageable overview, you can enable basic optimization information (`opt-info`) reports using specific arguments such as `-fopt-info-vec`, which focuses on vectorization optimizations. The `-fopt-info` flag can be customized by changing the info bit to target different types of optimizations, making it easier to pinpoint specific areas of interest.
141+
142+
First, to check whether the vectorisable loop was indeed vectorised between levels 1 and 2, run the following command.
141143

142144
```bash
143145
g++ -O1 vectorizable_loop.cpp -o level_1 -fopt-info-vec
144146
```
145147

146-
The output is:
148+
Compiling with the `-O1` flag produced no terminal output, indicating that no vectorisation was performed. Next, run the command below with the `-O2` flag.
149+
150+
```bash
151+
g++ -O2 vectorizable_loop.cpp -o level_2 -fopt-info-vec
152+
```
153+
154+
This time the `-O2` flag enables our loop to be vectorised as can be seen from the output below.
147155

148156
```output
149157
vectorizable_loop.cpp:13:30: optimized: loop vectorized using 16 byte vectors
150158
/usr/include/c++/13/bits/stl_algobase.h:930:22: optimized: loop vectorized using 16 byte vectors
151159
```
152160

153-
However the same command with the `-O2` optimization level we observe line 13, column 30 of our source code was optimized.
161+
To see what optimisations were performed and missed between level 2 and level 3, we could direct the terminal output from all optimisations (`-fopt-info`) to a text file with the commands below.
162+
163+
```bash
164+
g++ -O2 vectorizable_loop.cpp -o level_2 -fopt-info 2>&1 | tee level2.txt
165+
g++ -O3 vectorizable_loop.cpp -o level_3 -fopt-info 2>&1 | tee level3.txt
166+
```
167+
168+
Comparing the outputs between different levels, for example with the `diff` command, can highlight where in your source code opportunities to optimise were missed. This can help you write source code that is more likely to be optimised. However, source code modifications are out of scope for this learning path, and it is left to the reader to explore the differences if they wish to learn more.
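One possible way to make that comparison, sketched here under the assumption that `level2.txt` and `level3.txt` were produced by the commands above, is a small shell helper that prints only the remarks present in the second report. The function name `new_remarks` is hypothetical and chosen for illustration.

```bash
# Hypothetical helper: print the lines that appear only in the second report.
new_remarks() {
    # In diff output, lines starting with ">" exist only in the second file.
    # grep exits non-zero when nothing matches, so "|| true" keeps the helper
    # safe in scripts that use "set -e".
    diff "$1" "$2" | grep '^>' || true
}
```

For example, `new_remarks level2.txt level3.txt` would list the optimisation remarks reported at `-O3` that were not reported at `-O2`.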
169+
154170

155171
## Target balanced performance
156172
