content/learning-paths/servers-and-cloud-computing/cplusplus_compilers_flags/2.md
11 additions & 41 deletions
@@ -41,55 +41,25 @@ You see the following output:
aarch64
```

-## Enable environment modules
+## Install Different Versions of the GNU Compiler Collection

-Environment modules is a tool to quickly modify your shell configuration and environment variables. For this activity, it allows you to quickly switch between different compiler versions to demonstrate potential improvements.
+An effective way to improve performance on Arm may come not only from optimal flag use, but also from using a recent compiler version.
+Older compilers might not fully leverage the latest hardware features, particularly when targeting cutting-edge Arm processors, resulting in less optimized code.

-First, you need to install the environment modules package.
-
-In your terminal, run the following command:
+In your terminal, run the following command to install an older version of the GNU Compiler Collection, version 9:

```bash
sudo apt update
-sudo apt install environment-modules
-```
-
-Load environment modules after the package is installed:
-
-```bash
-sudo chmod 755 /usr/share/modules/init/bash
-source /usr/share/modules/init/bash
-```
-
-Reload your shell configuration:
-
-```bash
-source ~/.bashrc
-```
-
-Install multiple compiler versions on your Ubuntu system. For this example you can install GCC version 9 to demonstrate potential improvements your application could achieve.
content/learning-paths/servers-and-cloud-computing/cplusplus_compilers_flags/3.md
3 additions & 11 deletions
@@ -6,19 +6,11 @@ weight: 4
layout: learningpathall
---

-You may want to find out which Neoverse processor a cloud instance uses.
+## Finding Neoverse Version

-You can learn the history of each cloud service provider, but as time progresses it becomes more complex to summarize.
+Most cloud service providers offer a variety of Arm-based instances based on Neoverse. For example, Graviton3 was announced in 2021 and its instance types include M7g, C7g, and R7g. Graviton3 offers up to 2x better floating-point performance, up to 2x faster crypto performance, and up to 3x better ML performance compared to Graviton2. Graviton3 uses Arm Neoverse V1 cores. In 2023, AWS announced Graviton4, based on Neoverse V2 cores. Graviton4 increases the core count to 96 and is first available in the R8g instance type. Graviton4 provides 30% better compute performance, 50% more cores, and 75% more memory bandwidth than Graviton3.

-For example, in 2019 AWS announced Graviton2 processors. The Graviton2 instance types include M6g, C6g, R6g, and T4g. AWS advertises 40% better price performance over the same generation of x86 instances. Graviton2 instances include up to 64 vCPUs. Graviton2 uses Arm Neoverse N1 cores.
-
-Graviton3 was announced in 2021 and instance types include M7g, C7g, R7g. Graviton3 offers up to 2x better floating-point performance, up to 2x faster crypto performance, and up to 3x better ML performance compared to Graviton2. Graviton3 uses Arm Neoverse V1 cores.
-
-In 2023, AWS announced Graviton4, based on Neoverse V2 cores. Graviton4 increases the core count to 96 and will be first available in the R8g instance type. Graviton4 provides 30% better compute performance, 50% more cores, and 75% more memory bandwidth than Graviton3.
-
-There are more than 150 instance types with Graviton processors.
-
-Alternatively, if you have access to the instance, you can run the `lscpu` command and observe the underlying Neoverse architecture under the `Model name` row.
+Using the same AWS instance created in the previous section, you can run the `lscpu` command and observe the underlying Neoverse architecture under the `Model name` row.
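A quick sketch of that check; the exact model string depends on the instance type:

```shell
# Print the CPU model row; on a Graviton3 instance this reports a
# Neoverse-V1 core, and on Graviton4 a Neoverse-V2 core.
lscpu | grep "Model name"
```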
content/learning-paths/servers-and-cloud-computing/cplusplus_compilers_flags/4.md
26 additions & 10 deletions
@@ -22,9 +22,10 @@ Use an editor to copy and paste the C++ code below into a file named `vectorizab
The code initializes a vector of 1 million elements, doubles each element, and stores the result in the same vector. This is repeated 5 times to calculate the average runtime.

-```c++
+```cpp
#include <vector>
#include <chrono>
+#include <iostream>
#include <unistd.h> // for getpid()

void vectorizable_loop(std::vector<int>& data) {
@@ -64,7 +65,7 @@ int main() {
Compare compiler versions by building `vectorizable_loop.cpp` with the same arguments, but different compiler versions.

-Run the commands below to use version 13 and then version 9 on the same code:
+Run the commands below to use the default g++ compiler (13.3) and then the g++ version 9 installed earlier on the same code with the same flags.
@@ -127,30 +128,45 @@ Average elapsed time: 0.0420332 seconds
Average elapsed time: 0.0155661 seconds
```

-You see that optimization level impacts performance.
+Here you can observe a notable performance speed-up from using higher levels of optimisation.

-## Understanding optimizations
+Please note: to understand which lower-level optimisations are enabled by `-O1`, `-O2`, and `-O3`, you can use the `g++ <optimisation level> -Q --help=optimizers` command.
-Naturally, the next question is to understand which part of your source code was optimized. Full optimization reports generated by compilers like GCC provide a detailed tree of reports through various stages of the optimization process. These reports can be overwhelming due to the sheer volume of information they contain, covering every aspect of the code's transformation and optimization.
+### Understanding what was optimised

-For a more manageable overview, you can enable basic optimization reports using specific arguments such as `-fopt-info-vec`, which focuses on vectorization optimizations. The `-fopt-info` flag can be customized by changing the info bit to target different types of optimizations, making it easier to pinpoint specific areas of interest without being inundated with data.
+Naturally, the next question is to understand which part of your source code was optimized between the outputs above. Full optimization reports generated by compilers like GCC provide a detailed tree of reports through various stages of the optimization process. For beginners, these reports can be overwhelming due to the sheer volume of information they contain, covering every aspect of the code's transformation and optimization.

-Applying this to the following command we can see that there is no vector optimization with the `-O1` optimization level.
+For a more manageable overview, you can enable basic optimization information (`opt-info`) reports using specific arguments such as `-fopt-info-vec`, which focuses on vectorization optimizations. The `-fopt-info` flag can be customized by changing the info bit to target different types of optimizations, making it easier to pinpoint specific areas of interest.

-Run the compiler with the arguments shown:
+First, to see which part of the source code was optimised between levels 1 and 2, run the following commands to check whether the vectorisable loop was indeed vectorised.
This time the `-O2` flag enables the loop to be vectorised, as can be seen from the output below.

```output
vectorizable_loop.cpp:13:30: optimized: loop vectorized using 16 byte vectors
/usr/include/c++/13/bits/stl_algobase.h:930:22: optimized: loop vectorized using 16 byte vectors
```

-However the same command with the `-O2` optimization level we observe line 13, column 30 of our source code was optimized.
+To see which optimisations were performed and missed between level 2 and level 3, you can direct the terminal output from all optimisations (`-fopt-info`) to a text file with the commands below.
+
+```bash
+g++ -O2 vectorizable_loop.cpp -o level_2 -fopt-info 2>&1 | tee level2.txt
+g++ -O3 vectorizable_loop.cpp -o level_3 -fopt-info 2>&1 | tee level3.txt
+```
+
+Comparing the outputs between different levels can highlight where opportunities to optimise your source code were missed, for example with the `diff` command. This can help you write source code that is more likely to be optimised. However, source code modifications are out of scope for this learning path and we will leave it to the reader to dive into the differences if they wish to learn more.
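A minimal sketch of that comparison; the two report files here are illustrative stand-ins for the real `-fopt-info` output, which varies by compiler version:

```shell
# Illustrative stand-in reports; in the learning path these files come from
# the -fopt-info builds above.
printf 'loop vectorized using 16 byte vectors\n' > level2.txt
printf 'loop vectorized using 16 byte vectors\nloop unrolled 4 times\n' > level3.txt

# diff exits non-zero when the files differ, so guard it in scripts.
diff level2.txt level3.txt || true
```

Lines prefixed `>` in the diff output are optimisations reported only at the higher level.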