ArmDeveloperEcosystem
diff --git a/content/learning-paths/servers-and-cloud-computing/using-and-porting-performance-libs/1.md b/content/learning-paths/servers-and-cloud-computing/using-and-porting-performance-libs/1.md
Lines changed: 46 additions & 21 deletions
diff --git a/content/learning-paths/servers-and-cloud-computing/using-and-porting-performance-libs/2.md b/content/learning-paths/servers-and-cloud-computing/using-and-porting-performance-libs/2.md
Lines changed: 8 additions & 6 deletions
@@ -1,57 +1,82 @@
 ---
-title: Introduction to performance libraries
+title: "Introduction to Libraries"
 weight: 2
 
 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---
 
-## Introduction to performance libraries
+## Types of Library
 
-The C++ Standard Library provides a collection of classes and functions that are essential for everyday programming tasks, such as data structures, algorithms, and input/output operations. It is designed to be versatile and easy to use, ensuring compatibility and portability across different platforms. However as a result of this portability, standard libraries introduce some limitations. Performance sensitive applications may wish to take maximum advantage of the hardware's capabilities - this is where performance libraries come in. 
+C++ libraries generally fall into two major categories, each serving different needs. This section walks you through both the standard library and performance libraries, and outlines the purpose and characteristics of each. 
 
-Performance libraries are specialized for high-performance computing tasks and are often tailored to the microarchitecture of a specific processor. These libraries are optimized for speed and efficiency, often leveraging hardware-specific features such as vector units to achieve maximum performance. Performance libraries are crafted through extensive benchmarking and optimization, and can be domain-specific, such as genomics libraries, or produced by Arm for general-purpose computing. For example, OpenRNG focuses on generating random numbers quickly and efficiently, which is crucial for simulations and scientific computations, whereas the C++ Standard Library offers a more general-purpose approach with functions like `std::mt19937` for random number generation.
+### Standard C++ Library
 
-Performance libraries for Arm CPUs, such as the Arm Performance Libraries (APL), provide highly optimized mathematical functions for scientific computing. An analogous library for accelerating routines on a GPU is cuBLAS for NVIDIA GPUs. These libraries can be linked dynamically at runtime or statically during compilation, offering flexibility in deployment. They are designed to support multiple versions of the Arm architecture, including those with NEON and SVE.  Generally, minimal source code changes are required to use these libraries, making them suitable for porting and optimizing applications. 
+The C++ Standard Library provides a collection of classes, functions, and templates that are defined by the C++ standard and are essential for everyday programming, such as:
 
-### How can I choose the right version of a performance library?
+* Data structures.
+* Algorithms.
+* Input/output operations. 
+* Utility functions.
 
-Performance libraries are often distributed with multiple formats to support various use cases. 
+### Trade-offs between versatility and performance
 
-- **ILP64** uses 64 bits for representing integers, which are often used for indexing large arrays in scientific computing. In C++ source code we use the `long long` type to specify 64-bit integers. 
+The C++ Standard Library is designed to be versatile and easy to use, ensuring compatibility and portability across different platforms. This portability comes at a cost, however, and standard libraries have some limitations. Developers of performance-sensitive applications who want to take full advantage of the hardware's capabilities, but cannot do so through the standard library, can use performance libraries instead.
 
-- **LP64** uses 32 bits to present integers which are more common in general purpose applications. 
+### Benefits of performance libraries
 
-- **Open Multi-process** (OpenMP) is a programming interface for paralleling workloads across many CPU cores across multiple platforms (i.e. x86, AArch64 etc.). Programmers interact primarily through compiler directives, such as `#pragma omp parallel` indicating which section of source code can be run in parallel and which sections require synchronization. 
+Performance libraries are specialized for high-performance computing tasks and are often tailored to the microarchitecture of a specific processor. These libraries are optimized for speed and efficiency, often leveraging hardware-specific features such as vector units to achieve maximum performance. 
 
-Arm performance libraries like the x86 equivalent, Open Math Kernel Library (MKL) provide optimized functions for both ILP64 and LP64 as well as OpenMP or single threaded implementations. Further, the interface libraries are available as shared libraries for dynamic linking (i.e. `*.so`) or static linking (i.e. `*.a`).
+Crafted through extensive benchmarking and optimization, performance libraries can be domain-specific, such as genomics libraries, or aimed at general-purpose computing. For example, OpenRNG focuses on generating random numbers quickly and efficiently, which is crucial for simulations and scientific computations, whereas the C++ Standard Library offers a more general-purpose approach with random number engines such as `std::mt19937`.
 
-### Why do multiple performance Libraries exist?
+Performance libraries for Arm CPUs - such as the Arm Performance Libraries (APL) - provide highly optimized mathematical functions for scientific computing. An analogous library for accelerating routines on a GPU is cuBLAS, which is available for NVIDIA GPUs. 
 
-A natural source of confusion stems from the plethora of similar seeming performance libraries. For example, OpenBLAS and NVIDIA Performance Libraries (NVPL) both have their own implementations for basic linear algebra subprograms (BLAS). This begs the question which one should a developer use?
+These libraries can be linked dynamically at runtime or statically during compilation, offering flexibility in deployment. They are designed to support multiple versions of the Arm architecture, including those with NEON and SVE. Generally, only minimal source code changes are required to use these libraries, making them ideal for porting and optimizing applications.
 
-Multiple performance libraries coexist to cater to the diverse needs of different hardware architectures and applications. For instance, Arm performance libraries are optimized for Arm CPUs, leveraging the unique instruction sets and power efficiency. On the other hand, NVIDIA performance libraries for Grace CPUs are tailored to maximize the performance of NVIDIA's hardware.
+### How do I choose the right version of a performance library?
 
-- **Hardware Specialization**  Some libraries are designed to be cross-platform, supporting multiple hardware architectures to provide flexibility and broader usability. For example, the OpenBLAS library supports both Arm and x86 architectures, allowing developers to use the same library across different systems. 
+Performance libraries are often distributed in multiple formats to support various use cases: 
 
-- **Domain-Specific Libraries**: Libraries are often created to handle specific domains or types of computations more efficiently. For instance, libraries like cuDNN are optimized for deep learning tasks, providing specialized functions that significantly speed up neural network training and inference.
+- **ILP64** uses 64 bits for representing integers, which are often used for indexing large arrays in scientific computing. In C++ source code, you can use the `long long` type to specify 64-bit integers.
 
-- **Commercial Libraries**: Alternatively, some highly performant libraries require a license to use. This is more common in domain specific libraries such as computational chemistry or fluid dynamics. 
+- **LP64** uses 32 bits to represent integers, which is more common in general-purpose applications.
+
+- **Open Multi-Processing** (OpenMP) is a cross-platform programming interface for parallelizing workloads across many CPU cores on architectures such as x86 and AArch64. Programmers interact primarily through compiler directives, such as `#pragma omp parallel`, which indicate which sections of source code can run in parallel and which require synchronization.
+
+Arm Performance Libraries, in common with their x86 counterpart, the Intel oneAPI Math Kernel Library (oneMKL), provide optimized functions for both ILP64 and LP64, in both OpenMP and single-threaded implementations.
+
+Additionally, interface libraries are available as shared libraries for dynamic linking, which have a `.so` file extension, or as static libraries for static linking, which have a `.a` file extension.
+
+### Which performance library should I choose?
+
+A natural source of confusion stems from the plethora of similar performance libraries. For example, OpenBLAS and NVIDIA Performance Libraries (NVPL) each offer their own implementation of basic linear algebra subprograms (BLAS). This raises the question: which one should a developer choose?
+
+Multiple performance libraries exist to meet the diverse needs of different hardware architectures and applications. For instance, Arm performance libraries are optimized for Arm CPUs, leveraging unique instruction sets and power efficiency. Meanwhile, NVIDIA performance libraries for Grace CPUs are tailored to maximize the performance of NVIDIA's hardware.
+
+Here are some of the different types of performance libraries available:
+
+- Hardware-specialized or cross-platform - some libraries target a single architecture, while others are designed to be cross-platform, supporting multiple hardware architectures to provide flexibility and broader usability. For example, the OpenBLAS library supports both Arm and x86 architectures, allowing developers to use the same library across different systems.
+
+- Domain-specific - libraries are often created to handle specific domains or types of computations more efficiently. For instance, libraries like cuDNN are optimized for deep learning tasks, providing specialized functions that significantly speed up neural network training and inference.
+
+- Commercial - some highly performant libraries require a license to use. This is more common in domain-specific libraries, such as those for computational chemistry or fluid dynamics.
 
 These factors contribute to the existence of multiple performance libraries, each tailored to meet the specific demands of various hardware and applications.
 
 Invariably, there will be performance differences between each library and the best way to observe them is to use the library within your own application. 
 
-For more information on performance benchmarking you can read [Arm Performance Libraries 24.10](https://community.arm.com/arm-community-blogs/b/servers-and-cloud-computing-blog/posts/arm-performance-libraries-24-10).
+For more information on performance benchmarking, see [Arm Performance Libraries 24.10](https://community.arm.com/arm-community-blogs/b/servers-and-cloud-computing-blog/posts/arm-performance-libraries-24-10).
 
 ### What performance libraries are available on Arm?
 
-For a directory of community-produced libraries we recommend looking at the the Software Ecosystem Dashboard for Arm. Each library may not be available as a binary and may need to be compiled from source. The table below gives examples of libraries that are available on Arm. 
+For a directory of community-produced libraries, see the [Software Ecosystem Dashboard for Arm](https://www.arm.com/developer-hub/ecosystem-dashboard). 
+
+Not every library is available as a binary, so you might need to compile some from source. The table below gives examples of libraries that are available on Arm.
 
 | Package / Library    | Domain |
 | -------- | ------- |
 | Minimap2  | Long-read sequence alignment in genomics    |
 | HMMER |Bioinformatics library for homologous sequences     |
-| FFTW    | Open-source fast fourier transform library    |
+| FFTW    | Open-source Fast Fourier Transform library    |
 
-See the [Software Ecosystem Dashboard for Arm](https://www.arm.com/developer-hub/ecosystem-dashboard) for the most comprehensive and up-to-date list.
+See the [Software Ecosystem Dashboard for Arm](https://www.arm.com/developer-hub/ecosystem-dashboard) for the most comprehensive and up-to-date list.
@@ -6,18 +6,20 @@ weight: 3
 layout: learningpathall
 ---
 
+## Get started
+
 You can install Arm Performance Libraries on an Arm-based AWS instance, such as `t4g.2xlarge`, running Ubuntu 22.04 LTS. 
 
-For instructions to create and connect to an AWS instance, please refer to [Get started with Servers and Cloud Computing](/learning-paths/servers-and-cloud-computing/intro/). 
+For instructions to create and connect to an AWS instance, see [Get started with Servers and Cloud Computing](/learning-paths/servers-and-cloud-computing/intro/). 
 
-Once connected via `ssh`, install the required packages with the following commands. 
+Once connected via `ssh`, install the required packages with the following commands: 
 
 ```bash
 sudo apt update
 sudo apt install gcc g++ make -y
 ```
 
-Next, install Arm Performance Libraries with the commands below. For more information, refer to the [Arm Performance Libraries install guide](/install-guides/armpl/). 
+Next, install Arm Performance Libraries with the commands below. For more information, see the [Arm Performance Libraries install guide](/install-guides/armpl/). 
 
 ```bash
 wget https://developer.arm.com/-/cdn-downloads/permalink/Arm-Performance-Libraries/Version_24.10/arm-performance-libraries_24.10_deb_gcc.tar
@@ -41,13 +43,13 @@ You should see the `armpl/24.10.0_gcc` available.
 armpl/24.10.0_gcc  
 ```
 
-Load the module with the following command. 
+Load the module with the following command: 
 
 ```bash
 module load armpl/24.10.0_gcc
 ```
 
-Navigate to the `lp64` C source code examples and compile. 
+Navigate to the `lp64` C source code examples and compile: 
 
 ```bash
 cd $ARMPL_DIR/examples_lp64
@@ -62,6 +64,6 @@ Your terminal output shows the examples being compiled and the output ends with:
 Test passed OK
 ```
 
-For more information on all the available function, refer to the [Arm Performance Libraries Reference Guide](https://developer.arm.com/documentation/101004/latest/).
+For more information, see the [Arm Performance Libraries Reference Guide](https://developer.arm.com/documentation/101004/latest/).