Commit 8bec55a

committed
Review performance libraries migration Learning Path
1 parent 6456ca0 commit 8bec55a

6 files changed

+98
-89
lines changed

Lines changed: 20 additions & 18 deletions
@@ -1,55 +1,57 @@
11
---
2-
title: Introduction to Performance Libraries
2+
title: Introduction to performance libraries
33
weight: 2
44

55
### FIXED, DO NOT MODIFY
66
layout: learningpathall
77
---
88

9-
## Introduction to Performance Libraries
9+
## Introduction to performance libraries
1010

11-
The C++ Standard Library provides a collection of classes and functions that are essential for everyday programming tasks, such as data structures, algorithms, and input/output operations. It is designed to be versatile and easy to use, ensuring compatibility and portability across different platforms. However as a result of this portability, standard libraries introduces some limitations. Performance sensitive applications may wish to take maximum advantage of the hardware's capabilities - this is where performance libraries come in.
11+
The C++ Standard Library provides a collection of classes and functions that are essential for everyday programming tasks, such as data structures, algorithms, and input/output operations. It is designed to be versatile and easy to use, ensuring compatibility and portability across different platforms. However, as a result of this portability, standard libraries introduce some limitations. Performance-sensitive applications may wish to take maximum advantage of the hardware's capabilities; this is where performance libraries come in.
1212

13-
Performance libraries are specialized for high-performance computing tasks and are often tailored to the microarchitecture of a specific processor. These libraries are optimized for speed and efficiency, often leveraging hardware-specific features such as vector units to achieve maximum performance. Performance libraries are crafted through extensive benchmarking and optimization, and can be domain-specific, such as genomics libraries, or produced by Arm for general-purpose computing. For example, OpenRNG focuses on generating random numbers quickly and efficiently, which is crucial for simulations and scientific computations, whereas the C++ Standard Library offers a more general-purpose approach with functions like std::mt19937 for random number generation.
13+
Performance libraries are specialized for high-performance computing tasks and are often tailored to the microarchitecture of a specific processor. These libraries are optimized for speed and efficiency, often leveraging hardware-specific features such as vector units to achieve maximum performance. Performance libraries are crafted through extensive benchmarking and optimization, and can be domain-specific, such as genomics libraries, or produced by Arm for general-purpose computing. For example, OpenRNG focuses on generating random numbers quickly and efficiently, which is crucial for simulations and scientific computations, whereas the C++ Standard Library offers a more general-purpose approach with functions like `std::mt19937` for random number generation.
1414

15-
Performance libraries for Arm CPUs, such as the Arm Performance Libraries (APL), provide highly optimized mathematical functions for scientific computing. An analogous library for accelerating routines on GPU is cuBLAS for NVIDIA GPUs. These libraries can be linked dynamically at runtime or statically during compilation, offering flexibility in deployment. They are designed to support multiple versions of the Arm architecture, including those with NEON and SVE extensions. Generally, minimal source code changes are required to support these libraries, making them simple for porting and optimising.
15+
Performance libraries for Arm CPUs, such as the Arm Performance Libraries (APL), provide highly optimized mathematical functions for scientific computing. An analogous library for accelerating routines on a GPU is cuBLAS for NVIDIA GPUs. These libraries can be linked dynamically at runtime or statically during compilation, offering flexibility in deployment. They are designed to support multiple versions of the Arm architecture, including those with NEON and SVE. Generally, minimal source code changes are required to use these libraries, making them suitable for porting and optimizing applications.
1616

17-
### Choosing the right version of a library
17+
### How can I choose the right version of a performance library?
1818

19-
Performance libraries are often distributed with the following formats to support various use cases.
19+
Performance libraries are often distributed in multiple variants to support various use cases.
2020

21-
- **ILP64** use 64 bits for representing integers, which are often used for indexing large arrays in scentific computing. In C++ source code we use the `long long` type to specify 64-bit integers.
21+
- **ILP64** uses 64 bits for representing integers, which are often used for indexing large arrays in scientific computing. In C++ source code we use the `long long` type to specify 64-bit integers.
2222

23-
- **LP64** use 32 bits to present integers which are more common in general purpose applications.
23+
- **LP64** uses 32 bits to represent integers, which is more common in general-purpose applications.
2424

25-
- **Open Multi-process** (OpenMP) is a programming interface for paralleling workloads across many CPU cores on shared memory across multiple platforms (i.e. x86, AArch64 etc.). Programmers would interact primarily through compiler directives, such as `#pragma omp parallel` indicating which section of source code can be run on parallel and which sections require synchronisation. This learning path does not serve to teach you about OpenMP but presumes the reader is familiar.
25+
- **Open Multi-Processing** (OpenMP) is a programming interface for parallelizing workloads across many CPU cores on multiple platforms (for example x86 and AArch64). Programmers interact primarily through compiler directives, such as `#pragma omp parallel`, which indicate which sections of source code can run in parallel and which sections require synchronization.
2626

27-
Arm performance libraries like the x86 equivalent, Open Math Kernel Library (MKL) provide optimised functions for both ILP64 and LP64 as well as OpenMP or single threaded implementations. Further, the interface libraries are available as shared libraries for dynamic linking (i.e. `*.so`) or static linking (i.e. `*.a`).
27+
Arm Performance Libraries, like the x86 equivalent Intel Math Kernel Library (MKL), provide optimized functions for both ILP64 and LP64, in OpenMP or single-threaded implementations. Further, the interface libraries are available as shared libraries for dynamic linking (`*.so`) or as archives for static linking (`*.a`).
2828

2929
### Why do multiple performance Libraries exist?
3030

31-
A natural source of confusion stems from the plethora of similar seeming performance libraries, for example OpenBLAS, NVIDIA Performance Libraries (NVPL) which have their own implementations for specific functions, for example basic linear algebra subprograms (BLAS). This begs the question which one should a developer use?
31+
A natural source of confusion is the plethora of similar-seeming performance libraries. For example, OpenBLAS and NVIDIA Performance Libraries (NVPL) both have their own implementations of the basic linear algebra subprograms (BLAS). This raises the question: which one should a developer use?
3232

33-
Multiple performance libraries coexist to cater to the diverse needs of different hardware architectures and applications. For instance, Arm performance libraries are optimized for Arm CPUs, leveraging their unique instruction sets and power efficiency. On the other hand, NVIDIA performance libraries for Grace CPU are tailored to maximize the performance of NVIDIA's Grace hardware features specific to their own Neoverse implementation.
33+
Multiple performance libraries coexist to cater to the diverse needs of different hardware architectures and applications. For instance, Arm performance libraries are optimized for Arm CPUs, leveraging the unique instruction sets and power efficiency. On the other hand, NVIDIA performance libraries for Grace CPUs are tailored to maximize the performance of NVIDIA's hardware.
3434

3535
- **Hardware Specialization**: Some libraries are designed to be cross-platform, supporting multiple hardware architectures to provide flexibility and broader usability. For example, the OpenBLAS library supports both Arm and x86 architectures, allowing developers to use the same library across different systems.
3636

3737
- **Domain-Specific Libraries**: Libraries are often created to handle specific domains or types of computations more efficiently. For instance, libraries like cuDNN are optimized for deep learning tasks, providing specialized functions that significantly speed up neural network training and inference.
3838

39-
- **Commercial Libraries**: Alternatively, some highly performant libraries require a license to use. This is more common in domain specific libraries such as computations chemistry or fluid dynamics.
39+
- **Commercial Libraries**: Alternatively, some highly performant libraries require a license to use. This is more common for domain-specific libraries in fields such as computational chemistry or fluid dynamics.
4040

4141
These factors contribute to the existence of multiple performance libraries, each tailored to meet the specific demands of various hardware and applications.
4242

43-
Invariably, there will be performance differences between each library and the best way to observe it to use the library within your own program. For more information on performance benchmarking please read [this blog](https://community.arm.com/arm-community-blogs/b/servers-and-cloud-computing-blog/posts/arm-performance-libraries-24-10).
43+
Invariably, there will be performance differences between each library and the best way to observe them is to use the library within your own application.
4444

45-
### What performance libraries are available on Arm?
45+
For more information on performance benchmarking you can read [Arm Performance Libraries 24.10](https://community.arm.com/arm-community-blogs/b/servers-and-cloud-computing-blog/posts/arm-performance-libraries-24-10).
4646

47-
For a directory of community-produced libraries we recommend looking at the the Arm Ecosystem Dashboard. Each library may not be available as a binary and may need to be compiled from source. The table below gives and example of such libraries that are available on Arm with a link to the full dashboard at the bottom.
47+
### What performance libraries are available on Arm?
4848

49+
For a directory of community-produced libraries, we recommend looking at the Software Ecosystem Dashboard for Arm. Not every library is available as a binary, and some may need to be compiled from source. The table below gives examples of libraries that are available on Arm.
4950

5051
| Package / Library | Domain |
5152
| -------- | ------- |
5253
| Minimap2 | Long-read sequence alignment in genomics |
5354
| HMMER | Bioinformatics library for homologous sequences |
5455
| FFTW | Open-source Fast Fourier Transform library |
55-
|[Please see the Arm Ecosystem Dashboard](https://www.arm.com/developer-hub/ecosystem-dashboard) for the most comprehensive and up-to-date list.||
56+
57+
See the [Software Ecosystem Dashboard for Arm](https://www.arm.com/developer-hub/ecosystem-dashboard) for the most comprehensive and up-to-date list.
Lines changed: 17 additions & 16 deletions
@@ -1,42 +1,43 @@
11
---
2-
title: Setting Up Your Environment
3-
weight: 2
2+
title: Set up your environment
3+
weight: 3
44

55
### FIXED, DO NOT MODIFY
66
layout: learningpathall
77
---
88

9-
## Setting Up Your Environment
9+
You can install Arm Performance Libraries on an Arm-based AWS instance, such as `t4g.2xlarge`, running Ubuntu 22.04 LTS.
1010

11-
In this initial example we will use an Arm-based AWS `t4g.2xlarge` instance running Ubuntu 22.04 LTS along with the Arm Performance Libraries. For instructions to connect to an AWS instance, please see our [getting started guide](https://learn.arm.com/learning-paths/servers-and-cloud-computing/intro/).
11+
For instructions to create and connect to an AWS instance, please refer to [Get started with Servers and Cloud Computing](/learning-paths/servers-and-cloud-computing/intro/).
1212

1313
Once connected via `ssh`, install the required packages with the following commands.
1414

1515
```bash
1616
sudo apt update
17-
sudo apt install gcc make
17+
sudo apt install gcc g++ make -y
1818
```
19-
Next, install Arm performance libraries using the following [installation guide](https://learn.arm.com/install-guides/armpl/). Alternatively, use the commands below.
19+
20+
Next, install Arm Performance Libraries with the commands below. For more information, refer to the [Arm Performance Libraries install guide](/install-guides/armpl/).
2021

2122
```bash
2223
wget https://developer.arm.com/-/cdn-downloads/permalink/Arm-Performance-Libraries/Version_24.10/arm-performance-libraries_24.10_deb_gcc.tar
2324
tar xvf arm-performance-libraries_24.10_deb_gcc.tar
24-
cd arm-performance-libraries_24.10_deb/
25+
sudo ./arm-performance-libraries_24.10_deb/arm-performance-libraries_24.10_deb.sh --accept
2526
```
2627

27-
Now we need to install environment modules to set the required environment variables, allowing us to quickly build the example applications.
28+
Install environment modules to set the required environment variables. This allows you to quickly build the example applications.
2829

2930
```bash
30-
sudo add-apt-respository universe
3131
sudo apt install environment-modules
3232
source /usr/share/modules/init/bash
3333
export MODULEPATH=$MODULEPATH:/opt/arm/modulefiles
3434
module avail
3535
```
3636

37-
You should see the following `armpl/24.10.0_gcc` available.
37+
You should see `armpl/24.10.0_gcc` listed as available.
38+
3839
```output
39-
------------------------------------------------------------------------------------------------------- /opt/arm/modulefiles -------------------------------------------------------------------------------------------------------
40+
------------------------------------ /opt/arm/modulefiles ---------------------------------------
4041
armpl/24.10.0_gcc
4142
```
4243

@@ -49,18 +50,18 @@ module load armpl/24.10.0_gcc
4950
Navigate to the `lp64` C source code examples and compile.
5051

5152
```bash
52-
cd $ARMPL_DIR
53-
cd /examples_lp64/
54-
sudo -E make c_examples // -E is to preserve environment variables
53+
cd $ARMPL_DIR/examples_lp64
54+
# -E is to preserve environment variables
55+
sudo -E make c_examples
5556
```
5657

57-
Your terminal output should show the examples being compiled, ending with.
58+
Your terminal output shows the examples being compiled and the output ends with:
5859

5960
```output
6061
...
6162
Test passed OK
6263
```
6364

64-
For more information on all the available function, please refer to the [Arm Performance Libraries Reference Guide](https://developer.arm.com/documentation/101004/latest/).
65+
For more information on all the available functions, refer to the [Arm Performance Libraries Reference Guide](https://developer.arm.com/documentation/101004/latest/).
6566

6667
