
Commit 13ba248

Merge pull request #1690 from madeline-underwood/PortingPerfLib
Porting perf lib_andy to approve
2 parents 7ba390c + 59cbb55 commit 13ba248


5 files changed: +103 additions, -62 deletions

Lines changed: 46 additions & 21 deletions
@@ -1,57 +1,82 @@
 ---
-title: Introduction to performance libraries
+title: "Introduction to Libraries"
 weight: 2

 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---

-## Introduction to performance libraries
+## Types of Library

-The C++ Standard Library provides a collection of classes and functions that are essential for everyday programming tasks, such as data structures, algorithms, and input/output operations. It is designed to be versatile and easy to use, ensuring compatibility and portability across different platforms. However as a result of this portability, standard libraries introduce some limitations. Performance sensitive applications may wish to take maximum advantage of the hardware's capabilities - this is where performance libraries come in.
+C++ libraries generally fall into two major categories, each serving different needs. This section walks you through both the standard library and performance libraries, and outlines the purpose and characteristics of each.

-Performance libraries are specialized for high-performance computing tasks and are often tailored to the microarchitecture of a specific processor. These libraries are optimized for speed and efficiency, often leveraging hardware-specific features such as vector units to achieve maximum performance. Performance libraries are crafted through extensive benchmarking and optimization, and can be domain-specific, such as genomics libraries, or produced by Arm for general-purpose computing. For example, OpenRNG focuses on generating random numbers quickly and efficiently, which is crucial for simulations and scientific computations, whereas the C++ Standard Library offers a more general-purpose approach with functions like `std::mt19937` for random number generation.
+### Standard C++ Library

-Performance libraries for Arm CPUs, such as the Arm Performance Libraries (APL), provide highly optimized mathematical functions for scientific computing. An analogous library for accelerating routines on a GPU is cuBLAS for NVIDIA GPUs. These libraries can be linked dynamically at runtime or statically during compilation, offering flexibility in deployment. They are designed to support multiple versions of the Arm architecture, including those with NEON and SVE. Generally, minimal source code changes are required to use these libraries, making them suitable for porting and optimizing applications.
+The C++ Standard Library provides a collection of classes, functions, and templates that are defined by the C++ standard and are essential for everyday programming, such as:

-### How can I choose the right version of a performance library?
+* Data structures.
+* Algorithms.
+* Input/output operations.
+* Utility functions.

-Performance libraries are often distributed with multiple formats to support various use cases.
+### Trade-offs between versatility and performance

-- **ILP64** uses 64 bits for representing integers, which are often used for indexing large arrays in scientific computing. In C++ source code we use the `long long` type to specify 64-bit integers.
+The C++ Standard Library is designed to be versatile and easy to use, ensuring compatibility and portability across different platforms. This portability comes at a cost, however, and standard libraries have some limitations. Developers of performance-sensitive applications might want to take full advantage of the hardware's capabilities, and where the standard library cannot provide this, they can use performance libraries instead.

-- **LP64** uses 32 bits to present integers which are more common in general purpose applications.
+### Benefits of performance libraries

-- **Open Multi-process** (OpenMP) is a programming interface for paralleling workloads across many CPU cores across multiple platforms (i.e. x86, AArch64 etc.). Programmers interact primarily through compiler directives, such as `#pragma omp parallel` indicating which section of source code can be run in parallel and which sections require synchronization.
+Performance libraries are specialized for high-performance computing tasks and are often tailored to the microarchitecture of a specific processor. These libraries are optimized for speed and efficiency, often leveraging hardware-specific features such as vector units to achieve maximum performance.

-Arm performance libraries like the x86 equivalent, Open Math Kernel Library (MKL) provide optimized functions for both ILP64 and LP64 as well as OpenMP or single threaded implementations. Further, the interface libraries are available as shared libraries for dynamic linking (i.e. `*.so`) or static linking (i.e. `*.a`).
+Crafted through extensive benchmarking and optimization, performance libraries can be domain-specific, such as genomics libraries, or designed for general-purpose computing. For example, OpenRNG focuses on generating random numbers quickly and efficiently, which is crucial for simulations and scientific computations, whereas the C++ Standard Library offers a more general-purpose approach with types such as `std::mt19937` for random number generation.

-### Why do multiple performance Libraries exist?
+Performance libraries for Arm CPUs, such as the Arm Performance Libraries (APL), provide highly optimized mathematical functions for scientific computing. An analogous library for accelerating routines on a GPU is cuBLAS, which is available for NVIDIA GPUs.

-A natural source of confusion stems from the plethora of similar seeming performance libraries. For example, OpenBLAS and NVIDIA Performance Libraries (NVPL) both have their own implementations for basic linear algebra subprograms (BLAS). This begs the question which one should a developer use?
+These libraries can be linked dynamically at runtime or statically during compilation, offering flexibility in deployment. They are designed to support multiple versions of the Arm architecture, including those with NEON and SVE. Generally, only minimal source code changes are required to use these libraries, making them ideal for porting and optimizing applications.

-Multiple performance libraries coexist to cater to the diverse needs of different hardware architectures and applications. For instance, Arm performance libraries are optimized for Arm CPUs, leveraging the unique instruction sets and power efficiency. On the other hand, NVIDIA performance libraries for Grace CPUs are tailored to maximize the performance of NVIDIA's hardware.
+### How do I choose the right version of a performance library?

-- **Hardware Specialization** Some libraries are designed to be cross-platform, supporting multiple hardware architectures to provide flexibility and broader usability. For example, the OpenBLAS library supports both Arm and x86 architectures, allowing developers to use the same library across different systems.
+Performance libraries are often distributed in multiple formats to support various use cases:

-- **Domain-Specific Libraries**: Libraries are often created to handle specific domains or types of computations more efficiently. For instance, libraries like cuDNN are optimized for deep learning tasks, providing specialized functions that significantly speed up neural network training and inference.
+- **ILP64** uses 64-bit integers, which are often needed to index large arrays in scientific computing. In C++ source code, you can use the `long long` type to specify 64-bit integers.

-- **Commercial Libraries**: Alternatively, some highly performant libraries require a license to use. This is more common in domain specific libraries such as computational chemistry or fluid dynamics.
+- **LP64** uses 32-bit integers, which are more common in general-purpose applications. In this data model, only `long` and pointer types are 64 bits wide.
+
+- **Open Multi-Processing** (OpenMP) is a cross-platform programming interface for parallelizing workloads across many CPU cores on architectures such as x86 and AArch64. Programmers interact with it primarily through compiler directives, such as `#pragma omp parallel`, which indicate which sections of source code can run in parallel and which sections require synchronization.
+
+Arm Performance Libraries, like their x86 counterpart Intel Math Kernel Library (MKL), provide optimized functions for both ILP64 and LP64, in both OpenMP (multi-threaded) and single-threaded versions.
+
+Additionally, the libraries are available as shared libraries (`.so` files) for dynamic linking and as static libraries (`.a` files) for static linking.
+
+### Which performance library should I choose?
+
+A natural source of confusion stems from the plethora of similar performance libraries. For example, OpenBLAS and NVIDIA Performance Libraries (NVPL) each offer their own implementation of basic linear algebra subprograms (BLAS). This raises the question: which one should a developer choose?
+
+Multiple performance libraries exist to meet the diverse needs of different hardware architectures and applications. For instance, Arm performance libraries are optimized for Arm CPUs, leveraging unique instruction sets and power efficiency. Meanwhile, NVIDIA performance libraries for Grace CPUs are tailored to maximize the performance of NVIDIA's hardware.
+
+Here are some of the different types of performance libraries available:
+
+- Cross-platform - some libraries are designed to support multiple hardware architectures, providing flexibility and broader usability. For example, the OpenBLAS library supports both Arm and x86 architectures, allowing developers to use the same library across different systems.
+
+- Domain-specific - libraries are often created to handle specific domains or types of computations more efficiently. For instance, libraries like cuDNN are optimized for deep learning tasks, providing specialized functions that significantly speed up neural network training and inference.
+
+- Commercial - some high-performance libraries require a license to use. This is more common for domain-specific libraries, such as those for computational chemistry or fluid dynamics.

 These factors contribute to the existence of multiple performance libraries, each tailored to meet the specific demands of various hardware and applications.

 Invariably, there will be performance differences between each library and the best way to observe them is to use the library within your own application.

-For more information on performance benchmarking you can read [Arm Performance Libraries 24.10](https://community.arm.com/arm-community-blogs/b/servers-and-cloud-computing-blog/posts/arm-performance-libraries-24-10).
+For more information on performance benchmarking, see [Arm Performance Libraries 24.10](https://community.arm.com/arm-community-blogs/b/servers-and-cloud-computing-blog/posts/arm-performance-libraries-24-10).

 ### What performance libraries are available on Arm?

-For a directory of community-produced libraries we recommend looking at the the Software Ecosystem Dashboard for Arm. Each library may not be available as a binary and may need to be compiled from source. The table below gives examples of libraries that are available on Arm.
+For a directory of community-produced libraries, see the [Software Ecosystem Dashboard for Arm](https://www.arm.com/developer-hub/ecosystem-dashboard).
+
+Not every library is available as a binary, so you might need to compile some from source. The table below gives examples of libraries that are available on Arm.

 | Package / Library | Domain |
 | -------- | ------- |
 | Minimap2 | Long-read sequence alignment in genomics |
 | HMMER |Bioinformatics library for homologous sequences |
-| FFTW | Open-source fast fourier transform library |
+| FFTW | Open-source Fast Fourier Transform Library |

-See the [Software Ecosystem Dashboard for Arm](https://www.arm.com/developer-hub/ecosystem-dashboard) for the most comprehensive and up-to-date list.
+See the [Software Ecosystem Dashboard for Arm](https://www.arm.com/developer-hub/ecosystem-dashboard) for the most comprehensive and up-to-date list.

content/learning-paths/servers-and-cloud-computing/using-and-porting-performance-libs/2.md

Lines changed: 8 additions & 6 deletions
@@ -6,18 +6,20 @@ weight: 3
 layout: learningpathall
 ---

+## Get started
+
 You can install Arm Performance Libraries on an Arm-based AWS instance, such as `t4g.2xlarge`, running Ubuntu 22.04 LTS.

-For instructions to create and connect to an AWS instance, please refer to [Get started with Servers and Cloud Computing](/learning-paths/servers-and-cloud-computing/intro/).
+For instructions to create and connect to an AWS instance, see [Get started with Servers and Cloud Computing](/learning-paths/servers-and-cloud-computing/intro/).

-Once connected via `ssh`, install the required packages with the following commands.
+Once connected via `ssh`, install the required packages with the following commands:

 ```bash
 sudo apt update
 sudo apt install gcc g++ make -y
 ```

-Next, install Arm Performance Libraries with the commands below. For more information, refer to the [Arm Performance Libraries install guide](/install-guides/armpl/).
+Next, install Arm Performance Libraries with the commands below. For more information, see the [Arm Performance Libraries install guide](/install-guides/armpl/).

 ```bash
 wget https://developer.arm.com/-/cdn-downloads/permalink/Arm-Performance-Libraries/Version_24.10/arm-performance-libraries_24.10_deb_gcc.tar
@@ -41,13 +43,13 @@ You should see the `armpl/24.10.0_gcc` available.
 armpl/24.10.0_gcc
 ```

-Load the module with the following command.
+Load the module with the following command:

 ```bash
 module load armpl/24.10.0_gcc
 ```

-Navigate to the `lp64` C source code examples and compile.
+Navigate to the `lp64` C source code examples and compile:

 ```bash
 cd $ARMPL_DIR/examples_lp64
@@ -62,6 +64,6 @@ Your terminal output shows the examples being compiled and the output ends with:
 Test passed OK
 ```

-For more information on all the available function, refer to the [Arm Performance Libraries Reference Guide](https://developer.arm.com/documentation/101004/latest/).
+For more information, see the [Arm Performance Libraries Reference Guide](https://developer.arm.com/documentation/101004/latest/).
