| 1 | +--- |
| 2 | +title: Using Optimised Math Library |
| 3 | +weight: 4 |
| 4 | + |
| 5 | +### FIXED, DO NOT MODIFY |
| 6 | +layout: learningpathall |
| 7 | +--- |
| 8 | + |
| 9 | +## Example using Optimised Math library |
| 10 | + |
| 11 | +The `libamath` library from Arm is an optimized subset of the standard library math functions for Arm-based CPUs, providing both scalar and vector functions at different levels of precision. It includes vectorized versions (Neon and SVE) of common math functions found in the standard library, such as those in the `<cmath>` header. |
| 12 | + |
| 13 | +The short snippet below uses the standard `<cmath>` header to calculate the natural exponential (e^x) of a scalar value. Copy and paste the code sample into a file named `basic_math.cpp`. |
| 14 | + |
| 15 | +```c++ |
| 16 | +#include <iostream> |
| 17 | +#include <ctime> |
| 18 | +#include <cmath> // Include the standard math header |
| 19 | + |
| 20 | +int main() { |
| 21 | + std::srand(std::time(0)); |
| 22 | + double random_number = std::rand() / static_cast<double>(RAND_MAX); |
| 23 | + double result = exp(random_number); // Use the standard exponential function |
| 24 | + std::cout << "Exponential of " << random_number << " is " << result << std::endl; |
| 25 | + return 0; |
| 26 | +} |
| 27 | +``` |
| 28 | + |
| 29 | +Compile the code using the following `g++` command, then use the `ldd` command to print the shared objects required for dynamic linking. The output shows that the standard math library `libm.so`, of which `libamath` covers a subset, is linked. |
| 30 | + |
| 31 | +```bash |
| 32 | +g++ basic_math.cpp -o basic_math |
| 33 | +ldd basic_math |
| 34 | +``` |
| 35 | +You should see the following output. |
| 36 | + |
| 37 | +```output |
| 38 | + linux-vdso.so.1 (0x0000f55218587000) |
| 39 | + libstdc++.so.6 => /lib/aarch64-linux-gnu/libstdc++.so.6 (0x0000f55218200000) |
| 40 | + libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000f55218490000) |
| 41 | + libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000f55218050000) |
| 42 | + /lib/ld-linux-aarch64.so.1 (0x0000f5521854e000) |
| 43 | + libgcc_s.so.1 => /lib/aarch64-linux-gnu/libgcc_s.so.1 (0x0000f55218460000) |
| 44 | +``` |
| 45 | + |
| 46 | +## Updating to use Optimised Library |
| 47 | + |
| 48 | +Using the optimised math library `libamath` requires minimal source code changes for our scalar example: modify the include statement to point to the correct header file and add the required linker flags when compiling. |
| 49 | + |
| 50 | +Libamath routines have maximum errors below 4 ULPs, where ULP stands for Unit in the Last Place: the spacing between two consecutive floating-point numbers at a given precision. These routines only support the default rounding mode (round-to-nearest, ties to even). Switching from `libm` to `libamath` therefore results in a small accuracy loss on a range of routines, similar to other vectorized implementations of these functions. |
| 51 | + |
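To make the ULP bound more concrete, the sketch below counts how many representable doubles lie between two values. It is a minimal illustration, not part of `libamath`: the helper names `ordered_bits` and `ulp_distance` are hypothetical, and the code assumes IEEE-754 binary64 `double` values that are not NaN. Strictly speaking, ULP error is defined against the exact mathematical result, so the distance between two rounded results is only a proxy.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: map a non-NaN double onto an integer line on which
// adjacent representable values differ by exactly 1 (-0.0 and +0.0 coincide).
static std::uint64_t ordered_bits(double x) {
    assert(!std::isnan(x));  // NaN has no meaningful position on the line
    std::uint64_t u;
    std::memcpy(&u, &x, sizeof u);  // well-defined way to read the bit pattern
    const std::uint64_t sign = 0x8000000000000000ULL;
    // Negative values count down from the midpoint, positive values count up.
    return (u & sign) ? sign - (u & ~sign) : sign + u;
}

// Hypothetical helper: number of representable steps between a and b.
std::uint64_t ulp_distance(double a, double b) {
    const std::uint64_t ua = ordered_bits(a);
    const std::uint64_t ub = ordered_bits(b);
    return ua > ub ? ua - ub : ub - ua;
}
```

Passing the `libm` result and the `libamath` result for the same input to `ulp_distance` gives a rough indication of how far apart the two implementations are for that input.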
| 52 | +Copy and paste the following C++ snippet into a file named `optimised_math.cpp`. |
| 53 | + |
| 54 | +```c++ |
| 55 | +#include <iostream> |
| 56 | +#include <ctime> |
| 57 | +#include <amath.h> // Include the Arm Performance Libraries math header |
| 58 | + |
| 59 | +int main() { |
| 60 | + std::srand(std::time(0)); |
| 61 | + double random_number = std::rand() / static_cast<double>(RAND_MAX); |
| 62 | + double result = exp(random_number); // Use the optimized exp function from libamath |
| 63 | + std::cout << "Exponential of " << random_number << " is " << result << std::endl; |
| 64 | + return 0; |
| 65 | +} |
| 66 | +``` |
| 67 | + |
| 68 | +Compile the code using the following `g++` command. Again, use the `ldd` command to print the shared objects required for dynamic linking. |
| 69 | + |
| 70 | +```bash |
| 71 | +g++ optimised_math.cpp -o optimised_math -lamath -lm |
| 72 | +ldd optimised_math |
| 73 | +``` |
| 74 | +The output now shows that the `libamath.so` shared object is linked. |
| 75 | + |
| 76 | +```output |
| 77 | + |
| 78 | + linux-vdso.so.1 (0x0000eb1eb379b000) |
| 79 | + libamath.so => /opt/arm/armpl_24.10_gcc/lib/libamath.so (0x0000eb1eb35c0000) |
| 80 | + libstdc++.so.6 => /lib/aarch64-linux-gnu/libstdc++.so.6 (0x0000eb1eb3200000) |
| 81 | + libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000eb1eb3050000) |
| 82 | + libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000eb1eb3520000) |
| 83 | + /lib/ld-linux-aarch64.so.1 (0x0000eb1eb3762000) |
| 84 | + libgcc_s.so.1 => /lib/aarch64-linux-gnu/libgcc_s.so.1 (0x0000eb1eb34f0000) |
| 85 | +``` |
| 86 | + |
| 87 | +### What about vector operations? |
| 88 | + |
| 89 | +The naming convention of Arm Performance Libraries for scalar operations follows that of `libm`, which is why updating the header file and recompiling is sufficient. For vector operations there are two options. The first is to rely on compiler autovectorisation, whereby the compiler generates the vector code for us; this is the approach used by the Arm Compiler for Linux (ACfL). The second is to call the vector routines directly, which are identified by mangled names. Name mangling modifies the name of a function to encode additional information, here the vector variant, so that each vector routine has a unique symbol that does not conflict with the scalar function or with other libraries linked into the same program. |
| 90 | + |
| 91 | +In the context of Arm's AArch64 architecture, vector name mangling follows the specific convention below to differentiate between scalar and vector versions of functions. |
| 92 | + |
| 93 | +```output |
| 94 | +'_ZGV' <isa> <mask> <vlen> <signature> '_' <original_name> |
| 95 | +``` |
| 96 | + |
| 97 | +- **original_name** : name of the scalar `libm` function |
| 98 | +- **isa** : 'n' for Neon, 's' for SVE |
| 99 | +- **mask** : 'M' for the masked/predicated version, 'N' for unmasked. Only masked routines are defined for SVE, and only unmasked routines for Neon. |
| 100 | +- **vlen** : integer representing the vector length, expressed as a number of lanes. For Neon, `<vlen>`='2' in double precision and `<vlen>`='4' in single precision. For SVE, `<vlen>`='x'. |
| 101 | +- **signature** : 'v' for 1 input floating-point or integer argument, 'vv' for 2. More details are in the AArch64 vector function ABI. |
| 102 | + |
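As an illustration of how the fields above combine, the sketch below assembles a mangled name from its parts. The helper `vector_function_name` is hypothetical and not part of `libamath`; it only performs string concatenation following the convention described above and does not check that the resulting routine actually exists in the library.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a libamath API): build a vector routine name
// following '_ZGV' <isa> <mask> <vlen> <signature> '_' <original_name>.
std::string vector_function_name(char isa, char mask, const std::string& vlen,
                                 const std::string& signature,
                                 const std::string& original_name) {
    assert(isa == 'n' || isa == 's');    // 'n' = Neon, 's' = SVE
    assert(mask == 'M' || mask == 'N');  // 'M' = masked, 'N' = unmasked
    return std::string("_ZGV") + isa + mask + vlen + signature + "_" + original_name;
}
```

With these inputs, `vector_function_name('n', 'N', "2", "v", "exp")` produces `_ZGVnN2v_exp` (Neon, unmasked, two double-precision lanes, one argument), and `vector_function_name('s', 'M', "x", "v", "expf")` produces `_ZGVsMxv_expf` (SVE, masked, scalable length, one argument).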
| 103 | +Please refer to the [Arm Performance Libraries reference guide](https://developer.arm.com/documentation/101004/latest/) for more information. |