
Commit 91cdb9b

Tech review for Libamath accuracy Learning Path
1 parent 55f4141 commit 91cdb9b

File tree

6 files changed: +128 -72 lines changed

content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/_index.md

Lines changed: 10 additions & 9 deletions
Original file line number | Diff line number | Diff line change
@@ -1,30 +1,31 @@
11
---
22
title: Understanding Libamath's vector accuracy modes
33

4+
draft: true
5+
cascade:
6+
draft: true
7+
48
minutes_to_complete: 20
59
author: Joana Cruz
610

7-
who_is_this_for: This is an introductory topic for software developers who want to learn how to use the different accuracy modes present in Libamath, a component of ArmPL. This feature was introduced in ArmPL 25.04.
11+
who_is_this_for: This is an introductory topic for software developers who want to learn how to use the different accuracy modes present in Libamath, a component of Arm Performance Libraries.
812

913
learning_objectives:
10-
- understand how accuracy is defined in Libamath;
11-
- pick an accuracy mode depending on your application.
12-
13-
# [libamath](https://developer.arm.com/documentation/101004/2504/, (component of [ArmPL (Arm Performance Libraries)](https://developer.arm.com/documentation/101004/2504/General-information/Arm-Performance-Libraries?lang=en)). Since libamath only provides vector functions on Linux, we assume you are working in a Linux environment where ArmPL is installed (meaning you completed [ArmPL's installation guide](https://learn.arm.com/install-guides/armpl/).)
14+
- Understand how accuracy is defined in Libamath.
15+
- Pick an appropriate accuracy mode for your application.
1416

1517
prerequisites:
16-
- An Arm computer running Linux
17-
- Build and install [ArmPL](https://learn.arm.com/install-guides/armpl/)
18+
- An Arm computer running Linux with [Arm Performance Libraries](https://learn.arm.com/install-guides/armpl/) version 25.04 or newer installed.
1819

1920
### Tags
2021
skilllevels: Introductory
2122
subjects: Performance and Architecture
2223
armips:
2324
- Neoverse
2425
tools_software_languages:
25-
- ArmPL
26+
- Arm Performance Libraries
2627
- GCC
27-
- Libamath
28+
- Libmath
2829
operatingsystems:
2930
- Linux
3031

content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/examples.md

Lines changed: 19 additions & 8 deletions
@@ -1,14 +1,18 @@
11
---
2-
title: Examples
2+
title: Arm Performance Libraries example
33
weight: 6
44

55
### FIXED, DO NOT MODIFY
66
layout: learningpathall
77
---
88

9-
# Example
9+
# Arm Performance Libraries example
1010

11-
Here is an example invoking all accuracy modes of the Neon single precision exp function (where `ulp_error.h` is the implementation of ULP error explained in [this section](/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp-error/)):
11+
Here is an example invoking all accuracy modes of the Neon single precision exp function. The file `ulp_error.h` is from the previous section.
12+
13+
Make sure you have [Arm Performance Libraries](https://learn.arm.com/install-guides/armpl/) installed.
14+
15+
Use a text editor to save the code below in a file named `example.c`.
1216

1317
```C { line_numbers = "true" }
1418
#include <amath.h>
@@ -46,14 +50,21 @@ int main(void) {
4650
}
4751
```
4852
49-
You can compile the above program with:
53+
Compile the program with:
54+
5055
```bash
5156
gcc -O2 -o example example.c -lamath -lm
5257
```
5358

54-
Running the example returns:
59+
Run the example:
60+
5561
```bash
56-
$ ./example
62+
./example
63+
```
64+
65+
The output is:
66+
67+
```output
5768
Libamath example:
5869
-----------------------------------------------
5970
// Display worst-case ULP error in expf for each
@@ -78,5 +89,5 @@ armpl_vexpq_f32_umax(-0x1.5b7322p+6) delivers result with half correct bits
7889
ULP error = 1745.2120
7990
```
8091

81-
The inputs we use for each variant correspond to the worst case scenario known to date (ULP Error argmax).
82-
This means that the ULP error should not be higher than the one we demonstrate here, meaning we stand below the thresholds we define for each accuracy.
92+
The inputs used for each variant correspond to the worst-case scenario known to date (ULP Error argmax).
93+
This means that the ULP error should not be higher than the one demonstrated here, ensuring the results remain below the defined thresholds for each accuracy mode.

content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/floating-point-rep.md

Lines changed: 17 additions & 16 deletions
@@ -6,13 +6,13 @@ weight: 2
66
layout: learningpathall
77
---
88

9-
# Floating-Point Representation Basics
9+
## Floating-Point Representation Basics
1010

1111
Floating Point numbers are a finite and discrete approximation of the real numbers, allowing us to implement and compute functions in the continuous domain with an adequate (but limited) resolution.
1212

1313
A Floating Point number is typically expressed as:
1414

15-
```
15+
```output
1616
+/-d.dddd...d x B^e
1717
```
1818

@@ -33,14 +33,13 @@ Fixing `B=2, p=24`
3333

3434
{{% /notice %}}
3535

36-
Usually a Floating Point number has multiple non-normalized representations, but only 1 normalized representation (assuming leading digit is stricly smaller than base), when fixing a base and a precision.
37-
36+
Usually a Floating Point number has multiple non-normalized representations, but only 1 normalized representation (assuming leading digit is strictly smaller than base), when fixing a base and a precision.
3837

39-
## Building a Floating-Point Ruler
38+
### Building a Floating-Point Ruler
4039

4140
Given a base `B`, a precision `p`, a maximum exponent `emax` and a minimum exponent `emin`, we can create the set of all the normalized values in this system.
4241

43-
{{% notice Example 3 %}}
42+
{{% notice Example 2 %}}
4443
`B=2, p=3, emax=2, emin=-1`
4544

4645
| Significand | × 2⁻¹ | × 2⁰ | × 2¹ | × 2² |
@@ -53,44 +52,46 @@ Given a base `B`, a precision `p`, a maximum exponent `emax` and a minimum expon
5352

5453
{{% /notice %}}
5554

56-
Note that, for any given integer n, numbers are evenly spaced between 2ⁿ and 2ⁿ⁺¹. But the gap between them (also called [ULP](/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp/), which we explain in the more detail in the next section) grows as the exponent increases. So the spacing between floating point numbers gets larger as numbers get bigger.
55+
Note that, for any given integer n, numbers are evenly spaced between 2ⁿ and 2ⁿ⁺¹. But the gap between them (also called [ULP](/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp/), which is explained in more detail in the next section) grows as the exponent increases. So the spacing between floating point numbers gets larger as numbers get bigger.
5756

5857
### The Floating-Point bitwise representation
59-
Since there are `B^p` possible mantissas, and `emax-emin+1` possible exponents, then we need `log2(B^p) + log2(emax-emin+1) + 1` (sign) bits to represent a given Floating Point number in a system.
60-
In Example 3, we need 3+2+1=6 bits.
6158

62-
We can then define Floating Point's bitwise representation in our system to be:
59+
Since there are `B^p` possible mantissas, and `emax-emin+1` possible exponents, then `log2(B^p) + log2(emax-emin+1) + 1` (sign) bits are needed to represent a given Floating Point number in a system.
60+
61+
In Example 2, 3+2+1=6 bits are needed.
62+
63+
Based on this, the floating point's bitwise representation is defined to be:
6364

6465
```
6566
b0 b1 b2 b3 b4 b5
6667
```
6768

6869
where
6970

70-
```
71+
```output
7172
b0 -> sign (S)
7273
b1, b2 -> exponent (E)
7374
b3, b4, b5 -> mantissa (M)
7475
```
7576

7677
However, this is not enough. In this bitwise definition, the possible values of E are 0, 1, 2, 3.
77-
But in the system we are trying to define, we are only interested in the the integer values in the range [-1, 2].
78+
But in the system being defined, only the integer values in the range [-1, 2] are of interest.
7879

79-
For this reason, E is called the biased exponent, and in order to retrieve the value it is trying to represent (i.e. the unbiased exponent) we need to add/subtract an offset to it (in this case we subtract 1):
80+
For this reason, E is called the biased exponent, and in order to retrieve the value it is trying to represent (i.e. the unbiased exponent) an offset must be added or subtracted (in this case, subtract 1):
8081

81-
```
82+
```output
8283
x = (-1)^S x M x 2^(E-1)
8384
```
8485

85-
# IEEE-754 Single Precision
86+
## IEEE-754 Single Precision
8687

8788
Single precision (also called float) is a 32-bit format defined by the [IEEE-754 Floating Point Standard](https://ieeexplore.ieee.org/document/8766229)
8889

8990
In this standard the sign is represented using 1 bit, the exponent uses 8 bits and the mantissa uses 23 bits.
9091

9192
The value of a (normalized) Floating Point in IEEE-754 can be represented as:
9293

93-
```
94+
```output
9495
x=(−1)^S x 1.M x 2^E−127
9596
```
9697

content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/multi-accuracy.md

Lines changed: 16 additions & 14 deletions
@@ -1,13 +1,13 @@
11
---
2-
title: Accuracy Modes in Libamath
2+
title: Accuracy modes in Libamath
33
weight: 5
44

55
### FIXED, DO NOT MODIFY
66
layout: learningpathall
77
---
88

99

10-
# The 3 Accuracy Modes of Libamath
10+
## The 3 accuracy modes of Libamath
1111

1212
Libamath vector functions can come in various accuracy modes for the same mathematical function.
1313
This means some of these functions allow users and compilers to choose between:
@@ -16,36 +16,36 @@ This means, some of our functions allow users and compilers to choose between:
1616
- **Low accuracy / max performance** (approx. ≤ 4096 ULP)
1717

1818

19-
# How Accuracy Modes Are Encoded in Libamath
19+
## How accuracy modes are encoded in Libamath
2020

2121
You can recognize the accuracy mode of a function by inspecting the **suffix** in its symbol:
2222

2323
- **`_u10`** → High accuracy
24-
E.g., `armpl_vcosq_f32_u10`
24+
For instance, `armpl_vcosq_f32_u10`
2525
Ensures results stay within **1 Unit in the Last Place (ULP)**.
2626

2727
- *(no suffix)* → Default accuracy
28-
E.g., `armpl_vcosq_f32`
28+
For instance, `armpl_vcosq_f32`
2929
Keeps errors within **3.5 ULP** — a sweet spot for many workloads.
3030

3131
- **`_umax`** → Low accuracy
32-
E.g., `armpl_vcosq_f32_umax`
32+
For instance, `armpl_vcosq_f32_umax`
3333
Prioritizes speed, tolerating errors up to **4096 ULP**, or roughly **11 correct bits** in single-precision.
3434

3535

36-
# Applications
36+
## Applications
3737

3838
Selecting an appropriate accuracy level helps avoid unnecessary compute cost while preserving output quality where it matters.
3939

4040

4141
### High Accuracy (≤ 1 ULP)
4242

43-
Use when **numerical (almost) correctness** is a priority. These routines involve precise algorithms (e.g., high-degree polynomials, careful range reduction, FMA usage) and are ideal for:
43+
Use when **numerical (almost) correctness** is a priority. These routines involve precise algorithms (such as high-degree polynomials, careful range reduction, or FMA usage) and are ideal for:
4444

4545
- **Scientific computing**
46-
e.g., simulations, finite element analysis
46+
such as simulations or finite element analysis
4747
- **Signal processing pipelines** [1,2]
48-
especially recursive filters or transform
48+
particularly recursive filters or transforms
4949
- **Validation & reference implementations**
5050

5151
While slower, these functions provide **near-bitwise reproducibility** — critical in sensitive domains.
@@ -57,7 +57,7 @@ The default mode strikes a **practical balance** between performance and numeric
5757

5858
- **General-purpose math libraries**
5959
- **Analytics workloads** [3]
60-
e.g., log/sqrt during feature extraction
60+
such as log or sqrt during feature extraction
6161
- **Inference pipelines** [4]
6262
especially on edge devices where latency matters
6363

@@ -69,15 +69,15 @@ Also suitable for many **scientific workloads** that can tolerate modest error i
6969
This mode trades precision for speed — aggressively. It's designed for:
7070

7171
- **Games, graphics, and shaders** [5]
72-
e.g., approximating sin/cos for animation curves
72+
such as approximating sin or cos for animation curves
7373
- **Monte Carlo simulations**
7474
where statistical convergence outweighs per-sample accuracy [6]
7575
- **Genetic algorithms, audio processing, and embedded DSP**
7676

7777
Avoid in control-flow-critical code or where **errors amplify**.
7878

7979

80-
# Summary
80+
## Summary
8181

8282
| Accuracy Mode | Libamath example | Approx. Error | Performance | Typical Applications |
8383
|---------------|------------------------|------------------|-------------|-----------------------------------------------------------|
@@ -87,7 +87,9 @@ Avoid in control-flow-critical code or where **errors amplify**.
8787

8888

8989

90-
**Pro tip:** If your workload has mixed precision needs, you can *selectively call different accuracy modes* for different parts of your pipeline. Libamath lets you tailor precision where it matters — and boost performance where it doesn’t.
90+
{{% notice Tip %}}
91+
If your workload has mixed precision needs, you can *selectively call different accuracy modes* for different parts of your pipeline. Libamath lets you tailor precision where it matters — and boost performance where it doesn’t.
92+
{{% /notice %}}
9193

9294

9395
#### References

content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp-error.md

Lines changed: 26 additions & 9 deletions
@@ -8,7 +8,7 @@ layout: learningpathall
88

99
# ULP Error and Accuracy
1010

11-
In the development of Libamath, we use a metric called ULP error to assess the accuracy of our functions.
11+
In the development of Libamath, a metric called ULP error is used to assess the accuracy of functions.
1212
This metric measures the distance between two numbers, a reference (`want`) and an approximation (`got`), in terms of how many floating-point “steps” (ULPs) the two numbers are apart.
1313

1414
It can be calculated by:
@@ -17,14 +17,14 @@ It can be calculated by:
1717
ulp_err = | want - got | / ULP(want)
1818
```
1919

20-
Because this is a relative measure in terms of floating-point spacing (ULPs) - i.e. this metric is scale-aware - it is ideal for comparing accuracy across magnitudes. Otherwise, error measure would be very biased by the uneven distribution of the floats.
20+
Because this is a relative measure in terms of floating-point spacing (ULPs) - that is, this metric is scale-aware - it is ideal for comparing accuracy across magnitudes. Otherwise, error measures would be very biased by the uneven distribution of the floats.
2121

2222

2323
# ULP Error Implementation
2424

25-
In practice, however, the above expression may take different forms, to account for sources of error that may happen during the computation of the error itself.
25+
In practice, however, the above expression may take different forms to account for sources of error that may occur during the computation of the error itself.
2626

27-
In our implementation, this quantity is held by a term called `tail`:
27+
In the implementation used here, this quantity is held by a term called `tail`:
2828

2929
```
3030
ulp_err = | (got - want) / ULP(want) - tail |
@@ -36,8 +36,9 @@ This term takes into account the error introduced by casting `want` from a highe
3636
tail = | (want_l - want) / ULP(want) |
3737
```
3838

39-
Here is a simplified version of our ULP Error (where `ulp.h` is the implementation of ULP in the [previous section](/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp/)):
39+
Here is a simplified version of the ULP Error. Use the same `ulp.h` from the previous section.
4040

41+
Use a text editor to copy the code below into a new file named `ulp_error.h`.
4142

4243
```C
4344
// Defines ulpscale(x)
@@ -69,15 +70,17 @@ double ulp_error(float got, double want_l) {
6970
```
7071
Note that the final scaling is done with respect to the rounded reference.
7172
72-
In this implementation, it is possible to get exactly 0.0 ULP error in this implementation if and only if:
73+
In this implementation, it is possible to get exactly 0.0 ULP error if and only if:
7374
7475
* The high-precision reference (`want_l`, a double) is exactly representable as a float, and
7576
* The computed result (`got`) is bitwise equal to that float representation.
7677
77-
Here is a small snippet to check out this implementation in action.
78+
Below is a small example to check this implementation.
7879
80+
Save the code below into a file named `ulp_error.c`.
7981
8082
```C
83+
#include <stdio.h>
8184
#include "ulp_error.h"
8285
8386
int main() {
@@ -88,9 +91,23 @@ int main() {
8891
return 0;
8992
}
9093
```
94+
95+
Compile the program with GCC.
96+
97+
```bash
98+
gcc -O2 ulp_error.c -o ulp_error
99+
```
100+
101+
Run the program:
102+
103+
```bash
104+
./ulp_error
105+
```
106+
91107
The output should be:
108+
92109
```
93110
ULP error: 1.0
94111
```
95-
Note that
96-
If you are interested in diving into the full implementation of the ulp error we use internally, you can consult the [tester](https://github.com/ARM-software/optimized-routines/tree/master/math/test) tool in [AOR](https://github.com/ARM-software/optimized-routines/tree/master), with particular focus to the [ulp.h](https://github.com/ARM-software/optimized-routines/blob/master/math/test/ulp.h) file. Note this tool also handles special cases and considers the effect of different rounding modes in the ULP error.
112+
113+
If you are interested in diving into the full implementation of the ULP error, you can consult the [tester](https://github.com/ARM-software/optimized-routines/tree/master/math/test) tool in [AOR](https://github.com/ARM-software/optimized-routines/tree/master), with particular focus on the [ulp.h](https://github.com/ARM-software/optimized-routines/blob/master/math/test/ulp.h) file. Note this tool also handles special cases and considers the effect of different rounding modes in the ULP error.
