Commit 55f4141 — Merge pull request #2020 from joanaxcruz/main
New learning path: Understanding Libamath's vector accuracy modes
2 parents 789fcd6 + e3f8e48
7 files changed: +621 −0 lines changed
Lines changed: 52 additions & 0 deletions
---
title: Understanding Libamath's vector accuracy modes

minutes_to_complete: 20
author: Joana Cruz

who_is_this_for: This is an introductory topic for software developers who want to learn how to use the different accuracy modes present in Libamath, a component of ArmPL. This feature was introduced in ArmPL 25.04.

learning_objectives:
    - Understand how accuracy is defined in Libamath.
    - Pick an accuracy mode depending on your application.

# [Libamath](https://developer.arm.com/documentation/101004/2504/) is a component of [ArmPL (Arm Performance Libraries)](https://developer.arm.com/documentation/101004/2504/General-information/Arm-Performance-Libraries?lang=en). Since Libamath only provides vector functions on Linux, we assume you are working in a Linux environment where ArmPL is installed (meaning you completed [ArmPL's installation guide](https://learn.arm.com/install-guides/armpl/)).

prerequisites:
    - An Arm computer running Linux.
    - Build and install [ArmPL](https://learn.arm.com/install-guides/armpl/).

### Tags
skilllevels: Introductory
subjects: Performance and Architecture
armips:
    - Neoverse
tools_software_languages:
    - ArmPL
    - GCC
    - Libamath
operatingsystems:
    - Linux

further_reading:
    - resource:
        title: ArmPL Libamath Documentation
        link: https://developer.arm.com/documentation/101004/2410/General-information/Arm-Performance-Libraries-math-functions
        type: documentation
    # - resource:
    #     title: PLACEHOLDER BLOG
    #     link: PLACEHOLDER BLOG LINK
    #     type: blog
    - resource:
        title: ArmPL Installation Guide
        link: https://learn.arm.com/install-guides/armpl/
        type: website


### FIXED, DO NOT MODIFY
# ================================================================================
weight: 1                       # _index.md always has weight of 1 to order correctly
layout: "learningpathall"       # All files under learning paths have this same wrapper
learning_path_main_page: "yes"  # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
---
Lines changed: 8 additions & 0 deletions
---
# ================================================================================
# FIXED, DO NOT MODIFY THIS FILE
# ================================================================================
weight: 21                  # Set to always be larger than the content in this path to be at the end of the navigation.
title: "Next Steps"         # Always the same, html page title.
layout: "learningpathall"   # All files under learning paths have this same wrapper for Hugo processing.
---
Lines changed: 82 additions & 0 deletions
---
title: Examples
weight: 6

### FIXED, DO NOT MODIFY
layout: learningpathall
---

# Example

Here is an example invoking all accuracy modes of the Neon single-precision exp function (where `ulp_error.h` is the implementation of ULP error explained in [this section](/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp-error/)):

```C { line_numbers = "true" }
#include <amath.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#include "ulp_error.h"

void check_accuracy(float32x4_t (__attribute__((aarch64_vector_pcs)) *vexp_fun)(float32x4_t), float arg, const char *label) {
    float32x4_t varg = vdupq_n_f32(arg);
    float32x4_t vres = vexp_fun(varg);
    double want = exp((double)arg);
    float got = vgetq_lane_f32(vres, 0);

    printf(label, arg);
    printf("\n          got = %a\n", got);
    printf("  (float)want = %a\n", (float)want);
    printf("         want = %.12a\n", want);
    printf("    ULP error = %.4f\n\n", ulp_error(got, want));
}

int main(void) {
    // Inputs that trigger worst-case errors for each accuracy mode
    printf("Libamath example:\n");
    printf("-----------------------------------------------\n");
    printf("  // Display worst-case ULP error in expf for each\n");
    printf("  // accuracy mode, along with approximate (`got`) and exact results (`want`)\n\n");

    check_accuracy(armpl_vexpq_f32_u10, 0x1.ab312p+4, "armpl_vexpq_f32_u10(%a) delivers error under 1.0 ULP");
    check_accuracy(armpl_vexpq_f32, 0x1.8163ccp+5, "armpl_vexpq_f32(%a) delivers error under 3.5 ULP");
    check_accuracy(armpl_vexpq_f32_umax, -0x1.5b7322p+6, "armpl_vexpq_f32_umax(%a) delivers result with half correct bits");

    return 0;
}
```

You can compile the above program with:

```bash
gcc -O2 -o example example.c -lamath -lm
```

Running the example returns:

```bash
$ ./example
Libamath example:
-----------------------------------------------
  // Display worst-case ULP error in expf for each
  // accuracy mode, along with approximate (`got`) and exact results (`want`)

armpl_vexpq_f32_u10(0x1.ab312p+4) delivers error under 1.0 ULP
          got = 0x1.6ee554p+38
  (float)want = 0x1.6ee556p+38
         want = 0x1.6ee555bb01d1p+38
    ULP error = 0.8652

armpl_vexpq_f32(0x1.8163ccp+5) delivers error under 3.5 ULP
          got = 0x1.6a09ep+69
  (float)want = 0x1.6a09e4p+69
         want = 0x1.6a09e3e3d585p+69
    ULP error = 1.9450

armpl_vexpq_f32_umax(-0x1.5b7322p+6) delivers result with half correct bits
          got = 0x1.9b56bep-126
  (float)want = 0x1.9b491cp-126
         want = 0x1.9b491b9376d3p-126
    ULP error = 1745.2120
```

The input used for each variant corresponds to the worst case known to date (the argmax of the ULP error). This means the ULP error of these functions should not exceed the values demonstrated here, which stay below the thresholds we define for each accuracy mode.
Lines changed: 138 additions & 0 deletions
---
title: Floating Point Representation
weight: 2

### FIXED, DO NOT MODIFY
layout: learningpathall
---

# Floating-Point Representation Basics

Floating Point numbers are a finite and discrete approximation of the real numbers, allowing us to implement and compute functions in the continuous domain with an adequate (but limited) resolution.

A Floating Point number is typically expressed as:

```
+/-d.dddd...d x B^e
```

where:
* B is the base;
* e is the exponent;
* d.dddd...d is the mantissa (or significand). It is a p-digit word, where p represents the precision;
* +/- is the sign, which is usually stored separately.

If the leading digit is non-zero, the representation is normalized and the number is called a normal number.

{{% notice Example 1 %}}
Fixing `B=2, p=24`

`0.1 = 1.10011001100110011001101 × 2^-4` is a normalized representation of 0.1

`0.1 = 0.000110011001100110011001 × 2^0` is a non-normalized representation of 0.1

{{% /notice %}}

When fixing a base and a precision, a Floating Point number usually has multiple non-normalized representations, but only one normalized representation (assuming the leading digit is strictly smaller than the base).

## Building a Floating-Point Ruler

Given a base `B`, a precision `p`, a maximum exponent `emax` and a minimum exponent `emin`, we can create the set of all the normalized values in this system.

{{% notice Example 3 %}}
`B=2, p=3, emax=2, emin=-1`

| Significand | × 2⁻¹ | × 2⁰ | × 2¹ | × 2² |
|-------------|-------|------|------|------|
| 1.00 (1.0)  | 0.5   | 1.0  | 2.0  | 4.0  |
| 1.01 (1.25) | 0.625 | 1.25 | 2.5  | 5.0  |
| 1.10 (1.5)  | 0.75  | 1.5  | 3.0  | 6.0  |
| 1.11 (1.75) | 0.875 | 1.75 | 3.5  | 7.0  |

{{% /notice %}}

Note that, for any given integer n, numbers are evenly spaced between 2ⁿ and 2ⁿ⁺¹, but the gap between them (also called [ULP](/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp/), which we explain in more detail in the next section) grows as the exponent increases. So the spacing between floating point numbers gets larger as numbers get bigger.

### The Floating-Point bitwise representation
Since there are `B^p` possible mantissas, and `emax-emin+1` possible exponents, we need `log2(B^p) + log2(emax-emin+1) + 1` (sign) bits to represent a given Floating Point number in a system.
In Example 3, we need 3+2+1=6 bits.

We can then define the Floating Point bitwise representation in our system to be:

```
b0 b1 b2 b3 b4 b5
```

where

```
b0         -> sign (S)
b1, b2     -> exponent (E)
b3, b4, b5 -> mantissa (M)
```

However, this is not enough. In this bitwise definition, the possible values of E are 0, 1, 2, 3,
but in the system we are trying to define, we are only interested in the integer values in the range [-1, 2].

For this reason, E is called the biased exponent, and in order to retrieve the value it is trying to represent (i.e. the unbiased exponent) we need to subtract an offset from it (in this case we subtract 1):

```
x = (-1)^S x M x 2^(E-1)
```

# IEEE-754 Single Precision

Single precision (also called float) is a 32-bit format defined by the [IEEE-754 Floating Point Standard](https://ieeexplore.ieee.org/document/8766229).

In this standard the sign is represented using 1 bit, the exponent uses 8 bits and the mantissa uses 23 bits.

The value of a (normalized) Floating Point number in IEEE-754 can be represented as:

```
x = (-1)^S x 1.M x 2^(E-127)
```

The exponent bias of 127 allows storage of exponents from -126 to +127. The leading digit is implicit: in normalized numbers it is implicitly 1, so we effectively have 24 bits of precision.

{{% notice Special Cases in IEEE-754 Single Precision %}}
Since we have 8 bits of storage, E ranges between 0 and 2^8-1=255. However, not all of these 256 values are used for normal numbers.

If the exponent E is:
* 0, then we are either in the presence of a denormalized number or a 0 (if M is 0 as well);
* 1 to 254, then we are in the normalized range;
* 255, then we are in the presence of Inf (if M==0), or NaN (if M!=0).

Subnormal numbers (also called denormal numbers) are special floating-point values defined by the IEEE-754 standard.

They allow the representation of numbers very close to zero, smaller than what is normally possible with the standard exponent range.

Subnormal numbers do not have a leading 1 in their representation. They are stored with the exponent field E equal to 0 and interpreted with a fixed exponent of -126.

The interpretation of a denormal Floating Point number in IEEE-754 can be represented as:

```
x = (-1)^S x 0.M x 2^(-126)
```
{{% /notice %}}

If you're interested in diving deeper into this subject, [What Every Computer Scientist Should Know About Floating-Point Arithmetic](https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html) by David Goldberg is a good place to start.
Lines changed: 110 additions & 0 deletions
---
title: Accuracy Modes in Libamath
weight: 5

### FIXED, DO NOT MODIFY
layout: learningpathall
---


# The 3 Accuracy Modes of Libamath

Libamath vector functions can come in various accuracy modes for the same mathematical function.
This means some of our functions allow users and compilers to choose between:
- **High accuracy** (≤ 1 ULP)
- **Default accuracy** (≤ 3.5 ULP)
- **Low accuracy / max performance** (approx. ≤ 4096 ULP)


# How Accuracy Modes Are Encoded in Libamath

You can recognize the accuracy mode of a function by inspecting the **suffix** in its symbol:

- **`_u10`** → High accuracy
  E.g., `armpl_vcosq_f32_u10`
  Ensures results stay within **1 Unit in the Last Place (ULP)**.

- *(no suffix)* → Default accuracy
  E.g., `armpl_vcosq_f32`
  Keeps errors within **3.5 ULP** — a sweet spot for many workloads.

- **`_umax`** → Low accuracy
  E.g., `armpl_vcosq_f32_umax`
  Prioritizes speed, tolerating errors up to **4096 ULP**, or roughly **11 correct bits** in single precision.


# Applications

Selecting an appropriate accuracy level helps avoid unnecessary compute cost while preserving output quality where it matters.


### High Accuracy (≤ 1 ULP)

Use when **numerical (near-)correctness** is a priority. These routines involve precise algorithms (e.g., high-degree polynomials, careful range reduction, FMA usage) and are ideal for:

- **Scientific computing**
  e.g., simulations, finite element analysis
- **Signal processing pipelines** [1,2]
  especially recursive filters or transforms
- **Validation & reference implementations**

While slower, these functions provide **near-bitwise reproducibility** — critical in sensitive domains.


### Default Accuracy (≤ 3.5 ULP)

The default mode strikes a **practical balance** between performance and numerical fidelity. It's optimized for:

- **General-purpose math libraries**
- **Analytics workloads** [3]
  e.g., log/sqrt during feature extraction
- **Inference pipelines** [4]
  especially on edge devices where latency matters

It is also suitable for many **scientific workloads** that can tolerate modest error in exchange for **faster throughput**.


### Low Accuracy / Max Performance (≤ 4096 ULP)

This mode trades precision for speed — aggressively. It's designed for:

- **Games, graphics, and shaders** [5]
  e.g., approximating sin/cos for animation curves
- **Monte Carlo simulations**
  where statistical convergence outweighs per-sample accuracy [6]
- **Genetic algorithms, audio processing, and embedded DSP**

Avoid it in control-flow-critical code or where **errors amplify**.


# Summary

| Accuracy Mode | Libamath example     | Approx. Error | Performance | Typical Applications                                  |
|---------------|----------------------|---------------|-------------|-------------------------------------------------------|
| `_u10`        | `_ZGVnN4v_cosf_u10`  | ≤ 1.0 ULP     | Low         | Scientific computing, backpropagation, validation     |
| *(default)*   | `_ZGVnN4v_cosf`      | ≤ 3.5 ULP     | Medium      | General compute, analytics, inference                 |
| `_umax`       | `_ZGVnN4v_cosf_umax` | ≤ 4096 ULP    | High        | Real-time graphics, DSP, approximations, simulations  |


**Pro tip:** If your workload has mixed precision needs, you can *selectively call different accuracy modes* for different parts of your pipeline. Libamath lets you tailor precision where it matters — and boost performance where it doesn't.


#### References

1. Higham, N. J. (2002). *Accuracy and Stability of Numerical Algorithms* (2nd ed.). SIAM.

2. Texas Instruments (1999). *Overflow Avoidance Techniques in Cascaded IIR Filter Implementations on the TMS320 DSPs*. Application Report SPRA509.
   https://www.ti.com/lit/pdf/spra509

3. Ma, S., & Huai, J. (2019). *Approximate Computation for Big Data Analytics*. arXiv:1901.00232.
   https://arxiv.org/pdf/1901.00232

4. Gupta, S., Agrawal, A., Gopalakrishnan, K., & Narayanan, P. (2015). *Deep Learning with Limited Numerical Precision*. In Proceedings of the 32nd International Conference on Machine Learning (ICML), PMLR 37.
   https://proceedings.mlr.press/v37/gupta15.html

5. Unity Technologies. *Precision Modes*. Unity Shader Graph Documentation.
   [https://docs.unity3d.com/Packages/[email protected]/manual/Precision-Modes.html](https://docs.unity3d.com/Packages/[email protected]/manual/Precision-Modes.html)

6. Croci, M., Gorman, G. J., & Giles, M. B. (2021). *Rounding Error using Low Precision Approximate Random Variables*. arXiv:2012.09739.
   https://arxiv.org/abs/2012.09739