Commit e007e49

Merge pull request #2101 from madeline-underwood/libamath
Libamath_JA to check
2 parents 6d68715 + 30755bf

File tree

6 files changed: +155 −124 lines


content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/_index.md

Lines changed: 6 additions & 13 deletions
@@ -1,18 +1,15 @@
 ---
-title: Understanding Libamath's vector accuracy modes
-
-draft: true
-cascade:
-  draft: true
+title: Select accuracy modes in Libamath (Arm Performance Libraries)
 
 minutes_to_complete: 20
 author: Joana Cruz
 
-who_is_this_for: This is an introductory topic for software developers who want to learn how to use the different accuracy modes present in Libamath, a component of Arm Performance Libraries.
+who_is_this_for: This is an introductory topic for developers who want to use the different accuracy modes for vectorized math functions in Libamath, a component of Arm Performance Libraries.
 
 learning_objectives:
-  - Understand how accuracy is defined in Libamath.
-  - Pick an appropriate accuracy mode for your application.
+  - Understand how accuracy is defined in Libamath
+  - Select an appropriate accuracy mode for your application
+  - Use Libamath with different vector accuracy modes in practice
 
 prerequisites:
   - An Arm computer running Linux with [Arm Performance Libraries](https://learn.arm.com/install-guides/armpl/) version 25.04 or newer installed.

@@ -25,7 +22,7 @@ armips:
 tools_software_languages:
   - Arm Performance Libraries
   - GCC
-  - Libmath
+  - Libamath
 operatingsystems:
   - Linux
 

@@ -34,10 +31,6 @@ further_reading:
     title: ArmPL Libamath Documentation
     link: https://developer.arm.com/documentation/101004/2410/General-information/Arm-Performance-Libraries-math-functions
     type: documentation
-  # - resource:
-  #   title: PLACEHOLDER BLOG
-  #   link: PLACEHOLDER BLOG LINK
-  #   type: blog
   - resource:
     title: ArmPL Installation Guide
     link: https://learn.arm.com/install-guides/armpl/

content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/examples.md

Lines changed: 3 additions & 3 deletions
@@ -12,7 +12,7 @@ Here is an example invoking all accuracy modes of the Neon single precision exp
 
 Make sure you have [Arm Performance Libraries](https://learn.arm.com/install-guides/armpl/) installed.
 
-Use a text editor save the code below in a file named `example.c`.
+Use a text editor to save the code below in a file named `example.c`.
 
 ```C { line_numbers = "true" }
 #include <amath.h>

@@ -40,7 +40,7 @@ int main(void) {
   printf("Libamath example:\n");
   printf("-----------------------------------------------\n");
   printf(" // Display worst-case ULP error in expf for each\n");
-  printf(" // accuracy mode, along with approximate (`got`) and exact results (`want`)\n\n");
+  printf(" // accuracy mode, along with approximate (\"got\") and exact results (\"want\")\n\n");
 
   check_accuracy (armpl_vexpq_f32_u10, 0x1.ab312p+4, "armpl_vexpq_f32_u10(%a) delivers error under 1.0 ULP");
   check_accuracy (armpl_vexpq_f32, 0x1.8163ccp+5, "armpl_vexpq_f32(%a) delivers error under 3.5 ULP");

@@ -89,5 +89,5 @@ armpl_vexpq_f32_umax(-0x1.5b7322p+6) delivers result with half correct bits
 ULP error = 1745.2120
 ```
 
-The inputs used for each variant correspond to the worst case scenario known to date (ULP Error argmax).
+The inputs used for each variant correspond to the worst-case scenario known to date (the ULP error argmax).
 This means that the ULP error should not be higher than the one demonstrated here, ensuring the results remain below the defined thresholds for each accuracy.
Lines changed: 73 additions & 41 deletions
@@ -1,46 +1,66 @@
 ---
-title: Floating Point Representation
+title: Floating-point representation
 weight: 2
 
 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---
 
-## Floating-Point Representation Basics
+## Understanding the floating-point number system and IEEE-754 format
 
-Floating Point numbers are a finite and discrete approximation of the real numbers, allowing us to implement and compute functions in the continuous domain with an adequate (but limited) resolution.
+Floating-point numbers are essential for representing real numbers in computing, but they come with limits on precision and range.
 
-A Floating Point number is typically expressed as:
+This Learning Path covers the following:
+
+* How floating-point values are structured
+* How bitwise representation works
+* The IEEE-754 standard definition, including special values such as NaN and subnormals
+
+## What is a floating-point number?
+
+Floating-point numbers are a finite, discrete approximation of real numbers. They allow functions in the continuous domain to be computed with adequate, but limited, resolution.
+
+A floating-point number is typically expressed as:
 
 ```output
-+/-d.dddd...d x B^e
+± d.dddd...d × B^e
 ```
 
 where:
-* B is the base;
-* e is the exponent;
-* d.dddd...d is the mantissa (or significand). It is p-bit word, where p represents the precision;
-* +/- sign which is usually stored separately.
+* B is the base
+* e is the exponent
+* d.dddd...d is the mantissa (or significand)
+* *p* is the number of bits used for precision
+* the +/- sign is stored separately
 
-If the leading digit is non-zero then it is a normalized representation/normal number.
+The precision of a floating-point format is the number of binary digits used to represent the mantissa. It is denoted by *p*, and a system with *p* bits of precision can distinguish between `2^p` different fractional values.
 
+If the leading digit is non-zero, the number is said to be normalized (also called a *normal number*).
+
 {{% notice Example 1 %}}
-Fixing `B=2, p=24`
+Fixing `B = 2, p = 24`
 
 `0.1 = 1.10011001100110011001101 × 2^-4` is a normalized representation of 0.1
 
-`0.1 = 0.000110011001100110011001 × 2^0` is a non normalized representation of 0.1
+`0.1 = 0.000110011001100110011001 × 2^0` is a non-normalized representation of 0.1
 
 {{% /notice %}}
 
-Usually a Floating Point number has multiple non-normalized representations, but only 1 normalized representation (assuming leading digit is strictly smaller than base), when fixing a base and a precision.
+A floating-point number can have multiple non-normalized forms, but only one normalized representation for a given value, assuming a fixed base and precision and a leading digit strictly less than the base.
+
+## How precision and exponents define floating-point values
+
+Given:
+
+* a base `B`
+* a precision `p`
+* a maximum exponent `emax`
+* a minimum exponent `emin`
 
-### Building a Floating-Point Ruler
-
-Given a base `B`, a precision `p`, a maximum exponent `emax` and a minimum exponent `emin`, we can create the set of all the normalized values in this system.
+you can create the full set of representable normalized values.
 
 {{% notice Example 2 %}}
-`B=2, p=3, emax=2, emin=-1`
+`B = 2, p = 3, emax = 2, emin = -1`
 
 | Significand | × 2⁻¹ | × 2⁰ | × 2¹ | × 2² |
 |-------------|-------|------|------|------|
@@ -52,15 +72,15 @@ Given a base `B`, a precision `p`, a maximum exponent `emax` and a minimum expon
 
 {{% /notice %}}
 
-Note that, for any given integer n, numbers are evenly spaced between 2ⁿ and 2ⁿ⁺¹. But the gap between them (also called [ULP](/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp/), which is explained in the more detail in the next section) grows as the exponent increases. So the spacing between floating point numbers gets larger as numbers get bigger.
+For any exponent *n*, numbers are evenly spaced between 2ⁿ and 2ⁿ⁺¹. However, the gap between them (also called a [ULP](/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp/), which is explained in more detail in the next section) grows as the exponent increases, so the spacing between floating-point numbers gets larger as the numbers get bigger.
 
-### The Floating-Point bitwise representation
+## Bitwise representation of floating-point numbers
 
-Since there are `B^p` possible mantissas, and `emax-emin+1` possible exponents, then `log2(B^p) + log2(emax-emin+1) + 1` (sign) bits are needed to represent a given Floating Point number in a system.
+Since there are `B^p` possible mantissas and `emax-emin+1` possible exponents, `log2(B^p) + log2(emax-emin+1) + 1` bits (including one sign bit) are needed to represent a floating-point number in such a system.
 
 In Example 2, 3+2+1=6 bits are needed.
 
-Based on this, the floating point's bitwise representation is defined to be:
+Based on this, the floating-point bitwise representation is defined as:
 
 ```
 b0 b1 b2 b3 b4 b5
@@ -77,53 +97,64 @@ b3, b4, b5 -> mantissa (M)
 However, this is not enough. In this bitwise definition, the possible values of E are 0, 1, 2, 3.
 But in the system being defined, only the integer values in the range [-1, 2] are of interest.
 
-For this reason, E is called the biased exponent, and in order to retrieve the value it is trying to represent (i.e. the unbiased exponent) an offset must be added or subtracted (in this case, subtract 1):
+E is stored as a biased exponent, which allows both positive and negative powers of two to be represented using only unsigned integers. In this example, a bias of 1 shifts the stored range [0, 3] onto the represented range [−1, 2]:
 
 ```output
-x = (-1)^S x M x 2^(E-1)
+x = (-1)^S × M × 2^(E-1)
 ```
 
-## IEEE-754 Single Precision
+## IEEE-754 single precision format
 
-Single precision (also called float) is a 32-bit format defined by the [IEEE-754 Floating Point Standard](https://ieeexplore.ieee.org/document/8766229)
+Single precision (also called float) is a 32-bit format defined by the [IEEE-754 Floating-Point Standard](https://ieeexplore.ieee.org/document/8766229).
 
-In this standard the sign is represented using 1 bit, the exponent uses 8 bits and the mantissa uses 23 bits.
+In this format:
 
-The value of a (normalized) Floating Point in IEEE-754 can be represented as:
+* The sign is represented using 1 bit
+* The exponent uses 8 bits
+* The mantissa uses 23 bits
+
+The value of a normalized floating-point number in IEEE-754 can be represented as:
 
 ```output
-x=(−1)^S x 1.M x 2^E−127
+x = (−1)^S × (1.M) × 2^(E−127)
 ```
 
-The exponent bias of 127 allows storage of exponents from -126 to +127. The leading digit is implicit - that is we have 24 bits of precision. In normalized numbers the leading digit is implicitly 1.
+The exponent bias of 127 allows storage of exponents from -126 to +127. In normalized numbers the leading digit is implicitly 1, giving a total of 24 bits of precision.
 
-{{% notice Special Cases in IEEE-754 Single Precision %}}
-Since we have 8 bits of storage, meaning E ranges between 0 and 2^8-1=255. However not all these 256 values are going to be used for normal numbers.
+{{% notice Special cases in IEEE-754 single precision %}}
+Since the exponent field uses 8 bits, E ranges between 0 and 2^8-1=255. However, not all of these 256 values are used for normal numbers.
 
 If the exponent E is:
 * 0, then we are either in the presence of a denormalized number or a 0 (if M is 0 as well);
-* 1 to 254 then we are in the normalized range;
-* 255 then we are in the presence of Inf (if M==0), or Nan (if M!=0).
+* 1 to 254, then the number is in the normalized range;
+* 255, then the value is infinity (if M==0) or NaN (if M!=0).
 
-Subnormal numbers (also called denormal numbers) are special floating-point values defined by the IEEE-754 standard.
+##### Subnormal numbers
+
+Subnormal numbers (also called denormal numbers) are special floating-point values, defined by the IEEE-754 standard, that allow representation of values closer to zero than is possible with normalized exponents.
 
 They allow the representation of numbers very close to zero, smaller than what is normally possible with the standard exponent range.
 
-Subnormal numbers do not have the a leading 1 in their representation. They also assume exponent is 0.
+Subnormal numbers do not have a leading 1 in their representation. They also assume an exponent of −126.
 
-The interpretation of denormal Floating Point in IEEE-754 can be represented as:
+The interpretation of a subnormal floating-point number in IEEE-754 is:
 
 ```
-x=(−1)^S x 0.M x 2^−126
+x = (−1)^S × 0.M × 2^(−126)
 ```
 
 <!-- ### Subnormal numbers
 
 Subnormal numbers (also called denormal numbers) are special floating-point values defined by the IEEE-754 standard.
-They allow the representation of numbers very close to zero, smaller than what is normally possible with the standard exponent range.
-Subnormal numbers do not have the a leading 1 in their representation. They also assume exponent is 0.
+They allow the representation of numbers closer to zero than any normalized float:
+
+* Subnormal numbers do not have a leading 1 in their representation.
+* They assume the exponent is fixed at −126.
+* Interpretation:
+
+x = (−1)^s × 0.M × 2^(−126)
 
-x=(−1)^s x 0.M x 2^−126
+These values fill the underflow gap between 0 and the smallest normalized float.
 
 -->
 

@@ -135,5 +166,6 @@ x=(−1)^s × 0.M × 2^(−126)
 | 11 (1.75) | 0.375 | 0.875 | 1.75 | 3.5 | 7.0 | -->
 {{% /notice %}}
 
-If you're interested in diving deeper in this subject, [What Every Computer Scientist Should Know About Floating-Point Arithmetic](https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html) by David Goldberg is a good place to start.
+## Further information
 
+If you're interested in diving deeper into this subject, [What Every Computer Scientist Should Know About Floating-Point Arithmetic](https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html) by David Goldberg is a great place to start.

content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/multi-accuracy.md

Lines changed: 28 additions & 28 deletions
@@ -7,64 +7,64 @@ layout: learningpathall
 ---
 
 
-## The 3 accuracy modes of Libamath
+## Accuracy modes
 
-Libamath vector functions can come in various accuracy modes for the same mathematical function.
-This means, some of our functions allow users and compilers to choose between:
+Libamath provides multiple accuracy modes for the same vectorized mathematical function, allowing developers to choose between speed and precision depending on workload requirements.
+
+Some functions offer selectable modes with tradeoffs between:
 - **High accuracy** (≤ 1 ULP)
 - **Default accuracy** (≤ 3.5 ULP)
 - **Low accuracy / max performance** (approx. ≤ 4096 ULP)
 
 
-## How accuracy modes are encoded in Libamath
+### How accuracy modes are encoded
 
-You can recognize the accuracy mode of a function by inspecting the **suffix** in its symbol:
+You can recognize the accuracy mode of a function by the **suffix** in the function symbol:
 
 - **`_u10`** → High accuracy
-  For instance, `armpl_vcosq_f32_u10`
-  Ensures results stay within **1 Unit in the Last Place (ULP)**.
+  Example: `armpl_vcosq_f32_u10`
+  Ensures results within **1 Unit in the Last Place (ULP)**.
 
 - *(no suffix)* → Default accuracy
-  For instance, `armpl_vcosq_f32`
-  Keeps errors within **3.5 ULP** — a sweet spot for many workloads.
+  Example: `armpl_vcosq_f32`
+  Keeps errors within **3.5 ULP**, balancing precision and performance.
 
-- **`_umax`** → Low accuracy
-  For instance, `armpl_vcosq_f32_umax`
+- **`_umax`** → Low accuracy / max performance
+  Example: `armpl_vcosq_f32_umax`
   Prioritizes speed, tolerating errors up to **4096 ULP**, or roughly **11 correct bits** in single precision.
 
 
-## Applications
+## When to use each mode
 
 Selecting an appropriate accuracy level helps avoid unnecessary compute cost while preserving output quality where it matters.
 
 
-### High Accuracy (≤ 1 ULP)
+### High accuracy (≤ 1 ULP)
 
-Use when **numerical (almost) correctness** is a priority. These routines involve precise algorithms (such as high-degree polynomials, careful range reduction, or FMA usage) and are ideal for:
+Use when near bit-level correctness matters. These routines employ precise algorithms (such as high-degree polynomials, tight range reduction, or FMA usage) and are ideal for:
 
 - **Scientific computing**
   such as simulations or finite element analysis
 - **Signal processing pipelines** [1,2]
   particularly recursive filters or transforms
-- **Validation & reference implementations**
-
-While slower, these functions provide **near-bitwise reproducibility** — critical in sensitive domains.
+- **Validation and reference implementations**
 
+While slower, these functions provide **near-bitwise reproducibility**, which is critical for validation and scientific fidelity.
 
-### Default Accuracy (≤ 3.5 ULP)
+### Default accuracy (≤ 3.5 ULP)
 
 The default mode strikes a **practical balance** between performance and numerical fidelity. It’s optimized for:
 
 - **General-purpose math libraries**
 - **Analytics workloads** [3]
   such as log or sqrt during feature extraction
 - **Inference pipelines** [4]
-  especially on edge devices where latency matters
+  especially on edge devices where latency is critical
 
 Also suitable for many **scientific workloads** that can tolerate modest error in exchange for **faster throughput**.
 
 
-### Low Accuracy / Max Performance (≤ 4096 ULP)
+### Low accuracy / max performance (≤ 4096 ULP)
 
 This mode trades precision for speed — aggressively. It's designed for:
 
@@ -74,7 +74,7 @@ This mode trades precision for speed — aggressively. It's designed for:
   where statistical convergence outweighs per-sample accuracy [6]
 - **Genetic algorithms, audio processing, and embedded DSP**
 
-Avoid in control-flow-critical code or where **errors amplify**.
+Avoid this mode where errors might compound or affect control flow.
 
 
 ## Summary
@@ -88,25 +88,25 @@ Avoid in control-flow-critical code or where **errors amplify**.
 
 
 {{% notice Tip %}}
-If your workload has mixed precision needs, you can *selectively call different accuracy modes* for different parts of your pipeline. Libamath lets you tailor precision where it matters and boost performance where it doesn’t.
+If your workload has mixed precision needs, you can *selectively call different accuracy modes* for different parts of your pipeline. Choose conservatively where correctness matters, and push for speed elsewhere.
 {{% /notice %}}
 
 
-#### References
-1. Higham, N. J. (2002). *Accuracy and Stability of Numerical Algorithms* (2nd ed.). SIAM.
+## References
+1. Higham, N. J. (2002). *Accuracy and Stability of Numerical Algorithms* (2nd ed.), SIAM.
 
-2. Texas Instruments. Overflow Avoidance Techniques in Cascaded IIR Filter Implementations on the TMS320 DSPs. Application Report SPRA509, 1999.
+2. Texas Instruments. *Overflow Avoidance Techniques in Cascaded IIR Filter Implementations on the TMS320 DSPs*. Application Report SPRA509, 1999.
 https://www.ti.com/lit/pdf/spra509
 
-3. Ma, S., & Huai, J. (2019). Approximate Computation for Big Data Analytics. arXiv:1901.00232.
+3. Ma, S., & Huai, J. (2019). *Approximate Computation for Big Data Analytics*. arXiv:1901.00232.
 https://arxiv.org/pdf/1901.00232
 
-4. Gupta, S., Agrawal, A., Gopalakrishnan, K., & Narayanan, P. (2015). Deep Learning with Limited Numerical Precision. In Proceedings of the 32nd International Conference on Machine Learning (ICML), PMLR 37.
+4. Gupta, S., Agrawal, A., Gopalakrishnan, K., & Narayanan, P. (2015). *Deep Learning with Limited Numerical Precision*. In Proceedings of the 32nd International Conference on Machine Learning (ICML), PMLR 37.
 https://proceedings.mlr.press/v37/gupta15.html
 
 5. Unity Technologies. *Precision Modes*. Unity Shader Graph Documentation.
 [https://docs.unity3d.com/Packages/[email protected]/manual/Precision-Modes.html](https://docs.unity3d.com/Packages/[email protected]/manual/Precision-Modes.html)
 
-6. Croci, M., Gorman, G. J., & Giles, M. B. (2021). Rounding Error using Low Precision Approximate Random Variables. arXiv:2012.09739.
+6. Croci, M., Gorman, G. J., & Giles, M. B. (2021). *Rounding Error using Low Precision Approximate Random Variables*. arXiv:2012.09739.
 https://arxiv.org/abs/2012.09739