
Commit 91cdb9b

Tech review for Libamath accuracy Learning Path
1 parent 55f4141 commit 91cdb9b

File tree

6 files changed: +128 -72 lines changed

content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/_index.md

Lines changed: 10 additions & 9 deletions
Original file line number | Diff line number | Diff line change
@@ -1,30 +1,31 @@
11
---
22
title: Understanding Libamath's vector accuracy modes
33

4+
draft: true
5+
cascade:
6+
draft: true
7+
48
minutes_to_complete: 20
59
author: Joana Cruz
610

7-
who_is_this_for: This is an introductory topic for software developers who want to learn how to use the different accuracy modes present in Libamath, a component of ArmPL. This feature was introduced in ArmPL 25.04.
11+
who_is_this_for: This is an introductory topic for software developers who want to learn how to use the different accuracy modes present in Libamath, a component of Arm Performance Libraries.
812

913
learning_objectives:
10-
- understand how accuracy is defined in Libamath;
11-
- pick an accuracy mode depending on your application.
12-
13-
# [libamath](https://developer.arm.com/documentation/101004/2504/, (component of [ArmPL (Arm Performance Libraries)](https://developer.arm.com/documentation/101004/2504/General-information/Arm-Performance-Libraries?lang=en)). Since libamath only provides vector functions on Linux, we assume you are working in a Linux environment where ArmPL is installed (meaning you completed [ArmPL's installation guide](https://learn.arm.com/install-guides/armpl/).)
14+
- Understand how accuracy is defined in Libamath.
15+
- Pick an appropriate accuracy mode for your application.
1416

1517
prerequisites:
16-
- An Arm computer running Linux
17-
- Build and install [ArmPL](https://learn.arm.com/install-guides/armpl/)
18+
- An Arm computer running Linux with [Arm Performance Libraries](https://learn.arm.com/install-guides/armpl/) version 25.04 or newer installed.
1819

1920
### Tags
2021
skilllevels: Introductory
2122
subjects: Performance and Architecture
2223
armips:
2324
- Neoverse
2425
tools_software_languages:
25-
- ArmPL
26+
- Arm Performance Libraries
2627
- GCC
27-
- Libamath
28+
- Libmath
2829
operatingsystems:
2930
- Linux
3031

content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/examples.md

Lines changed: 19 additions & 8 deletions
@@ -1,14 +1,18 @@
11
---
2-
title: Examples
2+
title: Arm Performance Libraries example
33
weight: 6
44

55
### FIXED, DO NOT MODIFY
66
layout: learningpathall
77
---
88

9-
# Example
9+
# Arm Performance Libraries example
1010

11-
Here is an example invoking all accuracy modes of the Neon single precision exp function (where `ulp_error.h` is the implementation of ULP error explained in [this section](/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp-error/)):
11+
Here is an example invoking all accuracy modes of the Neon single precision exp function. The file `ulp_error.h` is from the previous section.
12+
13+
Make sure you have [Arm Performance Libraries](https://learn.arm.com/install-guides/armpl/) installed.
14+
15+
Use a text editor to save the code below in a file named `example.c`.
1216

1317
```C { line_numbers = "true" }
1418
#include <amath.h>
@@ -46,14 +50,21 @@ int main(void) {
4650
}
4751
```
4852
49-
You can compile the above program with:
53+
Compile the program with:
54+
5055
```bash
5156
gcc -O2 -o example example.c -lamath -lm
5257
```
5358

54-
Running the example returns:
59+
Run the example:
60+
5561
```bash
56-
$ ./example
62+
./example
63+
```
64+
65+
The output is:
66+
67+
```output
5768
Libamath example:
5869
-----------------------------------------------
5970
// Display worst-case ULP error in expf for each
@@ -78,5 +89,5 @@ armpl_vexpq_f32_umax(-0x1.5b7322p+6) delivers result with half correct bits
7889
ULP error = 1745.2120
7990
```
8091

81-
The inputs we use for each variant correspond to the worst case scenario known to date (ULP Error argmax).
82-
This means that the ULP error should not be higher than the one we demonstrate here, meaning we stand below the thresholds we define for each accuracy.
92+
The inputs used for each variant correspond to the worst-case scenario known to date (ULP Error argmax).
93+
This means that the ULP error should not be higher than the one demonstrated here, ensuring the results remain below the defined thresholds for each accuracy mode.

content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/floating-point-rep.md

Lines changed: 17 additions & 16 deletions
@@ -6,13 +6,13 @@ weight: 2
66
layout: learningpathall
77
---
88

9-
# Floating-Point Representation Basics
9+
## Floating-Point Representation Basics
1010

1111
Floating Point numbers are a finite and discrete approximation of the real numbers, allowing us to implement and compute functions in the continuous domain with an adequate (but limited) resolution.
1212

1313
A Floating Point number is typically expressed as:
1414

15-
```
15+
```output
1616
+/-d.dddd...d x B^e
1717
```
1818

@@ -33,14 +33,13 @@ Fixing `B=2, p=24`
3333

3434
{{% /notice %}}
3535

36-
Usually a Floating Point number has multiple non-normalized representations, but only 1 normalized representation (assuming leading digit is stricly smaller than base), when fixing a base and a precision.
37-
36+
Usually a Floating Point number has multiple non-normalized representations, but only 1 normalized representation (assuming leading digit is strictly smaller than base), when fixing a base and a precision.
3837

39-
## Building a Floating-Point Ruler
38+
### Building a Floating-Point Ruler
4039

4140
Given a base `B`, a precision `p`, a maximum exponent `emax` and a minimum exponent `emin`, we can create the set of all the normalized values in this system.
4241

43-
{{% notice Example 3 %}}
42+
{{% notice Example 2 %}}
4443
`B=2, p=3, emax=2, emin=-1`
4544

4645
| Significand | × 2⁻¹ | × 2⁰ | × 2¹ | × 2² |
@@ -53,44 +52,46 @@ Given a base `B`, a precision `p`, a maximum exponent `emax` and a minimum expon
5352

5453
{{% /notice %}}
5554

56-
Note that, for any given integer n, numbers are evenly spaced between 2ⁿ and 2ⁿ⁺¹. But the gap between them (also called [ULP](/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp/), which we explain in the more detail in the next section) grows as the exponent increases. So the spacing between floating point numbers gets larger as numbers get bigger.
55+
Note that, for any given integer n, numbers are evenly spaced between 2ⁿ and 2ⁿ⁺¹. But the gap between them (also called [ULP](/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp/), which is explained in more detail in the next section) grows as the exponent increases. So the spacing between floating point numbers gets larger as numbers get bigger.
5756

5857
### The Floating-Point bitwise representation
59-
Since there are `B^p` possible mantissas, and `emax-emin+1` possible exponents, then we need `log2(B^p) + log2(emax-emin+1) + 1` (sign) bits to represent a given Floating Point number in a system.
60-
In Example 3, we need 3+2+1=6 bits.
6158

62-
We can then define Floating Point's bitwise representation in our system to be:
59+
Since there are `B^p` possible mantissas, and `emax-emin+1` possible exponents, then `log2(B^p) + log2(emax-emin+1) + 1` (sign) bits are needed to represent a given Floating Point number in a system.
60+
61+
In Example 2, 3+2+1=6 bits are needed.
62+
63+
Based on this, the floating point's bitwise representation is defined to be:
6364

6465
```
6566
b0 b1 b2 b3 b4 b5
6667
```
6768

6869
where
6970

70-
```
71+
```output
7172
b0 -> sign (S)
7273
b1, b2 -> exponent (E)
7374
b3, b4, b5 -> mantissa (M)
7475
```
7576

7677
However, this is not enough. In this bitwise definition, the possible values of E are 0, 1, 2, 3.
77-
But in the system we are trying to define, we are only interested in the the integer values in the range [-1, 2].
78+
But in the system being defined, only the integer values in the range [-1, 2] are of interest.
7879

79-
For this reason, E is called the biased exponent, and in order to retrieve the value it is trying to represent (i.e. the unbiased exponent) we need to add/subtract an offset to it (in this case we subtract 1):
80+
For this reason, E is called the biased exponent, and in order to retrieve the value it is trying to represent (i.e. the unbiased exponent) an offset must be added or subtracted (in this case, subtract 1):
8081

81-
```
82+
```output
8283
x = (-1)^S x M x 2^(E-1)
8384
```
8485

85-
# IEEE-754 Single Precision
86+
## IEEE-754 Single Precision
8687

8788
Single precision (also called float) is a 32-bit format defined by the [IEEE-754 Floating Point Standard](https://ieeexplore.ieee.org/document/8766229)
8889

8990
In this standard the sign is represented using 1 bit, the exponent uses 8 bits and the mantissa uses 23 bits.
9091

9192
The value of a (normalized) Floating Point in IEEE-754 can be represented as:
9293

93-
```
94+
```output
9495
x=(−1)^S x 1.M x 2^E−127
9596
```
9697

content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/multi-accuracy.md

Lines changed: 16 additions & 14 deletions
@@ -1,13 +1,13 @@
11
---
2-
title: Accuracy Modes in Libamath
2+
title: Accuracy modes in Libamath
33
weight: 5
44

55
### FIXED, DO NOT MODIFY
66
layout: learningpathall
77
---
88

99

10-
# The 3 Accuracy Modes of Libamath
10+
## The 3 accuracy modes of Libamath
1111

1212
Libamath vector functions can come in various accuracy modes for the same mathematical function.
1313
This means some of these functions allow users and compilers to choose between:
@@ -16,36 +16,36 @@ This means, some of our functions allow users and compilers to choose between:
1616
- **Low accuracy / max performance** (approx. ≤ 4096 ULP)
1717

1818

19-
# How Accuracy Modes Are Encoded in Libamath
19+
## How accuracy modes are encoded in Libamath
2020

2121
You can recognize the accuracy mode of a function by inspecting the **suffix** in its symbol:
2222

2323
- **`_u10`** → High accuracy
24-
E.g., `armpl_vcosq_f32_u10`
24+
For instance, `armpl_vcosq_f32_u10`
2525
Ensures results stay within **1 Unit in the Last Place (ULP)**.
2626

2727
- *(no suffix)* → Default accuracy
28-
E.g., `armpl_vcosq_f32`
28+
For instance, `armpl_vcosq_f32`
2929
Keeps errors within **3.5 ULP** — a sweet spot for many workloads.
3030

3131
- **`_umax`** → Low accuracy
32-
E.g., `armpl_vcosq_f32_umax`
32+
For instance, `armpl_vcosq_f32_umax`
3333
Prioritizes speed, tolerating errors up to **4096 ULP**, or roughly **11 correct bits** in single-precision.
3434

3535

36-
# Applications
36+
## Applications
3737

3838
Selecting an appropriate accuracy level helps avoid unnecessary compute cost while preserving output quality where it matters.
3939

4040

4141
### High Accuracy (≤ 1 ULP)
4242

43-
Use when **numerical (almost) correctness** is a priority. These routines involve precise algorithms (e.g., high-degree polynomials, careful range reduction, FMA usage) and are ideal for:
43+
Use when **numerical (almost) correctness** is a priority. These routines involve precise algorithms (such as high-degree polynomials, careful range reduction, or FMA usage) and are ideal for:
4444

4545
- **Scientific computing**
46-
e.g., simulations, finite element analysis
46+
such as simulations or finite element analysis
4747
- **Signal processing pipelines** [1,2]
48-
especially recursive filters or transform
48+
particularly recursive filters or transforms
4949
- **Validation & reference implementations**
5050

5151
While slower, these functions provide **near-bitwise reproducibility** — critical in sensitive domains.
@@ -57,7 +57,7 @@ The default mode strikes a **practical balance** between performance and numeric
5757

5858
- **General-purpose math libraries**
5959
- **Analytics workloads** [3]
60-
e.g., log/sqrt during feature extraction
60+
such as log or sqrt during feature extraction
6161
- **Inference pipelines** [4]
6262
especially on edge devices where latency matters
6363

@@ -69,15 +69,15 @@ Also suitable for many **scientific workloads** that can tolerate modest error i
6969
This mode trades precision for speed — aggressively. It's designed for:
7070

7171
- **Games, graphics, and shaders** [5]
72-
e.g., approximating sin/cos for animation curves
72+
such as approximating sin or cos for animation curves
7373
- **Monte Carlo simulations**
7474
where statistical convergence outweighs per-sample accuracy [6]
7575
- **Genetic algorithms, audio processing, and embedded DSP**
7676

7777
Avoid in control-flow-critical code or where **errors amplify**.
7878

7979

80-
# Summary
80+
## Summary
8181

8282
| Accuracy Mode | Libamath example | Approx. Error | Performance | Typical Applications |
8383
|---------------|------------------------|------------------|-------------|-----------------------------------------------------------|
@@ -87,7 +87,9 @@ Avoid in control-flow-critical code or where **errors amplify**.
8787

8888

8989

90-
**Pro tip:** If your workload has mixed precision needs, you can *selectively call different accuracy modes* for different parts of your pipeline. Libamath lets you tailor precision where it matters — and boost performance where it doesn’t.
90+
{{% notice Tip %}}
91+
If your workload has mixed precision needs, you can *selectively call different accuracy modes* for different parts of your pipeline. Libamath lets you tailor precision where it matters — and boost performance where it doesn’t.
92+
{{% /notice %}}
9193

9294

9395
#### References

content/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp-error.md

Lines changed: 26 additions & 9 deletions
@@ -8,7 +8,7 @@ layout: learningpathall
88

99
# ULP Error and Accuracy
1010

11-
In the development of Libamath, we use a metric called ULP error to assess the accuracy of our functions.
11+
In the development of Libamath, a metric called ULP error is used to assess the accuracy of functions.
1212
This metric measures the distance between two numbers, a reference (`want`) and an approximation (`got`), in terms of how many floating-point “steps” (ULPs) the two numbers are apart.
1313

1414
It can be calculated by:
@@ -17,14 +17,14 @@ It can be calculated by:
1717
ulp_err = | want - got | / ULP(want)
1818
```
1919

20-
Because this is a relative measure in terms of floating-point spacing (ULPs) - i.e. this metric is scale-aware - it is ideal for comparing accuracy across magnitudes. Otherwise, error measure would be very biased by the uneven distribution of the floats.
20+
Because this is a relative measure in terms of floating-point spacing (ULPs) - that is, this metric is scale-aware - it is ideal for comparing accuracy across magnitudes. Otherwise, error measures would be very biased by the uneven distribution of the floats.
2121

2222

2323
# ULP Error Implementation
2424

25-
In practice, however, the above expression may take different forms, to account for sources of error that may happen during the computation of the error itself.
25+
In practice, however, the above expression may take different forms to account for sources of error that may occur during the computation of the error itself.
2626

27-
In our implementation, this quantity is held by a term called `tail`:
27+
In the implementation used here, this quantity is held by a term called `tail`:
2828

2929
```
3030
ulp_err = | (got - want) / ULP(want) - tail |
@@ -36,8 +36,9 @@ This term takes into account the error introduced by casting `want` from a highe
3636
tail = | (want_l - want) / ULP(want) |
3737
```
3838

39-
Here is a simplified version of our ULP Error (where `ulp.h` is the implementation of ULP in the [previous section](/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp/)):
39+
Here is a simplified version of the ULP Error. Use the same `ulp.h` from the previous section.
4040

41+
Use a text editor to copy the code below into a new file named `ulp_error.h`.
4142

4243
```C
4344
// Defines ulpscale(x)
@@ -69,15 +70,17 @@ double ulp_error(float got, double want_l) {
6970
```
7071
Note that the final scaling is done with respect to the rounded reference.
7172
72-
In this implementation, it is possible to get exactly 0.0 ULP error in this implementation if and only if:
73+
In this implementation, it is possible to get exactly 0.0 ULP error if and only if:
7374
7475
* The high-precision reference (`want_l`, a double) is exactly representable as a float, and
7576
* The computed result (`got`) is bitwise equal to that float representation.
7677
77-
Here is a small snippet to check out this implementation in action.
78+
Below is a small example to check this implementation.
7879
80+
Save the code below into a file named `ulp_error.c`.
7981
8082
```C
83+
#include <stdio.h>
8184
#include "ulp_error.h"
8285
8386
int main() {
@@ -88,9 +91,23 @@ int main() {
8891
return 0;
8992
}
9093
```
94+
95+
Compile the program with GCC.
96+
97+
```bash
98+
gcc -O2 ulp_error.c -o ulp_error
99+
```
100+
101+
Run the program:
102+
103+
```bash
104+
./ulp_error
105+
```
106+
91107
The output should be:
108+
92109
```
93110
ULP error: 1.0
94111
```
95-
Note that
96-
If you are interested in diving into the full implementation of the ulp error we use internally, you can consult the [tester](https://github.com/ARM-software/optimized-routines/tree/master/math/test) tool in [AOR](https://github.com/ARM-software/optimized-routines/tree/master), with particular focus to the [ulp.h](https://github.com/ARM-software/optimized-routines/blob/master/math/test/ulp.h) file. Note this tool also handles special cases and considers the effect of different rounding modes in the ULP error.
112+
113+
If you are interested in diving into the full implementation of the ULP error, you can consult the [tester](https://github.com/ARM-software/optimized-routines/tree/master/math/test) tool in [AOR](https://github.com/ARM-software/optimized-routines/tree/master), with particular focus on the [ulp.h](https://github.com/ARM-software/optimized-routines/blob/master/math/test/ulp.h) file. Note this tool also handles special cases and considers the effect of different rounding modes in the ULP error.
