Commit 55f4141 — Merge pull request #2020 from joanaxcruz/main
New learning path: Understanding Libamath's vector accuracy modes
2 parents 789fcd6 + e3f8e48
7 files changed: +621 −0 lines changed
Lines changed: 52 additions & 0 deletions
---
title: Understanding Libamath's vector accuracy modes

minutes_to_complete: 20
author: Joana Cruz

who_is_this_for: This is an introductory topic for software developers who want to learn how to use the different accuracy modes present in Libamath, a component of ArmPL. This feature was introduced in ArmPL 25.04.

learning_objectives:
    - Understand how accuracy is defined in Libamath.
    - Pick an accuracy mode depending on your application.

# [Libamath](https://developer.arm.com/documentation/101004/2504/) is a component of [ArmPL (Arm Performance Libraries)](https://developer.arm.com/documentation/101004/2504/General-information/Arm-Performance-Libraries?lang=en). Since Libamath only provides vector functions on Linux, we assume you are working in a Linux environment where ArmPL is installed (meaning you completed [ArmPL's installation guide](https://learn.arm.com/install-guides/armpl/)).

prerequisites:
    - An Arm computer running Linux.
    - Build and install [ArmPL](https://learn.arm.com/install-guides/armpl/).

### Tags
skilllevels: Introductory
subjects: Performance and Architecture
armips:
    - Neoverse
tools_software_languages:
    - ArmPL
    - GCC
    - Libamath
operatingsystems:
    - Linux

further_reading:
    - resource:
        title: ArmPL Libamath Documentation
        link: https://developer.arm.com/documentation/101004/2410/General-information/Arm-Performance-Libraries-math-functions
        type: documentation
    # - resource:
    #     title: PLACEHOLDER BLOG
    #     link: PLACEHOLDER BLOG LINK
    #     type: blog
    - resource:
        title: ArmPL Installation Guide
        link: https://learn.arm.com/install-guides/armpl/
        type: website


### FIXED, DO NOT MODIFY
# ================================================================================
weight: 1                       # _index.md always has weight of 1 to order correctly
layout: "learningpathall"       # All files under learning paths have this same wrapper
learning_path_main_page: "yes"  # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
---
Lines changed: 8 additions & 0 deletions
---
# ================================================================================
# FIXED, DO NOT MODIFY THIS FILE
# ================================================================================
weight: 21                  # Set to always be larger than the content in this path to be at the end of the navigation.
title: "Next Steps"         # Always the same, html page title.
layout: "learningpathall"   # All files under learning paths have this same wrapper for Hugo processing.
---
Lines changed: 82 additions & 0 deletions
---
title: Examples
weight: 6

### FIXED, DO NOT MODIFY
layout: learningpathall
---

# Example

Here is an example invoking all accuracy modes of the Neon single-precision exp function (where `ulp_error.h` is the implementation of ULP error explained in [this section](/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp-error/)):

```C { line_numbers = "true" }
#include <amath.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#include "ulp_error.h"

void check_accuracy(float32x4_t (__attribute__((aarch64_vector_pcs)) *vexp_fun)(float32x4_t), float arg, const char *label) {
    float32x4_t varg = vdupq_n_f32(arg);
    float32x4_t vres = vexp_fun(varg);
    double want = exp((double)arg);
    float got = vgetq_lane_f32(vres, 0);

    printf(label, arg);
    printf("\n          got = %a\n", got);
    printf("  (float)want = %a\n", (float)want);
    printf("         want = %.12a\n", want);
    printf("    ULP error = %.4f\n\n", ulp_error(got, want));
}

int main(void) {
    // Inputs that trigger worst-case errors for each accuracy mode
    printf("Libamath example:\n");
    printf("-----------------------------------------------\n");
    printf("  // Display worst-case ULP error in expf for each\n");
    printf("  // accuracy mode, along with approximate (`got`) and exact results (`want`)\n\n");

    check_accuracy(armpl_vexpq_f32_u10, 0x1.ab312p+4, "armpl_vexpq_f32_u10(%a) delivers error under 1.0 ULP");
    check_accuracy(armpl_vexpq_f32, 0x1.8163ccp+5, "armpl_vexpq_f32(%a) delivers error under 3.5 ULP");
    check_accuracy(armpl_vexpq_f32_umax, -0x1.5b7322p+6, "armpl_vexpq_f32_umax(%a) delivers result with half correct bits");

    return 0;
}
```

You can compile the above program with:

```bash
gcc -O2 -o example example.c -lamath -lm
```

Running the example returns:

```bash
$ ./example
Libamath example:
-----------------------------------------------
  // Display worst-case ULP error in expf for each
  // accuracy mode, along with approximate (`got`) and exact results (`want`)

armpl_vexpq_f32_u10(0x1.ab312p+4) delivers error under 1.0 ULP
          got = 0x1.6ee554p+38
  (float)want = 0x1.6ee556p+38
         want = 0x1.6ee555bb01d1p+38
    ULP error = 0.8652

armpl_vexpq_f32(0x1.8163ccp+5) delivers error under 3.5 ULP
          got = 0x1.6a09ep+69
  (float)want = 0x1.6a09e4p+69
         want = 0x1.6a09e3e3d585p+69
    ULP error = 1.9450

armpl_vexpq_f32_umax(-0x1.5b7322p+6) delivers result with half correct bits
          got = 0x1.9b56bep-126
  (float)want = 0x1.9b491cp-126
         want = 0x1.9b491b9376d3p-126
    ULP error = 1745.2120
```

The input used for each variant corresponds to the worst case known to date (the argmax of the ULP error). This means the ULP error of these functions should not exceed the values demonstrated here, which stay below the thresholds we define for each accuracy mode.
Lines changed: 138 additions & 0 deletions
---
title: Floating Point Representation
weight: 2

### FIXED, DO NOT MODIFY
layout: learningpathall
---

# Floating-Point Representation Basics

Floating Point numbers are a finite and discrete approximation of the real numbers, allowing us to implement and compute functions in the continuous domain with an adequate (but limited) resolution.

A Floating Point number is typically expressed as:

```
+/-d.dddd...d x B^e
```

where:
* B is the base;
* e is the exponent;
* d.dddd...d is the mantissa (or significand). It is a p-digit word, where p represents the precision;
* +/- is the sign, which is usually stored separately.

If the leading digit is non-zero, the representation is normalized and the number is called a normal number.

{{% notice Example 1 %}}
Fixing `B=2, p=24`

`0.1 = 1.10011001100110011001101 × 2^-4` is a normalized representation of 0.1

`0.1 = 0.000110011001100110011001 × 2^0` is a non-normalized representation of 0.1

{{% /notice %}}

When fixing a base and a precision, a Floating Point number usually has multiple non-normalized representations, but only one normalized representation (assuming the leading digit is strictly smaller than the base).

## Building a Floating-Point Ruler

Given a base `B`, a precision `p`, a maximum exponent `emax` and a minimum exponent `emin`, we can create the set of all the normalized values in this system.

{{% notice Example 3 %}}
`B=2, p=3, emax=2, emin=-1`

| Significand | × 2⁻¹ | × 2⁰ | × 2¹ | × 2² |
|-------------|-------|------|------|------|
| 1.00 (1.0)  | 0.5   | 1.0  | 2.0  | 4.0  |
| 1.01 (1.25) | 0.625 | 1.25 | 2.5  | 5.0  |
| 1.10 (1.5)  | 0.75  | 1.5  | 3.0  | 6.0  |
| 1.11 (1.75) | 0.875 | 1.75 | 3.5  | 7.0  |

{{% /notice %}}

Note that, for any given integer n, numbers are evenly spaced between 2ⁿ and 2ⁿ⁺¹, but the gap between them (also called [ULP](/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp/), which we explain in more detail in the next section) grows as the exponent increases. So the spacing between floating point numbers gets larger as numbers get bigger.

### The Floating-Point bitwise representation
Since there are `B^p` possible mantissas, and `emax-emin+1` possible exponents, we need `log2(B^p) + log2(emax-emin+1) + 1` (sign) bits to represent a given Floating Point number in a system.
In Example 3, we need 3+2+1=6 bits.

We can then define the Floating Point bitwise representation in our system to be:

```
b0 b1 b2 b3 b4 b5
```

where

```
b0         -> sign (S)
b1, b2     -> exponent (E)
b3, b4, b5 -> mantissa (M)
```

However, this is not enough. In this bitwise definition, the possible values of E are 0, 1, 2, 3,
but in the system we are trying to define, we are only interested in the integer values in the range [-1, 2].

For this reason, E is called the biased exponent, and in order to retrieve the value it is trying to represent (i.e. the unbiased exponent) we need to subtract an offset from it (in this case we subtract 1):

```
x = (-1)^S x M x 2^(E-1)
```

# IEEE-754 Single Precision

Single precision (also called float) is a 32-bit format defined by the [IEEE-754 Floating Point Standard](https://ieeexplore.ieee.org/document/8766229).

In this standard the sign is represented using 1 bit, the exponent uses 8 bits and the mantissa uses 23 bits.

The value of a (normalized) Floating Point number in IEEE-754 can be represented as:

```
x = (-1)^S x 1.M x 2^(E-127)
```

The exponent bias of 127 allows storage of exponents from -126 to +127. The leading digit is implicit: in normalized numbers it is implicitly 1, so we effectively have 24 bits of precision.

{{% notice Special Cases in IEEE-754 Single Precision %}}
Since we have 8 bits of storage, E ranges between 0 and 2^8-1=255. However, not all of these 256 values are used for normal numbers.

If the exponent E is:
* 0, then we are either in the presence of a denormalized number or a 0 (if M is 0 as well);
* 1 to 254, then we are in the normalized range;
* 255, then we are in the presence of Inf (if M==0), or NaN (if M!=0).

Subnormal numbers (also called denormal numbers) are special floating-point values defined by the IEEE-754 standard.

They allow the representation of numbers very close to zero, smaller than what is normally possible with the standard exponent range.

Subnormal numbers do not have a leading 1 in their representation. They are stored with the exponent field E equal to 0 and interpreted with a fixed exponent of -126.

The interpretation of a denormal Floating Point number in IEEE-754 can be represented as:

```
x = (-1)^S x 0.M x 2^(-126)
```
{{% /notice %}}

If you're interested in diving deeper into this subject, [What Every Computer Scientist Should Know About Floating-Point Arithmetic](https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html) by David Goldberg is a good place to start.
Lines changed: 110 additions & 0 deletions
---
title: Accuracy Modes in Libamath
weight: 5

### FIXED, DO NOT MODIFY
layout: learningpathall
---


# The 3 Accuracy Modes of Libamath

Libamath vector functions can come in various accuracy modes for the same mathematical function.
This means some of our functions allow users and compilers to choose between:
- **High accuracy** (≤ 1 ULP)
- **Default accuracy** (≤ 3.5 ULP)
- **Low accuracy / max performance** (approx. ≤ 4096 ULP)


# How Accuracy Modes Are Encoded in Libamath

You can recognize the accuracy mode of a function by inspecting the **suffix** in its symbol:

- **`_u10`** → High accuracy
  E.g., `armpl_vcosq_f32_u10`
  Ensures results stay within **1 Unit in the Last Place (ULP)**.

- *(no suffix)* → Default accuracy
  E.g., `armpl_vcosq_f32`
  Keeps errors within **3.5 ULP** — a sweet spot for many workloads.

- **`_umax`** → Low accuracy
  E.g., `armpl_vcosq_f32_umax`
  Prioritizes speed, tolerating errors up to **4096 ULP**, or roughly **11 correct bits** in single precision.


# Applications

Selecting an appropriate accuracy level helps avoid unnecessary compute cost while preserving output quality where it matters.


### High Accuracy (≤ 1 ULP)

Use when **numerical (near-)correctness** is a priority. These routines involve precise algorithms (e.g., high-degree polynomials, careful range reduction, FMA usage) and are ideal for:

- **Scientific computing**
  e.g., simulations, finite element analysis
- **Signal processing pipelines** [1,2]
  especially recursive filters or transforms
- **Validation & reference implementations**

While slower, these functions provide **near-bitwise reproducibility** — critical in sensitive domains.


### Default Accuracy (≤ 3.5 ULP)

The default mode strikes a **practical balance** between performance and numerical fidelity. It's optimized for:

- **General-purpose math libraries**
- **Analytics workloads** [3]
  e.g., log/sqrt during feature extraction
- **Inference pipelines** [4]
  especially on edge devices where latency matters

It is also suitable for many **scientific workloads** that can tolerate modest error in exchange for **faster throughput**.


### Low Accuracy / Max Performance (≤ 4096 ULP)

This mode trades precision for speed — aggressively. It's designed for:

- **Games, graphics, and shaders** [5]
  e.g., approximating sin/cos for animation curves
- **Monte Carlo simulations**
  where statistical convergence outweighs per-sample accuracy [6]
- **Genetic algorithms, audio processing, and embedded DSP**

Avoid it in control-flow-critical code or where **errors amplify**.


# Summary

| Accuracy Mode | Libamath example     | Approx. Error | Performance | Typical Applications                                  |
|---------------|----------------------|---------------|-------------|-------------------------------------------------------|
| `_u10`        | `_ZGVnN4v_cosf_u10`  | ≤ 1.0 ULP     | Low         | Scientific computing, backpropagation, validation     |
| *(default)*   | `_ZGVnN4v_cosf`      | ≤ 3.5 ULP     | Medium      | General compute, analytics, inference                 |
| `_umax`       | `_ZGVnN4v_cosf_umax` | ≤ 4096 ULP    | High        | Real-time graphics, DSP, approximations, simulations  |


**Pro tip:** If your workload has mixed precision needs, you can *selectively call different accuracy modes* for different parts of your pipeline. Libamath lets you tailor precision where it matters — and boost performance where it doesn't.


#### References

1. Higham, N. J. (2002). *Accuracy and Stability of Numerical Algorithms* (2nd ed.). SIAM.

2. Texas Instruments (1999). *Overflow Avoidance Techniques in Cascaded IIR Filter Implementations on the TMS320 DSPs*. Application Report SPRA509.
   https://www.ti.com/lit/pdf/spra509

3. Ma, S., & Huai, J. (2019). *Approximate Computation for Big Data Analytics*. arXiv:1901.00232.
   https://arxiv.org/pdf/1901.00232

4. Gupta, S., Agrawal, A., Gopalakrishnan, K., & Narayanan, P. (2015). *Deep Learning with Limited Numerical Precision*. In Proceedings of the 32nd International Conference on Machine Learning (ICML), PMLR 37.
   https://proceedings.mlr.press/v37/gupta15.html

5. Unity Technologies. *Precision Modes*. Unity Shader Graph Documentation.
   [https://docs.unity3d.com/Packages/[email protected]/manual/Precision-Modes.html](https://docs.unity3d.com/Packages/[email protected]/manual/Precision-Modes.html)

6. Croci, M., Gorman, G. J., & Giles, M. B. (2021). *Rounding Error using Low Precision Approximate Random Variables*. arXiv:2012.09739.
   https://arxiv.org/abs/2012.09739