Commit fbeb51f

Revised floating point behavior Learning Path

1 parent 368adc0
5 files changed: +132 additions, -99 deletions


content/learning-paths/cross-platform/floating-point-rounding-errors/_index.md

Lines changed: 6 additions & 8 deletions

@@ -1,19 +1,19 @@
 ---
-title: Explore floating-point differences between x86 and Arm
+title: Understand floating-point behavior across x86 and Arm architectures
 
 draft: true
 cascade:
   draft: true
 
 minutes_to_complete: 30
 
-who_is_this_for: This is an introductory topic for developers who are porting applications from x86 to Arm and want to understand how floating-point behavior differs between these architectures - particularly in the context of numerical consistency, performance, and debugging subtle bugs.
+who_is_this_for: This is an introductory topic for developers who are porting applications from x86 to Arm and want to understand floating-point behavior across these architectures. Both architectures provide reliable and consistent floating-point computation following the IEEE 754 standard.
 
 learning_objectives:
-  - Identify key differences in floating-point behavior between the x86 and Arm architectures.
-  - Recognize the impact of compiler optimizations and instruction sets on floating-point results.
-  - Apply compiler flags and best practices to ensure consistent floating-point behavior across
-    platforms.
+  - Understand that Arm and x86 produce identical results for all well-defined floating-point operations.
+  - Recognize that differences only occur in special undefined cases permitted by IEEE 754.
+  - Learn best practices for writing portable floating-point code across architectures.
+  - Apply appropriate precision levels for portable results.
 
 prerequisites:
   - Access to an x86 and an Arm Linux machine.

@@ -47,8 +47,6 @@ further_reading:
     link: https://en.cppreference.com/w/cpp/numeric/fenv
     type: documentation
 
-
-
 ### FIXED, DO NOT MODIFY
 # ================================================================================
 weight: 1 # _index.md always has weight of 1 to order correctly

content/learning-paths/cross-platform/floating-point-rounding-errors/how-to-1.md

Lines changed: 11 additions & 4 deletions

@@ -1,11 +1,19 @@
 ---
-title: "Floating-Point Representation"
+title: "Floating-point representation"
 weight: 2
 
 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---
 
+## Introduction
+
+This Learning Path explores floating-point behavior across x86 and Arm architectures. Both architectures fully implement the IEEE 754 standard and produce identical results for all well-defined floating-point operations.
+
+Any differences you encounter are limited to special undefined cases where the IEEE 754 standard explicitly permits different implementations. These cases represent edge conditions that can be avoided, not fundamental differences in floating-point results.
+
+Arm processors provide reliable and accurate floating-point computation that is equivalent to x86 for all standard mathematical operations. By understanding the nuances of floating-point arithmetic and following best practices, you can write portable and robust code that performs consistently across platforms.
+
 ## Review of floating-point numbers
 
 {{% notice Learning tip%}}

@@ -47,8 +55,7 @@ Key takeaways:
 - ULP behavior impacts numerical stability and precision.
 
 {{% notice Learning tip %}}
-Keep in mind that rounding and representation issues aren't bugs - they're a consequence of how floating-point math works at the hardware level. Understanding these fundamentals is essential when porting numerical code across architectures like x86 and Arm.
+Keep in mind that rounding and representation issues aren't bugs; they are a consequence of how floating-point math works at the hardware level. Understanding these fundamentals is useful when porting numerical code across architectures like x86 and Arm.
 {{% /notice %}}
 
-
-In the next section, you'll explore how x86 and Arm differ in how they implement and optimize floating-point operations - and why this matters for writing portable, accurate software.
+In the next section, you'll explore why you may encounter differences in undefined floating-point operations and how you can use this information to write portable floating-point code.
content/learning-paths/cross-platform/floating-point-rounding-errors/how-to-2.md

Lines changed: 47 additions & 39 deletions

@@ -1,20 +1,28 @@
 ---
-title: Differences between x86 and Arm
+title: Overflow in floating-point to integer conversion
 weight: 3
 
 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---
 
-## What are the differences in behavior between x86 and Arm floating point?
+## Are there differences in behavior between x86 and Arm floating point?
 
-Although both x86 and Arm generally follow the IEEE 754 standard for floating-point representation, their behavior in edge cases - like overflow and truncation - can differ due to implementation details and instruction sets.
+Both the x86 and Arm architectures fully comply with the IEEE 754 standard for floating-point representation. For all well-defined operations, both architectures produce identical results. Differences only occur in cases where the IEEE 754 standard explicitly leaves behavior undefined, such as converting out-of-range floating-point values to integers. These are special undefined cases where the standard permits implementations to behave differently; this is not a flaw or limitation of either architecture.
 
-You can see this by comparing an example application on both an x86 and an Arm Linux system.
+Understanding these undefined corner cases will help you correct any non-portable code.
 
-Run this example on any Linux system with x86 and Arm architecture; on AWS, use EC2 instance types `t3.micro` and `t4g.small` with Ubuntu 24.04.
+### Undefined behavior in floating-point to integer conversion
 
-To learn about floating-point differences, use an editor to copy and paste the C++ code below into a new file named `converting-float.cpp`:
+The following example demonstrates undefined behavior that occurs when converting out-of-range floating-point values to integers. An out-of-range floating-point value is one that is too large or too small to be represented by the target integer type.
+
+This behavior is explicitly undefined by the IEEE 754 standard and the C++ specification, meaning different architectures are permitted to handle these cases differently.
+
+The differences shown below only occur in undefined behavior cases. Normal floating-point operations produce identical results on both architectures.
+
+An example of undefined behavior in floating-point code is provided below. You can run the example application on both an x86 and an Arm Linux system. If you are using AWS, use EC2 instance types `t3.micro` and `t4g.small` with Ubuntu 24.04.
+
+To learn about floating-point conversions, use an editor to copy and paste the C++ code below into a new file named `conversions.cpp`:
 
 ```cpp
 #include <iostream>

@@ -60,65 +68,65 @@ int main() {
 }
 ```
 
-If you need to install the `g++` compiler, run the commands below:
+If you need to install the `g++` and `clang` compilers, run the commands below:
 
 ```bash
 sudo apt update
-sudo apt install g++ -y
+sudo apt install g++ clang -y
 ```
 
-Compile `converting-float.cpp` on an Arm and x86 machine.
+Compile `conversions.cpp` on an Arm and an x86 Linux machine.
 
 The compile command is the same on both systems.
 
 ```bash
-g++ converting-float.cpp -o converting-float
+g++ conversions.cpp -o conversions
+```
+
+Run the program on both systems:
+
+```bash
+./conversions
 ```
 
 For easy comparison, the image below shows the x86 output (left) and Arm output (right). The highlighted lines show the difference in output:
 
 ![differences](./differences.png)
 
-As you can see, there are several cases where different behavior is observed. For example when trying to convert a signed number to an unsigned number or dealing with out-of-bounds numbers.
+As you can see, there are several cases where different behavior is observed in these undefined scenarios, for example, when converting a signed number to an unsigned number or dealing with out-of-bounds values.
 
-## Removing hardcoded values with macros
+## Avoid out-of-range conversions
 
-The above differences show that explicitly checking for specific values will lead to unportable code.
+The above differences demonstrate non-portable code. Undefined behavior, such as converting out-of-range floating-point values to integers, can lead to inconsistent results across platforms. To ensure portability and predictable behavior, check for out-of-range values before performing such conversions.
 
-For example, the function below checks if the casted result is `0`. This can be misleading - on x86, casting an out-of-range floating-point value to `uint32_t` may wrap to `0`, while on Arm it may behave differently. Relying on these results makes the code unportable.
-
-
+You can check for out-of-range values using the code below. This approach ensures that the conversion is only performed when the value is within the valid range for the target data type. If the value is out of range, a default value is used to handle the situation gracefully. This prevents unexpected results and makes the code portable.
 
 ```cpp
-void checkFloatToUint32(float num) {
-    uint32_t castedNum = static_cast<uint32_t>(num);
-    if (castedNum == 0) {
-        std::cout << "The casted number is 0, indicating that the float is out of bounds for uint32_t." << std::endl;
+// UINT32_MAX rounds up to 2^32 when converted to float, so compare
+// with a strict < to keep the cast in range.
+constexpr float UINT32_MAX_F = static_cast<float>(UINT32_MAX);
+
+void convertFloatToInt(float value) {
+    // Convert to unsigned 32-bit integer with range checking
+    uint32_t u32;
+    if (!std::isnan(value) && value >= 0.0f && value < UINT32_MAX_F) {
+        u32 = static_cast<uint32_t>(value);
+        std::cout << "The casted number is: " << u32 << std::endl;
     } else {
-        std::cout << "The casted number is: " << castedNum << std::endl;
+        u32 = 0; // Default value for out-of-range
+        std::cout << "The float is out of bounds for uint32_t, using 0." << std::endl;
     }
+
+    // ...existing code...
 }
 ```
 
-This can simply be corrected by using the macro, `UINT32_MAX`.
+This check provides a portable solution that identifies out-of-range values before casting and sets them to 0. By incorporating such checks, you can avoid undefined behavior and ensure that your code behaves consistently across different platforms.
 
-{{% notice Note %}}
-To find out all the available compiler-defined macros, you can output them using:
-```bash
-echo "" | g++ -dM -E -
-```
-{{% /notice %}}
+### Key takeaways
 
-A portable version of the code is:
+- Arm and x86 produce identical results for all well-defined floating-point operations; both architectures comply with IEEE 754.
+- Differences only occur in special undefined cases where the IEEE 754 standard explicitly permits different behaviors.
+- An example undefined scenario is converting out-of-range floating-point values to integers.
+- Avoid relying on undefined behavior to ensure portability.
 
-```cpp
-void checkFloatToUint32(float num) {
-    uint32_t castedNum = static_cast<uint32_t>(num);
-    if (castedNum == UINT32_MAX) {
-        std::cout << "The casted number is " << UINT32_MAX << " indicating the float was out of bounds for uint32_t." << std::endl;
-    } else {
-        std::cout << "The casted number is: " << castedNum << std::endl;
-    }
-}
-```
+By understanding these nuances, you can confidently write code that behaves consistently across platforms.

content/learning-paths/cross-platform/floating-point-rounding-errors/how-to-3.md

Lines changed: 50 additions & 14 deletions

@@ -1,22 +1,24 @@
 ---
-title: Error propagation
+title: Single and double precision considerations
 weight: 4
 
 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---
 
-## What is error propagation in x86 and Arm systems?
+## Understanding numerical precision differences in single vs double precision
 
-One cause of different outputs between x86 and Arm stems from the order of instructions and how errors are propagated. As a hypothetical example, an Arm system may decide to reorder the instructions that each have a different rounding error so that subtle changes are observed.
+This section explores how different levels of floating-point precision can affect numerical results. The differences shown here are not architecture-specific issues, but demonstrate the importance of choosing appropriate precision levels for numerical computations.
 
-It is possible that two functions that are mathematically equivalent will propagate errors differently on a computer.
+### Single precision limitations
 
-Functions `f1` and `f2` are mathematically equivalent. You would expect them to return the same value given the same input.
-
-If the input is a very small number, `1e-8`, the error is different due to the loss in precision caused by different operations. Specifically, `f2` avoids subtracting nearly equal numbers for clarity. For a full description look into the topic of [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability).
+Consider two mathematically equivalent functions, `f1()` and `f2()`. While they should theoretically produce the same result, small differences can arise due to the limited precision of floating-point arithmetic.
+
+The differences shown in this example are due to using single-precision (float) arithmetic, not due to architectural differences between Arm and x86. Both architectures handle single-precision arithmetic according to IEEE 754.
 
-Use an editor to copy and paste the C++ code below into a file named `error-propagation.cpp`:
+Functions `f1()` and `f2()` are mathematically equivalent. You would expect them to return the same value given the same input.
+
+Use an editor to copy and paste the C++ code below into a file named `single-precision.cpp`:
 
 ```cpp
 #include <stdio.h>

@@ -53,15 +55,14 @@ int main() {
 }
 ```
 
-Compile the code on both x86 and Arm with the following command:
+Compile and run the code on both x86 and Arm with the following commands:
 
 ```bash
-g++ -g error-propagation.cpp -o error-propagation
+g++ -g single-precision.cpp -o single-precision
+./single-precision
 ```
 
-Running the two binaries shows that the second function, `f2`, has a small rounding error on both architectures. Additionally, there is a further rounding difference when run on x86 compared to Arm.
-
-Running on x86:
+Output running on x86:
 
 ```output
 f1(1.000000e-08) = 0.0000000000

@@ -70,10 +71,45 @@ Difference (f1 - f2) = -4.9999999696e-09
 Final result after magnification: -0.4999000132
 ```
 
-Running on Arm:
+Output running on Arm:
+
 ```output
 f1(1.000000e-08) = 0.0000000000
 f2(1.000000e-08) = 0.0000000050
 Difference (f1 - f2) = -4.9999999696e-09
 Final result after magnification: -0.4998999834
 ```
+
+Depending on your compiler and library versions, you may get the same output on both systems. You can also use the `clang` compiler and see if the output matches:
+
+```bash
+clang -g single-precision.cpp -o single-precision -lm
+./single-precision
+```
+
+In some cases the GNU compiler output differs from the Clang output.
+
+Here's what's happening:
+
+1. Different square root implementations: x86 and Arm use different hardware instructions and library implementations for `sqrtf(1 + 1e-8)`.
+
+2. Tiny implementation differences get amplified. The difference between the two `sqrtf()` results is only about 3e-10, but this gets multiplied by 100,000,000, making it visible in the final result.
+
+3. Both `f1()` and `f2()` use `sqrtf()`. Even though `f2()` is more numerically stable, both functions call `sqrtf()` with the same input, so they both inherit the same implementation-specific square root result.
+
+4. Compiler and library versions may produce different output due to different implementations of library functions such as `sqrtf()`.
+
+The final result is that the x86 and Arm toolchains can compute `sqrtf(1.00000001)` with tiny differences in the least significant bits. This is normal and expected behavior: IEEE 754 allows implementation variation in math library functions, as long as they stay within specified error bounds.
+
+The very small difference you see is within acceptable floating-point precision limits.
+
+### Key takeaways
+
+- The small differences shown are due to library implementations in single-precision mode, not fundamental architectural differences.
+- Single-precision arithmetic has inherent limitations that can cause small numerical differences.
+- Using numerically stable algorithms, like `f2()`, can minimize error propagation.
+- Understanding [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability) is important for writing portable code.
+
+By adopting best practices and appropriate precision levels, developers can ensure consistent results across platforms.
+
+Continue to the next section to see how precision impacts the results.
