Commit fbeb51f

Revised floating point behavior Learning Path

1 parent 368adc0
5 files changed: +132 additions, -99 deletions


content/learning-paths/cross-platform/floating-point-rounding-errors/_index.md

Lines changed: 6 additions & 8 deletions

@@ -1,19 +1,19 @@
 ---
-title: Explore floating-point differences between x86 and Arm
+title: Understand floating-point behavior across x86 and Arm architectures
 
 draft: true
 cascade:
   draft: true
 
 minutes_to_complete: 30
 
-who_is_this_for: This is an introductory topic for developers who are porting applications from x86 to Arm and want to understand how floating-point behavior differs between these architectures - particularly in the context of numerical consistency, performance, and debugging subtle bugs.
+who_is_this_for: This is an introductory topic for developers who are porting applications from x86 to Arm and want to understand floating-point behavior across these architectures. Both architectures provide reliable and consistent floating-point computation following the IEEE 754 standard.
 
 learning_objectives:
-  - Identify key differences in floating-point behavior between the x86 and Arm architectures.
-  - Recognize the impact of compiler optimizations and instruction sets on floating-point results.
-  - Apply compiler flags and best practices to ensure consistent floating-point behavior across
-    platforms.
+  - Understand that Arm and x86 produce identical results for all well-defined floating-point operations.
+  - Recognize that differences only occur in special undefined cases permitted by IEEE 754.
+  - Learn best practices for writing portable floating-point code across architectures.
+  - Apply appropriate precision levels for portable results.
 
 prerequisites:
   - Access to an x86 and an Arm Linux machine.

@@ -47,8 +47,6 @@ further_reading:
     link: https://en.cppreference.com/w/cpp/numeric/fenv
     type: documentation
 
-
-
 ### FIXED, DO NOT MODIFY
 # ================================================================================
 weight: 1 # _index.md always has weight of 1 to order correctly

content/learning-paths/cross-platform/floating-point-rounding-errors/how-to-1.md

Lines changed: 11 additions & 4 deletions

@@ -1,11 +1,19 @@
 ---
-title: "Floating-Point Representation"
+title: "Floating-point representation"
 weight: 2
 
 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---
 
+## Introduction
+
+This Learning Path explores floating-point behavior across x86 and Arm architectures. Both architectures fully implement the IEEE 754 standard and produce identical results for all well-defined floating-point operations.
+
+Any differences you encounter are limited to special undefined cases where the IEEE 754 standard explicitly permits different implementations. These cases represent edge conditions that can be avoided, not fundamental differences in floating-point results.
+
+Arm processors provide reliable and accurate floating-point computation that is equivalent to x86 for all standard mathematical operations. By understanding the nuances of floating-point arithmetic and following best practices, you can write portable and robust code that performs consistently across platforms.
+
 ## Review of floating-point numbers
 
 {{% notice Learning tip%}}

@@ -47,8 +55,7 @@ Key takeaways:
 - ULP behavior impacts numerical stability and precision.
 
 {{% notice Learning tip %}}
-Keep in mind that rounding and representation issues aren't bugs - they're a consequence of how floating-point math works at the hardware level. Understanding these fundamentals is essential when porting numerical code across architectures like x86 and Arm.
+Keep in mind that rounding and representation issues aren't bugs; they are a consequence of how floating-point math works at the hardware level. Understanding these fundamentals is useful when porting numerical code across architectures like x86 and Arm.
 {{% /notice %}}
 
-
-In the next section, you'll explore how x86 and Arm differ in how they implement and optimize floating-point operations - and why this matters for writing portable, accurate software.
+In the next section, you'll explore why you may encounter differences in undefined floating-point operations and how you can use this information to write portable floating-point code.
content/learning-paths/cross-platform/floating-point-rounding-errors/how-to-2.md

Lines changed: 47 additions & 39 deletions

@@ -1,20 +1,28 @@
 ---
-title: Differences between x86 and Arm
+title: Overflow in floating-point to integer conversion
 weight: 3
 
 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---
 
-## What are the differences in behavior between x86 and Arm floating point?
+## Are there differences in behavior between x86 and Arm floating point?
 
-Although both x86 and Arm generally follow the IEEE 754 standard for floating-point representation, their behavior in edge cases - like overflow and truncation - can differ due to implementation details and instruction sets.
+Both the x86 and Arm architectures fully comply with the IEEE 754 standard for floating-point representation. For all well-defined operations, both architectures produce identical results. Differences only occur in cases where the IEEE 754 standard explicitly leaves behavior undefined, such as converting out-of-range floating-point values to integers. These are special undefined cases where the standard permits implementations to behave differently; this is not a flaw or limitation of either architecture.
 
-You can see this by comparing an example application on both an x86 and an Arm Linux system.
+Understanding these undefined corner cases will help you correct any non-portable code.
 
-Run this example on any Linux system with x86 and Arm architecture; on AWS, use EC2 instance types `t3.micro` and `t4g.small` with Ubuntu 24.04.
+### Undefined behavior in floating-point to integer conversion
 
-To learn about floating-point differences, use an editor to copy and paste the C++ code below into a new file named `converting-float.cpp`:
+The following example demonstrates undefined behavior that occurs when converting out-of-range floating-point values to integers. An out-of-range floating-point value is one that is too large or too small to be represented by the target integer type.
+
+This behavior is explicitly undefined by the IEEE 754 standard and the C++ specification, meaning different architectures are permitted to handle these cases differently.
+
+The differences shown below only occur in undefined behavior cases. Normal floating-point operations produce identical results on both architectures.
+
+An example of undefined behavior in floating-point code is provided below. You can run the example application on both an x86 and an Arm Linux system. If you are using AWS, use EC2 instance types `t3.micro` and `t4g.small` with Ubuntu 24.04.
+
+To learn about floating-point conversions, use an editor to copy and paste the C++ code below into a new file named `conversions.cpp`:
 
 ```cpp
 #include <iostream>

@@ -60,65 +68,65 @@ int main() {
 }
 ```
 
-If you need to install the `g++` compiler, run the commands below:
+If you need to install the `g++` and `clang` compilers, run the commands below:
 
 ```bash
 sudo apt update
-sudo apt install g++ -y
+sudo apt install g++ clang -y
 ```
 
-Compile `converting-float.cpp` on an Arm and x86 machine.
+Compile `conversions.cpp` on an Arm and an x86 Linux machine.
 
 The compile command is the same on both systems.
 
 ```bash
-g++ converting-float.cpp -o converting-float
+g++ conversions.cpp -o conversions
+```
+
+Run the program on both systems:
+
+```bash
+./conversions
 ```
 
 For easy comparison, the image below shows the x86 output (left) and Arm output (right). The highlighted lines show the difference in output:
 
 ![differences](./differences.png)
 
-As you can see, there are several cases where different behavior is observed. For example when trying to convert a signed number to an unsigned number or dealing with out-of-bounds numbers.
+As you can see, there are several cases where different behavior is observed in these undefined scenarios, for example, when converting a signed number to an unsigned number or dealing with out-of-bounds values.
 
-## Removing hardcoded values with macros
+## Avoid out-of-range conversions
 
-The above differences show that explicitly checking for specific values will lead to unportable code.
+The above differences demonstrate non-portable code. Undefined behavior, such as converting out-of-range floating-point values to integers, can lead to inconsistent results across platforms. To ensure portability and predictable behavior, check for out-of-range values before performing such conversions.
 
-For example, the function below checks if the casted result is `0`. This can be misleading - on x86, casting an out-of-range floating-point value to `uint32_t` may wrap to `0`, while on Arm it may behave differently. Relying on these results makes the code unportable.
-
-
+You can check for out-of-range values using the code below. This approach ensures that the conversion is only performed when the value is within the valid range for the target data type. If the value is out of range, a default value is used to handle the situation gracefully. This prevents unexpected results and makes the code portable.
 
 ```cpp
-void checkFloatToUint32(float num) {
-    uint32_t castedNum = static_cast<uint32_t>(num);
-    if (castedNum == 0) {
-        std::cout << "The casted number is 0, indicating that the float is out of bounds for uint32_t." << std::endl;
+// UINT32_MAX rounds up to 2^32 when converted to float, so compare
+// with a strict < to keep the cast in range.
+constexpr float UINT32_MAX_F = static_cast<float>(UINT32_MAX);
+
+void convertFloatToInt(float value) {
+    // Convert to unsigned 32-bit integer with range checking
+    uint32_t u32;
+    if (!std::isnan(value) && value >= 0.0f && value < UINT32_MAX_F) {
+        u32 = static_cast<uint32_t>(value);
+        std::cout << "The casted number is: " << u32 << std::endl;
     } else {
-        std::cout << "The casted number is: " << castedNum << std::endl;
+        u32 = 0; // Default value for out-of-range
+        std::cout << "The float is out of bounds for uint32_t, using 0." << std::endl;
     }
+
+    // ...existing code...
 }
 ```
 
-This can simply be corrected by using the macro, `UINT32_MAX`.
+This check provides a portable solution that identifies out-of-range values before casting and sets them to 0. By incorporating such checks, you can avoid undefined behavior and ensure that your code behaves consistently across different platforms.
 
-{{% notice Note %}}
-To find out all the available compiler-defined macros, you can output them using:
-```bash
-echo "" | g++ -dM -E -
-```
-{{% /notice %}}
+### Key takeaways
 
-A portable version of the code is:
+- Arm and x86 produce identical results for all well-defined floating-point operations; both architectures comply with IEEE 754.
+- Differences only occur in special undefined cases where the IEEE 754 standard explicitly permits different behaviors.
+- An example undefined scenario is converting out-of-range floating-point values to integers.
+- Avoid relying on undefined behavior to ensure portability.
 
-```cpp
-void checkFloatToUint32(float num) {
-    uint32_t castedNum = static_cast<uint32_t>(num);
-    if (castedNum == UINT32_MAX) {
-        std::cout << "The casted number is " << UINT32_MAX << " indicating the float was out of bounds for uint32_t." << std::endl;
-    } else {
-        std::cout << "The casted number is: " << castedNum << std::endl;
-    }
-}
-```
+By understanding these nuances, you can confidently write code that behaves consistently across platforms.

content/learning-paths/cross-platform/floating-point-rounding-errors/how-to-3.md

Lines changed: 50 additions & 14 deletions

@@ -1,22 +1,24 @@
 ---
-title: Error propagation
+title: Single and double precision considerations
 weight: 4
 
 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---
 
-## What is error propagation in x86 and Arm systems?
+## Understanding numerical precision differences in single vs double precision
 
-One cause of different outputs between x86 and Arm stems from the order of instructions and how errors are propagated. As a hypothetical example, an Arm system may decide to reorder the instructions that each have a different rounding error so that subtle changes are observed.
+This section explores how different levels of floating-point precision can affect numerical results. The differences shown here are not architecture-specific issues, but demonstrate the importance of choosing appropriate precision levels for numerical computations.
 
-It is possible that two functions that are mathematically equivalent will propagate errors differently on a computer.
+### Single precision limitations
 
-Functions `f1` and `f2` are mathematically equivalent. You would expect them to return the same value given the same input.
-
-If the input is a very small number, `1e-8`, the error is different due to the loss in precision caused by different operations. Specifically, `f2` avoids subtracting nearly equal numbers for clarity. For a full description look into the topic of [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability).
+Consider two mathematically equivalent functions, `f1()` and `f2()`. While they should theoretically produce the same result, small differences can arise due to the limited precision of floating-point arithmetic.
+
+The differences shown in this example are due to using single-precision (float) arithmetic, not due to architectural differences between Arm and x86. Both architectures handle single-precision arithmetic according to IEEE 754.
 
-Use an editor to copy and paste the C++ code below into a file named `error-propagation.cpp`:
+Functions `f1()` and `f2()` are mathematically equivalent. You would expect them to return the same value given the same input.
+
+Use an editor to copy and paste the C++ code below into a file named `single-precision.cpp`:
 
 ```cpp
 #include <stdio.h>

@@ -53,15 +55,14 @@ int main() {
 }
 ```
 
-Compile the code on both x86 and Arm with the following command:
+Compile and run the code on both x86 and Arm with the following commands:
 
 ```bash
-g++ -g error-propagation.cpp -o error-propagation
+g++ -g single-precision.cpp -o single-precision
+./single-precision
 ```
 
-Running the two binaries shows that the second function, `f2`, has a small rounding error on both architectures. Additionally, there is a further rounding difference when run on x86 compared to Arm.
-
-Running on x86:
+Output running on x86:
 
 ```output
 f1(1.000000e-08) = 0.0000000000

@@ -70,10 +71,45 @@ Difference (f1 - f2) = -4.9999999696e-09
 Final result after magnification: -0.4999000132
 ```
 
-Running on Arm:
+Output running on Arm:
+
 ```output
 f1(1.000000e-08) = 0.0000000000
 f2(1.000000e-08) = 0.0000000050
 Difference (f1 - f2) = -4.9999999696e-09
 Final result after magnification: -0.4998999834
 ```
+
+Depending on your compiler and library versions, you may get the same output on both systems. You can also use the `clang` compiler and see if the output matches:
+
+```bash
+clang -g single-precision.cpp -o single-precision -lm
+./single-precision
+```
+
+In some cases the GNU compiler output differs from the Clang output.
+
+Here's what's happening:
+
+1. Different square root implementations: x86 and Arm use different hardware instructions and library implementations for `sqrtf(1 + 1e-8)`.
+
+2. Tiny implementation differences get amplified. The difference between the two `sqrtf()` results is only about 3e-10, but this gets multiplied by 100,000,000, making it visible in the final result.
+
+3. Both `f1()` and `f2()` use `sqrtf()`. Even though `f2()` is more numerically stable, both functions call `sqrtf()` with the same input, so they both inherit the same implementation-specific square root result.
+
+4. Compiler and library versions may produce different output due to different implementations of library functions such as `sqrtf()`.
+
+The final result is that the x86 and Arm toolchains can compute `sqrtf(1.00000001)` with tiny differences in the least significant bits. This is normal and expected behavior: IEEE 754 allows implementation variation in math library functions, as long as they stay within specified error bounds.
+
+The very small difference you see is within acceptable floating-point precision limits.
+
+### Key takeaways
+
+- The small differences shown are due to library implementations in single-precision mode, not fundamental architectural differences.
+- Single-precision arithmetic has inherent limitations that can cause small numerical differences.
+- Using numerically stable algorithms, like `f2()`, can minimize error propagation.
+- Understanding [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability) is important for writing portable code.
+
+By adopting best practices and appropriate precision levels, developers can ensure consistent results across platforms.
+
+Continue to the next section to see how precision impacts the results.
