Commit df09782

Merge remote-tracking branch 'origin/ollama_on_gke' into ollama_on_gke

2 parents: 8707d12 + 1c039db

File tree: 68 files changed, +1553 −825 lines

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
---
title: Learn about floating point rounding on Arm

draft: true
cascade:
  draft: true

minutes_to_complete: 30

who_is_this_for: Developers porting applications from x86 to Arm who observe different floating point values on each platform.

learning_objectives:
  - Understand the differences between floating point numbers on x86 and Arm.
  - Understand the factors that affect floating point behavior.
  - Use compiler flags to produce predictable floating point behavior.

prerequisites:
  - Access to an x86 and an Arm Linux machine.
  - A basic understanding of floating point numbers.

author: Kieran Hejmadi

### Tags
skilllevels: Introductory
subjects: Performance and Architecture
armips:
  - Cortex-A
  - Neoverse
tools_software_languages:
  - C++
operatingsystems:
  - Linux
shared_path: true
shared_between:
  - servers-and-cloud-computing
  - laptops-and-desktops
  - mobile-graphics-and-gaming

further_reading:
  - resource:
      title: G++ Optimisation Flags
      link: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
      type: documentation
  - resource:
      title: Floating-point environment
      link: https://en.cppreference.com/w/cpp/numeric/fenv
      type: documentation

### FIXED, DO NOT MODIFY
# ================================================================================
weight: 1 # _index.md always has weight of 1 to order correctly
layout: "learningpathall" # All files under learning paths have this same wrapper
learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
---
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
---
title: Floating Point Representations
weight: 2

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Review of floating point numbers

If you are unfamiliar with floating point number representation, you can review [Learn about integer and floating-point conversions](/learning-paths/cross-platform/integer-vs-floats/introduction-integer-float-types/). It covers the different data types and explains data type conversions.

Floating-point numbers are a fundamental representation of real numbers in computer systems, enabling efficient storage and computation of decimal values with varying degrees of precision. In C/C++, floating point variables are declared with keywords such as `float` or `double`. The IEEE 754 standard, established in 1985, is the most widely used format for floating-point arithmetic, ensuring consistency across different hardware and software implementations.

IEEE 754 defines two primary formats: single-precision (32-bit) and double-precision (64-bit).
Each floating-point number consists of three components:
- **sign bit** (determining whether the value is positive or negative)
- **exponent** (defining the scale or magnitude)
- **significand** (also called the mantissa, representing the significant digits of the number)

The standard uses a biased exponent to handle both large and small numbers efficiently, and it incorporates special values such as NaN (Not a Number), infinity, and subnormal numbers for robust numerical computation. A key feature of IEEE 754 is its support for rounding modes and exception handling, ensuring predictable behavior in mathematical operations. However, floating-point arithmetic is inherently imprecise due to limited precision, leading to small rounding errors.
The graphic below illustrates various forms of floating point representation supported by Arm, each assigning a different number of bits to the exponent and mantissa.

![floating-point](./floating-point-numbers.png)

## Rounding errors

Since computers use a finite number of bits to store a continuous range of numbers, rounding errors are introduced. The unit in last place (ULP) is the smallest difference between two consecutive floating-point numbers. It measures floating-point rounding error, which arises because not all real numbers can be exactly represented.

When an operation is performed, the result is rounded to the nearest representable value, introducing a small error. This error, often measured in ULPs, indicates how close the computed value is to the exact result. As a simple example, if a floating-point scheme with 3 bits for the mantissa (precision) and an exponent in the range of -1 to 2 is used, the possible values are shown in the graph below.

![ulp](./ulp.png)

Key takeaways:

- ULP size varies with the number's magnitude.
- Larger numbers have bigger ULPs due to wider spacing between values.
- Smaller numbers have smaller ULPs, reducing quantization error.
- ULP behavior impacts numerical stability and precision in computations.
Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,123 @@
---
title: Differences between x86 and Arm
weight: 3

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## What are the differences in behavior between x86 and Arm floating point?

Architectures and standards define floating point overflow and truncation in different ways.

You can see this by comparing an example application on an x86 and an Arm Linux system.

You can use any Linux systems for this example. If you are using AWS, you can use EC2 instance types `t3.micro` and `t4g.small` running Ubuntu 24.04.

To learn about floating point differences, use an editor to copy and paste the C++ code below into a new file named `converting-float.cpp`.

```cpp
#include <iostream>
#include <cmath>
#include <limits>
#include <cstdint>

void convertFloatToInt(float value) {
    // Convert to unsigned 32-bit integer
    uint32_t u32 = static_cast<uint32_t>(value);

    // Convert to signed 32-bit integer
    int32_t s32 = static_cast<int32_t>(value);

    // Convert to unsigned 16-bit integer (truncation happens)
    uint16_t u16 = static_cast<uint16_t>(u32);
    uint8_t u8 = static_cast<uint8_t>(value);

    // Convert to signed 16-bit integer (truncation happens)
    int16_t s16 = static_cast<int16_t>(s32);

    std::cout << "Floating-Point Value: " << value << "\n";
    std::cout << " → uint32_t: " << u32 << " (0x" << std::hex << u32 << std::dec << ")\n";
    std::cout << " → int32_t: " << s32 << " (0x" << std::hex << s32 << std::dec << ")\n";
    std::cout << " → uint16_t (truncated): " << u16 << " (0x" << std::hex << u16 << std::dec << ")\n";
    std::cout << " → int16_t (truncated): " << s16 << " (0x" << std::hex << s16 << std::dec << ")\n";
    std::cout << " → uint8_t (truncated): " << static_cast<int>(u8) << std::endl;

    std::cout << "----------------------------------\n";
}

int main() {
    std::cout << "Demonstrating Floating-Point to Integer Conversion\n\n";

    // Test cases
    convertFloatToInt(42.7f);         // Normal case
    convertFloatToInt(-15.3f);        // Negative value -> wraps on unsigned
    convertFloatToInt(4294967296.0f); // Overflow: 2^32 (UINT32_MAX + 1)
    convertFloatToInt(3.4e+38f);      // Large float exceeding UINT32_MAX
    convertFloatToInt(-3.4e+38f);     // Large negative float
    convertFloatToInt(NAN);           // NaN behavior on different platforms
    return 0;
}
```
If you need to install the `g++` compiler, run the commands below.

```bash
sudo apt update
sudo apt install g++ -y
```

Compile `converting-float.cpp` on both the Arm and x86 machines.

The compile command is the same on both systems.

```bash
g++ converting-float.cpp -o converting-float
```

For easy comparison, the image below shows the x86 output (left) and the Arm output (right). The highlighted lines show the differences in output.

![differences](./differences.png)

As you can see, there are several cases where different behavior is observed, for example when converting a negative number to an unsigned type or when dealing with out-of-range values.
## Removing hardcoded values with macros

The differences above show that explicitly checking for specific values leads to unportable code.

For example, consider the function below, which checks whether the converted value is 0. An x86 machine converts a floating point number that is out of range for `uint32_t` to a different value than an Arm machine does, so a check against one hardcoded result behaves differently on each platform and the code is not portable.

```cpp
void checkFloatToUint32(float num) {
    uint32_t castedNum = static_cast<uint32_t>(num);
    if (castedNum == 0) {
        std::cout << "The casted number is 0, indicating the float could be out of bounds for uint32_t." << std::endl;
    } else {
        std::cout << "The casted number is: " << castedNum << std::endl;
    }
}
```

This can be corrected by using the macro `UINT32_MAX`.

{{% notice Note %}}
To list all the available compiler-defined macros, you can output them using:
```bash
echo "" | g++ -dM -E -
```
{{% /notice %}}
A portable version of the code is:

```cpp
void checkFloatToUint32(float num) {
    uint32_t castedNum = static_cast<uint32_t>(num);
    if (castedNum == UINT32_MAX) {
        std::cout << "The casted number is " << UINT32_MAX << ", indicating the float was out of bounds for uint32_t." << std::endl;
    } else {
        std::cout << "The casted number is: " << castedNum << std::endl;
    }
}
```
Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
---
title: Error propagation
weight: 4

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## What is error propagation in x86 and Arm systems?

One cause of different outputs between x86 and Arm stems from the order of instructions and how errors propagate through a computation. As a hypothetical example, instructions may be scheduled in a different order on each platform, and since each operation carries its own rounding error, subtle differences in the result can be observed.

Two functions that are mathematically equivalent can propagate errors differently on a computer.

Functions `f1` and `f2` below are mathematically equivalent, so you would expect them to return the same value given the same input.

However, if the input is a very small number such as `1e-8`, the error differs due to the loss of precision caused by the different operations. Specifically, `f2` avoids the subtraction of nearly equal numbers. For a full description, look into the topic of [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability).
Use an editor to copy and paste the C++ code below into a file named `error-propagation.cpp`.

```cpp
#include <stdio.h>
#include <math.h>

// Function 1: Computes sqrt(1 + x) - 1 using the naive approach
float f1(float x) {
    return sqrtf(1 + x) - 1;
}

// Function 2: Computes the same value using an algebraically equivalent transformation
// This version is numerically more stable
float f2(float x) {
    return x / (sqrtf(1 + x) + 1);
}

int main() {
    float x = 1e-8f; // A small value that causes floating-point precision issues
    float result1 = f1(x);
    float result2 = f2(x);

    // Theoretically, result1 and result2 should be the same
    float difference = result1 - result2;
    // Multiply by a large number to amplify the error
    float final_result = 100000000.0f * difference + 0.0001f;

    // Print the results
    printf("f1(%e) = %.10f\n", x, result1);
    printf("f2(%e) = %.10f\n", x, result2);
    printf("Difference (f1 - f2) = %.10e\n", difference);
    printf("Final result after magnification: %.10f\n", final_result);

    return 0;
}
```
Compile the code on both x86 and Arm with the following command.

```bash
g++ -g error-propagation.cpp -o error-propagation
```

Running the two binaries shows that the naive function, `f1`, loses all of its precision on both architectures, while `f2` returns an accurate result. Additionally, there is a further rounding difference in the magnified result when run on x86 compared to Arm.

Running on x86:

```output
f1(1.000000e-08) = 0.0000000000
f2(1.000000e-08) = 0.0000000050
Difference (f1 - f2) = -4.9999999696e-09
Final result after magnification: -0.4999000132
```

Running on Arm:

```output
f1(1.000000e-08) = 0.0000000000
f2(1.000000e-08) = 0.0000000050
Difference (f1 - f2) = -4.9999999696e-09
Final result after magnification: -0.4998999834
```
Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
---
title: Minimizing variability across platforms
weight: 5

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## How can I minimize variability across x86 and Arm?

The line `#pragma STDC FENV_ACCESS ON` is a directive that informs the compiler that the program accesses the floating-point environment.

The floating-point environment facilities are exposed in C++ through the `<cfenv>` header, and this directive is used to ensure that the program can properly handle floating-point exceptions and rounding modes, enabling your program to continue running if an exception is raised.

In the context below, enabling floating-point environment access matters because the functions you are working with involve floating-point arithmetic, which can be prone to precision errors and exceptions such as overflow, underflow, division by zero, and invalid operations. It is not strictly necessary for this example, but it is included because it may be relevant for your own application.

This directive is particularly important when performing operations that require high numerical stability and precision, such as the square root calculations in the functions below. It allows the program to manage the floating-point state and handle any anomalies that occur during these calculations, improving the robustness and reliability of your numerical computations.
Use an editor to copy and paste the C++ code below into a file named `error-propagation-min.cpp`.

```cpp
#include <cstdio>
#include <cmath>
#include <cfenv>

// Inform the compiler that the floating-point environment is accessed
#pragma STDC FENV_ACCESS ON

// Function 1: Computes sqrt(1 + x) - 1 using the naive approach
double f1(double x) {
    return sqrt(1 + x) - 1;
}

// Function 2: Computes the same value using an algebraically equivalent transformation
// This version is numerically more stable
double f2(double x) {
    return x / (sqrt(1 + x) + 1);
}

int main() {
    // Clear any pending floating-point exception flags
    std::feclearexcept(FE_ALL_EXCEPT);

    double x = 1e-8; // A small value that causes floating-point precision issues
    double result1 = f1(x);
    double result2 = f2(x);

    // Theoretically, result1 and result2 should be the same
    double difference = result1 - result2;
    // Multiply by a large number to amplify the error
    double final_result = 100000000.0 * difference + 0.0001;

    // Print the results
    printf("f1(%e) = %.10f\n", x, result1);
    printf("f2(%e) = %.10f\n", x, result2);
    printf("Difference (f1 - f2) = %.10e\n", difference);
    printf("Final result after magnification: %.10f\n", final_result);

    return 0;
}
```
Compile the code on both computers, using the C++ flag `-frounding-math`.

You should use this flag when your program dynamically changes the floating-point rounding mode or needs to run correctly under different rounding modes. In this example, it results in a predictable rounding mode in function `f1` across x86 and Arm.

```bash
g++ -o error-propagation-min error-propagation-min.cpp -frounding-math
```

Running the new binary on both systems shows that function `f1` now produces a value close to that of `f2`. Furthermore, the difference is now identical across both Arm and x86.

Here is the output on both systems:

```output
./error-propagation-min
f1(1.000000e-08) = 0.0000000050
f2(1.000000e-08) = 0.0000000050
Difference (f1 - f2) = -1.7887354748e-17
Final result after magnification: 0.0000999982
```

G++ provides several compiler flags to help balance accuracy and performance, such as `-ffp-contract`, which controls whether lossy, fused operations such as fused multiply-add are used.

Another example is `-ffloat-store`, which prevents floating point variables from being kept in registers that can have different levels of precision and rounding.
You can refer to the compiler documentation for more information about the available flags.

content/learning-paths/cross-platform/integer-vs-floats/_index.md

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 ---
-title: Learn about Integer and floating-point conversions
+title: Learn about integer and floating-point conversions
 
 minutes_to_complete: 30
 