|
1 | 1 | # Numeric types |
2 | 2 |
|
3 | | -In Fable, we use F# numeric types, which are all translated to Python integers at the exception of `float`, `double`, and `decimal`. |
| 3 | +In Fable, we use F# numeric types, which are all translated to Python integers except for `float`, `double`, and |
| 4 | +`decimal`. |
4 | 5 |
|
5 | 6 | Fable numbers are very nearly compatible with .NET semantics, but translating into Python types has consequences: |
6 | 7 |
|
7 | | -* (non-standard) All floating point numbers are implemented as 64 bit (`double`). This makes `float32` numbers more accurate than expected. |
8 | | -* (non-standard) Arithmetic integers of 32 bits or less are implemented with different truncation from that expected, as whole numbers embedded within `double`. |
| 8 | +* (non-standard) All floating point numbers are implemented as 64 bit (`double`). This makes `float32` numbers more |
| 9 | + accurate than expected. |
| 10 | +* (non-standard) Arithmetic integers of 32 bits or less are implemented with different truncation from that expected, as |
| 11 | + whole numbers embedded within `double`. |
9 | 12 | * (OK) Conversions between types are correctly truncated. |
10 | 13 | * (OK) Bitwise operations for 64 bit and 32 bit integers are correct and truncated to the appropriate number of bits. |
11 | | -* (non-standard) Bitwise operations for 16 bit and 8 bit integers use the underlying JavaScript 32 bit bitwise semantics. Results are not truncated as expected, and shift operands are not masked to fit the data type. |
12 | | -* (OK) Longs have a custom implementation which is identical in semantics to .NET and truncates in 64 bits, although it is slower. |
| 14 | +* (non-standard) Bitwise operations for 16 bit and 8 bit integers use the underlying JavaScript 32 bit bitwise |
| 15 | + semantics. Results are not truncated as expected, and shift operands are not masked to fit the data type. |
| 16 | +* (OK) Longs have a custom implementation which is identical in semantics to .NET and truncates in 64 bits, although it |
| 17 | + is slower. |
13 | 18 |
|
14 | 19 | 32 bit integers thus differ from .NET in two ways: |
15 | 20 |
|
16 | | -* Underlying 52 bit precision, without expected truncation to 32 bits on overflow. Truncation can be forced if needed by `>>> 0`. |
17 | | -* On exceeding 52 bits absolute value floating point loses precision. So overflow will result in unexpected lower order 0 bits. |
| 21 | +* Underlying unlimited precision, without expected truncation to 32 bits on overflow. Truncation can be forced if needed |
| 22 | + by `>>> 0`. |
18 | 23 |
|
19 | 24 | The loss of precision can be seen in a single multiplication: |
20 | 25 |
|
21 | 26 | ```fsharp |
22 | 27 | ((1 <<< 28) + 1) * ((1 <<< 28) + 1) >>> 0 |
23 | 28 | ``` |
24 | 29 |
|
25 | | -The multiply product will have internal double representation rounded to `0x0100_0000_2000_0000`. When it is truncated to 32 bits by `>>> 0` the result will be `0x2000_0000` not the .NET exact lower order bits value of `0x2000_0001`. |
26 | | - |
27 | | -The same problem can be seen where repeated arithmetic operations make the internal (non-truncated) value large. For example a linear congruence random number generator: |
28 | | - |
29 | | -```fsharp |
30 | | -let rng (s:int32) = 10001*s + 12345 |
31 | | -``` |
32 | | - |
33 | | -The numbers generated by repeated application of this to its result will all be even after the 4th pseudo-random number, when `s` value exceeds 2^53: |
34 | | - |
35 | | -```fsharp |
36 | | -let rec randLst n s = |
37 | | - match n with |
38 | | - | 0 -> [s] |
39 | | - | n -> s :: randLst (n-1) (rng s) |
40 | | -
|
41 | | -List.iter (printfn "%x") (randLst 7 1) |
42 | | -``` |
43 | | - |
44 | | -The resulting printed list of pseudo-random numbers does not work in Fable: |
45 | | - |
46 | | -| Fable | .NET | |
47 | | -|-------:|------:| |
48 | | -|1|1| |
49 | | -|574a|574a |
50 | | -|d524223|d524223| |
51 | | -|6a89e98c|6a89e98c| |
52 | | -|15bd0684|15bd0685| |
53 | | -|3d8b8000|3d8be20e| |
54 | | -|50000000|65ba5527| |
55 | | -|0|2458c8d0| |
| 30 | +The multiply product will have internal double representation rounded to `0x0100_0000_2000_0000`. When it is truncated |
| 31 | +to 32 bits by `>>> 0` the result will be `0x2000_0000` not the .NET exact lower order bits value of `0x2000_0001`. |
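The rounding described above can be checked outside Fable. The following is a minimal plain-Python sketch, not Fable output; the names `MASK32`, `low32`, `a`, `exact`, and `as_double` are illustrative assumptions. It compares the exact integer product with the same product computed in 64 bit floating point, then keeps the low 32 bits of each.

```python
MASK32 = 0xFFFFFFFF  # bit mask that keeps only the low 32 bits


def low32(n: int) -> int:
    """Return the low-order 32 bits of an integer."""
    return n & MASK32


a = (1 << 28) + 1

# Exact product using Python's arbitrary-precision integers.
exact = a * a

# Same product computed in 64 bit floating point, then converted back.
as_double = int(float(a) * float(a))

print(hex(exact))             # 0x100000020000001
print(hex(low32(exact)))      # 0x20000001 (exact low-order bits)
print(hex(low32(as_double)))  # 0x20000000 (lowest bit lost to rounding)
```

The exact product keeps its lowest bit, while the floating point product has already lost it before any truncation is applied.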
56 | 32 |
|
57 | 33 | ## Workarounds |
58 | 34 |
|
59 | | -* When accurate low-order bit arithmetic is needed and overflow can result in numbers larger than 2^53 use `int64`, `uint64`, which use exact 64 bits, instead of `int32`, `uint32`. |
60 | | -* Alternately, truncate all arithmetic with `>>> 0` or `>>> 0u` as appropriate before numbers can get larger than 2^53: `let rng (s:int32) = 10001*s + 12345 >>> 0` |
| 35 | +* When accurate low-order bit arithmetic is needed and overflow can result in numbers larger than 2^53 use `int64`, |
| 36 | + `uint64`, which use exact 64 bits, instead of `int32`, `uint32`. |
| 37 | +* Alternately, truncate all arithmetic with `>>> 0` or `>>> 0u` as appropriate before numbers can get larger than 2^53: |
| 38 | + `let rng (s:int32) = 10001*s + 12345 >>> 0` |
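The truncation workaround can be illustrated with a small plain-Python sketch. It is not Fable-generated code, and it assumes that masking with `& 0xFFFFFFFF` is an acceptable stand-in for the `>>> 0` truncation named above; `MASK32` and `rand_list` are hypothetical helper names. Masking after every step keeps each intermediate value within 32 bits.

```python
MASK32 = 0xFFFFFFFF  # bit mask that keeps only the low 32 bits


def rng(s: int) -> int:
    """One step of the linear congruential generator, truncated to 32 bits."""
    return (10001 * s + 12345) & MASK32


def rand_list(n: int, s: int) -> list[int]:
    """Return the seed followed by n successive generator outputs."""
    out = [s]
    for _ in range(n):
        s = rng(s)
        out.append(s)
    return out


for value in rand_list(4, 1):
    print(format(value, "x"))
```

Because the mask is applied on every step, no intermediate value grows beyond 32 bits, so the low-order bits stay exact.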
61 | 39 |
|
62 | 40 | ## Printing |
63 | 41 |
|
64 | | -One small change from .NET in `printf`, `sprintf`, `ToString`. Negative signed integers are printed in hexadecimal format as sign + magnitude, in .NET they are printed as two's complement bit patterns. |
|  | 42 | +One small difference from .NET affects `printf`, `sprintf`, and `ToString`: negative signed integers are printed in |
|  | 43 | +hexadecimal format as sign + magnitude, whereas .NET prints them as two's complement bit patterns. |