Emulate Float64 #520
ggkountouras started this conversation in Ideas
Replies: 1 comment 3 replies
-
I think this would be useful to have, but ideally as part of a vendor-neutral package (a la DoubleFloats.jl -- maybe something exists already).
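For readers unfamiliar with the double-word approach that packages like DoubleFloats.jl build on, here is a minimal sketch in Python of its core building block, the error-free TwoSum transformation, applied to Float32 values. This is an illustration only, not code from DoubleFloats.jl: the helper names `f32`, `two_sum` and `exact` are hypothetical, and Float32 arithmetic is simulated by rounding Python floats through `struct`. Note that a (hi, lo) pair of Float32 values gives extra precision but not the Float64 exponent range, so it is not a drop-in IEEE-754 binary64.

```python
import struct
from fractions import Fraction


def f32(x: float) -> float:
    """Round a Python float to the nearest Float32 value (assumed helper)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def two_sum(a: float, b: float) -> tuple[float, float]:
    """Knuth's TwoSum on Float32 inputs.

    Returns (s, err) where s is the rounded Float32 sum and err is the
    rounding error, so that s + err equals a + b exactly (barring overflow).
    Every intermediate result is rounded to Float32 to mimic hardware that
    only has single precision.
    """
    s = f32(a + b)
    bb = f32(s - a)                      # the part of b that made it into s
    err = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, err


def exact(*xs: float) -> Fraction:
    """Exact rational sum, used only to check the result."""
    return sum((Fraction(x) for x in xs), Fraction(0))


a, b = f32(0.1), f32(0.2)
hi, lo = two_sum(a, b)
# The pair (hi, lo) carries the sum without any rounding loss.
print(exact(hi, lo) == exact(a, b))
```

A full double-word type would pair this with a similar error-free product and a renormalization step after each operation.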
-
1) Make it work
Using the theory from SoftFloat (https://github.com/ucb-bar/berkeley-softfloat-3) and the partially finished libMetalFloat64 (https://github.com/philipturner/metal-float64), implement a proof-of-concept version. At this stage, it is okay to have low throughput compared to native Float32.
2) Make it right
Implement rounding modes. Add atomics. Ensure IEEE-754 compliance with tests.
3) Make it fast
Add an option to drop strict IEEE-754 compliance (remove denormals, don't check for Inf/NaN). Add vectorization. Inline at a higher level. Implement fused multiply-add.
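To make stage 1 and the "fast" option from stage 3 a bit more concrete, here is a minimal sketch in Python of a software Float64 multiply that works on raw bit patterns using integer arithmetic only. It is not taken from SoftFloat or libMetalFloat64; the names `soft_mul_bits` and `soft_mul` are hypothetical. The sketch assumes round-to-nearest-even only, treats zero and denormal inputs as zero, flushes underflowing results to zero, and does not check for Inf/NaN inputs. Python's arbitrary-precision integers also hide the wide product that a GPU kernel would have to assemble from 32-bit limbs.

```python
import struct

FRAC_BITS = 52                      # explicit fraction bits in binary64
EXP_BIAS = 1023
EXP_MASK = 0x7FF
FRAC_MASK = (1 << FRAC_BITS) - 1
IMPLICIT_ONE = 1 << FRAC_BITS


def to_bits(x: float) -> int:
    """Reinterpret a Float64 as its 64-bit pattern."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def from_bits(b: int) -> float:
    """Reinterpret a 64-bit pattern as a Float64."""
    return struct.unpack("<d", struct.pack("<Q", b))[0]


def soft_mul_bits(a: int, b: int) -> int:
    """Multiply two binary64 bit patterns (fast mode, see assumptions above)."""
    sign = (a ^ b) >> 63
    ea = (a >> FRAC_BITS) & EXP_MASK
    eb = (b >> FRAC_BITS) & EXP_MASK
    if ea == 0 or eb == 0:
        return sign << 63           # zero or denormal input: return signed zero

    # 53-bit significands with the implicit leading one restored.
    ma = (a & FRAC_MASK) | IMPLICIT_ONE
    mb = (b & FRAC_MASK) | IMPLICIT_ONE
    prod = ma * mb                  # lies in [2^104, 2^106)
    exp = ea + eb - EXP_BIAS

    # Normalize so the kept significand lies in [2^52, 2^53).
    if prod >> 105:
        shift = 53
        exp += 1
    else:
        shift = 52
    mant = prod >> shift
    rem = prod & ((1 << shift) - 1)
    half = 1 << (shift - 1)

    # Round to nearest, ties to even.
    if rem > half or (rem == half and (mant & 1)):
        mant += 1
        if mant >> 53:              # rounding carried out of the significand
            mant >>= 1
            exp += 1

    if exp <= 0:
        return sign << 63           # underflow: flush to zero (not IEEE-754)
    if exp >= EXP_MASK:
        return (sign << 63) | (EXP_MASK << FRAC_BITS)   # overflow to infinity
    return (sign << 63) | (exp << FRAC_BITS) | (mant & FRAC_MASK)


def soft_mul(x: float, y: float) -> float:
    """Convenience wrapper taking and returning Python floats."""
    return from_bits(soft_mul_bits(to_bits(x), to_bits(y)))


# Compare against the native multiply for a normal-range input pair.
print(soft_mul(0.1, 0.2) == 0.1 * 0.2)
```

Stage 2 would replace the two flush-to-zero branches with proper denormal handling, add Inf/NaN propagation, and make the rounding step selectable per rounding mode.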