Skip to content

Commit c56f2c6

Browse files
committed
added readme; added rules for approximate floating point optimizations
1 parent fc64331 commit c56f2c6

File tree

6 files changed

+468
-28
lines changed

6 files changed

+468
-28
lines changed

Numeric/FastMath.hs

Lines changed: 13 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -1,25 +1,16 @@
1-
-- | Compile-time optimisations for 'Float' and 'Double' that break IEEE-754
2-
-- compatibility.
1+
-- | This module loads all rewrite rules. Unless you know that some rules
2+
-- will be unsafe for your application, this is the module you should load.
33
--
4-
-- Namely, this otherwise empty module contains RULES that rewrite @x-x@,
5-
-- @x*0@ and @0*x@ to @0@, which is incorrect (according to IEEE-754) when
6-
-- @x@ is @NaN@.
7-
--
8-
-- At the time of writing, @base-4.3.1.0:GHC/Base.lhs@ erroneously includes
9-
-- these rules for 'Float's, but not for 'Double's. This has been reported
10-
-- as GHC bug #5178: <http://hackage.haskell.org/trac/ghc/ticket/5178>.
11-
12-
module Numeric.FastMath () where
13-
14-
import GHC.Exts
15-
16-
{-# RULES
17-
"minusFloat x x" forall x. minusFloat# x x = 0.0#
18-
"timesFloat x 0" forall x. timesFloat# x 0.0# = 0.0#
19-
"timesFloat 0 x" forall x. timesFloat# 0.0# x = 0.0#
4+
-- The best way to figure out what optimizations these modules do is by
5+
-- looking at the source code. RULES pragmas are surprisingly readable.
206

21-
"minusDouble x x" forall x. (-##) x x = 0.0##
22-
"timesDouble 0 x" forall x. (*##) 0.0## x = 0.0##
23-
"timesDouble x 0" forall x. (*##) x 0.0## = 0.0##
24-
#-}
7+
module Numeric.FastMath
8+
( module Numeric.FastMath.Approximation
9+
, module Numeric.FastMath.NaN
10+
, module Numeric.FastMath.Infinitesimal
11+
)
12+
where
2513

14+
import Numeric.FastMath.Approximation ()
15+
import Numeric.FastMath.NaN ()
16+
import Numeric.FastMath.Infinitesimal ()

Numeric/FastMath/Approximation.hs

Lines changed: 253 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,253 @@
1+
-- | This module contains rewrite rules that may change the lowest order bits
2+
-- of a computation. They take advantage of:
3+
--
4+
-- * distributivity
5+
--
6+
-- * repeated addition/multiplication
7+
--
8+
-- * exponentiation rules
9+
--
10+
-- All of these RULES should be safe in the presence of `NaN` and `Infinity`
11+
--
12+
module Numeric.FastMath.Approximation
13+
where
14+
15+
import GHC.Exts
16+
import Prelude
17+
18+
---------------------------------------
19+
-- distributivity
20+
--
21+
-- NOTE: these rules are sufficient to capture the property
22+
--
23+
-- > x*y1+x*y2+x*y3 == x*(y1+y2+y3)
24+
--
25+
-- because they will be applied recursively during the optimization passes
26+
27+
{-# RULES
28+
29+
"double *,+ distribute A" forall x y1 y2. (x *## y1) +## (x *## y2)
30+
= x *## (y1 +## y2)
31+
32+
"double *,+ distribute B" forall x y1 y2. (y1 *## x) +## (x *## y2)
33+
= x *## (y1 +## y2)
34+
35+
"double *,+ distribute C" forall x y1 y2. (y1 *## x) +## (y2 *## x)
36+
= x *## (y1 +## y2)
37+
38+
"double *,+ distribute D" forall x y1 y2. (x *## y1) +## (y2 *## x)
39+
= x *## (y1 +## y2)
40+
41+
42+
43+
"double *,- distribute A" forall x y1 y2. (x *## y1) -## (x *## y2)
44+
= x *## (y1 -## y2)
45+
46+
"double *,- distribute B" forall x y1 y2. (y1 *## x) -## (x *## y2)
47+
= x *## (y1 -## y2)
48+
49+
"double *,- distribute C" forall x y1 y2. (y1 *## x) -## (y2 *## x)
50+
= x *## (y1 -## y2)
51+
52+
"double *,- distribute D" forall x y1 y2. (x *## y1) -## (y2 *## x)
53+
= x *## (y1 -## y2)
54+
55+
56+
57+
"double /,+ distribute" forall x y1 y2. (y1 *## x) +## (y2 *## x)
58+
= (y1 +## y2) /## x
59+
60+
"double /,- distribute" forall x y1 y2. (y1 /## x) -## (y2 /## x)
61+
= (y1 -## y2) /## x
62+
63+
#-}
64+
65+
66+
67+
{-# RULES
68+
69+
"float *,+ distribute A" forall x y1 y2. (x `timesFloat#` y1) `plusFloat#` (x `timesFloat#` y2)
70+
= x `timesFloat#` (y1 `plusFloat#` y2)
71+
72+
"float *,+ distribute B" forall x y1 y2. (y1 `timesFloat#` x) `plusFloat#` (x `timesFloat#` y2)
73+
= x `timesFloat#` (y1 `plusFloat#` y2)
74+
75+
"float *,+ distribute C" forall x y1 y2. (y1 `timesFloat#` x) `plusFloat#` (y2 `timesFloat#` x)
76+
= x `timesFloat#` (y1 `plusFloat#` y2)
77+
78+
"float *,+ distribute D" forall x y1 y2. (x `timesFloat#` y1) `plusFloat#` (y2 `timesFloat#` x)
79+
= x `timesFloat#` (y1 `plusFloat#` y2)
80+
81+
82+
83+
"float *,- distribute A" forall x y1 y2. (x `timesFloat#` y1) `minusFloat#` (x `timesFloat#` y2)
84+
= x `timesFloat#` (y1 `minusFloat#` y2)
85+
86+
"float *,- distribute B" forall x y1 y2. (y1 `timesFloat#` x) `minusFloat#` (x `timesFloat#` y2)
87+
= x `timesFloat#` (y1 `minusFloat#` y2)
88+
89+
"float *,- distribute C" forall x y1 y2. (y1 `timesFloat#` x) `minusFloat#` (y2 `timesFloat#` x)
90+
= x `timesFloat#` (y1 `minusFloat#` y2)
91+
92+
"float *,- distribute D" forall x y1 y2. (x `timesFloat#` y1) `minusFloat#` (y2 `timesFloat#` x)
93+
= x `timesFloat#` (y1 `minusFloat#` y2)
94+
95+
96+
97+
"float /,+ distribute" forall x y1 y2. (y1 `timesFloat#` x) `plusFloat#` (y2 `timesFloat#` x)
98+
= (y1 `plusFloat#` y2) `divideFloat#` x
99+
100+
"float /,- distribute" forall x y1 y2. (y1 `divideFloat#` x) `minusFloat#` (y2 `divideFloat#` x)
101+
= (y1 `minusFloat#` y2) `divideFloat#` x
102+
103+
#-}
104+
105+
---------------------------------------
106+
-- fancy distributing
107+
--
108+
-- NOTE: I'm not yet sure if all of these are a great idea to have on by
109+
-- default due to stability issues...
110+
111+
{-# RULES
112+
113+
"double **,* distribute" forall x y1 y2. (y1 **## x) *## (y2 **## x) = (y1 *## y2) *## x
114+
115+
"double **,log distribute" forall x y. logDouble# (x **## y) = y *## (logDouble# x)
116+
117+
#-}
118+
119+
---------------------------------------
120+
-- Repeated addition
121+
--
122+
-- NOTE: It is important that these rules should fire after the distributivity
123+
-- rules. This ensures that
124+
--
125+
-- > x*x+x*y
126+
--
127+
-- gets simplified to
128+
--
129+
-- > x*(x+y)
130+
--
131+
-- rather than
132+
--
133+
-- > x+x+x*y
134+
--
135+
{-# RULES
136+
137+
"double mulToAdd 2" [0] forall x . x *## 2.0## = x +## x
138+
"double mulToAdd 3" [0] forall x . x *## 3.0## = x +## x +## x
139+
"double mulToAdd 4" [0] forall x . x *## 4.0## = x +## x +## x +## x
140+
141+
#-}
142+
143+
{-# RULES
144+
145+
"float mulToAdd 2" [0] forall x . timesFloat# x 2.0# = plusFloat# x x
146+
"float mulToAdd 3" [0] forall x . timesFloat# x 3.0# = plusFloat# x (plusFloat# x x)
147+
"float mulToAdd 4" [0] forall x . timesFloat# x 4.0# = plusFloat# x (plusFloat# x (plusFloat# x x))
148+
149+
#-}
150+
151+
---------------------------------------
152+
-- left associate / commute
153+
154+
-- NOTE: phase controls are needed to prevent infinite loops when interacting
155+
-- with the repeated multiplication rules.
156+
--
157+
-- We should slightly prefer commuting rather than associating because it doesn't
158+
-- change the floating point results
159+
160+
{-# RULES
161+
162+
"double commute left *" [~2] forall x1 x2 x3. (*##) x1 ((*##) x2 x3) = (*##) ((*##) x2 x3) x1
163+
"double associate left *" [~1] forall x1 x2 x3. (*##) x1 ((*##) x2 x3) = (*##) ((*##) x1 x2) x3
164+
165+
"double commute left +" [~2] forall x1 x2 x3. (+##) x1 ((+##) x2 x3) = (+##) ((+##) x2 x3) x1
166+
"double associate left +" [~1] forall x1 x2 x3. (+##) x1 ((+##) x2 x3) = (+##) ((+##) x1 x2) x3
167+
168+
#-}
169+
170+
{-# RULES
171+
172+
"float commute left *" [~2] forall x1 x2 x3. timesFloat# x1 (timesFloat# x2 x3) = timesFloat# (timesFloat# x2 x3) x1
173+
"float associate left *" [~1] forall x1 x2 x3. timesFloat# x1 (timesFloat# x2 x3) = timesFloat# (timesFloat# x1 x2) x3
174+
175+
"float commute left +" [~2] forall x1 x2 x3. plusFloat# x1 (plusFloat# x2 x3) = plusFloat# (plusFloat# x2 x3) x1
176+
"float associate left +" [~1] forall x1 x2 x3. plusFloat# x1 (plusFloat# x2 x3) = plusFloat# (plusFloat# x1 x2) x3
177+
178+
#-}
179+
180+
---------------------------------------
181+
-- Repeated multiplication
182+
183+
-- FIXME: I can't get thise rules to work for more than 4 repeats without
184+
-- causing an infinite loop in the simplifier
185+
186+
{-# RULES
187+
188+
"double repmul 4" [1] forall x . ((x *## x) *## x) *## x
189+
= let xx = (x *## x) in (xx *## xx)
190+
191+
#-}
192+
193+
-- "double repmul 5" forall x . x *## x *## x *## x *## x
194+
-- = let xx = x *## x in xx *## xx *## x
195+
--
196+
-- "double repmul 6" forall x . x *## x *## x *## x *## x *## x
197+
-- = let xx = x *## x in xx *## xx *## xx
198+
--
199+
-- "double repmul 7" forall x . x *## x *## x *## x *## x *## x *## x
200+
-- = let xx = x *## x in xx *## xx *## xx *## x
201+
--
202+
-- "double repmul 8" forall x . x *## x *## x *## x *## x *## x *## x *## x
203+
-- = let xxx = (let xx = x *## x in xx *## xx) in xxx *## xxx
204+
205+
{-# RULES
206+
207+
"double repmul 4" forall x . timesFloat# x (timesFloat# x (timesFloat# x x))
208+
= let xx = timesFloat# x x in timesFloat# xx xx
209+
210+
#-}
211+
212+
-- "double repmul 5" forall x . timesFloat# x (timesFloat# x (timesFloat# x (timesFloat# x x)))
213+
-- = let xx = timesFloat# x x in timesFloat# x (timesFloat# xx xx)
214+
--
215+
-- "double repmul 6" forall x . timesFloat# x (timesFloat# x (timesFloat# x (timesFloat# x (timesFloat# x x))))
216+
-- = let xx = timesFloat# x x in timesFloat# xx (timesFloat# xx xx)
217+
--
218+
-- "double repmul 7" forall x . timesFloat# x (timesFloat# x (timesFloat# x (timesFloat# x (timesFloat# x (timesFloat# x x)))))
219+
-- = let xx = timesFloat# x x in timesFloat# x (timesFloat# xx (timesFloat# xx xx))
220+
--
221+
-- "double repmul 8" forall x . timesFloat# x (timesFloat# x (timesFloat# x (timesFloat# x (timesFloat# x (timesFloat# x (timesFloat# x x))))))
222+
-- = let xxx = (let xx = timesFloat# x x in timesFloat# xx xx) in timesFloat# xxx xxx
223+
224+
225+
---------------------------------------
226+
-- Exponentiation
227+
228+
{-# RULES
229+
"double **0" forall x . x **## 0.0## = 1.0##
230+
"double **1" forall x . x **## 1.0## = x
231+
"double **2" forall x . x **## 2.0## = x *## x
232+
"double **3" forall x . x **## 3.0## = x *## x *## x
233+
"double **4" forall x . x **## 4.0## = let xx = x *## x in xx *## xx
234+
"double **8" forall x . x **## 8.0## = let xxx = (let xx = x *## x in xx *## xx) in xxx *## xxx
235+
236+
"double **(1/2)" forall x## . x## **## 0.500## = sqrtDouble# x##
237+
"double **(1/4)" forall x## . x## **## 0.250## = sqrtDouble# (sqrtDouble# x##)
238+
"double **(1/8)" forall x## . x## **## 0.125## = sqrtDouble# (sqrtDouble# (sqrtDouble# x##))
239+
#-}
240+
241+
{-# RULES
242+
"float **0" forall x# . powerFloat# x# 0.0# = 1.0#
243+
"float **1" forall x# . powerFloat# x# 1.0# = x#
244+
"float **2" forall x# . powerFloat# x# 2.0# = timesFloat# x# x#
245+
"float **3" forall x# . powerFloat# x# 3.0# = timesFloat# (timesFloat# x# x#) x#
246+
"float **4" forall x# . powerFloat# x# 4.0# = let xx# = (timesFloat# x# x#) in timesFloat# xx# xx#
247+
"float **8" forall x# . powerFloat# x# 8.0# = let xxx# = (let xx# = (timesFloat# x# x#) in timesFloat# xx# xx#) in timesFloat# xxx# xxx#
248+
249+
"float **(1/2)" forall x# . powerFloat# x# 0.500# = sqrtFloat# x#
250+
"float **(1/4)" forall x# . powerFloat# x# 0.250# = sqrtFloat# (sqrtFloat# x#)
251+
"float **(1/8)" forall x# . powerFloat# x# 0.125# = sqrtFloat# (sqrtFloat# (sqrtFloat# x#))
252+
#-}
253+

Numeric/FastMath/NaN.hs

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,18 @@
1+
-- | This module contains rules that break the way NaN is handled for "Float"
2+
-- and "Double" types. Still, these rules should be safe in the vast majority of
3+
-- applications.
4+
module Numeric.FastMath.NaN
5+
where
6+
7+
import GHC.Exts
8+
9+
{-# RULES
10+
"minusFloat x x" forall x. minusFloat# x x = 0.0#
11+
"timesFloat x 0" forall x. timesFloat# x 0.0# = 0.0#
12+
"timesFloat 0 x" forall x. timesFloat# 0.0# x = 0.0#
13+
14+
"minusDouble x x" forall x. (-##) x x = 0.0##
15+
"timesDouble 0 x" forall x. (*##) 0.0## x = 0.0##
16+
"timesDouble x 0" forall x. (*##) x 0.0## = 0.0##
17+
#-}
18+

README.md

Lines changed: 82 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1 +1,82 @@
1-
<http://hackage.haskell.org/package/fast-math>
1+
# What is fast-math?
2+
3+
This package enables a number of "unsafe" floating point optimizations for GHC. For example, the distributive law:
4+
5+
```
6+
x*y + x*z == z*(y+z)
7+
```
8+
9+
does not hold for `Float` or `Double` types. The lowest order bits may be different due to rounding errors.Therefore, GHC (and most compilers for any language) will not perform this optimization by default. Instead, most compilers support special flags that enable these unsafe optimizations. See for example the [-ffast-math flag in the gcc documentation](https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html). GHC, however, has [no built in flags for these optimizations](http://www.haskell.org/ghc/docs/7.8.2/html/users_guide/flag-reference.html). But that's okay. GHC's `RULES` pragmas are sufficiently powerful to achieve most of the performance benefits of `-ffast-math`. This package provides those `RULES` pragmas.
10+
11+
### Enabling the optimizations
12+
13+
To enable these optimizations in your code, simply add the following line to the top of your source files:
14+
15+
```
16+
import Numeric.FastMath
17+
```
18+
19+
For most users, this is all you need to do. But some advanced code will depend on specific properties of the IEEE 754 standard, and so will want to enable only some of the optimizations. The module structure of `fast-math` makes this easy. Every module corresponds to a certain family of optimizations. Importing that module enables only those optimizations. For example, to enable optimizations that are unsafe only in the presence of `NaN` values, we would add the line:
20+
21+
```
22+
import Numeric.FastMath.NaN
23+
```
24+
25+
### How complete are the optimizations?
26+
27+
There are still some optimizations that gcc's `-ffast-math` flag supports that this library doesn't support.This is mostly due to limitations in the way `RULES` pragmas work. For example, [constant folding](https://en.wikipedia.org/wiki/Constant_folding) cannot be implemented with `RULES`. Instead, GHC implements this optimization as a special case in the file [compiler/prelude/PrelRules.lhs](https://github.com/ghc/ghc/blob/master/compiler/prelude/PrelRules.lhs).
28+
29+
Consider the code:
30+
31+
```
32+
test1 :: Double -> Double
33+
test1 d = d*10 + d*20
34+
```
35+
36+
GHC factors out `d`, then folds the constants, producing the core:
37+
38+
```
39+
test1 :: Double -> Double
40+
test1 = \ (d :: Double) ->
41+
case d of _ { D# x -> D# (*## x 30.0) }
42+
```
43+
44+
But if we make the code just a little more complicated:
45+
46+
```
47+
test2 = d1*d2 + (d3 + 5)*d1 + d1*32
48+
```
49+
50+
Then GHC distributes successfuly, but can't figure out how to fold the constants. It produces the core:
51+
52+
```
53+
test2 :: Double -> Double -> Double
54+
test2 = \ (d1 :: Double) (d2 :: Double) ->
55+
case d1 of _ { D# x ->
56+
case d2 of _ { D# y ->
57+
D# (*## x (+## (+## 10.0 y) 20.0))
58+
}
59+
}
60+
```
61+
62+
We could fix this problem if the `RULES` pragmas could identify constants instead of variables. This would let us commute/associate the constants to the left of all computations, then GHC's standard constant folding mechanism would work successfully.
63+
64+
**The best way to check what optimizations are actually supported is to look at the source code.** `RULES` pragmas are surprisingly readable.
65+
66+
### How does this interact with LLVM?
67+
68+
The LLVM backend can perform a number of these optimizations for us as well if we pass it the right flags. It does not perform all of them, however. (Possibly GHC's optimization passes remove the opportunity?) In any event, executables from the built-in code generator and llvm generator will both see speed improvements.
69+
70+
### How does this interact with SIMD instructions?
71+
72+
Currently, there is no support for GHC 7.8's SIMD instructions. This will hopefully appear in a future release.
73+
74+
### Installation
75+
76+
This package is [available on hackage](http://www.haskell.org/ghc/docs/7.8.2/html/users_guide/flag-reference.html), and can be easily installed with:
77+
78+
```
79+
cabal update
80+
cabal install fast-math
81+
```
82+

0 commit comments

Comments
 (0)