added readme; added rules for approximate floating point optimizations

mikeizbicki · mikeizbicki · commit c56f2c618e08 · 2014-05-31T23:15:03.000-07:00
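
Reviewer note: the two rule families in this commit change results in different ways, and a small standalone program may make that easier to see. The sketch below uses no `RULES` at all; it spells out both sides of a distributivity rewrite and the two NaN rewrites so they can be compared under ordinary IEEE-754 arithmetic. The function names (`expanded`, `factored`, `selfDifference`, `timesZero`) are hypothetical and do not appear in the package.

```haskell
-- Plain Haskell, no RULES involved: each function spells out one side of a
-- rewrite from this commit so the two sides can be compared directly.

-- | x*y1 + x*y2, the form the distributivity rules match.
expanded :: Double -> Double -> Double -> Double
expanded x y1 y2 = x * y1 + x * y2

-- | x*(y1 + y2), the form the distributivity rules produce.
factored :: Double -> Double -> Double -> Double
factored x y1 y2 = x * (y1 + y2)

-- | x - x, which the NaN rules replace with the literal 0.
selfDifference :: Double -> Double
selfDifference x = x - x

-- | x * 0, which the NaN rules also replace with the literal 0.
timesZero :: Double -> Double
timesZero x = x * 0

main :: IO ()
main = do
  let x   = 0.1
      y1  = 0.2
      y2  = 0.3
      nan = 0 / 0 :: Double
  -- The two results may differ in their lowest-order bits.
  print (expanded x y1 y2)
  print (factored x y1 y2)
  print (expanded x y1 y2 == factored x y1 y2)
  -- Under IEEE-754 both of these should be NaN rather than 0.
  print (isNaN (selfDifference nan))
  print (isNaN (timesZero nan))
```

If the same program were compiled with optimization and `import Numeric.FastMath`, the rules could fire on the inlined arithmetic, so the printed values might then differ from an unoptimized build. That difference is the trade-off this package asks the user to accept.
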
diff --git a/Numeric/FastMath.hs b/Numeric/FastMath.hs
@@ -1,25 +1,16 @@
--- | Compile-time optimisations for 'Float' and 'Double' that break IEEE-754
--- compatibility.
+-- | This module loads all rewrite rules.  Unless you know that some rules
+-- will be unsafe for your application, this is the module you should load.
 --
--- Namely, this otherwise empty module contains RULES that rewrite @x-x@,
--- @x*0@ and @0*x@ to @0@, which is incorrect (according to IEEE-754) when
--- @x@ is @NaN@.
---
--- At the time of writing, @base-4.3.1.0:GHC/Base.lhs@ erroneously includes
--- these rules for 'Float's, but not for 'Double's. This has been reported
--- as GHC bug #5178: <http://hackage.haskell.org/trac/ghc/ticket/5178>.
-
-module Numeric.FastMath () where
-
-import GHC.Exts
-
-{-# RULES
-"minusFloat x x"    forall x. minusFloat#  x    x       = 0.0#
-"timesFloat x 0"    forall x. timesFloat#  x    0.0#    = 0.0#
-"timesFloat 0 x"    forall x. timesFloat#  0.0# x       = 0.0#
+-- The best way to figure out what optimizations these modules do is by 
+-- looking at the source code.  RULES pragmas are surprisingly readable.
 
-"minusDouble x x"   forall x. (-##) x       x       = 0.0##
-"timesDouble 0 x"   forall x. (*##) 0.0##   x       = 0.0##
-"timesDouble x 0"   forall x. (*##) x       0.0##   = 0.0##
-  #-}
+module Numeric.FastMath
+    ( module Numeric.FastMath.Approximation
+    , module Numeric.FastMath.NaN
+    , module Numeric.FastMath.Infinitesimal
+    )
+    where
 
+import Numeric.FastMath.Approximation ()
+import Numeric.FastMath.NaN ()
+import Numeric.FastMath.Infinitesimal ()
diff --git a/Numeric/FastMath/Approximation.hs b/Numeric/FastMath/Approximation.hs
@@ -0,0 +1,253 @@
+-- | This module contains rewrite rules that may change the lowest order bits
+-- of a computation.  They take advantage of:
+--
+-- * distributivity
+--
+-- * repeated addition/multiplication
+--
+-- * exponentiation rules 
+--
+-- All of these RULES should be safe in the presence of `NaN` and `Infinity`
+--
+module Numeric.FastMath.Approximation
+    where
+
+import GHC.Exts
+import Prelude
+
+---------------------------------------
+-- distributivity
+--
+-- NOTE: these rules are sufficient to capture the property
+--
+-- > x*y1+x*y2+x*y3 == x*(y1+y2+y3)
+--
+-- because they will be applied recursively during the optimization passes
+
+{-# RULES
+
+"double *,+ distribute A" forall x y1 y2. (x *## y1) +## (x *## y2) 
+    = x *## (y1 +## y2)
+
+"double *,+ distribute B" forall x y1 y2. (y1 *## x) +## (x *## y2) 
+    = x *## (y1 +## y2)
+
+"double *,+ distribute C" forall x y1 y2. (y1 *## x) +## (y2 *## x) 
+    = x *## (y1 +## y2)
+
+"double *,+ distribute D" forall x y1 y2. (x *## y1) +## (y2 *## x) 
+    = x *## (y1 +## y2)
+
+
+
+"double *,- distribute A" forall x y1 y2. (x *## y1) -## (x *## y2) 
+    = x *## (y1 -## y2)
+
+"double *,- distribute B" forall x y1 y2. (y1 *## x) -## (x *## y2) 
+    = x *## (y1 -## y2)
+
+"double *,- distribute C" forall x y1 y2. (y1 *## x) -## (y2 *## x) 
+    = x *## (y1 -## y2)
+
+"double *,- distribute D" forall x y1 y2. (x *## y1) -## (y2 *## x) 
+    = x *## (y1 -## y2)
+
+
+
+"double /,+ distribute" forall x y1 y2. (y1 /## x) +## (y2 /## x)
+    = (y1 +## y2) /## x
+
+"double /,- distribute" forall x y1 y2. (y1 /## x) -## (y2 /## x) 
+    = (y1 -## y2) /## x
+
+  #-}
+
+
+
+{-# RULES
+
+"float *,+ distribute A" forall x y1 y2. (x `timesFloat#` y1) `plusFloat#` (x `timesFloat#` y2) 
+    = x `timesFloat#` (y1 `plusFloat#` y2)
+
+"float *,+ distribute B" forall x y1 y2. (y1 `timesFloat#` x) `plusFloat#` (x `timesFloat#` y2) 
+    = x `timesFloat#` (y1 `plusFloat#` y2)
+
+"float *,+ distribute C" forall x y1 y2. (y1 `timesFloat#` x) `plusFloat#` (y2 `timesFloat#` x) 
+    = x `timesFloat#` (y1 `plusFloat#` y2)
+
+"float *,+ distribute D" forall x y1 y2. (x `timesFloat#` y1) `plusFloat#` (y2 `timesFloat#` x) 
+    = x `timesFloat#` (y1 `plusFloat#` y2)
+
+
+
+"float *,- distribute A" forall x y1 y2. (x `timesFloat#` y1) `minusFloat#` (x `timesFloat#` y2) 
+    = x `timesFloat#` (y1 `minusFloat#` y2)
+
+"float *,- distribute B" forall x y1 y2. (y1 `timesFloat#` x) `minusFloat#` (x `timesFloat#` y2) 
+    = x `timesFloat#` (y1 `minusFloat#` y2)
+
+"float *,- distribute C" forall x y1 y2. (y1 `timesFloat#` x) `minusFloat#` (y2 `timesFloat#` x) 
+    = x `timesFloat#` (y1 `minusFloat#` y2)
+
+"float *,- distribute D" forall x y1 y2. (x `timesFloat#` y1) `minusFloat#` (y2 `timesFloat#` x) 
+    = x `timesFloat#` (y1 `minusFloat#` y2)
+
+
+
+"float /,+ distribute" forall x y1 y2. (y1 `divideFloat#` x) `plusFloat#` (y2 `divideFloat#` x)
+    = (y1 `plusFloat#` y2) `divideFloat#` x
+
+"float /,- distribute" forall x y1 y2. (y1 `divideFloat#` x) `minusFloat#` (y2 `divideFloat#` x) 
+    = (y1 `minusFloat#` y2) `divideFloat#` x
+
+  #-}
+
+---------------------------------------
+-- fancy distributing
+--
+-- NOTE: I'm not yet sure if all of these are a great idea to have on by 
+-- default due to stability issues...
+
+{-# RULES
+
+"double **,* distribute" forall x y1 y2. (y1 **## x) *## (y2 **## x) = (y1 *## y2) **## x
+
+"double **,log distribute" forall x y. logDouble# (x **## y) = y *## (logDouble# x)
+
+  #-}
+
+---------------------------------------
+-- Repeated addition
+--
+-- NOTE: It is important that these rules should fire after the distributivity
+-- rules.  This ensures that
+--
+-- > x*2+x*y
+--
+-- gets simplified to
+--
+-- > x*(2+y)
+--
+-- rather than 
+--
+-- > x+x+x*y
+--
+{-# RULES 
+
+"double mulToAdd 2" [0] forall x . x *## 2.0## = x +## x
+"double mulToAdd 3" [0] forall x . x *## 3.0## = x +## x +## x
+"double mulToAdd 4" [0] forall x . x *## 4.0## = x +## x +## x +## x
+
+  #-}
+
+{-# RULES
+
+"float mulToAdd 2" [0] forall x . timesFloat# x 2.0# = plusFloat# x x
+"float mulToAdd 3" [0] forall x . timesFloat# x 3.0# = plusFloat# x (plusFloat# x x)
+"float mulToAdd 4" [0] forall x . timesFloat# x 4.0# = plusFloat# x (plusFloat# x (plusFloat# x x))
+
+  #-}
+
+---------------------------------------
+-- left associate / commute
+
+-- NOTE: phase controls are needed to prevent infinite loops when interacting 
+-- with the repeated multiplication rules.
+--
+-- We should slightly prefer commuting rather than associating because it doesn't 
+-- change the floating point results
+
+{-# RULES
+
+"double commute left *"   [~2] forall x1 x2 x3. (*##) x1 ((*##) x2 x3) = (*##) ((*##) x2 x3) x1
+"double associate left *" [~1] forall x1 x2 x3. (*##) x1 ((*##) x2 x3) = (*##) ((*##) x1 x2) x3
+
+"double commute left +"   [~2] forall x1 x2 x3. (+##) x1 ((+##) x2 x3) = (+##) ((+##) x2 x3) x1
+"double associate left +" [~1] forall x1 x2 x3. (+##) x1 ((+##) x2 x3) = (+##) ((+##) x1 x2) x3
+
+  #-}
+
+{-# RULES
+
+"float commute left *"   [~2] forall x1 x2 x3. timesFloat# x1 (timesFloat# x2 x3) = timesFloat# (timesFloat# x2 x3) x1
+"float associate left *" [~1] forall x1 x2 x3. timesFloat# x1 (timesFloat# x2 x3) = timesFloat# (timesFloat# x1 x2) x3
+
+"float commute left +"   [~2] forall x1 x2 x3. plusFloat# x1 (plusFloat# x2 x3) = plusFloat# (plusFloat# x2 x3) x1
+"float associate left +" [~1] forall x1 x2 x3. plusFloat# x1 (plusFloat# x2 x3) = plusFloat# (plusFloat# x1 x2) x3
+
+  #-}
+
+---------------------------------------
+-- Repeated multiplication
+
+-- FIXME: I can't get these rules to work for more than 4 repeats without
+-- causing an infinite loop in the simplifier
+
+{-# RULES
+
+"double repmul 4" [1] forall x . ((x *## x) *## x) *## x 
+    = let xx = (x *## x) in (xx *## xx)
+
+  #-}
+
+-- "double repmul 5" forall x . x *## x *## x *## x *## x 
+--     = let xx = x *## x in xx *## xx *## x
+-- 
+-- "double repmul 6" forall x . x *## x *## x *## x *## x *## x
+--     = let xx = x *## x in xx *## xx *## xx
+-- 
+-- "double repmul 7" forall x . x *## x *## x *## x *## x *## x *## x
+--     = let xx = x *## x in xx *## xx *## xx *## x
+-- 
+-- "double repmul 8" forall x . x *## x *## x *## x *## x *## x *## x *## x 
+--     = let xxx = (let xx = x *## x in xx *## xx) in xxx *## xxx
+
+{-# RULES
+
+"float repmul 4" forall x . timesFloat# x (timesFloat# x (timesFloat# x x))
+    = let xx = timesFloat# x x in timesFloat# xx xx
+
+  #-}
+
+-- "float repmul 5" forall x . timesFloat# x (timesFloat# x (timesFloat# x (timesFloat# x x)))
+--     = let xx = timesFloat# x x in timesFloat# x (timesFloat# xx xx)
+-- 
+-- "float repmul 6" forall x . timesFloat# x (timesFloat# x (timesFloat# x (timesFloat# x (timesFloat# x x))))
+--     = let xx = timesFloat# x x in timesFloat# xx (timesFloat# xx xx)
+-- 
+-- "float repmul 7" forall x . timesFloat# x (timesFloat# x (timesFloat# x (timesFloat# x (timesFloat# x (timesFloat# x x)))))
+--     = let xx = timesFloat# x x in timesFloat# x (timesFloat# xx (timesFloat# xx xx))
+-- 
+-- "float repmul 8" forall x . timesFloat# x (timesFloat# x (timesFloat# x (timesFloat# x (timesFloat# x (timesFloat# x (timesFloat# x x))))))
+--     = let xxx = (let xx = timesFloat# x x in timesFloat# xx xx) in timesFloat# xxx xxx
+
+
+---------------------------------------
+-- Exponentiation 
+
+{-# RULES 
+"double **0" forall x . x **## 0.0## = 1.0##
+"double **1" forall x . x **## 1.0## = x
+"double **2" forall x . x **## 2.0## = x *## x
+"double **3" forall x . x **## 3.0## = x *## x *## x
+"double **4" forall x . x **## 4.0## = let xx = x *## x in xx *## xx
+"double **8" forall x . x **## 8.0## = let xxx = (let xx = x *## x in xx *## xx) in xxx *## xxx
+
+"double **(1/2)" forall x## . x## **## 0.500## = sqrtDouble# x##
+"double **(1/4)" forall x## . x## **## 0.250## = sqrtDouble# (sqrtDouble# x##)
+"double **(1/8)" forall x## . x## **## 0.125## = sqrtDouble# (sqrtDouble# (sqrtDouble# x##))
+  #-}
+
+{-# RULES
+"float **0" forall x# . powerFloat# x# 0.0# = 1.0#
+"float **1" forall x# . powerFloat# x# 1.0# = x#
+"float **2" forall x# . powerFloat# x# 2.0# = timesFloat# x# x#
+"float **3" forall x# . powerFloat# x# 3.0# = timesFloat# (timesFloat# x# x#) x#
+"float **4" forall x# . powerFloat# x# 4.0# = let xx# = (timesFloat# x# x#) in timesFloat# xx# xx#
+"float **8" forall x# . powerFloat# x# 8.0# = let xxx# = (let xx# = (timesFloat# x# x#) in timesFloat# xx# xx#) in timesFloat# xxx# xxx#
+
+"float **(1/2)" forall x# . powerFloat# x# 0.500# = sqrtFloat# x#
+"float **(1/4)" forall x# . powerFloat# x# 0.250# = sqrtFloat# (sqrtFloat# x#)
+"float **(1/8)" forall x# . powerFloat# x# 0.125# = sqrtFloat# (sqrtFloat# (sqrtFloat# x#))
+  #-}
+
diff --git a/Numeric/FastMath/NaN.hs b/Numeric/FastMath/NaN.hs
@@ -0,0 +1,18 @@
+-- | This module contains rules that break the way NaN is handled for 'Float'
+-- and 'Double' types.  Still, these rules should be safe in the vast majority of
+-- applications.
+module Numeric.FastMath.NaN
+    where
+
+import GHC.Exts
+
+{-# RULES
+"minusFloat x x"    forall x. minusFloat#  x    x       = 0.0#
+"timesFloat x 0"    forall x. timesFloat#  x    0.0#    = 0.0#
+"timesFloat 0 x"    forall x. timesFloat#  0.0# x       = 0.0#
+
+"minusDouble x x"   forall x. (-##) x       x       = 0.0##
+"timesDouble 0 x"   forall x. (*##) 0.0##   x       = 0.0##
+"timesDouble x 0"   forall x. (*##) x       0.0##   = 0.0##
+  #-}
+
diff --git a/README.md b/README.md
@@ -1 +1,82 @@
-<http://hackage.haskell.org/package/fast-math>
+# What is fast-math?
+
+This package enables a number of "unsafe" floating point optimizations for GHC.  For example, the distributive law:
+
+```
+x*y + x*z == x*(y+z)
+```
+
+does not hold for `Float` or `Double` types.  The lowest order bits may be different due to rounding errors.  Therefore, GHC (and most compilers for any language) will not perform this optimization by default.  Instead, most compilers support special flags that enable these unsafe optimizations.  See for example the [-ffast-math flag in the gcc documentation](https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html).  GHC, however, has [no built in flags for these optimizations](http://www.haskell.org/ghc/docs/7.8.2/html/users_guide/flag-reference.html).  But that's okay.  GHC's `RULES` pragmas are sufficiently powerful to achieve most of the performance benefits of `-ffast-math`.  This package provides those `RULES` pragmas.
+
+### Enabling the optimizations
+
+To enable these optimizations in your code, simply add the following line to the top of your source files:
+
+```
+import Numeric.FastMath
+```
+
+For most users, this is all you need to do.  But some advanced code will depend on specific properties of the IEEE 754 standard, and so will want to enable only some of the optimizations.  The module structure of `fast-math` makes this easy.  Every module corresponds to a certain family of optimizations.  Importing that module enables only those optimizations.  For example, to enable optimizations that are unsafe only in the presence of `NaN` values, we would add the line:
+
+```
+import Numeric.FastMath.NaN
+```
+
+### How complete are the optimizations?
+
+There are still some optimizations that gcc's `-ffast-math` flag supports that this library doesn't support.  This is mostly due to limitations in the way `RULES` pragmas work.  For example, [constant folding](https://en.wikipedia.org/wiki/Constant_folding) cannot be implemented with `RULES`.  Instead, GHC implements this optimization as a special case in the file [compiler/prelude/PrelRules.lhs](https://github.com/ghc/ghc/blob/master/compiler/prelude/PrelRules.lhs).
+
+Consider the code:
+
+```
+test1 :: Double -> Double
+test1 d = d*10 + d*20
+```
+
+GHC factors out `d`, then folds the constants, producing the core:
+
+```
+test1 :: Double -> Double
+test1 = \ (d :: Double) ->
+    case d of _ { D# x -> D# (*## x 30.0) }
+```
+
+But if we make the code just a little more complicated:
+
+```
+test2 :: Double -> Double -> Double
+test2 d1 d2 = d1*10 + d1*d2 + d1*20
+```
+
+Then GHC distributes successfully, but can't figure out how to fold the constants.  It produces the core:
+
+```
+test2 :: Double -> Double -> Double
+test2 = \ (d1 :: Double) (d2 :: Double) ->
+    case d1 of _ { D# x ->
+    case d2 of _ { D# y ->
+    D# (*## x (+## (+## 10.0 y) 20.0))
+    }
+    }
+```
+
+We could fix this problem if the `RULES` pragmas could identify constants instead of variables.  This would let us commute/associate the constants to the left of all computations, then GHC's standard constant folding mechanism would work successfully.
+
+**The best way to check what optimizations are actually supported is to look at the source code.**  `RULES` pragmas are surprisingly readable.
+
+### How does this interact with LLVM?
+
+The LLVM backend can perform a number of these optimizations for us as well if we pass it the right flags.  It does not perform all of them, however.  (Possibly GHC's optimization passes remove the opportunity?)  In any event, executables from both the built-in code generator and the LLVM backend will see speed improvements.
+
+### How does this interact with SIMD instructions?
+
+Currently, there is no support for GHC 7.8's SIMD instructions.  This will hopefully appear in a future release.
+
+### Installation
+
+This package is [available on Hackage](http://hackage.haskell.org/package/fast-math), and can be easily installed with:
+
+```
+cabal update
+cabal install fast-math
+```
+
diff --git a/fast-math.cabal b/fast-math.cabal
diff --git a/test/test.hs b/test/test.hs