Skip to content

Performance of Generic-based instances.Β #200

@AndreasPK

Description

@AndreasPK

As perhaps obvious from my other recent tickets I've been looking at how Binary gets compiled as a matter of investigating GHC performance.

While looking at the resulting code I found that generic-based instances generally don't fully optimize away the overhead of generics.

In particular I've looked at slight variations of this code:

{-# LANGUAGE DeriveGeneric #-}

{-# OPTIONS_GHC #-}
module Main
  ( -- PathComponent(..)
   main
  ) where

import GHC.Generics
import Data.Typeable
import Data.Binary
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString as BS

data PathTemplateVariable =

       Var0
     | Var1
     | Var2
     | Var3
     | Var4
     | Var5
     | Var6
     | Var7
     | Var8
     | Var9
  deriving (Generic,Enum)

instance Binary PathTemplateVariable

main :: IO ()
main = do
  let lists = replicate 5000000 Var0
      lbs = encode lists
  print $ BS.length $ BS.toStrict lbs

I found that for an expression like this (and having split the deriving into it's own module):

foo :: PathTemplateVariable -> BL.ByteString
foo x = encode  x

It results in this kind of core on ghcs master branch currently:

Main.foo
  = \ (x_a8bn :: PathTemplateVariable) ->
      B.toLazyByteString
        (case x_a8bn of {
           Var0 ->
             case Derive.$fBinaryPathTemplateVariable79 `cast` <Co:2> :: .. of
             { Data.Binary.Put.PairS ds_a8q5 b_a8q6 ->
             b_a8q6
             };
           Var1 ->
             case Derive.$fBinaryPathTemplateVariable75 `cast` <Co:2> :: .. of
             { Data.Binary.Put.PairS ds_a8q5 b_a8q6 ->
             b_a8q6
             };
        ....

Which looks fine assuming the code in the Derive module we call is just the "put" method for each constructor. But sadly instead these methods all end up calling the generic put method (but at least with a statically computed generic representation of the individual constructor).

-- RHS size: {terms: 6, types: 98, coercions: 100, joins: 0/0}
Derive.$fBinaryPathTemplateVariable79 :: Put
[GblId,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Value=False, ConLike=False,
         WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 60 0}]
Derive.$fBinaryPathTemplateVariable79
  = binary-0.8.9.0-inplace:Data.Binary.Generic.$w$cgput
      @((C1 ('MetaCons "Var0" 'PrefixI 'False) U1
         :+: C1 ('MetaCons "Var1" 'PrefixI 'False) U1)
        :+: (C1 ('MetaCons "Var2" 'PrefixI 'False) U1
             :+: (C1 ('MetaCons "Var3" 'PrefixI 'False) U1
                  :+: C1 ('MetaCons "Var4" 'PrefixI 'False) U1)))
      @((C1 ('MetaCons "Var5" 'PrefixI 'False) U1
         :+: C1 ('MetaCons "Var6" 'PrefixI 'False) U1)
        :+: (C1 ('MetaCons "Var7" 'PrefixI 'False) U1
             :+: (C1 ('MetaCons "Var8" 'PrefixI 'False) U1
                  :+: C1 ('MetaCons "Var9" 'PrefixI 'False) U1)))
      (Derive.$fBinaryPathTemplateVariable35 `cast` <Co:50> :: ..)
      (Derive.$fBinaryPathTemplateVariable26 `cast` <Co:50> :: ..)
      5##64
      5##64
      @ghc-prim:GHC.Types.Any
      Derive.$fBinaryPathTemplateVariable80 -- This is the generic representation of Var0

For runtime performance the issue here is that $w$cgput doesn't get inlined. And indeed for a regular function it's rather large so it not being inlined is not unexpected. But we could force it to inline trivially by adding INLINE pragmas on the methods in inData.Binary.Generic.


And indeed I tried this and for encoding the example data type above via it's generic instance allocations at runtime went down by around a third and runtime similarly improved significantly (but I didn't take exact measurements for runtime).

This isn't, sadly, enough for complete elimination of overhead. The resulting partial specialized in pseudo code looks something like:

put con =
  let put_con x =
      case x of
        L x' -> case x' of
          L x'' -> putWord 0
          R x'' -> putWord 1
        R x' -> case x' of
          L x'' -> ...
  in case x of
    Var0 -> put_con generic_var0_rep
    Var1 -> put_con generic_var1_rep
    ...

If put_con would inline it would cancel out with generic_var0_rep, same if it where to get specialized by SpecConstr but since it's non-recursive this doesn't happen either.

Not that nothing about this is allocating so this is a good win over the current behaviour. But there are a lot of conditional branches taken in order to compute the encoding and the overhead is also larger with larger types.

Maybe this too can be fixed with with some well place INLINE pragmas or inline applications.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions