-
Notifications
You must be signed in to change notification settings - Fork 19
Description
The SoA backend, the branch is here: branch is moving towards supporting a wide variety of programs.
This backend currently factors out each field of a data constructor to its own region / separate buffer. This is contrary to the original Gibbon which uses one buffer for the complete data type. The SoA transformation is necessary for complex transformations like vectorization. It may also be important to use this representation for skipping over unused fields in a traversal. Since each field is in a separate buffer, this makes it easier to skip past unused fields. See this blog post I wrote that talks about detailed experiments I did with a list-based representation.
Current Status:
Gibbon1 backend is able to compile Gibbon SoA code. Currently moving towards supporting the Gibbon2 version which introduces redirection pointers, indirection pointers and shortcut pointers.
For gibbon1, all tests except the twitter benchmark compiles.
There are bugs which need to be fixed.
Layout bechmarks also need to be tested.
Near term TODO (Next 6 months)
-
A data type definition could contain other packed fields in the definition that are not self recursive.
For instance, consider a data type definition of a Tree that contains aList.data Tree = Node Int List Tree Tree | Leaf Int List. There are a few problems here, firstly,Listis not self recursive but it is still a packed type. Hence, there are two options for its representation, an AoS or an SoA. Since the implementation factors out all packed fields to SoA when we compile with the--SoAflag. It means that theListwill also be an SoA representation. Hence we need to make sure Inferlocations and other passes can handle this correctly. We also want to have the choice to compile theListas AoS vs SoA. This also needs to be supported. Update: We compile all packed types to SoA for now. -
The SoA optimization needs to be supported for the gibbon2 backend as well. This means we need support for indirection and redirection pointers that point to SoA regions.
-
Right now, the compiler provides a compile time flag
--SoAthat changes all packed types to SoA layouts. This is too restrictive. We may want some datatypes as AoS vs others as SoA. To have this choice, The programmer needs to have the flexibility to choose the representation. One way is to provide an Annotation in the top level haskell code, or have it in the type definition. The latter is nicer IMO. -
Performance optimization: The SoA backend for
ListwithO3is around 10 times slower than the AoS list. We should generate more efficient code. In particular, for the SoA backend, there is a big opportunity to remove dead code amongst others.
In addition, the cursor array is currently malloced in every recursive call. This is excessive and inefficient.
I need to add an additional temporary cursor per packed output type as a function argument to ensure that we don't waste time mallocing and making function calls.
Relatively long term TODO : (Next 1-3 years)
-
Since the Gibbon compiler cannot support data types where scalar fields are after Packed fields correctly. This also follows for the SoA backend.
Hence, a data type likedata Tree = Node Tree Tree Int | Leaf Intmay result in a compilation error. -
There is also a tradeoff between a fully factored representation vs a mix between AoS and SoA. We should provide the flexibility to choose something in between. We could also develop a heuristic to choose the best intermediate layout.
-
My Blog post showed that inplace updates can be important for more complex optimizations like vectorization and skipping over dead fields, we should add support for inplace updates.
-
The current functions are not tail recursive, we should have support for tail recursive functions. I was working on a branch a long time back but this needs to be thoroughly updated.
-
We also need the compiler to codegen for loops in the
Ccode for better vectorization, however, at the moment, there is no support for enabling for loops. Tail recursion is fine for simple data types like List but not for more complex data types. This requires new L2 IR and data flow analysis. My plan is to make a new PR where I introduce a new L2 Ir construct. Maybe a "map" like function? -
Right now we generate
C, but we may need a different backend for instanceC++to utilize optimizations / generate faster code.