|
| 1 | +# Work-in-progress: Lazy Evaluation in Vortex |
| 2 | + |
| 3 | +This guide intends to provide an overview of the in-flight and upcoming changes to Vortex to enable |
| 4 | +fully lazy evaluation of Vortex arrays. |
| 5 | + |
| 6 | +Hopefully this document helps users and contributors understand the design decisions and plan around |
| 7 | +the upcoming breaking API changes required to implement this feature. |
| 8 | + |
| 9 | +The motivation for this work comes in many parts, including: |
| 10 | + |
| 11 | +* Support for alternate execution models such as GPU, pipelined CPU, or JIT-compiled CPU. |
| 12 | +* Improved scan performance with common-subtree elimination. |
| 13 | +* Improved visibility into the optimizations that Vortex applies by making the computation graph explicit. |
| 14 | +* Easier to benchmark and improvement performance of individual compute functions by isolating them from |
| 15 | + lazy decompression logic. |
| 16 | +* Easier to extend Vortex with new compute functions, such as geo-spatial functionality. |
| 17 | +* Simpler to implement custom arrays and layouts by reducing the API surface area. |
| 18 | +* Enabling more advanced statistics and pruning such as using bloom filters and free-text indexes. |
| 19 | + |
| 20 | +## Summary of Changes |
| 21 | + |
| 22 | +* Define `vortex-vector` as a fully decompressed in-memory format used for CPU computation. |
| 23 | +* Vortex `Array` to represent a logical decompression plan. |
| 24 | +* Introduce `ScalarFn` to define semantics and implementation of scalar compute over Vortex vectors. |
| 25 | +* Make `Expression` a non-pluggable closed enum. Plugins will implement `ScalarFn` instead. |
| 26 | + * Note this avoids the current situation we're in where all arrays need to know about all compute functions. |
| 27 | +* Introduce `ScalarFnArray` to represent lazy application of a `ScalarFn` over one or more Vortex arrays. |
| 28 | + * Existing compute function dispatch is re-implemented as Array optimization rules. |
| 29 | +* Redesign the `Layout` API to use simpler optimization rules instead of complex expression partitioning. |
| 30 | +* Implement statistics falsification as optimizer rules over expressions. |
| 31 | + * e.g. `falsify(a > 10)` becomes `stat.max(a) <= 10`. |
| 32 | + * This also enables custom falsification expressions such as bloom filter checks. |
0 commit comments