Add support for dictionaries shared across multiple columns #5691
Replies: 1 comment
Hi @aditanase! Thanks for creating this issue! Issue #2657 tracks (a portion of) our string wishlist. In the single-column case, what you've described above is implemented as the DictLayout (see this folder in vortex-layout). The dictionary layout has two child layouts: values and codes. The values child is the dictionary, and the codes child holds indices into it. The codes can be (and, in the default btrblocks-style compressor, are) stored as a ChunkedLayout, which permits either streamed or partitioned reading of the codes separately from the values.
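To make the values/codes split concrete, here is a minimal Python sketch of single-column dictionary encoding with the codes cut into chunks. It is only an illustration of the idea, not Vortex's Rust implementation: `dict_encode`, `chunk_codes`, and `decode_chunk` are hypothetical names, and the chunk size of 4 is arbitrary.

```python
from typing import Dict, List, Tuple


def dict_encode(column: List[str]) -> Tuple[List[str], List[int]]:
    """Split a column into (values, codes).

    values holds each distinct string once, in first-seen order;
    codes[i] is the index of column[i] within values.
    """
    index: Dict[str, int] = {}
    values: List[str] = []
    codes: List[int] = []
    for item in column:
        if item not in index:
            index[item] = len(values)
            values.append(item)
        codes.append(index[item])
    return values, codes


def chunk_codes(codes: List[int], chunk_size: int) -> List[List[int]]:
    """Cut the codes into fixed-size chunks (hypothetical stand-in for chunked storage)."""
    return [codes[i:i + chunk_size] for i in range(0, len(codes), chunk_size)]


def decode_chunk(values: List[str], code_chunk: List[int]) -> List[str]:
    """Decode one chunk of codes on its own, given the dictionary values."""
    return [values[code] for code in code_chunk]


column = ["red", "green", "red", "blue", "green", "red"]
values, codes = dict_encode(column)
code_chunks = chunk_codes(codes, chunk_size=4)

# Only the second chunk of codes is touched here; the first is never read.
second_chunk_rows = decode_chunk(values, code_chunks[1])
```

Because each chunk of codes decodes independently against the same values, a reader can stream or partition the codes without re-reading the dictionary.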
Yeah, this would be very cool! We are not currently working on that, though we're aware of the F3 paper [1].

The Vortex community is eager to welcome new open source contributors! I think the best way to get started is to propose a design. There's also now a Slack community you can join here.

The DictLayout is probably the best place to start. A MultiColumnDictLayout should look similar. Maybe it's exactly a DictLayout where the codes are required to be a StructLayout? That might require some kind of MergeLayout to stitch together the non-multi-column-dict columns with the multi-column-dict columns.

[1] For anyone else stumbling on this issue, the paper is: Zeng, et al., "F3: The Open-Source Data File Format for the Future" https://db.cs.cmu.edu/papers/2025/zeng-sigmod2025.pdf
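As a rough illustration of the "DictLayout whose codes are a struct" idea, the Python sketch below encodes several columns against one shared list of values and keeps one codes array per column. This is a conceptual model under my own assumptions, not a proposed Vortex API: `shared_dict_encode` and `decode_column` are hypothetical names, and a plain dict of column name to codes stands in for the struct of codes.

```python
from typing import Dict, List, Tuple


def shared_dict_encode(
    columns: Dict[str, List[str]],
) -> Tuple[List[str], Dict[str, List[int]]]:
    """Encode every column against a single shared dictionary.

    Returns (values, codes_by_column). values holds each distinct string
    once across all columns; codes_by_column[name][i] indexes into values.
    """
    index: Dict[str, int] = {}
    values: List[str] = []
    codes_by_column: Dict[str, List[int]] = {}
    for name, column in columns.items():
        column_codes: List[int] = []
        for item in column:
            if item not in index:
                index[item] = len(values)
                values.append(item)
            column_codes.append(index[item])
        codes_by_column[name] = column_codes
    return values, codes_by_column


def decode_column(
    values: List[str], codes_by_column: Dict[str, List[int]], name: str
) -> List[str]:
    """Decode a single column using the shared values."""
    return [values[code] for code in codes_by_column[name]]


columns = {
    "origin": ["NYC", "SFO", "NYC"],
    "destination": ["SFO", "NYC", "LAX"],
}
shared_values, struct_codes = shared_dict_encode(columns)
```

In this toy example both columns draw from the same three strings, so the shared dictionary stores each one once instead of once per column. Columns that do not participate in the shared dictionary would need to be stored separately and stitched back in, which is the role the hypothetical MergeLayout would play.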
One interesting feature from the F3 paper is dictionaries shared across a combination of columns.
The extreme version of this would be a single dictionary referenced by all the columns.
There is some related work in the C3 repo as well: https://github.com/cwida/C3
I am assuming this is something that the Vortex layout could accommodate. Any pointers on how to approach this with the current extensibility layers?