feat: add into_arrow to IntoCanonicalVTable (#1604)
Historically, we've gated the ability to go from Vortex -> Arrow arrays
behind the `Canonical` type, which picks one "blessed" Arrow encoding
for each of our DTypes.
Since the introduction of VarBinView in #1082, there are now two Vortex
string encodings, VarBin and VarBinView, that can each be converted
directly to Arrow.
What's more, FSSTArray internally uses a `VarBin` array to store the
FSST-compressed strings. Its `CompareFn` implementation delegates to a
comparison against those values. Because they are a `VarBin`, that
comparison takes the default `compare` codepath, which does
`into_canonical()?.into_arrow()?` and then runs the comparison in Arrow.
This is now slow, because `VarBin::into_canonical()` iterates over every
string to build a canonical `VarBinView`. That amounts to a full
decompression, which makes the comparison pushdown pointless.
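To show why this canonicalization is per-string work, here is a minimal sketch assuming a simplified `VarBin` (one byte buffer plus offsets) and a hypothetical `View` enum loosely modelled on Arrow's string-view layout: strings of up to 12 bytes are inlined, and longer ones keep a 4-byte prefix plus a buffer index and offset. These types are illustrative only and are not Vortex's actual `VarBin`/`VarBinView` definitions.

```rust
/// Hypothetical, simplified VarBin: string i occupies
/// bytes[offsets[i]..offsets[i + 1]].
struct VarBin {
    offsets: Vec<u32>,
    bytes: Vec<u8>,
}

/// Hypothetical view type loosely modelled on Arrow's string-view layout.
#[derive(Debug, PartialEq)]
enum View {
    /// Short strings (<= 12 bytes) are stored entirely inside the view.
    Inlined { len: u32, data: [u8; 12] },
    /// Longer strings keep a 4-byte prefix and point into a data buffer.
    Ref { len: u32, prefix: [u8; 4], buffer_index: u32, offset: u32 },
}

impl VarBin {
    fn len(&self) -> usize {
        self.offsets.len().saturating_sub(1)
    }

    /// Builds one view per string: this loop touches every element,
    /// which is the cost of canonicalizing VarBin into a view layout.
    fn to_views(&self) -> Vec<View> {
        let mut views = Vec::with_capacity(self.len());
        for i in 0..self.len() {
            let start = self.offsets[i] as usize;
            let end = self.offsets[i + 1] as usize;
            let s = &self.bytes[start..end];
            let len = s.len() as u32;
            if s.len() <= 12 {
                let mut data = [0u8; 12];
                data[..s.len()].copy_from_slice(s);
                views.push(View::Inlined { len, data });
            } else {
                let mut prefix = [0u8; 4];
                prefix.copy_from_slice(&s[..4]);
                // In this sketch the existing byte buffer is reused as buffer 0.
                views.push(View::Ref { len, prefix, buffer_index: 0, offset: start as u32 });
            }
        }
        views
    }
}

fn main() {
    // Build a VarBin from two strings: one short, one long.
    let mut offsets = vec![0u32];
    let mut bytes = Vec::new();
    for s in ["hi", "a much longer string"] {
        bytes.extend_from_slice(s.as_bytes());
        offsets.push(bytes.len() as u32);
    }
    let arr = VarBin { offsets, bytes };
    println!("{:?}", arr.to_views());
}
```

The point is the loop, not the exact layout: the work grows with the number of strings, even though the comparison itself could have run on the data as it was already laid out.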
This PR augments the existing `IntoCanonicalVTable`, allowing encodings
to implement their own `into_arrow()` method. The default implementation
still calls `into_canonical()?.into_arrow()`, but VarBin now has a fast
version that converts to Arrow directly, without first building a
`VarBinView`.
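The override pattern can be sketched with a trait default method. This is a simplified illustration, not the real Vortex API: `IntoCanonicalSketch`, `Canonical`, `ArrowArray`, `VarBin` and `ConstantString` are all hypothetical stand-ins, and errors are plain `String`s. Encodings that only implement `into_canonical` get the default `into_arrow`, which goes through the canonical form; VarBin overrides it and hands its offsets and bytes over as an Arrow-style binary array.

```rust
/// Hypothetical stand-ins for two Arrow layouts.
#[derive(Debug, PartialEq)]
enum ArrowArray {
    /// Classic offsets + bytes layout.
    Binary { offsets: Vec<u32>, bytes: Vec<u8> },
    /// View-style layout (simplified to owned strings here).
    BinaryView { strings: Vec<Vec<u8>> },
}

/// Hypothetical canonical form: the one "blessed" encoding for strings.
struct Canonical {
    strings: Vec<Vec<u8>>,
}

impl Canonical {
    fn into_arrow(self) -> Result<ArrowArray, String> {
        Ok(ArrowArray::BinaryView { strings: self.strings })
    }
}

/// Sketch of the vtable: `into_arrow` defaults to going via the canonical form.
trait IntoCanonicalSketch: Sized {
    fn into_canonical(self) -> Result<Canonical, String>;

    fn into_arrow(self) -> Result<ArrowArray, String> {
        self.into_canonical()?.into_arrow()
    }
}

struct VarBin {
    offsets: Vec<u32>,
    bytes: Vec<u8>,
}

impl IntoCanonicalSketch for VarBin {
    fn into_canonical(self) -> Result<Canonical, String> {
        // Slow path: materialize every string individually.
        let strings = self
            .offsets
            .windows(2)
            .map(|w| self.bytes[w[0] as usize..w[1] as usize].to_vec())
            .collect();
        Ok(Canonical { strings })
    }

    fn into_arrow(self) -> Result<ArrowArray, String> {
        // Fast path: the buffers already match the offsets + bytes layout,
        // so they are moved over without visiting individual strings.
        Ok(ArrowArray::Binary { offsets: self.offsets, bytes: self.bytes })
    }
}

/// An encoding that does not override `into_arrow` and uses the default.
struct ConstantString {
    value: Vec<u8>,
    len: usize,
}

impl IntoCanonicalSketch for ConstantString {
    fn into_canonical(self) -> Result<Canonical, String> {
        Ok(Canonical { strings: vec![self.value; self.len] })
    }
}

fn main() {
    let arr = VarBin { offsets: vec![0, 2, 5], bytes: b"hibye".to_vec() };
    println!("{:?}", arr.into_arrow());
}
```

Putting the default on the trait keeps every existing encoding working unchanged, while encodings whose physical layout already matches an Arrow array can opt into a cheaper conversion.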