Open
Labels: feature (a new feature or request)
Description
Discussed in #5725
Originally posted by paultiq December 14, 2025
Problem
Three facts combine to cause a problem for the Vortex-DuckDB integration:
1. PyArrow does not implement string_view-string_view (in)equality.
2. When Vortex converts a PyArrow expression into a Vortex expression, it goes through Substrait. Vortex provides the true schema (which has string_view fields) to pyarrow.compute.Expression.to_substrait.
3. pyarrow.compute.Expression.to_substrait only produces a Substrait expression if the PyArrow expression has an implementation in PyArrow.
Because of 1 and 3, PyArrow raises an error on any equality comparison involving a string_view column. I think @paultiq correctly identifies three options: "modify to_substrait", "implement string_view (in)equality in PyArrow", and "deceit". A fourth option is to convert the Arrow expression directly to a Vortex expression, without going through Substrait. I do not know how hard that is.
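I think deceit is our best option here: instead of the true schema, we give to_substrait a schema that declares those fields as string rather than string_view. On our side this should be harmless, because we ignore the schema when we parse a Substrait expression (our column expressions are untyped, see substrait.py:94 in field_reference).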