Arrow Expressions on Vortex Datasets raise ArrowNotImplementedError on string_views #5759

@connortsui20

Discussed in #5725

Originally posted by paultiq December 14, 2025

Problem

A few facts conspire to cause a problem for the Vortex-DuckDB integration:

  1. PyArrow does not implement equality or inequality between two string_view values.
  2. When Vortex converts a PyArrow expression into a Vortex expression, it goes through Substrait. Vortex passes the true schema (which has string_view fields) to pyarrow.compute.Expression.to_substrait.
  3. pyarrow.compute.Expression.to_substrait only produces a Substrait expression if PyArrow itself has an implementation of the expression.

Because of 1 and 3, PyArrow raises an error on any equality comparison involving a string_view column. I think @paultiq correctly identifies three options ("modify to_substrait", "implement string_view (in)equality in PyArrow", and "deceit"). A fourth option is to convert the Arrow expression directly into a Vortex expression, skipping Substrait. I do not know how hard that is.

I think deceit is our best option here. We already ignore the schema when we parse a Substrait expression (our column expressions are untyped, see substrait.py:94 in field_reference), so the types handed to to_substrait do not need to match the real ones.

Metadata

Labels

feature
