Emit geoarrow.wkb Arrow metadata for geometry columns #340

@jatorre

Summary

Databricks geometry columns arrive as EWKT strings (SRID=4326;POINT(55.4 25.2)) in Arrow string fields. The Arrow field metadata already labels them (Spark:DataType:SqlName: GEOMETRY(4326)), but the driver doesn't convert them to a standard geospatial Arrow format.
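To make the starting point concrete, here is a minimal Go sketch of how a geometry column could be recognized and its EWKT value split into SRID and WKT body. It uses only the standard library and models field metadata as a plain map; the helper names isGeometryField and splitEWKT are hypothetical and not part of the driver.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// sqlNameKey is the field metadata key quoted in this issue.
const sqlNameKey = "Spark:DataType:SqlName"

// isGeometryField reports whether the metadata labels a column as GEOMETRY.
// Hypothetical helper: metadata is modeled as a plain map, not an Arrow type.
func isGeometryField(md map[string]string) bool {
	return strings.HasPrefix(strings.ToUpper(md[sqlNameKey]), "GEOMETRY")
}

// splitEWKT separates an optional "SRID=<n>;" prefix from the WKT body.
// It returns srid 0 when the value carries no SRID prefix.
func splitEWKT(s string) (srid int, wkt string, err error) {
	s = strings.TrimSpace(s)
	if !strings.HasPrefix(strings.ToUpper(s), "SRID=") {
		return 0, s, nil
	}
	head, body, found := strings.Cut(s, ";")
	if !found {
		return 0, "", fmt.Errorf("EWKT %q: missing ';' after SRID", s)
	}
	srid, err = strconv.Atoi(strings.TrimSpace(head[len("SRID="):]))
	if err != nil {
		return 0, "", fmt.Errorf("EWKT %q: bad SRID: %w", s, err)
	}
	return srid, strings.TrimSpace(body), nil
}

func main() {
	md := map[string]string{sqlNameKey: "GEOMETRY(4326)"}
	srid, wkt, err := splitEWKT("SRID=4326;POINT(55.4 25.2)")
	fmt.Println(isGeometryField(md), srid, wkt, err)
	// prints: true 4326 POINT(55.4 25.2) <nil>
}
```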

Request

Have the driver emit geoarrow.wkb Arrow extension metadata on geometry columns, converting EWKT→WKB in the IPC reader. This would allow consumers like DuckDB's adbc_scanner to map geometry to native GEOMETRY automatically — no ST_AsBinary() on the Databricks side or ST_GeomFromWKB() on the client side.

The Redshift ADBC driver already does this — its geometry columns arrive with ARROW:extension:name: geoarrow.wkb metadata, and DuckDB maps them to native GEOMETRY with zero conversion needed.
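For illustration, the field metadata being requested could be assembled roughly as follows. This is a sketch under assumptions: ARROW:extension:name and ARROW:extension:metadata are the standard Arrow extension-type keys, but passing the CRS as an "EPSG:<srid>" string is a simplification made here (an SRID is not always an EPSG code), and geoarrowWKBMetadata is a hypothetical helper name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// geoarrowWKBMetadata builds Arrow field metadata that tags a binary column
// as geoarrow.wkb. Assumption: the SRID is treated as an EPSG authority code.
func geoarrowWKBMetadata(srid int) (map[string]string, error) {
	ext := map[string]string{}
	if srid > 0 {
		ext["crs"] = fmt.Sprintf("EPSG:%d", srid)
	}
	raw, err := json.Marshal(ext)
	if err != nil {
		return nil, err
	}
	return map[string]string{
		"ARROW:extension:name":     "geoarrow.wkb",
		"ARROW:extension:metadata": string(raw),
	}, nil
}

func main() {
	md, err := geoarrowWKBMetadata(4326)
	if err != nil {
		panic(err)
	}
	fmt.Println(md["ARROW:extension:name"], md["ARROW:extension:metadata"])
	// prints: geoarrow.wkb {"crs":"EPSG:4326"}
}
```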

Proof of concept

I built a patch in ipc_reader_adapter.go that:

  1. Detects geometry columns via Spark:DataType:SqlName metadata
  2. Converts EWKT→WKB per row using go-geom (WKT parse + WKB marshal)
  3. Replaces String arrays with Binary arrays + ARROW:extension:name: geoarrow.wkb

It works — DuckDB sees native GEOMETRY, GeoParquet output includes geo metadata with WKB encoding, bbox, geometry_types.

However, the per-row WKT parsing in Go is ~25% slower for points and much slower for complex polygons compared to just using ST_AsBinary() server-side. The ideal solution would be for the driver (or databricks-sql-go) to emit WKB directly from the server, avoiding WKT string serialization entirely.

Current workaround

-- Databricks side: explicit binary conversion
SELECT *, ST_AsBinary(geom) AS geom_wkb FROM table
-- DuckDB side: explicit geometry conversion
SELECT ST_GeomFromWKB(geom_wkb) AS geom
FROM adbc_scan(conn, 'SELECT *, ST_AsBinary(geom) AS geom_wkb FROM table')

Desired behavior

-- Just SELECT * — geometry arrives as native GEOMETRY via geoarrow.wkb
SELECT * FROM adbc_scan(conn, 'SELECT * FROM table')
