Summary
Databricks geometry columns arrive as EWKT strings (SRID=4326;POINT(55.4 25.2)) in Arrow string fields. The Arrow field metadata already labels them (Spark:DataType:SqlName: GEOMETRY(4326)), but the driver doesn't convert them to a standard geospatial Arrow format.
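The Go helpers below are a minimal, stdlib-only sketch of the parsing this involves. They are not the proof-of-concept code. One splits an EWKT value into its SRID and plain WKT body, and the other reads the SRID out of a type name such as GEOMETRY(4326). The function names splitEWKT and sridFromSqlName are hypothetical. Returning SRID 0 when none is present is also an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// splitEWKT separates an EWKT value such as "SRID=4326;POINT(55.4 25.2)"
// into its SRID and the plain WKT body. Values without an "SRID=" prefix
// are returned unchanged with srid 0 (assumption: 0 means "unknown").
func splitEWKT(ewkt string) (srid int, wkt string, err error) {
	s := strings.TrimSpace(ewkt)
	if !strings.HasPrefix(strings.ToUpper(s), "SRID=") {
		return 0, s, nil
	}
	prefix, body, ok := strings.Cut(s, ";")
	if !ok {
		return 0, "", fmt.Errorf("malformed EWKT %q: missing ';' after SRID", ewkt)
	}
	srid, err = strconv.Atoi(prefix[len("SRID="):])
	if err != nil {
		return 0, "", fmt.Errorf("malformed SRID in %q: %w", ewkt, err)
	}
	return srid, body, nil
}

// sridFromSqlName extracts the SRID from a Spark SQL type name such as
// "GEOMETRY(4326)", as found under the Spark:DataType:SqlName metadata key.
// ok is false when the type is not a GEOMETRY type; a GEOMETRY type whose
// parameter is missing or non-numeric is reported with srid 0.
func sridFromSqlName(sqlName string) (srid int, ok bool) {
	name := strings.ToUpper(strings.TrimSpace(sqlName))
	if !strings.HasPrefix(name, "GEOMETRY") {
		return 0, false
	}
	rest := strings.TrimPrefix(name, "GEOMETRY")
	if !strings.HasPrefix(rest, "(") || !strings.HasSuffix(rest, ")") {
		return 0, rest == "" // bare "GEOMETRY" is still a geometry type
	}
	n, err := strconv.Atoi(rest[1 : len(rest)-1])
	if err != nil {
		return 0, true
	}
	return n, true
}
```

For example, splitEWKT("SRID=4326;POINT(55.4 25.2)") yields 4326 and "POINT(55.4 25.2)". That body is the WKT string a parser such as go-geom would then consume.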
Request
Have the driver emit geoarrow.wkb Arrow extension metadata on geometry columns, converting EWKT→WKB in the IPC reader. This would let consumers such as DuckDB's adbc_scanner map geometry to native GEOMETRY automatically. No ST_AsBinary() would be needed on the Databricks side, and no ST_GeomFromWKB() on the client side.
The Redshift ADBC driver already does this. Its geometry columns arrive with ARROW:extension:name: geoarrow.wkb metadata, and DuckDB maps them to native GEOMETRY with no conversion needed.
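The sketch below shows the field metadata the driver could attach to a converted column. ARROW:extension:name and ARROW:extension:metadata are the standard Arrow extension-type keys. The helper name geoarrowWKBMetadata is hypothetical. Encoding the SRID as an "EPSG:<srid>" authority code with crs_type "authority_code" is an assumption about how a Databricks SRID would be mapped. It returns a plain map rather than the Arrow Go library's metadata type, so it stays self-contained.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Metadata keys defined by the Arrow extension-type mechanism.
const (
	extensionNameKey     = "ARROW:extension:name"
	extensionMetadataKey = "ARROW:extension:metadata"
)

// geoarrowWKBMetadata returns field metadata marking a binary column as
// geoarrow.wkb. For srid > 0 the CRS is written as "EPSG:<srid>" with
// crs_type "authority_code" (an assumed mapping); otherwise the extension
// metadata is the empty JSON object "{}".
func geoarrowWKBMetadata(srid int) (map[string]string, error) {
	ext := map[string]string{}
	if srid > 0 {
		ext["crs"] = fmt.Sprintf("EPSG:%d", srid)
		ext["crs_type"] = "authority_code"
	}
	encoded, err := json.Marshal(ext) // map keys are emitted in sorted order
	if err != nil {
		return nil, err
	}
	return map[string]string{
		extensionNameKey:     "geoarrow.wkb",
		extensionMetadataKey: string(encoded),
	}, nil
}
```

In the driver itself, a map like this would presumably be converted with arrow.MetadataFrom and set on a Binary-typed field that replaces the original String field.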
Proof of concept
I built a patch in ipc_reader_adapter.go that:
- Detects geometry columns via Spark:DataType:SqlName metadata
- Converts EWKT→WKB per row using go-geom (WKT parse + WKB marshal)
- Replaces String arrays with Binary arrays + ARROW:extension:name: geoarrow.wkb
It works: DuckDB sees native GEOMETRY, and GeoParquet output includes geo metadata with WKB encoding, bbox, and geometry_types.
However, per-row WKT parsing in Go is ~25% slower than using ST_AsBinary() server-side for points, and much slower for complex polygons. The ideal solution would be for the driver (or databricks-sql-go) to receive WKB directly from the server, avoiding WKT string serialization entirely.
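GeoParquet stores the geo metadata described in the proof of concept as a JSON value under the "geo" key of the Parquet file's key-value metadata. The sketch below builds that entry for a points-only column, including the bbox in [xmin, ymin, xmax, ymax] order. The helper names, the hard-coded "Point" geometry type, and the version string "1.1.0" are assumptions for illustration. Writers such as DuckDB derive this entry themselves.

```go
package main

import (
	"encoding/json"
	"math"
)

// geoColumn mirrors a subset of GeoParquet's per-column metadata fields.
type geoColumn struct {
	Encoding      string    `json:"encoding"`
	GeometryTypes []string  `json:"geometry_types"`
	BBox          []float64 `json:"bbox,omitempty"`
}

// geoMetadata mirrors the top-level "geo" file metadata object.
type geoMetadata struct {
	Version       string               `json:"version"`
	PrimaryColumn string               `json:"primary_column"`
	Columns       map[string]geoColumn `json:"columns"`
}

// pointBBox returns [xmin, ymin, xmax, ymax] over 2D points, or nil if empty.
func pointBBox(points [][2]float64) []float64 {
	if len(points) == 0 {
		return nil
	}
	b := []float64{points[0][0], points[0][1], points[0][0], points[0][1]}
	for _, p := range points[1:] {
		b[0] = math.Min(b[0], p[0])
		b[1] = math.Min(b[1], p[1])
		b[2] = math.Max(b[2], p[0])
		b[3] = math.Max(b[3], p[1])
	}
	return b
}

// geoParquetMetadata serializes the "geo" entry for a single WKB point column.
func geoParquetMetadata(column string, points [][2]float64) (string, error) {
	md := geoMetadata{
		Version:       "1.1.0",
		PrimaryColumn: column,
		Columns: map[string]geoColumn{
			column: {Encoding: "WKB", GeometryTypes: []string{"Point"}, BBox: pointBBox(points)},
		},
	}
	out, err := json.Marshal(md)
	return string(out), err
}
```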
Current workaround
```sql
-- Databricks side: explicit binary conversion
SELECT *, ST_AsBinary(geom) AS geom_wkb FROM table
-- DuckDB side: explicit geometry conversion
ST_GeomFromWKB(geom_wkb) AS geom
```
Desired behavior
```sql
-- Just SELECT *: geometry arrives as native GEOMETRY via geoarrow.wkb
SELECT * FROM adbc_scan(conn, 'SELECT * FROM table')
```
References
- GeoArrow extension types spec
- Redshift ADBC driver: already emits geoarrow.wkb
- DuckDB adbc_scanner extension: maps geoarrow.wkb to native GEOMETRY