Add geoarrow.wkb Arrow extension metadata for geometry columns #339

## Summary

When Databricks tables contain geometry columns (e.g., `geometry(4326)`), the ADBC driver returns them as plain `binary` Arrow arrays without any extension metadata. This means downstream consumers like DuckDB cannot automatically recognize them as geometry.

**Proposal**: Tag geometry columns with `geoarrow.wkb` Arrow extension metadata so they flow as native geometry types through the Arrow ecosystem.

## Background

[GeoArrow](https://geoarrow.org/extension-types) defines standard Arrow extension types for geospatial data. The `geoarrow.wkb` extension type wraps WKB-encoded geometry in a `binary` Arrow array with two metadata fields:

- `ARROW:extension:name` = `"geoarrow.wkb"`
- `ARROW:extension:metadata` = JSON with CRS info (e.g., `{"crs": "OGC:CRS84"}`)
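
As a rough illustration, the two entries above could be produced like this in Go. This is a stdlib-only sketch: `geoarrowWKBMetadata` is a hypothetical helper name, and the plain `map[string]string` stands in for the key/value metadata that a real driver would attach to the Arrow field.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// geoarrowWKBMetadata is a hypothetical helper (not part of any library).
// It returns the two Arrow field metadata entries that mark a binary
// column as geoarrow.wkb. crs is an authority code such as "OGC:CRS84";
// pass "" when the CRS is unknown, which yields an empty JSON object.
func geoarrowWKBMetadata(crs string) (map[string]string, error) {
	ext := map[string]string{}
	if crs != "" {
		ext["crs"] = crs
	}
	payload, err := json.Marshal(ext)
	if err != nil {
		return nil, err
	}
	return map[string]string{
		"ARROW:extension:name":     "geoarrow.wkb",
		"ARROW:extension:metadata": string(payload),
	}, nil
}

func main() {
	md, err := geoarrowWKBMetadata("OGC:CRS84")
	if err != nil {
		panic(err)
	}
	fmt.Println(md["ARROW:extension:name"])
	fmt.Println(md["ARROW:extension:metadata"])
}
```

Serializing through `encoding/json` rather than string concatenation keeps the payload valid if the CRS value ever contains characters that need escaping.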

DuckDB 1.5 has built-in GEOMETRY type support and can consume `geoarrow.wkb` Arrow arrays natively via `register_geoarrow_extensions()`. The DuckDB Snowflake extension already uses this pattern for geometry passthrough ([iqea-ai/duckdb-snowflake#24](https://github.com/iqea-ai/duckdb-snowflake/pull/24)).

## Current workaround

When using the Databricks ADBC driver with DuckDB's `adbc_scanner`, geometry requires explicit WKB conversion on both sides:

```sql
-- Databricks side: explicitly convert to WKB
SELECT *, ST_AsBinary(geom) AS geom_wkb FROM my_table;

-- DuckDB side: explicitly convert back from WKB
SELECT ST_GeomFromWKB(geom_wkb) AS geom FROM adbc_scan(...);
```

## Proposed behavior

If the driver tagged geometry columns with `geoarrow.wkb` metadata:

```sql
-- Just works — geometry flows as native type
SELECT * FROM adbc_scan(...)
```

## Implementation sketch

In `ipc_reader_adapter.go`, after obtaining the Arrow schema from the IPC stream:

1. Query Databricks column metadata (from `INFORMATION_SCHEMA.COLUMNS` or the Thrift response) to identify columns whose `DATA_TYPE` matches `geometry(...)`.
2. For those columns, modify the Arrow schema field to include:
   - `ARROW:extension:name` = `"geoarrow.wkb"`
   - `ARROW:extension:metadata` = `{"crs": "OGC:CRS84", "crs_type": "authority_code"}` (or derive the CRS from the SRID in the type definition)
3. The underlying binary data is already WKB, so no data transformation is needed, only metadata annotation.
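
The steps above can be sketched in Go roughly as follows. This is a stdlib-only illustration, not driver code: `field` is a stand-in for an Arrow schema field, `sridOf` and `tagGeometry` are hypothetical names, and `columnTypes` is assumed to hold the type text already fetched in step 1. Mapping an SRID to `EPSG:<srid>` and treating SRID 0 as "no CRS" are assumptions that would need checking against how Databricks defines its SRIDs; the `crs_type` key follows my reading of the GeoArrow spec.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// field is a stand-in for an Arrow schema field: a name plus key/value metadata.
type field struct {
	Name     string
	Metadata map[string]string
}

// geometryType matches type text with a numeric SRID, such as "geometry(4326)".
var geometryType = regexp.MustCompile(`(?i)^\s*geometry\s*\(\s*(\d+)\s*\)\s*$`)

// sridOf reports the SRID when typeText describes a geometry column.
func sridOf(typeText string) (int, bool) {
	m := geometryType.FindStringSubmatch(typeText)
	if m == nil {
		return 0, false
	}
	srid, err := strconv.Atoi(m[1])
	if err != nil {
		return 0, false
	}
	return srid, true
}

// tagGeometry returns a copy of fields in which every column whose type text
// (looked up by column name) is geometry(...) carries geoarrow.wkb metadata.
// Non-geometry fields and the input slice are left untouched.
func tagGeometry(fields []field, columnTypes map[string]string) []field {
	out := make([]field, len(fields))
	for i, f := range fields {
		out[i] = f
		srid, ok := sridOf(columnTypes[f.Name])
		if !ok {
			continue
		}
		md := make(map[string]string, len(f.Metadata)+2)
		for k, v := range f.Metadata {
			md[k] = v // keep any metadata the field already had
		}
		md["ARROW:extension:name"] = "geoarrow.wkb"
		md["ARROW:extension:metadata"] = "{}" // assumption: SRID 0 means unknown CRS
		if srid != 0 {
			// Assumption: the SRID is an EPSG code.
			md["ARROW:extension:metadata"] = fmt.Sprintf(
				`{"crs":"EPSG:%d","crs_type":"authority_code"}`, srid)
		}
		out[i].Metadata = md
	}
	return out
}

func main() {
	fields := []field{{Name: "id"}, {Name: "geom"}}
	types := map[string]string{"id": "bigint", "geom": "geometry(4326)"}
	for _, f := range tagGeometry(fields, types) {
		fmt.Println(f.Name, f.Metadata)
	}
}
```

Only the schema is rewritten here; the record batches themselves would pass through unchanged, which matches step 3.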

## Use case

We're benchmarking geospatial data transfer between DuckDB and cloud warehouses ([duckdb-warehouse-transfer](https://github.com/jatorre/duckdb-warehouse-transfer)). The Databricks ADBC export via `adbc_scanner` already achieves ~24,000 rows/sec (2x faster than the `@databricks/sql` JSON connector). Adding `geoarrow.wkb` metadata would eliminate the WKB conversion overhead and enable native geometry passthrough.

## References

- [GeoArrow extension types spec](https://geoarrow.org/extension-types)
- [DuckDB spatial GeoArrow support (PR #485)](https://github.com/duckdb/duckdb-spatial/pull/485)
- [DuckDB Arrow extension registration (PR #15285)](https://github.com/duckdb/duckdb/pull/15285)
- [DuckDB Snowflake extension geometry support](https://github.com/iqea-ai/duckdb-snowflake/pull/24)
