feat: GeoArrow-aware bulk ingest — WKB staging + CTAS for geometry columns #361

@jatorre

Summary

When PR #247 (staging + COPY INTO bulk ingest) lands, geometry columns sent via adbc_insert with geoarrow.wkb Arrow metadata will need special handling. Databricks doesn't support direct ingestion of geometry types via COPY INTO — the data must arrive as BINARY (WKB) and be converted server-side with ST_GeomFromWKB.

This is the same pattern already implemented in the Snowflake ADBC driver (adbc-drivers/snowflake#99) and proposed for Redshift (adbc-drivers/redshift#3).

Proposed Solution

When the driver detects geoarrow.wkb or geoarrow.wkt in Arrow field extension metadata during ingest:

  1. Staging: Create the column as BINARY (for WKB) or STRING (for WKT) in the staging table
  2. COPY INTO: Load via Parquet → Volume → COPY INTO (PR #247's path, which already handles BINARY columns)
  3. CTAS: Convert to geometry on Databricks, replacing the staged binary column:
    CREATE TABLE target AS
    SELECT * EXCEPT (geom_col), ST_GeomFromWKB(geom_col) AS geom_col
    FROM staging;
    Or for GEOGRAPHY: ST_GeogFromWKB(geom_col)
  4. Cleanup: Drop the staging table
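The staging-DDL and CTAS steps above can be sketched as SQL generation. This is a minimal illustration, not the driver's implementation: the `staging_ddl`/`ctas_sql` helpers and the table/column names are hypothetical, and the `* EXCEPT (...)` form assumes Databricks SQL's star-except syntax.

```python
# Sketch of steps 1 and 3 of the proposed ingest path.
# Helper names and identifiers are illustrative only.

GEO_EXTENSIONS = {
    "geoarrow.wkb": "BINARY",   # WKB bytes stage as BINARY
    "geoarrow.wkt": "STRING",   # WKT text stages as STRING
}

def staging_ddl(table, columns):
    """columns: list of (name, sql_type, arrow_extension_name_or_None)."""
    defs = []
    geo_cols = []
    for name, sql_type, ext in columns:
        if ext in GEO_EXTENSIONS:
            defs.append(f"{name} {GEO_EXTENSIONS[ext]}")
            geo_cols.append(name)
        else:
            defs.append(f"{name} {sql_type}")
    return f"CREATE TABLE {table} ({', '.join(defs)})", geo_cols

def ctas_sql(target, staging, geo_cols):
    """Step 3: convert staged WKB to geometry server-side, dropping
    the raw binary columns via Databricks' SELECT * EXCEPT."""
    exprs = ", ".join(f"ST_GeomFromWKB({c}) AS {c}" for c in geo_cols)
    select = f"* EXCEPT ({', '.join(geo_cols)}), {exprs}"
    return f"CREATE TABLE {target} AS SELECT {select} FROM {staging}"

cols = [("id", "BIGINT", None), ("geom", None, "geoarrow.wkb")]
ddl, geo = staging_ddl("_staging", cols)
print(ddl)  # CREATE TABLE _staging (id BIGINT, geom BINARY)
print(ctas_sql("target", "_staging", geo))
```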

Statement option

adbc.databricks.statement.ingest_geo_type = "geometry" (default) | "geography"

For GEOGRAPHY: ST_GeogFromWKB(geom_col) instead of ST_GeomFromWKB.
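A minimal sketch of how the option could select the conversion function. The option key is the one proposed above; the `conversion_expr` helper and its error handling are illustrative, not driver API.

```python
# Map the proposed statement option to the server-side converter.
CONVERTERS = {
    "geometry": "ST_GeomFromWKB",    # default
    "geography": "ST_GeogFromWKB",
}

def conversion_expr(col, options):
    """Build the CTAS select expression for one staged geo column."""
    geo_type = options.get(
        "adbc.databricks.statement.ingest_geo_type", "geometry")
    try:
        fn = CONVERTERS[geo_type]
    except KeyError:
        # Reject unknown values rather than emitting broken SQL.
        raise ValueError(f"invalid ingest_geo_type: {geo_type!r}")
    return f"{fn}({col}) AS {col}"

print(conversion_expr("geom", {}))  # ST_GeomFromWKB(geom) AS geom
print(conversion_expr(
    "geom", {"adbc.databricks.statement.ingest_geo_type": "geography"}))
```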

SRID from CRS metadata

The geoarrow.wkb field may carry CRS metadata (PROJJSON or EPSG:NNNN). This connects with PR #350 (geoarrow.wkb export) which already handles CRS on the export side — the import side should mirror that.
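A sketch of SRID extraction covering the two CRS shapes named above. The metadata layout assumed here (a JSON object with a `crs` key holding either an "EPSG:NNNN" string or a PROJJSON object whose `id` carries authority and code) follows the GeoArrow spec as I understand it; treat it as an assumption, and the helper as illustrative.

```python
import json

def srid_from_geoarrow_metadata(ext_metadata_json):
    """Extract an EPSG code from geoarrow.wkb extension metadata,
    or return None when no EPSG identifier is present."""
    meta = json.loads(ext_metadata_json) if ext_metadata_json else {}
    crs = meta.get("crs")
    # Shape 1: authority string, e.g. "EPSG:4326"
    if isinstance(crs, str) and crs.upper().startswith("EPSG:"):
        return int(crs.split(":", 1)[1])
    # Shape 2: PROJJSON object with an "id" member
    if isinstance(crs, dict):
        ident = crs.get("id", {})
        if ident.get("authority") == "EPSG":
            return int(ident["code"])
    return None  # unknown or missing CRS: leave SRID unset

print(srid_from_geoarrow_metadata('{"crs": "EPSG:4326"}'))  # 4326
projjson = '{"crs": {"type": "GeographicCRS", "id": {"authority": "EPSG", "code": 4326}}}'
print(srid_from_geoarrow_metadata(projjson))  # 4326
```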

Prior Art

| Driver | Import PR | Pattern |
| --- | --- | --- |
| Snowflake | #99 (open) | geoarrow.wkb → BINARY via Parquet → PUT → COPY INTO → CTAS TO_GEOGRAPHY |
| Redshift | #3 (proposed) | geoarrow.wkb → VARBYTE via Parquet → S3 → COPY → CTAS ST_GeomFromWKB |
| Databricks | this issue | geoarrow.wkb → BINARY via Parquet → Volume → COPY INTO → CTAS ST_GeomFromWKB |

All three drivers follow the same three-step pattern: staging as binary → bulk load → server-side conversion. The details differ only in the SQL dialect and staging mechanism.

Current Workaround

Users must manually convert geometry to WKB before calling adbc_insert, then run CTAS on Databricks:

-- In DuckDB (EXCLUDE must immediately follow the star):
CREATE TABLE _import AS SELECT * EXCLUDE (geom), ST_AsWKB(geom) AS geom_wkb FROM source;
-- adbc_insert sends geom_wkb as BINARY (works with PR #247)
-- Then on Databricks:
CREATE TABLE final AS SELECT * EXCEPT (geom_wkb), ST_GeomFromWKB(geom_wkb) AS geom FROM staging;

This is what our benchmark scripts do today and it works at ~15-23K rows/sec. Making it transparent in the driver would enable a unified adbc_insert API for geometry across all warehouses.
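To make this transparent, the driver would first detect the extension metadata on each Arrow field. A stdlib-only sketch, using plain dicts as a stand-in for an Arrow schema: the `ARROW:extension:name` key is the standard Arrow extension-type metadata convention, while the `detect_geo_fields` helper is hypothetical.

```python
# The metadata key the driver would inspect during ingest.
ARROW_EXT_NAME = b"ARROW:extension:name"

def detect_geo_fields(schema):
    """schema: list of (field_name, metadata_dict_or_None) pairs.
    Returns the fields tagged with a GeoArrow extension name."""
    geo = []
    for name, md in schema:
        ext = (md or {}).get(ARROW_EXT_NAME, b"").decode()
        if ext in ("geoarrow.wkb", "geoarrow.wkt"):
            geo.append((name, ext))
    return geo

schema = [
    ("id", None),
    ("geom", {ARROW_EXT_NAME: b"geoarrow.wkb"}),
]
print(detect_geo_fields(schema))  # [('geom', 'geoarrow.wkb')]
```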

Relationship to other PRs

Builds on PR #247 (staging + COPY INTO bulk ingest) and mirrors the CRS handling in PR #350 (geoarrow.wkb export); same pattern as Snowflake adbc-drivers/snowflake#99 and Redshift adbc-drivers/redshift#3.