Summary
adbc_insert with geoarrow.wkb Arrow columns fails because the driver creates a GEOGRAPHY column in the staging table, but Redshift's COPY FROM Parquet doesn't support GEOGRAPHY columns:
```
ERROR: Ingesting GEOGRAPHY columns is only supported from TEXT or CSV. (SQLSTATE 22P04)
```
Proposed Solution
Detect geoarrow.wkb (or geoarrow.wkt) in Arrow field metadata during ingest, and use a two-step approach:
- Staging: create the column as `VARBYTE` (for WKB) or `VARCHAR` (for WKT) in the staging table
- COPY: load via the existing Parquet → S3 → COPY pipeline (works fine for `VARBYTE`)
- CTAS: convert to GEOMETRY/GEOGRAPHY:

```sql
CREATE TABLE target AS
SELECT *, ST_GeomFromWKB(geom_col) AS geom  -- or ST_SetSRID(..., srid)
FROM staging;
```

- Cleanup: drop the staging table
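The staging/CTAS dispatch described above could be sketched as a small lookup. This is a hypothetical helper, not the driver's actual code; the extension names `geoarrow.wkb`/`geoarrow.wkt` come from the issue, and `ST_GeomFromText` is assumed as the WKT counterpart to `ST_GeomFromWKB`:

```python
# Hypothetical sketch: map a geoarrow Arrow extension name to the Redshift
# staging column type and the post-COPY CTAS conversion expression.
STAGING_TYPES = {
    "geoarrow.wkb": ("VARBYTE", "ST_GeomFromWKB({col})"),
    "geoarrow.wkt": ("VARCHAR", "ST_GeomFromText({col})"),
}

def staging_plan(extension_name, col):
    """Return (staging SQL type, CTAS conversion expression) for a geo
    column, or None when the column is not a geoarrow column and should
    be ingested as-is."""
    if extension_name not in STAGING_TYPES:
        return None
    sql_type, expr_template = STAGING_TYPES[extension_name]
    return sql_type, expr_template.format(col=col)
```

A non-geo column falls through to the existing ingest path untouched, so the change stays local to columns that carry geoarrow metadata.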
Statement option
A statement option to control the target type:
`adbc.redshift.statement.ingest_geo_type` = `"geometry"` (default) | `"geography"`
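How the driver might validate this option and pick the conversion expression can be sketched as follows (hypothetical helper; the option name is the one proposed above, and the SRID default of 4326 matches the GEOGRAPHY example below):

```python
def geo_conversion_expr(col, geo_type="geometry", srid=4326):
    """Build the CTAS expression for a WKB staging column, per the
    proposed adbc.redshift.statement.ingest_geo_type option (sketch)."""
    if geo_type == "geometry":
        return f"ST_GeomFromWKB({col})"
    if geo_type == "geography":
        # GEOGRAPHY requires an SRID; fall back to 4326 when the Arrow
        # metadata carries no CRS.
        return f"ST_SetSRID(ST_GeomFromWKB({col}), {srid})::geography"
    raise ValueError(f"invalid ingest_geo_type: {geo_type!r}")
```

Rejecting unknown values at statement-option-set time (rather than at ingest time) would surface typos earlier.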
For GEOGRAPHY:
```sql
ST_SetSRID(ST_GeomFromWKB(geom_col), 4326)::geography AS geom
```

SRID from CRS metadata
The geoarrow.wkb field may carry CRS metadata (PROJJSON or EPSG:NNNN). The driver should extract the SRID and use it in the ST_SetSRID call. This connects with #2 (CRS propagation on the export side).
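Extracting the SRID from the two CRS shapes mentioned above could look roughly like this (a sketch, assuming the CRS arrives either as an `EPSG:NNNN` string or as a PROJJSON object whose `id` member carries `authority`/`code`):

```python
import json
import re

def srid_from_crs(crs):
    """Extract an EPSG SRID from geoarrow CRS metadata, which may be an
    'EPSG:NNNN' authority string or a PROJJSON object (sketch).
    Returns None when no EPSG identifier can be found."""
    if isinstance(crs, str):
        match = re.fullmatch(r"EPSG:(\d+)", crs)
        if match:
            return int(match.group(1))
        try:
            crs = json.loads(crs)  # PROJJSON may arrive as a JSON string
        except ValueError:
            return None
    if isinstance(crs, dict):
        ident = crs.get("id", {})
        if ident.get("authority") == "EPSG":
            return int(ident["code"])
    return None
```

When this returns None the driver could fall back to SRID 4326, which is what the GEOGRAPHY example above assumes.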
Prior Art
This is the same pattern implemented in the Snowflake ADBC driver: adbc-drivers/snowflake#99
The Snowflake driver:
- Detects geoarrow.wkb/wkt from Arrow metadata
- Loads as BINARY via Parquet → PUT → COPY INTO
- Post-COPY CTAS with `TO_GEOGRAPHY`/`TO_GEOMETRY` conversion
- Statement option `adbc.snowflake.statement.ingest_geo_type`
The Redshift implementation would be very similar since the driver already handles S3 staging internally (redshift.ingest.bucket + Parquet → S3 → COPY).
Workaround
Currently, users must manually convert the geometry to a WKB blob before calling adbc_insert, then run a CTAS on Redshift to convert VARBYTE → GEOMETRY:
```sql
-- In DuckDB: convert GEOMETRY to a WKB blob
CREATE TABLE _import AS SELECT *, ST_AsWKB(geom) AS geom_wkb FROM source;

-- adbc_insert sends geom_wkb as VARBYTE (works)

-- Then on Redshift:
CREATE TABLE final AS SELECT *, ST_GeomFromWKB(geom_wkb) AS geom FROM staging;
```

This works but requires user-side orchestration that should be handled transparently by the driver.
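The orchestration burden can be seen in a minimal helper that generates both workaround statements; table and column names here are placeholders, and this is exactly the glue code the driver should make unnecessary:

```python
def workaround_sql(source, staging, target, geom_col="geom"):
    """Generate the two manual-workaround statements: a DuckDB CTAS that
    serializes geometry to WKB, and a Redshift CTAS that converts the
    VARBYTE column back to GEOMETRY (sketch)."""
    duckdb_sql = (
        f"CREATE TABLE _import AS "
        f"SELECT *, ST_AsWKB({geom_col}) AS {geom_col}_wkb FROM {source};"
    )
    redshift_sql = (
        f"CREATE TABLE {target} AS "
        f"SELECT *, ST_GeomFromWKB({geom_col}_wkb) AS {geom_col} FROM {staging};"
    )
    return duckdb_sql, redshift_sql
```

Every caller today must carry some variant of this, plus the adbc_insert call and the staging-table cleanup in between.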
Benchmark context
We're building a unified geospatial import pipeline using DuckDB + ADBC across all major warehouses. Current throughput with the manual workaround:
| Platform | Rows/sec | Method |
|---|---|---|
| Redshift | ~27,000 | Manual WKB staging + CTAS |
| Snowflake | ~68,000 | PR #99 (driver-native geoarrow) |
Making this transparent in the Redshift driver would match the Snowflake driver's capabilities and enable a truly unified adbc_insert API across all warehouses.