Skip to content

feat: GeoArrow-aware bulk ingest — VARBYTE staging + CTAS for GEOMETRY/GEOGRAPHY #3

@jatorre

Description

@jatorre

Summary

adbc_insert with geoarrow.wkb Arrow columns fails because the driver creates a GEOGRAPHY column in the staging table, but Redshift's COPY FROM Parquet doesn't support GEOGRAPHY columns:

ERROR: Ingesting GEOGRAPHY columns is only supported from TEXT or CSV. (SQLSTATE 22P04)

Proposed Solution

Detect geoarrow.wkb (or geoarrow.wkt) in Arrow field metadata during ingest, and use a two-step approach:

  1. Staging: Create the column as VARBYTE (for WKB) or VARCHAR (for WKT) in the staging table
  2. COPY: Load via the existing Parquet → S3 → COPY pipeline (works fine for VARBYTE)
  3. CTAS: Convert to GEOMETRY/GEOGRAPHY:
    CREATE TABLE target AS
    SELECT *, ST_GeomFromWKB(geom_col) as geom  -- or ST_SetSRID(..., srid)
    FROM staging;
  4. Cleanup: Drop the staging table

Statement option

A statement option to control the target type:

adbc.redshift.statement.ingest_geo_type = "geometry" (default) | "geography"

For GEOGRAPHY:

ST_SetSRID(ST_GeomFromWKB(geom_col), 4326)::geography as geom

SRID from CRS metadata

The geoarrow.wkb field may carry CRS metadata (PROJJSON or EPSG:NNNN). The driver should extract the SRID and use it in the ST_SetSRID call. This connects with #2 (CRS propagation on the export side).

Prior Art

This is the same pattern implemented in the Snowflake ADBC driver: adbc-drivers/snowflake#99

The Snowflake driver:

  • Detects geoarrow.wkb/wkt from Arrow metadata
  • Loads as BINARY via Parquet → PUT → COPY INTO
  • Post-COPY CTAS with TO_GEOGRAPHY/TO_GEOMETRY conversion
  • Statement option adbc.snowflake.statement.ingest_geo_type

The Redshift implementation would be very similar since the driver already handles S3 staging internally (redshift.ingest.bucket + Parquet → S3 → COPY).

Workaround

Currently users must manually convert geometry to WKB blob before calling adbc_insert, then run a CTAS on Redshift to convert VARBYTE → GEOMETRY:

-- In DuckDB: convert GEOMETRY to WKB blob
CREATE TABLE _import AS SELECT *, ST_AsWKB(geom) as geom_wkb FROM source;

-- adbc_insert sends geom_wkb as VARBYTE (works)
-- Then on Redshift:
CREATE TABLE final AS SELECT *, ST_GeomFromWKB(geom_wkb) as geom FROM staging;

This works but requires user-side orchestration that should be handled transparently by the driver.

Benchmark context

We're building a unified geospatial import pipeline using DuckDB + ADBC across all major warehouses. Current throughput with the manual workaround:

Platform Rows/sec Method
Redshift ~27,000 Manual WKB staging + CTAS
Snowflake ~68,000 PR #99 (driver-native geoarrow)

Making this transparent in the Redshift driver would match the Snowflake driver's capabilities and enable a truly unified adbc_insert API across all warehouses.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions