GeoParquet files with GEOGRAPHY type produce incompatible Parquet metadata for BigQuery #23

@mjumbewu

Description

When using hyparquet-writer to write GeoParquet files with type: "GEOGRAPHY", BigQuery rejects the resulting files with the error:

Cannot annotate Geography from BYTE_ARRAY for field ...

Background

hyparquet-writer correctly applies the Parquet-native GEOGRAPHY logical type (with crs and algorithm parameters) as defined in the Apache Parquet format specification. However, BigQuery's Parquet importer does not currently handle this logical type annotation. Instead, BigQuery relies on the GeoParquet file-level JSON metadata (the geo key in the Parquet file's key-value metadata) to identify and load geography columns.

For reference, DuckDB handles the GEOGRAPHY logical type without issue, so this is specifically a BigQuery compatibility problem (though I have not tested other consumers).

What BigQuery expects

For a geography column named geog, BigQuery expects:

  • Column type: BYTE_ARRAY with WKB-encoded data and no Parquet-level logical type annotation
  • File-level geo key-value metadata describing the column's encoding, CRS, and geometry types
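For illustration, the file-level geo metadata is a JSON document stored under the `geo` key in the Parquet key-value metadata. A minimal example for this column, following the GeoParquet 1.1.0 spec (the values shown are illustrative):

```json
{
  "version": "1.1.0",
  "primary_column": "geog",
  "columns": {
    "geog": {
      "encoding": "WKB",
      "geometry_types": []
    }
  }
}
```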

What hyparquet-writer currently produces

  • Column type: optional binary geog (Geography(crs=, algorithm=spherical)) — BigQuery errors on this annotation
  • File-level geo metadata: not present by default, but can be added via kvMetadata

What a compatible writer (e.g. geopandas) produces

  • Column type: optional binary geog (plain BYTE_ARRAY, no logical type)
  • File-level geo metadata (automatically generated)
  • Arrow extension metadata (ARROW:extension:name: geoarrow.wkb) on the column (doesn't seem to affect BigQuery either way)

What I've Tried

  1. Adding file-level geo metadata via kvMetadata: This correctly adds the metadata, but BigQuery still fails because the column-level Geography logical type annotation is present.

  2. Using schemaOverrides: I tried overriding the geog column schema to strip the logical type, but the GEOGRAPHY type in columnData triggers the WKB conversion in unconvert.js based on the logical type check (ltype?.type === 'GEOGRAPHY'). So overriding the schema alone doesn't cleanly decouple the two concerns.
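To make the coupling concrete, here is a minimal sketch (not the actual unconvert.js source; the function and field names are hypothetical) of how a single logical-type check ends up driving both behaviors:

```javascript
// Hypothetical sketch: one check gates both the GeoJSON→WKB conversion
// and the column-level logical type annotation, so removing the
// annotation via schemaOverrides also loses the conversion.
function planGeographyColumn(ltype) {
  const isGeography = ltype?.type === 'GEOGRAPHY'
  return {
    convertToWkb: isGeography,        // data conversion keyed on the logical type
    annotateLogicalType: isGeography, // metadata annotation keyed on the same check
  }
}
```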

Workaround

I ended up pre-converting GeoJSON to WKB myself using geojsonToWkb from hyparquet-writer/src/wkb.js, and passing the resulting byte arrays directly with no logical type annotation:

import { parquetWriteFile } from 'hyparquet-writer';
import { geojsonToWkb } from 'hyparquet-writer/src/wkb.js';

// Convert GeoJSON to WKB manually
const geogColumn = features.map(f => f.geometry ? geojsonToWkb(f.geometry) : null);

parquetWriteFile({
  filename: 'output.parquet',
  columnData: [
    ...propertyColumns, // non-geometry property columns, prepared elsewhere
    { name: 'geog', data: geogColumn, type: 'BYTE_ARRAY' },
  ],
  kvMetadata: [
    { key: 'geo', value: JSON.stringify({
      version: '1.1.0',
      primary_column: 'geog',
      columns: { geog: { encoding: 'WKB', geometry_types: [] } },
    }) },
  ],
});

This produces files that BigQuery accepts, but it requires users to manually handle WKB conversion and construct the file-level metadata, which the GEOGRAPHY type handles automatically.

Feature Request

1. A geoMetadata writer option

It would be great to have a dedicated geoMetadata option on parquetWriteFile that manages the GeoParquet file-level geo metadata. Some suggested behaviors:

  • Default (auto-populate): When any GEOMETRY/GEOGRAPHY columns are present in columnData, geoMetadata would be automatically generated. These columns would be listed in the metadata, with the first geo column serving as the primary_column.
  • Manual override: In the absence of GEOMETRY/GEOGRAPHY columns (e.g., when passing pre-converted WKB as plain BYTE_ARRAY), users could manually specify which columns should be included in geoMetadata and their encoding/CRS/geometry_types.
  • Suppress: When GEOMETRY/GEOGRAPHY columns are present but file-level metadata is not wanted for some reason, setting geoMetadata: null would suppress it.

This would eliminate the need for users to manually construct the geo JSON and pass it through kvMetadata.
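A rough sketch of the auto-populate behavior (names and defaults here are a suggestion, not an existing API; it assumes columnData entries carry a type field as in hyparquet-writer):

```javascript
// Hypothetical sketch of auto-generating GeoParquet file-level metadata
// from columnData. Returns undefined when no geo columns are present.
function buildGeoMetadata(columnData) {
  const geoColumns = columnData.filter(
    c => c.type === 'GEOMETRY' || c.type === 'GEOGRAPHY'
  )
  if (geoColumns.length === 0) return undefined
  const columns = {}
  for (const c of geoColumns) {
    // encoding/CRS/geometry_types could be overridable by the user
    columns[c.name] = { encoding: 'WKB', geometry_types: [] }
  }
  return {
    version: '1.1.0',
    primary_column: geoColumns[0].name, // first geo column is primary
    columns,
  }
}
```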

2. Easier access to WKB conversion (nice-to-have)

For users who need to work around the logical type issue (e.g., for BigQuery compatibility), it would help to have geojsonToWkb exported as a public API rather than requiring an import from hyparquet-writer/src/wkb.js.

3. Consider a BigQuery compatibility option (probably a bit much)

In an ideal world, BigQuery would handle the Parquet-native GEOGRAPHY logical type. I've filed a separate issue with Google about this. In the meantime, a compatibility option (e.g., geoMetadata: { logicalType: false } or similar) that writes the geo column as plain BYTE_ARRAY while still handling the GeoJSON→WKB conversion could help users targeting BigQuery.
