Skip to content

Handling of geoparquet when not loading geoarrow #28

@kylebutts

Description

@kylebutts

First of all, thanks for this awesome work. It's been great to see the progress on all this :-)

In the example on the readme, you load a .parquet file that contains a geometry example. Since there is not a separate naming format/convention (e.g. .geo.parquet or .geoparquet), I might not know that there is a geometry in there, so I just load arrow and open the dataset as normal. Looking at the geometry column would be confusing to me. This behavior differs whether I have the geoarrow package loaded or not.

library(tidyverse)
library(arrow)

open_dataset("~/Desktop/nc.parquet") |>
  head(n = 1) |>
  pull(geometry, as_vector = TRUE)
#> <arrow_binary[1]>
#> [1] 01, 06, 00, 00, 00, 01, 00, 00, 00, 01, 03, 00, 00, 00, 01, 00, 00, 00, 1b, 00, 00, 00, 00, 00, 00, a0, 41, 5e, 54, c0, 00, 00, ...

library(geoarrow)
open_dataset("~/Desktop/nc.parquet") |>
  head(n = 1) |>
  pull(geometry, as_vector = TRUE)
#> <geoarrow_wkb[1]>
#> [1] MULTIPOLYGON (((-81.47276 36.23436, -81.54084 36.27251, -81.56198 36.27359, -81.63306 36.34069, -81.74107 36.39178, -81.69828 36.47178...

This issue might should be in the R arrow package, but I'm wondering if arrow should detect when there is a geometry column present and adjust behavior (the metadata is in there, so this information is known). For example, when calling collect(), should there be a warning that a geometry column is being collected and that geoarrow::st_collect() might be the better option (as in #21)? Or a warning when opening a geoparquet without geoarrow loaded?

library(tidyverse)
library(arrow)

nc = open_dataset("~/Desktop/nc.parquet") 
# We know there is a geometry from the metadata
nc$metadata[[1]]
#> [1] "{\"version\":\"0.3.0\",\"primary_column\":\"geometry\",\"columns\":{\"geometry\":{\"encoding\":\"WKB\",\"crs\":\"GEOGCS[\\\"NAD27\\\",DATUM[\\\"North_American_Datum_1927\\\",SPHEROID[\\\"Clarke 1866\\\",6378206.4,294.978698213898]],PRIMEM[\\\"Greenwich\\\",0],UNIT[\\\"degree\\\",0.0174532925199433,AUTHORITY[\\\"EPSG\\\",\\\"9122\\\"]],AXIS[\\\"Latitude\\\",NORTH],AXIS[\\\"Longitude\\\",EAST],AUTHORITY[\\\"EPSG\\\",\\\"4267\\\"]]\",\"bbox\":[-84.3239,33.882,-75.457,36.5896],\"geometry_type\":\"MultiPolygon\"}}}"

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions