diff --git a/.github/workflows/test.yaml b/.github/workflows/test.yaml index 99d873e..75ed9c9 100644 --- a/.github/workflows/test.yaml +++ b/.github/workflows/test.yaml @@ -121,7 +121,9 @@ jobs: runs-on: ubuntu-latest steps: - - uses: actions/checkout@v3 + - uses: actions/checkout@v4 + with: + fetch-depth: 0 - name: Set up Python uses: actions/setup-python@v4 diff --git a/README.ipynb b/README.ipynb index 1d1213a..b01b57e 100644 --- a/README.ipynb +++ b/README.ipynb @@ -7,226 +7,263 @@ "source": [ "# GeoArrow for Python\n", "\n", - "The GeoArrow Python packages provide an implementation of the [GeoArrow specification](https://github.com/geoarrow/geoarrow) that integrates with [pyarrow](https://arrow.apache.org/docs/python) and [pandas](https://pandas.pydata.org/). The GeoArrow Python bindings enable input/output to/from Arrow-friendly formats (e.g., Parquet, Arrow Stream, Arrow File) and general-purpose coordinate shuffling tools among GeoArrow, WKT, and WKB encodings. \n", + "The GeoArrow Python packages provide an implementation of the [GeoArrow specification](https://geoarrow.org) that integrates with [pyarrow](https://arrow.apache.org/docs/python). The GeoArrow Python bindings enable input/output to/from Arrow-friendly formats (e.g., Parquet, Arrow Stream, Arrow File) and general-purpose coordinate shuffling tools among GeoArrow, WKT, and WKB encodings. \n", "\n", "## Installation\n", "\n", "Python bindings for GeoArrow are available on PyPI. 
You can install them with:\n", "\n", "```bash\n", - "pip install geoarrow-pyarrow geoarrow-pandas\n", + "pip install geoarrow-pyarrow\n", "```\n", "\n", - "You can install the latest development versions with:\n", + "You can install the latest development version with:\n", "\n", "```bash\n", - "pip install \"git+https://github.com/geoarrow/geoarrow-python.git#egg=geoarrow-pyarrow&subdirectory=geoarrow-pyarrow\"\n", - "pip install \"git+https://github.com/geoarrow/geoarrow-python.git#egg=geoarrow-pandas&subdirectory=geoarrow-pandas\"\n", + "pip install \"git+https://github.com/geoarrow/geoarrow-python.git#subdirectory=geoarrow-pyarrow\"\n", "```\n", "\n", - "If you can import the namespaces, you're good to go!" + "If you can import the namespace, you're good to go!" ] }, { "cell_type": "code", - "execution_count": 1, + "execution_count": 2, "metadata": {}, "outputs": [], "source": [ - "import geoarrow.pyarrow as ga\n", - "import geoarrow.pandas as _" + "import geoarrow.pyarrow as ga" ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ - "## Examples\n", + "## Example\n", "\n", - "You can create geoarrow-encoded `pyarrow.Array`s with `as_geoarrow()`:" + "The most important thing that `geoarrow.pyarrow` does is register pyarrow extension types so that metadata is kept intact when reading files or interacting with other libraries. 
For example, we can now read Arrow IPC files written with GeoArrow extension types and the CRS and geometry type are kept:" ] }, { "cell_type": "code", - "execution_count": 2, + "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "PointArray:PointType(geoarrow.point)[1]\n", - "" + "WkbType(geoarrow.wkb )" ] }, - "execution_count": 2, + "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "ga.as_geoarrow([\"POINT (0 1)\"])" + "import pyarrow as pa\n", + "import urllib.request\n", + "\n", + "url = \"https://raw.githubusercontent.com/geoarrow/geoarrow-data/v0.2.0/natural-earth/files/natural-earth_cities_wkb.arrows\"\n", + "with urllib.request.urlopen(url) as f, pa.ipc.open_stream(f) as reader:\n", + " tab = reader.read_all()\n", + "\n", + "tab.schema.field(\"geometry\").type" ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ - "This will work with:\n", - "\n", - "- An existing array created by geoarrow\n", - "- A `geopandas.GeoSeries`\n", - "- A `pyarrow.Array` or `pyarrow.ChunkedArray` (geoarrow text interpreted as well-known text; binary interpreted as well-known binary)\n", - "- Anything that `pyarrow.array()` will convert to a text or binary array\n", - "\n", - "If there is no common geometry type among elements of the input, `as_geoarrow()` will fall back to well-known binary encoding. 
To explicitly convert to well-known text or binary, use `as_wkt()` or `as_wkb()`.\n", - "\n", - "Alternatively, you can construct GeoArrow arrays directly from a series of buffers as described in the specification:" + "Use `geoarrow.pyarrow.to_geopandas()` to convert to [geopandas](https://geopandas.org):" ] }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "PointArray:PointType(geoarrow.point)[3]\n", - "\n", - "\n", - "" + "\n", + "Name: WGS 84\n", + "Axis Info [ellipsoidal]:\n", + "- Lat[north]: Geodetic latitude (degree)\n", + "- Lon[east]: Geodetic longitude (degree)\n", + "Area of Use:\n", + "- name: World.\n", + "- bounds: (-180.0, -90.0, 180.0, 90.0)\n", + "Datum: World Geodetic System 1984 ensemble\n", + "- Ellipsoid: WGS 84\n", + "- Prime Meridian: Greenwich" ] }, - "execution_count": 3, + "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "import numpy as np\n", - "\n", - "ga.point().from_geobuffers(\n", - " None, \n", - " np.array([1.0, 2.0, 3.0]),\n", - " np.array([3.0, 4.0, 5.0])\n", - ")" + "df = ga.to_geopandas(tab)\n", + "df.geometry.crs" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "...and use `GeoDataFrame.to_arrow()` to get it back:" ] }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "PointArray:PointType(interleaved geoarrow.point)[3]\n", - "\n", - "\n", - "" + "ProjJsonCrs(EPSG:4326)" ] }, - "execution_count": 4, + "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "ga.point().with_coord_type(ga.CoordType.INTERLEAVED).from_geobuffers(\n", - " None,\n", - " np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])\n", - ")" + "pa.table(df.to_arrow())[\"geometry\"].type.crs" ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ - "Importing `geoarrow.pyarrow` will 
register the geoarrow extension types with pyarrow such that you can read/write Arrow streams, Arrow files, and Parquet that contains Geoarrow extension types. A number of these files are available from the [geoarrow-data](https://github.com/geoarrow/geoarrow-data) repository." + "These Python bindings also include [GeoParquet](https://geoparquet.org) and [pyogrio](https://github.com/geopandas/pyogrio) integration for direct IO to/from pyarrow. This can be useful when loading data approaching the size of available memory as GeoPandas requires many times more memory for some types of data (notably: large numbers of points)." ] }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "OBJECTID: int64\n", - "FEAT_CODE: string\n", - "LINE_CLASS: int32\n", - "MISCID_1: string\n", - "MISCNAME_1: string\n", - "MISCID_2: string\n", - "MISCNAME_2: string\n", - "HID: string\n", - "MISCID_3: string\n", - "MISCNAME_3: string\n", - "MISCID_4: string\n", - "MISCNAME_4: string\n", - "SHAPE_LEN: double\n", - "geometry: extension>" + "pyarrow.Table\n", + "name: string\n", + "geometry: extension>\n", + "----\n", + "name: [[\"Vatican City\",\"San Marino\",\"Vaduz\",\"Lobamba\",\"Luxembourg\",...,\"Rio de Janeiro\",\"Sao Paulo\",\"Sydney\",\"Singapore\",\"Hong Kong\"]]\n", + "geometry: [[010100000054E57B4622E828408B074AC09EF34440,0101000000DCB122B42FE228402376B7FCD1F74540,01010000006DAE9AE78808234032D989DC1D914740,01010000007BCB8B0233333F40289B728577773AC0,0101000000C08D39741F8518400F2153E34ACE4840,...,0101000000667B47AA269B45C002B53F5745E836C0,0101000000F15A536A405047C0C1148A19868E37C0,0101000000A286FD30CDE662401F04CF2989EF40C0,01010000003A387DE2A5F659409AF3E7363CB8F43F,0101000000D865F84FB78B5C40144438C1924E3640]]" ] }, - "execution_count": 5, + "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "import urllib.request\n", - "from pyarrow import feather\n", + "import 
geoarrow.pyarrow.io\n", "\n", - "url = \"https://github.com/geoarrow/geoarrow-data/releases/download/v0.1.0/ns-water-basin_line.arrow\"\n", - "local_filename, headers = urllib.request.urlretrieve(url)\n", - "feather.read_table(local_filename).schema" + "url = \"https://raw.githubusercontent.com/geoarrow/geoarrow-data/v0.2.0/natural-earth/files/natural-earth_cities.fgb\"\n", + "geoarrow.pyarrow.io.read_pyogrio_table(url)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "pyarrow.Table\n", + "name: string\n", + "geometry: extension>\n", + "----\n", + "name: [[\"Vatican City\",\"San Marino\",\"Vaduz\",\"Lobamba\",\"Luxembourg\",...,\"Rio de Janeiro\",\"Sao Paulo\",\"Sydney\",\"Singapore\",\"Hong Kong\"]]\n", + "geometry: [[010100000054E57B4622E828408B074AC09EF34440,0101000000DCB122B42FE228402376B7FCD1F74540,01010000006DAE9AE78808234032D989DC1D914740,01010000007BCB8B0233333F40289B728577773AC0,0101000000C08D39741F8518400F2153E34ACE4840,...,0101000000667B47AA269B45C002B53F5745E836C0,0101000000F15A536A405047C0C1148A19868E37C0,0101000000A286FD30CDE662401F04CF2989EF40C0,01010000003A387DE2A5F659409AF3E7363CB8F43F,0101000000D865F84FB78B5C40144438C1924E3640]]" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "url = \"https://raw.githubusercontent.com/geoarrow/geoarrow-data/v0.2.0/natural-earth/files/natural-earth_cities_geo.parquet\"\n", + "local_filename, _ = urllib.request.urlretrieve(url)\n", + "\n", + "geoarrow.pyarrow.io.read_geoparquet_table(local_filename)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "The `as_geoarrow()` function can accept a `geopandas.GeoSeries` as input:" + "Finally, a number of compute functions are provided for common transformations required to create/consume arrays of geometries:" ] }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 19, "metadata": {}, "outputs": [ { 
"data": { "text/plain": [ - "MultiLinestringArray:MultiLinestringType(geoarrow.multilinestring <{\"$schema\":\"https://proj.org/schem...>)[255]\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "...245 values...\n", - "\n", - "\n", - "\n", - "\n", - "" + "\n", + "[\n", + " [\n", + " \"POINT (12.4533865 41.9032822)\",\n", + " \"POINT (12.4417702 43.9360958)\",\n", + " \"POINT (9.5166695 47.1337238)\",\n", + " \"POINT (31.1999971 -26.4666675)\",\n", + " \"POINT (6.1300028 49.6116604)\"\n", + " ]\n", + "]" ] }, - "execution_count": 6, + "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "import geopandas\n", + "ga.format_wkt(tab[\"geometry\"])[:5]" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create/Consume GeoArrow Arrays\n", "\n", - "url = \"https://github.com/geoarrow/geoarrow-data/releases/download/v0.1.0/ns-water-basin_line.fgb.zip\"\n", - "df = geopandas.read_file(url)\n", - "array = ga.as_geoarrow(df.geometry)\n", - "array" + "The `geoarrow-pyarrow` package also provides a number of utilities for working with serialized and GeoArrow-native arrays. 
For example, you can create geoarrow-encoded `pyarrow.Array`s with `as_geoarrow()`:" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "GeometryExtensionArray:PointType(geoarrow.point)[1]\n", + "" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ga.as_geoarrow([\"POINT (0 1)\"])" ] }, { @@ -234,76 +271,108 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "You can convert back to geopandas using `to_geopandas()`:" + "This will work with:\n", + "\n", + "- An existing array created by geoarrow\n", + "- A `geopandas.GeoSeries`\n", + "- A `pyarrow.Array` or `pyarrow.ChunkedArray` (geoarrow text interpreted as well-known text; binary interpreted as well-known binary)\n", + "- Anything that `pyarrow.array()` will convert to a text or binary array\n", + "\n", + "If there is no common geometry type among elements of the input, `as_geoarrow()` will fall back to well-known binary encoding. 
To explicitly convert to well-known text or binary, use `as_wkt()` or `as_wkb()`.\n", + "\n", + "Alternatively, you can construct GeoArrow arrays directly from a series of buffers as described in the specification:" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "GeometryExtensionArray:PointType(geoarrow.point)[3]\n", + "\n", + "\n", + "" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import numpy as np\n", + "\n", + "ga.point().from_geobuffers(\n", + " None,\n", + " np.array([1.0, 2.0, 3.0]),\n", + " np.array([3.0, 4.0, 5.0])\n", + ")" ] }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "0 MULTILINESTRING ((648686.211 5099183.050, 6486...\n", - "1 MULTILINESTRING ((687688.017 5117030.253, 6867...\n", - "2 MULTILINESTRING ((631355.706 5122893.354, 6313...\n", - "3 MULTILINESTRING ((665166.211 5138643.057, 6651...\n", - "4 MULTILINESTRING ((673606.211 5162963.061, 6736...\n", - " ... 
\n", - "250 MULTILINESTRING ((681672.818 5078602.647, 6818...\n", - "251 MULTILINESTRING ((414868.067 5093041.934, 4147...\n", - "252 MULTILINESTRING ((414868.067 5093041.934, 4148...\n", - "253 MULTILINESTRING ((414868.067 5093041.934, 4149...\n", - "254 MULTILINESTRING ((648686.211 5099183.050, 6488...\n", - "Length: 255, dtype: geometry" + "GeometryExtensionArray:PointType(interleaved geoarrow.point)[3]\n", + "\n", + "\n", + "" ] }, - "execution_count": 7, + "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "ga.to_geopandas(array)" + "ga.point().with_coord_type(ga.CoordType.INTERLEAVED).from_geobuffers(\n", + " None,\n", + " np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])\n", + ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Pandas integration\n", + "## For Developers\n", "\n", - "The `geoarrow-pandas` package provides an extension array that wraps geoarrow memory and an accessor that provides pandas-friendly wrappers around the compute functions available in `geoarrow.pyarrow`." + "One of the challenges with GeoArrow data is the large number of permutations between X, Y, Z, M, geometry types, and serialized encodings. The `geoarrow-types` package provides pure Python utilities to manage, compute on, and specify these types (or parts of them, as required)." 
] }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "0 MULTIPOINT (277022.6936181751 4820886.609673489)\n", - "1 MULTIPOINT (315701.2552756762 4855051.378571571)\n", - "2 MULTIPOINT (255728.65994492616 4851022.107901295)\n", - "3 MULTIPOINT (245206.7841665779 4895609.409696873)\n", - "4 MULTIPOINT (337143.18135472975 4860312.288760258)\n", - "dtype: string[pyarrow]" + "MultiPointType(geoarrow.multipoint_zm)" ] }, - "execution_count": 8, + "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "import geoarrow.pandas as _\n", - "import pandas as pd\n", + "import geoarrow.types as gt\n", "\n", - "df = pd.read_feather(\"https://github.com/geoarrow/geoarrow-data/releases/download/v0.1.0/ns-water-basin_point.arrow\")\n", - "df.geometry.geoarrow.format_wkt().head(5)" + "gt.TypeSpec.common(\n", + " gt.Encoding.GEOARROW,\n", + " gt.GeometryType.POINT,\n", + " gt.GeometryType.MULTIPOINT,\n", + " gt.Dimensions.XYM,\n", + " gt.Dimensions.XYZ,\n", + ").to_pyarrow()" ] }, { @@ -318,7 +387,7 @@ "\n", "```shell\n", "git clone https://github.com/geoarrow/geoarrow-python.git\n", - "pip install -e geoarrow-pyarrow/ geoarrow-pandas/\n", + "pip install -e geoarrow-pyarrow/ geoarrow-types/\n", "```\n", "\n", "Tests use [pytest](https://docs.pytest.org/):\n", @@ -331,7 +400,7 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3", + "display_name": ".venv", "language": "python", "name": "python3" }, @@ -345,7 +414,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.2" + "version": "3.13.3" }, "orig_nbformat": 4 }, diff --git a/README.md b/README.md index 0e75307..c5d843a 100644 --- a/README.md +++ b/README.md @@ -1,206 +1,243 @@ # GeoArrow for Python -The GeoArrow Python packages provide an implementation of the [GeoArrow specification](https://github.com/geoarrow/geoarrow) that integrates with 
[pyarrow](https://arrow.apache.org/docs/python) and [pandas](https://pandas.pydata.org/). The GeoArrow Python bindings enable input/output to/from Arrow-friendly formats (e.g., Parquet, Arrow Stream, Arrow File) and general-purpose coordinate shuffling tools among GeoArrow, WKT, and WKB encodings. +The GeoArrow Python packages provide an implementation of the [GeoArrow specification](https://geoarrow.org) that integrates with [pyarrow](https://arrow.apache.org/docs/python). The GeoArrow Python bindings enable input/output to/from Arrow-friendly formats (e.g., Parquet, Arrow Stream, Arrow File) and general-purpose coordinate shuffling tools among GeoArrow, WKT, and WKB encodings. ## Installation Python bindings for GeoArrow are available on PyPI. You can install them with: ```bash -pip install geoarrow-pyarrow geoarrow-pandas +pip install geoarrow-pyarrow ``` -You can install the latest development versions with: +You can install the latest development version with: ```bash -pip install "git+https://github.com/geoarrow/geoarrow-python.git#egg=geoarrow-pyarrow&subdirectory=geoarrow-pyarrow" -pip install "git+https://github.com/geoarrow/geoarrow-python.git#egg=geoarrow-pandas&subdirectory=geoarrow-pandas" +pip install "git+https://github.com/geoarrow/geoarrow-python.git#subdirectory=geoarrow-pyarrow" ``` -If you can import the namespaces, you're good to go! +If you can import the namespace, you're good to go! ```python import geoarrow.pyarrow as ga -import geoarrow.pandas as _ ``` -## Examples +## Example -You can create geoarrow-encoded `pyarrow.Array`s with `as_geoarrow()`: +The most important thing that `geoarrow.pyarrow` does is register pyarrow extension types so that metadata is kept intact when reading files or interacting with other libraries. 
For example, we can now read Arrow IPC files written with GeoArrow extension types and the CRS and geometry type are kept: ```python -ga.as_geoarrow(["POINT (0 1)"]) +import pyarrow as pa +import urllib.request + +url = "https://raw.githubusercontent.com/geoarrow/geoarrow-data/v0.2.0/natural-earth/files/natural-earth_cities_wkb.arrows" +with urllib.request.urlopen(url) as f, pa.ipc.open_stream(f) as reader: + tab = reader.read_all() + +tab.schema.field("geometry").type ``` - PointArray:PointType(geoarrow.point)[1] - + WkbType(geoarrow.wkb ) -This will work with: +Use `geoarrow.pyarrow.to_geopandas()` to convert to [geopandas](https://geopandas.org): -- An existing array created by geoarrow -- A `geopandas.GeoSeries` -- A `pyarrow.Array` or `pyarrow.ChunkedArray` (geoarrow text interpreted as well-known text; binary interpreted as well-known binary) -- Anything that `pyarrow.array()` will convert to a text or binary array -If there is no common geometry type among elements of the input, `as_geoarrow()` will fall back to well-known binary encoding. To explicitly convert to well-known text or binary, use `as_wkt()` or `as_wkb()`. +```python +df = ga.to_geopandas(tab) +df.geometry.crs +``` + + + + + + Name: WGS 84 + Axis Info [ellipsoidal]: + - Lat[north]: Geodetic latitude (degree) + - Lon[east]: Geodetic longitude (degree) + Area of Use: + - name: World. 
+ - bounds: (-180.0, -90.0, 180.0, 90.0) + Datum: World Geodetic System 1984 ensemble + - Ellipsoid: WGS 84 + - Prime Meridian: Greenwich -Alternatively, you can construct GeoArrow arrays directly from a series of buffers as described in the specification: + + +...and use `GeoDataFrame.to_arrow()` to get it back: ```python -import numpy as np +pa.table(df.to_arrow())["geometry"].type.crs +``` -ga.point().from_geobuffers( - None, - np.array([1.0, 2.0, 3.0]), - np.array([3.0, 4.0, 5.0]) -) + + + + ProjJsonCrs(EPSG:4326) + + + +These Python bindings also include [GeoParquet](https://geoparquet.org) and [pyogrio](https://github.com/geopandas/pyogrio) integration for direct IO to/from pyarrow. This can be useful when loading data approaching the size of available memory as GeoPandas requires many times more memory for some types of data (notably: large numbers of points). + + +```python +import geoarrow.pyarrow.io + +url = "https://raw.githubusercontent.com/geoarrow/geoarrow-data/v0.2.0/natural-earth/files/natural-earth_cities.fgb" +geoarrow.pyarrow.io.read_pyogrio_table(url) ``` - PointArray:PointType(geoarrow.point)[3] - - - + pyarrow.Table + name: string + geometry: extension> + ---- + name: [["Vatican City","San Marino","Vaduz","Lobamba","Luxembourg",...,"Rio de Janeiro","Sao Paulo","Sydney","Singapore","Hong Kong"]] + geometry: [[010100000054E57B4622E828408B074AC09EF34440,0101000000DCB122B42FE228402376B7FCD1F74540,01010000006DAE9AE78808234032D989DC1D914740,01010000007BCB8B0233333F40289B728577773AC0,0101000000C08D39741F8518400F2153E34ACE4840,...,0101000000667B47AA269B45C002B53F5745E836C0,0101000000F15A536A405047C0C1148A19868E37C0,0101000000A286FD30CDE662401F04CF2989EF40C0,01010000003A387DE2A5F659409AF3E7363CB8F43F,0101000000D865F84FB78B5C40144438C1924E3640]] ```python -ga.point().with_coord_type(ga.CoordType.INTERLEAVED).from_geobuffers( - None, - np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]) -) +url = 
"https://raw.githubusercontent.com/geoarrow/geoarrow-data/v0.2.0/natural-earth/files/natural-earth_cities_geo.parquet" +local_filename, _ = urllib.request.urlretrieve(url) + +geoarrow.pyarrow.io.read_geoparquet_table(local_filename) ``` - PointArray:PointType(interleaved geoarrow.point)[3] - - - + pyarrow.Table + name: string + geometry: extension> + ---- + name: [["Vatican City","San Marino","Vaduz","Lobamba","Luxembourg",...,"Rio de Janeiro","Sao Paulo","Sydney","Singapore","Hong Kong"]] + geometry: [[010100000054E57B4622E828408B074AC09EF34440,0101000000DCB122B42FE228402376B7FCD1F74540,01010000006DAE9AE78808234032D989DC1D914740,01010000007BCB8B0233333F40289B728577773AC0,0101000000C08D39741F8518400F2153E34ACE4840,...,0101000000667B47AA269B45C002B53F5745E836C0,0101000000F15A536A405047C0C1148A19868E37C0,0101000000A286FD30CDE662401F04CF2989EF40C0,01010000003A387DE2A5F659409AF3E7363CB8F43F,0101000000D865F84FB78B5C40144438C1924E3640]] -Importing `geoarrow.pyarrow` will register the geoarrow extension types with pyarrow such that you can read/write Arrow streams, Arrow files, and Parquet that contains Geoarrow extension types. A number of these files are available from the [geoarrow-data](https://github.com/geoarrow/geoarrow-data) repository. 
+Finally, a number of compute functions are provided for common transformations required to create/consume arrays of geometries: ```python -import urllib.request -from pyarrow import feather +ga.format_wkt(tab["geometry"])[:5] +``` + + + + + + [ + [ + "POINT (12.4533865 41.9032822)", + "POINT (12.4417702 43.9360958)", + "POINT (9.5166695 47.1337238)", + "POINT (31.1999971 -26.4666675)", + "POINT (6.1300028 49.6116604)" + ] + ] + + + +## Create/Consume GeoArrow Arrays -url = "https://github.com/geoarrow/geoarrow-data/releases/download/v0.1.0/ns-water-basin_line.arrow" -local_filename, headers = urllib.request.urlretrieve(url) -feather.read_table(local_filename).schema +The `geoarrow-pyarrow` package also provides a number of utilities for working with serialized and GeoArrow-native arrays. For example, you can create geoarrow-encoded `pyarrow.Array`s with `as_geoarrow()`: + + +```python +ga.as_geoarrow(["POINT (0 1)"]) ``` - OBJECTID: int64 - FEAT_CODE: string - LINE_CLASS: int32 - MISCID_1: string - MISCNAME_1: string - MISCID_2: string - MISCNAME_2: string - HID: string - MISCID_3: string - MISCNAME_3: string - MISCID_4: string - MISCNAME_4: string - SHAPE_LEN: double - geometry: extension> + GeometryExtensionArray:PointType(geoarrow.point)[1] + -The `as_geoarrow()` function can accept a `geopandas.GeoSeries` as input: +This will work with: + +- An existing array created by geoarrow +- A `geopandas.GeoSeries` +- A `pyarrow.Array` or `pyarrow.ChunkedArray` (geoarrow text interpreted as well-known text; binary interpreted as well-known binary) +- Anything that `pyarrow.array()` will convert to a text or binary array + +If there is no common geometry type among elements of the input, `as_geoarrow()` will fall back to well-known binary encoding. To explicitly convert to well-known text or binary, use `as_wkt()` or `as_wkb()`. 
+ +Alternatively, you can construct GeoArrow arrays directly from a series of buffers as described in the specification: ```python -import geopandas +import numpy as np -url = "https://github.com/geoarrow/geoarrow-data/releases/download/v0.1.0/ns-water-basin_line.fgb.zip" -df = geopandas.read_file(url) -array = ga.as_geoarrow(df.geometry) -array +ga.point().from_geobuffers( + None, + np.array([1.0, 2.0, 3.0]), + np.array([3.0, 4.0, 5.0]) +) ``` - MultiLinestringArray:MultiLinestringType(geoarrow.multilinestring <{"$schema":"https://proj.org/schema...>)[255] - - - - - - ...245 values... - - - - - - + GeometryExtensionArray:PointType(geoarrow.point)[3] + + + -You can convert back to geopandas using `to_geopandas()`: ```python -ga.to_geopandas(array) +ga.point().with_coord_type(ga.CoordType.INTERLEAVED).from_geobuffers( + None, + np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]) +) ``` - 0 MULTILINESTRING ((648686.211 5099183.050, 6486... - 1 MULTILINESTRING ((687688.017 5117030.253, 6867... - 2 MULTILINESTRING ((631355.706 5122893.354, 6313... - 3 MULTILINESTRING ((665166.211 5138643.057, 6651... - 4 MULTILINESTRING ((673606.211 5162963.061, 6736... - ... - 250 MULTILINESTRING ((681672.818 5078602.647, 6818... - 251 MULTILINESTRING ((414868.067 5093041.934, 4147... - 252 MULTILINESTRING ((414868.067 5093041.934, 4148... - 253 MULTILINESTRING ((414868.067 5093041.934, 4149... - 254 MULTILINESTRING ((648686.211 5099183.050, 6488... - Length: 255, dtype: geometry + GeometryExtensionArray:PointType(interleaved geoarrow.point)[3] + + + -## Pandas integration +## For Developers -The `geoarrow-pandas` package provides an extension array that wraps geoarrow memory and an accessor that provides pandas-friendly wrappers around the compute functions available in `geoarrow.pyarrow`. +One of the challenges with GeoArrow data is the large number of permutations between X, Y, Z, M, geometry types, and serialized encodings. 
The `geoarrow-types` package provides pure Python utilities to manage, compute on, and specify these types (or parts of them, as required). ```python -import geoarrow.pandas as _ -import pandas as pd - -df = pd.read_feather("https://github.com/geoarrow/geoarrow-data/releases/download/v0.1.0/ns-water-basin_point.arrow") -df.geometry.geoarrow.format_wkt().head(5) +import geoarrow.types as gt + +gt.TypeSpec.common( + gt.Encoding.GEOARROW, + gt.GeometryType.POINT, + gt.GeometryType.MULTIPOINT, + gt.Dimensions.XYM, + gt.Dimensions.XYZ, +).to_pyarrow() ``` - 0 MULTIPOINT (277022.6936181751 4820886.609673489) - 1 MULTIPOINT (315701.2552756762 4855051.378571571) - 2 MULTIPOINT (255728.65994492616 4851022.107901295) - 3 MULTIPOINT (245206.7841665779 4895609.409696873) - 4 MULTIPOINT (337143.18135472975 4860312.288760258) - dtype: string[pyarrow] + MultiPointType(geoarrow.multipoint_zm) @@ -211,7 +248,7 @@ This means you can build the project using: ```shell git clone https://github.com/geoarrow/geoarrow-python.git -pip install -e geoarrow-pyarrow/ geoarrow-pandas/ +pip install -e geoarrow-pyarrow/ geoarrow-types/ ``` Tests use [pytest](https://docs.pytest.org/):