Add pygmt.gmtread to read a dataset/grid/image into pandas.DataFrame/xarray.DataArray #3673
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -172,6 +172,7 @@ Input/output | |
| :toctree: generated | ||
|
|
||
| load_dataarray | ||
| read | ||
|
|
||
| GMT Defaults | ||
| ------------ | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| """ | ||
| PyGMT input/output (I/O) utilities. | ||
| """ | ||
|
|
||
| from pygmt.io.gmtread import gmtread | ||
| from pygmt.io.load_dataarray import load_dataarray |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,125 @@ | ||
| """ | ||
| Read a file into an appropriate object. | ||
| """ | ||
|
|
||
| from collections.abc import Mapping, Sequence | ||
| from pathlib import PurePath | ||
| from typing import Any, Literal | ||
|
|
||
| import pandas as pd | ||
| import xarray as xr | ||
| from pygmt.clib import Session | ||
| from pygmt.helpers import build_arg_list, is_nonstr_iter | ||
| from pygmt.src.which import which | ||
|
|
||
|
|
||
| def gmtread( | ||
| file: str | PurePath, | ||
| kind: Literal["dataset", "grid", "image"], | ||
|
Member
Does GMT read also handle 'cube'?
Member
Author
Yes (xref: https://github.com/GenericMappingTools/gmt/blob/9a8769f905c2b55cf62ed57cd0c21e40c00b3560/src/gmtread.c#L75-L81), but we need to wait for #3150, which may have upstream bugs. |
||
| region: Sequence[float] | str | None = None, | ||
| header: int | None = None, | ||
| column_names: pd.Index | None = None, | ||
| dtype: type | Mapping[Any, type] | None = None, | ||
| index_col: str | int | None = None, | ||
| ) -> pd.DataFrame | xr.DataArray: | ||
|
Comment on lines
+16
to
+24
Member
On second thought, I'm thinking if we should make
Member
Author
It seems the
Member
Yes, not needed for grids/images, but we could still use |
||
| """ | ||
| Read a dataset, grid, or image from a file and return the appropriate object. | ||
|
|
||
| The returned object is a :class:`pandas.DataFrame` for datasets, and | ||
| :class:`xarray.DataArray` for grids and images. | ||
|
|
||
| For datasets, keyword arguments ``column_names``, ``header``, ``dtype``, and | ||
| ``index_col`` are supported. | ||
|
|
||
| Parameters | ||
| ---------- | ||
| file | ||
| The file name to read. | ||
| kind | ||
| The kind of data to read. Valid values are ``"dataset"``, ``"grid"``, and | ||
| ``"image"``. | ||
| region | ||
| The region of interest. Only data within this region will be read. | ||
| column_names | ||
| A list of column names. | ||
| header | ||
| Row number containing column names. ``header=None`` means not to parse the | ||
| column names from table header. Ignored if the row number is larger than the | ||
| number of headers in the table. | ||
| dtype | ||
| Data type. Can be a single type for all columns or a dictionary mapping | ||
| column names to types. | ||
| index_col | ||
| Column to set as index. | ||
|
Comment on lines
+43
to
+53
Member
Should we indicate in the docstring that these params are only used for
Member
Author
At line 31:
|
||
|
|
||
| Returns | ||
| ------- | ||
| data | ||
| Return type depends on the ``kind`` argument: | ||
|
|
||
| - ``"dataset"``: :class:`pandas.DataFrame` | ||
| - ``"grid"`` or ``"image"``: :class:`xarray.DataArray` | ||
|
|
||
|
|
||
| Examples | ||
| -------- | ||
| Read a dataset into a :class:`pandas.DataFrame` object: | ||
|
|
||
| >>> from pygmt import gmtread | ||
| >>> df = gmtread("@hotspots.txt", kind="dataset") | ||
| >>> type(df) | ||
| <class 'pandas.core.frame.DataFrame'> | ||
|
|
||
| Read a grid into an :class:`xarray.DataArray` object: | ||
|
|
||
| >>> dataarray = gmtread("@earth_relief_01d", kind="grid") | ||
| >>> type(dataarray) | ||
| <class 'xarray.core.dataarray.DataArray'> | ||
|
|
||
| Read an image into an :class:`xarray.DataArray` object: | ||
| >>> image = gmtread("@earth_day_01d", kind="image") | ||
| >>> type(image) | ||
| <class 'xarray.core.dataarray.DataArray'> | ||
| """ | ||
| if kind not in {"dataset", "grid", "image"}: | ||
| msg = f"Invalid kind '{kind}': must be one of 'dataset', 'grid', or 'image'." | ||
| raise ValueError(msg) | ||
|
|
||
| if kind != "dataset" and any( | ||
| v is not None for v in [column_names, header, dtype, index_col] | ||
| ): | ||
| msg = ( | ||
| "Only the 'dataset' kind supports the 'column_names', 'header', 'dtype', " | ||
| "and 'index_col' arguments." | ||
| ) | ||
| raise ValueError(msg) | ||
|
|
||
| kwdict = { | ||
| "R": "/".join(f"{v}" for v in region) if is_nonstr_iter(region) else region, # type: ignore[union-attr] | ||
| "T": {"dataset": "d", "grid": "g", "image": "i"}[kind], | ||
| } | ||
|
|
||
| with Session() as lib: | ||
| with lib.virtualfile_out(kind=kind) as voutfile: | ||
| lib.call_module( | ||
| module="read", args=[file, voutfile, *build_arg_list(kwdict)] | ||
| ) | ||
|
|
||
| match kind: | ||
| case "dataset": | ||
| return lib.virtualfile_to_dataset( | ||
| vfname=voutfile, | ||
| column_names=column_names, | ||
| header=header, | ||
| dtype=dtype, | ||
| index_col=index_col, | ||
| ) | ||
| case "grid" | "image": | ||
| raster = lib.virtualfile_to_raster(vfname=voutfile, kind=kind) | ||
| # Add "source" encoding | ||
| source = which(fname=file) | ||
| raster.encoding["source"] = ( | ||
| source[0] if isinstance(source, list) else source | ||
| ) | ||
| _ = raster.gmt # Load GMTDataArray accessor information | ||
| return raster | ||
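The argument handling in `gmtread` can be tried out without GMT installed. The following is a minimal sketch, not the PR's code: `build_read_options` and `_KIND_FLAGS` are hypothetical names, and the sketch only mirrors the two validation checks and the `-R`/`-T` option building shown in the diff above.

```python
# Hypothetical helper mirroring the validation and option building in gmtread.
# Mapping from the ``kind`` argument to the value passed to GMT's -T option.
_KIND_FLAGS = {"dataset": "d", "grid": "g", "image": "i"}


def build_read_options(kind, region=None, **dataset_options):
    """Validate the arguments and return the -R/-T option values as a dict."""
    if kind not in _KIND_FLAGS:
        msg = f"Invalid kind '{kind}': must be one of 'dataset', 'grid', or 'image'."
        raise ValueError(msg)

    # column_names, header, dtype, and index_col only make sense for datasets.
    if kind != "dataset" and any(v is not None for v in dataset_options.values()):
        msg = (
            "Only the 'dataset' kind supports the 'column_names', 'header', "
            "'dtype', and 'index_col' arguments."
        )
        raise ValueError(msg)

    # A sequence such as [0, 10, -5, 5] becomes the string "0/10/-5/5";
    # a string (or None) is passed through unchanged.
    if region is not None and not isinstance(region, str):
        region = "/".join(f"{v}" for v in region)

    return {"R": region, "T": _KIND_FLAGS[kind]}


options = build_read_options("grid", region=[0, 10, -5, 5])
```

Here `options` holds the values that would be handed to `build_arg_list` in the real function; a string region such as `"0/10/-5/5"` is accepted as-is.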
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,61 @@ | ||
| """ | ||
| Test the gmtread function. | ||
| """ | ||
|
|
||
| import importlib.util | ||
|
|
||
| import numpy as np | ||
| import pytest | ||
| import rioxarray | ||
| import xarray as xr | ||
| from pygmt import gmtread, which | ||
|
|
||
| _HAS_NETCDF4 = bool(importlib.util.find_spec("netCDF4")) | ||
| _HAS_RIORASTERIO = bool(importlib.util.find_spec("rioxarray")) | ||
|
|
||
|
|
||
| @pytest.mark.skipif(not _HAS_NETCDF4, reason="netCDF4 is not installed.") | ||
| def test_io_gmtread_grid(): | ||
| """ | ||
| Test that reading a grid returns an xr.DataArray and the grid is the same as the one | ||
| loaded via xarray.load_dataarray. | ||
| """ | ||
| grid = gmtread("@static_earth_relief.nc", kind="grid") | ||
| assert isinstance(grid, xr.DataArray) | ||
| expected_grid = xr.load_dataarray(which("@static_earth_relief.nc", download="a")) | ||
| assert np.allclose(grid, expected_grid) | ||
|
Comment on lines
+17
to
+26
Member
Also should have a similar test for
Member
Author
Done in a6c4ee7. When I tried to add a test for reading datasets, I realized that the DataFrame returned by the [...] The last column [...] We have three options: [...]
I'm inclined to option 3.
Member
Agree with this. We should also add dtype related checks for the tabular dataset tests in |
||
|
|
||
|
|
||
| @pytest.mark.skipif(not _HAS_RIORASTERIO, reason="rioxarray is not installed.") | ||
| def test_io_gmtread_image(): | ||
| """ | ||
| Test that reading an image returns an xr.DataArray. | ||
| """ | ||
| image = gmtread("@earth_day_01d", kind="image") | ||
| assert isinstance(image, xr.DataArray) | ||
| with rioxarray.open_rasterio( | ||
| which("@earth_day_01d", download="a") | ||
| ) as expected_image: | ||
| assert np.allclose(image, expected_image) | ||
|
|
||
|
|
||
| def test_io_gmtread_invalid_kind(): | ||
| """ | ||
| Test that an invalid kind raises a ValueError. | ||
| """ | ||
| with pytest.raises(ValueError, match="Invalid kind"): | ||
| gmtread("file.cpt", kind="cpt") | ||
|
|
||
|
|
||
| def test_io_gmtread_invalid_arguments(): | ||
| """ | ||
| Test that invalid arguments raise a ValueError for non-'dataset' kind. | ||
| """ | ||
| with pytest.raises(ValueError, match="Only the 'dataset' kind supports"): | ||
| gmtread("file.nc", kind="grid", column_names="foo") | ||
|
|
||
| with pytest.raises(ValueError, match="Only the 'dataset' kind supports"): | ||
| gmtread("file.nc", kind="grid", header=1) | ||
|
|
||
| with pytest.raises(ValueError, match="Only the 'dataset' kind supports"): | ||
| gmtread("file.nc", kind="grid", dtype="float") | ||
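The test module above gates tests on optional dependencies by checking whether a module can be found, without importing it. A minimal standard-library sketch of that check follows; `has_module` is a hypothetical helper name, not part of PyGMT.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    # find_spec returns None when no module with this top-level name is found.
    return importlib.util.find_spec(name) is not None


# "json" ships with Python; the second name is assumed not to be installed.
found_json = has_module("json")
found_missing = has_module("no_such_module_for_this_sketch")
```

In a test file the result would typically feed a `pytest.mark.skipif` condition, as the `_HAS_NETCDF4` flag does in the diff. Note that the check only helps if the optional package is not also imported unconditionally at the top of the module.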
The `load_dataarray` function was put under the `pygmt.io` namespace. Should we consider putting `read` under `pygmt.io` too? (Thinking about whether we need a low-level `pygmt.clib.read` and high-level `pygmt.io.read` in my other comment).
Yes, that sounds good. I have two questions:
1. [...] `read` source code in `pygmt/io.py`, or restructure `io.py` into a directory and put it in `pygmt/io/read.py` instead?
2. [...] `load_dataarray` function in favor of the new `read` function?
I'm expecting to have a `write` function that writes a pandas.DataFrame/xarray.DataArray into a tabular/netCDF file.
GMT.jl also wraps the `read` module (xref: https://www.generic-mapping-tools.org/GMTjl_doc/documentation/utilities/gmtread/). The differences are:
- [...] `gmtread`, which I think is better since `read` is a little too general.
- [...] `GMTVector`, `GMTGrid`. [This doesn't work in PyGMT]
I think making the `io` directory sounds good, especially if you're planning on making a `write` function in the future.
No, let's keep `load_dataarray` for now. Something I'm contemplating is to make an xarray BackendEntrypoint that uses GMTread, so that users can then do `pygmt.io.load_dataarray(..., engine="gmtread")` or something like that. The `load_dataarray` function would use this new `gmtread` backend engine by default instead of `netcdf4`.
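The backend idea in the comment above could look roughly like the sketch below. It assumes xarray's `BackendEntrypoint` plugin interface and is not the PR's code: `GMTReadBackendEntrypoint` and `fake_gmtread` are hypothetical names, and `fake_gmtread` stands in for the real `gmtread` so that the sketch runs without GMT installed.

```python
import numpy as np
import xarray as xr
from xarray.backends import BackendEntrypoint


def fake_gmtread(file, kind="grid"):
    """Hypothetical stand-in for gmtread; returns a small dummy grid."""
    return xr.DataArray(np.zeros((2, 3)), dims=("lat", "lon"), name="z")


class GMTReadBackendEntrypoint(BackendEntrypoint):
    """Sketch of an xarray backend that delegates file reading to a reader."""

    description = "Read grids/images via a gmtread-like function (sketch)."
    open_dataset_parameters = ("filename_or_obj", "drop_variables", "kind")

    def open_dataset(self, filename_or_obj, *, drop_variables=None, kind="grid"):
        # xarray backends must return a Dataset, so wrap the DataArray.
        dataarray = fake_gmtread(filename_or_obj, kind=kind)
        return dataarray.to_dataset()


# Passing the class itself as ``engine`` avoids registering an entry point.
dataarray = xr.load_dataarray("example.nc", engine=GMTReadBackendEntrypoint)
```

To use a string such as `engine="gmtread"`, the class would additionally have to be registered under the `xarray.backends` entry-point group in the package metadata.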