Update pgstac_reader #101
…ample for dumping pgstac partitions
@ghidalgo3 ^^^
stac_geoparquet/pgstac_reader.py
Outdated
# types are consistent across all items.
if "naip:year" in item["properties"]:
    item["properties"]["naip:year"] = int(item["properties"]["naip:year"])
if "proj:epsg" in item["properties"]:
I don't think stac-geoparquet should have special cases for NAIP, or any other particular collection. Can we have a callable argument to pgstac_to_iter that allows callers to inspect and modify the item before it is yielded by the iterator?
Consider this item: https://planetarycomputer.microsoft.com/api/stac/v1/collections/modis-21A2-061/items/MYD21A2.A2025113.h35v10.061.2025125160047
When I tried to export this collection, I got this error:
File "/Users/gustavo/miniconda3/envs/pctasks312/lib/python3.12/site-packages/stac_geoparquet/pgstac_reader.py", line 89, in __call__
item["properties"]["proj:epsg"] = int(item["properties"]["proj:epsg"])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
OK, I removed that and moved the special case for my test items into the row_func, which is called once per item.
I'm still having problems with item["properties"]["proj:epsg"] on OpenPC's STAC items. Some STAC items have a null properties.proj:epsg, and this cast runs before the row_func is called, so exporting any collection with a null proj:epsg (like the item linked above) errors out.
Can you remove this cast and make it the responsibility of the row_func?
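For illustration, a null-tolerant cast could live in a caller-supplied row_func along these lines. This is only a sketch: `cast_int_properties` and `iter_items` are hypothetical names, `iter_items` merely stands in for the library's iterator, and the assumption that a row_func takes one item dict and returns it is not confirmed by this thread.

```python
from typing import Any, Callable, Iterable, Iterator, Optional

Item = dict[str, Any]
RowFunc = Callable[[Item], Item]


def cast_int_properties(
    item: Item, keys: tuple[str, ...] = ("naip:year", "proj:epsg")
) -> Item:
    """Hypothetical row_func: cast selected properties to int, skipping nulls."""
    props = item.get("properties") or {}
    for key in keys:
        value = props.get(key)
        # A missing or null value (e.g. "proj:epsg": null) is left untouched
        # instead of being passed to int(), which would raise TypeError.
        if value is not None:
            props[key] = int(value)
    return item


def iter_items(
    items: Iterable[Item], row_func: Optional[RowFunc] = None
) -> Iterator[Item]:
    """Stand-in for the library iterator (assumption): apply row_func per item."""
    for item in items:
        yield row_func(item) if row_func is not None else item
```

With this shape, the library itself stays free of collection-specific logic, and a caller exporting a collection with null EPSG codes can decide whether to skip, default, or drop the value.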
d'oh. missed that one. removed now.
…s inline with all pgstac_to* functions, add sync pgstac_to_parquet function
@ghidalgo3 I made it so that you can pass a row_func that can transform items to every pgstac_to* function. I moved the code to fix the issues with naip:year in my example dataset into the row_func that is used in the notebook. I also made the sync example an actual function in the library.
)

to_parquet(
    record_batch_reader,
Parsing output_path into a Path doesn't work well with fsspec, because it complains that only local filesystems can use Path. I don't think it's necessary to parse output_path into a Path. Can you leave it as a string?
Hmm, yeah. I was definitely only thinking of the local filesystem here, and used Path because we need to make sure that directories exist before putting things in them. I'll need to look at what utilities there are in fsspec land, or alternatively make sure that I only create directories when using a local filesystem.
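One way the second option might look is sketched below: keep output_path as a plain string and create the parent directory only when the path appears to be local. `ensure_local_parent_dir` is a hypothetical helper, not part of stac-geoparquet, and the "://" check is a simplifying assumption rather than how fsspec resolves protocols. fsspec filesystem objects also expose their own `makedirs`, which may turn out to be the better fit.

```python
import os
import tempfile


def ensure_local_parent_dir(output_path: str) -> str:
    """Create the parent directory for local paths; return the string unchanged.

    Assumption: anything containing "://" other than "file://" is remote,
    and remote stores do not need directories created up front.
    """
    is_remote = "://" in output_path and not output_path.startswith("file://")
    if not is_remote:
        local_path = output_path.removeprefix("file://")
        parent = os.path.dirname(local_path)
        if parent:
            os.makedirs(parent, exist_ok=True)
    # The caller still receives a str, so remote URLs pass through untouched.
    return output_path


# Usage: a nested local target gets its directories; a remote URL is a no-op.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "exports", "naip", "items.parquet")
    ensure_local_parent_dir(target)
    print(os.path.isdir(os.path.dirname(target)))

ensure_local_parent_dir("abfs://container/exports/items.parquet")
```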
I can't approve, but I have tested these changes in the …
Supersedes #97