Conversation
@phargogh I think this is ready for an initial review, with the understanding that, if we decide to go with this implementation, we'll need to add FlatGeobufs alongside our shapefiles to use for clipping before these changes can be deployed.
phargogh left a comment
Thanks @megannissel ! I just had a handful of minor questions and maybe one thing that might make sense to address, but this is looking good to me!
```python
        finally:
            if invalid:
                invalid_feature_count += 1
                app.logger.warning(
                    f"The geometry at feature {feature.GetFID()} is invalid "
                    "and will be skipped.")

    target_layer.CommitTransaction()

    if invalid_feature_count:
        app.logger.warning(
            f"{invalid_feature_count} features in {source_vector_path} "
            "were found to be invalid during clipping and were skipped.")

    base_layer = None
    base_vector = None
    target_layer = None
    target_vector = None
```
Now that I'm thinking of it, should we maybe move the `target_layer.CommitTransaction()` and all of the `= None`s into the `finally` block? Doing so might help make sure that we're always closing files, committing transactions, etc., even if there's an error.
We can't move them into the `finally` as written, since that `try`/`except`/`finally` block is inside of a `for` loop that's iterating over the features. We could wrap the whole thing in another `try`/`except`, though.
Oh right, I forgot about the loop up there! OK yes, and especially since we're running this on Linux, it isn't as necessary to wrap things up in that way.
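The restructuring discussed above can be sketched with pure-Python stand-ins for the GDAL layer objects (all names here are hypothetical, not the PR's actual code): an outer `try`/`finally` around the whole feature loop guarantees the transaction is committed even if iteration raises partway through.

```python
class StubLayer:
    """Minimal stand-in for an ogr.Layer, for illustration only."""

    def __init__(self, features=None):
        self.features = features or []
        self.committed = False

    def StartTransaction(self):
        pass

    def CommitTransaction(self):
        self.committed = True

    def CreateFeature(self, feature):
        self.features.append(feature)


def copy_valid_features(source_layer, target_layer):
    """Copy features, skipping invalid ones; always commit."""
    invalid_feature_count = 0
    target_layer.StartTransaction()
    try:
        for feature in source_layer.features:
            if not feature.get('valid', True):
                invalid_feature_count += 1
                continue  # skip invalid geometries, as in the PR
            target_layer.CreateFeature(feature)
    finally:
        # Runs whether or not the loop raised, so the transaction is
        # always committed; releasing handles (the `= None` lines)
        # would follow the same pattern.
        target_layer.CommitTransaction()
    return invalid_feature_count
```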
clipping-service/app/app.py (Outdated)
```python
        app.logger.error("Failed to read raster info for %s", vsi_raster_path)
        raise

def cached_file_info(vsi_file_path, file_type):
    app.logger.info("Getting file info...")
```
Thinking ahead to when we need to look at the logs in GCP, might it be useful to also log the file type and vsi filepath?
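A minimal sketch of what that could look like (the logger setup here is illustrative, not the app's actual configuration): including both the file type and the path in the message makes the GCP log entries searchable by either field.

```python
import logging

logger = logging.getLogger("clipping-service")


def cached_file_info(vsi_file_path, file_type):
    # Log both fields so the log entry identifies which file the
    # request was for and what type of file it was.
    logger.info("Getting %s info for %s", file_type, vsi_file_path)
```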
clipping-service/app/app.py (Outdated)
```python
    if source_file_type == VECTOR:
        source_file_path = source_file_path.replace('.mvt', '.fgb')
```
Are we assuming that the MVT is what we will be passing to the clipping layer? If so, should we assert that it has the expected extension?
Right now it's always `.mvt`, since that's all we're supporting for the map preview at the moment. But this comment got me thinking that it's probably a good idea to keep things flexible in case we start supporting GeoJSON again (or some other format) in the future, especially since the exact extension of the source file doesn't really matter and the point here is to swap extensions!
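One format-agnostic way to do that swap (a sketch; `to_flatgeobuf_path` is a hypothetical helper, not code from the PR) is to split off whatever extension the source has instead of hard-coding `.mvt`:

```python
import os


def to_flatgeobuf_path(source_file_path):
    # Replace whatever vector extension we were handed (.mvt today,
    # maybe .geojson later) with the FlatGeobuf extension.
    root, _ext = os.path.splitext(source_file_path)
    return root + '.fgb'
```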
Hello! @megannissel, I am not familiar with FlatGeobufs, but it seems to me to be a valid choice for our purposes. Do you have an example of the code you used to convert the NOAA Shorelines shapefile to that format? I'd like to see the processing you used.
@empavia Absolutely! The code I used was very simple. As with other things on the Hub, the filenames should match between the source shapefile and the `.fgb`, and they will need to live next to each other in the same directory.
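The exact command isn't preserved in this thread, but a typical ogr2ogr invocation for converting a shapefile to FlatGeobuf looks like the following (filenames are hypothetical; `FlatGeobuf` is GDAL's driver name for the format):

```shell
# Convert the shapefile to FlatGeobuf, keeping the same base filename
# so the .fgb can sit next to the source .shp as described above.
ogr2ogr -f FlatGeobuf noaa_shorelines.fgb noaa_shorelines.shp
```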
Thanks, Megan! Glad we can do this with ogr2ogr, as that wasn't clear from my Google search. I don't have any reservations about merging this and creating the `.fgb` files for this purpose! We can easily add this line when we make tiles.
OK! I think we're all good to have this merged now, so thank you for adding this feature, @megannissel!
Fixes #86
Implements vector clipping functionality analogous to the existing raster clipping functionality.
This version uses FlatGeobuf as a vector format (it does seem faster than other vector alternatives).
Currently, there's a FlatGeobuf version of the NOAA Shorelines layer in our temp cache and that URL is hard-coded in for testing purposes. If we decide we want to use FlatGeobufs (or some other format we don't already have) for all clipping operations, we'll need to add those alongside the source vectors for our datasets.
Things that I've observed:

- Using a `/vsicurl/`-prefixed URL for `pygeoprocessing.get_vector_info` has been consistently faster than the non-vsicurl alternative.
- Using a `/vsicurl/`-prefixed URL for `pygeoprocessing.reproject_vector` was consistently slower. Likewise when opening the vector and iterating through features for clipping.
- I also tried `Layer.SetSpatialFilter`, but this resulted in far more variable run-times. In the time testing I did (admittedly small sample size), the shortest run took 42 seconds; the longest took 113. Without GDAL's spatial filter, runs varied from 56 to 62 seconds.

James was able to optimize this with two key changes:

- Using `feature.Clone()` significantly sped up the copying of features from the base vector to the clipped vector.
- `reproject_vector`