Skip to content

Vector Clipping#179

Merged
phargogh merged 6 commits intomasterfrom
feature/86-vector-clipping
Dec 3, 2025
Merged

Vector Clipping#179
phargogh merged 6 commits intomasterfrom
feature/86-vector-clipping

Conversation

@megannissel
Copy link

@megannissel megannissel commented Nov 20, 2025

Fixes #86

Implements vector clipping functionality analogous to the existing raster clipping functionality.

This version uses FlatGeobuf as a vector format (it does seem faster than other vector alternatives).

Currently, there's a FlatGeobuf version of the NOAA Shorelines layer in our temp cache and that URL is hard-coded in for testing purposes. If we decide we want to use FlatGeobufs (or some other format we don't already have) for all clipping operations, we'll need to add those alongside the source vectors for our datasets.

Things that I've observed:

  • Using a /vsicurl/-prefixed URL for pygeoprocessing.get_vector_info has been consistently faster than the non-vsicurl alternative.
  • Using a /vsicurl/-prefixed URL for pygeoprocessing.reproject_vector was consistently slower. Likewise when opening the vector and iterating through features for clipping.
  • I experimented with GDAL's Layer.SetSpatialFilter, but this resulted in far more variable run-times. In the time testing I did (admittedly small sample size), the shortest run took 42 seconds; the longest took 113. Without GDAL's spatial filter, runs varied from 56 to 62 seconds.

James was able to optimize this with two key changes:

  • Using GDAL's feature.Clone() significantly sped up the copying of features from the base vector to the clipped vector
  • Integrating the coordinate transformation instead of separately calling reproject_vector

@megannissel
Copy link
Author

@phargogh I think this is ready for an initial review, with the understanding that, if we decide to go with this implementation, we'll need to add FlatGeobufs alongside our shapefiles to use for clipping before these changes can be deployed.

@megannissel megannissel marked this pull request as ready for review November 26, 2025 16:25
@megannissel megannissel requested a review from phargogh November 26, 2025 16:26
@megannissel megannissel changed the title [WIP] Vector Clipping Vector Clipping Nov 26, 2025
Copy link
Member

@phargogh phargogh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @megannissel ! I just had a handful of minor questions and maybe one thing that might make sense to address, but this is looking good to me!

Comment on lines +154 to +171
finally:
if invalid:
invalid_feature_count += 1
app.logger.warning(
f"The geometry at feature {feature.GetFID()} is invalid "
"and will be skipped.")

target_layer.CommitTransaction()

if invalid_feature_count:
app.logger.warning(
f"{invalid_feature_count} features in {source_vector_path} "
"were found to be invalid during clipping and were skipped.")

base_layer = None
base_vector = None
target_layer = None
target_vector = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I'm thinking of it, should we maybe move all of the target_layer.CommitTransaction(), and all of the = Nones into the finally block? Doing so might help make sure that we're always closing files, committing transactions, etc, even if there's an error.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't move them into the finally as written, since that try/except/finally block is inside of a for loop that's iterating over the features. We could wrap the whole thing in another try/except, though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, I forgot about the loop up there! OK yes, especially since we're running this on linux, it isn't as necessary to wrap things up in that way

app.logger.error("Failed to read raster info for %s", vsi_raster_path)
raise
def cached_file_info(vsi_file_path, file_type):
app.logger.info("Getting file info...")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking ahead to when we need to look at the logs in GCP, might it be useful to also log the file type and vsi filepath?

Comment on lines +288 to +289
if source_file_type == VECTOR:
source_file_path = source_file_path.replace('.mvt', '.fgb')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we assuming that the MVT is what we will be passing to the clipping layer? If so, should we assert that it has the expected extension?

Copy link
Author

@megannissel megannissel Dec 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now it's always .mvt, since that's all we're supporting for the map preview at the moment. But this comment got me thinking that it's probably a good idea to keep things flexible in case we start supporting geojson again (or some other format) in the future. Especially since the exact extension of the source file doesn't really matter and the point here is to swap extensions!

@megannissel megannissel requested a review from phargogh December 2, 2025 19:36
Copy link
Member

@phargogh phargogh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! As discussed, I'll hold off on merging until @empavia has had a chance to confirm that it's OK to depend on flatgeobufs alongside our other vectors.

@empavia
Copy link

empavia commented Dec 3, 2025

Hello! @megannissel, I am not familiar with FlatGeobufs, but it seems to me to be a valid choice for our purposes. Do you have an example of the code you used to convert the NOAA Shorelines shapefile to that format? I'd like to see the processing you used.

@megannissel
Copy link
Author

megannissel commented Dec 3, 2025

@empavia Absolutely! The code I used was very simple, just: ogr2ogr -f FlatGeobuf target_filename.fgb base_filename.shp. The spatial index is one of the handy features of FlatGeobufs, and it's created by default so no additional arguments need to be passed.

As with other things on the Hub, the filenames should match between the source shapefile and the .fgb, and they will need to live next to each other in the same directory.

@empavia
Copy link

empavia commented Dec 3, 2025

@empavia Absolutely! The code I used was very simple, just: ogr2ogr -f FlatGeobuf target_filename.fgb base_filename.shp. The spatial index is one of the handy features of FlatGeobufs, and it's created by default so no additional arguments need to be passed.

Thanks, Megan! Glad we can do this with ogr2ogr as that wasn't clear with my google search. I don't have any reservations on merging this and creating the .fgb files for this purpose! We can easily add this line when we make tiles.

@phargogh
Copy link
Member

phargogh commented Dec 3, 2025

OK! I think we're all good to have this merged now, so thank you for adding this feature, @megannissel !

@phargogh phargogh merged commit c62e085 into master Dec 3, 2025
2 of 3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support vector clipping

3 participants