Commit 0b9997e
Add CLIP-based zero-shot classification for vector features (#585)
* Add CLIP-based zero-shot classification for vector features
Add a CLIPVectorClassifier class and a clip_classify_vector convenience
function that classify vector polygon features using zero-shot CLIP
inference. For each polygon, the classifier extracts the bounding-box
image chip from a raster, encodes it with CLIP, and assigns the
best-matching category label from a user-provided list via cosine
similarity.
Features:
- Accepts GeoDataFrame or file path + any raster format
- Pre-computes text embeddings once for efficient batch inference
- Automatic CRS alignment between vector and raster
- Handles multi-band, single-band, and float rasters
- Configurable batch size, top-k predictions, min chip size
- Output save support (.geojson, .parquet, .gpkg)
Closes #129
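The cosine-similarity label assignment described above can be sketched in a few lines. This is an illustrative helper, not the geoai API; the embeddings are toy vectors standing in for CLIP outputs, and `best_label` is a hypothetical name.

```python
import numpy as np

def best_label(image_emb, text_embs, labels):
    """Pick the label whose text embedding has the highest cosine
    similarity with the image embedding (illustrative sketch)."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = txt @ img                 # one cosine score per label
    i = int(np.argmax(sims))
    return labels[i], float(sims[i])

labels = ["forest", "agricultural field"]
text_embs = np.array([[1.0, 0.0], [0.0, 1.0]])  # toy unit embeddings
label, score = best_label(np.array([0.9, 0.1]), text_embs, labels)
# label == "forest"
```

Because the text embeddings depend only on the label list, they can be computed once and reused for every polygon chip, which is what makes batch inference cheap.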
* Address Copilot review: device, logging, chip logic, tests
- Use get_device() for CUDA/MPS/CPU auto-detection (lazy import)
- Replace print() with logger.info() for model load status
- Fix min_chip_size: use 'or' so either dimension below threshold rejects
- Move Window import to module level alongside window_from_bounds
- Collapse identical single-band/two-band branches in _to_rgb_uint8
- Add FileNotFoundError for vector_data to convenience func docstring
- Add end-to-end batch classification test covering _process_batch
- Add test_narrow_chip_returns_none for the 'or' fix
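The min_chip_size fix amounts to the following check. A minimal sketch with hypothetical names, not the geoai source:

```python
def chip_too_small(width, height, min_chip_size=8):
    """Reject a chip when EITHER dimension is below the threshold.
    The earlier 'and' variant only rejected chips small in both
    dimensions, so narrow slivers (e.g. 3 x 100 px) slipped through."""
    return width < min_chip_size or height < min_chip_size

chip_too_small(3, 100)   # narrow sliver -> True (rejected)
chip_too_small(32, 32)   # -> False (kept)
```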
* Add notebook example for CLIP zero-shot vector classification
- New docs/examples/clip_classify_vector.ipynb demonstrating:
- clip_classify_vector() convenience function
- CLIPVectorClassifier class API with top-k predictions
- Result visualization with color-coded labels and confidence maps
- Export to GeoJSON
- Register notebook in mkdocs.yml nav and execute_ignore
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix transformers 5.x compatibility and notebook issues
- Handle BaseModelOutputWithPooling from get_text_features() and
get_image_features() which changed return type in transformers>=5.0
- Reproject vector data to raster CRS in preview plot to fix empty figure
- Skip unclassified polygons (None top-k) in top-3 display loop
- Shrink confidence colorbar legend for better readability
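The transformers 5.x compatibility shim can be handled with duck typing: unwrap the embedding when the call returns a model-output object, pass plain tensors through unchanged. A sketch under the assumption that the pooled embedding is exposed as `pooler_output` (the attribute on BaseModelOutputWithPooling); `unwrap_features` and `FakeOutput` are illustrative names:

```python
def unwrap_features(output):
    """Return the embedding whether `output` is a plain tensor
    (transformers < 5) or a BaseModelOutputWithPooling-style
    object (transformers >= 5)."""
    if hasattr(output, "pooler_output"):
        return output.pooler_output
    return output

class FakeOutput:
    """Stand-in for a transformers >= 5 model output."""
    pooler_output = [1.0, 2.0]

unwrap_features(FakeOutput())  # -> [1.0, 2.0]
unwrap_features([3.0])         # -> [3.0] (passed through)
```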
* Improve normalization, immutability, and CLIPSeg consistency
- Per-channel float normalization in _to_rgb_uint8 preserves color
balance across bands with different value ranges
- uint8 passthrough skips unnecessary renormalization
- GeoDataFrame immutability: classify() never mutates the input
- CLIPSegmentation: use get_device() for CUDA/MPS/CPU auto-detection,
replace print() with logger, align tile normalization with
_to_rgb_uint8 approach
- Add tests for immutability, uint8 passthrough, per-channel
normalization, transformers 5.x compat, and CLIPSeg improvements
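The per-channel normalization idea is: scale each band to 0-255 against its own min/max so a high-range band cannot wash out a low-range one, and leave uint8 input untouched. A simplified sketch of that idea (not the geoai `_to_rgb_uint8` implementation), assuming a band-first `(bands, height, width)` array:

```python
import numpy as np

def to_rgb_uint8(arr):
    """Scale each band independently to 0-255; uint8 passes through."""
    if arr.dtype == np.uint8:
        return arr                      # passthrough: already display-ready
    out = np.empty(arr.shape, dtype=np.uint8)
    for b in range(arr.shape[0]):       # normalize per band, not globally
        band = arr[b].astype(np.float64)
        lo, hi = band.min(), band.max()
        scale = 255.0 / (hi - lo) if hi > lo else 0.0
        out[b] = ((band - lo) * scale).astype(np.uint8)
    return out
```

A global min/max over all bands would instead compress whichever band has the smaller value range, shifting the color balance of the chip.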
* Rework notebook to use RS-CLIP model and diverse land-cover grid
Switch the clip_classify_vector notebook from generic CLIP on building
footprints to flax-community/clip-rsicd-v2 (remote-sensing fine-tuned)
with a 60 m grid of sample patches covering diverse land types.
- Primary demo: land-cover classification (residential, vegetation,
road, parking lot, grass) on grid patches — 82.6% mean confidence
- Secondary demo: building footprint classification with RS-CLIP
- Top-k predictions section using CLIPVectorClassifier class API
- Model recommendations table comparing RS-CLIP vs generic CLIP
- Black formatting applied to test files
* Update notebook to Chesapeake dataset with ground-truth validation
Replace the suburban NAIP example with the Chesapeake Bay Watershed
NAIP scene (eastern Maryland, 2018) paired with its 13-class land-cover
ground truth raster. This produces significantly better results:
- 88.6% overall agreement with Chesapeake ground truth (zero-shot)
- Forest recall: 97%, Agricultural field recall: 83%
- Mean confidence: 0.928
- Balanced label distribution (244 forest / 256 agricultural)
Key changes:
- Dataset: m_3807511_ne_18_060_20181104.tif + landcover raster
- Labels: forest / agricultural field (matching dominant land cover)
- New section: ground-truth comparison with per-class accuracy
- Downsampled raster display (10x) to prevent OOM on Colab
- Side-by-side CLIP prediction vs ground truth visualization
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: giswqs <giswqs@gmail.com>
File tree: docs/examples, geoai, tests
7 files changed: +2172 −39 lines