@SuMayaBee
Contributor

@SuMayaBee SuMayaBee commented Dec 19, 2025

Description

Allow users to pass image_path and label arguments to read_file() when reading shapefiles that don't have these columns.

Problem

When users have a shapefile without image_path or label columns, read_file() crashes with:

ValueError: No image_path column found in shapefile, please specify rgb path

Solution

  1. Pass image_path argument to shapefile_to_annotations() - Modified utilities.py to forward the image_path argument when reading shapefiles.

  2. Add warning when image_path is passed - Users see a warning confirming the value will be assigned to every row.

  3. Allow label argument - Removed blocking error for missing label column. Now defaults to "Unknown" if not provided.

  4. Write tests for shapefiles - Added 4 tests covering all scenarios.

  5. Add documentation example - Added example in docs/user_guide/01_Reading_data.md with argument table showing required vs optional parameters.
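The behaviour listed above can be pictured with a minimal sketch in plain Python. This is not the code in utilities.py: `apply_overrides` is a hypothetical helper, and the list of dicts is only a stand-in for the rows of a GeoDataFrame. It assumes that an explicit argument always wins over an existing column, as the Before & After tables describe.

```python
import warnings


def apply_overrides(rows, image_path=None, label=None):
    """Hypothetical helper: fill image_path/label on every annotation row.

    `rows` is a list of dicts standing in for the rows of a GeoDataFrame.
    """
    rows = [dict(row) for row in rows]  # copy so the input is not mutated

    if image_path is not None:
        # An explicit argument wins over any existing column.
        warnings.warn(
            f"image_path '{image_path}' will be assigned to every row.",
            UserWarning,
        )
        for row in rows:
            row["image_path"] = image_path
    elif any("image_path" not in row for row in rows):
        # Same message as the error quoted in the Problem section.
        raise ValueError(
            "No image_path column found in shapefile, please specify rgb path"
        )

    for row in rows:
        if label is not None:
            row["label"] = label  # explicit argument overrides the column
        else:
            row.setdefault("label", "Unknown")  # default only when missing

    return rows
```

Calling `apply_overrides([{"geometry": "box"}], image_path="image.tif")` emits one warning and returns a row whose label is "Unknown", while calling it with no image_path on the same input raises the ValueError.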

Before & After

For shapefiles without image_path and label columns:

| # | Scenario | Before | After |
|---|----------|--------|-------|
| 1 | Pass only image_path argument | ValueError | ✅ Works, label defaults to "Unknown" |
| 2 | Pass both image_path and label arguments | ValueError | ✅ Works |

For shapefiles with image_path and label columns:

| # | Scenario | Before | After |
|---|----------|--------|-------|
| 1 | No arguments passed | ✅ Works | ✅ Works (reads from columns) |
| 2 | Pass image_path argument | ✅ Works | ✅ Works (uses argument, overrides column) |
| 3 | Pass label argument | ✅ Works | ✅ Works (uses argument, overrides column) |
| 4 | Pass both arguments | ✅ Works | ✅ Works (uses arguments, overrides columns) |

Usage Improvement

Previously (workaround):

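import os
import geopandas as gpd
from deepforest.utilities import read_file

raster_path = "/path/to/image.tif"
gdf = gpd.read_file("/path/to/annotations.shp")
# Assign the same image file name and label to every annotation
gdf["image_path"] = os.path.basename(raster_path)
gdf["label"] = "Tree"
ground_truth = read_file(gdf, root_dir=os.path.dirname(raster_path))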

This required two extra library imports (os and geopandas).

Now:

ground_truth = read_file("/path/to/annotations.shp", image_path="/path/to/image.tif", label="Tree")

No extra library imports needed. Here, root_dir is optional when image_path is a full path. label is optional too and defaults to "Unknown" when not provided.
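One way to picture why root_dir can be left out when image_path is a full path is the sketch below, which mirrors the `os.path.basename` / `os.path.dirname` split from the workaround. `split_image_path` is a hypothetical helper written for illustration, not a DeepForest function, and the real implementation may resolve paths differently.

```python
import os


def split_image_path(image_path, root_dir=None):
    """Hypothetical helper: return (file name, root_dir) for an image path."""
    if root_dir is None:
        # Fall back to the directory part of the path, as the workaround did.
        root_dir = os.path.dirname(image_path)
    return os.path.basename(image_path), root_dir


file_name, root_dir = split_image_path("/path/to/image.tif")
```

An explicitly passed root_dir is returned unchanged, so callers who already keep images in a known directory are not affected.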

Related Issue(s)

Closes #997

AI-Assisted Development

  • I used AI tools (e.g., GitHub Copilot, ChatGPT, etc.) in developing this PR
  • I understand all the code I'm submitting

@codecov

codecov bot commented Dec 19, 2025

Codecov Report

❌ Patch coverage is 33.33333% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.79%. Comparing base (0ab23a3) to head (15c07a7).
⚠️ Report is 12 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|-------|
| src/deepforest/utilities.py | 33.33% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1242      +/-   ##
==========================================
+ Coverage   87.73%   87.79%   +0.05%     
==========================================
  Files          20       20              
  Lines        2716     2770      +54     
==========================================
+ Hits         2383     2432      +49     
- Misses        333      338       +5     
| Flag | Coverage Δ |
|------|------------|
| unittests | 87.79% <33.33%> (+0.05%) ⬆️ |

@jveitchmichaelis
Collaborator

Thanks for the contribution. A couple of comments on the tests: there is quite a lot of duplicate code, and perhaps it would be better to use pytest fixtures for some of the input data?

Please could you also include the AI assistance declaration from the PR template? (you can see an example here)

@SuMayaBee
Contributor Author

@jveitchmichaelis Thanks for the feedback! I've addressed both points:

  1. Refactored the tests to use a sample_shapefile_gdf pytest fixture to eliminate duplicate code
  2. Added the AI assistance declaration to the PR description

Contributor

@henrykironde henrykironde left a comment

@SuMayaBee Thanks for your perspective on the issue. The function you’re modifying has changed; could you rebase and update this PR so we can confirm whether these changes are still needed?

@SuMayaBee SuMayaBee force-pushed the feat/improve-read-file-shp-without-image-path-column-997 branch 2 times, most recently from 622f1d7 to ea257ed Compare January 16, 2026 08:43
@SuMayaBee SuMayaBee force-pushed the feat/improve-read-file-shp-without-image-path-column-997 branch from ea257ed to e45b597 Compare January 16, 2026 12:20
@SuMayaBee
Contributor Author

@henrykironde Thanks! I've rebased onto upstream/main and updated the implementation to work with the new function structure. The changes are still needed - I've integrated them and fixed a bug where label override wasn't being applied. All tests pass. Ready for review!

Development

Successfully merging this pull request may close these issues.

Improve deepforest.utilities.read_file for reading a .shp that doesn't have a image_path column

3 participants