Match dask all_touched polygon boundary to eager on grid lines (#3384)#3386
Merged
Conversation
The dask tile workers re-extracted polygon boundary float segments per tile against each tile's own world origin. That reintroduced a different floating-point rounding than the eager backend: a boundary segment on an exact integer pixel row in the full grid landed just below it in tile-local space, so floor() picked the wrong cell and the supercover walk burned a different row/column. The dask+numpy and dask+cupy backends then disagreed with eager numpy/cupy under all_touched=True whenever an on-grid boundary segment was split at a non-aligned tile boundary, across every merge mode. Extract the boundary float segments in the global grid frame (identical px/py and origin to the eager path) and shift to tile-local space by the integer tile offset, which preserves the fractional part and the floor result. Add a regression test pinning dask == eager for a polygon whose edges sit on pixel-grid lines, across all four backends and all merge modes. Closes #3384
brendancol
commented
Jun 18, 2026
brendancol
left a comment
Contributor
Author
There was a problem hiding this comment.
PR Review: Match dask all_touched polygon boundary to eager on grid lines (#3384)
Blockers (must fix before merge)
None.
Suggestions (should fix, not blocking)
None.
Nits (optional improvements)
_extract_polygon_boundary_segments_float_globalduplicates most of_extract_polygon_boundary_segments_float: the ring-coordinate collection and the Liang-Barsky pass are identical, and only the final world-to-pixel conversion differs (full-grid origin plus the integer offset). A later cleanup could factor the shared body into a helper that takes the conversion frame as parameters. Not worth doing in a bug-fix PR, but worth flagging so the two copies do not drift.
What looks good
- The root cause is pinned precisely. Clipping stays in tile world space so only in-tile ring segments are walked, but the world-to-pixel conversion now uses the full grid's px/py and origin, then shifts by the integer tile offset. Integer translation does not perturb the fractional part, so floor() lands on the same cell as the eager walk. I confirmed the global-frame helper produces tile-local row 2.0 exactly where the old per-tile extraction produced 1.9999999999999996.
- All six merge modes are covered, including the sum/count dedup path. The same global-frame segments are threaded into
_boundary_cells_sum_countvia the newboundary_segmentsparameter, so the dedup path and the direct supercover path stay aligned. - Both dask backends are fixed and the eager backends are untouched (they still call the original extraction). The new args are threaded through both tile workers and both dask drivers with the same integer r_start/c_start the tile loop already computes.
- The regression test exercises the case the existing supercover parity module deliberately skips (vertices on pixel-grid lines), across all four backends and all merge modes, including the chunk layouts that split an on-grid edge at a non-aligned tile boundary. The 49-test run passed including dask+cupy on a CUDA host, and the broader rasterize suite (394 passed) shows no regression.
Checklist
- Algorithm matches reference: yes; the supercover walk is unchanged, only the coordinate frame feeding it was corrected to match eager
- All implemented backends produce consistent results: yes; verified numpy/cupy/dask+numpy/dask+cupy parity for all six merge modes and both all_touched values
- NaN handling is correct: unchanged; the fix touches only pixel-coordinate arithmetic, not value merging
- Edge cases covered by tests: on-grid boundary, multiple chunk layouts, all merge modes
- Dask chunk boundaries handled correctly: this is exactly what the PR fixes
- No premature materialization or unnecessary copies: no new .compute()/.values; segments are computed host-side as before
- Benchmark exists or is not needed: not needed (bug fix, no new public API)
- README feature matrix updated: not applicable (no new function, no backend change)
- Docstrings present and accurate: the new helper is documented; the public docstring's all_touched parity claim is now actually true on dask
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #3384
What
Under
all_touched=True, the dask backends (dask+numpy and dask+cupy) could produce a different raster than the eager numpy/cupy backends for the same input when a polygon boundary segment fell exactly on a pixel-grid line.The eager backend converts polygon boundary vertices to pixel coordinates against the full raster origin and walks them with the Amanatides-Woo supercover traversal. The dask tile workers re-extracted those segments per tile against each tile's own world origin, which reintroduced a different floating-point rounding. A segment on an exact integer pixel row in the full grid (
(10.0 - 4.0) / 0.4 == 15.0) landed at1.9999999999999996in tile-local space, sofloor()picked the wrong cell and the supercover walk burned a different row/column. The divergence showed up whenever an on-grid boundary segment was split at a tile boundary that did not itself sit on that grid line, and it affected every merge mode.Fix
Extract the boundary float segments in the global grid frame (same
px/pyand origin as the eager path), then shift to tile-local space by the integer tile offset. Integer translation does not perturb the fractional part, sofloor()lands on the same cell as the eager walk. The same global-frame segments feed the sum/count dedup path, so all merge modes stay aligned.Backend coverage
Test plan
test_rasterize,test_rasterize_all_touched_supercover_2169,test_rasterize_lines_all_touched_3102,test_rasterize_merge_dedup_3304,test_rasterize_accuracy,test_rasterize_nan_propagation_2255,test_rasterize_mixed_type_ordered_merge_3296