
Match dask all_touched polygon boundary to eager on grid lines (#3384) #3386

Merged
brendancol merged 1 commit into main from deep-sweep-accuracy-rasterize-2026-06-18 on Jun 18, 2026


@brendancol

Contributor

Closes #3384

What

Under all_touched=True, the dask backends (dask+numpy and dask+cupy) could produce a different raster than the eager numpy/cupy backends for the same input when a polygon boundary segment fell exactly on a pixel-grid line.

The eager backend converts polygon boundary vertices to pixel coordinates against the full raster origin and walks them with the Amanatides-Woo supercover traversal. The dask tile workers instead re-extracted those segments per tile against each tile's own world origin, which introduced different floating-point rounding. A segment on an exact integer pixel row in the full grid (for example, (10.0 - 4.0) / 0.4 == 15.0) should also sit on an integer row in tile-local space, but it landed at 1.9999999999999996 instead of 2.0, so floor() picked the adjacent cell and the supercover walk burned a different row or column. The divergence appeared whenever an on-grid boundary segment was split at a tile boundary that did not itself lie on that grid line, and it affected every merge mode.
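How little it takes to flip the result can be seen with a minimal 4-connected Amanatides-Woo grid walk. This is an illustrative sketch, not the project's supercover code: `grid_walk` is a hypothetical name, inputs are pixel coordinates with x as column and y as row, and exact corner crossings are broken toward the row step.

```python
import math


def grid_walk(x0, y0, x1, y1):
    """Return the (row, col) cells a segment passes through, in visiting order.

    Coordinates are in pixel units (x = column, y = row). The start cell is
    chosen with floor(), so a coordinate a few ulps below an integer grid
    line lands in the previous row/column.
    """
    col, row = math.floor(x0), math.floor(y0)
    end_col, end_row = math.floor(x1), math.floor(y1)
    dx, dy = x1 - x0, y1 - y0
    step_c = 1 if dx > 0 else -1
    step_r = 1 if dy > 0 else -1

    # Parametric t (0..1 along the segment) of the next vertical/horizontal
    # grid-line crossing, and the t distance between successive crossings.
    if dx != 0:
        t_max_x = ((col + 1 if dx > 0 else col) - x0) / dx
        t_delta_x = abs(1.0 / dx)
    else:
        t_max_x = t_delta_x = math.inf
    if dy != 0:
        t_max_y = ((row + 1 if dy > 0 else row) - y0) / dy
        t_delta_y = abs(1.0 / dy)
    else:
        t_max_y = t_delta_y = math.inf

    cells = [(row, col)]
    # A 4-connected walk takes exactly one step per row or column crossed.
    for _ in range(abs(end_col - col) + abs(end_row - row)):
        if t_max_x < t_max_y:
            col += step_c
            t_max_x += t_delta_x
        else:
            row += step_r
            t_max_y += t_delta_y
        cells.append((row, col))
    return cells


print(grid_walk(0.5, 2.0, 3.5, 2.0))
# [(2, 0), (2, 1), (2, 2), (2, 3)]
print(grid_walk(0.5, 1.9999999999999996, 3.5, 1.9999999999999996))
# [(1, 0), (1, 1), (1, 2), (1, 3)]
```

Both segments are horizontal and differ by two ulps in y, yet they burn different rows, which is the same kind of eager-versus-dask split described in this section.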

Fix

Extract the boundary float segments in the global grid frame (same px/py and origin as the eager path), then shift to tile-local space by the integer tile offset. Integer translation does not perturb the fractional part, so floor() lands on the same cell as the eager walk. The same global-frame segments feed the sum/count dedup path, so all merge modes stay aligned.
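The invariant behind the fix can be checked numerically in one dimension. In this sketch the function names are hypothetical, and `origin`, `px` and `c_start` stand in for the grid origin, pixel size and integer tile offset. The post-fix conversion subtracts an integer from the full-grid coordinate g; for 0 <= c_start <= g < 2**53 that subtraction is exact in IEEE-754 doubles, so floor() agrees with the eager cell by construction. The pre-fix style is approximated here as rebuilding a tile origin in world units first (the real tile-origin computation may differ); the sketch only counts how often that disagrees and does not assume a particular count.

```python
import math


def eager_coord(x, origin, px):
    """Full-grid pixel coordinate, as the eager backend would compute it."""
    return (x - origin) / px


def tile_local_per_tile(x, origin, px, c_start):
    """Pre-fix style (approximation): rebuild the tile's world origin, then convert."""
    tile_origin = origin + c_start * px
    return (x - tile_origin) / px


def tile_local_global(x, origin, px, c_start):
    """Post-fix style: convert in the full-grid frame, then shift by the integer offset."""
    return eager_coord(x, origin, px) - c_start


def count_per_tile_mismatches(origin=4.0, px=0.4, n_lines=200):
    """Sweep world coordinates on grid lines and every tile offset at or below them.

    Asserts that the global-frame conversion always floors to the eager cell
    (shifted by the offset) and returns how often the per-tile one does not.
    """
    mismatches = 0
    for k in range(n_lines):
        x = origin + k * px              # on (or within rounding of) grid line k
        g = eager_coord(x, origin, px)   # g >= 0 because x >= origin
        eager_cell = math.floor(g)
        for c_start in range(eager_cell + 1):
            expected = eager_cell - c_start
            assert math.floor(tile_local_global(x, origin, px, c_start)) == expected
            if math.floor(tile_local_per_tile(x, origin, px, c_start)) != expected:
                mismatches += 1
    return mismatches


print(count_per_tile_mismatches())
```

The assertion inside the loop is the property the PR relies on; the printed number depends on `origin` and `px` and is only a measure of how often the per-tile style would have drifted.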

Backend coverage

  • numpy: unchanged (reference)
  • cupy: unchanged (reference)
  • dask+numpy: fixed
  • dask+cupy: fixed

Test plan

  • New regression test pins dask == eager for a polygon whose edges sit on pixel-grid lines, across all four backends and all six merge modes, including the chunk layouts that split an on-grid edge at a non-aligned tile boundary
  • Existing rasterize suite passes (394 passed, 2 skipped): test_rasterize, test_rasterize_all_touched_supercover_2169, test_rasterize_lines_all_touched_3102, test_rasterize_merge_dedup_3304, test_rasterize_accuracy, test_rasterize_nan_propagation_2255, test_rasterize_mixed_type_ordered_merge_3296
  • dask+cupy paths verified on a CUDA host

The dask tile workers re-extracted polygon boundary float segments per
tile against each tile's own world origin. That reintroduced a different
floating-point rounding than the eager backend: a boundary segment on an
exact integer pixel row in the full grid landed just below it in
tile-local space, so floor() picked the wrong cell and the supercover
walk burned a different row/column. The dask+numpy and dask+cupy
backends then disagreed with eager numpy/cupy under all_touched=True
whenever an on-grid boundary segment was split at a non-aligned tile
boundary, across every merge mode.

Extract the boundary float segments in the global grid frame (identical
px/py and origin to the eager path) and shift to tile-local space by the
integer tile offset, which preserves the fractional part and the floor
result. Add a regression test pinning dask == eager for a polygon whose
edges sit on pixel-grid lines, across all four backends and all merge
modes.

Closes #3384

@brendancol left a comment

Contributor Author


PR Review: Match dask all_touched polygon boundary to eager on grid lines (#3384)

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

None.

Nits (optional improvements)

  • _extract_polygon_boundary_segments_float_global duplicates most of _extract_polygon_boundary_segments_float: the ring-coordinate collection and the Liang-Barsky pass are identical, and only the final world-to-pixel conversion differs (full-grid origin plus the integer offset). A later cleanup could factor the shared body into a helper that takes the conversion frame as parameters. Not worth doing in a bug-fix PR, but worth flagging so the two copies do not drift.
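The refactor suggested in this nit could look roughly like the following: one helper owns the ring walk and the Liang-Barsky clip, and the caller passes the conversion frame. Everything here is an assumption for illustration: the names `clip_segment` and `boundary_segments_px`, the ring format (closed lists of (x, y) world vertices), and the north-up convention row = (origin_y - y) / py. Under this sketch the eager path would pass the full-grid origin with zero offsets, and a dask tile would pass the same origin plus its integer r_start/c_start while clipping against the tile's world bounds.

```python
def clip_segment(x0, y0, x1, y1, xmin, ymin, xmax, ymax):
    """Liang-Barsky clip of one segment to an axis-aligned box; None if it misses."""
    dx, dy = x1 - x0, y1 - y0
    t0, t1 = 0.0, 1.0
    for p, q in ((-dx, x0 - xmin), (dx, xmax - x0), (-dy, y0 - ymin), (dy, ymax - y0)):
        if p == 0:
            if q < 0:        # parallel to this box edge and outside it
                return None
            continue
        t = q / p
        if p < 0:            # entering the box
            if t > t1:
                return None
            t0 = max(t0, t)
        else:                # leaving the box
            if t < t0:
                return None
            t1 = min(t1, t)
    return (x0 + t0 * dx, y0 + t0 * dy, x0 + t1 * dx, y0 + t1 * dy)


def boundary_segments_px(rings, clip_box, origin_x, origin_y, px, py, row_off=0, col_off=0):
    """Clip ring edges in world space, then convert endpoints in the given frame.

    Returns (col0, row0, col1, row1) tuples. Only the final conversion depends
    on the frame, so eager and tiled callers share the ring walk and clipping.
    """
    out = []
    for ring in rings:
        for (ax, ay), (bx, by) in zip(ring, ring[1:]):
            clipped = clip_segment(ax, ay, bx, by, *clip_box)
            if clipped is None:
                continue
            cx0, cy0, cx1, cy1 = clipped
            out.append((
                (cx0 - origin_x) / px - col_off,
                (origin_y - cy0) / py - row_off,
                (cx1 - origin_x) / px - col_off,
                (origin_y - cy1) / py - row_off,
            ))
    return out
```

Keeping the frame as plain parameters means the two callers cannot drift apart in the clipping logic, which is the concern the nit raises.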

What looks good

  • The root cause is pinned precisely. Clipping stays in tile world space so only in-tile ring segments are walked, but the world-to-pixel conversion now uses the full grid's px/py and origin, then shifts by the integer tile offset. Integer translation does not perturb the fractional part, so floor() lands on the same cell as the eager walk. I confirmed the global-frame helper produces tile-local row 2.0 exactly where the old per-tile extraction produced 1.9999999999999996.
  • All six merge modes are covered, including the sum/count dedup path. The same global-frame segments are threaded into _boundary_cells_sum_count via the new boundary_segments parameter, so the dedup path and the direct supercover path stay aligned.
  • Both dask backends are fixed and the eager backends are untouched (they still call the original extraction). The new args are threaded through both tile workers and both dask drivers with the same integer r_start/c_start the tile loop already computes.
  • The regression test exercises the case the existing supercover parity module deliberately skips (vertices on pixel-grid lines), across all four backends and all merge modes, including the chunk layouts that split an on-grid edge at a non-aligned tile boundary. The 49-test run passed including dask+cupy on a CUDA host, and the broader rasterize suite (394 passed) shows no regression.
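One way to picture why the sum/count dedup path must consume the same segments as the direct walk: under sum and count, a boundary cell touched by several edges of one polygon should, presumably, receive that polygon's value once. The hypothetical `burn_boundary_sum_count` below is not `_boundary_cells_sum_count`; it is a pure-Python sketch that takes precomputed (row, col) cells per segment and collects each polygon's cells into a set before accumulating. If this step and the direct supercover path derived their cells from differently rounded segments, the cell sets, and therefore the sums and counts, would disagree.

```python
def burn_boundary_sum_count(shape, polygons):
    """Accumulate sum and count rasters from polygon boundary cells.

    polygons: iterable of (value, cells_per_segment), where cells_per_segment is
    a list of (row, col) lists, one per boundary segment. Each polygon adds its
    value at most once per cell, however many of its segments touch that cell.
    """
    n_rows, n_cols = shape
    total = [[0.0] * n_cols for _ in range(n_rows)]
    count = [[0] * n_cols for _ in range(n_rows)]
    for value, cells_per_segment in polygons:
        cells = set()
        for seg_cells in cells_per_segment:
            cells.update(seg_cells)      # dedup cells shared by adjacent edges
        for r, c in cells:
            if 0 <= r < n_rows and 0 <= c < n_cols:
                total[r][c] += value
                count[r][c] += 1
    return total, count


# Two edges of one polygon meet at cell (1, 1); a second polygon also touches it.
total, count = burn_boundary_sum_count(
    (3, 3),
    [
        (5.0, [[(1, 0), (1, 1)], [(1, 1), (2, 1)]]),
        (2.0, [[(1, 1), (1, 2)]]),
    ],
)
print(total[1][1], count[1][1])  # 7.0 2
```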

Checklist

  • Algorithm matches reference: yes; the supercover walk is unchanged, only the coordinate frame feeding it was corrected to match eager
  • All implemented backends produce consistent results: yes; verified numpy/cupy/dask+numpy/dask+cupy parity for all six merge modes and both all_touched values
  • NaN handling is correct: unchanged; the fix touches only pixel-coordinate arithmetic, not value merging
  • Edge cases covered by tests: on-grid boundary, multiple chunk layouts, all merge modes
  • Dask chunk boundaries handled correctly: this is exactly what the PR fixes
  • No premature materialization or unnecessary copies: no new .compute()/.values; segments are computed host-side as before
  • Benchmark exists or is not needed: not needed (bug fix, no new public API)
  • README feature matrix updated: not applicable (no new function, no backend change)
  • Docstrings present and accurate: the new helper is documented; the public docstring's all_touched parity claim is now actually true on dask

@brendancol merged commit f2190f7 into main on Jun 18, 2026
12 checks passed


Linked issue: rasterize all_touched=True: dask backends diverge from eager when a polygon boundary lies on a pixel-grid line
