-
Notifications
You must be signed in to change notification settings - Fork 302
Description
st_distance has by_element = TRUE for paired computation, dist(x[1], y[1]), dist(x[2], y[2]), etc. None of the other binary functions have it. The predicates (st_intersects, st_contains, ...), the geometric operations (st_intersection, st_difference, st_sym_difference), and st_relate all compute the full Cartesian product. If you've already "matched" your geometries the only way to get op(x[i], y[i]) (the diagonal of the Cartesian) is mapply or purrr::map2. On top of that, st_distance's own by_element path falls back to mapply(st_distance, x, y) for non-POINT geometries, so it pays the same per-call overhead it was supposed to avoid.
My sense is that since every predicate already passes ... to st_geos_binop, and every geometric op passes ... to geos_op2_geom, adding by_element = FALSE to these two internal dispatch functions would make it available everywhere. st_intersects(x, y, by_element = TRUE) returns a logical vector, st_intersection(x, y, by_element = TRUE) returns an sfc, st_relate(x, y, by_element = TRUE) returns a character vector, all without changing any public method signatures. The only exception is st_relate, which doesn't currently pass ..., so it needs either ... added or an explicit by_element parameter. Three C++ functions back it, following the CPL_geos_nearest_points pairwise pattern: one GEOS context, one bulk WKB pass, a tight loop calling the GEOS function directly with no STR tree or candidate search. Function pointer selection (which_geom_fn, geom_fn/geom_fnp, dist_fn/dist_parfn) happens before the loop, matching the existing CPL_geos_binop and CPL_geos_op2 style. This also makes the API more consistent, since right now st_distance has by_element, st_union has by_feature, and st_nearest_points has pairwise, three names for the same idea.
There are a few edge cases (st_disjoint wraps st_intersects at the R level with its own sgbp logic, the sf method for ops would need to decide what to do about attributes, st_relate needs ... or an explicit parameter) but I think they're all straightforward to handle. I have a working implementation on my fork against current main -- C++ compiles cleanly, all existing tests pass, 5–60x faster than mapply/map2 depending on geometry complexity. Happy to open a PR if there's interest.