Skip to content

Add by_element for binary predicates, operations, and relate #2593

@rariariari

Description

@rariariari

st_distance has by_element = TRUE for paired computation, dist(x[1], y[1]), dist(x[2], y[2]), etc. None of the other binary functions have it. The predicates (st_intersects, st_contains, ...), the geometric operations (st_intersection, st_difference, st_sym_difference), and st_relate all compute the full Cartesian product. If you've already "matched" your geometries the only way to get op(x[i], y[i]) (the diagonal of the Cartesian) is mapply or purrr::map2. On top of that, st_distance's own by_element path falls back to mapply(st_distance, x, y) for non-POINT geometries, so it pays the same per-call overhead it was supposed to avoid.

My sense is that since every predicate already passes ... to st_geos_binop, and every geometric op passes ... to geos_op2_geom, adding by_element = FALSE to these two internal dispatch functions would make it available everywhere. st_intersects(x, y, by_element = TRUE) returns a logical vector, st_intersection(x, y, by_element = TRUE) returns an sfc, st_relate(x, y, by_element = TRUE) returns a character vector, all without changing any public method signatures. The only exception is st_relate, which doesn't currently pass ..., so it needs either ... added or an explicit by_element parameter. Three C++ functions back it, following the CPL_geos_nearest_points pairwise pattern: one GEOS context, one bulk WKB pass, a tight loop calling the GEOS function directly with no STR tree or candidate search. Function pointer selection (which_geom_fn, geom_fn/geom_fnp, dist_fn/dist_parfn) happens before the loop, matching the existing CPL_geos_binop and CPL_geos_op2 style. This also makes the API more consistent, since right now st_distance has by_element, st_union has by_feature, and st_nearest_points has pairwise, three names for the same idea.

There are a few edge cases (st_disjoint wraps st_intersects at the R level with its own sgbp logic, the sf method for ops would need to decide what to do about attributes, st_relate needs ... or an explicit parameter) but I think they're all straightforward to handle. I have a working implementation on my fork against current main -- C++ compiles cleanly, all existing tests pass, 5–60x faster than mapply/map2 depending on geometry complexity. Happy to open a PR if there's interest.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions