cv_spatial - double intersection #40

@bfakos

Dear Roozbeh,
I tried to run cv_spatial(x), where nrow(x) == 37000000, and it did not finish within a day, even with iteration == 1 or selection == "systematic". While reading the source code to find the bottleneck and to modify it so that it runs in parallel on multiple cores, I found this issue: cv_spatial() calls sf::st_intersects() twice, even though st_intersects() is a very time-consuming function (and can be easily parallelized).

L254:  sub_blocks <- blocks[x, ]

Here you run "[.sf", which calls st_intersects() and creates a logical mask from the result (please refer to L325 in sf.R in package "sf"), but drops the result of the predicate function.

L277: sf::st_intersects(sf::st_geometry(x), sf::st_geometry(sub_blocks))

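Here you call the time-consuming st_intersects() again, with almost the same inputs.
I recommend calling st_intersects() once against the full set of blocks, then deriving the block subset and the record-to-block table from that single result. E.g. something like this:

# Single pass: for each record in x, the indices of the blocks it intersects
blocks_intersection <- sf::st_intersects(sf::st_geometry(x), sf::st_geometry(blocks))
# Blocks hit by at least one record (the same rows that blocks[x, ] would keep)
block_ids <- sort(unique(unlist(blocks_intersection)))
sub_blocks <- blocks[block_ids, ]
blocks_len <- nrow(sub_blocks)
blocks_df <- as.data.frame(blocks_intersection)
# Renumber the block indices so they refer to rows of sub_blocks, not blocks
blocks_df$col.id <- match(blocks_df$col.id, block_ids)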
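To check that a single st_intersects() pass can reproduce what the two calls produce, here is a small sketch on toy data. The grid, the points, and the names hits and block_ids are made up for illustration and are not taken from the blockCV source. The intersection is taken against the full blocks object, and the block indices are renumbered afterwards so that they refer to rows of the subset.

```r
library(sf)

# Toy data (hypothetical): a 3 x 3 grid of unit-square blocks and four points
grid_bbox <- st_bbox(c(xmin = 0, ymin = 0, xmax = 3, ymax = 3))
blocks <- st_sf(geometry = st_make_grid(st_as_sfc(grid_bbox), n = c(3, 3)))
x <- st_as_sf(
  data.frame(px = c(0.5, 0.6, 2.5, 1.5), py = c(0.5, 0.4, 2.5, 0.5)),
  coords = c("px", "py")
)

# Two-pass version: "[.sf" intersects internally, then st_intersects() runs again
sub_two_pass <- blocks[x, ]
df_two_pass <- as.data.frame(
  st_intersects(st_geometry(x), st_geometry(sub_two_pass))
)

# Single-pass version: intersect once and derive everything from that result
hits <- st_intersects(st_geometry(x), st_geometry(blocks))
block_ids <- sort(unique(unlist(hits)))   # blocks hit by at least one point
sub_one_pass <- blocks[block_ids, ]
df_one_pass <- as.data.frame(hits)
# Map indices into `blocks` onto row positions in `sub_one_pass`
df_one_pass$col.id <- match(df_one_pass$col.id, block_ids)

# Both versions should select the same blocks and the same point-block pairs
stopifnot(
  identical(st_as_text(st_geometry(sub_two_pass)),
            st_as_text(st_geometry(sub_one_pass))),
  all(df_two_pass$row.id == df_one_pass$row.id),
  all(df_two_pass$col.id == df_one_pass$col.id)
)
```

The match() step matters: without it, col.id would still index the original blocks object rather than the subset.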

I also suggest parallelizing st_intersects(), which could make cv_spatial() much faster. Please let me know if I can collaborate on this development.
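A minimal sketch of what a parallel version might look like, assuming the records can be split into contiguous chunks that are intersected independently. The helper st_intersects_chunked() and its arguments n_chunks and cores are hypothetical names, not part of sf or blockCV. It relies on parallel::mclapply(), which forks and therefore only uses more than one core on Unix-like systems; whether it actually pays off for a given dataset would need to be measured.

```r
library(sf)

# Hypothetical helper: intersect chunks of x with the blocks in separate
# workers and stitch the (record, block) pairs back together.
st_intersects_chunked <- function(x, blocks, n_chunks = 4L, cores = 1L) {
  x_geom <- st_geometry(x)
  b_geom <- st_geometry(blocks)
  idx <- seq_along(x_geom)
  # Contiguous index chunks, so the record order is preserved after rbind()
  chunks <- split(idx, ceiling(idx * n_chunks / length(idx)))
  pairs <- parallel::mclapply(chunks, function(i) {
    df <- as.data.frame(st_intersects(x_geom[i], b_geom))
    # row.id is relative to the chunk; map it back to the row number in x
    df$row.id <- i[df$row.id]
    df
  }, mc.cores = cores)
  out <- do.call(rbind, pairs)
  rownames(out) <- NULL
  out
}

# Toy usage (made-up data): the chunked result matches a single call
grid_bbox <- st_bbox(c(xmin = 0, ymin = 0, xmax = 3, ymax = 3))
blocks <- st_sf(geometry = st_make_grid(st_as_sfc(grid_bbox), n = c(3, 3)))
x <- st_as_sf(
  data.frame(px = c(0.5, 0.6, 2.5, 1.5), py = c(0.5, 0.4, 2.5, 0.5)),
  coords = c("px", "py")
)
chunked <- st_intersects_chunked(x, blocks, n_chunks = 2L, cores = 1L)
reference <- as.data.frame(st_intersects(st_geometry(x), st_geometry(blocks)))
stopifnot(
  all(chunked$row.id == reference$row.id),
  all(chunked$col.id == reference$col.id)
)
```

Chunking the records rather than the blocks keeps each worker's output a plain table of pairs, so combining the pieces only needs the row offset to be restored.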
Have a nice week,
Ákos
