Skip to content

Spatial joins #113

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Apr 23, 2024
Merged
6 changes: 5 additions & 1 deletion Project.toml
Original file line number Diff line number Diff line change
Expand Up @@ -12,13 +12,16 @@ Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"

[weakdeps]
FlexiJoins = "e37f2e79-19fa-4eb7-8510-b63b51fe0a37"
Proj = "c94c279d-25a6-4763-9509-64d165bea63e"

[extensions]
GeometryOpsFlexiJoinsExt = "FlexiJoins"
GeometryOpsProjExt = "Proj"

[compat]
CoordinateTransformations = "0.5, 0.6"
FlexiJoins = "0.1.30"
GeoInterface = "1.2"
GeometryBasics = "0.4.7"
LinearAlgebra = "1"
Expand All @@ -32,6 +35,7 @@ ArchGDAL = "c9ce4bd3-c3d5-55b8-8973-c0e20141b8c3"
CoordinateTransformations = "150eb455-5306-5404-9cee-2592286d6298"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
FlexiJoins = "e37f2e79-19fa-4eb7-8510-b63b51fe0a37"
GeoFormatTypes = "68eda718-8dee-11e9-39e7-89f7f65f511f"
GeoJSON = "61d90e0f-e114-555e-ac52-39dfb47a3ef9"
JLD2 = "033835bb-8acc-5ee8-8aae-3f567f8a3819"
Expand All @@ -42,4 +46,4 @@ Shapefile = "8e980c4a-a4fe-5da2-b3a7-4b4b0353a2f4"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[targets]
test = ["ArchGDAL", "CoordinateTransformations", "DataFrames", "Distributions", "GeoFormatTypes", "GeoJSON", "Proj", "JLD2", "LibGEOS", "Random", "Shapefile", "Test"]
test = ["ArchGDAL", "CoordinateTransformations", "DataFrames", "Distributions", "FlexiJoins", "GeoFormatTypes", "GeoJSON", "Proj", "JLD2", "LibGEOS", "Random", "Shapefile", "Test"]
5 changes: 5 additions & 0 deletions docs/Project.toml
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,9 @@ DataStructures = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8"
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
DocumenterVitepress = "4710194d-e776-4893-9690-8d956a29c365"
DoubleFloats = "497a8b3b-efae-58df-a0af-a86822472b78"
FlexiJoins = "e37f2e79-19fa-4eb7-8510-b63b51fe0a37"
GADM = "a8dd9ffe-31dc-4cf5-a379-ea69100a8233"
GeoDatasets = "ddc7317b-88db-5cb5-a849-8449e5df04f9"
GeoInterface = "cf35fbd7-0cd7-5166-be24-54bfbe79505f"
GeoInterfaceMakie = "0edc0954-3250-4c18-859d-ec71c1660c08"
Expand All @@ -20,6 +23,8 @@ LibGEOS = "a90b1aa1-3769-5649-ba7e-abc5a9d163eb"
Literate = "98b081ad-f1c9-55d3-8b20-4c87d4299306"
Makie = "ee78f7c6-11fb-53f2-987a-cfe4a2b5a57a"
MakieThemes = "e296ed71-da82-5faf-88ab-0034a9761098"
Measurements = "eff96d63-e80a-5855-80a2-b1b0885c5ab7"
MonteCarloMeasurements = "0987c9cc-fe09-11e8-30f0-b96dd679fdca"
Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7"
Proj = "c94c279d-25a6-4763-9509-64d165bea63e"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
Expand Down
3 changes: 3 additions & 0 deletions docs/make.jl
Original file line number Diff line number Diff line change
Expand Up @@ -91,6 +91,9 @@ makedocs(;
pages=[
"Introduction" => "introduction.md",
"API Reference" => "api.md",
"Tutorials" => [
"Spatial Joins" => "tutorials/spatial_joins.md",
],
"Source code" => literate_pages,
],
warnonly = true,
Expand Down
77 changes: 77 additions & 0 deletions docs/src/tutorials/spatial_joins.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,77 @@
# Spatial joins

Spatial joins are joins which are based not on equality, but on some predicate ``p(x, y)``, which takes two geometries, and returns a value of either `true` or `false`. For geometries, the [`DE-9IM`](https://en.wikipedia.org/wiki/DE-9IM) spatial relationship model is used to determine the spatial relationship between two geometries.

In this tutorial, we will show how to perform a spatial join on first a toy dataset and then two Natural Earth datasets, to show how this can be used in the real world.

In order to perform the spatial join, we use [FlexiJoins.jl](https://github.com/JuliaAPlavin/FlexiJoins.jl) to perform the join, specifically using its `by_pred` joining method. This allows the user to specify a predicate in the following manner:
```julia
[inner/left/outer/...]join((table1, table1),
by_pred(:table1_column, predicate_function, :table2_column)
)
```

We have enabled the use of all of GeometryOps' boolean comparisons here. These are:

```julia
GO.contains, GO.within, GO.intersects, GO.touches, GO.crosses, GO.disjoint, GO.overlaps, GO.covers, GO.coveredby, GO.equals
```

## Simple example

This example demonstrates how to perform a spatial join between two datasets: a set of polygons and a set of randomly generated points.

The polygons are represented as a DataFrame with geometries and colors, while the points are stored in a separate DataFrame.

The spatial join is performed using the `contains` predicate from GeometryOps, which checks if each point is contained within any of the polygons. The resulting joined DataFrame is then used to plot the points, colored according to the containing polygon.

First, we generate our data. We create two triangle polygons which, together, span the rectangle (0, 0, 1, 1), and a set of points which are randomly distributed within this rectangle.

```@example spatialjoins
import GeoInterface as GI, GeometryOps as GO
using FlexiJoins, DataFrames

using CairoMakie, GeoInterfaceMakie

pl = GI.Polygon([GI.LinearRing([(0, 0), (1, 0), (1, 1), (0, 0)])])
pu = GI.Polygon([GI.LinearRing([(0, 0), (0, 1), (1, 1), (0, 0)])])
poly_df = DataFrame(geometry = [pl, pu], color = [:red, :blue])
f, a, p = Makie.with_theme(Attributes(; Axis = (; aspect = DataAspect()))) do # hide
f, a, p = poly(poly_df.geometry; color = tuple.(poly_df.color, 0.3))
end # hide
```

Here, the upper polygon is blue, and the lower polygon is red. Keep this in mind!

Now, we generate the points.

```@example spatialjoins
points = tuple.(rand(100), rand(100))
points_df = DataFrame(geometry = points)
scatter!(points_df.geometry)
f
```

You can see that they are evenly distributed around the box. But how do we know which points are in which polygons?

The answer here is to perform a spatial join.

Now, we can perform the "spatial join" using FlexiJoins. We are performing an outer join here

```@example spatialjoins
joined_df = FlexiJoins.innerjoin(
(poly_df, points_df),
by_pred(:geometry, GO.contains, :geometry)
)
```

```@example spatialjoins
scatter(joined_df.geometry_1; color = joined_df.color)
```

Here, you can see that the colors were assigned appropriately to the scattered points!

## Real-world example

Suppose I have a list of polygons representing administrative regions (or mining sites, or what have you), and I have a list of polygons for each country. I want to find the country each region is in.

35 changes: 35 additions & 0 deletions ext/GeometryOpsFlexiJoinsExt/GeometryOpsFlexiJoinsExt.jl
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
module GeometryOpsFlexiJoinsExt

using GeometryOps
import GeometryOps as GO
using FlexiJoins

# This module defines the FlexiJoins APIs for GeometryOps' boolean comparison functions, taken from DE-9IM.

# For now, we only allow regular n^2 loops for the predicates, primarily because I'm not entirely sure how to do the "fast" mode for these.
# TODO: fix that and allow faster joins.

# First, we define that FlexiJoins supports the "NestedLoopFast" mode for the predicates.

FlexiJoins.supports_mode(::FlexiJoins.Mode.NestedLoopFast, ::FlexiJoins.ByPred{typeof(GO.contains)}, datas) = true
FlexiJoins.supports_mode(::FlexiJoins.Mode.NestedLoopFast, ::FlexiJoins.ByPred{typeof(GO.within)}, datas) = true
FlexiJoins.supports_mode(::FlexiJoins.Mode.NestedLoopFast, ::FlexiJoins.ByPred{typeof(GO.intersects)}, datas) = true
FlexiJoins.supports_mode(::FlexiJoins.Mode.NestedLoopFast, ::FlexiJoins.ByPred{typeof(GO.disjoint)}, datas) = true
FlexiJoins.supports_mode(::FlexiJoins.Mode.NestedLoopFast, ::FlexiJoins.ByPred{typeof(GO.touches)}, datas) = true
FlexiJoins.supports_mode(::FlexiJoins.Mode.NestedLoopFast, ::FlexiJoins.ByPred{typeof(GO.crosses)}, datas) = true
FlexiJoins.supports_mode(::FlexiJoins.Mode.NestedLoopFast, ::FlexiJoins.ByPred{typeof(GO.overlaps)}, datas) = true
FlexiJoins.supports_mode(::FlexiJoins.Mode.NestedLoopFast, ::FlexiJoins.ByPred{typeof(GO.covers)}, datas) = true
FlexiJoins.supports_mode(::FlexiJoins.Mode.NestedLoopFast, ::FlexiJoins.ByPred{typeof(GO.coveredby)}, datas) = true
FlexiJoins.supports_mode(::FlexiJoins.Mode.NestedLoopFast, ::FlexiJoins.ByPred{typeof(GO.equals)}, datas) = true

# Next, just in case, we define the `swap_sides` function for those predicates which are defined as inversions.

FlexiJoins.swap_sides(::typeof(GO.contains)) = GO.within
FlexiJoins.swap_sides(::typeof(GO.within)) = GO.contains
FlexiJoins.swap_sides(::typeof(GO.coveredby)) = GO.covers
FlexiJoins.swap_sides(::typeof(GO.covers)) = GO.coveredby

# That's a wrap, folks!

end

22 changes: 22 additions & 0 deletions test/extensions/flexijoins.jl
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
import GeometryOps as GO, GeoInterface as GI
using FlexiJoins, DataFrames

pl = GI.Polygon([GI.LinearRing([(0, 0), (1, 0), (1, 1), (0, 0)])])
pu = GI.Polygon([GI.LinearRing([(0, 0), (0, 1), (1, 1), (0, 0)])])
poly_df = DataFrame(geometry = [pl, pu], color = [:red, :blue])

points = tuple.(rand(100), rand(100))
points_df = DataFrame(geometry = points)

@testset "Basic integration" begin

@test_nowarn joined_df = FlexiJoins.innerjoin((poly_df, points_df), by_pred(:geometry, GO.contains, :geometry))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to wrap this and avoid the :geometry :geometry part

Like this maybe:

joined_df = GO.innerjoin(poly_df, points_df, GO.contains)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you're saying here now - but until we resolve the DataFrames/GI.geometrycolumn/ArchGDAL mess, it's probably best to keep this explicit IMO :D

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it actually a mess? It mostly just works

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case yes - GADM for example only outputs tables with geom columns, as do many other ArchGDAL-based loaders.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ArchGDAL tables are fine, GADM returns a NamedTuple. Theres an issue for that

# Test that the join happened correctly
joined_df = FlexiJoins.innerjoin((poly_df, points_df), by_pred(:geometry, GO.contains, :geometry))
@test all(GO.contains.((pl,), joined_df.geometry_1[joined_df.color .== :red]))
@test all(GO.contains.((pu,), joined_df.geometry_1[joined_df.color .== :blue]))
# Test that within also works
@test_nowarn joined_df = FlexiJoins.innerjoin((points_df, poly_df), by_pred(:geometry, GO.within, :geometry))

end

2 changes: 2 additions & 0 deletions test/runtests.jl
Original file line number Diff line number Diff line change
Expand Up @@ -41,4 +41,6 @@ const GO = GeometryOps
include("transformations/correction/closed_ring.jl")
include("transformations/correction/intersecting_polygons.jl")
end
# # Extensions
@testset "FlexiJoins" begin include("extensions/flexijoins.jl") end
end