Commit 8f46d15

asinghvi17 and rafaqz authored
Add more descriptive docs + some experiments (#108)
* Add more descriptive docs + some experiments
* Update docs project to include experiment packages
* Add more benchmark files (still raw)
* Update apply return type docs
  Co-authored-by: Rafael Schouten <[email protected]>
* Update docs/src/paradigms.md
  Co-authored-by: Rafael Schouten <[email protected]>
* Update docs/src/paradigms.md
  Co-authored-by: Rafael Schouten <[email protected]>
* Update docs/src/peculiarities.md
* Add code for `orient` demo
  This is a dashboard that's meant to be run. Will add an animation later as well.
* Add examples from issue
* Add a true summary figure to the docs
* Import the relevant Chairmarks/BenchmarkTools functions
* Update Project.toml
* Write the GeometryOps HackMD call notes to the docs
  As a hidden page for now but that can always change!
* Add MultiFloats
* Add NaturalEarth.jl devbranch when building docs
* make Julia actually execute the code
* Add Statistics, fix namespacing error
* `geometry_providers.jl`: Remove redundancy, add comments
* `vector_benchmark_plot.jl`: add a comment on top
* rearrange file
* Add warning that BoolsAsTypes are not public API

---------

Co-authored-by: Rafael Schouten <[email protected]>
1 parent c671b34 commit 8f46d15

9 files changed, +533 -0 lines changed

.github/workflows/CI.yml

Lines changed: 2 additions & 0 deletions
@@ -48,6 +48,8 @@ jobs:
       - uses: julia-actions/setup-julia@v1
         with:
           version: '1'
+      - name: Add custom versions of packages
+        run: julia --project=docs -e 'using Pkg; Pkg.add(PackageSpec(; url = "https://github.com/JuliaGeo/NaturalEarth.jl", rev = "as/scratchspaces"))'
       - uses: julia-actions/julia-buildpkg@v1
       - uses: julia-actions/julia-docdeploy@v1
         env:

benchmarks/geometry_providers.jl

Lines changed: 182 additions & 0 deletions
@@ -0,0 +1,182 @@
+#=
+# Geometry providers
+
+This file benchmarks GeometryOps methods on every GeoInterface.jl implementation we can find, in order to test:
+a. genericness, i.e., does GeometryOps work correctly with all GeoInterface.jl implementations?
+b. performance, i.e., how does GeometryOps compare to the native implementation?
+c. performance issues in the packages' implementations of GeoInterface
+=#
+
+# First, we import the providers:
+using ArchGDAL, LibGEOS, Shapefile, GeoJSON, WellKnownGeometry, GeometryBasics, GeoInterface, GeoFormatTypes
+PROVIDERS = (ArchGDAL, LibGEOS, GeometryBasics, GeoInterface.Wrappers) # the `GI` alias is only defined two lines below
+# Now, we import GeoInterface and GeometryOps,
+import GeometryOps as GO, GeoInterface as GI
+# Finally, we import some utility benchmarking, plotting and data munging packages!
+using BenchmarkTools, Chairmarks, CairoMakie, MakieThemes, DataFrames, Proj
+using CoordinateTransformations, Rotations, Statistics # `Statistics.mean`/`std` are used in the plotting section
+
+
+# Polylabel.jl is a package that finds the "pole of inaccessibility" of a polygon,
+# i.e., the point within it that is furthest away from its boundaries.
+
+# It depends on GeometryOps, but in this instance, we'll grab some of its test geometries
+# to use.
+import Polylabel
+
+# TODO: the reason we change to LibGEOS intermediately here is so that the
+# linear rings of the WKG polygons are interpreted correctly. Unfortunately
+# that doesn't work when read, which there's an issue up for.
+water1 = GeoFormatTypes.WellKnownText(GeoFormatTypes.Geom(), readchomp(joinpath(dirname(dirname(pathof(Polylabel))), "test", "data", "water1.wkt")) |> String) |> x -> GI.convert(LibGEOS, x) |> GO.tuples
+water2 = GeoFormatTypes.WellKnownText(GeoFormatTypes.Geom(), readchomp(joinpath(dirname(dirname(pathof(Polylabel))), "test", "data", "water2.wkt")) |> String) |> x -> GI.convert(LibGEOS, x) |> GO.tuples
+# To fix these polygons is a complicated task, and even then LibGEOS gets it wrong:
+# water1 |> x -> LibGEOS.makeValid(GI.convert(LibGEOS, x)) |> GI.getgeom |> collect |> x -> filter(y -> GI.trait(y) isa Union{GI.PolygonTrait, GI.MultiPolygonTrait}, x) |> first |> GO.tuples # hide
+
+f, a, p = poly(water1; axis = (; title = "water1")); poly(f[1, 2], water2; axis = (; title = "water2")); f
+# Now, we rotate the `water1` polygon about its centroid, so we can use it to
+# test the time it takes to intersect complex polygons:
+water1r = GO.transform(
+    Translation(GO.centroid(water1)) ∘ LinearMap(Makie.rotmatrix2d(π/2)) ∘ Translation((-).(GO.centroid(water1))),
+    water1
+)
+f, a, p = poly(water1; label = "Original")
+poly!(water1r; label = "Rotated")
+axislegend(a)
+f
+# WARNING: does not work
+@b GO.union($(water1), $(water1r); target = GI.PolygonTrait()) seconds=3
+@b LibGEOS.union($(GI.convert(LibGEOS, water1)), $(GI.convert(LibGEOS, water1r))) seconds=3
+@b ArchGDAL.union($(GI.convert(ArchGDAL, water1)), $(GI.convert(ArchGDAL, water1r))) seconds=3
+
+poly(GO.union(w1g, w1rg; target = GI.PolygonTrait()))
+
+GI.getgeom(water1, 3) |> GI.trait
+
+# We can benchmark each provider and see if any of them have glaring issues.
+
+water1_centroid_suite = BenchmarkGroup()
+
+for provider in PROVIDERS
+    @info "Benchmarking $provider"
+    geom = GI.convert(provider, water1)
+    water1_centroid_suite[string(provider)] = @be GO.centroid($geom) seconds=3
+end
+
+
+# ## Tables.jl performance in `apply`
+#=
+This code checks how Tables.jl performs when using `apply`.
+We use two sources for this: `Shapefile.jl` and `DataFrames.jl`.
+More will be coming in the future!
+=#
+shp_file = "/Users/anshul/Downloads/ne_10m_admin_0_countries (1)/ne_10m_admin_0_countries.shp"
+table = Shapefile.Table(shp_file)
+go_df = DataFrame(table)
+go_df.geometry = GO.tuples(go_df.geometry);
+
+table_suite = BenchmarkGroup()
+
+
+ll2moll = Proj.Transformation("+proj=longlat +datum=WGS84", "+proj=moll")
+
+# First, we try reprojecting the geometries using Proj,
+reproject_suite = table_suite["reproject"] = BenchmarkGroup(["title:Reproject", "subtitle:All country borders from Natural Earth, 1:10m res."])
+
+reproject_suite["Shapefile.Table"] = @be GO.reproject($table, $ll2moll) seconds=3
+reproject_suite["DataFrame (Shapefile)"] = @be GO.reproject($(DataFrame(table)), $ll2moll) seconds=3
+reproject_suite["DataFrame (GO)"] = @be GO.reproject($(go_df), $ll2moll) seconds=3
+reproject_suite["Shapefile geoms"] = @be GO.reproject($(table.geometry), $ll2moll) seconds=3
+reproject_suite["GeometryOps geoms"] = @be GO.reproject($(GO.tuples(table.geometry)), $ll2moll) seconds=3
+
+# then transforming, just to see the difference in runtime
+# between calling out to C vs pure Julia,
+function _scaleby5(x)
+    return x .* 5
+end
+
+transform_suite = table_suite["transform"] = BenchmarkGroup(["title:Transform", "subtitle:All country borders from Natural Earth, 1:10m res."])
+transform_suite["Shapefile.Table"] = @be GO.transform($_scaleby5, $table) seconds=3
+transform_suite["DataFrame (Shapefile)"] = @be GO.transform($_scaleby5, $(DataFrame(table))) seconds=3
+transform_suite["DataFrame (GO)"] = @be GO.transform($_scaleby5, $(go_df)) seconds=3
+transform_suite["Shapefile geoms"] = @be GO.transform($_scaleby5, $(table.geometry)) seconds=3
+transform_suite["GeometryOps geoms"] = @be GO.transform($_scaleby5, $(GO.tuples(table.geometry))) seconds=3
+
+# and finally, calling `applyreduce` to find the area of each
+# polygon.
+area_suite = table_suite["area"] = BenchmarkGroup(["title:Area", "subtitle:All country borders from Natural Earth, 1:10m res."])
+
+area_suite["Shapefile.Table"] = @be GO.area($(table)) seconds=3
+area_suite["DataFrame (Shapefile)"] = @be GO.area($(DataFrame(table))) seconds=3
+area_suite["DataFrame (GO)"] = @be GO.area($(go_df)) seconds=3
+area_suite["Shapefile geoms"] = @be GO.area($(table.geometry)) seconds=3
+area_suite["GeometryOps geoms"] = @be GO.area($(GO.tuples(table.geometry))) seconds=3
+
+ts = getproperty.(area_suite["Shapefile.Table"].samples, :time)
+boxplot(ones(length(ts)), ts)
+violin(ones(length(ts)), ts; npoints = 3500, axis = (; yscale = log10,))
+
+
+# ## Plotting
+function Makie.convert_arguments(::Makie.PointBased, xs, bs::AbstractVector{<: Chairmarks.Benchmark})
+    ts = getproperty.(Statistics.mean.(bs), :time)
+    return (xs, ts)
+end
+
+function Makie.convert_arguments(::Makie.PointBased, bs::AbstractVector{<: Chairmarks.Benchmark})
+    ts = getproperty.(Statistics.mean.(bs), :time)
+    return (1:length(bs), ts)
+end
+
+function Makie.convert_arguments(::Makie.SampleBased, b::Chairmarks.Benchmark)
+    ts = getproperty.(b.samples, :time)
+    return (ones(length(ts)), ts)
+end
+
+function Makie.convert_arguments(::Makie.SampleBased, n::Number, b::Chairmarks.Benchmark)
+    ts = getproperty.(b.samples, :time)
+    return (fill(n, length(ts)), ts)
+end
+
+function Makie.convert_arguments(::Makie.SampleBased, labels::AbstractVector{<: AbstractString}, bs::AbstractVector{<: Chairmarks.Benchmark})
+    ts = map(b -> getproperty.(b.samples, :time), bs)
+    labels = reduce(vcat, fill.(labels, length.(ts))) # assumed completion: repeat each label once per sample
+    return (labels, reduce(vcat, ts)) # assumed completion: flatten the timings to match
+end
+
+function Makie.convert_arguments(::Type{Makie.Errorbars}, xs, bs::AbstractVector{<: Chairmarks.Benchmark})
+    ts = map(b -> getproperty.(b.samples, :time), bs)
+    means = map(Statistics.mean, ts)
+    stds = map(Statistics.std, ts)
+    return (xs, ts)
+end
+
+ks = keys(area_suite) |> collect .|> identity
+
+bs = getindex.((area_suite,), ks)
+b_lengths = length.(getproperty.(bs, :samples))
+b_timing_flattened = collect(Iterators.flatten(Iterators.map(b -> getproperty.(b.samples, :time), bs)))
+k_strings = Iterators.flatten((fill(k, bl) for (k, bl) in zip(ks, b_lengths))) |> collect
+
+f = Figure()
+ax = Axis(f[1, 1];
+    convert_dim_1=Makie.CategoricalConversion(; sortby=nothing),
+)
+violin!(ax, k_strings, b_timing_flattened .|> log10)
+f
+ax.yscale = log10
+ax.xticklabelrotation = π/12
+f
+
+
+bs = values(area_suite) |> collect .|> identity
+labels = ["ST", "DS", "DG", "SG", "GG"]
+
+
+using AlgebraOfGraphics
+
+boxplot(b1)
+boxplot!.(1:5, values(area_suite) |> collect .|> identity)
+Makie.current_figure()
+Makie.current_axis().yscale = log10
+
+data((; x = labels, y = bs)) * mapping(:y => verbatim, :x, :y) * visual(BoxPlot) |> draw
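The flattening that the plotting section performs by hand (repeat each label once per sample, then concatenate all timings) can be sketched without any plotting or benchmarking packages. This is only an illustration: the `samples` dictionary and its timings are made up and stand in for `area_suite` and the `.samples` of each Chairmarks result.

```julia
# Hypothetical stand-in for a benchmark suite: label => sample times in seconds.
samples = Dict(
    "Shapefile.Table" => [0.012, 0.011, 0.013],
    "DataFrame (GO)"  => [0.004, 0.005],
)

labels = collect(keys(samples))
times  = [samples[k] for k in labels]

# Repeat each label once per sample so that labels and timings line up...
flat_labels = reduce(vcat, fill.(labels, length.(times)))
# ...and concatenate the per-benchmark timings in the same order.
flat_times = reduce(vcat, times)

for (l, t) in zip(flat_labels, flat_times)
    println(rpad(l, 18), t)
end
```

The two flat vectors have one entry per sample, which is the shape a sample-based plot such as a violin or box plot expects for its category and value arguments.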

benchmarks/vector_benchmark_plot.jl

Lines changed: 125 additions & 0 deletions
@@ -0,0 +1,125 @@
+#=
+# `vector-benchmark` result plot
+
+This code plots the results of the `kadyb/vector-benchmark` repository,
+and needs the MakieTeX SVG PR for now.
+
+The unique feature (and what takes up so many lines of code) is that
+the scatter markers for each language are SVGs of the logo! This
+makes the plot eye-catching and allows users to quickly grasp
+language-wise performance.
+
+Stepwise, here's what is going on:
+1. It loads the benchmark data from a CSV file into a DataFrame.
+2. It defines color and marker mappings for each package, where the markers are SVG logos of the respective programming languages.
+3. It uses the beeswarm function from the SwarmMakie package to create a scatter plot, where the x-axis represents the different benchmark tasks, and the y-axis represents the median execution time (in seconds) on a log scale.
+4. The scatter points are colored and marked according to the package and programming language, using the predefined color and marker mappings.
+5. It adds a legend to the plot, displaying the package names and their corresponding language logos.
+
+=#
+
+using CairoMakie, MakieTeX, SwarmMakie
+
+using CSV, DataFrames, CategoricalArrays
+using DataToolkit
+
+path_to_makietex_datatoml = joinpath(dirname(dirname(@__DIR__)), "MakieTeX", "docs", "Data.toml")
+data = DataToolkit.load(path_to_makietex_datatoml)
+
+
+using DataToolkit, DataFrames, StatsBase
+using CairoMakie, SwarmMakie #=beeswarm plots=#, Colors
+using MakieTeX # for SVG icons
+
+function svg_icon(name::String)
+    if name == "go"
+        icon = d"go-logo-solid::IO"
+    else
+        path = "svg/$name.svg"
+        icon = get(d"file-icons::Dict{String,IO}", path, nothing)
+    end
+    if isnothing(icon)
+        icon = get(d"file-icons-mfixx::Dict{String,IO}", path, nothing)
+    end
+    if isnothing(icon)
+        icon = get(d"file-icons-devopicons::Dict{String,IO}", path, nothing)
+    end
+    isnothing(icon) && return missing
+    return CachedSVG(read(seekstart(icon), String))
+end
+
+const colours_vibrant = range(LCHab(60,70,0), stop=LCHab(60,70,360), length=36)
+const colours_dim = range(LCHab(25,50,0), stop=LCHab(25,50,360), length=36)
+
+const julia_logo = svg_icon("Julia")
+const r_logo = svg_icon("R")
+const python_logo = svg_icon("python")
+
+marker_map = Dict(
+    "geometryops" => julia_logo,
+    # "gdal-jl" => julia_logo,
+    "sf" => r_logo,
+    "terra" => r_logo,
+    "geos" => r_logo,
+    "s2" => r_logo,
+    "geopandas" => python_logo,
+)
+
+
+color_map = Dict(
+    # R packages
+    "sf" => Makie.wong_colors()[1],
+    "s2" => Makie.wong_colors()[5],
+    "terra" => Makie.wong_colors()[6],
+    "geos" => Makie.wong_colors()[4],
+    # Python package
+    "geopandas" => Makie.wong_colors()[2],
+    # Julia package
+    "geometryops" => Makie.wong_colors()[3],
+)
+
+path_to_vector_benchmark = "/Users/anshul/git/vector-benchmark"
+timings_df = CSV.read(joinpath(path_to_vector_benchmark, "timings.csv"), DataFrame)
+replace!(timings_df.package, "sf-project" => "sf", "sf-transform" => "sf")
+
+# now plot
+
+task_ca = CategoricalArray(timings_df.task)
+
+group_marker = [MarkerElement(; color = color_map[package], marker = marker_map[package], markersize = 12) for package in keys(marker_map)]
+names_marker = collect(keys(marker_map))
+lang_markers = ["R" => r_logo, "Python" => python_logo, "Julia" => julia_logo]
+group_package = [MarkerElement(; marker, markersize = 12) for (lang, marker) in lang_markers]
+names_package = first.(lang_markers)
+
+
+f, a, p = beeswarm(
+    task_ca.refs, timings_df.median;
+    marker = getindex.((marker_map,), timings_df.package),
+    color = getindex.((color_map,), timings_df.package),
+    markersize = 10,
+    axis = (;
+        xticks = (1:length(task_ca.pool.levels), task_ca.pool.levels),
+        xlabel = "Task",
+        ylabel = "Median time (s)",
+        yscale = log10,
+        title = "Benchmark vector operations",
+        xgridvisible = false,
+        xminorgridvisible = true,
+        yminorgridvisible = true,
+        yminorticks = IntervalsBetween(5),
+        ygridcolor = RGBA{Float32}(0.0f0,0.0f0,0.0f0,0.05f0),
+    )
+)
+leg = Legend(
+    f[1, 2],
+    [group_marker, group_package],
+    [names_marker, names_package],
+    ["Package", "Language"],
+    tellheight = false,
+    tellwidth = true,
+    gridshalign = :left,
+)
+resize!(f, 650, 450)
+a.spinewidth[] = 0.5
+f
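The data preparation that feeds the beeswarm call (merging package variants with `replace!`, looking up a per-row attribute with a broadcast `getindex`, and turning task names into integer positions) can be tried in isolation with Base Julia only. The `packages` and `tasks` vectors and the symbol colours below are made-up stand-ins for the columns of `timings.csv` and for `color_map`; `indexin` is used here in place of `CategoricalArray` refs, as an assumption for the sake of a dependency-free sketch.

```julia
# Hypothetical stand-ins for the `package` and `task` columns of timings.csv.
packages = ["sf-project", "geopandas", "sf-transform", "geometryops"]
tasks    = ["reproject", "reproject", "transform", "transform"]

# Collapse the per-task variants of a package into one name, in place.
replace!(packages, "sf-project" => "sf", "sf-transform" => "sf")

# Hypothetical colours; the real script uses entries of Makie.wong_colors().
color_map = Dict("sf" => :blue, "geopandas" => :orange, "geometryops" => :green)

# Wrapping the Dict in a 1-tuple makes broadcasting treat it as a single value,
# so this is one lookup per element of `packages`.
colors = getindex.((color_map,), packages)

# Integer position of each task among the distinct task names,
# usable as x coordinates with `levels` as tick labels.
levels = unique(tasks)
refs   = indexin(tasks, levels)

println(colors)
println(refs)
```

`getindex.(Ref(color_map), packages)` is an equivalent spelling of the lookup; the tuple form mirrors what the script does.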

docs/Project.toml

Lines changed: 5 additions & 0 deletions
@@ -1,4 +1,5 @@
 [deps]
+AccurateArithmetic = "22286c92-06ac-501d-9306-4abd417d9753"
 Base64 = "2a0f44e3-6c83-55bd-87e4-b1978d98bd5f"
 BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
 CairoMakie = "13f3f980-e62b-5c42-98c6-ff1f3baf88f0"
@@ -10,6 +11,8 @@ DataStructures = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8"
 Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
 Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
 DocumenterVitepress = "4710194d-e776-4893-9690-8d956a29c365"
+DoubleFloats = "497a8b3b-efae-58df-a0af-a86822472b78"
+ExactPredicates = "429591f6-91af-11e9-00e2-59fbe8cec110"
 GeoDatasets = "ddc7317b-88db-5cb5-a849-8449e5df04f9"
 GeoInterface = "cf35fbd7-0cd7-5166-be24-54bfbe79505f"
 GeoInterfaceMakie = "0edc0954-3250-4c18-859d-ec71c1660c08"
@@ -20,7 +23,9 @@ LibGEOS = "a90b1aa1-3769-5649-ba7e-abc5a9d163eb"
 Literate = "98b081ad-f1c9-55d3-8b20-4c87d4299306"
 Makie = "ee78f7c6-11fb-53f2-987a-cfe4a2b5a57a"
 MakieThemes = "e296ed71-da82-5faf-88ab-0034a9761098"
+MultiFloats = "bdf0d083-296b-4888-a5b6-7498122e68a5"
 Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7"
 Proj = "c94c279d-25a6-4763-9509-64d165bea63e"
 Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
 Shapefile = "8e980c4a-a4fe-5da2-b3a7-4b4b0353a2f4"
+Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"

docs/make.jl

Lines changed: 7 additions & 0 deletions
@@ -73,6 +73,9 @@ withenv("JULIA_DEBUG" => "Literate") do # allow Literate debug output to escape
     # TODO: We should probably fix the above in `process_literate_recursive!`.
 end
 
+# Now that the Literate stuff is done, we also download the call notes from HackMD:
+download("https://hackmd.io/kpIqAR8YRJOZQDJjUKVAUQ/download", joinpath(@__DIR__, "src", "call_notes.md"))
+
 # Finally, make the docs!
 makedocs(;
     modules=[GeometryOps],
@@ -91,6 +94,10 @@ makedocs(;
     pages=[
         "Introduction" => "introduction.md",
         "API Reference" => "api.md",
+        "Explanations" => [
+            "Paradigms" => "paradigms.md",
+            "Peculiarities" => "peculiarities.md",
+        ],
         "Source code" => literate_pages,
     ],
     warnonly = true,
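The call-notes download added in this hunk runs on every docs build, so a network failure would abort the build before `makedocs`. One possible way to make that step tolerant is sketched below; this is not what the commit does. `fetch_or_stub` is a hypothetical helper, the stub text is invented, and the `fetch` keyword exists only so the behaviour can be exercised offline. `Downloads.download(url, output)` is the standard-library replacement for the deprecated `Base.download`.

```julia
using Downloads

# Hypothetical helper: try to fetch `url` into `dest`; if that fails for any
# reason, warn and write a small placeholder page instead of throwing.
function fetch_or_stub(url::AbstractString, dest::AbstractString; fetch = Downloads.download)
    try
        fetch(url, dest)
    catch err
        @warn "Could not download the call notes; writing a stub page instead" url exception = err
        write(dest, "# Call notes\n\nNot available in this build.\n")
    end
    return dest
end

# Offline demonstration with a fake fetcher that always fails.
dest = joinpath(mktempdir(), "call_notes.md")
fetch_or_stub("https://example.invalid/notes", dest; fetch = (url, out) -> error("offline"))
print(read(dest, String))
```

In `docs/make.jl` the call would take the HackMD URL and `joinpath(@__DIR__, "src", "call_notes.md")` as its two arguments, leaving `fetch` at its default.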
