The goal of sasf is to simplify the process of loading and visualising
spatial data for South Africa in R. Shapefiles encompass various
administrative levels, such as provinces, districts, municipalities,
mainplaces, and subplaces, using Census 2011 demarcations.
The main dataset of interest subplaces data frame is embedded in the
package. _id columns represent unique numeric identifiers, while
_name columns provide descriptive names. _mdb columns present string
identifiers corresponding to the demarcations of the Municipal
Demarcation Board of South Africa for provinces, districts, and
municipalities.
You can install the development version of sasf from
GitHub with:
# install.packages("remotes")
remotes::install_github("WihanZA/sasf")# load the package
library(sasf)
# recommended packages
library(dplyr)
library(tidyr)
library(ggplot2)
library(sf)
library(ragg)Somewhat large datasets are lazily loaded to reduce the demand on your
system’s memory when they’re not in use. See memory usage before and
after loading the subplaces dataset.
lobstr::mem_used()#> 73.28 MB
invisible(subplaces)
lobstr::mem_used()#> 116.66 MB
subplaces is structured hierarchically:
subplaces %>%
st_drop_geometry() %>%
select(ends_with("_id")) %>%
pivot_longer(everything()) %>%
group_by(name) %>%
summarise(n = n_distinct(value)) %>%
arrange(desc(n))#> # A tibble: 5 × 2
#> name n
#> <chr> <int>
#> 1 subplace_id 22108
#> 2 mainplace_id 14039
#> 3 municipality_id 234
#> 4 district_id 52
#> 5 province_id 9
I recommend the ggplot2 package for visualising the spatial data:
subplaces %>%
ggplot() +
geom_sf()Maintain consistently formatted figures by setting default themes:
# set default ggplot theme
theme_set(
theme_void() +
theme(
plot.title = element_text(hjust = 0.5),
legend.position = "bottom",
legend.box = "vertical",
legend.text = element_text(size = rel(0.5)),
legend.title.position = "top",
legend.title = element_text(size = rel(0.5), hjust = 0.5)
)
)
# set GeomSf defaults
update_geom_defaults(
"sf",
list(alpha = 0.5)
)
# create new plot
subplaces %>%
filter(province_name == "Western Cape") %>%
group_by(district_name) %>%
summarise() %>%
ggplot(aes(fill = district_name)) +
geom_sf(color = "grey50") +
labs(
fill = "District Muncipality",
title = "Western Cape Districts"
)Arbitrarily setting generated figures’ dimensions with code chunk
options fig.width and fig.height, or the width and height
arguments to ggsave, may amount to aspect ratios which don’t
correspond to that of the sf object. This leads to redundant
whitespace in the image created by knitr or ggsave.
To prevent this, the sasf package offers the get_asp helper
function. It takes an sf object (and optionally a figure’s target
width), and returns its inherent aspect ratio (and target height),
accounting for any latitude distortion where present.
The aforementioned unintended whitespace—and the fix provided by
get_asp—is best illustrated with images generated using ggsave
although the outcome is very much the same as with inappropriate code
chunk options.
# filter spatial data
gauteng_sf <- subplaces %>%
filter(province_name == "Gauteng")
# create plot
gauteng_plot <- gauteng_sf %>%
ggplot() +
geom_sf(
aes(fill = subplace_id),
show.legend = FALSE
) +
theme(
panel.background = element_rect(fill = "white"),
plot.background = element_rect(fill = "grey50")
)
# save images to the same path
fig_path <- "man/figures/readme"
# with arbitrary dimensions 6x5
ggsave(
filename = "gauteng-whitespace.png",
plot = gauteng_plot,
device = ragg::agg_png,
path = fig_path,
width = 6,
height = 5,
dpi = 180
)
# get ideal asepct ratio for the sf object
# targeting the same width as before
gauteng_asp <- get_asp(
sf_obj = gauteng_sf,
target_width = 6
)
gauteng_asp#> $asp
#> [1] 0.9620082
#>
#> $target_width
#> [1] 6
#>
#> $target_height
#> [1] 6.236953
# with recommended dimensions
ggsave(
filename = "gauteng-corrected.png",
plot = gauteng_plot,
device = ragg::agg_png,
path = fig_path,
width = gauteng_asp$target_width,
height = gauteng_asp$target_height,
dpi = 180
)
# using out.width = 50%
knitr::include_graphics(file.path(fig_path, "gauteng-whitespace.png"))
knitr::include_graphics(file.path(fig_path, "gauteng-corrected.png"))This comprehensive example demonstrates how to combine sasf with
external spatial data. We’ll plot open access datacentres across South
Africa through these steps:
Download, process and plot spatial data on open access datacentres in Africa which has been collected from publicly available sources. Big thanks to Steve Song!
# set path to resources for README
resources <- "man/resources/readme"
# download raw geojson data
if (!file.exists(file.path(resources, "datacentres.geojson"))) {
download.file(
url = "https://raw.githubusercontent.com/stevesong/Africa-Datacentres/refs/heads/main/Africa_datacentres_05Jan2025.geojson",
destfile = file.path(resources, "datacentres.geojson"),
method = "curl"
)
}
# read geojson file
datacentres <- geojsonsf::geojson_sf(file.path(resources, "datacentres.geojson"))
# clean column names and filter data
datacentres <- datacentres %>%
clean_names() %>%
filter(grepl("south africa", country, ignore.case = TRUE))
# convert char to num where appropriate
char_to_num <- function(col) {
num <- suppressWarnings(as.numeric(col))
if (all(is.na(num))) {
return(col)
} else {
return(num)
}
}
datacentres <- datacentres %>%
mutate(across(
where(is.character) & !matches("post_code"),
~ char_to_num(.x)
))
# clean char columns
datacentres <- datacentres %>%
mutate(across(
where(is.character),
~ case_when(
. == "" ~ NA,
TRUE ~ stringr::str_squish(.)
)
))
# take a look at the structure
glimpse(datacentres)#> Rows: 17
#> Columns: 15
#> $ tel <chr> NA, "+27 (0) 11 573 2800", NA, "+27 (0) 11 573 2800", …
#> $ country <chr> "South Africa", "South Africa", "South Africa", "South…
#> $ address2 <chr> "Elfindale", "Rondebosch", "Rondebosch", "Brackenfell"…
#> $ city <chr> "Cape Town", "Cape Town", "Cape Town", "Cape Town", "C…
#> $ address <chr> "108 De Waal Rd", "Great Westerford Building", "Belmon…
#> $ it_power <dbl> 9.0, 3.0, 2.0, 50.0, 5.0, 4.0, 2.0, 20.0, NA, NA, NA, …
#> $ sqm_white_space <dbl> 2700, 2600, 700, 18000, 1500, 2000, 1500, NA, NA, 2000…
#> $ post_code <chr> "7945", NA, NA, NA, NA, "4000", NA, NA, "1614", NA, NA…
#> $ subsidiary_url <chr> "https://www.africadatacentres.com/", "https://www.ter…
#> $ url <chr> "https://www.cassavatechnologies.com/", "https://www.d…
#> $ short_name <chr> "CPT1", "CT1", "CPT1", "CT2", "CPT2", "DUR1", "DB1", "…
#> $ subsidiary <chr> "Africa DataCentres", "Teraco", NA, "Teraco", NA, NA, …
#> $ company <chr> "Cassava Technologies", "Digital Realty", "Open Access…
#> $ exchange_host <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, "NAPAfrica IX", NA…
#> $ geometry <POINT [°]> POINT (18.47787 -34.03742), POINT (18.4649 -33.9…
For many use cases, spatial data by subplace is too granular and
computationally cumbersome. The dimensions of the subplaces dataset
can be reduced by aggregating the geometries of individual subplaces
toward less granular administrative levels/geographic units. In this
example, we merely want to plot provincial borders.
There are various ways to accomplish this feat and below we’ll benchmark a couple of similar approaches:
-
rmapshaperrelying on themapshapertool written in JavaScript. We benchmark the package’sms_dissolvewith and without the use of a locally installedmapshaperlibrary. -
sfpackage’s well-knownst_unionto combine several feature geometries into one with- or without resolving internal boundaries.
Each approach has their own peculiar costs and benenfits. The execution time will also differ dramatically depending on each function’s given set of arguments, as well as the given spatial data of interest. It is up to the user to determine the best approach for their use case.
# dissolve internal boundaries
tictoc::tic("dissolve")
provinces <- subplaces %>%
rmapshaper::ms_dissolve(field = "province_name")
tictoc::toc()#> dissolve: 6.87 sec elapsed
# confirm local installation
# rmapshaper::check_sys_mapshaper()
# dissolve internal boundaries + local library
tictoc::tic("dissolve_sys")
provinces <- subplaces %>%
rmapshaper::ms_dissolve(
field = "province_name",
sys = TRUE,
sys_mem = 6,
quiet = TRUE
)
tictoc::toc()#> dissolve_sys: 4.39 sec elapsed
# geometric union of set of features into a single one
tictoc::tic("st_union")
provinces <- subplaces %>%
group_by(province_name) %>%
summarise(geometry = st_union(geometry, by_feature = FALSE))
tictoc::toc()#> st_union: 47.88 sec elapsed
The resulting provinces object (in this case from ms_dissolve)
entails only the names and geometries for each province, and looks like
this:
ggplot() +
geom_sf(
data = provinces,
aes(color = province_name),
fill = "white",
linewidth = 1,
show.legend = FALSE
)Now that we have the provincial backdrop, we’re able to plot the
datacentres. Fortunately, in this case, it wasn’t really necessary to
transform the coordinate reference systems (CRS) of either datacentres
or provinces. However, for completeness, we can ensure their
consistency:
datacentres <- st_transform(
x = datacentres,
crs = st_crs(provinces)
)ggplot2 and geom_sf deals with the distinct objects’ bounding boxes
and aspect ratios, and the resulting plot passes the smell test. If we
intend on hardcoding figure dimensions, we should remember to determine
the appropriate dimensions for our new plot. As provinces is the
larger of the two, and datacentres is entirely contained within the
former, we’ll only need the aspect ratio of provinces.
plot_asp <- get_asp(provinces, 6)Using the values in plot_asp, we can set fig.height and fig.width
in the next code chunk. However, introducing additional plot elements
like titles and legends would obviously impact the optimal aspect ratio
of the output, but one can use get_asp as a starting point and
incrementally revise from there. Here is the result of all our hard
work:
datacentres %>%
ggplot() +
geom_sf(
# plot's CRS defaults to first layer
data = provinces,
fill = NA,
) +
geom_sf(
aes(color = company),
data = datacentres,
size = 10,
alpha = 0.2
) +
labs(
title = "Open Access Datacentres in South Africa",
color = "Company"
) +
guides(colour = guide_legend(override.aes = list(size = 5)))Creating a multi-panel visualisation by province allows for a more detailed examination.
First, we’ll need to disable S2 spherical geometry to avoid topological errors that can occur when joining spatial objects. Then we’ll spatially join our datacentres with provinces to determine which datacentres fall within each province.
# Disable S2 spherical geometry to avoid topological errors
sf_use_s2(FALSE)#> Spherical geometry (s2) switched off
# join datasets by checking whether x is within y
combined <- st_join(
x = datacentres,
y = provinces,
join = st_within,
left = FALSE
)#> although coordinates are longitude/latitude, st_within assumes that they are
#> planar
# Check which provinces contain datacentres
unique(combined$province_name)#> [1] "Western Cape" "KwaZulu-Natal" "Gauteng"
The spatial join preserves the datacentre geometries while associating each point with its containing province. This shows us that datacentres are concentrated in just three provinces.
Standard plot faceting doesn’t work well with spatial data and geom_sf
because the axes cannot be scaled freely while maintaining the correct
geographic proportions. Therefore, we’ll use a custom approach with
cowplot to create a composite. We’ll need to create individual plots
for each province.
# Create a list of plots, one for each province with datacentres
provinces_plots <- purrr::map(
unique(combined$province_name),
function(x) {
ggplot() +
geom_sf(
data = provinces %>%
filter(province_name == x),
fill = NA
) +
geom_sf(
data = combined %>%
filter(province_name == x),
aes(color = company),
size = 10,
alpha = 0.5,
show.legend = FALSE
)
}
)For a complete visualization, we need a shared legend. We’ll extract a legend from a dummy plot and then combine it with our province plots.
# Extract a legend from a dummy plot
plot_legend <- cowplot::get_plot_component(
plot = {
combined %>%
ggplot() +
geom_sf(
aes(color = company),
size = 10,
alpha = 0.5,
show.legend = TRUE
) +
guides(color = guide_legend(
"Company",
nrow = 1,
override.aes = list(size = 5)
)) +
theme(legend.position = "bottom")
},
"guide-box-bottom",
return_all = TRUE
)
# Arrange the individual province plots in a row
plot_top_row <- cowplot::plot_grid(
plotlist = provinces_plots,
ncol = 3,
labels = c("Western Cape", "Kwa-Zulu Natal", "Gauteng"),
label_x = 0.5,
hjust = 0.5
)
# Combine the plots with the legend underneath
cowplot::plot_grid(
plot_top_row,
plot_legend,
ncol = 1,
rel_heights = c(1, .1)
)This composite visualization shows that South Africa’s datacentres are
concentrated in three provinces, with different companies maintaining
presence across these economic hubs. The approach demonstrates how
sasf can be combined with other packages to create informative
visualisations and analyses.
-
Demarcations used by the 2011 Census are detailed in the metadata published by Statistics South Africa (2012).
-
The wiki by konektaz offers a useful summary of the hierarchical structure of spatial layers: South Africa Census 2011 spatial metadata
-
The original shapefiles were sourced from the OpenUp Data Portal at this link: Census 2011 Boundaries Subplace Layer
-
For data on African datacentres, see the
Africa-Datacentresby Steve Song.
sessionInfo()#> R version 4.5.0 (2025-04-11 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 26100)
#>
#> Matrix products: default
#> LAPACK version 3.12.1
#>
#> locale:
#> [1] LC_COLLATE=English_United Kingdom.utf8
#> [2] LC_CTYPE=English_United Kingdom.utf8
#> [3] LC_MONETARY=English_United Kingdom.utf8
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United Kingdom.utf8
#>
#> time zone: Africa/Johannesburg
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] janitor_2.2.1 ragg_1.4.0 sf_1.0-20 ggplot2_3.5.2 tidyr_1.3.1
#> [6] dplyr_1.1.4 sasf_1.0.0
#>
#> loaded via a namespace (and not attached):
#> [1] s2_1.1.8 utf8_1.2.5 generics_0.1.4 class_7.3-23
#> [5] KernSmooth_2.23-26 lattice_0.22-7 stringi_1.8.7 digest_0.6.37
#> [9] magrittr_2.0.3 evaluate_1.0.3 grid_4.5.0 timechange_0.3.0
#> [13] RColorBrewer_1.1-3 fastmap_1.2.0 jsonlite_2.0.0 lobstr_1.1.2
#> [17] e1071_1.7-16 DBI_1.2.3 purrr_1.0.4 scales_1.4.0
#> [21] rmapshaper_0.5.0 codetools_0.2-20 textshaping_1.0.1 cli_3.6.5
#> [25] rlang_1.1.6 units_0.8-7 cowplot_1.1.3 withr_3.0.2
#> [29] yaml_2.3.10 tools_4.5.0 geojsonsf_2.0.3 curl_6.2.2
#> [33] vctrs_0.6.5 R6_2.6.1 proxy_0.4-27 lifecycle_1.0.4
#> [37] lubridate_1.9.4 classInt_0.4-11 snakecase_0.11.1 stringr_1.5.1
#> [41] V8_6.0.3 pkgconfig_2.0.3 pillar_1.10.2 gtable_0.3.6
#> [45] glue_1.8.0 Rcpp_1.0.14 systemfonts_1.2.3 xfun_0.52
#> [49] tibble_3.2.1 tidyselect_1.2.1 knitr_1.50 farver_2.1.2
#> [53] htmltools_0.5.8.1 labeling_0.4.3 rmarkdown_2.29 wk_0.9.4
#> [57] compiler_4.5.0 prettyunits_1.2.0 sp_2.2-0






