Skip to content

WihanZA/sasf

Repository files navigation

sasf

Lifecycle: experimental CRAN status

The goal of sasf is to simplify the process of loading and visualising spatial data for South Africa in R. Shapefiles encompass various administrative levels, such as provinces, districts, municipalities, mainplaces, and subplaces, using Census 2011 demarcations.

The main dataset of interest subplaces data frame is embedded in the package. _id columns represent unique numeric identifiers, while _name columns provide descriptive names. _mdb columns present string identifiers corresponding to the demarcations of the Municipal Demarcation Board of South Africa for provinces, districts, and municipalities.

Installation

You can install the development version of sasf from GitHub with:

# install.packages("remotes")
remotes::install_github("WihanZA/sasf")

Basics

# load the package
library(sasf)

# recommended packages
library(dplyr)
library(tidyr)
library(ggplot2)
library(sf)
library(ragg)

Somewhat large datasets are lazily loaded to reduce the demand on your system’s memory when they’re not in use. See memory usage before and after loading the subplaces dataset.

lobstr::mem_used()
#> 73.28 MB
invisible(subplaces)
lobstr::mem_used()
#> 116.66 MB

subplaces is structured hierarchically:

subplaces %>%
  st_drop_geometry() %>%
  select(ends_with("_id")) %>%
  pivot_longer(everything()) %>%
  group_by(name) %>%
  summarise(n = n_distinct(value)) %>%
  arrange(desc(n))
#> # A tibble: 5 × 2
#>   name                n
#>   <chr>           <int>
#> 1 subplace_id     22108
#> 2 mainplace_id    14039
#> 3 municipality_id   234
#> 4 district_id        52
#> 5 province_id         9

Plotting

I recommend the ggplot2 package for visualising the spatial data:

subplaces %>%
  ggplot() +
  geom_sf()

Maintain consistently formatted figures by setting default themes:

# set default ggplot theme
theme_set(
  theme_void() +
    theme(
      plot.title = element_text(hjust = 0.5),
      legend.position = "bottom",
      legend.box = "vertical",
      legend.text = element_text(size = rel(0.5)),
      legend.title.position = "top",
      legend.title = element_text(size = rel(0.5), hjust = 0.5)
    )
)

# set GeomSf defaults
update_geom_defaults(
  "sf",
  list(alpha = 0.5)
)

# create new plot
subplaces %>%
  filter(province_name == "Western Cape") %>%
  group_by(district_name) %>%
  summarise() %>%
  ggplot(aes(fill = district_name)) +
  geom_sf(color = "grey50") +
  labs(
    fill = "District Muncipality",
    title = "Western Cape Districts"
  )

Helper Functions

Arbitrarily setting generated figures’ dimensions with code chunk options fig.width and fig.height, or the width and height arguments to ggsave, may amount to aspect ratios which don’t correspond to that of the sf object. This leads to redundant whitespace in the image created by knitr or ggsave.

To prevent this, the sasf package offers the get_asp helper function. It takes an sf object (and optionally a figure’s target width), and returns its inherent aspect ratio (and target height), accounting for any latitude distortion where present.

The aforementioned unintended whitespace—and the fix provided by get_asp—is best illustrated with images generated using ggsave although the outcome is very much the same as with inappropriate code chunk options.

# filter spatial data
gauteng_sf <- subplaces %>%
  filter(province_name == "Gauteng")

# create plot
gauteng_plot <- gauteng_sf %>%
  ggplot() +
  geom_sf(
    aes(fill = subplace_id),
    show.legend = FALSE
  ) +
  theme(
    panel.background = element_rect(fill = "white"),
    plot.background = element_rect(fill = "grey50")
  )

# save images to the same path
fig_path <- "man/figures/readme"

# with arbitrary dimensions 6x5
ggsave(
  filename = "gauteng-whitespace.png",
  plot = gauteng_plot,
  device = ragg::agg_png,
  path = fig_path,
  width = 6,
  height = 5,
  dpi = 180
)

# get ideal asepct ratio for the sf object
# targeting the same width as before
gauteng_asp <- get_asp(
  sf_obj = gauteng_sf,
  target_width = 6
)

gauteng_asp
#> $asp
#> [1] 0.9620082
#> 
#> $target_width
#> [1] 6
#> 
#> $target_height
#> [1] 6.236953
# with recommended dimensions
ggsave(
  filename = "gauteng-corrected.png",
  plot = gauteng_plot,
  device = ragg::agg_png,
  path = fig_path,
  width = gauteng_asp$target_width,
  height = gauteng_asp$target_height,
  dpi = 180
)

# using out.width = 50%
knitr::include_graphics(file.path(fig_path, "gauteng-whitespace.png"))
knitr::include_graphics(file.path(fig_path, "gauteng-corrected.png"))

Complete Example: Mapping Datacentres in South Africa

This comprehensive example demonstrates how to combine sasf with external spatial data. We’ll plot open access datacentres across South Africa through these steps:

Step 1: Download and Process Datacentre Data

Download, process and plot spatial data on open access datacentres in Africa which has been collected from publicly available sources. Big thanks to Steve Song!

# set path to resources for README
resources <- "man/resources/readme"

# download raw geojson data
if (!file.exists(file.path(resources, "datacentres.geojson"))) {
  download.file(
    url = "https://raw.githubusercontent.com/stevesong/Africa-Datacentres/refs/heads/main/Africa_datacentres_05Jan2025.geojson",
    destfile = file.path(resources, "datacentres.geojson"),
    method = "curl"
  )
}

# read geojson file
datacentres <- geojsonsf::geojson_sf(file.path(resources, "datacentres.geojson"))

# clean column names and filter data
datacentres <- datacentres %>%
  clean_names() %>%
  filter(grepl("south africa", country, ignore.case = TRUE))

# convert char to num where appropriate
char_to_num <- function(col) {
  num <- suppressWarnings(as.numeric(col))
  if (all(is.na(num))) {
    return(col)
  } else {
    return(num)
  }
}

datacentres <- datacentres %>%
  mutate(across(
    where(is.character) & !matches("post_code"),
    ~ char_to_num(.x)
  ))

# clean char columns
datacentres <- datacentres %>%
  mutate(across(
    where(is.character),
    ~ case_when(
      . == "" ~ NA,
      TRUE ~ stringr::str_squish(.)
    )
  ))

# take a look at the structure
glimpse(datacentres)
#> Rows: 17
#> Columns: 15
#> $ tel             <chr> NA, "+27 (0) 11 573 2800", NA, "+27 (0) 11 573 2800", …
#> $ country         <chr> "South Africa", "South Africa", "South Africa", "South…
#> $ address2        <chr> "Elfindale", "Rondebosch", "Rondebosch", "Brackenfell"…
#> $ city            <chr> "Cape Town", "Cape Town", "Cape Town", "Cape Town", "C…
#> $ address         <chr> "108 De Waal Rd", "Great Westerford Building", "Belmon…
#> $ it_power        <dbl> 9.0, 3.0, 2.0, 50.0, 5.0, 4.0, 2.0, 20.0, NA, NA, NA, …
#> $ sqm_white_space <dbl> 2700, 2600, 700, 18000, 1500, 2000, 1500, NA, NA, 2000…
#> $ post_code       <chr> "7945", NA, NA, NA, NA, "4000", NA, NA, "1614", NA, NA…
#> $ subsidiary_url  <chr> "https://www.africadatacentres.com/", "https://www.ter…
#> $ url             <chr> "https://www.cassavatechnologies.com/", "https://www.d…
#> $ short_name      <chr> "CPT1", "CT1", "CPT1", "CT2", "CPT2", "DUR1", "DB1", "…
#> $ subsidiary      <chr> "Africa DataCentres", "Teraco", NA, "Teraco", NA, NA, …
#> $ company         <chr> "Cassava Technologies", "Digital Realty", "Open Access…
#> $ exchange_host   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, "NAPAfrica IX", NA…
#> $ geometry        <POINT [°]> POINT (18.47787 -34.03742), POINT (18.4649 -33.9…

Step 2: Create Provincial Boundaries

For many use cases, spatial data by subplace is too granular and computationally cumbersome. The dimensions of the subplaces dataset can be reduced by aggregating the geometries of individual subplaces toward less granular administrative levels/geographic units. In this example, we merely want to plot provincial borders.

There are various ways to accomplish this feat and below we’ll benchmark a couple of similar approaches:

  • rmapshaper relying on the mapshaper tool written in JavaScript. We benchmark the package’s ms_dissolve with and without the use of a locally installed mapshaper library.

  • sf package’s well-known st_union to combine several feature geometries into one with- or without resolving internal boundaries.

Each approach has their own peculiar costs and benenfits. The execution time will also differ dramatically depending on each function’s given set of arguments, as well as the given spatial data of interest. It is up to the user to determine the best approach for their use case.

# dissolve internal boundaries
tictoc::tic("dissolve")
provinces <- subplaces %>%
  rmapshaper::ms_dissolve(field = "province_name")
tictoc::toc()
#> dissolve: 6.87 sec elapsed
# confirm local installation
# rmapshaper::check_sys_mapshaper()

# dissolve internal boundaries + local library
tictoc::tic("dissolve_sys")
provinces <- subplaces %>%
  rmapshaper::ms_dissolve(
    field = "province_name",
    sys = TRUE,
    sys_mem = 6,
    quiet = TRUE
  )
tictoc::toc()
#> dissolve_sys: 4.39 sec elapsed
# geometric union of set of features into a single one
tictoc::tic("st_union")
provinces <- subplaces %>%
  group_by(province_name) %>%
  summarise(geometry = st_union(geometry, by_feature = FALSE))
tictoc::toc()
#> st_union: 47.88 sec elapsed

The resulting provinces object (in this case from ms_dissolve) entails only the names and geometries for each province, and looks like this:

ggplot() +
  geom_sf(
    data = provinces,
    aes(color = province_name),
    fill = "white",
    linewidth = 1,
    show.legend = FALSE
  )

Step 3: Basic Datacentre Visualisation

Now that we have the provincial backdrop, we’re able to plot the datacentres. Fortunately, in this case, it wasn’t really necessary to transform the coordinate reference systems (CRS) of either datacentres or provinces. However, for completeness, we can ensure their consistency:

datacentres <- st_transform(
  x = datacentres,
  crs = st_crs(provinces)
)

ggplot2 and geom_sf deals with the distinct objects’ bounding boxes and aspect ratios, and the resulting plot passes the smell test. If we intend on hardcoding figure dimensions, we should remember to determine the appropriate dimensions for our new plot. As provinces is the larger of the two, and datacentres is entirely contained within the former, we’ll only need the aspect ratio of provinces.

plot_asp <- get_asp(provinces, 6)

Using the values in plot_asp, we can set fig.height and fig.width in the next code chunk. However, introducing additional plot elements like titles and legends would obviously impact the optimal aspect ratio of the output, but one can use get_asp as a starting point and incrementally revise from there. Here is the result of all our hard work:

datacentres %>%
  ggplot() +
  geom_sf(
    # plot's CRS defaults to first layer
    data = provinces,
    fill = NA,
  ) +
  geom_sf(
    aes(color = company),
    data = datacentres,
    size = 10,
    alpha = 0.2
  ) +
  labs(
    title = "Open Access Datacentres in South Africa",
    color = "Company"
  ) +
  guides(colour = guide_legend(override.aes = list(size = 5)))

Step 4: Advanced Faceted Visualization by Province

Creating a multi-panel visualisation by province allows for a more detailed examination.

First, we’ll need to disable S2 spherical geometry to avoid topological errors that can occur when joining spatial objects. Then we’ll spatially join our datacentres with provinces to determine which datacentres fall within each province.

# Disable S2 spherical geometry to avoid topological errors
sf_use_s2(FALSE)
#> Spherical geometry (s2) switched off
# join datasets by checking whether x is within y
combined <- st_join(
  x = datacentres,
  y = provinces,
  join = st_within,
  left = FALSE
)
#> although coordinates are longitude/latitude, st_within assumes that they are
#> planar
# Check which provinces contain datacentres
unique(combined$province_name)
#> [1] "Western Cape"  "KwaZulu-Natal" "Gauteng"

The spatial join preserves the datacentre geometries while associating each point with its containing province. This shows us that datacentres are concentrated in just three provinces.

Standard plot faceting doesn’t work well with spatial data and geom_sf because the axes cannot be scaled freely while maintaining the correct geographic proportions. Therefore, we’ll use a custom approach with cowplot to create a composite. We’ll need to create individual plots for each province.

# Create a list of plots, one for each province with datacentres
provinces_plots <- purrr::map(
  unique(combined$province_name),
  function(x) {
    ggplot() +
      geom_sf(
        data = provinces %>%
          filter(province_name == x),
        fill = NA
      ) +
      geom_sf(
        data = combined %>%
          filter(province_name == x),
        aes(color = company),
        size = 10,
        alpha = 0.5,
        show.legend = FALSE
      )
  }
)

For a complete visualization, we need a shared legend. We’ll extract a legend from a dummy plot and then combine it with our province plots.

# Extract a legend from a dummy plot
plot_legend <- cowplot::get_plot_component(
  plot = {
    combined %>%
      ggplot() +
      geom_sf(
        aes(color = company),
        size = 10,
        alpha = 0.5,
        show.legend = TRUE
      ) +
      guides(color = guide_legend(
        "Company",
        nrow = 1,
        override.aes = list(size = 5)
      )) +
      theme(legend.position = "bottom")
  },
  "guide-box-bottom",
  return_all = TRUE
)

# Arrange the individual province plots in a row
plot_top_row <- cowplot::plot_grid(
  plotlist = provinces_plots,
  ncol = 3,
  labels = c("Western Cape", "Kwa-Zulu Natal", "Gauteng"),
  label_x = 0.5,
  hjust = 0.5
)

# Combine the plots with the legend underneath
cowplot::plot_grid(
  plot_top_row,
  plot_legend,
  ncol = 1,
  rel_heights = c(1, .1)
)

This composite visualization shows that South Africa’s datacentres are concentrated in three provinces, with different companies maintaining presence across these economic hubs. The approach demonstrates how sasf can be combined with other packages to create informative visualisations and analyses.

Acknowledgements

Session Information

sessionInfo()
#> R version 4.5.0 (2025-04-11 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 26100)
#> 
#> Matrix products: default
#>   LAPACK version 3.12.1
#> 
#> locale:
#> [1] LC_COLLATE=English_United Kingdom.utf8 
#> [2] LC_CTYPE=English_United Kingdom.utf8   
#> [3] LC_MONETARY=English_United Kingdom.utf8
#> [4] LC_NUMERIC=C                           
#> [5] LC_TIME=English_United Kingdom.utf8    
#> 
#> time zone: Africa/Johannesburg
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] janitor_2.2.1 ragg_1.4.0    sf_1.0-20     ggplot2_3.5.2 tidyr_1.3.1  
#> [6] dplyr_1.1.4   sasf_1.0.0   
#> 
#> loaded via a namespace (and not attached):
#>  [1] s2_1.1.8           utf8_1.2.5         generics_0.1.4     class_7.3-23      
#>  [5] KernSmooth_2.23-26 lattice_0.22-7     stringi_1.8.7      digest_0.6.37     
#>  [9] magrittr_2.0.3     evaluate_1.0.3     grid_4.5.0         timechange_0.3.0  
#> [13] RColorBrewer_1.1-3 fastmap_1.2.0      jsonlite_2.0.0     lobstr_1.1.2      
#> [17] e1071_1.7-16       DBI_1.2.3          purrr_1.0.4        scales_1.4.0      
#> [21] rmapshaper_0.5.0   codetools_0.2-20   textshaping_1.0.1  cli_3.6.5         
#> [25] rlang_1.1.6        units_0.8-7        cowplot_1.1.3      withr_3.0.2       
#> [29] yaml_2.3.10        tools_4.5.0        geojsonsf_2.0.3    curl_6.2.2        
#> [33] vctrs_0.6.5        R6_2.6.1           proxy_0.4-27       lifecycle_1.0.4   
#> [37] lubridate_1.9.4    classInt_0.4-11    snakecase_0.11.1   stringr_1.5.1     
#> [41] V8_6.0.3           pkgconfig_2.0.3    pillar_1.10.2      gtable_0.3.6      
#> [45] glue_1.8.0         Rcpp_1.0.14        systemfonts_1.2.3  xfun_0.52         
#> [49] tibble_3.2.1       tidyselect_1.2.1   knitr_1.50         farver_2.1.2      
#> [53] htmltools_0.5.8.1  labeling_0.4.3     rmarkdown_2.29     wk_0.9.4          
#> [57] compiler_4.5.0     prettyunits_1.2.0  sp_2.2-0

About

Easy Access to Spatial Data for South Africa in R

Resources

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE
MIT
LICENSE.md

Stars

Watchers

Forks

Contributors

Languages