Merged
Changes from 75 commits
82 commits
b3bf693
Update names of pipelines
Mar 4, 2025
918695b
Small aesthetic changes to layout markdowns
Mar 4, 2025
e1fbc0d
Create functions to add observation cyclus and correct species names
Mar 4, 2025
48abc51
Add markdwon report for expl_ana_targets
Mar 4, 2025
3604e17
Update meta
Mar 4, 2025
609f55e
Change order of arguments
Mar 6, 2025
0f44f64
Create list of analysed species in ABV, with scientific names
Mar 6, 2025
d6d7476
Update species names so that they match in both data sets
Mar 6, 2025
4e8ef7e
Extend pipeline
Mar 6, 2025
508f624
Add plot
Mar 6, 2025
0964110
Add function to add rareness category
Mar 9, 2025
4d0352d
Add documentation
Mar 9, 2025
951a4d7
Add additional comparisons
Mar 10, 2025
7242234
Add code for debugging
Mar 10, 2025
5081a56
Add function to compare trends and functionality to keep category inf…
Mar 10, 2025
d495df5
Add "filter" to weigh data based on total observations in time periods
Mar 10, 2025
634b42d
Update meta
Mar 10, 2025
e6078d0
Add trend comparison over cyclus
Mar 11, 2025
623a44e
Clean data reading
Mar 11, 2025
36fabd9
Move file
Mar 11, 2025
8d15258
Functions imported from mbag-mas to find scientific name
Mar 11, 2025
acc5457
Update data name
Mar 11, 2025
418046b
Update meta
Mar 11, 2025
713948b
Apply filter 3 to both ABV and cube data
Mar 11, 2025
9c3eba5
projectId
Mar 17, 2025
921ea5f
Test for species names
Mar 17, 2025
4d4c631
Add titles to all plots indicating structured or unstructered data wa…
Mar 17, 2025
3be523a
Add histogram for coordinata uncertainty
Mar 17, 2025
823a8e7
Add 10km utm grid
Mar 24, 2025
acfd5d6
Load data aggregated at 10km² for comparison
Mar 24, 2025
4e0da0e
Fix text
Mar 24, 2025
615be9b
Fix error in settings of coordinate uncertainty, regenerate cubes
Mar 27, 2025
ef626d6
Regenerate cube now that random grid desgination is fixed
Mar 27, 2025
0922838
Add branching to pipeline for year/cycle and 1km/10km
Apr 1, 2025
b76f2cc
Simplify visnetwork
Apr 1, 2025
f890347
Add ways to identify different branches data
Apr 1, 2025
ab91d31
update meta
Apr 1, 2025
de3dd8a
Add filter for the correct data from the right branch of the pipeline
Apr 1, 2025
3e8b028
Exclude fossil specimens and specimens living in collections
Apr 7, 2025
378380e
Rerun pipeline
Apr 7, 2025
797b604
Add identifiers for branching over dateset, time period and spatial r…
Apr 11, 2025
740925b
Change names for easier use in targets pipeline and turn randomisatio…
Apr 11, 2025
c9b31ea
Rewrite functions to work in the branched target pipeline
Apr 11, 2025
5a3f078
Simplify pipeline with branching
Apr 11, 2025
3dcd0ac
Update meta
Apr 11, 2025
06427da
Small aesthetic changes
Apr 11, 2025
737ea89
Simplify pipeline for trend comparison
Apr 11, 2025
a30cd96
.
Apr 11, 2025
826c54d
Rewrite tren comparison function to work in braching pipeline
Apr 11, 2025
d6abace
Update meta
Apr 11, 2025
76b5e78
Add category to range_comp data
Apr 11, 2025
01b2e0b
Update meta
Apr 11, 2025
f34773b
First adaptations to new outputs from timeline, categories aren't cor…
Apr 11, 2025
7dbb960
Update meta
Apr 14, 2025
7e7a9de
Some more adjustments to deal with the branching
Apr 14, 2025
8b60f33
Extract correct data from pipeline
Apr 14, 2025
1c13740
Get data from GBIF now that randomization is fixed, set uncertainty t…
Apr 15, 2025
8f85693
Get data from GBIF now that randomization is fixed
Apr 15, 2025
d07e20e
Drop rows with NA's
Apr 15, 2025
96eb9e9
Make sure data is grouped (important when looking per cyclus)
Apr 15, 2025
e958ad7
Update meta
Apr 15, 2025
e929ee2
Adjust visualisation
Apr 15, 2025
efceada
Update pipeline to include branching
Apr 15, 2025
3f5e600
Update data reading functions
Apr 15, 2025
0da7036
Update meta
Apr 15, 2025
cca9cb3
coding style
wlangera Apr 16, 2025
b8a13c2
Add filter 4
Apr 17, 2025
d928c6d
Update meta
Apr 17, 2025
500cbff
Add different visualisations, to do: limit shared data to best and wo…
Apr 17, 2025
31ddaa0
Script to generate figure for visualising the principles of the trend…
Apr 18, 2025
1bbe961
Add plot layouts to use in report
Apr 18, 2025
de0ba6b
pull origin
wlangera Aug 4, 2025
3625c72
some coding style
wlangera Aug 4, 2025
e9d2859
Merge pull request #6 from b-cubed-eu/review-ward
wlangera Aug 4, 2025
9969df9
Fix checklist stuff
Aug 4, 2025
7af0cbf
Update data names and following dependencies
Aug 5, 2025
3da3081
Update README
Aug 12, 2025
3c4bf7c
Remove old versions of code and move certain scripts
Aug 12, 2025
5627d02
Add documentation to README
Aug 12, 2025
1866286
Checklist
Aug 12, 2025
f38662a
Add title
Aug 12, 2025
e9d08b6
Update README.md
EmmaCartuyvels1 Aug 26, 2025
8 changes: 4 additions & 4 deletions _targets.yaml
@@ -1,8 +1,8 @@
target_workflow:
  script: C:/R/git_repositories/comp-unstructured-data/source/pipelines/target_workflow/_targets.R
  store: C:/R/git_repositories/comp-unstructured-data/source/pipelines/target_workflow/_targets
  use_crew: yes
biodiversity_indicators:
  script: C:/R/git_repositories/comp-unstructured-data/source/pipelines/biodiversity_indicators/_targets.R
  store: C:/R/git_repositories/comp-unstructured-data/source/pipelines/biodiversity_indicators/_targets
  use_crew: yes
exploratory_analysis:
  script: C:/R/git_repositories/comp-unstructured-data/source/pipelines/exploratory_analysis/_targets.R
  store: C:/R/git_repositories/comp-unstructured-data/source/pipelines/exploratory_analysis/_targets
  use_crew: yes
1 change: 1 addition & 0 deletions comp-unstructured-data.Rproj
@@ -1,4 +1,5 @@
Version: 1.0
ProjectId: 917f1e07-7bf8-4404-b0ed-c2b02a93dc01

RestoreWorkspace: Default
SaveWorkspace: Default
Binary file added data/raw/utm_grid/utm10_vlgrens_zBRU.dbf
Binary file not shown.
1 change: 1 addition & 0 deletions data/raw/utm_grid/utm10_vlgrens_zBRU.prj
@@ -0,0 +1 @@
PROJCS["Belge_Lambert_1972",GEOGCS["GCS_Belge_1972",DATUM["D_Belge_1972",SPHEROID["International_1924",6378388.0,297.0]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Lambert_Conformal_Conic"],PARAMETER["False_Easting",150000.01256],PARAMETER["False_Northing",5400088.4378],PARAMETER["Central_Meridian",4.367486666666666],PARAMETER["Standard_Parallel_1",49.8333339],PARAMETER["Standard_Parallel_2",51.16666733333333],PARAMETER["Latitude_Of_Origin",90.0],UNIT["Meter",1.0]],VERTCS["Oostende",VDATUM["Oostende"],PARAMETER["Vertical_Shift",0.0],PARAMETER["Direction",1.0],UNIT["Meter",1.0]]
Binary file added data/raw/utm_grid/utm10_vlgrens_zBRU.sbn
Binary file not shown.
Binary file added data/raw/utm_grid/utm10_vlgrens_zBRU.sbx
Binary file not shown.
Binary file added data/raw/utm_grid/utm10_vlgrens_zBRU.shp
Binary file not shown.
Binary file added data/raw/utm_grid/utm10_vlgrens_zBRU.shx
Binary file not shown.
11 changes: 11 additions & 0 deletions inst/en_gb.dic
@@ -1,10 +1,13 @@
Algemene
Anthus
Bosonderzoek
Broedvogelmonitoring
Broedvogels
Cartuyvels
Cetti's
Cettia
Chloris
Cyanistes
Daele
Databricks
Dendrocopos
@@ -20,6 +23,7 @@ Laridae
Larus
Luscinia
MGRS
Motacilla
Natuur
OOSTENDE
Parus
@@ -35,23 +39,30 @@ Watervogels
abv
argentatus
birdcube
caeruleus
cetti
chloris
color
communis
datacube
datacubes
domesticus
eBird
flava
fuscus
gbi
ies
labeled
megarhynchos
modularis
montanus
org
rubicola
sublicensable
synched
tabset
torquatus
trivialis
utm
voor
waarnemingen
302 changes: 302 additions & 0 deletions source/Prepare_data_10km.Rmd
@@ -0,0 +1,302 @@
---
title: "Download and prepare ABV and cube data on a 10 km grid"
author: "Ward Langeraert, Emma Cartuyvels"
date: "`r Sys.Date()`"
output:
  html_document:
    code_folding: show
    toc: true
    toc_float: true
    toc_collapsed: true
editor_options:
  chunk_output_type: console
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

```{r, warning=FALSE, message=FALSE}
# Load packages
library(tidyverse) # Data wrangling and visualisation
library(zen4R) # Download from zenodo
library(here) # Relative paths
library(sf) # Work with spatial data

# Source
source(here("source/R/download_occ_cube.R"))

# Data path and create directory if necessary
data_path <- here("data", "raw")
dir.create(data_path, showWarnings = FALSE, recursive = TRUE)
```

# Goal

Load and save the structured data of the “Common Breeding Bird Survey Flanders” (ABV), aggregated on a 10 km × 10 km grid.
Load and save the unstructured data on the same grid.

# Structured data

## Occurrence data

The ABV data is downloaded as a cube from GBIF.org.
The zip file is stored under *./data/raw*.

> GBIF.org (15 April 2025) GBIF Occurrence Download https://doi.org/10.15468/dl.hdwm9t

```{r}
# nolint start: line_length_linter.
query_abv <- "SELECT
\"year\",
GBIF_MGRSCode(10000, decimalLatitude, decimalLongitude,
COALESCE(coordinateUncertaintyInMeters, 1000)) AS mgrsCode,
speciesKey,
species,
family,
COUNT(*) AS n,
MIN(COALESCE(coordinateUncertaintyInMeters, 1000)) AS minCoordinateUncertaintyInMeters,
IF(ISNULL(family), NULL, SUM(COUNT(*)) OVER (PARTITION BY family)) AS familyCount
FROM
occurrence
WHERE
occurrenceStatus = 'PRESENT'
AND NOT occurrence.basisofrecord IN ('FOSSIL_SPECIMEN', 'LIVING_SPECIMEN')
AND NOT ARRAY_CONTAINS(issue, 'ZERO_COORDINATE')
AND NOT ARRAY_CONTAINS(issue, 'COORDINATE_OUT_OF_RANGE')
AND NOT ARRAY_CONTAINS(issue, 'COORDINATE_INVALID')
AND NOT ARRAY_CONTAINS(issue, 'COUNTRY_COORDINATE_MISMATCH')
AND level1gid = 'BEL.2_1'
AND \"year\" >= 2007
AND \"year\" <= 2022
AND speciesKey IS NOT NULL
AND decimalLatitude IS NOT NULL
AND decimalLongitude IS NOT NULL
AND class = 'Aves'
AND collectionCode = 'ABV'
GROUP BY
\"year\",
mgrsCode,
speciesKey,
family,
species
ORDER BY
\"year\" ASC,
mgrsCode ASC,
speciesKey ASC"
# nolint end

abv_data_total <- download_occ_cube(
  sql_query = query_abv,
  file = "abv_data_10km.csv",
  path = data_path,
  overwrite = FALSE
)
```

The result is a large data frame with one row per year, grid cell and species, holding the aggregated occurrence counts.

```{r}
# Explore dataframe
glimpse(abv_data_total)
```
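
`download_occ_cube()` is called with `overwrite = FALSE`, so an existing file is read instead of being downloaded again. The sketch below illustrates that read-or-fetch pattern in base R. It is not the project's implementation: `read_or_fetch()` and `fake_fetch()` are hypothetical names, and `fake_fetch()` merely stands in for the GBIF download.

```r
# Hypothetical helper illustrating the read-or-fetch pattern.
# `fetch_fun` is an assumption: any function that returns a data frame.
read_or_fetch <- function(file_path, fetch_fun, overwrite = FALSE) {
  if (file.exists(file_path) && !overwrite) {
    # Reuse the copy that is already on disk
    return(utils::read.csv(file_path, stringsAsFactors = FALSE))
  }
  # Otherwise fetch the data and store it for next time
  data <- fetch_fun()
  utils::write.csv(data, file_path, row.names = FALSE)
  data
}

# Toy usage: count how often the stand-in "download" actually runs
n_calls <- 0
fake_fetch <- function() {
  n_calls <<- n_calls + 1
  data.frame(year = c(2007, 2008), n = c(3, 5))
}

tmp <- tempfile(fileext = ".csv")
first <- read_or_fetch(tmp, fake_fetch)   # fetches and writes the file
second <- read_or_fetch(tmp, fake_fetch)  # reads the stored file
```

Keeping the fetch step behind a function argument makes the caching logic easy to check without network access.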

# Unstructured data

The cube data is downloaded from GBIF.org.
The zip file is stored under *./data/raw*.

> GBIF.org (15 April 2025) GBIF Occurrence Download https://doi.org/10.15468/dl.75hgxm

```{r}
# nolint start: line_length_linter.
query_birdcube <- "SELECT
\"year\",
GBIF_MGRSCode(10000, decimalLatitude, decimalLongitude,
COALESCE(coordinateUncertaintyInMeters, 10000)) AS mgrsCode,
speciesKey,
species,
family,
COUNT(*) AS n,
MIN(COALESCE(coordinateUncertaintyInMeters, 10000)) AS minCoordinateUncertaintyInMeters,
IF(ISNULL(family), NULL, SUM(COUNT(*)) OVER (PARTITION BY family)) AS familyCount
FROM
occurrence
WHERE
occurrenceStatus = 'PRESENT'
AND NOT occurrence.basisofrecord IN ('FOSSIL_SPECIMEN', 'LIVING_SPECIMEN')
AND NOT ARRAY_CONTAINS(issue, 'ZERO_COORDINATE')
AND NOT ARRAY_CONTAINS(issue, 'COORDINATE_OUT_OF_RANGE')
AND NOT ARRAY_CONTAINS(issue, 'COORDINATE_INVALID')
AND NOT ARRAY_CONTAINS(issue, 'COUNTRY_COORDINATE_MISMATCH')
AND level1gid = 'BEL.2_1'
AND \"year\" >= 2007
AND \"year\" <= 2022
AND speciesKey IS NOT NULL
AND decimalLatitude IS NOT NULL
AND decimalLongitude IS NOT NULL
AND class = 'Aves'
AND collectionCode != 'ABV'
GROUP BY
\"year\",
mgrsCode,
speciesKey,
family,
species
ORDER BY
\"year\" ASC,
mgrsCode ASC,
speciesKey ASC"
# nolint end

birdcube_data_total <- download_occ_cube(
  sql_query = query_birdcube,
  file = "birdcube_10km.csv",
  path = data_path,
  overwrite = FALSE
)
```

The result is a large data frame with one row per year, grid cell and species, holding the aggregated occurrence counts.

```{r}
# Explore dataframe
glimpse(birdcube_data_total)
```

# Select Flanders grid cells
The datacubes contain grid cells from multiple MGRS zones, although Flanders falls only in zone 31U.

```{r}
# Number of rows per zone
table(substring(abv_data_total$mgrscode, 1, 3))
```
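
As a small illustration of the zone check above, the sketch below applies the same `substring()` and `table()` calls to a few made-up MGRS codes. The codes are assumptions chosen for the example and are not taken from the datasets.

```r
# Toy MGRS codes (made up for illustration): grid zone + 10 km reference
mgrscode <- c("31UES09", "31UFS17", "31UES09", "32ULB01")

# The first three characters hold the grid zone designator
zone <- substring(mgrscode, 1, 3)
zone_counts <- table(zone)
print(zone_counts)

# Keep only the codes that fall in zone 31U
in_31u <- mgrscode[zone == "31U"]
```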

We load the 10 km UTM grid for Flanders and prefix the `TAG` values with `31U` to obtain full MGRS codes.

```{r}
# Read UTM 10 km grid and add new column with correct MGRS code
utm_grid <- read_sf(file.path(data_path, "utm_grid", "utm10_vlgrens_zBRU.shp"))
utm_grid <- utm_grid %>%
  mutate(mgrscode = paste0("31U", TAG))

# Explore dataframe
glimpse(utm_grid)
```

We add the geometry to the data layers by taking an inner join.

```{r}
# Add UTM geometry by taking an inner join
abv_data_total_sf <- utm_grid %>%
  inner_join(abv_data_total, by = join_by(mgrscode)) %>%
  st_sf(sf_column_name = "geometry")

# Visualise spatial distribution of the ABV data
utm_grid %>%
  left_join(abv_data_total %>%
              group_by(mgrscode) %>%
              summarise(n_species = n_distinct(species), .groups = "drop"),
            by = join_by(mgrscode)) %>%
  ggplot() +
  geom_sf(aes(fill = n_species), col = alpha("white", 0)) +
  scale_fill_viridis_c(option = "inferno") +
  ggtitle("ABV data")
```

We select cube data from Flanders and add the geometry to the data layers by taking an inner join.

```{r}
# Add UTM geometry and select data by taking an inner join
birdcube_data_total_sf <- utm_grid %>%
  inner_join(birdcube_data_total, by = join_by(mgrscode)) %>%
  st_sf(sf_column_name = "geometry")
```

```{r}
# Visualise spatial distribution data cube as number of species
utm_grid %>%
  left_join(birdcube_data_total %>%
              group_by(mgrscode) %>%
              summarise(n_species = n_distinct(species), .groups = "drop"),
            by = join_by(mgrscode)) %>%
  ggplot() +
  geom_sf(aes(fill = n_species), col = alpha("white", 0)) +
  scale_fill_viridis_c(option = "inferno") +
  ggtitle("Bird cube data from Flanders")
```
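
The two join types used in this section behave differently: the inner join drops records outside the Flanders grid, while the left join from the grid keeps every grid cell for plotting, even cells without records. The sketch below shows the difference on toy data. It uses base R `merge()` as a stand-in for the dplyr joins, and all codes and records are made up for illustration.

```r
# Toy grid with three cells; the MGRS code is built as in the grid chunk
grid <- data.frame(TAG = c("ES09", "ES19", "FS17"))
grid$mgrscode <- paste0("31U", grid$TAG)

# Toy cube with one record that lies outside the grid (31UGS00)
cube <- data.frame(
  mgrscode = c("31UES09", "31UES09", "31UFS17", "31UGS00"),
  species = c("Parus major", "Anthus trivialis", "Parus major", "Parus major")
)

# Inner join: only cells present in both tables survive
inner <- merge(grid, cube, by = "mgrscode")

# Left join: every grid cell is kept, cells without records get NA
left <- merge(grid, cube, by = "mgrscode", all.x = TRUE)
```

Here `inner` has three rows and no longer contains the cell outside the grid, whereas `left` has four rows, including the empty cell `31UES19` with `NA` for `species`.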

# Correction of species names

Some species appear under two accepted names, which prevents the two datasets from matching.
We harmonise the names and the corresponding species keys.

```{r}
abv_data_total_sf <- abv_data_total_sf %>%
  mutate(
    species = case_when(
      species == "Dendrocopus major" ~ "Dendrocopos major",
      species == "Saxicola torquatus" ~ "Saxicola rubicola",
      TRUE ~ species
    ),
    specieskey = case_when(
      species == "Dendrocopos major" ~ 2477968,
      species == "Saxicola rubicola" ~ 4408759,
      TRUE ~ specieskey
    )
  )
```

```{r}
birdcube_data_total_sf <- birdcube_data_total_sf %>%
  mutate(
    species = case_when(
      species == "Poecile montanus" ~ "Parus montanus",
      TRUE ~ species
    ),
    specieskey = case_when(
      species == "Parus montanus" ~ 4409010,
      TRUE ~ specieskey
    )
  )
```
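
If more names need harmonising later, the corrections could also be kept in a lookup table instead of separate `case_when()` branches. The sketch below is one possible way to do this in base R and is not part of the pipeline. `name_lookup` and `harmonise_names()` are hypothetical, the names and keys in the lookup are copied from the ABV chunk above, and the keys in `toy` are placeholders. Unlike the `case_when()` version, it only updates the key of rows whose name was actually replaced.

```r
# Hypothetical lookup table: old name -> harmonised name and species key
name_lookup <- data.frame(
  old_name = c("Dendrocopus major", "Saxicola torquatus"),
  new_name = c("Dendrocopos major", "Saxicola rubicola"),
  new_key = c(2477968, 4408759)
)

harmonise_names <- function(data, lookup) {
  # Position of each species in the lookup, NA when no correction is needed
  idx <- match(data$species, lookup$old_name)
  hit <- !is.na(idx)
  data$species[hit] <- lookup$new_name[idx[hit]]
  data$specieskey[hit] <- lookup$new_key[idx[hit]]
  data
}

# Toy data with placeholder keys (1 and 2 are not real species keys)
toy <- data.frame(
  species = c("Saxicola torquatus", "Parus major"),
  specieskey = c(1, 2)
)
fixed <- harmonise_names(toy, name_lookup)
```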

# Write out data

We select the columns we want in a logical order:

```{r}
abv_data_out_sf <- abv_data_total_sf %>%
  select("mgrscode", "year", "specieskey", "species", "family", "n",
         "mincoordinateuncertaintyinmeters", "familycount", "geometry")
abv_data_out <- st_drop_geometry(abv_data_out_sf)

birdcube_data_out_sf <- birdcube_data_total_sf %>%
  select("mgrscode", "year", "specieskey", "species", "family", "n",
         "mincoordinateuncertaintyinmeters", "familycount", "geometry")
birdcube_data_out <- st_drop_geometry(birdcube_data_out_sf)
```

We write out the data for exploration and analysis.

```{r}
out_path <- here("data", "interim")
dir.create(out_path, showWarnings = FALSE, recursive = TRUE)

# Structured data
## CSV
write_csv(abv_data_out,
          file.path(out_path, "abv_data_cube_10km.csv"))

## Spatial object
write_sf(abv_data_out_sf,
         file.path(out_path, "abv_data_cube_10km.gpkg"))

# Unstructured data
## CSV
write_csv(birdcube_data_out,
          file.path(out_path, "birdflanders_cube_10km.csv"))

## Spatial object
write_sf(birdcube_data_out_sf,
         file.path(out_path, "birdflanders_cube_10km.gpkg"))
```
5 changes: 3 additions & 2 deletions source/R/download_occ_cube.R
@@ -7,7 +7,7 @@ download_occ_cube <- function(sql_query, file, path, overwrite = FALSE) {
file_path <- file.path(path, file)
if (file.exists(file_path) && !overwrite) {
message(paste("File already exists. Reading existing file.",
"Set `overwrite = TRUE` to overwrite file.", sep = "\n"))
"Set `overwrite = TRUE` to overwrite file.", sep = "\n"))

occ_cube <- readr::read_csv(file = file_path, show_col_types = FALSE)

@@ -34,7 +34,8 @@ download_occ_cube <- function(sql_query, file, path, overwrite = FALSE) {
readr::write_csv(
x = occ_cube,
file = file_path,
-append = FALSE)
+append = FALSE
+)

# Return tibble
return(occ_cube)