diff --git a/README.md b/README.md index 522725e..5331ca2 100644 --- a/README.md +++ b/README.md @@ -23,19 +23,31 @@ European Union's Horizon Europe Research and Innovation Programme (ID No 1010595 -### Description - Scripts to explore the conditions that determine the reliability of models, trends and status by comparing aggregated cubes with structured monitoring schemes. This code is developed in context of **T4.5** of the [B-Cubed project](https://b-cubed.eu/). +### Analyses workflow +To download the latest version of each of the data sets run `prepare_abv_data.Rmd` and `Prepare_data_10km.Rmd`. +Alternatively, you can download the exact same data we used by following the GBIF links in these same Rmd's. +These data sets are saved under `data/raw`, the Rmd's further clean the data and add geometric properties. +The cleaned data is stored in both `.csv` and `.gpkg` format under `data/interim`. + +To get the list of ABV birds used to filter the data in both pipelines run the `get_abv_species.R` after sourcing the functions in `taxon_mapping.R`. + +To run the targets pipelines, run `run_pipeline.R` in the folder of the pipeline you want to run. +Afterwards, you can run the associated Rmd found under `reports`. + + ### Repo structure ``` -├── source ├ R markdown files -│ └── R ├ R scripts +├── source ├ +│ ├── pipelines ├ target pipelines https://books.ropensci.org/targets/ +│ ├── R ├ R scripts and R markdown files +│ └── reports ├ reports based on output from target pipelines ├── data │ ├── raw ├ create this folder and store raw data, see prepare_abv_data.Rmd │ ├── intermediate ├ store intermediate data diff --git a/_targets.yaml b/_targets.yaml index 2961a78..15fb01e 100644 --- a/_targets.yaml +++ b/_targets.yaml @@ -1,8 +1,8 @@ -target_workflow: - script: C:/R/git_repositories/comp-unstructured-data/source/pipelines/target_workflow/_targets.R - store: C:/R/git_repositories/comp-unstructured-data/source/pipelines/target_workflow/_targets - use_crew: yes biodiversity_indicators: script: C:/R/git_repositories/comp-unstructured-data/source/pipelines/biodiversity_indicators/_targets.R store: C:/R/git_repositories/comp-unstructured-data/source/pipelines/biodiversity_indicators/_targets use_crew: yes +exploratory_analysis: + script: C:/R/git_repositories/comp-unstructured-data/source/pipelines/exploratory_analysis/_targets.R + store: C:/R/git_repositories/comp-unstructured-data/source/pipelines/exploratory_analysis/_targets + use_crew: yes diff --git a/comp-unstructured-data.Rproj b/comp-unstructured-data.Rproj index 5383437..ef72e24 100644 --- a/comp-unstructured-data.Rproj +++ b/comp-unstructured-data.Rproj @@ -1,4 +1,5 @@ Version: 1.0 +ProjectId: 917f1e07-7bf8-4404-b0ed-c2b02a93dc01 RestoreWorkspace: Default SaveWorkspace: Default diff --git a/data/raw/utm_grid/utm10_vlgrens_zBRU.dbf b/data/raw/utm_grid/utm10_vlgrens_zBRU.dbf new file mode 100644 index 0000000..79540f4 Binary files /dev/null and b/data/raw/utm_grid/utm10_vlgrens_zBRU.dbf differ diff --git a/data/raw/utm_grid/utm10_vlgrens_zBRU.prj b/data/raw/utm_grid/utm10_vlgrens_zBRU.prj new file mode 100644 index 0000000..29c882d --- /dev/null +++ b/data/raw/utm_grid/utm10_vlgrens_zBRU.prj @@ -0,0 +1 @@ +PROJCS["Belge_Lambert_1972",GEOGCS["GCS_Belge_1972",DATUM["D_Belge_1972",SPHEROID["International_1924",6378388.0,297.0]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Lambert_Conformal_Conic"],PARAMETER["False_Easting",150000.01256],PARAMETER["False_Northing",5400088.4378],PARAMETER["Central_Meridian",4.367486666666666],PARAMETER["Standard_Parallel_1",49.8333339],PARAMETER["Standard_Parallel_2",51.16666733333333],PARAMETER["Latitude_Of_Origin",90.0],UNIT["Meter",1.0]],VERTCS["Oostende",VDATUM["Oostende"],PARAMETER["Vertical_Shift",0.0],PARAMETER["Direction",1.0],UNIT["Meter",1.0]] \ No newline at end of file diff --git a/data/raw/utm_grid/utm10_vlgrens_zBRU.sbn b/data/raw/utm_grid/utm10_vlgrens_zBRU.sbn new file mode 100644 index 0000000..264bd13 Binary files /dev/null and b/data/raw/utm_grid/utm10_vlgrens_zBRU.sbn differ diff --git a/data/raw/utm_grid/utm10_vlgrens_zBRU.sbx b/data/raw/utm_grid/utm10_vlgrens_zBRU.sbx new file mode 100644 index 0000000..6175108 Binary files /dev/null and b/data/raw/utm_grid/utm10_vlgrens_zBRU.sbx differ diff --git a/data/raw/utm_grid/utm10_vlgrens_zBRU.shp b/data/raw/utm_grid/utm10_vlgrens_zBRU.shp new file mode 100644 index 0000000..cd3812a Binary files /dev/null and b/data/raw/utm_grid/utm10_vlgrens_zBRU.shp differ diff --git a/data/raw/utm_grid/utm10_vlgrens_zBRU.shx b/data/raw/utm_grid/utm10_vlgrens_zBRU.shx new file mode 100644 index 0000000..70682fe Binary files /dev/null and b/data/raw/utm_grid/utm10_vlgrens_zBRU.shx differ diff --git a/inst/en_gb.dic b/inst/en_gb.dic index 490dd33..e6ee65b 100644 --- a/inst/en_gb.dic +++ b/inst/en_gb.dic @@ -1,10 +1,13 @@ Algemene +Anthus Bosonderzoek Broedvogelmonitoring Broedvogels Cartuyvels Cetti's Cettia +Chloris +Cyanistes Daele Databricks Dendrocopos @@ -20,11 +23,14 @@ Laridae Larus Luscinia MGRS +Motacilla Natuur OOSTENDE Parus Pielou Poecile +Rmd +Rmd's Saxicola Sui Teirlinckgebouw @@ -35,16 +41,22 @@ Watervogels abv argentatus birdcube +caeruleus cetti +chloris color +communis datacube datacubes +domesticus eBird +flava fuscus gbi ies labeled megarhynchos +modularis montanus org rubicola @@ -52,6 +64,7 @@ sublicensable synched tabset torquatus +trivialis utm voor waarnemingen diff --git a/source/R/Prepare_data_10km.Rmd b/source/R/Prepare_data_10km.Rmd new file mode 100644 index 0000000..d44f878 --- /dev/null +++ b/source/R/Prepare_data_10km.Rmd @@ -0,0 +1,302 @@ +--- +title: "Download and prepare ABV and cube data at 10km² grid" +author: "Ward Langeraert, Emma Cartuyvels" +date: "`r Sys.Date()`" +output: + html_document: + code_folding: show + toc: true + toc_float: true + toc_collapsed: true +editor_options: + chunk_output_type: console +--- + +```{r setup, include=FALSE} +knitr::opts_chunk$set(echo = TRUE) +``` + +```{r, warning=FALSE, message=FALSE} +# Load packages +library(tidyverse) # Data wrangling and visualisation +library(zen4R) # Download from zenodo +library(here) # Relative paths +library(sf) # Work with spatial data + +# Source +source(here("source/R/download_occ_cube.R")) + +# Data path and create directory if necessary +data_path <- here("data", "raw") +dir.create(data_path, showWarnings = FALSE, recursive = TRUE) +``` + +# Goal + +Load and save structured data of the “Common Breeding Bird Survey Flanders” (ABV) at 10km² grid. +Load and save unstructured data at 10km² grid. + +# Structured data + +## Occurrence data + +The ABV data is downloaded as a cube from GBIF.org. +The zip file is stored under *./data/raw*. + +> GBIF.org (15 April 2025) GBIF Occurrence Download https://doi.org/10.15468/dl.hdwm9t + +```{r} +# nolint start: line_length_linter. +query_abv <- "SELECT + \"year\", + GBIF_MGRSCode(10000, decimalLatitude, decimalLongitude, + COALESCE(coordinateUncertaintyInMeters, 1000)) AS mgrsCode, + speciesKey, + species, + family, + COUNT(*) AS n, + MIN(COALESCE(coordinateUncertaintyInMeters, 1000)) AS minCoordinateUncertaintyInMeters, + IF(ISNULL(family), NULL, SUM(COUNT(*)) OVER (PARTITION BY family)) AS familyCount + FROM + occurrence + WHERE + occurrenceStatus = 'PRESENT' + AND NOT occurrence.basisofrecord IN ('FOSSIL_SPECIMEN', 'LIVING_SPECIMEN') + AND NOT ARRAY_CONTAINS(issue, 'ZERO_COORDINATE') + AND NOT ARRAY_CONTAINS(issue, 'COORDINATE_OUT_OF_RANGE') + AND NOT ARRAY_CONTAINS(issue, 'COORDINATE_INVALID') + AND NOT ARRAY_CONTAINS(issue, 'COUNTRY_COORDINATE_MISMATCH') + AND level1gid = 'BEL.2_1' + AND \"year\" >= 2007 + AND \"year\" <= 2022 + AND speciesKey IS NOT NULL + AND decimalLatitude IS NOT NULL + AND decimalLongitude IS NOT NULL + AND class = 'Aves' + AND collectionCode = 'ABV' + GROUP BY + \"year\", + mgrsCode, + speciesKey, + family, + species + ORDER BY + \"year\" ASC, + mgrsCode ASC, + speciesKey ASC" +# nolint end + +abv_data_total <- download_occ_cube( + sql_query = query_abv, + file = "abv_data_10km.csv", + path = data_path, + overwrite = FALSE +) +``` + +We get a big dataframe with all occurrences. + +```{r} +# Explore dataframe +glimpse(abv_data_total) +``` + +# Unstructured data + +The cube data is downloaded from GBIF.org. +The zip file is stored under *./data/raw*. + +> GBIF.org (15 April 2025) GBIF Occurrence Download https://doi.org/10.15468/dl.75hgxm + +```{r} +# nolint start: line_length_linter. +query_birdcube <- "SELECT + \"year\", + GBIF_MGRSCode(10000, decimalLatitude, decimalLongitude, + COALESCE(coordinateUncertaintyInMeters, 10000)) AS mgrsCode, + speciesKey, + species, + family, + COUNT(*) AS n, + MIN(COALESCE(coordinateUncertaintyInMeters, 10000)) AS minCoordinateUncertaintyInMeters, + IF(ISNULL(family), NULL, SUM(COUNT(*)) OVER (PARTITION BY family)) AS familyCount + FROM + occurrence + WHERE + occurrenceStatus = 'PRESENT' + AND NOT occurrence.basisofrecord IN ('FOSSIL_SPECIMEN', 'LIVING_SPECIMEN') + AND NOT ARRAY_CONTAINS(issue, 'ZERO_COORDINATE') + AND NOT ARRAY_CONTAINS(issue, 'COORDINATE_OUT_OF_RANGE') + AND NOT ARRAY_CONTAINS(issue, 'COORDINATE_INVALID') + AND NOT ARRAY_CONTAINS(issue, 'COUNTRY_COORDINATE_MISMATCH') + AND level1gid = 'BEL.2_1' + AND \"year\" >= 2007 + AND \"year\" <= 2022 + AND speciesKey IS NOT NULL + AND decimalLatitude IS NOT NULL + AND decimalLongitude IS NOT NULL + AND class = 'Aves' + AND collectionCode != 'ABV' + GROUP BY + \"year\", + mgrsCode, + speciesKey, + family, + species + ORDER BY + \"year\" ASC, + mgrsCode ASC, + speciesKey ASC" +# nolint end + +birdcube_data_total <- download_occ_cube( + sql_query = query_birdcube, + file = "birdcube_10km.csv", + path = data_path, + overwrite = FALSE +) +``` + +We get a big dataframe with all occurrences. + +```{r} +# Explore dataframe +glimpse(birdcube_data_total) +``` + +# Select Flanders grid cells +The datacubes cover multiple zones although Flanders is present only in zone 31U. + +```{r} +# Number of rows per zone +table(substring(abv_data_total$mgrscode, 1, 3)) +``` + +We load in the UTM grid for Flanders (10 km) and add 31U to the tag names. + +```{r} +# Read UTM 10 km grid and add new column with correct MGRS code +utm_grid <- read_sf(file.path(data_path, "utm_grid", "utm10_vlgrens_zBRU.shp")) +utm_grid <- utm_grid %>% + mutate(mgrscode = paste0("31U", TAG)) + +# Explore dataframe +glimpse(utm_grid) +``` + +We add the geometry to the data layers by taking an inner join. + +```{r} +# Add UTM geometry by taking an inner join +abv_data_total_sf <- utm_grid %>% + inner_join(abv_data_total, by = join_by(mgrscode)) %>% + st_sf(sf_column_name = "geometry") + +# Visualise spatial distribution of the ABV data +utm_grid %>% + left_join(abv_data_total %>% + group_by(mgrscode) %>% + summarise(n_species = n_distinct(species), .groups = "drop"), + by = join_by(mgrscode)) %>% + ggplot() + + geom_sf(aes(fill = n_species), col = alpha("white", 0)) + + scale_fill_viridis_c(option = "inferno") + + ggtitle("ABV data") +``` + +We select cube data from Flanders and add the geometry to the data layers by taking an inner join. + +```{r} +# Add UTM geometry and select data by taking an inner join +birdcube_data_total_sf <- utm_grid %>% + inner_join(birdcube_data_total, by = join_by(mgrscode)) %>% + st_sf(sf_column_name = "geometry") +``` + +```{r} +# Visualise spatial distribution data cube as number of species +utm_grid %>% + left_join(birdcube_data_total %>% + group_by(mgrscode) %>% + summarise(n_species = n_distinct(species), .groups = "drop"), + by = join_by(mgrscode)) %>% + ggplot() + + geom_sf(aes(fill = n_species), col = alpha("white", 0)) + + scale_fill_viridis_c(option = "inferno") + + ggtitle("Bird cube data from Flanders") +``` + +# Correction of species names + +There are some double accepted species names that cause trouble. + +```{r} +abv_data_total_sf <- abv_data_total_sf %>% + mutate( + species = case_when( + species == "Dendrocopus major" ~ "Dendrocopos major", + species == "Saxicola torquatus" ~ "Saxicola rubicola", + TRUE ~ species + ), + specieskey = case_when( + species == "Dendrocopos major" ~ 2477968, + species == "Saxicola rubicola" ~ 4408759, + TRUE ~ specieskey + ) + ) +``` + +```{r} +birdcube_data_total_sf <- birdcube_data_total_sf %>% + mutate( + species = case_when( + species == "Poecile montanus" ~ "Parus montanus", + TRUE ~ species + ), + specieskey = case_when( + species == "Parus montanus" ~ 4409010, + TRUE ~ specieskey + ) + ) +``` + +# Write out data + +We select the columns we want in a logical order: + +```{r} +abv_data_out_sf <- abv_data_total_sf %>% + select("mgrscode", "year", "specieskey", "species", "family", "n", + "mincoordinateuncertaintyinmeters", "familycount", "geometry") +abv_data_out <- st_drop_geometry(abv_data_out_sf) + +birdcube_data_out_sf <- birdcube_data_total_sf %>% + select("mgrscode", "year", "specieskey", "species", "family", "n", + "mincoordinateuncertaintyinmeters", "familycount", "geometry") +birdcube_data_out <- st_drop_geometry(birdcube_data_out_sf) +``` + +We write out the data for exploration and analysis. + +```{r} +out_path <- here("data", "interim") +dir.create(out_path, showWarnings = FALSE, recursive = TRUE) + +# Structured data +## CSV +write_csv(abv_data_out, + file.path(out_path, "abv_data_cube_10km.csv")) + +## Spatial object +write_sf(abv_data_out_sf, + file.path(out_path, "abv_data_cube_10km.gpkg")) + +# Unstructured data +## CSV +write_csv(birdcube_data_out, + file.path(out_path, "birdflanders_cube_10km.csv")) + +## Spatial object +write_sf(birdcube_data_out_sf, + file.path(out_path, "birdflanders_cube_10km.gpkg")) +``` diff --git a/source/R/download_occ_cube.R b/source/R/download_occ_cube.R index cc9469b..0525d47 100644 --- a/source/R/download_occ_cube.R +++ b/source/R/download_occ_cube.R @@ -7,7 +7,7 @@ download_occ_cube <- function(sql_query, file, path, overwrite = FALSE) { file_path <- file.path(path, file) if (file.exists(file_path) && !overwrite) { message(paste("File already exists. Reading existing file.", - "Set `overwrite = TRUE` to overwrite file.", sep = "\n")) + "Set `overwrite = TRUE` to overwrite file.", sep = "\n")) occ_cube <- readr::read_csv(file = file_path, show_col_types = FALSE) @@ -34,7 +34,8 @@ download_occ_cube <- function(sql_query, file, path, overwrite = FALSE) { readr::write_csv( x = occ_cube, file = file_path, - append = FALSE) + append = FALSE + ) # Return tibble return(occ_cube) diff --git a/source/R/get_abv_species.R b/source/R/get_abv_species.R new file mode 100644 index 0000000..57cba3d --- /dev/null +++ b/source/R/get_abv_species.R @@ -0,0 +1,40 @@ +# First source functions in taxon_mapping.R + +example_specs <- c( + "Cetti's zanger", "Putter", "Kleine mantelmeeuw", "Roek", "Kuifeend", + "Halsbandparkiet", "Aalscholver", "Kauw", "Buizerd", "Nijlgans", + "Roodborsttapuit", "Boomklever", "Meerkoet", "Zwarte roodstaart", + "Grote bonte specht", "Roodborst", "Krakeend", "Boomleeuwerik", + "Bonte vliegenvanger", "Grauwe gans", "Torenvalk", "Zwartkop", + "Witte kwikstaart", "Boomkruiper", "Grasmus", "Pimpelmees", "Vink", + "Boerenzwaluw", "Tjiftjaf", "Zwarte kraai", "Houtduif", "Kleine karekiet", + "Fazant", "Gaai", "Groene specht", "Ekster", "Koolmees", "Gele kwikstaart", + "Groenling", "Holenduif", "Winterkoning", "Scholekster", "Koekoek", + "Heggenmus", "Spreeuw", "Turkse tortel", "Veldleeuwerik", "Geelgors", + "Goudhaan", "Kuifmees", "Zilvermeeuw", "Matkop", "Huismus", "Wilde eend", + "Waterhoen", "Zanglijster", "Merel", "Tuinfluiter", "Zwarte mees", "Patrijs", + "Graspieper", "Fitis", "Stadsduif", "Wielewaal", "Grutto", "Kievit", + "Grote lijster", "Ringmus", "Sprinkhaanzanger", "Kokmeeuw", "Sperwer", + "Bruine kiekendief", "Fuut", "Gekraagde roodstaart", "Bergeend", "Kneu", + "Rietzanger", "Blauwe reiger", "Wulp", "Blauwborst", "Zwarte specht", + "Boompieper", "Rietgors", "Canadese gans", "Spotvogel", "Bosrietzanger", + "Knobbelzwaan", "Havik", "Glanskop", "Middelste Bonte Specht", "Tafeleend", + "Gierzwaluw", "Nachtegaal", "Huiszwaluw", "Staartmees", "Dodaars" +) +abv_ana_birds <- data.frame( + id = seq_along(example_specs), + dwc_vernacularName = example_specs, + dwc_class = rep("Aves", 3) +) +abv_ana_birds + +abv_birds <- map_taxa_from_vernacular( + vernacular_name_df = abv_ana_birds, + vernacular_name_col = "dwc_vernacularName", + filter_cols = list(class = "dwc_class"), + out_cols = "species", + increment = 100 +) +abv_birds + +write.csv(abv_birds, "./data/interim/abv_birds.csv") diff --git a/source/prepare_abv_data.Rmd b/source/R/prepare_abv_data.Rmd similarity index 83% rename from source/prepare_abv_data.Rmd rename to source/R/prepare_abv_data.Rmd index 92a5013..08a8946 100644 --- a/source/prepare_abv_data.Rmd +++ b/source/R/prepare_abv_data.Rmd @@ -45,7 +45,8 @@ dir.create(data_path, showWarnings = FALSE, recursive = TRUE) # Download data from zenodo if necessary file_sampling_framework <- file.path( data_path, - "steekproefkader.csv") + "steekproefkader.csv" +) if (file.exists(file_sampling_framework)) { sampling_framework_abv <- read_csv(file_sampling_framework, @@ -54,7 +55,8 @@ if (file.exists(file_sampling_framework)) { download_zenodo( doi = "10.5281/zenodo.10103472", path = data_path, - files = list("steekproefkader.csv")) + files = list("steekproefkader.csv") + ) sampling_framework_abv <- read_csv(file_sampling_framework, show_col_types = FALSE) @@ -70,27 +72,27 @@ head(sampling_framework_abv) ## Occurrence data -The ABV data is downloaded as a cube from GBIF.org. +The ABV data is downloaded as a cube from GBIF.org. We set the randomization to zero, since the monitoring takes place in specific squares. The zip file is stored under *./data/raw*. -> GBIF.org (10 March 2025) GBIF Occurrence Download https://doi.org/10.15468/dl.qqzgt3 +> GBIF.org (15 April 2025) GBIF Occurrence Download https://doi.org/10.15468/dl.ddzhrc ```{r} # nolint start: line_length_linter. query_abv <- "SELECT \"year\", - GBIF_MGRSCode(1000, decimalLatitude, decimalLongitude, - COALESCE(coordinateUncertaintyInMeters, 1000)) AS mgrsCode, + GBIF_MGRSCode(1000, decimalLatitude, decimalLongitude, 0) AS mgrsCode, speciesKey, species, family, COUNT(*) AS n, - MIN(COALESCE(coordinateUncertaintyInMeters, 1000)) AS minCoordinateUncertaintyInMeters, + MIN(COALESCE(coordinateUncertaintyInMeters, 0)) AS minCoordinateUncertaintyInMeters, IF(ISNULL(family), NULL, SUM(COUNT(*)) OVER (PARTITION BY family)) AS familyCount FROM occurrence WHERE occurrenceStatus = 'PRESENT' + AND NOT occurrence.basisofrecord IN ('FOSSIL_SPECIMEN', 'LIVING_SPECIMEN') AND NOT ARRAY_CONTAINS(issue, 'ZERO_COORDINATE') AND NOT ARRAY_CONTAINS(issue, 'COORDINATE_OUT_OF_RANGE') AND NOT ARRAY_CONTAINS(issue, 'COORDINATE_INVALID') @@ -132,10 +134,10 @@ glimpse(abv_data_total) # Unstructured data -The ABV data is downloaded as a cube from GBIF.org. +The cube data is downloaded as a cube from GBIF.org. The zip file is stored under *./data/raw*. -> GBIF.org (10 March 2025) GBIF Occurrence Download https://doi.org/10.15468/dl.eusvcc +> GBIF.org (15 April 2025) GBIF Occurrence Download https://doi.org/10.15468/dl.mn4ybb ```{r} # nolint start: line_length_linter. @@ -153,6 +155,7 @@ query_birdcube <- "SELECT occurrence WHERE occurrenceStatus = 'PRESENT' + AND NOT occurrence.basisofrecord IN ('FOSSIL_SPECIMEN', 'LIVING_SPECIMEN') AND NOT ARRAY_CONTAINS(issue, 'ZERO_COORDINATE') AND NOT ARRAY_CONTAINS(issue, 'COORDINATE_OUT_OF_RANGE') AND NOT ARRAY_CONTAINS(issue, 'COORDINATE_INVALID') @@ -233,9 +236,9 @@ utm_grid %>% left_join(sampling_framework_abv, by = join_by(mgrscode)) %>% st_sf(sf_column_name = "geometry") %>% ggplot() + - geom_sf(aes(fill = Stratum), - col = alpha("white", 0)) + - ggtitle("Sampling framework ABV") + geom_sf(aes(fill = Stratum), + col = alpha("white", 0)) + + ggtitle("Sampling framework ABV") ``` ```{r} @@ -252,8 +255,8 @@ utm_grid %>% by = join_by(mgrscode)) %>% mutate(sample = ifelse(is.na(presence), "no", "yes")) %>% ggplot() + - geom_sf(aes(fill = sample), col = alpha("white", 0)) + - ggtitle("ABV sampling locations") + geom_sf(aes(fill = sample), col = alpha("white", 0)) + + ggtitle("ABV sampling locations") ``` We select cube data from Flanders and add the geometry to the data layers by taking an inner join. @@ -273,12 +276,12 @@ utm_grid %>% summarise(n_species = n_distinct(species), .groups = "drop"), by = join_by(mgrscode)) %>% ggplot() + - geom_sf(aes(fill = n_species), col = alpha("white", 0)) + - scale_fill_viridis_c(option = "inferno") + - ggtitle("Bird cube data from Flanders") + geom_sf(aes(fill = n_species), col = alpha("white", 0)) + + scale_fill_viridis_c(option = "inferno") + + ggtitle("Bird cube data from Flanders") ``` -We see a striped pattern because of large uncertainty. +We see a dotted pattern because of large uncertainty. We filter `minCoordinateUncertaintyInMeters` smaller or equal to 1000 meters: ```{r} @@ -290,9 +293,9 @@ utm_grid %>% summarise(n_species = n_distinct(species), .groups = "drop"), by = join_by(mgrscode)) %>% ggplot() + - geom_sf(aes(fill = n_species), col = alpha("white", 0)) + - scale_fill_viridis_c(option = "inferno") + - ggtitle("Bird cube data from Flanders") + geom_sf(aes(fill = n_species), col = alpha("white", 0)) + + scale_fill_viridis_c(option = "inferno") + + ggtitle("Bird cube data from Flanders") ``` # Correction of species names @@ -305,11 +308,13 @@ abv_data_total_sf <- abv_data_total_sf %>% species = case_when( species == "Dendrocopus major" ~ "Dendrocopos major", species == "Saxicola torquatus" ~ "Saxicola rubicola", - TRUE ~ species), + TRUE ~ species + ), specieskey = case_when( species == "Dendrocopos major" ~ 2477968, species == "Saxicola rubicola" ~ 4408759, - TRUE ~ specieskey) + TRUE ~ specieskey + ) ) ``` @@ -318,10 +323,12 @@ birdcube_data_total_sf <- birdcube_data_total_sf %>% mutate( species = case_when( species == "Poecile montanus" ~ "Parus montanus", - TRUE ~ species), + TRUE ~ species + ), specieskey = case_when( species == "Parus montanus" ~ 4409010, - TRUE ~ specieskey) + TRUE ~ specieskey + ) ) ``` @@ -335,8 +342,9 @@ abv_data_out_sf <- abv_data_total_sf %>% "mincoordinateuncertaintyinmeters", "familycount", "geometry") abv_data_out <- st_drop_geometry(abv_data_out_sf) -colnames(sampling_framework_abv_sf) <- tolower(colnames( - sampling_framework_abv_sf)) +colnames(sampling_framework_abv_sf) <- tolower( + colnames(sampling_framework_abv_sf) +) sampling_framework_out_sf <- sampling_framework_abv_sf %>% select("mgrscode", "stratum", "oppervlakte", "landbouw", "urbaan", "bos", "suburbaan", "heideduin", "moeraswater", "geometry") @@ -357,22 +365,22 @@ dir.create(out_path, showWarnings = FALSE, recursive = TRUE) # Structured data ## CSV write_csv(abv_data_out, - file.path(out_path, "abv_data_cube.csv")) + file.path(out_path, "abv_data_cube_1km.csv")) write_csv(sampling_framework_out, file.path(out_path, "sampling_framework_abv.csv")) ## Spatial object write_sf(abv_data_out_sf, - file.path(out_path, "abv_data_cube.gpkg")) + file.path(out_path, "abv_data_cube_1km.gpkg")) write_sf(sampling_framework_out_sf, file.path(out_path, "sampling_framework_abv.gpkg")) # Unstructured data ## CSV write_csv(birdcube_data_out, - file.path(out_path, "birdcubeflanders.csv")) + file.path(out_path, "birdflanders_cube_1km.csv")) ## Spatial object write_sf(birdcube_data_out_sf, - file.path(out_path, "birdcubeflanders.gpkg")) + file.path(out_path, "birdflanders_cube_1km.gpkg")) ``` diff --git a/source/R/taxon_mapping.R b/source/R/taxon_mapping.R new file mode 100644 index 0000000..66c60e4 --- /dev/null +++ b/source/R/taxon_mapping.R @@ -0,0 +1,219 @@ +# Function to determine the structure of the data frames in a list +get_df_structure <- function(df_list) { + require("dplyr") + + non_na_df <- df_list %>% plyr::compact() %>% first() + empty_df <- purrr::map(non_na_df, ~ NA) + empty_df <- as_tibble(empty_df) + return(empty_df) +} + +# Function to find the taxon keys that match search values best +find_df_name <- function(df_list, search_value, lang = NA) { + require("dplyr") + + # Check how many times the value is present in each dataframe + contains_value <- purrr::map(df_list, function(df) { + if (is.na(lang)) { + vernacular_names <- df %>% + pull(.data$vernacularName) + } else { + vernacular_names <- df %>% + dplyr::filter(.data$language == lang) %>% + pull(.data$vernacularName) + } + + # Count number of matches (exact match and ignore case) + sum(grepl(paste0("^\\s*", search_value, "\\s*$"), vernacular_names, + ignore.case = TRUE)) + }) + + # Sort species keys from most to least matches + contains_value <- contains_value[order(unlist(contains_value), + decreasing = TRUE)] + contains_value <- contains_value[contains_value != 0] + + # Get species key with most matches + df_name <- names(contains_value)[1] + + # Return the name + return(df_name) +} + +# Return accepted taxonomic information +get_accepted_name_usage <- function(taxon_data) { + require("dplyr") + + if ("acceptedKey" %in% colnames(taxon_data)) { + taxon_key <- taxon_data %>% pull(.data$acceptedKey) + return(rgbif::name_usage(taxon_key)$data) + } else { + taxon_key <- taxon_data %>% pull(.data$key) + return(rgbif::name_usage(taxon_key)$data) + } +} + +# Match vernacular names with GBIF taxonomic backbone +# Inspired by: +# https://gist.github.com/damianooldoni/3fa9cc1ffa67377a9757df097d48d19f +match_vernacular_name <- function( + vernacular_name_df, + filter_cols = NULL, + lang = NA, + increment = 0, + ...) { + # Get vernacular name + vernacular_name <- pull(vernacular_name_df[1]) + + # Get limit + dots <- list(...) + if ("limit" %in% names(list(...))) { + limit <- dots$limit + dots <- dots[names(dots) != "limit"] + } else { + limit <- 100 + } + + # Loop variable + stop_loop <- FALSE + + # Loop until match is found or limit reaches maximum of 3000 + while (isFALSE(stop_loop) && limit < 3000) { + # Lookup vernacular names in GBIF backbone + gbif_lookup <- do.call( + rgbif::name_lookup, + c(list(vernacular_name), + list(datasetKey = "d7dddbf4-2cf0-4f39-9b2a-bb099caae36c"), + list(limit = limit), + dots) + ) + + # Return taxon data frame if match found + if (nrow(gbif_lookup$data) > 0) { + # Define vernacular names and taxon data + vernacular_names <- gbif_lookup$names + taxon_data <- gbif_lookup$data + + # Use filter columns if provided + if (!is.null(filter_cols)) { + # In case of NA values + cols_to_remove <- vernacular_name_df %>% + select(where(~ any(is.na(.)))) %>% + colnames() + filter_cols <- filter_cols[!filter_cols %in% cols_to_remove] + + # Create the join condition dynamically + join_condition <- stats::setNames(names(filter_cols), + unlist(filter_cols)) + + # Perform the inner join to select taxon data + taxon_data <- vernacular_name_df %>% + select(where(~ all(!is.na(.)))) %>% + inner_join(taxon_data, by = join_condition, keep = TRUE) %>% + select(-all_of(setdiff(colnames(vernacular_name_df), cols_to_remove))) + + # Use species keys to select vernacular names + indices <- names(vernacular_names) %in% taxon_data$key + vernacular_names <- vernacular_names[indices] + } + } else { + return(NA_character_) + } + + # Increment limit if required + if (length(vernacular_names) == 0 && increment > 0) { + limit <- limit + increment + } else { + stop_loop <- TRUE + } + } + + + # Search taxon key in vernacular names + if (length(vernacular_names) > 0) { + taxon_key <- find_df_name(vernacular_names, vernacular_name, lang) + + # Return NA if no good match found + if (is.na(taxon_key)) { + return(NA_character_) + # Return match with taxon key + } else { + out_data <- taxon_data[taxon_data$key == taxon_key, ] + out_data <- out_data[, colSums(is.na(out_data)) < nrow(out_data)] + + return(get_accepted_name_usage(out_data)) + } + } else { + return(NA_character_) + } +} + +# Input dataframe with vernacular names and get taxon information +map_taxa_from_vernacular <- function( + vernacular_name_df, + vernacular_name_col = "vernacularName", + out_cols = "scientificName", + filter_cols = NULL, + lang = NA, + increment = 0, + ...) { + require("dplyr") + require("tidyr") + + group_cols <- unlist(filter_cols, use.names = FALSE) + + # Match vernacular names to get taxonomic info + matched_names_df <- vernacular_name_df %>% + + # group by vernacular name and compact the data + group_by(across(all_of(c(vernacular_name_col, group_cols)))) %>% + nest() %>% + ungroup() %>% + nest(match_df = all_of(c(vernacular_name_col, group_cols))) %>% + + # find scientific name for each (distinct) vernacular name + mutate( + taxon_df = purrr::map( + .data$match_df, + match_vernacular_name, + filter_cols = filter_cols, + lang = lang, + increment = increment, + ... + ) + ) %>% + unnest("match_df") %>% + + # Remove unneeded columns + select(-"data") + + # Create output dataframe + out_df <- matched_names_df %>% + + # Unnest taxon dataframes + mutate(taxon_df = purrr::map(.data$taxon_df, ~ { + # Unnesting only possible if all dataframes have the same structure + if (all(is.na(.x))) { + get_df_structure(matched_names_df$taxon_df) + } else { + .x + } + }) + ) %>% + unnest("taxon_df") %>% + ungroup() %>% + + # Add other columns from input df + right_join( + vernacular_name_df, + by = c(vernacular_name_col, group_cols) + ) %>% + + # Set desired column(s) at the right side + select(all_of(names(vernacular_name_df)), all_of(out_cols)) %>% + + # Reorder rows based on original order in input df + right_join(vernacular_name_df, by = names(vernacular_name_df)) + + return(out_df) +} diff --git a/source/R/vis_trend_comp.R b/source/R/vis_trend_comp.R new file mode 100644 index 0000000..5cca2b0 --- /dev/null +++ b/source/R/vis_trend_comp.R @@ -0,0 +1,84 @@ +# Make figure to illustrate the principle of comparing trends + +set.seed(123) + +# Generate three types of relationships +n <- 20 + +# 1. Strong positive correlation +data1_pos <- cumsum(rnorm(n)) +data2_pos <- data1_pos + rnorm(n, sd = 1) +cor_pos <- cor(data1_pos, data2_pos) + +# 2. No correlation +data1_none <- cumsum(rnorm(n)) +data2_none <- rnorm(n) +cor_none <- cor(data1_none, data2_none) + +# 3. Strong negative correlation +data1_neg <- cumsum(rnorm(n)) +data2_neg <- -data1_neg + rnorm(n, sd = 1) +cor_neg <- cor(data1_neg, data2_neg) + +# Set up 3 rows and 2 columns of plots +par(mfrow = c(3, 2), mar = c(4, 4, 3, 1)) + +# ----------- Positive Correlation ----------- +# Time series +plot(data1_pos, + type = "l", + col = "#1B9E77", + lwd = 2, + ylim = range(c(data1_pos, data2_pos)), + ylab = "Value", + xlab = "Time", + main = "Two trends with positive correlation") +lines(data2_pos, col = "#D95F02", lwd = 2) +legend("topleft", + legend = c("Data 1", "Data 2"), + col = c("#1B9E77", "#D95F02"), + lwd = 2) + +# Scatterplot +plot(data1_pos, data2_pos, pch = 19, col = "darkgreen", + xlab = "Value in dataset 1", ylab = "Value in dataset 2", + main = paste("R =", round(cor_pos, 2))) +abline(lm(data2_pos ~ data1_pos), col = "black", lwd = 2, lty = 2) + +# ----------- No Correlation ----------- +plot(data1_none, + type = "l", + col = "#1B9E77", + lwd = 2, + ylim = range(c(data1_none, data2_none)), + ylab = "Value", xlab = "Time", + main = "Two trends with no correlation") +lines(data2_none, col = "#D95F02", lwd = 2) +legend("topleft", + legend = c("Data 1", "Data 2"), + col = c("#1B9E77", "#D95F02"), + lwd = 2) + +plot(data1_none, data2_none, pch = 19, col = "darkgreen", + xlab = "Value in dataset 1", ylab = "Value in dataset 2", + main = paste("R =", round(cor_none, 2))) +abline(lm(data2_none ~ data1_none), col = "black", lwd = 2, lty = 2) + +# ----------- Negative Correlation ----------- +plot(data1_neg, + type = "l", + col = "#1B9E77", + lwd = 2, + ylim = range(c(data1_neg, data2_neg)), + ylab = "Value", + xlab = "Time", + main = "Two trends with negative correlation") +lines(data2_neg, col = "#D95F02", lwd = 2) +legend("topleft", + legend = c("Data 1", "Data 2"), + col = c("#1B9E77", "#D95F02"), lwd = 2) + +plot(data1_neg, data2_neg, pch = 19, col = "darkgreen", + xlab = "Value in dataset 1", ylab = "Value in dataset 2", + main = paste("R =", round(cor_neg, 2))) +abline(lm(data2_neg ~ data1_neg), col = "black", lwd = 2, lty = 2) diff --git a/source/comp_indicators.Rmd b/source/comp_indicators.Rmd deleted file mode 100644 index 0cf29c4..0000000 --- a/source/comp_indicators.Rmd +++ /dev/null @@ -1,487 +0,0 @@ ---- -title: "B-cubed indicators for cube vs structured data" -author: "Emma Cartuyvels, Ward Langeraert, Toon Van Daele" -date: "2024-10-31" -output: html_document ---- - -# Introduction - -In this document we compare all B-cubed biodiversity indicators between actual cube data and data from structured monitoring. - -```{r setup, include=FALSE} -library(sf) -library(rgbif) -library(dplyr) -library(b3gbi) - -knitr::opts_chunk$set(echo = TRUE) -``` - -```{r} -data_path <- here::here("data") -``` - -To start we'll read in both the biodiversity and the ABV data directly from GBIF. - -```{r read data, eval=FALSE} -# nolint start -birdcubeflanders_year <- occ_download_sql( - user = Sys.getenv("USER"), - pwd = Sys.getenv("PSWD"), - email = Sys.getenv("MAIL"), - "SELECT - \"year\", - GBIF_EEARGCode( - 1000, - decimalLatitude, - decimalLongitude, - COALESCE(coordinateUncertaintyInMeters, 1000) - ) AS eeaCellCode, - speciesKey, - species, - family, - COUNT(*) AS n, - MIN(COALESCE(coordinateUncertaintyInMeters, 1000)) AS minCoordinateUncertaintyInMeters, - IF(ISNULL(family), NULL, SUM(COUNT(*)) OVER (PARTITION BY family)) AS familyCount - FROM - occurrence - WHERE - occurrenceStatus = 'PRESENT' - AND NOT ARRAY_CONTAINS(issue, 'ZERO_COORDINATE') - AND NOT ARRAY_CONTAINS(issue, 'COORDINATE_OUT_OF_RANGE') - AND NOT ARRAY_CONTAINS(issue, 'COORDINATE_INVALID') - AND NOT ARRAY_CONTAINS(issue, 'COUNTRY_COORDINATE_MISMATCH') - AND level1gid = 'BEL.2_1' - AND \"year\" >= 2007 - AND \"year\" <= 2022 - AND speciesKey IS NOT NULL - AND decimalLatitude IS NOT NULL - AND decimalLongitude IS NOT NULL - AND class = 'Aves' - AND collectionCode != 'ABV' - GROUP BY - \"year\", - eeaCellCode, - speciesKey, - family, - species - ORDER BY - \"year\" ASC, - eeaCellCode ASC, - speciesKey ASC" -) -# nolint end -``` - -```{r, eval=FALSE} -occ_download_wait(birdcubeflanders_year) - -birdcubeflanders <- occ_download_get(birdcubeflanders_year, - path = data_path) |> - occ_download_import() - -write.csv(birdcubeflanders, paste0(data_path, "./interim/birdcubeflanders.csv")) -``` - -```{r, eval=FALSE} -# nolint start -abv_data_down <- occ_download_sql( - user = Sys.getenv("USER"), - pwd = Sys.getenv("PSWD"), - email = Sys.getenv("MAIL"), - "SELECT - \"year\", - GBIF_EEARGCode( - 1000, - decimalLatitude, - decimalLongitude, - COALESCE(coordinateUncertaintyInMeters, 1000) - ) AS eeaCellCode, - speciesKey, - species, - family, - COUNT(*) AS n, - MIN(COALESCE(coordinateUncertaintyInMeters, 1000)) AS minCoordinateUncertaintyInMeters, - IF(ISNULL(family), NULL, SUM(COUNT(*)) OVER (PARTITION BY family)) AS familyCount - FROM - occurrence - WHERE - occurrenceStatus = 'PRESENT' - AND NOT ARRAY_CONTAINS(issue, 'ZERO_COORDINATE') - AND NOT ARRAY_CONTAINS(issue, 'COORDINATE_OUT_OF_RANGE') - AND NOT ARRAY_CONTAINS(issue, 'COORDINATE_INVALID') - AND NOT ARRAY_CONTAINS(issue, 'COUNTRY_COORDINATE_MISMATCH') - AND level1gid = 'BEL.2_1' - AND \"year\" >= 2007 - AND \"year\" <= 2022 - AND speciesKey IS NOT NULL - AND decimalLatitude IS NOT NULL - AND decimalLongitude IS NOT NULL - AND class = 'Aves' - AND collectionCode = 'ABV' - GROUP BY - \"year\", - eeaCellCode, - speciesKey, - family, - species - ORDER BY - \"year\" ASC, - eeaCellCode ASC, - speciesKey ASC" -) -# nolint end -``` - -```{r, eval=FALSE} -occ_download_wait(abv_data_down) - -abv_data <- occ_download_get(abv_data_down, - path = data_path) |> - occ_download_import() - -write.csv(abv_data, "../data/interim/abv_data.csv") -``` - -```{r} -birdcubeflanders <- read.csv("../data/interim/birdcubeflanders.csv") - -birdcube <- process_cube(birdcubeflanders, - cols_occurrences = "n") -``` - -```{r} -abv_data <- read.csv("../data/interim/abv_data.csv") - -abv <- process_cube(abv_data, - cols_occurrences = "n") -``` - -# General Biodiversity indicators - -## Observed richness - -### Map -```{r observed richness 1km map, fig.show="hold", out.width="50%"} -obs_richness_map_birdcube <- obs_richness_map(birdcube, cell_size = 1) -plot(obs_richness_map_birdcube) - -obs_richness_map_abv <- obs_richness_map(abv, cell_size = 1) -plot(obs_richness_map_abv) -``` - -```{r observed richness 10km map, fig.show="hold", out.width="50%"} -obs_richness_map_birdcube <- obs_richness_map(birdcube, cell_size = 10) -plot(obs_richness_map_birdcube) - -obs_richness_map_abv <- obs_richness_map(abv, cell_size = 10) -plot(obs_richness_map_abv) -``` -These maps, as expected, show clear differences in the species richness observed, we also see no similar pattern in where higher species richness is seen. - -### Time series - -```{r observed richness over time, fig.show="hold", out.width="50%"} -obs_richness_ts_birdcube <- obs_richness_ts(birdcube) -plot(obs_richness_ts_birdcube) - -obs_richness_ts_abv <- obs_richness_ts(abv) -plot(obs_richness_ts_abv) -``` -## Total occurrences - -### Map - -```{r total occurrences map, fig.show="hold", out.width="50%"} -total_occ_map_birdcube <- total_occ_map(birdcube, cell_size = 10) -plot(total_occ_map_birdcube) - -total_occ_map_abv <- total_occ_map(abv, cell_size = 10) -plot(total_occ_map_abv) -``` - -### Time series - -```{r total occurences time series, fig.show="hold", out.width="50%"} -total_occ_ts_birdcube <- total_occ_ts(birdcube) -plot(total_occ_ts_birdcube) - -total_occ_ts_abv <- total_occ_ts(abv) -plot(total_occ_ts_abv) -``` - -## Pielou evenness - -### Map - -```{r pielou evenness map, fig.show="hold", out.width="50%"} -pielou_evenness_map_birdcube <- pielou_evenness_map(birdcube, cell_size = 10) -plot(pielou_evenness_map_birdcube) - -pielou_evenness_map_abv <- pielou_evenness_map(abv, cell_size = 10) -plot(pielou_evenness_map_abv) -``` - -### Time series - -```{r pielou evenness time series, fig.show="hold", out.width="50%"} -pielou_evenness_ts_birdcube <- pielou_evenness_ts(birdcube) -plot(pielou_evenness_ts_birdcube) - -pielou_evenness_ts_abv <- pielou_evenness_ts(abv) -plot(pielou_evenness_ts_abv) -``` - -## Williams' Evenness - -### Map - -```{r williams evenness map, fig.show="hold", out.width="50%"} -williams_evenness_map_birdcube <- williams_evenness_map(birdcube, - cell_size = 10) -plot(williams_evenness_map_birdcube) - -williams_evenness_map_abv <- williams_evenness_map(abv, cell_size = 10) -plot(williams_evenness_map_abv) -``` - -### Time series - -```{r williams evenness time series, fig.show="hold", out.width="50%"} -williams_evenness_ts_birdcube <- williams_evenness_ts(birdcube) -plot(williams_evenness_ts_birdcube) - -williams_evenness_ts_abv <- williams_evenness_ts(abv) -plot(williams_evenness_ts_abv) -``` - -## Cumulative Species Richness - -### Time series - -```{r cumulative species richness over time, fig.show="hold", out.width="50%"} -cum_richness_ts_birdcube <- cum_richness_ts(birdcube) -plot(cum_richness_ts_birdcube) - -cum_richness_ts_abv <- cum_richness_ts(abv) -plot(cum_richness_ts_abv) -``` - -## Density of Occurrences - -### Map - -```{r density of occurences map, fig.show="hold", out.width="50%"} -occ_density_map_birdcube <- occ_density_map(birdcube, cell_size = 10) -plot(occ_density_map_birdcube) - -occ_density_map_abv <- occ_density_map(abv, cell_size = 10) -plot(occ_density_map_abv) -``` - -### Time series - -```{r density of occurences time series, fig.show="hold", out.width="50%"} -occ_density_ts_birdcube <- occ_density_ts(birdcube) -plot(occ_density_ts_birdcube) - -occ_density_ts_abv <- occ_density_ts(abv) -plot(occ_density_ts_abv) -``` - -## Abundance-Based Rarity - -### Map - -```{r abundance-based rarity map, fig.show="hold", out.width="50%"} -ab_rarity_map_birdcube <- ab_rarity_map(birdcube, cell_size = 10) -plot(ab_rarity_map_birdcube) - -ab_rarity_map_abv <- ab_rarity_map(abv, cell_size = 10) -plot(ab_rarity_map_abv) -``` - -### Time series - -```{r abundance-based rarity time series, fig.show="hold", out.width="50%"} -ab_rarity_ts_birdcube <- ab_rarity_ts(birdcube) -plot(ab_rarity_ts_birdcube) - -ab_rarity_ts_abv <- ab_rarity_ts(abv) -plot(ab_rarity_ts_abv) -``` - -## Area-Based Rarity - -### Map - -```{r area-based rarity map, fig.show="hold", out.width="50%"} -area_rarity_map_birdcube <- area_rarity_map(birdcube, cell_size = 10) -plot(area_rarity_map_birdcube) - -area_rarity_map_abv <- area_rarity_map(abv, cell_size = 10) -plot(area_rarity_map_abv) -``` - -### Time series - -```{r area-based rarity time series, fig.show="hold", out.width="50%"} -area_rarity_ts_birdcube <- area_rarity_ts(birdcube) -plot(area_rarity_ts_birdcube) - -area_rarity_ts_abv <- area_rarity_ts(abv) -plot(area_rarity_ts_abv) -``` - -## Mean Year of Occurrence - -### Map - -```{r newness map, fig.show="hold", out.width="50%"} -newness_map_birdcube <- newness_map(birdcube, cell_size = 10) -plot(newness_map_birdcube) - -newness_map_abv <- newness_map(abv, cell_size = 10) -plot(newness_map_abv) -``` - -### Time series - -```{r newness time series, fig.show="hold", out.width="50%"} -newness_ts_birdcube <- newness_ts(birdcube) -plot(newness_ts_birdcube) - -newness_ts_abv <- newness_ts(abv) -plot(newness_ts_abv) -``` - -## Taxonomic Distinctness - -### Map - -```{r taxonomic distinctness map, fig.show="hold", out.width="50%"} -tax_distinct_map_birdcube <- tax_distinct_map(birdcube, cell_size = 10, - check = TRUE) -plot(tax_distinct_map_birdcube) - -tax_distinct_map_abv <- tax_distinct_map(abv, cell_size = 10, check = TRUE) -plot(tax_distinct_map_abv) -``` - -### Time series - -```{r taxonomic distinctness time series, eval = FALSE, fig.show="hold", out.width="50%"} -tax_distinct_ts_birdcube <- tax_distinct_ts(birdcube) -plot(tax_distinct_ts_birdcube) - -tax_distinct_ts_abv <- tax_distinct_ts(abv, check = TRUE) -plot(tax_distinct_ts_abv) -``` - -## Species Richness (Estimated by Coverage-Based Rarefaction) - -### Map - -```{r species richness map, eval = FALSE, fig.show="hold", out.width="50%"} -hill0_map_birdcube <- hill0_map(birdcube, cell_size = 10) -plot(hill0_map_birdcube) - -hill0_map_abv <- hill0_map(abv, cell_size = 10) -plot(hill0_map_abv) -``` - -### Time series - -```{r species richness time series, fig.show="hold", out.width="50%"} -hill0_ts_birdcube <- hill0_ts(birdcube) -plot(hill0_ts_birdcube) - -hill0_ts_abv <- hill0_ts(abv) -plot(hill0_ts_abv) -``` - -## Hill-Shannon Diversity (Estimated by Coverage-Based Rarefaction) - -### Map - -```{r hill-shannnon diversity map, eval = FALSE, fig.show="hold", out.width="50%"} -hill1_map_birdcube <- hill1_map(birdcube, cell_size = 10) -plot(hill1_map_birdcube) - -hill1_map_abv <- hill1_map(abv, cell_size = 10) -plot(hill1_map_abv) -``` - -### Time series - -```{r hill-shannon diversity time series, fig.show="hold", out.width="50%"} -hill1_ts_birdcube <- hill1_ts(birdcube) -plot(hill1_ts_birdcube) - -hill1_ts_abv <- hill1_ts(abv) -plot(hill1_ts_abv) -``` - -## Hill-Simpson Diversity (Estimated by Coverage-Based Rarefaction) - -### Map - -```{r hill-simpson diversity map, eval=FALSE, fig.show="hold", out.width="50%"} -hill2_map_birdcube <- hill2_map(birdcube, cell_size = 10) -plot(hill2_map_birdcube) - -hill2_map_abv <- hill2_map(abv, cell_size = 10) -plot(hill2_map_abv) -``` - -### Time series - -```{r hill-simpson diversity time series, fig.show="hold", out.width="50%"} -hill2_ts_birdcube <- hill2_ts(birdcube) -plot(hill2_ts_birdcube) - -hill2_ts_abv <- hill2_ts(abv) -plot(hill2_ts_abv) -``` - -## Occupancy Turnover - -### Time series - -```{r occupancy turnover time series, fig.show="hold", out.width="50%"} -occ_turnover_ts_birdcube <- occ_turnover_ts(birdcube) -plot(occ_turnover_ts_birdcube) - -occ_turnover_ts_abv <- occ_turnover_ts(abv) -plot(occ_turnover_ts_abv) -``` - - -# Species-specific Biodiversity indicators - -To compare these indicators we selected three species: the extremely common ..., generalist ..., the rare species ... (work with tabs possibly) - -## Species occurrences - -## Species range - -Ferro and Flick (2015) find that you need 15 collections of a specimen to predict it's distribution. - -### Map - -```{r} -spec_range_map_birdcube <- spec_range_map(birdcube) -plot(spec_range_map_birdcube, species = "Luscinia svecica") -``` - -### Time series - -```{r} -spec_range_ts_birdcube <- spec_range_ts(birdcube) -plot(spec_range_ts_birdcube, species = "Luscinia svecica") -``` - - - - diff --git a/source/expl_analysis.Rmd b/source/expl_analysis.Rmd deleted file mode 100644 index 18321c8..0000000 --- a/source/expl_analysis.Rmd +++ /dev/null @@ -1,685 +0,0 @@ ---- -title: "Exploratory analysis" -author: "Emma Cartuyvels, Ward Langeraert, Toon Van Daele" -date: "2024-07-24" -output: - html_document: - code_folding: hide ---- - -In this document we explore the ABV data set, the cube data generated for birds in Flanders and if there is any indication that the occurrences in both datasets show similar trends. - -```{r setup, include=FALSE} -knitr::opts_chunk$set(echo = TRUE, warning = FALSE) - -library(sf) -library(lubridate) -library(mapview) -library(vcd) # For Cohen’s Kappa to measure Inter-Rater Reliability -library(vegan) # For Bray-Curtis dissimilarity and other ecological metrics -library(dplyr) -library(ggplot2) -library(tidyr) -library(stringr) -library(plotly) -library(ggpubr) -library(INBOtheme) - -conflicted::conflicts_prefer(dplyr::filter) -``` - -```{r data, cache=TRUE} -birdcubeflanders_year_sf <- read_sf(here::here("data", "interim", - "birdcubeflanders_year.gpkg")) - -abv_data_total_sf <- read_sf(here::here("data", "interim", - "abv_data_total.gpkg")) -``` - -We noticed some problems with species names: *Poecile montanus* and *Parus montanus*, *Dendrocopus major* and *Dendrocopos major* both refer to the same species. Since both species names are accepted names in GBIF we need to manually correct this (an issue was made for this with GBIF). *Saxicola torquatus* is most likely a wrong name and needs to be replaced with *Saxicola rubicola* (an issue was also opened for this with the data publisher of the ABV data). - -We summarise the ABV data per year and per km² so that it is comparable with the cube data. - -```{r transform abv, message=FALSE} -abv_data_total <- abv_data_total_sf |> - st_drop_geometry() |> - mutate(cyclus = case_when( - year >= 2007 & year <= 2009 ~ 1, - year >= 2010 & year <= 2012 ~ 2, - year >= 2013 & year <= 2015 ~ 3, - year >= 2016 & year <= 2018 ~ 4, - year >= 2019 & year <= 2021 ~ 5, - year >= 2022 & year <= 2024 ~ 6 - )) |> - mutate(species = case_when( - species == "Parus montanus" ~ "Poecile montanus", - species == "Dendrocopus major" ~ "Dendrocopos major", - species == "Saxicola torquatus" ~ "Saxicola rubicola", - TRUE ~ species - )) |> - group_by(species) |> - mutate(n_obs = n()) |> - ungroup() |> - mutate(category = cut(n_obs, - breaks = c(0, 10, 100, 1000, 10000, +Inf), - labels = c("Very rare", "Rare", "Common", - "Very common", "Extremely common"), - right = FALSE)) - -birdcubeflanders_year <- birdcubeflanders_year_sf |> - st_drop_geometry() |> - mutate(cyclus = case_when( - year >= 2007 & year <= 2009 ~ 1, - year >= 2010 & year <= 2012 ~ 2, - year >= 2013 & year <= 2015 ~ 3, - year >= 2016 & year <= 2018 ~ 4, - year >= 2019 & year <= 2021 ~ 5, - year >= 2022 & year <= 2024 ~ 6 - )) - -abv_data_total_tf <- abv_data_total |> - group_by(species, year, TAG, category) |> - summarise(n = sum(individualCount)) |> - ungroup() -``` - -# Introduction - -To do: assess data quality across spatial, temporal, and taxonomical dimensions - -## The ABV dataset - -The ABV dataset, which stands for Algemene Broedvogelmonitoring Vlaanderen (Common Breeding Bird Survey Flanders), is a structured monitoring dataset that tracks a group of approximately 100 common breeding bird species in Flanders, Belgium. Monitoring began in 2007 and the protocol involves selecting a random sample of 1200 UTM 1x1 km grid cells, stratified by land use. These cells are divided into groups of 300, and 300 grid cells are visited each year on a three-year rotation. Each grid cell contains six monitoring locations where bird counts are conducted. The data collection is standardized, with each grid cell being visited three times a year at fixed intervals (at least two weeks apart). - -```{r} -summary(abv_data_total[, c("individualCount", - "eventDate", - "year", - "month")]) -``` - -```{r} -abv_data_total |> - group_by(TAG) |> - summarise(n_visits = n_distinct(year)) |> - ggplot(aes(x = n_visits)) + - geom_histogram() -``` - -Out of the `r length(unique(abv_data_total$TAG))` visited km² over 150 were visited only once, while some were visited up to 13 times. This inconsistency in the number of visits is probably corrected for in the analysis of the ABV data, should we do the same? - -```{r} -abv_data_total |> - ggplot(aes(x = year)) + - geom_bar() + - scale_x_continuous(breaks = sort(unique(abv_data_total$year))) -``` - -We see that the total number of observations is relatively stable over the years. - -```{r, message=FALSE} -abv_data_total_tf |> - group_by(species) |> - summarise(n_obs = n()) |> - ggplot(aes(x = n_obs)) + - geom_histogram() + - labs(x = "Number of observations (grouped per km² and year)", - y = "Number of species") -``` - -There are 182 species present in the dataset. There are 32 species that were observed less than 10 times, 45 species that were observed more than 1000 times and 16 species that were observed more than 10 000 times. This dataset also contains absence data, which is not included/not present? in the cube. - -```{r} -abv_data_total |> - distinct(category, species) |> - count(category) |> - knitr::kable() -``` - -## The cube data - -The cube contains 2 011 808 observations. There are 666 species present in the data. 355 of these were observed less than a 100 times, 197 were observed more than 1000 times. More information can be found [here]( https://docs.b-cubed.eu/occurrence-cube/specification/#dimensions). - -The cube is made up of several datasets: - - - Waarnemingen.be - Bird occurrences in Flanders and the Brussels Capital Region, Belgium - - Watervogels - Wintering waterbirds in Flanders, Belgium - - HG_OOSTENDE - Herring gulls (Larus argentatus, Laridae) breeding at the southern North Sea coast (Belgium) - - EOD – eBird Observation Dataset - - Waarnemingen.be - Non-native animal occurrences in Flanders and the Brussels Capital Region, Belgium - - LBBG_ZEEBRUGGE - Lesser black-backed gulls (Larus fuscus, Laridae) breeding at the southern North Sea coast (Belgium and the Netherlands) - - Broedvogels - Atlas of the breeding birds in Flanders 2000-2002 - - European Seabirds At Sea (ESAS) - - And 80+ smaller datasets - -With the first dataset (waarnemingen.be) containing most of the observations (67%). For further analyses it is important to know that waarnemingen.be data was last published in 2019 and currently runs only to 31 December 2018. - -```{r} -birdcubeflanders_year |> - ggplot(aes(x = year)) + - geom_bar() + - scale_x_continuous(breaks = sort(unique(birdcubeflanders_year$year))) -``` - -We clearly see a big drop-off in the number of observations after 2018. - - -```{r, message=FALSE} -birdcubeflanders_year |> - count(species, name = "n_obs") |> - ggplot(aes(x = n_obs)) + - geom_histogram() + - labs(x = "Number of observations (grouped per km² and year)", - y = "Number of species") -``` - -```{r} -birdcubeflanders_year |> - count(species, name = "n_obs") |> - mutate(category = cut(n_obs, - breaks = c(-Inf, 0, 1, 10, 100, 1000, 10000, Inf), - right = FALSE)) |> - count(category) |> - knitr::kable() -``` - -### Filter cube for specific ABV squares and years - -```{r} -utm_year <- abv_data_total |> - st_drop_geometry() |> - distinct(TAG, year) -``` - -```{r} -filt_birdcube <- utm_year |> - left_join(birdcubeflanders_year, by = c("TAG", "year")) -``` - -```{r} -filt_birdcube |> - count(species, name = "n_obs") |> - mutate(category = cut(n_obs, - breaks = c(-Inf, 0, 1, 10, 100, 1000, 10000, Inf), - right = FALSE)) |> - count(category) |> - knitr::kable() -``` - - -# Comparing the data - -```{r} -studied_spec <- unique(abv_data_total$species) |> - na.omit() -``` - -Let's check if these species are observed in the same UTM squares for the full period. Let's make this a function depending on period and species that gives us the percentage of squares. - -```{r} -range_comp <- function(sel_species, period = 2007:2022, - dataset1 = abv_data_total, - dataset2 = birdcubeflanders_year) { - - # We filter both datasets for the species and period of interest - # and group them by TAG (identifier of utm square) - set_abv <- dataset1 |> - st_drop_geometry() |> - filter(.data$species %in% sel_species, - .data$year %in% period, - .data$individualCount > 0) |> - group_by(.data$TAG) |> - summarise(n = sum(.data$individualCount)) - - set_cube <- dataset2 |> - st_drop_geometry() |> - filter(.data$species %in% sel_species, - .data$year %in% period) |> - group_by(.data$TAG) |> - summarise(n = sum(.data$n)) - - total_abv <- length(set_abv$TAG) - perc_abv <- (total_abv / length(unique(dataset1$TAG))) * 100 - - total_cube <- length(set_cube$TAG) - perc_cube <- (total_cube / length(unique(dataset2$TAG))) * 100 - - overlap_all_abv_cube <- length( - which(set_cube$TAG %in% unique(abv_data_total$TAG)) - ) - perc_overlap_all <- ( - overlap_all_abv_cube / length(unique(dataset1$TAG))) * 100 - - total_overlap <- length(which(set_cube$TAG %in% set_abv$TAG)) - perc <- (total_overlap / total_abv) * 100 - - list(total_abv, perc_abv, - total_cube, perc_cube, - overlap_all_abv_cube, perc_overlap_all, - total_overlap, perc) -} - -``` - -```{r, cache=TRUE} -comp_range_data <- as.data.frame(studied_spec) -comp_range_data$abv_squares <- NA -comp_range_data$perc_abv_total_abv <- NA -comp_range_data$cube_squares <- NA -comp_range_data$perc_cube_total_cube <- NA -comp_range_data$overlap_birdcube_total_abv <- NA -comp_range_data$perc_birdcube_total_abv <- NA -comp_range_data$overlap_birdcube_spec_abv <- NA -comp_range_data$percentage_birdcube_spec_abv <- NA - -for (i in studied_spec){ - test <- range_comp(i, period = 2007:2018) - - comp_range_data[comp_range_data$studied_spec == i, 2] <- test[1] - comp_range_data[comp_range_data$studied_spec == i, 3] <- test[2] - comp_range_data[comp_range_data$studied_spec == i, 4] <- test[3] - comp_range_data[comp_range_data$studied_spec == i, 5] <- test[4] - comp_range_data[comp_range_data$studied_spec == i, 6] <- test[5] - comp_range_data[comp_range_data$studied_spec == i, 7] <- test[6] - comp_range_data[comp_range_data$studied_spec == i, 8] <- test[7] - comp_range_data[comp_range_data$studied_spec == i, 9] <- test[8] -} - -``` - -```{r} -comp_range_data |> - inner_join(abv_data_total |> distinct(species, category), - by = join_by(studied_spec == species)) |> - DT::datatable() |> - DT::formatRound(columns = c("perc_abv_total_abv", - "perc_cube_total_cube", - "perc_birdcube_total_abv", - "percentage_birdcube_spec_abv"), digits = 2) -``` - -This table shows the number of ABV squares were a species was observed, the percentage of total ABV squares, the number of cube squares were the species was observed, the percentage of all cube squares, the number of ABV squares were the species was observed based on the birdcube data, the percentage compared to all ABV squares, the number of squares occupied by the species in both the ABV and birdcube data and the percentage of this compared to the number of squares occupied by this species in the ABV data. - -Overall we see an overlap of `r round(mean(comp_range_data$percentage_birdcube_spec_abv, na.rm = TRUE), digits = 3)`. - -```{r, message=FALSE} -comp_range_data |> - inner_join(abv_data_total |> distinct(species, category), - by = join_by(studied_spec == species)) |> - ggplot(aes(x = perc_abv_total_abv, y = perc_birdcube_total_abv, - color = category)) + - geom_point() + - stat_cor(mapping = aes(color = NULL), - label.x.npc = "centre", - label.y.npc = "bottom", - method = "pearson") + - labs(x = "Percentage of ABV squares occupied\nby species in ABV dataset", - y = "Percentage of ABV squares occupied\nby species in cube dataset") -``` - -If we look at the graph it appears that for the subset of ABV squares the number of squares in which a species is observed in the ABV is correlated to the number of squares in which a species is observed in the cube. - -```{r, message=FALSE} -comp_range_data |> - inner_join(abv_data_total |> distinct(species, category), - by = join_by(studied_spec == species)) |> - ggplot(aes(x = perc_abv_total_abv, y = perc_cube_total_cube, - color = category)) + - geom_point() + - stat_cor(mapping = aes(color = NULL), - label.x.npc = "centre", - label.y.npc = "bottom", - method = "pearson") + - labs(x = "Percentage of ABV squares occupied\nby species in ABV dataset", - y = "Percentage of cube squares occupied\nby species in cube dataset") -``` - -```{r, message=FALSE} -comp_range_data |> - inner_join(abv_data_total |> distinct(species, category), - by = join_by(studied_spec == species)) |> - plot_ly(x = ~perc_abv_total_abv, - y = ~perc_cube_total_cube, - color = ~category, - text = ~studied_spec) |> - plotly::layout( - xaxis = list( - title = "Percentage of ABV squares occupied\nby species in ABV dataset"), - yaxis = list( - title = "Percentage of cube squares occupied\nby species in cube dataset") - ) -``` - -If we look at the graph it appears that the number of squares in which a species is observed in the ABV is correlated to the number of squares in which a species is observed in the cube. The outliers appear to be either gulls or invasive species, seeming to indicate an effect of specific datasets. - -```{r} -comp_range_data |> - inner_join(abv_data_total |> distinct(species, category), - by = join_by(studied_spec == species)) |> - filter(category %in% c("Common", "Very common", "Extremely common")) |> - ggplot(aes(x = perc_abv_total_abv, y = perc_cube_total_cube, - color = category)) + - geom_point() + - stat_cor(mapping = aes(color = NULL), - label.x.npc = "centre", - label.y.npc = "bottom", - method = "pearson") + - labs(x = "Percentage of ABV squares occupied\nby species in ABV dataset", - y = "Percentage of cube squares occupied\nby species in cube dataset") -``` - -```{r} -comp_range_data2 <- data.frame(studied_spec = rep(studied_spec, 4), - abv_squares = NA, - perc_abv_total_abv = NA, - cube_squares = NA, - perc_cube_total_cube = NA, - cyclus = NA) - -start_year <- 2007 -end_year <- 2018 - -cycle_starts <- seq(from = start_year, to = end_year, by = 3) -c <- 1 -j <- 1 - -for (cycle_start in cycle_starts) { - for (i in studied_spec) { - comp_range_data2$cyclus[j] <- c - comp_range_data2$studied_spec[j] <- i - - test <- range_comp(i, period = cycle_start:(cycle_start + 2)) - - comp_range_data2$abv_squares[j] <- test[[1]] - comp_range_data2$perc_abv_total_abv[j] <- test[[2]] - comp_range_data2$cube_squares[j] <- test[[3]] - comp_range_data2$perc_cube_total_cube[j] <- test[[4]] - - j <- j + 1 - } - c <- c + 1 -} -``` - -```{r, message=FALSE} -comp_range_data2 |> - inner_join(abv_data_total |> distinct(species, category), - by = join_by(studied_spec == species)) |> - ggplot(aes(x = perc_abv_total_abv, y = perc_cube_total_cube, - color = category)) + - geom_point() + - stat_cor(mapping = aes(color = NULL), - label.x.npc = "centre", - label.y.npc = "bottom", - method = "pearson") + - facet_grid("cyclus", - scales = "free_y") + - labs(x = "Percentage of ABV squares occupied\nby species in ABV dataset", - y = "Percentage of cube squares occupied\nby species in cube dataset") -``` - -This graph shows the same figure as above but split for each full cycle of ABV observations. - -# 1. Trend Analysis -## Correlation of time series of species occurrences - -```{r Correlation of time series per year of species occurrences, message=FALSE} -time_series_1 <- abv_data_total |> - st_drop_geometry() %>% - group_by(species, year) %>% - summarize(occurrence = sum(occurrenceStatus == "PRESENT")) - -time_series_2 <- birdcubeflanders_year |> - st_drop_geometry() |> - group_by(species, year) |> - summarize(occurrence = n()) - -# Pearson Correlation for each species -# inner_join makes sure that only species-year combinations present -# in both datasets are included -time_series_cor <- time_series_1 %>% - inner_join(time_series_2, - by = c("species", "year"), - suffix = c("_1", "_2")) %>% - group_by(species) %>% - summarize(correlation = cor(occurrence_1, occurrence_2, method = "pearson")) -``` - -```{r} -DT::datatable(time_series_cor) |> - DT::formatRound(columns = "correlation", digits = 2) -``` - -```{r Correlation of time series per cyclus of species occurrences, message=FALSE} -time_series_1 <- abv_data_total |> - st_drop_geometry() %>% - group_by(species, cyclus) %>% - summarize(occurrence = sum(occurrenceStatus == "PRESENT")) |> - filter(cyclus < 5) - -time_series_2 <- birdcubeflanders_year |> - st_drop_geometry() |> - group_by(species, cyclus) |> - summarize(occurrence = n()) |> - filter(cyclus < 5) - -# Pearson Correlation for each species -# inner_join makes sure that only species-year combinations present -# in both datasets are included -time_series_cor <- time_series_1 %>% - inner_join(time_series_2, - by = c("species", "cyclus"), - suffix = c("_1", "_2")) %>% - group_by(species) %>% - summarize(correlation = cor(occurrence_1, occurrence_2, method = "pearson")) -``` - -```{r} -DT::datatable(time_series_cor) |> - DT::formatRound(columns = "correlation", digits = 2) -``` - -```{r Correlation of time series per cyclus of species numbers, message=FALSE} -time_series_1 <- abv_data_total |> - st_drop_geometry() %>% - group_by(species, cyclus) %>% - summarize(abundance = sum(individualCount)) |> - filter(cyclus < 5) - -time_series_2 <- birdcubeflanders_year |> - st_drop_geometry() |> - group_by(species, cyclus) |> - summarize(abundance = sum((n))) |> - filter(cyclus < 5) - -# Pearson Correlation for each species -# inner_join makes sure that only species-year combinations present -# in both datasets are included -time_series_cor <- time_series_1 %>% - inner_join(time_series_2, - by = c("species", "cyclus"), - suffix = c("_1", "_2")) %>% - group_by(species) %>% - summarize(correlation = cor(abundance_1, abundance_2, method = "pearson")) -``` - -```{r} -time_series_cor |> - inner_join(abv_data_total |> distinct(species, category), - by = join_by(species == species)) |> - DT::datatable() |> - DT::formatRound(columns = "correlation", digits = 2) -``` - -```{r} -time_series_cor |> - inner_join(abv_data_total |> distinct(species, category), - by = join_by(species == species)) |> - summarize("mean correlation" = mean(correlation, na.rm = TRUE), - .by = category) %>% - knitr::kable(digits = 3) -``` - - -## Trend similarity - -```{r, message=FALSE} -abv_dif <- abv_data_total |> - group_by(cyclus, species) |> - summarise(total = sum(individualCount)) |> - pivot_wider(names_from = cyclus, - names_prefix = "abv_", - values_from = total, - values_fill = 0) |> - mutate(dif1 = abv_2 - abv_1, - dif2 = abv_3 - abv_2, - dif3 = abv_4 - abv_3) - -cube_dif <- birdcubeflanders_year |> - filter(species %in% studied_spec) |> - group_by(cyclus, species) |> - summarise(total = sum(n)) |> - pivot_wider(names_from = cyclus, - names_prefix = "cube_", - values_from = total, - values_fill = 0) |> - mutate(dif1_cube = cube_2 - cube_1, - dif2_cube = cube_3 - cube_2, - dif3_cube = cube_4 - cube_3) |> - select(species, dif1_cube, dif2_cube, dif3_cube) - -comp_dir <- abv_dif |> - select(species, dif1, dif2, dif3) |> - inner_join(cube_dif, by = join_by(species)) |> - mutate(dif1 = dif1 > 0, - dif2 = dif2 > 0, - dif3 = dif3 > 0, - dif1_cube = dif1_cube > 0, - dif2_cube = dif2_cube > 0, - dif3_cube = dif3_cube > 0) |> - pivot_longer( - cols = !species - ) |> - mutate(set = ifelse(str_detect(name, "cube"), - "cube", - "abv")) |> - mutate(dif = str_sub(name, 1, 4)) |> - select(-name) |> - pivot_wider(names_from = set, - values_from = value) - - -Kappa(table(comp_dir[, c(3, 4)])) -``` - -Value of k | Strength of agreement -------- | -------- -< 0 | Poor -0.01 - 0.20 | Slight -0.21 - 0.40 | Fair -0.41 - 0.60 | Moderate -0.61 - 0.80 | Substantial -0.81 - 1.00 | Almost perfect - -```{r Kappa for common species, message=FALSE} -abv_dif <- abv_data_total |> - filter(category %in% c("Rare")) |> - group_by(cyclus, species) |> - summarise(total = sum(individualCount)) |> - pivot_wider(names_from = cyclus, - names_prefix = "abv_", - values_from = total, - values_fill = 0) |> - mutate(dif1 = abv_2 - abv_1, - dif2 = abv_3 - abv_2, - dif3 = abv_4 - abv_3) - -cube_dif <- birdcubeflanders_year |> - filter(species %in% abv_dif$species) |> - group_by(cyclus, species) |> - summarise(total = sum(n)) |> - pivot_wider(names_from = cyclus, - names_prefix = "cube_", - values_from = total, - values_fill = 0) |> - mutate(dif1_cube = cube_2 - cube_1, - dif2_cube = cube_3 - cube_2, - dif3_cube = cube_4 - cube_3) |> - select(species, dif1_cube, dif2_cube, dif3_cube) - -comp_dir <- abv_dif |> - select(species, dif1, dif2, dif3) |> - inner_join(cube_dif, by = join_by(species)) |> - mutate(dif1 = dif1 > 0, - dif2 = dif2 > 0, - dif3 = dif3 > 0, - dif1_cube = dif1_cube > 0, - dif2_cube = dif2_cube > 0, - dif3_cube = dif3_cube > 0) |> - pivot_longer( - cols = !species - ) |> - mutate(set = ifelse(str_detect(name, "cube"), - "cube", - "abv")) |> - mutate(dif = str_sub(name, 1, 4)) |> - select(-name) |> - pivot_wider(names_from = set, - values_from = value) - - -Kappa(table(comp_dir[, c(3, 4)])) -``` - -# 2. Occupancy Rate Comparison - -Compare the occupancy rate (percentage of km² where a species is present) between the two datasets for each species. - -all abv squares 936 -all birdcube squares 13596 - -Kappa is not a good measure for comparing two discrete continuous variables, better to use this later when comparing categories, i.e. increase, decrease, ... - -```{r, message=FALSE} -occupancy_1 <- abv_data_total %>% - group_by(species, TAG) %>% - summarize(occupancy_rate_1 = mean(occurrenceStatus == "PRESENT")) - -occupancy_2 <- birdcubeflanders_year %>% - group_by(species) %>% - summarize(occupancy_rate_2 = mean(n())) - -# use Kappa from vcd package -``` - -# 3. Species Richness and Composition - -```{r} -# Species richness per dataset -richness_1 <- abv_data_total |> - group_by(TAG) |> - summarize(richness = n_distinct(species)) - -richness_2 <- birdcubeflanders_year |> - group_by(TAG) |> - summarize(richness = n_distinct(species)) - -# Bray-Curtis dissimilarity -species_composition_1 <- abv_data_total |> - drop_na(species) |> - count(species) |> - pivot_wider(names_from = species, - values_from = n, - values_fill = 0) - -species_composition_2 <- birdcubeflanders_year |> - filter(species %in% studied_spec) |> - count(species) |> - pivot_wider(names_from = species, - values_from = n, - values_fill = 0) - -bray_curtis <- vegdist(rbind(species_composition_1[-1], - species_composition_2[-1]), method = "bray") -bray_curtis -``` diff --git a/source/pipelines/biodiversity_indicators/R/read_data.R b/source/pipelines/biodiversity_indicators/R/read_data.R index 1d0053e..be181d3 100644 --- a/source/pipelines/biodiversity_indicators/R/read_data.R +++ b/source/pipelines/biodiversity_indicators/R/read_data.R @@ -1,3 +1,90 @@ -path_to_interim <- function(path_to_data, file) { +path_to_interim <- function(path_to_data, dataset, spat_res) { + file <- paste0(dataset, "_cube_", spat_res, ".csv") file.path(path_to_data, "interim", file) } + +read_andid <- function(data_file, dataset, spat_res) { + require("dplyr") + + data <- read.csv(data_file) + + output <- data |> + mutate(id_dataset = dataset, + id_spat_res = spat_res) + + return(output) +} + +add_cyclus <- function(data) { + require("dplyr") + + output <- data |> + mutate(cyclus = case_when( + year >= 2007 & year <= 2009 ~ 1, + year >= 2010 & year <= 2012 ~ 2, + year >= 2013 & year <= 2015 ~ 3, + year >= 2016 & year <= 2018 ~ 4, + year >= 2019 & year <= 2021 ~ 5, + year >= 2022 & year <= 2024 ~ 6 + )) + + return(output) +} + + +filter_1 <- function(data) { + require("dplyr") + + abv_birds <- read.csv("./data/interim/abv_birds.csv") + + output <- data |> + filter(.data$species %in% abv_birds$species) + + return(output) +} + +#' Rules (loosely based on ABV): +#' 1) A square is only relevant is the species was observed in +#' more than one time period +#' 2) A minimum of three relevant squares to include the species +#' 3) A minimum of a hundred observations to include the species + +filter_2 <- function(data, time_period = "year") { + require("dplyr") + + output <- data |> + group_by(.data$mgrscode, .data$species) |> + mutate(periods = n_distinct(!!sym(time_period))) |> + ungroup() |> + filter(.data$periods > 1) |> + group_by(.data$species) |> + mutate(squares = n_distinct(.data$mgrscode)) |> + ungroup() |> + filter(.data$squares > 2) |> + group_by(.data$species) |> + mutate(obs = n()) |> + ungroup() |> + filter(.data$obs > 100) |> + mutate(id_filter_per = time_period) + + return(output) +} + +filter_3 <- function(data, time_period = "year") { + require("dplyr") + + output <- data |> + group_by(.data$id_dataset, + .data$id_spat_res, + .data$species, + !!sym(time_period)) |> + summarise(n = sum(n)) |> + ungroup() |> + group_by(!!sym(time_period)) |> + mutate(total_obs = sum(n)) |> + ungroup() |> + mutate(n = .data$n / .data$total_obs) |> + mutate(id_filter_per = time_period) + + return(output) +} diff --git a/source/pipelines/biodiversity_indicators/_targets.R b/source/pipelines/biodiversity_indicators/_targets.R index 306b0f2..8fca61f 100644 --- a/source/pipelines/biodiversity_indicators/_targets.R +++ b/source/pipelines/biodiversity_indicators/_targets.R @@ -3,7 +3,7 @@ library(targets) # Set target options: tar_option_set( - packages = c("b3gbi"), + packages = c("b3gbi", "tidyverse"), format = "qs" # Optionally set the default storage format. qs is fast. ) @@ -21,121 +21,111 @@ tar_config_set( "_targets/"), config = "_targets.yaml", project = "biodiversity_indicators", - use_crew = TRUE) + use_crew = TRUE +) -# Run the R scripts in the R/ folder with your custom functions: +# Run the R scripts in the R/ folder with our custom functions: tar_source(file.path(targets_project_dir, "biodiversity_indicators", "R")) -# Replace the target list below with your own: +# The target list: list( - tarchetypes::tar_file( - abv_data_file, - path_to_interim(path_to_data = path_to_data, file = "abv_data.csv") - ), - tar_target( - abv_data, - read.csv(abv_data_file) - ), - tar_target( - abv, - process_cube(abv_data, - cols_occurrences = "n") - ), - tarchetypes::tar_file( - birdcube_data_file, - path_to_interim(path_to_data = path_to_data, file = "birdcubeflanders.csv") - ), tar_target( - birdcube_data, - read.csv(birdcube_data_file) + time_period, + c("year", "cyclus") ), tar_target( - birdcube, - process_cube(birdcube_data, - cols_occurrences = "n") + spat_res, + c("1km", "10km") ), tar_target( - obs_richness_map_abv_1, - obs_richness_map(abv, cell_size = 1) + dataset, + c("abv_data", "birdflanders") ), - tar_target( - obs_richness_map_cube_1, - obs_richness_map(birdcube, cell_size = 1) - ), - tar_target( - obs_richness_map_abv_10, - obs_richness_map(abv, cell_size = 10) - ), - tar_target( - obs_richness_map_cube_10, - obs_richness_map(birdcube, cell_size = 10) - ), - tar_target( - obs_richness_ts_abv, - obs_richness_ts(abv) + tarchetypes::tar_file( + data_file, + path_to_interim(path_to_data = path_to_data, + dataset = dataset, + spat_res = spat_res), + pattern = cross(dataset, spat_res) ), tar_target( - obs_richness_ts_cube, - obs_richness_ts(birdcube) + data_int1, + read_andid(data_file, dataset, spat_res), + pattern = map(data_file, cross(dataset, spat_res)) ), tar_target( - total_occ_map_abv, - total_occ_map(abv, cell_size = 10) + data, + add_cyclus(data_int1), + pattern = map(data_int1) ), tar_target( - total_occ_map_cube, - total_occ_map(birdcube, cell_size = 10) + filter1, + filter_1(data), + pattern = map(data) ), tar_target( - total_occ_ts_abv, - total_occ_ts(abv) + filter2, + filter_2(data, time_period), + pattern = cross(data, time_period) ), tar_target( - total_occ_ts_cube, - total_occ_ts(birdcube) + filter3, + filter_3(data, time_period), + pattern = cross(data, time_period) ), tar_target( - pielou_evenness_map_abv, - pielou_evenness_map(abv, cell_size = 10) + data_cubes, + process_cube(data, + cols_occurrences = "n"), + pattern = map(data) ), tar_target( - pielou_evenness_map_cube, - pielou_evenness_map(birdcube, cell_size = 10) + obs_richness_map, + obs_richness_map(data, cell_size = 10), + pattern = map(data) ), tar_target( - pielou_evenness_ts_cube, - pielou_evenness_ts(birdcube) + obs_richness_ts, + obs_richness_ts(data), + pattern = map(data) ), tar_target( - spec_occ_map_abv, - spec_occ_map(abv) + total_occ_map, + total_occ_map(data, cell_size = 10), + pattern = map(data) ), tar_target( - spec_occ_map_cube, - spec_occ_map(birdcube) + total_occ_ts, + total_occ_ts(data), + pattern = map(data) ), tar_target( - spec_occ_ts_abv, - spec_occ_ts(abv) + pielou_evenness_map, + pielou_evenness_map(data, cell_size = 10), + pattern = map(data) ), tar_target( - spec_occ_ts_cube, - spec_occ_ts(birdcube) + pielou_evenness_ts, + pielou_evenness_ts(data), + pattern = map(data) ), tar_target( - spec_range_map_abv, - spec_range_map(abv) + spec_occ_map, + spec_occ_map(data), + pattern = map(data) ), tar_target( - spec_range_map_cube, - spec_range_map(birdcube) + spec_occ_ts, + spec_occ_ts(data), + pattern = map(data) ), tar_target( - spec_range_ts_abv, - spec_range_ts(abv) + spec_range_map, + spec_range_map(data) ), tar_target( - spec_range_ts_cube, - spec_range_ts(birdcube) + spec_range_ts, + spec_range_ts(data), + pattern = map(data) ) ) diff --git a/source/pipelines/biodiversity_indicators/_targets/meta/meta b/source/pipelines/biodiversity_indicators/_targets/meta/meta index eff531a..63e919d 100644 --- a/source/pipelines/biodiversity_indicators/_targets/meta/meta +++ b/source/pipelines/biodiversity_indicators/_targets/meta/meta @@ -4,33 +4,57 @@ read_abv_gbif|function|0f973c90384d7a5f||||||||||||||| get_data|function|3341b9af0729302c||||||||||||||| obs_richness_map_abv|stem|c674139a986d3be1|f97003405992072d|f7144d68646b34bb|544418752||t20137.5632771888s|s145198b|145198|qs|local|vector|||17.78|| pielou_evenness_ts_abv|stem|a2e48c996ea6b50b|d315fc72cb502da2|f7144d68646b34bb|1463916421||t20140.3442884191s|s2977b|2977|qs|local|vector|||6.25|| -abv_data_file|stem|527d35ac7669ff2c|b6acb50d3d39ac90|89e666f628adf0f2|-226531251|C:/R/git_repositories/comp-unstructured-data/data/interim/abv_data.csv|t20148.7027485294s|s10444864b|10444864|file|local|vector|||0.72|| -birdcube_data_file|stem|519c1c07bacce2ea|48a9ff406ed7d0bc|89e666f628adf0f2|1618870224|C:/R/git_repositories/comp-unstructured-data/data/interim/birdcubeflanders.csv|t20148.6974539314s|s238113817b|238113817|file|local|vector|||0|| -abv_data|stem|b480912d37d3df6c|e229d1a198614366|d57bebce74690f1d|-790396526||t20148.7030385626s|s734920b|734920|qs|local|vector|||0.36|| -birdcube_data|stem|dbc57af89615930d|c219f922643095f8|fce17d6d9fbc08c0|804043826||t20148.7031682901s|s12335314b|12335314|qs|local|vector|||10.55|| -abv|stem|5f686d3ec523e2b9|f5ad0c893737fd7c|14ed9db9b45c9c17|1730016550||t20148.7031815535s|s892668b|892668|qs|local|vector|||1.07|| -birdcube|stem|525ff548687311f9|c6a94986654dd8a8|e37ed227da3f0d82|803444483||t20148.7033767325s|s19947156b|19947156|qs|local|vector|||15.87|| -spec_range_map_abv|stem|430c6c7111b831a4|35ed1551765724ea|0beb31fed06a2cff|565055131||t20148.7034706916s|s73356b|73356|qs|local|vector|||8.08|| -spec_range_ts_abv|stem|c612df97d7866008|b8ec36b075ba8929|0beb31fed06a2cff|212098582||t20148.7044819747s|s43034b|43034|qs|local|vector|||87.36|| -obs_richness_map_abv_1|stem|6a1b0136ccf8544d|f97003405992072d|0beb31fed06a2cff|-1849121129||t20148.70473719s|s145112b|145112|qs|local|vector|||22.01|| -spec_occ_ts_abv|stem|523e2ea4f5c7f53a|50e92be5970898cb|0beb31fed06a2cff|187662183||t20148.7057596212s|s50987b|50987|qs|local|vector|||88.33|| -total_occ_ts_abv|stem|a0e21bd2ab55ebc5|b8a82c3c5a65f3b7|0beb31fed06a2cff|-1571972827||t20148.7059303702s|s2927b|2927|qs|local|vector|||14.73|| -pielou_evenness_map_abv|stem|b744a6c7761236f0|a800a80fe2e65de3|0beb31fed06a2cff|10455673||t20148.705980857s|s12855b|12855|qs|local|vector|||4.34|| -obs_richness_map_abv_10|stem|f8293f6ba6236a08|c32e1c894e01b54f|0beb31fed06a2cff|-2121955916||t20148.7060292033s|s11761b|11761|qs|local|vector|||4.17|| -obs_richness_ts_abv|stem|36c2af1ea14027e3|0e1e13ab8edf2b2e|0beb31fed06a2cff|-1906844265||t20148.7060780799s|s2515b|2515|qs|local|vector|||4.22|Bootstrapped confidence intervals cannot be calculated for the chosen indicator.| -total_occ_map_abv|stem|c91cdb87695f8b61|37bf422ad4f6ba1e|0beb31fed06a2cff|2144595936||t20148.7061280773s|s12077b|12077|qs|local|vector|||4.3|| -spec_occ_map_abv|stem|74530206571cff64|1c6c3070fe27f0f9|0beb31fed06a2cff|1040382643||t20148.7061883524s|s94259b|94259|qs|local|vector|||5.18|| -spec_occ_map_cube|stem|28530d44bf976352|2923183735c9a153|a5c31b662884b2bb|-1523928707||t20148.7069280832s|s358297b|358297|qs|local|vector|||63.85|| -spec_range_map_cube|stem|976c9922669aebb5|06c24be89a7622f3|a5c31b662884b2bb|1805158784||t20148.7076471583s|s272603b|272603|qs|local|vector|||62.08|| -obs_richness_ts_cube|stem|1da223133056ce2b|52c081f2d4708827|a5c31b662884b2bb|-109269857||t20148.7083309265s|s7144b|7144|qs|local|vector|||59.06|Bootstrapped confidence intervals cannot be calculated for the chosen indicator.| -spec_range_ts_cube|stem|6c83010cb134e752|7eee1844a6ef902f|a5c31b662884b2bb|1980473897||t20148.7162639453s|s121432b|121432|qs|local|vector|||685.39|| -obs_richness_map_cube_10|stem|dee458f23e469676|e2419c1564d217a1|a5c31b662884b2bb|-677520515||t20148.7168106236s|s17281b|17281|qs|local|vector|||47.22|| -total_occ_ts_cube|stem|7e9ed41e6af6a6c6|8d72fc3bd535dd65|a5c31b662884b2bb|1560660880||t20148.7194517961s|s7573b|7573|qs|local|vector|||228.19|| -total_occ_map_cube|stem|d821b4ccf042456d|9d28fc231c116a65|a5c31b662884b2bb|301642794||t20148.7199878716s|s17723b|17723|qs|local|vector|||46.31|| -pielou_evenness_map_cube|stem|ad77e6d4da619406|86405f793fb7b14a|a5c31b662884b2bb|1936146881||t20148.7205271543s|s18484b|18484|qs|local|vector|||46.58|| -pielou_evenness_ts_cube|stem|50bd767cd851c70d|fab57ea502389048|a5c31b662884b2bb|163671844||t20148.7210688536s|s7647b|7647|qs|local|vector|||46.8|| -spec_occ_ts_cube|stem|068cd37666dc5abc|7a477387f00a1819|a5c31b662884b2bb|-1997763936||t20148.727323413s|s148416b|148416|qs|local|vector|||540.39|| -obs_richness_map_cube_1|stem|89d23ee87b4a659e|66f1d7db7aad6b78|a5c31b662884b2bb|-1405350137||t20148.7771409072s|s184803b|184803|qs|local|vector|||64.83|| -path_to_interim|function|36603da5829a6104 +abv_data_file|stem|7611b266957a43b6|473f1bc40fd02efd|89e666f628adf0f2|-226531251|C:/R/git_repositories/comp-unstructured-data/data/interim/abv_data_cube.csv|t20158.5921368483s|s8382131b|8382131|file|local|vector|||0.25|| +birdcube_data_file|stem|8d890e239bd6e3da|48a9ff406ed7d0bc|89e666f628adf0f2|1618870224|C:/R/git_repositories/comp-unstructured-data/data/interim/birdcubeflanders.csv|t20158.5921762536s|s131093150b|131093150|file|local|vector|||0|| +abv_data|stem|ffc4acff06355bdc|e229d1a198614366|439ce68bf7ab8f45|-790396526||t20158.6339841969s|s724496b|724496|qs|local|vector|||0.28|| +birdcube_data|stem|dcb5751a10150d70|c219f922643095f8|ddae7105d9c344f0|804043826||t20158.6340435949s|s11929908b|11929908|qs|local|vector|||4.66|| +abv|stem|dde4ca7fc6bc205c|f5ad0c893737fd7c|f37040db9a9619e2|1730016550||t20158.6340554601s|s935058b|935058|qs|local|vector|||0.96|| +birdcube|stem|55abd30bbf557d06|c6a94986654dd8a8|b466598f23a6fea5|803444483||t20158.634203875s|s16711186b|16711186|qs|local|vector|||12.19|| +spec_range_map_abv|stem|e8708c53f8de34f5|35ed1551765724ea|48693b457575dc57|565055131||t20158.6342801921s|s72576b|72576|qs|local|vector|||6.56|| +spec_range_ts_abv|stem|c89bcb9ac832dc45|b8ec36b075ba8929|48693b457575dc57|212098582||t20158.6350572053s|s42928b|42928|qs|local|vector|||67.13|| +obs_richness_map_abv_1|stem|14bfe0df9039322c|f97003405992072d|48693b457575dc57|-1849121129||t20158.6352988397s|s173370b|173370|qs|local|vector|||20.86|| +spec_occ_ts_abv|stem|ee0b2ec04db81045|50e92be5970898cb|48693b457575dc57|187662183||t20158.6361699466s|s50860b|50860|qs|local|vector|||75.25|| +total_occ_ts_abv|stem|15a56f3d64537fb6|b8a82c3c5a65f3b7|48693b457575dc57|-1571972827||t20158.6363018328s|s2920b|2920|qs|local|vector|||11.39|| +pielou_evenness_map_abv|stem|5d29acbc63d76806|a800a80fe2e65de3|48693b457575dc57|10455673||t20158.6363420391s|s12863b|12863|qs|local|vector|||3.45|| +obs_richness_map_abv_10|stem|b34fcc76969a3835|c32e1c894e01b54f|48693b457575dc57|-2121955916||t20158.6363808642s|s11766b|11766|qs|local|vector|||3.33|| +obs_richness_ts_abv|stem|9644cd31dc1332a6|0e1e13ab8edf2b2e|48693b457575dc57|-1906844265||t20158.6364160165s|s2515b|2515|qs|local|vector|||3.04|Bootstrapped confidence intervals cannot be calculated for the chosen indicator.| +total_occ_map_abv|stem|7def397faa6bd48d|37bf422ad4f6ba1e|48693b457575dc57|2144595936||t20158.6364537496s|s12082b|12082|qs|local|vector|||3.25|| +spec_occ_map_abv|stem|67695a6880e34b63|1c6c3070fe27f0f9|48693b457575dc57|1040382643||t20158.6365008247s|s93142b|93142|qs|local|vector|||4.05|| +spec_occ_map_cube|stem|3592e8d69972782e|2923183735c9a153|21a17055e3d0b3c7|-1523928707||t20158.6371076394s|s356674b|356674|qs|local|vector|||52.39|| +spec_range_map_cube|stem|f4818f9514ddc012|06c24be89a7622f3|21a17055e3d0b3c7|1805158784||t20158.6376604113s|s271814b|271814|qs|local|vector|||47.72|| +obs_richness_map_cube_1|stem|1db2833c6f88346a|66f1d7db7aad6b78|21a17055e3d0b3c7|-1405350137||t20158.6383333858s|s185150b|185150|qs|local|vector|||58.12|| +obs_richness_ts_cube|stem|2b5e10076c11945b|52c081f2d4708827|21a17055e3d0b3c7|-109269857||t20158.638862681s|s7200b|7200|qs|local|vector|||45.72|Bootstrapped confidence intervals cannot be calculated for the chosen indicator.| +spec_range_ts_cube|stem|099c97fa5be98d30|7eee1844a6ef902f|21a17055e3d0b3c7|1980473897||t20158.645873159s|s121334b|121334|qs|local|vector|||605.69|| +obs_richness_map_cube_10|stem|091c12b2456d92e6|e2419c1564d217a1|21a17055e3d0b3c7|-677520515||t20158.6464733134s|s17346b|17346|qs|local|vector|||51.85|| +total_occ_ts_cube|stem|205592e3722d5f58|8d72fc3bd535dd65|21a17055e3d0b3c7|1560660880||t20158.6492685363s|s7627b|7627|qs|local|vector|||241.49|| +total_occ_map_cube|stem|fdf708e692257c61|9d28fc231c116a65|21a17055e3d0b3c7|301642794||t20158.6498149922s|s17784b|17784|qs|local|vector|||47.21|| +pielou_evenness_map_cube|stem|5c0097bb1122dc9f|86405f793fb7b14a|21a17055e3d0b3c7|1936146881||t20158.650341915s|s18555b|18555|qs|local|vector|||45.52|| +pielou_evenness_ts_cube|stem|e9efca9afd39fd94|fab57ea502389048|21a17055e3d0b3c7|163671844||t20158.6508713463s|s7724b|7724|qs|local|vector|||45.75|| +spec_occ_ts_cube|stem|a2abe72f1ece2e6b|7a477387f00a1819|21a17055e3d0b3c7|-1997763936||t20158.6571229358s|s148306b|148306|qs|local|vector|||540.11|| +spat_res|stem|d4acb59215e478b6|43ed5bdf25dcedef|2c530c1562a7fbd1|-1448122772||t20193.574718615s|s91b|91|qs|local|vector||spat_res_1a0154cae0fe4fc9*spat_res_c0bfdc4196f8ca59|0.3|| +time_period|stem|97f9982511938ea5|b115c63f54fe6958|2c530c1562a7fbd1|857672053||t20193.5747187147s|s94b|94|qs|local|vector||time_period_dab20f8537398031*time_period_b36c363a5c648bd4|0|| +dataset|stem|9819a9bb24ce2e15|1f039bd635d5bfde|2c530c1562a7fbd1|728897448||t20193.5747196546s|s104b|104|qs|local|vector||dataset_2b42df6066ba2e2b*dataset_df353d93eae6ce18|0|| +data_file_33f5d8f414e50ce0|branch|8bbc194aa2f9e7fa|143e47c8d42a9cd8|af6f77ec7cd79831|-612687626|C:/R/git_repositories/comp-unstructured-data/data/interim/abv_data_cube_1km.csv|t20193.5193709323s|s5910704b|5910704|file|local|vector|data_file||0|| +data_file_b13120dd7d93ccbf|branch|2f2d990f10e4b04a|143e47c8d42a9cd8|af6f77ec7cd79831|-1611092828|C:/R/git_repositories/comp-unstructured-data/data/interim/abv_data_cube_10km.csv|t20193.5416756165s|s3911488b|3911488|file|local|vector|data_file||0.01|| +data_file_c84d6875bc9545f8|branch|1996c8274c871f7c|143e47c8d42a9cd8|af6f77ec7cd79831|522324761|C:/R/git_repositories/comp-unstructured-data/data/interim/birdflanders_cube_1km.csv|t20193.5193918844s|s342934864b|342934864|file|local|vector|data_file||0|| +data_file_0c364efbdedc63dc|branch|03e51fbe9de5dcd3|143e47c8d42a9cd8|af6f77ec7cd79831|-397872104|C:/R/git_repositories/comp-unstructured-data/data/interim/birdflanders_cube_10km.csv|t20193.5417375101s|s25258044b|25258044|file|local|vector|data_file||0|| +data_file|pattern|87261ee9caef7b44|143e47c8d42a9cd8||-679625259||||378015100|file|local|vector||data_file_33f5d8f414e50ce0*data_file_b13120dd7d93ccbf*data_file_c84d6875bc9545f8*data_file_0c364efbdedc63dc|0.01|| +read_andid|function|374bf358729fae4f +path_to_interim|function|e6d463e1b14a7d03 +add_cyclus|function|d91da45c987e0333 path_to_data|object|1451679656d9b4bf +filter_1|function|8bd8f150443f31f2 +filter_2|function|d9ad4e89bd86d097 +filter_3|function|e411a93fa52ef3d4 targets_project_dir|object|0e32c01c2f47553e +data_int1_4e0ddc1e46e67b31|branch|6e8a13ff889a53bc|3e84f23fafc28fb6|d5d3ee32aadba0fe|-1478453661||t20193.5750290974s|s480063b|480063|qs|local|vector|data_int1||0.17|| +data_int1_f69ba7312c773bff|branch|08dceb417538ac00|3e84f23fafc28fb6|22f72bcf86a84beb|1615689541||t20193.575030695s|s327112b|327112|qs|local|vector|data_int1||0.11|| +data_int1_cd7b82424432d5bb|branch|ccec1fc8c0fdcb95|3e84f23fafc28fb6|8f604f3e089ec0af|656525044||t20193.5751826437s|s32246410b|32246410|qs|local|vector|data_int1||11.62|| +data_int1_61356e5feec2697a|branch|1c7838adde3ec8eb|3e84f23fafc28fb6|3ce956f49c70e9ad|-821842562||t20193.5751926685s|s1778689b|1778689|qs|local|vector|data_int1||0.73|| +data_int1|pattern|a3a4c06c1426f3ae|3e84f23fafc28fb6||375786356||||34832274|qs|local|vector||data_int1_4e0ddc1e46e67b31*data_int1_f69ba7312c773bff*data_int1_cd7b82424432d5bb*data_int1_61356e5feec2697a|12.63|| +data_08c515db9c4b5bf7|branch|5b473f555394b3ac|4473cbe572d2b471|e01cae5a1002f163|179986700||t20193.5751936561s|s493324b|493324|qs|local|vector|data||0.02|| +data_1dcf0e1e5dd6bbeb|branch|c243f2cfd0dd5113|4473cbe572d2b471|22d9f7bc7cc04f6e|-2102977253||t20193.5751942869s|s333394b|333394|qs|local|vector|data||0|| +data_6eafae49e97811b6|branch|c26ba61794ab200f|4473cbe572d2b471|164014c64e48510a|502617006||t20193.5752398918s|s32663204b|32663204|qs|local|vector|data||0.63|| +data_be1675c01c6e7f25|branch|86a5f7a2901a58e1|4473cbe572d2b471|af3ca08d0e85075f|-719191508||t20193.5752455252s|s1783507b|1783507|qs|local|vector|data||0.21|| +data|pattern|22e73ef11e5a71a6|4473cbe572d2b471||33550471||||35273429|qs|local|vector||data_08c515db9c4b5bf7*data_1dcf0e1e5dd6bbeb*data_6eafae49e97811b6*data_be1675c01c6e7f25|0.86|| +spec_occ_map_c8c7d569dcd26c74|branch||70da23dec145062a|1256cb02cdb6bf16|302565825||t20193.5752464068s||0|qs|local|vector|spec_occ_map||0.01||Object class not recognized. diff --git a/source/pipelines/exploratory_analysis/R/expl_data.R b/source/pipelines/exploratory_analysis/R/expl_data.R index e69de29..2f5088b 100644 --- a/source/pipelines/exploratory_analysis/R/expl_data.R +++ b/source/pipelines/exploratory_analysis/R/expl_data.R @@ -0,0 +1,88 @@ +my_group_by <- function(data, cols) { + require("dplyr") + + group_by(data, pick({{ cols }})) +} + +range_comp <- function(data) { + require("dplyr") + require("tidyr") + + dataset_least_species <- data |> + group_by(.data$id_dataset) |> + summarize(n_species = n_distinct(.data$species)) |> + filter(.data$n_species == min(.data$n_species)) |> + pull(.data$id_dataset) + + species_list <- data |> + filter(.data$id_dataset == dataset_least_species) |> + select(.data$species) |> + distinct() |> + pull() + + comp_range_data <- data |> + filter(.data$species %in% species_list) |> + group_by(pick(matches("^id_"))) |> + mutate(tot_n_dist_gridcells = n_distinct(.data$mgrscode)) |> + ungroup() |> + my_group_by(c(c(.data$species, + .data$tot_n_dist_gridcells), + matches("^id_"))) |> + summarise(n_dist_gridcells = n_distinct(.data$mgrscode)) |> + ungroup() |> + mutate(percentage = .data$n_dist_gridcells / .data$tot_n_dist_gridcells) |> + pivot_wider(id_cols = c(.data$id_spat_res, + .data$species, + matches("^id_filter")), + names_from = .data$id_dataset, + values_from = c(.data$n_dist_gridcells, .data$percentage)) |> + left_join(data |> + filter(.data$id_dataset == "abv_data") |> + distinct(.data$species, .data$category), + by = join_by(.data$species)) + + return(comp_range_data) +} + +trend_comp <- function(data, time_period) { + require("dplyr") + require("tidyr") + + dataset_least_species <- data |> + group_by(.data$id_dataset) |> + summarize(n_species = n_distinct(.data$species)) |> + filter(.data$n_species == min(.data$n_species)) |> + pull(.data$id_dataset) + + species_list <- data |> + filter(.data$id_dataset == dataset_least_species) |> + select(.data$species) |> + distinct() |> + pull() + + trend_range_data <- data |> + filter(.data$species %in% species_list) |> + my_group_by(c(c(.data$species, !!sym(time_period)), matches("^id_"))) |> + summarize(occurrence = sum(n)) |> + ungroup() |> + pivot_wider(id_cols = c(.data$id_spat_res, + .data$species, + !!sym(time_period), + matches("^id_filter")), + names_from = .data$id_dataset, + values_from = .data$occurrence) |> + drop_na() |> + my_group_by(c(c(.data$species, .data$id_spat_res), + matches("^id_filter"))) |> + summarise(correlation = cor(.data$abv_data, + .data$birdflanders, + method = "pearson")) |> + ungroup() |> + left_join(data |> + filter(.data$id_dataset == "abv_data") |> + distinct(.data$species, .data$category), + by = join_by(.data$species)) |> + mutate(time_period = time_period) + + return(trend_range_data) +} diff --git a/source/pipelines/exploratory_analysis/R/read_data.R b/source/pipelines/exploratory_analysis/R/read_data.R index 1d0053e..39d5d19 100644 --- a/source/pipelines/exploratory_analysis/R/read_data.R +++ b/source/pipelines/exploratory_analysis/R/read_data.R @@ -1,3 +1,111 @@ -path_to_interim <- function(path_to_data, file) { +path_to_interim <- function(path_to_data, dataset, spat_res) { + file <- paste0(dataset, "_cube_", spat_res, ".csv") file.path(path_to_data, "interim", file) } + +read_andid <- function(data_file, dataset, spat_res) { + require("dplyr") + + data <- read.csv(data_file) + + output <- data |> + mutate(id_dataset = dataset, + id_spat_res = spat_res) + + return(output) +} + +add_cyclus <- function(data) { + require("dplyr") + + output <- data |> + mutate(cyclus = case_when( + year >= 2007 & year <= 2009 ~ 1, + year >= 2010 & year <= 2012 ~ 2, + year >= 2013 & year <= 2015 ~ 3, + year >= 2016 & year <= 2018 ~ 4, + year >= 2019 & year <= 2021 ~ 5, + year >= 2022 & year <= 2024 ~ 6 + )) + + return(output) +} + +add_category <- function(data) { + require("dplyr") + + output <- data |> + group_by(.data$species) |> + mutate(n_obs = sum(.data$n)) |> + ungroup() |> + mutate(category = cut(.data$n_obs, + breaks = c(0, 10, 100, 1000, 10000, +Inf), + labels = c("Very rare", "Rare", "Common", + "Very common", "Extremely common"), + right = FALSE)) + + return(output) +} + + +filter_1 <- function(data) { + abv_birds <- read.csv("./data/interim/abv_birds.csv") + + output <- data |> + filter(.data$species %in% abv_birds$species) + + return(output) +} + +#' Rules (loosely based on ABV): +#' 1) A square is only relevant is the species was observed in +#' more than one time period +#' 2) A minimum of three relevant squares to include the species +#' 3) A minimum of a hundred observations to include the species + +filter_2 <- function(data, time_period = "year") { + require("dplyr") + + output <- data |> + group_by(.data$mgrscode, .data$species) |> + mutate(periods = n_distinct(!!sym(time_period))) |> + ungroup() |> + filter(.data$periods > 1) |> + group_by(.data$species) |> + mutate(squares = n_distinct(.data$mgrscode)) |> + ungroup() |> + filter(.data$squares > 2) |> + group_by(.data$species) |> + mutate(obs = n()) |> + ungroup() |> + filter(.data$obs > 100) |> + mutate(id_filter_per = .data$time_period) + + return(output) +} + +filter_3 <- function(data, time_period = "year") { + require("dplyr") + + output <- data |> + group_by(.data$id_dataset, + .data$id_spat_res, + .data$species, + .data$category, + !!sym(time_period)) |> + summarise(n = sum(.data$n)) |> + ungroup() |> + group_by(!!sym(time_period)) |> + mutate(total_obs = sum(.data$n)) |> + ungroup() |> + mutate(n = .data$n / .data$total_obs) + + if ("id_filter_per" %in% colnames(data)) { + output$id_filter_per <- data$id_filter_per[1] + output$id_filter_per2 <- time_period + } else { + output$id_filter_per <- time_period + } + + return(output) +} diff --git a/source/pipelines/exploratory_analysis/_targets.R b/source/pipelines/exploratory_analysis/_targets.R index c6dd33f..a3f028d 100644 --- a/source/pipelines/exploratory_analysis/_targets.R +++ b/source/pipelines/exploratory_analysis/_targets.R @@ -4,10 +4,7 @@ library(targets) # Set target options: tar_option_set( - packages = c("rgbif", - "sf", - "dplyr" - ), + packages = c("tidyverse"), format = "qs" # Optionally set the default storage format. qs is fast. ) @@ -17,32 +14,108 @@ path_to_data <- rprojroot::find_root(rprojroot::is_git_root) |> file.path("data") tar_config_set( - script = file.path(targets_project_dir, "target_workflow", "_targets.R"), - store = file.path(targets_project_dir, "target_workflow", + script = file.path(targets_project_dir, "exploratory_analysis", "_targets.R"), + store = file.path(targets_project_dir, "exploratory_analysis", "_targets/"), config = "_targets.yaml", - project = "target_workflow", - use_crew = TRUE) + project = "exploratory_analysis", + use_crew = TRUE +) -# Run the R scripts in the R/ folder with your custom functions: -tar_source(file.path(targets_project_dir, "target_workflow", "R")) +# Run the R scripts in the R/ folder with our custom functions: +tar_source(file.path(targets_project_dir, "exploratory_analysis", "R")) -# Replace the target list below with your own: +# List of targets: list( - tarchetypes::tar_file( - abv_data_file, - path_to_interim(path_to_data = path_to_data, file = "abv_data.csv") + tar_target( + time_period, + c("year", "cyclus") + ), + tar_target( + spat_res, + c("1km", "10km") ), tar_target( - abv_data, - read.csv(abv_data_file) + dataset, + c("abv_data", "birdflanders") ), tarchetypes::tar_file( - birdcube_data_file, - path_to_interim(path_to_data = path_to_data, file = "birdcubeflanders.csv") + data_file, + path_to_interim(path_to_data = path_to_data, + dataset = dataset, + spat_res = spat_res), + pattern = cross(dataset, spat_res) + ), + tar_target( + data_int1, + read_andid(data_file, dataset, spat_res), + pattern = map(data_file, cross(dataset, spat_res)) + ), + tar_target( + data_int2, + add_cyclus(data_int1), + pattern = map(data_int1) + ), + tar_target( + data, + add_category(data_int2), + pattern = map(data_int2) + ), + tar_target( + filter1, + filter_1(data), + pattern = map(data) + ), + tar_target( + filter2, + filter_2(data, time_period), + pattern = cross(data, time_period) + ), + tar_target( + filter3, + filter_3(data, time_period), + pattern = cross(data, time_period) + ), + tar_target( + filter4, + filter_3(filter2, time_period), + pattern = cross(filter2, time_period) + ), + tar_target( + range_comp_0, + range_comp(data) + ), + tar_target( + range_comp_1, + range_comp(filter1) + ), + tar_target( + range_comp_2, + range_comp(filter2) + ), + tar_target( + trend_comp_0, + trend_comp(data, time_period), + pattern = map(time_period) + ), + tar_target( + trend_comp_1, + trend_comp(filter1, time_period), + pattern = map(time_period) + ), + tar_target( + trend_comp_2, + trend_comp(filter2, time_period), + pattern = map(time_period) + ), + tar_target( + trend_comp_3, + trend_comp(filter3, time_period), + pattern = map(time_period) ), tar_target( - birdcube_data, - read.csv(birdcube_data_file) + trend_comp_4, + trend_comp(filter4, time_period), + pattern = map(time_period) ) ) diff --git a/source/pipelines/exploratory_analysis/_targets/meta/meta b/source/pipelines/exploratory_analysis/_targets/meta/meta index 530e835..9c44e9f 100644 --- a/source/pipelines/exploratory_analysis/_targets/meta/meta +++ b/source/pipelines/exploratory_analysis/_targets/meta/meta @@ -1,36 +1,102 @@ name|type|data|command|depend|seed|path|time|size|bytes|format|repository|iteration|parent|children|seconds|warnings|error -read_cube_gbif|function|fb7318b724fcb3bf||||||||||||||| -read_abv_gbif|function|0f973c90384d7a5f||||||||||||||| -get_data|function|3341b9af0729302c||||||||||||||| -obs_richness_map_abv|stem|c674139a986d3be1|f97003405992072d|f7144d68646b34bb|544418752||t20137.5632771888s|s145198b|145198|qs|local|vector|||17.78|| -pielou_evenness_ts_abv|stem|a2e48c996ea6b50b|d315fc72cb502da2|f7144d68646b34bb|1463916421||t20140.3442884191s|s2977b|2977|qs|local|vector|||6.25|| -abv_data_file|stem|527d35ac7669ff2c|b6acb50d3d39ac90|89e666f628adf0f2|-226531251|C:/R/git_repositories/comp-unstructured-data/data/interim/abv_data.csv|t20148.7027485294s|s10444864b|10444864|file|local|vector|||0.72|| -birdcube_data_file|stem|519c1c07bacce2ea|48a9ff406ed7d0bc|89e666f628adf0f2|1618870224|C:/R/git_repositories/comp-unstructured-data/data/interim/birdcubeflanders.csv|t20148.6974539314s|s238113817b|238113817|file|local|vector|||0|| -abv_data|stem|b480912d37d3df6c|e229d1a198614366|d57bebce74690f1d|-790396526||t20148.7030385626s|s734920b|734920|qs|local|vector|||0.36|| -birdcube_data|stem|dbc57af89615930d|c219f922643095f8|fce17d6d9fbc08c0|804043826||t20148.7031682901s|s12335314b|12335314|qs|local|vector|||10.55|| -abv|stem|5f686d3ec523e2b9|f5ad0c893737fd7c|14ed9db9b45c9c17|1730016550||t20148.7031815535s|s892668b|892668|qs|local|vector|||1.07|| -birdcube|stem|525ff548687311f9|c6a94986654dd8a8|e37ed227da3f0d82|803444483||t20148.7033767325s|s19947156b|19947156|qs|local|vector|||15.87|| -spec_range_map_abv|stem|430c6c7111b831a4|35ed1551765724ea|0beb31fed06a2cff|565055131||t20148.7034706916s|s73356b|73356|qs|local|vector|||8.08|| -spec_range_ts_abv|stem|c612df97d7866008|b8ec36b075ba8929|0beb31fed06a2cff|212098582||t20148.7044819747s|s43034b|43034|qs|local|vector|||87.36|| -obs_richness_map_abv_1|stem|6a1b0136ccf8544d|f97003405992072d|0beb31fed06a2cff|-1849121129||t20148.70473719s|s145112b|145112|qs|local|vector|||22.01|| -spec_occ_ts_abv|stem|523e2ea4f5c7f53a|50e92be5970898cb|0beb31fed06a2cff|187662183||t20148.7057596212s|s50987b|50987|qs|local|vector|||88.33|| -total_occ_ts_abv|stem|a0e21bd2ab55ebc5|b8a82c3c5a65f3b7|0beb31fed06a2cff|-1571972827||t20148.7059303702s|s2927b|2927|qs|local|vector|||14.73|| -pielou_evenness_map_abv|stem|b744a6c7761236f0|a800a80fe2e65de3|0beb31fed06a2cff|10455673||t20148.705980857s|s12855b|12855|qs|local|vector|||4.34|| -obs_richness_map_abv_10|stem|f8293f6ba6236a08|c32e1c894e01b54f|0beb31fed06a2cff|-2121955916||t20148.7060292033s|s11761b|11761|qs|local|vector|||4.17|| -obs_richness_ts_abv|stem|36c2af1ea14027e3|0e1e13ab8edf2b2e|0beb31fed06a2cff|-1906844265||t20148.7060780799s|s2515b|2515|qs|local|vector|||4.22|Bootstrapped confidence intervals cannot be calculated for the chosen indicator.| -total_occ_map_abv|stem|c91cdb87695f8b61|37bf422ad4f6ba1e|0beb31fed06a2cff|2144595936||t20148.7061280773s|s12077b|12077|qs|local|vector|||4.3|| -spec_occ_map_abv|stem|74530206571cff64|1c6c3070fe27f0f9|0beb31fed06a2cff|1040382643||t20148.7061883524s|s94259b|94259|qs|local|vector|||5.18|| -spec_occ_map_cube|stem|28530d44bf976352|2923183735c9a153|a5c31b662884b2bb|-1523928707||t20148.7069280832s|s358297b|358297|qs|local|vector|||63.85|| -spec_range_map_cube|stem|976c9922669aebb5|06c24be89a7622f3|a5c31b662884b2bb|1805158784||t20148.7076471583s|s272603b|272603|qs|local|vector|||62.08|| -obs_richness_ts_cube|stem|1da223133056ce2b|52c081f2d4708827|a5c31b662884b2bb|-109269857||t20148.7083309265s|s7144b|7144|qs|local|vector|||59.06|Bootstrapped confidence intervals cannot be calculated for the chosen indicator.| -spec_range_ts_cube|stem|6c83010cb134e752|7eee1844a6ef902f|a5c31b662884b2bb|1980473897||t20148.7162639453s|s121432b|121432|qs|local|vector|||685.39|| -obs_richness_map_cube_10|stem|dee458f23e469676|e2419c1564d217a1|a5c31b662884b2bb|-677520515||t20148.7168106236s|s17281b|17281|qs|local|vector|||47.22|| -total_occ_ts_cube|stem|7e9ed41e6af6a6c6|8d72fc3bd535dd65|a5c31b662884b2bb|1560660880||t20148.7194517961s|s7573b|7573|qs|local|vector|||228.19|| -total_occ_map_cube|stem|d821b4ccf042456d|9d28fc231c116a65|a5c31b662884b2bb|301642794||t20148.7199878716s|s17723b|17723|qs|local|vector|||46.31|| -pielou_evenness_map_cube|stem|ad77e6d4da619406|86405f793fb7b14a|a5c31b662884b2bb|1936146881||t20148.7205271543s|s18484b|18484|qs|local|vector|||46.58|| -pielou_evenness_ts_cube|stem|50bd767cd851c70d|fab57ea502389048|a5c31b662884b2bb|163671844||t20148.7210688536s|s7647b|7647|qs|local|vector|||46.8|| -spec_occ_ts_cube|stem|068cd37666dc5abc|7a477387f00a1819|a5c31b662884b2bb|-1997763936||t20148.727323413s|s148416b|148416|qs|local|vector|||540.39|| -path_to_interim|function|36603da5829a6104 +time_period|stem|97f9982511938ea5|b115c63f54fe6958|2c530c1562a7fbd1|857672053||t20179.4246139028s|s94b|94|qs|local|vector||time_period_dab20f8537398031*time_period_b36c363a5c648bd4|0|| +spat_res|stem|d4acb59215e478b6|43ed5bdf25dcedef|2c530c1562a7fbd1|-1448122772||t20188.5427551964s|s91b|91|qs|local|vector||spat_res_1a0154cae0fe4fc9*spat_res_c0bfdc4196f8ca59|0|| +dataset|stem|9819a9bb24ce2e15|1f039bd635d5bfde|2c530c1562a7fbd1|728897448||t20188.5791002638s|s104b|104|qs|local|vector||dataset_2b42df6066ba2e2b*dataset_df353d93eae6ce18|0.02|| +data_int1_4e0ddc1e46e67b31|branch|6e8a13ff889a53bc|3e84f23fafc28fb6|d5d3ee32aadba0fe|-1478453661||t20189.4866942468s|s480063b|480063|qs|local|vector|data_int1||0.3|| +data_int2_08c515db9c4b5bf7|branch|5b473f555394b3ac|4473cbe572d2b471|e01cae5a1002f163|-2102858047||t20189.4867903977s|s493324b|493324|qs|local|vector|data_int2||0.01|| +data_bfcf276d75ccbb00|branch|2f1922a125ae64dc|eef692ddc9f35da8|f22860f4a417fa34|-1482191240||t20189.4868216211s|s601166b|601166|qs|local|vector|data||0.05|| +filter1_42bfd2a7ebc44dbb|branch|7d458b3c55b6c6e8|cca699f3c9647249|ffd17838c1b18f11|1041230766||t20189.5098692486s|s560911b|560911|qs|local|vector|filter1||0.04|| +filter2_f1f74701e3e5f85a|branch|08c0025986e5ead4|684559232c0eb4d3|755268d6fc79a85e|-2082714772||t20189.5099023814s|s545370b|545370|qs|local|vector|filter2||0.63|| +filter2_6cd7fbfdce023604|branch|b9b0d67792be4048|684559232c0eb4d3|755268d6fc79a85e|701116656||t20189.5099105298s|s556886b|556886|qs|local|vector|filter2||0.61|| +data_file_33f5d8f414e50ce0|branch|8bbc194aa2f9e7fa|143e47c8d42a9cd8|af6f77ec7cd79831|-612687626|C:/R/git_repositories/comp-unstructured-data/data/interim/abv_data_cube_1km.csv|t20193.5193709323s|s5910704b|5910704|file|local|vector|data_file||0|| +data_file_b13120dd7d93ccbf|branch|2f2d990f10e4b04a|143e47c8d42a9cd8|af6f77ec7cd79831|-1611092828|C:/R/git_repositories/comp-unstructured-data/data/interim/abv_data_cube_10km.csv|t20193.5416756165s|s3911488b|3911488|file|local|vector|data_file||0|| +data_file_c84d6875bc9545f8|branch|1996c8274c871f7c|143e47c8d42a9cd8|af6f77ec7cd79831|522324761|C:/R/git_repositories/comp-unstructured-data/data/interim/birdflanders_cube_1km.csv|t20193.5193918844s|s342934864b|342934864|file|local|vector|data_file||0|| +data_file_0c364efbdedc63dc|branch|03e51fbe9de5dcd3|143e47c8d42a9cd8|af6f77ec7cd79831|-397872104|C:/R/git_repositories/comp-unstructured-data/data/interim/birdflanders_cube_10km.csv|t20193.5417375101s|s25258044b|25258044|file|local|vector|data_file||0|| +data_file|pattern|87261ee9caef7b44|143e47c8d42a9cd8||-679625259||||372104396|file|local|vector||data_file_33f5d8f414e50ce0*data_file_b13120dd7d93ccbf*data_file_c84d6875bc9545f8*data_file_0c364efbdedc63dc|0|| +data_int1_f69ba7312c773bff|branch|08dceb417538ac00|3e84f23fafc28fb6|22f72bcf86a84beb|1615689541||t20193.5547220617s|s327112b|327112|qs|local|vector|data_int1||0.11|| +data_int1_cd7b82424432d5bb|branch|ccec1fc8c0fdcb95|3e84f23fafc28fb6|8f604f3e089ec0af|656525044||t20193.5548739945s|s32246410b|32246410|qs|local|vector|data_int1||11.61|| +data_int1_61356e5feec2697a|branch|1c7838adde3ec8eb|3e84f23fafc28fb6|3ce956f49c70e9ad|-821842562||t20193.5548838954s|s1778689b|1778689|qs|local|vector|data_int1||0.74|| +data_int1|pattern|a3a4c06c1426f3ae|3e84f23fafc28fb6||375786356||||34352211|qs|local|vector||data_int1_4e0ddc1e46e67b31*data_int1_f69ba7312c773bff*data_int1_cd7b82424432d5bb*data_int1_61356e5feec2697a|12.46|| +data_int2_1dcf0e1e5dd6bbeb|branch|c243f2cfd0dd5113|4473cbe572d2b471|22d9f7bc7cc04f6e|-497335474||t20193.5548846601s|s333394b|333394|qs|local|vector|data_int2||0.01|| +data_int2_6eafae49e97811b6|branch|c26ba61794ab200f|4473cbe572d2b471|164014c64e48510a|933664862||t20193.554929042s|s32663204b|32663204|qs|local|vector|data_int2||0.59|| +data_int2_be1675c01c6e7f25|branch|86a5f7a2901a58e1|4473cbe572d2b471|af3ca08d0e85075f|-471024916||t20193.554934077s|s1783507b|1783507|qs|local|vector|data_int2||0.03|| +data_int2|pattern|2823efff8f4b7dbd|4473cbe572d2b471||489839562||||34780105|qs|local|vector||data_int2_08c515db9c4b5bf7*data_int2_1dcf0e1e5dd6bbeb*data_int2_6eafae49e97811b6*data_int2_be1675c01c6e7f25|0.63|| +data_ce6caa27ccc901c8|branch|02eb40f337a5e1f6|eef692ddc9f35da8|de38078c5b3c5c68|753906313||t20193.5549347994s|s407464b|407464|qs|local|vector|data||0.02|| +data_10fa602920492c3f|branch|d88c04e314ef7531|eef692ddc9f35da8|c50f85e8fc9178c8|-90071810||t20193.5549770508s|s37950368b|37950368|qs|local|vector|data||0.43|| +data_00692a5475eb6791|branch|6cb100a4143f5c44|eef692ddc9f35da8|d055eb3760aa3557|-1750430802||t20193.5549803908s|s2095572b|2095572|qs|local|vector|data||0.02|| +data|pattern|3b8fc72431e155d6|eef692ddc9f35da8||33550471||||40453404|qs|local|vector||data_bfcf276d75ccbb00*data_ce6caa27ccc901c8*data_10fa602920492c3f*data_00692a5475eb6791|0.47|| +range_comp_0|stem|db39ef384c7bb655|125f5787a253f428|e852fab0b5bade9f|1263720519||t20193.5550318712s|s8083b|8083|qs|local|vector|||1.91|| +filter1_19fe388d423215dd|branch|67e91e5cff653215|cca699f3c9647249|40442e78fcae61f3|-1390461926||t20193.5550325823s|s369534b|369534|qs|local|vector|filter1||0|| +filter1_7d983590cfa805b1|branch|e7f3e165996ef028|cca699f3c9647249|e85a0963edcac9e2|14026918||t20193.5550753455s|s25174505b|25174505|qs|local|vector|filter1||0.59|| +filter1_ea17909750229f50|branch|5026b56c71e74aa5|cca699f3c9647249|ef047897ee5200e1|-2195098||t20193.5550782893s|s804387b|804387|qs|local|vector|filter1||0.03|| +filter1|pattern|566938ea43804523|cca699f3c9647249||2019627227||||26348426|qs|local|vector||filter1_42bfd2a7ebc44dbb*filter1_19fe388d423215dd*filter1_7d983590cfa805b1*filter1_ea17909750229f50|0.62|| +filter2_27b7b1b2c2027995|branch|1a1b6aebc71ed3e7|684559232c0eb4d3|f25cf173b81632bb|-1549267551||t20193.5550819936s|s449218b|449218|qs|local|vector|filter2||0.25|| +filter2_524fe83e402ece3e|branch|e5a6ebca51b410b8|684559232c0eb4d3|f25cf173b81632bb|-780957488||t20193.5550851967s|s435588b|435588|qs|local|vector|filter2||0.2|| +filter2_ca364804d5cde2cb|branch|1cda845c2b1d58b2|684559232c0eb4d3|ce633f885eb33061|857011199||t20193.5553439001s|s44835468b|44835468|qs|local|vector|filter2||18.81|| +filter2_0c571300c9872792|branch|4999e45904dc6bec|684559232c0eb4d3|ce633f885eb33061|662669663||t20193.5556131914s|s41424020b|41424020|qs|local|vector|filter2||19.61|| +filter2_3a1d0f94e1b7d815|branch|bed73b51f62c7e83|684559232c0eb4d3|34110d7adb1456f5|-1927638439||t20193.5556248003s|s2376579b|2376579|qs|local|vector|filter2||0.72|| +filter2_6f29943e8c757d6c|branch|8ddff9b99a96789c|684559232c0eb4d3|34110d7adb1456f5|1395566489||t20193.555636304s|s2298640b|2298640|qs|local|vector|filter2||0.73|| +filter2|pattern|4ffe20fafc6913b6|684559232c0eb4d3||191120004||||91819513|qs|local|vector||filter2_f1f74701e3e5f85a*filter2_6cd7fbfdce023604*filter2_27b7b1b2c2027995*filter2_524fe83e402ece3e*filter2_ca364804d5cde2cb*filter2_0c571300c9872792*filter2_3a1d0f94e1b7d815*filter2_6f29943e8c757d6c|40.32|| +filter3|pattern|0663ec58417d59d4|401d98c529a52799||-1314917285||||177401|qs|local|vector||filter3_f1f74701e3e5f85a*filter3_6cd7fbfdce023604*filter3_27b7b1b2c2027995*filter3_524fe83e402ece3e*filter3_ca364804d5cde2cb*filter3_0c571300c9872792*filter3_3a1d0f94e1b7d815*filter3_6f29943e8c757d6c|0.88|| +trend_comp_0_7e52083f5afe5d32|branch|f1c579da1f9c7378|10af96594c220449|18dd8028e80757f0|-1322497073||t20193.5557363451s|s5741b|5741|qs|local|vector|trend_comp_0||1.6|There were 22 warnings in summarise.The first warning wasℹ In argument correlation corabv_data, birdflanders, method pearson.ℹ In group 33 species Anser albifrons id_spat_res 10km.Caused by warning in cor the standard deviation is zeroℹ Run dplyrlast_dplyr_warnings to see the 21 remaining warnings.| +trend_comp_0_2d015e7bca5c3c76|branch|6262322f5d472bfc|10af96594c220449|18dd8028e80757f0|-880916140||t20193.5557791223s|s5777b|5777|qs|local|vector|trend_comp_0||1.33|There were 18 warnings in summarise.The first warning wasℹ In argument correlation corabv_data, birdflanders, method pearson.ℹ In group 33 species Anser albifrons id_spat_res 10km.Caused by warning in cor the standard deviation is zeroℹ Run dplyrlast_dplyr_warnings to see the 17 remaining warnings.| +trend_comp_0|pattern|0d80f249771af7e6|10af96594c220449||-2028922957||||11518|qs|local|vector||trend_comp_0_7e52083f5afe5d32*trend_comp_0_2d015e7bca5c3c76|2.93|| +range_comp_1|stem|a8c58d7e86fab6cd|f536ed47d27f11cb|f21dc4257d138d14|1899932574||t20193.5558182598s|s4728b|4728|qs|local|vector|||1.56|| +trend_comp_1_7e52083f5afe5d32|branch|a90fccbc267b82e5|8a00db3c12a850b6|ecc1a46c2f97037c|-1970434452||t20193.5558530262s|s3439b|3439|qs|local|vector|trend_comp_1||1.01|| +trend_comp_1_2d015e7bca5c3c76|branch|3282e6368693dc8c|8a00db3c12a850b6|ecc1a46c2f97037c|-268899675||t20193.5558857954s|s3431b|3431|qs|local|vector|trend_comp_1||1.22|| +trend_comp_1|pattern|feefaeb7b07ffdc3|8a00db3c12a850b6||259795514||||6870|qs|local|vector||trend_comp_1_7e52083f5afe5d32*trend_comp_1_2d015e7bca5c3c76|2.23|| +range_comp_2|stem|a2f85d4dc007de4a|a9504f8677ce4ac4|abf1f704feadce5d|-1679452881||t20193.5559836434s|s7007b|7007|qs|local|vector|||3.33|| +trend_comp_2_7e52083f5afe5d32|branch|fb219841fb8edc26|34775b72a938fc27|b2774768093c8b2e|-683159121||t20193.55606449s|s4757b|4757|qs|local|vector|trend_comp_2||1.84|| +trend_comp_2_2d015e7bca5c3c76|branch|eda9042c3cf96647|34775b72a938fc27|b2774768093c8b2e|-489654775||t20193.5561479438s|s4745b|4745|qs|local|vector|trend_comp_2||2.06|| +trend_comp_2|pattern|e7faef4f3db62f4a|34775b72a938fc27||699221411||||9502|qs|local|vector||trend_comp_2_7e52083f5afe5d32*trend_comp_2_2d015e7bca5c3c76|3.9|| +trend_comp_3_7e52083f5afe5d32|branch|a27c6deb6a430a20|b529d541e210bca2|1aef5dfad8209646|-349869486||t20193.5561500788s|s5877b|5877|qs|local|vector|trend_comp_3||0.16|| +trend_comp_3_2d015e7bca5c3c76|branch|402e6ac9ac86a355|b529d541e210bca2|1aef5dfad8209646|1055399620||t20193.5561519361s|s5852b|5852|qs|local|vector|trend_comp_3||0.12|| +trend_comp_3|pattern|7f1435cd435001cb|b529d541e210bca2||-1621708013||||11729|qs|local|vector||trend_comp_3_7e52083f5afe5d32*trend_comp_3_2d015e7bca5c3c76|0.28|| +filter4_f1f74701e3e5f85a|branch|e0f5013c4ac50975|49624bd67afef38f|60a0de4c9fb6e3d5|568476268||t20195.4027077903s|s75018b|75018|qs|local|vector|filter4||1.08|| +filter4_6cd7fbfdce023604|branch|95d5fa99f88d4c01|49624bd67afef38f|60a0de4c9fb6e3d5|103138254||t20195.4028086588s|s33785b|33785|qs|local|vector|filter4||1.18|| +filter4_27b7b1b2c2027995|branch|e0f5013c4ac50975|49624bd67afef38f|7720890a721181f0|450728687||t20195.402907541s|s75018b|75018|qs|local|vector|filter4||1.14|| +filter4_524fe83e402ece3e|branch|95d5fa99f88d4c01|49624bd67afef38f|7720890a721181f0|-1088646393||t20195.4030082501s|s33785b|33785|qs|local|vector|filter4||1.61|| +filter4_ca364804d5cde2cb|branch|e0f5013c4ac50975|49624bd67afef38f|2116766cb7da4412|448216610||t20195.4031578814s|s75018b|75018|qs|local|vector|filter4||1.81|| +filter4_0c571300c9872792|branch|95d5fa99f88d4c01|49624bd67afef38f|2116766cb7da4412|349050165||t20195.4033016174s|s33785b|33785|qs|local|vector|filter4||2.24|| +filter4_3a1d0f94e1b7d815|branch|e0f5013c4ac50975|49624bd67afef38f|3410df7e63ba1fcc|1197576747||t20195.403422569s|s75018b|75018|qs|local|vector|filter4||1.08|| +filter4_6f29943e8c757d6c|branch|95d5fa99f88d4c01|49624bd67afef38f|3410df7e63ba1fcc|805520881||t20195.4035760898s|s33785b|33785|qs|local|vector|filter4||1.33|| +filter3_f1f74701e3e5f85a|branch|1358bb0af5e550ed|401d98c529a52799|68982e66b5b33783|349156039||t20195.4286404199s|s15501b|15501|qs|local|vector|filter3||0.04|| +filter3_6cd7fbfdce023604|branch|4fa42df0c3ab9833|401d98c529a52799|68982e66b5b33783|-182404387||t20195.4286413507s|s8979b|8979|qs|local|vector|filter3||0.05|| +filter3_27b7b1b2c2027995|branch|3f6c0ff7a5f98fa8|401d98c529a52799|85ef351bd1483558|1194421557||t20195.4286420835s|s15509b|15509|qs|local|vector|filter3||0.05|| +filter3_524fe83e402ece3e|branch|11aef56adb7b8625|401d98c529a52799|85ef351bd1483558|-1223537644||t20195.4286427862s|s8980b|8980|qs|local|vector|filter3||0.03|| +filter3_ca364804d5cde2cb|branch|a0500ffec14e3917|401d98c529a52799|6b277ecad52cbec9|569029593||t20195.4286730665s|s48809b|48809|qs|local|vector|filter3||0.5|| +filter3_0c571300c9872792|branch|8ce9d5215e01a149|401d98c529a52799|6b277ecad52cbec9|1742703209||t20195.4286987808s|s27624b|27624|qs|local|vector|filter3||0.47|| +filter3_3a1d0f94e1b7d815|branch|fa90602b7f6f3000|401d98c529a52799|e6bc9be500c7153a|-113780780||t20195.4287008758s|s48808b|48808|qs|local|vector|filter3||0.04|| +filter3_6f29943e8c757d6c|branch|a11c550918ff6c14|401d98c529a52799|e6bc9be500c7153a|2097128119||t20195.4287035102s|s27671b|27671|qs|local|vector|filter3||0.1|| +filter4_f0d02e9b16c4bd32|branch|f60f14c2abd38b99|49624bd67afef38f|f7478009790fa65d|-2089822089||t20195.428704365s|s10774b|10774|qs|local|vector|filter4||0.04|| +filter4_3440dcc77181d695|branch|3dae13acdbac53bd|49624bd67afef38f|f7478009790fa65d|1382556661||t20195.4287052734s|s5613b|5613|qs|local|vector|filter4||0.04|| +filter4_d650a3c4232e3ef9|branch|75df1d75870e9072|49624bd67afef38f|af91bd3c112a4e36|-1824333314||t20195.4287060793s|s10632b|10632|qs|local|vector|filter4||0.03|| +filter4_2c24d7827c724c06|branch|abad4be9e526afe7|49624bd67afef38f|af91bd3c112a4e36|1653105033||t20195.4287068692s|s5548b|5548|qs|local|vector|filter4||0.03|| +filter4_019548f7428fee3b|branch|49bbf2c47189e26d|49624bd67afef38f|380a86934af5936f|-216156951||t20195.4287079189s|s10920b|10920|qs|local|vector|filter4||0.05|| +filter4_d57128e6e9c99248|branch|32a05e094e3e8cb5|49624bd67afef38f|380a86934af5936f|-1590559766||t20195.4287086066s|s5768b|5768|qs|local|vector|filter4||0.03|| +filter4_2e59362b044047b0|branch|5a2e971574e12b3c|49624bd67afef38f|74e7380b5c4e0fac|-2124559314||t20195.4287094092s|s10981b|10981|qs|local|vector|filter4||0.03|| +filter4_391c0ff13089983d|branch|66cab329529a5d18|49624bd67afef38f|74e7380b5c4e0fac|-1321859629||t20195.4287100219s|s5687b|5687|qs|local|vector|filter4||0.02|| +filter4_527e9a9da74dad47|branch|338eddbc2a3750a2|49624bd67afef38f|d712c018f0c1b75d|-251262667||t20195.4287364737s|s33827b|33827|qs|local|vector|filter4||0.39|| +filter4_67d9483d5225d669|branch|c1403a0f2b454ebc|49624bd67afef38f|d712c018f0c1b75d|-1291609541||t20195.4287632931s|s16920b|16920|qs|local|vector|filter4||0.45|| +filter4_16bc95a6d3024bc1|branch|823d66f7634b5c16|49624bd67afef38f|a0edccc729511220|-932121279||t20195.4287886754s|s32992b|32992|qs|local|vector|filter4||0.39|| +filter4_a6d4bd59d16ede4d|branch|937bafc63efdb1ab|49624bd67afef38f|a0edccc729511220|28672390||t20195.4288149546s|s16501b|16501|qs|local|vector|filter4||0.41|| +filter4_8d6ba41b14d4a1a7|branch|cb43b838731e8acd|49624bd67afef38f|69b12369841f5a36|1299343190||t20195.4288171701s|s33497b|33497|qs|local|vector|filter4||0.05|| +filter4_cadc247eb088c912|branch|1926f85a1c87a941|49624bd67afef38f|69b12369841f5a36|1285228038||t20195.428819752s|s16472b|16472|qs|local|vector|filter4||0.03|| +filter4_ffb974910136e0ef|branch|70a408918e1ba9d7|49624bd67afef38f|b86d6dfa5e30b7a8|748927140||t20195.428822151s|s33171b|33171|qs|local|vector|filter4||0.05|| +filter4_673a500abeb8ddbb|branch|bc372a4eb380cdb4|49624bd67afef38f|b86d6dfa5e30b7a8|1739950318||t20195.4288245721s|s16395b|16395|qs|local|vector|filter4||0.06|| +filter4|pattern|b2d927092982d5eb|49624bd67afef38f||-66882420||||265698|qs|local|vector||filter4_f0d02e9b16c4bd32*filter4_3440dcc77181d695*filter4_d650a3c4232e3ef9*filter4_2c24d7827c724c06*filter4_019548f7428fee3b*filter4_d57128e6e9c99248*filter4_2e59362b044047b0*filter4_391c0ff13089983d*filter4_527e9a9da74dad47*filter4_67d9483d5225d669*filter4_16bc95a6d3024bc1*filter4_a6d4bd59d16ede4d*filter4_8d6ba41b14d4a1a7*filter4_cadc247eb088c912*filter4_ffb974910136e0ef*filter4_673a500abeb8ddbb|2.1|| +read_andid|function|374bf358729fae4f +my_group_by|function|4c7c5bba22fac320 +path_to_interim|function|e6d463e1b14a7d03 +add_category|function|962b21f257a9d25b +add_cyclus|function|d91da45c987e0333 path_to_data|object|1451679656d9b4bf +filter_1|function|8bd8f150443f31f2 +filter_2|function|d9ad4e89bd86d097 +filter_3|function|901ef5fd12aca534 targets_project_dir|object|0e32c01c2f47553e -obs_richness_map_cube_1|stem|89d23ee87b4a659e|66f1d7db7aad6b78|a5c31b662884b2bb|-1405350137||t20148.7771409072s|s184803b|184803|qs|local|vector|||64.83|| +trend_comp|function|313e67883dc6bbe5 +range_comp|function|78224f499968dff4 +trend_comp_4_7e52083f5afe5d32|branch|99326a84752ee45c|8b7924d4815ed03e|33b32d4f2fa00ea0|-876578913||t20195.4311537544s|s4792b|4792|qs|local|vector|trend_comp_4||0.14|| +trend_comp_4_2d015e7bca5c3c76|branch|d88b337a9b2f0fcf|8b7924d4815ed03e|33b32d4f2fa00ea0|-1448276416||t20195.431155014s|s4796b|4796|qs|local|vector|trend_comp_4||0.07|| +trend_comp_4|pattern|9a888fdfe162761a|8b7924d4815ed03e||1403839019||||9588|qs|local|vector||trend_comp_4_7e52083f5afe5d32*trend_comp_4_2d015e7bca5c3c76|0.21|| diff --git a/source/pipelines/exploratory_analysis/run_pipeline.R b/source/pipelines/exploratory_analysis/run_pipeline.R index a788f12..3a84c1d 100644 --- a/source/pipelines/exploratory_analysis/run_pipeline.R +++ b/source/pipelines/exploratory_analysis/run_pipeline.R @@ -1,17 +1,38 @@ # run the pipeline - library(targets) -Sys.setenv(TAR_PROJECT = "biodiversity_indicators") +Sys.setenv(TAR_PROJECT = "exploratory_analysis") tar_make() + # inspect pipeline tar_prune() -tar_visnetwork() +tar_visnetwork(targets_only = TRUE) meta <- tar_meta() + # debug pipeline +# R console +library(targets) + +tar_read() + +# Restart your R session. +rstudioapi::restartSession() + +# Loads globals like tar_option_set() packages, simulate_data(), and +# analyze_data(): +tar_load_globals() + +# Load the data that the target depends on. +tar_load(dataset1) + +# Run the command of the errored target. +analyze_data(dataset1) +#> Error in na.fail.default(list(measurement = c(1L, 2L, 3L, 4L, 1L, 2L, : +#> missing values in object + abv <- tar_read(abv) diff --git a/source/reports/comparing_biodiv_indicators/comp_indicators_target.Rmd b/source/reports/comparing_biodiv_indicators/comp_indicators_target.Rmd index 8cea40b..e90c451 100644 --- a/source/reports/comparing_biodiv_indicators/comp_indicators_target.Rmd +++ b/source/reports/comparing_biodiv_indicators/comp_indicators_target.Rmd @@ -1,7 +1,7 @@ --- title: "Compare biodiversity indicators from the b3gbi package" author: "Emma Cartuyvels, Ward Langeraert, Toon Van Daele" -date: +date: "`r Sys.Date()`" output: html_document: code_folding: hide @@ -12,7 +12,7 @@ output: # Introduction -In this document we compare a selection of B-cubed biodiversity indicators between actual cube data and data from structured monitoring. For more information on both datasets check to ... document. +In this document we compare a selection of B-cubed biodiversity indicators between actual cube data and data from structured monitoring. For more information on both datasets check the *Exploratory analysis from targets pipeline* html. ```{r setup, include = FALSE} library(knitr) @@ -32,55 +32,67 @@ library(ggplot2) ### Map ```{r total occurrences map, fig.show="hold", out.width="50%"} -total_occ_map_birdcube <- tar_read(total_occ_map_cube) -plot(total_occ_map_birdcube) + - labs(title = "Birdcube Flanders") - total_occ_map_abv <- tar_read(total_occ_map_abv) -plot(total_occ_map_abv) + - labs(title = "ABV") +p <- plot(total_occ_map_abv) +p + labs(title = paste0(p$labels$title, " - ABV")) + +total_occ_map_birdcube <- tar_read(total_occ_map_cube) +p <- plot(total_occ_map_birdcube) +p + labs(title = paste0(p$labels$title, " - Cube")) ``` +What's interesting is that there are clearly certain areas in the cube that have much more occurrences than others. The ones on the coast are probably due to the gull data, the ones near and under Antwerp will need to be investigated further. + ### Time series ```{r total occurrences time series, fig.show="hold", out.width="50%"} -total_occ_ts_birdcube <- tar_read(total_occ_ts_cube) -plot(total_occ_ts_birdcube) - total_occ_ts_abv <- tar_read(total_occ_ts_abv) -plot(total_occ_ts_abv) +p <- plot(total_occ_ts_abv) +p + labs(title = paste0(p$labels$title, " - ABV")) + +total_occ_ts_birdcube <- tar_read(total_occ_ts_cube) +p <- plot(total_occ_ts_birdcube) +p + labs(title = paste0(p$labels$title, " - Cube")) ``` +Both data sets show big differences in the number of observations over the years, this will probably impact other indicators if this is not corrected for. + ## Observed richness ### Map 1 km² ```{robserved richness map 1km, fig.show="hold", out.width="50%"} obs_richness_map_abv_1 <- tar_read(obs_richness_map_abv_1) -plot(obs_richness_map_abv_1) +p <- plot(obs_richness_map_abv_1) +p + labs(title = paste0(p$labels$title, " - ABV")) obs_richness_map_cube_1 <- tar_read(obs_richness_map_cube_1) -plot(obs_richness_map_cube_1) +p <- plot(obs_richness_map_cube_1) +p + labs(title = paste0(p$labels$title, " - Cube")) ``` ### Map 10 km² ```{r observed richness map, fig.show="hold", out.width="50%"} obs_richness_map_abv_10 <- tar_read(obs_richness_map_abv_10) -plot(obs_richness_map_abv_10) +p <- plot(obs_richness_map_abv_10) +p + labs(title = paste0(p$labels$title, " - ABV")) obs_richness_map_cube_10 <- tar_read(obs_richness_map_cube_10) -plot(obs_richness_map_cube_10) +p <- plot(obs_richness_map_cube_10) +p + labs(title = paste0(p$labels$title, " - Cube")) ``` ### Time series ```{r observed richness time series, fig.show="hold", out.width="50%"} obs_richness_ts_abv <- tar_read(obs_richness_ts_abv) -plot(obs_richness_ts_abv) +p <- plot(obs_richness_ts_abv) +p + labs(title = paste0(p$labels$title, " - ABV")) obs_richness_ts_cube <- tar_read(obs_richness_ts_cube) -plot(obs_richness_ts_cube) +p <- plot(obs_richness_ts_cube) +p + labs(title = paste0(p$labels$title, " - Cube")) ``` ## Pielou evenness @@ -89,20 +101,24 @@ plot(obs_richness_ts_cube) ```{r pielou evenness map, fig.show="hold", out.width="50%"} pielou_evenness_map_abv <- tar_read(pielou_evenness_map_abv) -plot(pielou_evenness_map_abv) +p <- plot(pielou_evenness_map_abv) +p + labs(title = paste0(p$labels$title, " - ABV")) pielou_evenness_map_cube <- tar_read(pielou_evenness_map_cube) -plot(pielou_evenness_map_cube) +p <- plot(pielou_evenness_map_cube) +p + labs(title = paste0(p$labels$title, " - Cube")) ``` ### Time series ```{r pielou evenness time series, fig.show="hold", out.width="50%"} pielou_evenness_ts_abv <- tar_read(pielou_evenness_ts_abv) -plot(pielou_evenness_ts_abv) +p <- plot(pielou_evenness_ts_abv) +p + labs(title = paste0(p$labels$title, " - ABV")) pielou_evenness_ts_cube <- tar_read(pielou_evenness_ts_cube) -plot(pielou_evenness_ts_cube) +p <- plot(pielou_evenness_ts_cube) +p + labs(title = paste0(p$labels$title, " - Cube")) ``` # Species-specific indicators {.tabset} @@ -121,20 +137,24 @@ We chose three species to compare the indicators between the two data sets: ```{r species occurrences map cettis warbler, fig.show="hold", out.width="50%"} spec_occ_map_abv <- tar_read(spec_occ_map_abv) -plot(spec_occ_map_abv, species = "Cettia cetti") +p <- plot(spec_occ_map_abv, species = "Cettia cetti") +p spec_occ_map_cube <- tar_read(spec_occ_map_cube) -plot(spec_occ_map_cube, species = "Cettia cetti") +p <- plot(spec_occ_map_cube, species = "Cettia cetti") +p ``` #### Time series ```{r species occurrences time series cettis warbler, fig.show="hold", out.width="50%"} spec_occ_ts_abv <- tar_read(spec_occ_ts_abv) -plot(spec_occ_ts_abv, species = "Cettia cetti") +p <- plot(spec_occ_ts_abv, species = "Cettia cetti") +p spec_occ_ts_cube <- tar_read(spec_occ_ts_cube) -plot(spec_occ_ts_cube, species = "Cettia cetti") +p <- plot(spec_occ_ts_cube, species = "Cettia cetti") +p ``` ### Species range @@ -143,20 +163,24 @@ plot(spec_occ_ts_cube, species = "Cettia cetti") ```{r species range map cettis warbler, fig.show="hold", out.width="50%"} spec_range_map_abv <- tar_read(spec_range_map_abv) -plot(spec_range_map_abv, species = "Cettia cetti") +p <- plot(spec_range_map_abv, species = "Cettia cetti") +p spec_range_map_cube <- tar_read(spec_range_map_cube) -plot(spec_range_map_cube, species = "Cettia cetti") +p <- plot(spec_range_map_cube, species = "Cettia cetti") +p ``` #### Time series ```{r species range time series cettis warbler, fig.show="hold", out.width="50%"} spec_range_ts_abv <- tar_read(spec_range_ts_abv) -plot(spec_range_ts_abv, species = "Cettia cetti") +p <- plot(spec_range_ts_abv, species = "Cettia cetti") +p spec_range_ts_cube <- tar_read(spec_range_ts_cube) -plot(spec_range_ts_cube, species = "Cettia cetti") +p <- plot(spec_range_ts_cube, species = "Cettia cetti") +p ``` ## Eurasian tree sparrow @@ -167,20 +191,24 @@ plot(spec_range_ts_cube, species = "Cettia cetti") ```{r species occurrences map Eurasian tree sparrow, fig.show="hold", out.width="50%"} spec_occ_map_abv <- tar_read(spec_occ_map_abv) -plot(spec_occ_map_abv, species = "Passer montanus") +p <- plot(spec_occ_map_abv, species = "Passer montanus") +p spec_occ_map_cube <- tar_read(spec_occ_map_cube) -plot(spec_occ_map_cube, species = "Passer montanus") +p <- plot(spec_occ_map_cube, species = "Passer montanus") +p ``` #### Time series ```{r species occurrences time series Eurasian tree sparrow, fig.show="hold", out.width="50%"} spec_occ_ts_abv <- tar_read(spec_occ_ts_abv) -plot(spec_occ_ts_abv, species = "Passer montanus") +p <- plot(spec_occ_ts_abv, species = "Passer montanus") +p spec_occ_ts_cube <- tar_read(spec_occ_ts_cube) -plot(spec_occ_ts_cube, species = "Passer montanus") +p <- plot(spec_occ_ts_cube, species = "Passer montanus") +p ``` ### Species range @@ -189,20 +217,24 @@ plot(spec_occ_ts_cube, species = "Passer montanus") ```{r species range map Eurasian tree sparrow, fig.show="hold", out.width="50%"} spec_range_map_abv <- tar_read(spec_range_map_abv) -plot(spec_range_map_abv, species = "Passer montanus") +p <- plot(spec_range_map_abv, species = "Passer montanus") +p spec_range_map_cube <- tar_read(spec_range_map_cube) -plot(spec_range_map_cube, species = "Passer montanus") +p <- plot(spec_range_map_cube, species = "Passer montanus") +p ``` #### Time series ```{r species range time series Eurasian tree sparrow, fig.show="hold", out.width="50%"} spec_range_ts_abv <- tar_read(spec_range_ts_abv) -plot(spec_range_ts_abv, species = "Passer montanus") +p <- plot(spec_range_ts_abv, species = "Passer montanus") +p spec_range_ts_cube <- tar_read(spec_range_ts_cube) -plot(spec_range_ts_cube, species = "Passer montanus") +p <- plot(spec_range_ts_cube, species = "Passer montanus") +p ``` ## Common nightingale @@ -213,20 +245,24 @@ plot(spec_range_ts_cube, species = "Passer montanus") ```{r species occurrences map Common nightingale, fig.show="hold", out.width="50%"} spec_occ_map_abv <- tar_read(spec_occ_map_abv) -plot(spec_occ_map_abv, species = "Luscinia megarhynchos") +p <- plot(spec_occ_map_abv, species = "Luscinia megarhynchos") +p spec_occ_map_cube <- tar_read(spec_occ_map_cube) -plot(spec_occ_map_cube, species = "Luscinia megarhynchos") +p <- plot(spec_occ_map_cube, species = "Luscinia megarhynchos") +p ``` #### Time series ```{r species occurrences time series Common nightingale, fig.show="hold", out.width="50%"} spec_occ_ts_abv <- tar_read(spec_occ_ts_abv) -plot(spec_occ_ts_abv, species = "Luscinia megarhynchos") +p <- plot(spec_occ_ts_abv, species = "Luscinia megarhynchos") +p spec_occ_ts_cube <- tar_read(spec_occ_ts_cube) -plot(spec_occ_ts_cube, species = "Luscinia megarhynchos") +p <- plot(spec_occ_ts_cube, species = "Luscinia megarhynchos") +p ``` ### Species range @@ -235,19 +271,23 @@ plot(spec_occ_ts_cube, species = "Luscinia megarhynchos") ```{r species range map Common nightingale, fig.show="hold", out.width="50%"} spec_range_map_abv <- tar_read(spec_range_map_abv) -plot(spec_range_map_abv, species = "Luscinia megarhynchos") +p <- plot(spec_range_map_abv, species = "Luscinia megarhynchos") +p spec_range_map_cube <- tar_read(spec_range_map_cube) -plot(spec_range_map_cube, species = "Luscinia megarhynchos") +p <- plot(spec_range_map_cube, species = "Luscinia megarhynchos") +p ``` #### Time series ```{r species range time series Common nightingale, fig.show="hold", out.width="50%"} spec_range_ts_abv <- tar_read(spec_range_ts_abv) -plot(spec_range_ts_abv, species = "Luscinia megarhynchos") +p <- plot(spec_range_ts_abv, species = "Luscinia megarhynchos") +p spec_range_ts_cube <- tar_read(spec_range_ts_cube) -plot(spec_range_ts_cube, species = "Luscinia megarhynchos") +p <- plot(spec_range_ts_cube, species = "Luscinia megarhynchos") +p ``` diff --git a/source/reports/explorative_analysis/expl_analysis_targets.Rmd b/source/reports/explorative_analysis/expl_analysis_targets.Rmd new file mode 100644 index 0000000..d6fb8c7 --- /dev/null +++ b/source/reports/explorative_analysis/expl_analysis_targets.Rmd @@ -0,0 +1,574 @@ +--- +title: "Exploratory analysis from targets pipeline" +author: "Emma Cartuyvels, Ward Langeraert, Toon Van Daele" +date: "`r Sys.Date()`" +output: + html_document: + code_folding: hide + toc: true + toc_float: true + toc_collapsed: true +editor_options: + chunk_output_type: console +--- + +```{r setup, include = FALSE} +library(knitr) +opts_chunk$set(collapse = TRUE, comment = "#>") + +library(targets) +Sys.setenv(TAR_PROJECT = "exploratory_analysis") + +library(ggplot2) +library(ggpubr) +library(dplyr) +library(ggforce) +library(crosstalk) +library(DT) +library(plotly) +``` + +```{r consistent colors} +# Create a custom color scale +library(RColorBrewer) +my_colors <- brewer.pal(5, "Dark2") +names(my_colors) <- c("Very rare", "Rare", "Common", + "Very common", "Extremely common") +col_scale <- scale_color_manual(name = "Category", + values = my_colors) +fill_scale <- scale_fill_manual(name = "Category", + values = my_colors) +``` + +In this document we explore the ABV data set, the cube data generated for birds in Flanders and if there is any indication that the occurrences in both datasets show similar ranges and trends. We also explore if certain filters can help improve performance of maps and trends based on the cube data when comparing these to the structured ABV data. + +The following filters are used: + + - No filter + - Filter 1: only species that were analysed in the ABV framework ([further reading](https://inbo.github.io/abv-rapport/2023/methodologie/verwerking.html)) + - Filter 2: loosely based on the rules set in the ABV framework + - A square is only relevant for a species if that species is observed in this square for more than one time period + - A minimum of three relevant squares are needed to include the species + - A minimum of a hundred observations are needed to include the species + - Filter 3: number of observations is weighed by total number of observations for that year + +## Data exploration + +```{r} +abv <- tar_read(data) |> + filter(id_dataset == "abv_data", + id_spat_res == "1km") +cube <- tar_read(data) |> + filter(id_dataset == "birdflanders", + id_spat_res == "1km") +``` + +During our initial analysis we noticed some problems with species names: + + - *Poecile montanus* and *Parus montanus*, *Dendrocopus major* and *Dendrocopos major* both refer to the same species. Both species names are accepted names in GBIF but they are not interlinked. + - *Saxicola torquatus* is most likely a wrong name and needs to be replaced with *Saxicola rubicola*. + +Issues were opened respectively trough GBIF and with the ABV data publisher. Species names were manually changed to allow comparisons in further analyses. However, it is important to remember that similar errors will probably occur when generating different cubes. + +### The ABV dataset + +The ABV dataset, which stands for Algemene Broedvogelmonitoring Vlaanderen (Common Breeding Bird Survey Flanders), is a structured monitoring dataset that tracks a group of approximately 100 common breeding bird species in Flanders, Belgium. Monitoring began in 2007 and the protocol involves selecting a random sample of 1200 UTM 1x1 km grid cells, stratified by land use. These cells are divided into groups of 300, and 300 grid cells are visited each year on a three-year rotation. Each grid cell contains six monitoring locations where bird counts are conducted. The data collection is standardized, with each grid cell being visited three times a year at fixed intervals (at least two weeks apart). + +```{r} +abv |> + ggplot(aes(x = year, y = n)) + + geom_bar(stat = "identity", + fill = "#1B9E77") + + scale_x_continuous(breaks = sort(unique(abv$year))) +``` + +The ABV data ranges from over 40.000 observations in 2007 to just over 20.000 observations in 2022. + +```{r} +abv |> + group_by(species) |> + summarise(n_obs = sum(n)) |> + ggplot(aes(x = n_obs)) + + geom_histogram(fill = "#D95F02", color = "black") + + labs(x = "Number of observations", + y = "Number of species") +``` + +180 bird species were recorded, of which 34 were recorded < 10 times (very rare), 29 were recorded between 10 and 100 times (rare), 59 were recorded between 100 and 1000 times (common), 43 were recorded between 1000 and 10000 times (very common) and 15 were recorded more than 10 000 times (extremely common). + +This categorization is used throughout further analyses. + +```{r} +abv |> + distinct(category, species) |> + count(category) |> + knitr::kable() +``` + +### The cube data + +A data cube is a multidimensional representation of data that allows for efficient storage, retrieval, and analysis of information along multiple dimensions. In the context of biodiversity data, a data cube can integrate various dimensions such as taxonomic (what), temporal (when) and spatial (where), enabling researchers to explore complex ecological patterns and relationships more effectively. More information can be found [here](https://docs.b-cubed.eu/occurrence-cube/specification/#dimensions). + +We downloaded the occurrence cube from GBIF, selecting all birds in Flanders aggregated on the utm1-grid and per year. Observations from ABV monitoring (also published on GBIF) were excluded. + +The cube is made up of several datasets: + + - Waarnemingen.be - Bird occurrences in Flanders and the Brussels Capital Region, Belgium + - Watervogels - Wintering waterbirds in Flanders, Belgium + - HG_OOSTENDE - Herring gulls (Larus argentatus, Laridae) breeding at the southern North Sea coast (Belgium) + - EOD – eBird Observation Dataset + - Waarnemingen.be - Non-native animal occurrences in Flanders and the Brussels Capital Region, Belgium + - LBBG_ZEEBRUGGE - Lesser black-backed gulls (Larus fuscus, Laridae) breeding at the southern North Sea coast (Belgium and the Netherlands) + - Broedvogels - Atlas of the breeding birds in Flanders 2000-2002 + - European Seabirds At Sea (ESAS) + - And 80+ smaller datasets + +With the first dataset (waarnemingen.be) containing most of the observations (67%). For further analyses it is important to know that waarnemingen.be data was last published in 2019 and currently runs only to 31-12-2018. + +```{r} +cube |> + ggplot(aes(x = year, y = n)) + + geom_bar(stat = "identity", + fill = "#7570B3") + + scale_x_continuous(breaks = sort(unique(cube$year))) +``` + +```{r} +cube |> + group_by(species) |> + summarise(n_obs = sum(n)) |> + ggplot(aes(x = n_obs)) + + geom_histogram(fill = "#E7298A", color = "black") + + labs(x = "Number of observations", + y = "Number of species") +``` + +```{r} +cube |> + group_by(species) |> + summarise(n_obs = sum(n)) |> + mutate(category = cut(n_obs, + breaks = c(-Inf, 0, 1, 10, 100, 1000, 10000, Inf), + right = FALSE)) |> + count(category) |> + knitr::kable() +``` + + +The cube contains information on 663 species. 25 of these are hybrids. 355 of these were observed less than a 100 times, 197 were observed more than 1000 times. + +```{r} +cube |> + ggplot(aes(x = mincoordinateuncertaintyinmeters)) + + geom_histogram(fill = "#66A61E", color = "black", + bins = 100) + + labs(x = "Minimum coordinate uncertainty in meters", + y = "Count") + + facet_zoom(zoom.size = 0.5, + ylim = c(0, 300), + xlim = c(10500, 150000), + horizontal = FALSE) +``` + + +## Occupancy comparison {.tabset} + +### No filter + +```{r} +tar_read(range_comp_0) |> + filter(id_spat_res == "1km") |> + ggplot(aes(x = percentage_abv_data, + y = percentage_birdflanders, + color = category)) + + geom_point() + + col_scale + + stat_cor(mapping = aes(color = NULL), + label.x.npc = "centre", + label.y.npc = "bottom", + method = "pearson") + + labs(x = "Percentage of ABV squares occupied\nby species in ABV dataset", + y = "Percentage of cube squares occupied\nby species in cube dataset") + + guides(color = guide_legend(override.aes = aes(label = ""))) +``` + +### Filter 1 + +```{r} +tar_read(range_comp_1) |> + filter(id_spat_res == "1km") |> + ggplot(aes(x = percentage_abv_data, + y = percentage_birdflanders, + color = category)) + + geom_point() + + col_scale + + stat_cor(mapping = aes(color = NULL), + label.x.npc = "centre", + label.y.npc = "bottom", + method = "pearson") + + labs(x = "Percentage of ABV squares occupied\nby species in ABV dataset", + y = "Percentage of cube squares occupied\nby species in cube dataset") + + guides(color = guide_legend(override.aes = aes(label = ""))) +``` + +### Filter 2 + +```{r} +tar_read(range_comp_2) |> + filter(id_spat_res == "1km") |> + filter(id_filter_per == "year") |> + ggplot(aes(x = percentage_abv_data, + y = percentage_birdflanders, + color = category)) + + geom_point() + + col_scale + + stat_cor(mapping = aes(color = NULL), + label.x.npc = "centre", + label.y.npc = "bottom", + method = "pearson") + + labs(x = "Percentage of ABV squares occupied\nby species in ABV dataset", + y = "Percentage of cube squares occupied\nby species in cube dataset") + + guides(color = guide_legend(override.aes = aes(label = ""))) +``` + +## + +Given the limitations of the ABV monitoring setup we can't compare exact patterns in biodiversity, for this we refer to the South-African analysis. + +## Trend comparison {.tabset} + +### No filter +```{r} +p0 <- tar_read(trend_comp_0) |> + filter(id_spat_res == "1km", + time_period == "year") |> + ggplot(aes(x = correlation, fill = category)) + + geom_histogram() + + xlim(-1, 1) + + fill_scale + + labs(x = "Correlation", + y = "Number of species") + +p0 +``` + +### Filter 1 +```{r} +p1 <- tar_read(trend_comp_1) |> + filter(id_spat_res == "1km", + time_period == "year") |> + ggplot(aes(x = correlation, fill = category)) + + geom_histogram() + + xlim(-1, 1) + + fill_scale + + labs(x = "Correlation", + y = "Number of species") + +p1 +``` + +### Filter 2 +```{r} +p2 <- tar_read(trend_comp_2) |> + filter(id_spat_res == "1km", + time_period == "year") |> + ggplot(aes(x = correlation, fill = category)) + + geom_histogram() + + xlim(-1, 1) + + fill_scale + + labs(x = "Correlation", + y = "Number of species") + +p2 +``` + +### Filter 3 +```{r} +p3 <- tar_read(trend_comp_3) |> + filter(id_spat_res == "1km", + time_period == "year", + id_filter_per == "year") |> + ggplot(aes(x = correlation, fill = category)) + + geom_histogram() + + xlim(-1, 1) + + fill_scale + + labs(x = "Correlation", + y = "Number of species") +``` + +### Filter 4 +```{r} +p4 <- tar_read(trend_comp_4) |> + filter(id_spat_res == "1km", + time_period == "year", + id_filter_per == "year", + id_filter_per2 == "year") |> + ggplot(aes(x = correlation, fill = category)) + + geom_histogram() + + xlim(-1, 1) + + fill_scale + + labs(x = "Correlation", + y = "Number of species") +``` + +## + +```{r, include=FALSE} +library(cowplot) + +middle_row <- plot_grid(p1 + theme(legend.position = "none"), + p2 + theme(legend.position = "none"), + labels = c("B", "C"), + label_size = 12) + +bottom_row <- plot_grid(p3 + theme(legend.position = "none"), + p4 + theme(legend.position = "none"), + labels = c("D", "E"), + label_size = 12) + +plot_grid(p0, + middle_row, + bottom_row, + nrow = 3, + labels = c("A", "", ""), + label_size = 12) +``` + +## Trend comparison with data aggregated per cycle {.tabset} + +```{r} +species_list_cor <- c("Motacilla flava", "Sylvia communis", "Motacilla alba", + "Cyanistes caeruleus", "Passer domesticus", + "Prunella modularis", "Parus montanus", "Chloris chloris", + "Anthus trivialis") +``` + +We include the trends of nine species: + - Three have a strong positive correlation: **Parus montanus**, **Chloris chloris** and **Anthus trivialis** + - Three have a almost zero correlation: **Cyanistes caeruleus**, **Passer domesticus** and **Prunella modularis** + - Three have a strong negative correlation: **Motacilla flava**, **Sylvia communis** and **Motacilla alba** + +### No filter +```{r} +trend_comp_0_cyclus <- tar_read(trend_comp_0) |> + filter(id_spat_res == "1km", + time_period == "cyclus") + +p0 <- trend_comp_0_cyclus |> + ggplot(aes(x = correlation, fill = category)) + + geom_histogram() + + xlim(-1, 1) + + fill_scale + + labs(x = "Correlation", + y = "Number of species") + +p0 +``` + +```{r} +shared_filter0_cyclus <- SharedData$new( + tar_read(data) |> + filter(species %in% species_list_cor, + id_spat_res == "1km") |> + group_by(species, id_dataset, cyclus) |> + summarise(n = sum(n)) +) + + +bscols( + widths = c(3, 9), + filter_select("species", "Species", + shared_filter0_cyclus, + ~species, + multiple = FALSE), + plot_ly(shared_filter0_cyclus, + x = ~cyclus, + y = ~n, + color = ~id_dataset) |> add_lines() +) +``` + +### Filter 1 +```{r} +trend_comp_1_cyclus <- tar_read(trend_comp_1) |> + filter(id_spat_res == "1km", + time_period == "cyclus") + +p1 <- trend_comp_1_cyclus |> + ggplot(aes(x = correlation, fill = category)) + + geom_histogram() + + xlim(-1, 1) + + fill_scale + + labs(x = "Correlation", + y = "Number of species") + +p1 +``` + +```{r} +shared_filter1_cyclus <- SharedData$new( + tar_read(filter1) |> + filter(species %in% species_list_cor, + id_spat_res == "1km") |> + group_by(species, id_dataset, cyclus) |> + summarise(n = sum(n)) +) + + +bscols( + widths = c(3, 9), + filter_select("species", "Species", + shared_filter1_cyclus, + ~species, + multiple = FALSE), + plot_ly(shared_filter1_cyclus, + x = ~cyclus, + y = ~n, + color = ~id_dataset) |> + add_lines() +) +``` + +### Filter 2 +```{r} +trend_comp_2_cyclus <- tar_read(trend_comp_2) |> + filter(id_spat_res == "1km", + time_period == "cyclus") + +p2 <- trend_comp_2_cyclus |> + ggplot(aes(x = correlation, fill = category)) + + geom_histogram() + + xlim(-1, 1) + + fill_scale + + labs(x = "Correlation", + y = "Number of species") + +p2 +``` + +```{r} +shared_filter2_cyclus <- SharedData$new( + tar_read(filter2) |> + filter(species %in% species_list_cor, + id_spat_res == "1km", + id_filter_per == "cyclus") |> + group_by(species, id_dataset, cyclus) |> + summarise(n = sum(n)) +) + + +bscols( + widths = c(3, 9), + filter_select("species", "Species", + shared_filter2_cyclus, + ~species, + multiple = FALSE), + plot_ly(shared_filter2_cyclus, + x = ~cyclus, + y = ~n, + color = ~id_dataset) |> add_lines() +) +``` + +### Filter 3 +```{r} +trend_comp_3_cyclus <- tar_read(trend_comp_3) |> + filter(id_spat_res == "1km", + time_period == "cyclus", + id_filter_per == "cyclus") + +p3 <- trend_comp_3_cyclus |> + ggplot(aes(x = correlation, fill = category)) + + geom_histogram() + + xlim(-1, 1) + + fill_scale + + labs(x = "Correlation", + y = "Number of species") + +p3 +``` +<<<<<<< HEAD +======= + +```{r} +shared_filter3_cyclus <- SharedData$new(tar_read(filter3) |> + filter(species %in% species_list_cor, + id_spat_res == "1km", + id_filter_per == "cyclus")) + + +bscols( + widths = c(3, 9), + filter_select("species", "Species", + shared_filter3_cyclus, + ~species, + multiple = FALSE), + plot_ly(shared_filter3_cyclus, + x = ~cyclus, + y = ~n, + color = ~id_dataset) |> add_lines() +) +``` + +### Filter 4 +```{r} +trend_comp_4_cyclus <- tar_read(trend_comp_4) |> + filter(id_spat_res == "1km", + time_period == "cyclus", + id_filter_per == "cyclus", + id_filter_per2 == "cyclus") + +p4 <- trend_comp_4_cyclus |> + ggplot(aes(x = correlation, fill = category)) + + geom_histogram() + + xlim(-1, 1) + + fill_scale + + labs(x = "Correlation", + y = "Number of species") + +p4 +``` + +```{r} +shared_filter4_cyclus <- SharedData$new(tar_read(filter4) |> + filter(species %in% species_list_cor, + id_spat_res == "1km", + id_filter_per == "cyclus", + id_filter_per2 == "cyclus")) + + +bscols( + widths = c(3, 9), + filter_select("species", "Species", + shared_filter4_cyclus, + ~species, + multiple = FALSE), + plot_ly(shared_filter4_cyclus, + x = ~cyclus, + y = ~n, + color = ~id_dataset) |> add_lines() +) +``` + +## + +```{r, include=FALSE} +library(cowplot) + +middle_row <- plot_grid(p1 + theme(legend.position = "none"), + p2 + theme(legend.position = "none"), + labels = c("B", "C"), + label_size = 12) + +bottom_row <- plot_grid(p3 + theme(legend.position = "none"), + p4 + theme(legend.position = "none"), + labels = c("D", "E"), + label_size = 12) + +plot_grid(p0, + middle_row, + bottom_row, + nrow = 3, + labels = c("A", "", ""), + label_size = 12) +``` diff --git a/source/taxonomic_hierarchy.RDS b/source/taxonomic_hierarchy.RDS deleted file mode 100644 index 962010f..0000000 Binary files a/source/taxonomic_hierarchy.RDS and /dev/null differ diff --git a/source/trend_mod.Rmd b/source/trend_mod.Rmd deleted file mode 100644 index 2eea46c..0000000 --- a/source/trend_mod.Rmd +++ /dev/null @@ -1,122 +0,0 @@ ---- -title: '"Trend modelling"' -author: "Emma Cartuyvels, Ward Langeraert, Toon Van Daele" -date: "2024-10-29" -output: html_document ---- - -```{r setup, include=FALSE} -library(sf) -library(trias) -library(tidyverse) -knitr::opts_chunk$set(echo = TRUE) -``` - -```{r data, cache=TRUE} -birdcubeflanders_year_sf <- read_sf(here::here("data", "interim", - "birdcubeflanders_year.gpkg")) - -abv_data_total_sf <- read_sf(here::here("data", "interim", - "abv_data_total.gpkg")) -``` - -```{r transform abv} -abv_data_total <- abv_data_total_sf |> - st_drop_geometry() |> - mutate(cyclus = case_when( - year >= 2007 & year <= 2009 ~ 1, - year >= 2010 & year <= 2012 ~ 2, - year >= 2013 & year <= 2015 ~ 3, - year >= 2016 & year <= 2018 ~ 4, - year >= 2019 & year <= 2021 ~ 5, - year >= 2022 & year <= 2024 ~ 6 - )) |> - mutate(species = case_when( - species == "Parus montanus" ~ "Poecile montanus", - species == "Dendrocopus major" ~ "Dendrocopos major", - species == "Saxicola torquatus" ~ "Saxicola rubicola", - TRUE ~ species - )) |> - group_by(species) |> - mutate(n_obs = n()) |> - ungroup() |> - mutate(category = cut(n_obs, - breaks = c(0, 10, 100, 1000, 10000, +Inf), - labels = c("Very rare", "Rare", "Common", - "Very common", "Extremely common"), - right = FALSE)) - -birdcubeflanders_year <- birdcubeflanders_year_sf |> - st_drop_geometry() |> - mutate(cyclus = case_when( - year >= 2007 & year <= 2009 ~ 1, - year >= 2010 & year <= 2012 ~ 2, - year >= 2013 & year <= 2015 ~ 3, - year >= 2016 & year <= 2018 ~ 4, - year >= 2019 & year <= 2021 ~ 5, - year >= 2022 & year <= 2024 ~ 6 - )) - -abv_data_total_tf <- abv_data_total |> - group_by(species, year, TAG, category) |> - summarise(n = sum(individualCount)) |> - ungroup() -``` - -```{r} -abv_ana_birds <- c( - "Cetti's zanger", "Putter", "Kleine mantelmeeuw", "Roek", "Kuifeend", - "Halsbandparkiet", "Aalscholver", "Kauw", "Buizerd", "Nijlgans", - "Roodborsttapuit", "Boomklever", "Meerkoet", "Zwarte roodstaart", - "Grote bonte specht", "Roodborst", "Krakeend", "Boomleeuwerik", - "Bonte vliegenvanger", "Grauwe gans", "Torenvalk", "Zwartkop", - "Witte kwikstaart", "Boomkruiper", "Grasmus", "Pimpelmees", "Vink", - "Boerenzwaluw", "Tjiftjaf", "Zwarte kraai", "Houtduif", "Kleine karekiet", - "Fazant", "Gaai", "Groene specht", "Ekster", "Koolmees", "Gele kwikstaart", - "Groenling", "Holenduif", "Winterkoning", "Scholekster", "Koekoek", - "Heggenmus", "Spreeuw", "Turkse tortel", "Veldleeuwerik", "Geelgors", - "Goudhaan", "Kuifmees", "Zilvermeeuw", "Matkop", "Huismus", "Wilde eend", - "Waterhoen", "Zanglijster", "Merel", "Tuinfluiter", "Zwarte mees", "Patrijs", - "Graspieper", "Fitis", "Stadsduif", "Wielewaal", "Grutto", "Kievit", - "Grote lijster", "Ringmus", "Sprinkhaanzanger", "Kokmeeuw", "Sperwer", - "Bruine kiekendief", "Fuut", "Gekraagde roodstaart", "Bergeend", "Kneu", - "Rietzanger", "Blauwe reiger", "Wulp", "Blauwborst", "Zwarte specht", - "Boompieper", "Rietgors", "Canadese gans", "Spotvogel", "Bosrietzanger", - "Knobbelzwaan", "Havik", "Glanskop", "Middelste Bonte Specht", "Tafeleend", - "Gierzwaluw", "Nachtegaal", "Huiszwaluw", "Staartmees", "Dodaars" -) - -abv_ana_birds <- data.frame(abv_ana_birds) |> - left_join(abv_data_total |> distinct(vernacularName, species), - by = join_by(abv_ana_birds == vernacularName)) - -sel_spec <- abv_ana_birds$species -``` - -```{r} -data_path <- here::here("data", "raw") - -red_list_fl1 <- read_delim( - file.path(data_path, - "dwca-rl-flanders-validated-checklist-v1.7", - "distribution.txt"), - delim = "\t", - show_col_types = FALSE) - -red_list_fl2 <- read_delim( - file.path(data_path, - "dwca-rl-flanders-validated-checklist-v1.7", - "taxon.txt"), - delim = "\t", - show_col_types = FALSE) -``` - -```{r} -sel_data <- birdcubeflanders_year |> - group_by(year, species) |> - mutate() |> - filter(species == "Cettia cetti") - -gam(n ~ s(year) + s(tot_birds), k = 5) -``` -