Commit 7af0cbf

Update data names and following dependencies

Author: EmmaCartuyvels1
1 parent 9969df9, commit 7af0cbf

1 file changed: +39 -40 lines changed

source/expl_analysis.Rmd

Lines changed: 39 additions & 40 deletions
@@ -33,10 +33,10 @@ conflicted::conflicts_prefer(dplyr::filter)

 ```{r data, cache=TRUE}
 birdcubeflanders_year_sf <- read_sf(here::here("data", "interim",
-                                               "birdcubeflanders_year.gpkg"))
+                                               "birdflanders_cube_1km.gpkg"))

 abv_data_total_sf <- read_sf(here::here("data", "interim",
-                                        "abv_data_total.gpkg"))
+                                        "abv_data_cube_1km.gpkg"))
 ```

 We noticed some problems with species names: *Poecile montanus* and *Parus montanus*, *Dendrocopus major* and *Dendrocopos major* both refer to the same species. Since both species names are accepted names in GBIF we need to manually correct this (an issue was made for this with GBIF). *Saxicola torquatus* is most likely a wrong name and needs to be replaced with *Saxicola rubicola* (an issue was also opened for this with the data publisher of the ABV data).
@@ -81,8 +81,8 @@ birdcubeflanders_year <- birdcubeflanders_year_sf |>
   ))

 abv_data_total_tf <- abv_data_total |>
-  group_by(species, year, TAG, category) |>
-  summarise(n = sum(individualCount)) |>
+  group_by(species, year, mgrscode, category) |>
+  summarise(n = sum(n)) |>
   ungroup()
 ```

@@ -95,21 +95,19 @@ To do: assess data quality across spatial, temporal, and taxonomical dimensions
 The ABV dataset, which stands for Algemene Broedvogelmonitoring Vlaanderen (Common Breeding Bird Survey Flanders), is a structured monitoring dataset that tracks a group of approximately 100 common breeding bird species in Flanders, Belgium. Monitoring began in 2007 and the protocol involves selecting a random sample of 1200 UTM 1x1 km grid cells, stratified by land use. These cells are divided into groups of 300, and 300 grid cells are visited each year on a three-year rotation. Each grid cell contains six monitoring locations where bird counts are conducted. The data collection is standardized, with each grid cell being visited three times a year at fixed intervals (at least two weeks apart).

 ```{r}
-summary(abv_data_total[, c("individualCount",
-                           "eventDate",
-                           "year",
-                           "month")])
+summary(abv_data_total[, c("n",
+                           "year")])
 ```

 ```{r}
 abv_data_total |>
-  group_by(TAG) |>
+  group_by(mgrscode) |>
   summarise(n_visits = n_distinct(year)) |>
   ggplot(aes(x = n_visits)) +
   geom_histogram()
 ```

-Out of the `r length(unique(abv_data_total$TAG))` visited km² over 150 were visited only once, while some were visited up to 13 times. This inconsistency in the number of visits is probably corrected for in the analysis of the ABV data, <span style="color: red;">should we do the same?</span>
+Out of the `r length(unique(abv_data_total$mgrscode))` visited km² over 150 were visited only once, while some were visited up to 13 times. This inconsistency in the number of visits is probably corrected for in the analysis of the ABV data, <span style="color: red;">should we do the same?</span>

 ```{r}
 abv_data_total |>
@@ -130,7 +128,7 @@ abv_data_total_tf |>
        y = "Number of species")
 ```

-There are 182 species present in the dataset. There are 32 species that were observed less than 10 times, 45 species that were observed more than 1000 times and 16 species that were observed more than 10 000 times. This dataset also contains absence data, which is not included/not present? in the cube.
+There are 180 species present in the dataset. There are 38 species that were observed less than 10 times, 69 species that were observed more than 100 times and 30 species that were observed more than 1000 times.

 ```{r}
 abv_data_total |>
@@ -141,7 +139,7 @@ abv_data_total |>

 ## The cube data

-The cube contains 2 011 808 observations. There are 666 species present in the data. 355 of these were observed less than a 100 times, 197 were observed more than 1000 times. More information can be found [here]( https://docs.b-cubed.eu/occurrence-cube/specification/#dimensions).
+The cube contains 2 011 808 observations. There are 664 species present in the data. 358 of these were observed less than a 100 times, 197 were observed more than 1000 times. More information can be found [here]( https://docs.b-cubed.eu/occurrence-cube/specification/#dimensions).

 The cube is made up of several datasets:

@@ -191,12 +189,12 @@ birdcubeflanders_year |>
 ```{r}
 utm_year <- abv_data_total |>
   st_drop_geometry() |>
-  distinct(TAG, year)
+  distinct(mgrscode, year)
 ```

 ```{r}
 filt_birdcube <- utm_year |>
-  left_join(birdcubeflanders_year, by = c("TAG", "year"))
+  left_join(birdcubeflanders_year, by = c("mgrscode", "year"))
 ```

 ```{r}
@@ -226,36 +224,35 @@ range_comp <- function(period = 2007:2022,
                        sel_species = unique(dataset1$species)) {

   # We filter both datasets for the species and period of interest
-  # and group them by TAG (identifier of utm square)
+  # and group them by mgrscode (identifier of utm square)
   set_abv <- dataset1 |>
     st_drop_geometry() |>
     filter(.data$species %in% sel_species,
-           .data$year %in% period,
-           .data$individualCount > 0) |>
-    group_by(.data$TAG) |>
-    summarise(n = sum(.data$individualCount))
+           .data$year %in% period) |>
+    group_by(.data$mgrscode) |>
+    summarise(n = sum(.data$n))

   set_cube <- dataset2 |>
     st_drop_geometry() |>
     filter(.data$species %in% sel_species,
            .data$year %in% period) |>
-    group_by(.data$TAG) |>
+    group_by(.data$mgrscode) |>
     summarise(n = sum(.data$n))

-  total_abv <- length(set_abv$TAG)
-  perc_abv <- (total_abv / length(unique(dataset1$TAG))) * 100
+  total_abv <- length(set_abv$mgrscode)
+  perc_abv <- (total_abv / length(unique(dataset1$mgrscode))) * 100

-  total_cube <- length(set_cube$TAG)
-  perc_cube <- (total_cube / length(unique(dataset2$TAG))) * 100
+  total_cube <- length(set_cube$mgrscode)
+  perc_cube <- (total_cube / length(unique(dataset2$mgrscode))) * 100

   overlap_all_abv_cube <- length(
-    which(set_cube$TAG %in% unique(abv_data_total$TAG))
+    which(set_cube$mgrscode %in% unique(abv_data_total$mgrscode))
   )
   perc_overlap_all <- (
-    overlap_all_abv_cube / length(unique(dataset1$TAG))
+    overlap_all_abv_cube / length(unique(dataset1$mgrscode))
   ) * 100

-  total_overlap <- length(which(set_cube$TAG %in% set_abv$TAG))
+  total_overlap <- length(which(set_cube$mgrscode %in% set_abv$mgrscode))
   perc <- (total_overlap / total_abv) * 100

   list(total_abv, perc_abv,
@@ -278,7 +275,7 @@ comp_range_data$overlap_birdcube_spec_abv <- NA
 comp_range_data$percentage_birdcube_spec_abv <- NA

 for (i in studied_spec){
-  test <- range_comp(i, period = 2007:2018)
+  test <- range_comp(sel_species = i, period = 2007:2018)

   comp_range_data[comp_range_data$studied_spec == i, 2] <- test[1]
   comp_range_data[comp_range_data$studied_spec == i, 3] <- test[2]
@@ -395,7 +392,7 @@ for (cycle_start in cycle_starts) {
     comp_range_data2$cyclus[j] <- c
     comp_range_data2$studied_spec[j] <- i

-    test <- range_comp(i, period = cycle_start:(cycle_start + 2))
+    test <- range_comp(sel_species = i, period = cycle_start:(cycle_start + 2))

     comp_range_data2$abv_squares[j] <- test[[1]]
     comp_range_data2$perc_abv_total_abv[j] <- test[[2]]
@@ -434,7 +431,7 @@ This graph shows the same figure as above but split for each full cycle of ABV o
 time_series_1 <- abv_data_total |>
   st_drop_geometry() %>%
   group_by(species, year) %>%
-  summarize(occurrence = sum(occurrenceStatus == "PRESENT"))
+  summarize(occurrence = n())

 time_series_2 <- birdcubeflanders_year |>
   st_drop_geometry() |>
@@ -461,7 +458,7 @@ DT::datatable(time_series_cor) |>
 time_series_1 <- abv_data_total |>
   st_drop_geometry() %>%
   group_by(species, cyclus) %>%
-  summarize(occurrence = sum(occurrenceStatus == "PRESENT")) |>
+  summarize(occurrence = n()) |>
   filter(cyclus < 5)

 time_series_2 <- birdcubeflanders_year |>
@@ -490,7 +487,7 @@ DT::datatable(time_series_cor) |>
 time_series_1 <- abv_data_total |>
   st_drop_geometry() %>%
   group_by(species, cyclus) %>%
-  summarize(abundance = sum(individualCount)) |>
+  summarize(abundance = sum(n)) |>
   filter(cyclus < 5)

 time_series_2 <- birdcubeflanders_year |>
@@ -533,7 +530,7 @@ time_series_cor |>
 ```{r, message=FALSE}
 abv_dif <- abv_data_total |>
   group_by(cyclus, species) |>
-  summarise(total = sum(individualCount)) |>
+  summarise(total = sum(n)) |>
   pivot_wider(names_from = cyclus,
               names_prefix = "abv_",
               values_from = total,
@@ -592,7 +589,7 @@ Value of k | Strength of agreement
 abv_dif <- abv_data_total |>
   filter(category %in% c("Rare")) |>
   group_by(cyclus, species) |>
-  summarise(total = sum(individualCount)) |>
+  summarise(total = sum(n)) |>
   pivot_wider(names_from = cyclus,
               names_prefix = "abv_",
               values_from = total,
@@ -649,8 +646,8 @@ Kappa is not a good measure for comparing two discrete continuous variables, bet

 ```{r, message=FALSE}
 occupancy_1 <- abv_data_total %>%
-  group_by(species, TAG) %>%
-  summarize(occupancy_rate_1 = mean(occurrenceStatus == "PRESENT"))
+  group_by(species, mgrscode) %>%
+  summarize(occupancy_rate_1 = mean(n()))

 occupancy_2 <- birdcubeflanders_year %>%
   group_by(species) %>%
@@ -664,11 +661,11 @@ occupancy_2 <- birdcubeflanders_year %>%
 ```{r}
 # Species richness per dataset
 richness_1 <- abv_data_total |>
-  group_by(TAG) |>
+  group_by(mgrscode) |>
   summarize(richness = n_distinct(species))

 richness_2 <- birdcubeflanders_year |>
-  group_by(TAG) |>
+  group_by(mgrscode) |>
   summarize(richness = n_distinct(species))

 # Bray-Curtis dissimilarity
# Bray-Curtis dissimilarity
@@ -686,7 +683,9 @@ species_composition_2 <- birdcubeflanders_year |>
686683
values_from = n,
687684
values_fill = 0)
688685
689-
bray_curtis <- vegdist(rbind(species_composition_1[-1],
690-
species_composition_2[-1]), method = "bray")
686+
bray_curtis <- vegdist(bind_rows(species_composition_1[-1],
687+
species_composition_2[-1]),
688+
method = "bray",
689+
na.rm = TRUE)
691690
bray_curtis
692691
```
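
Note on the last hunk: `bind_rows()` matches columns by name, so a species that occurs in only one of the two composition tables becomes `NA` in the rows that come from the other table, which is presumably why `na.rm = TRUE` was added to the `vegdist()` call. The sketch below uses small hypothetical tables (`comp_abv`, `comp_cube`, and the helper `bray()` are illustrative names, not part of the commit) to show the `NA` fill and one possible alternative: replacing the `NA`s with zero counts before computing Bray-Curtis by hand. Whether a species missing from a table should count as zero is an assumption that depends on how the data were collected.

```r
library(dplyr)

# Hypothetical toy composition tables: rows are squares, columns are species
comp_abv  <- tibble(sp_a = c(3, 0), sp_b = c(1, 2))
comp_cube <- tibble(sp_b = c(4, 1), sp_c = c(0, 5))

# bind_rows() aligns columns by name and fills the gaps with NA
combined <- bind_rows(comp_abv, comp_cube)

# Assumption: a species absent from a table was not recorded, so count it as 0
combined_zero <- combined |>
  mutate(across(everything(), ~ coalesce(.x, 0)))

# Bray-Curtis dissimilarity between two count vectors
bray <- function(x, y) sum(abs(x - y)) / sum(x + y)

# First ABV row against first cube row
d_first <- bray(unlist(combined_zero[1, ]), unlist(combined_zero[3, ]))
d_first
```

With these toy values the two rows are (3, 1, 0) and (0, 4, 0), so the dissimilarity is 6 / 8 = 0.75. If zero-filling suits the data, the filled table could be passed to `vegdist()` without `na.rm`; otherwise the commit's `na.rm = TRUE` remains the more cautious choice.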
