### Number and proportion of observations per dataset
+
In total, there are `r n_distinct(birdcubeflanders_dataset$datasetname)` component datasets. We look at the number and proportion of observations per dataset in the cube.
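As a minimal sketch of that tabulation, assuming one row per observation and a `datasetname` column as in the inline expression above. The toy data frame `obs` and its values are made up for illustration, and base R is used here rather than whatever the original chunk calls:

```r
# Toy stand-in for the cube: one row per observation (hypothetical values).
obs <- data.frame(
  datasetname = c("A", "A", "A", "B", "B", "C")
)

# Number of observations per dataset.
n_per_dataset <- table(obs$datasetname)

# Proportion of observations per dataset (sums to 1).
p_per_dataset <- prop.table(n_per_dataset)

summary_df <- data.frame(
  datasetname = names(n_per_dataset),
  n = as.integer(n_per_dataset),
  proportion = as.numeric(p_per_dataset)
)
summary_df
```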
We calculate error measures for the indicator based on leave-one-dataset-out cross-validation.
We use a constant total of grid cells (`r length(unique(birdcube_dataset_filtered$mgrscode))`) such that this is independent from the datasets left out.
We cannot compute the evenness for species only found in a single dataset.
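The leave-one-dataset-out scheme with a fixed grid-cell denominator described above could look roughly like the sketch below. It assumes that prevalence is the share of grid cells in which a species occurs, which the visible lines do not confirm. The data frame `occ`, the helper `prevalence()` and all values are hypothetical; only the column names `mgrscode` and `datasetname` come from the text.

```r
# Toy occurrences: species x grid cell x dataset (all values hypothetical).
occ <- data.frame(
  species     = c("sp1", "sp1", "sp1", "sp2", "sp2"),
  mgrscode    = c("c1",  "c2",  "c3",  "c1",  "c1"),
  datasetname = c("A",   "A",   "B",   "A",   "B")
)

# Fixed denominator: grid cells in the full cube, computed once so that it
# does not change when a dataset is left out.
n_cells_total <- length(unique(occ$mgrscode))

# Prevalence here = occupied cells / total cells (an assumed definition).
prevalence <- function(d, n_cells) {
  occupied <- tapply(d$mgrscode, d$species, function(x) length(unique(x)))
  occupied / n_cells
}

# Leave one dataset out at a time and recompute prevalence per species.
datasets <- unique(occ$datasetname)
lodo <- lapply(datasets, function(ds) {
  prevalence(occ[occ$datasetname != ds, ], n_cells_total)
})
names(lodo) <- datasets
lodo
```

In this sketch a species that occurs only in the omitted dataset simply drops out of that fold's result, so real code would need to decide how to treat such cases.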
@@ -824,7 +830,7 @@ grouped_lm(
)
```

-### Effective number of datasets
+####Effective number of datasets

The effective number of datasets takes into account the proportion of observations per dataset.
It is calculated per species $j$ as the exponent of the Shannon Entropy:
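The hunk ends before the formula itself. Assuming the standard definition, the exponent of the Shannon entropy is $\exp\left(-\sum_i p_{ij} \ln p_{ij}\right)$, where $p_{ij}$ (a symbol chosen here, not taken from the file) is the proportion of observations of species $j$ that come from dataset $i$. A minimal sketch, with the hypothetical helper `effective_n_datasets()`:

```r
# Effective number of datasets for one species: exp(Shannon entropy)
# of the proportions of its observations per dataset.
effective_n_datasets <- function(counts) {
  counts <- counts[counts > 0]  # datasets without observations contribute nothing
  p <- counts / sum(counts)
  exp(-sum(p * log(p)))
}

effective_n_datasets(c(10, 10, 10, 10))  # four equally sized datasets
effective_n_datasets(c(97, 1, 1, 1))     # one dominant dataset
```

With four equally sized datasets the result equals the raw count of four, while a single dominant dataset pulls the value down towards one.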
@@ -863,7 +869,7 @@ grouped_lm(
)
```

-### Dataset evenness
+####Dataset evenness

Dataset evenness is a measure that captures how occurrences of a species are distributed across multiple datasets (0 is highly uneven, 1 is completely even).
Pielou’s Evenness index $J$ is calculated as the normalised Shannon Entropy:
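The formula is again cut off by the hunk boundary. Assuming the usual form $J = H / \ln S$, with $H$ the Shannon entropy of the per-dataset proportions and $S$ the number of datasets in which the species occurs (both symbols are assumptions here), a sketch with the hypothetical helper `pielou_evenness()` also shows why evenness is undefined for species found in a single dataset:

```r
# Pielou's evenness for one species: Shannon entropy of the per-dataset
# proportions, divided by its maximum possible value log(S).
pielou_evenness <- function(counts) {
  counts <- counts[counts > 0]
  s <- length(counts)
  if (s < 2) return(NA_real_)  # log(1) = 0 would give 0 / 0
  p <- counts / sum(counts)
  h <- -sum(p * log(p))
  h / log(s)
}

pielou_evenness(c(10, 10, 10, 10))  # perfectly even
pielou_evenness(c(97, 1, 1, 1))     # strongly uneven
pielou_evenness(5)                  # single dataset: NA
```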
@@ -942,7 +948,7 @@ plot(m)
summary(m)
```

-##Trends in error: MRE
+### MRE trends

We look at trends in CV error measures related to:
@@ -951,7 +957,7 @@ We look at trends in CV error measures related to:
The effective number of datasets takes into account the proportion of observations per dataset.
It is calculated per species $j$ as the exponent of the Shannon Entropy:
@@ -1055,7 +1061,7 @@ grouped_lm(
)
```

-### Dataset evenness
+####Dataset evenness

Dataset evenness is a measure that captures how occurrences of a species are distributed across multiple datasets (0 is highly uneven, 1 is completely even).
Pielou’s Evenness index $J$ is calculated as the normalised Shannon Entropy:
-####Distribution of species-level median improvements
+## Aggregated species-level sensitivity patterns
+### Distribution of species-level median improvements
At the species level, median improvement scores were centred close to zero, indicating that for most species the omission of individual datasets had limited influence on prevalence estimates. A smaller number of species exhibited consistently positive or negative median improvements, suggesting higher sensitivity to data composition. Overall, cross-validation tends to move prevalence estimates closer to the true value.
Species-level sensitivity differed between rarity classes. Rare species showed a wider spread of median improvements and larger relative changes compared to common species, indicating greater dependence on individual datasets.
There is apparently one very common species that is also highly influenced by one dataset.
-####Identifying outlying species (diagnostic, not reported)
+### Identifying outlying species (diagnostic, not reported)
One species showed unusually large relative deteriorations when datasets were omitted. This species can be treated as a diagnostic case which serves to identify potential data or modelling issues.
Without the *waarnemingen.be* dataset, the estimate is much lower.
-###Influence of individual component datasets
+## Influence of individual component datasets
The *waarnemingen.be* dataset(s) show the largest influences (in both directions). For rare species, we see improvements, for common, we see deterioration.
```{r}
@@ -1509,7 +1521,7 @@ improvement_df %>%
legend.position = c(0.83, 0.28))
```

-###Species-level robustness of prevalence estimates
+## Species-level robustness of prevalence estimates

We summarised dataset-removal sensitivity at the species level by collapsing relative improvement scores into a single robustness metric.
For each species, we define robustness as:
@@ -1537,7 +1549,7 @@ This metric is bounded between 0 (low robustness) and 1 (high robustness).
Using the median ensures robustness against outlying datasets and prevents single influential components from dominating the score.

-####Robustness by species
+### Robustness by species
Species-level robustness scores were generally high, with most species exhibiting values close to one, indicating limited sensitivity to the omission of individual datasets. A smaller subset of species showed lower robustness scores, reflecting stronger dependence on specific data components.
Using leave-one-dataset-out cross-validation, we assessed the sensitivity of prevalence estimates derived from the bird data cube to the composition of the underlying datasets, using ABV prevalence as a reference benchmark. Overall, omission of individual component datasets more often reduced than increased the deviation from the reference prevalence, although the magnitude of these improvements was typically small. This indicates that the prevalence indicator is generally robust to changes in dataset composition.