Commit 6bff76f

create figure per year
1 parent 027b24a commit 6bff76f

1 file changed: +58 -0 lines changed

source/dataset_bias_cv.Rmd

Lines changed: 58 additions & 0 deletions
@@ -158,6 +158,64 @@ birdcubeflanders_dataset %>%
   coord_flip()
 ```

+Over time, the component datasets strongly influence the cube. The two largest datasets do not cover the full temporal range of the cube, and the largest one drops off sharply after 2018.
+
+```{r}
+p_dataset_year <- birdcubeflanders_dataset %>%
+  group_by(datasetname, year) %>%
+  summarise(n_obs = sum(n), .groups = "drop") %>%
+
+  # total observations per dataset, used to rank the datasets
+  group_by(datasetname) %>%
+  mutate(order = sum(n_obs)) %>%
+  ungroup() %>%
+
+  # reorder once by total, largest first (-order avoids relying on reorder()'s `decreasing` argument)
+  mutate(datasetname = reorder(datasetname, -order)) %>%
+  arrange(datasetname) %>%
+
+  # numeric rank, taken after the factor levels are ordered
+  mutate(data_id = as.integer(datasetname)) %>%
+
+  # collapse ranks > 6 into "Other"; if_else() keeps the factor, ifelse() would not
+  mutate(
+    datasetname = if_else(
+      data_id > 6,
+      factor("Other", levels = c(levels(datasetname), "Other")),
+      datasetname
+    )
+  ) %>%
+
+  # recompute yearly totals after collapsing
+  group_by(datasetname, year) %>%
+  summarise(n_obs = sum(n_obs), .groups = "drop") %>%
+
+  # optional: total per dataset (not used by the plot below)
+  group_by(datasetname) %>%
+  mutate(order = sum(n_obs)) %>%
+  ungroup() %>%
+
+  # visualisation
+  ggplot(aes(x = factor(year), y = n_obs, fill = datasetname)) +
+  geom_col(colour = "black", linewidth = 0.2) +
+  labs(x = "", y = "Number of observations", fill = "") +
+  scale_fill_discrete(
+    labels = function(x) str_trunc(x, 40)
+  ) +
+  theme_bw(base_size = 12) +
+  theme(legend.position = "bottom",
+        legend.margin = margin(t = 0, r = 0, b = 5, l = -70),
+        legend.key.size = unit(0.5, "cm"),      # legend key size
+        legend.text = element_text(size = 7.5)) # legend text font size
+p_dataset_year
+```
+
+```{r}
+ggsave(file.path(out_path, "component_datasets_year.png"),
+       p_dataset_year,
+       width = 8, height = 6, dpi = 300)
+```
+
 We look at the number and proportion of species per dataset in the cube.
 
 ```{r}

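The top-6 collapse in the added chunk could possibly be written more compactly. The following is a minimal sketch, assuming the forcats package is available (the commit itself does not use it): `fct_lump_n()` with a weight keeps the `n` largest levels and lumps the rest into "Other". The `toy` data frame and its values are hypothetical stand-ins for `birdcubeflanders_dataset`, and `n = 2` replaces the 6 used in the commit so the toy data stays small.

```r
library(dplyr)
library(forcats)

# Hypothetical toy data standing in for birdcubeflanders_dataset
toy <- data.frame(
  datasetname = c("A", "A", "B", "C", "D"),
  year        = c(2017, 2018, 2017, 2018, 2017),
  n           = c(50, 40, 30, 5, 2)
)

collapsed <- toy %>%
  # keep the 2 datasets with the largest total n, lump the rest into "Other"
  mutate(datasetname = fct_lump_n(datasetname, n = 2, w = n,
                                  other_level = "Other")) %>%
  # sum observations per (collapsed) dataset and year
  count(datasetname, year, wt = n, name = "n_obs")

collapsed
```

Unlike the `reorder()` approach in the commit, this keeps the original level order for the retained datasets rather than sorting them by size, so an explicit reordering step would still be needed if the legend should run from largest to smallest.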