|
1 | 1 | --- |
2 | | -title: 'forcis: An R package for accessing, handling and analysing the FORCIS Foraminifera database' |
| 2 | +title: 'forcis: An R package for accessing, handling and analysing the FORCIS database' |
3 | 3 | tags: |
4 | 4 | - r |
5 | 5 | - database |
6 | 6 | - planktonic foraminifera |
7 | 7 | - biodiversity |
8 | 8 | - species abundance |
9 | 9 | - data visualisation |
10 | | -date: "9 June 2025" |
| 10 | +date: "19 September 2025" |
11 | 11 | output: pdf_document |
12 | 12 | authors: |
13 | 13 | - name: |
@@ -60,25 +60,238 @@ affiliations: |
60 | 60 |
|
61 | 61 | # Summary |
62 | 62 |
|
63 | | -... |
| 63 | +`forcis` is an R package designed to streamline access to the recently published |
| 64 | +FORCIS (Foraminifera Response to Climatic Stress) database [@chaabane2023]. This |
| 65 | +package enables users to easily download the database directly into an R |
| 66 | +environment, filter and select relevant data, convert species counts across |
| 67 | +formats, and visualise the results. |
64 | 68 |
|
65 | 69 |
|
66 | 70 | # Statement of need |
67 | 71 |
|
68 | | -... |
| 72 | +The recently developed FORCIS (Foraminifera Response to Climatic Stress) |
| 73 | +database provides one of the most comprehensive collections of global planktonic |
| 74 | +foraminifera living census data, comprising over 163,000 samples collected via |
| 75 | +various sampling devices: the Continuous Plankton Recorder (CPR), plankton nets, |
| 76 | +pumps, and sediment traps. These samples cover a wide temporal range (1910 to |
| 77 | +2018), a large depth range (surface to 5,000 m), and a broad geographic area |
| 78 | +[@chaabane2023; @degaridel2022]. FORCIS data are crucial |
| 79 | +for advancing insights into potential spatial and vertical migrations and |
| 80 | +understanding the impacts of global climate change on planktonic foraminifera |
| 81 | +biogeography and their seasonal and vertical distribution patterns observed in |
| 82 | +recent decades. Additionally, FORCIS’s long temporal scope offers a valuable |
| 83 | +resource for investigating the influence of anthropogenic changes on planktonic |
| 84 | +foraminifera distribution and ecology [@chaabane2024]. |
| 85 | + |
| 86 | +However, working with the FORCIS database presents significant challenges due |
| 87 | +to the heterogeneity of the data, which has been compiled from 140 sources, |
| 88 | +each using its own taxonomic framework and reporting formats (\autoref{fig:fig1}). This |
| 89 | +results in variability in data units, such as concentrations, frequencies, and |
| 90 | +raw counts, requiring extensive standardisation for meaningful comparison. |
| 91 | +Furthermore, the metadata associated with each sample — such as location, |
| 92 | +sampling depth, time, and environmental parameters — adds another layer of |
| 93 | +complexity, making data extraction and analysis challenging for users. |
| 94 | + |
| 95 | +{ width=100% } |
| 96 | + |
| 97 | + |
| 98 | +To overcome these obstacles, we developed the `forcis` package, an easy-to-use |
| 99 | +tool for accessing, filtering, harmonising, and visualising the FORCIS data |
| 100 | +within the R programming environment. The `forcis` package enables users to |
| 101 | +download the latest version of the FORCIS database directly from |
| 102 | +[Zenodo](https://doi.org/10.5281/zenodo.7390791), |
| 103 | +filter and select data according to user-specified criteria, harmonise |
| 104 | +taxonomic resolution, convert species counts into uniform units, and visualise |
| 105 | +patterns in diversity and abundance. By combining these features, the package |
| 106 | +enables researchers to access and analyse the data within the FORCIS database |
| 107 | +efficiently, streamlining their investigative efforts. |
| 108 | + |
69 | 109 |
|
70 | 110 |
|
71 | 111 | # Main features |
72 | 112 |
|
73 | | -... |
| 113 | +To facilitate efficient management and analysis of the FORCIS database, the |
| 114 | +`forcis` R package provides a comprehensive set of features fully described in |
| 115 | +the [package vignettes](https://docs.ropensci.org/forcis/articles/), where users can find extensive documentation and |
| 116 | +tutorials on the major features of the package. The recommended workflow and |
| 117 | +the relevant main functions are illustrated in \autoref{fig:fig2}. |
| 118 | + |
| 119 | +{ width=100% } |
| 120 | + |
| 121 | + |
| 122 | +## Download and import FORCIS database in R |
| 123 | + |
| 124 | +The `forcis` R package contains functions that simplify downloading and |
| 125 | +importing FORCIS datasets from [Zenodo](https://doi.org/10.5281/zenodo.7390791). |
| 126 | +The FORCIS database's most recent version can be retrieved using the function |
| 127 | +`download_forcis_db()`. |
| 128 | + |
| 129 | + |
| 130 | +```r |
| 131 | +# Create a data/ directory in the current directory ---- |
| 132 | +dir.create("data") |
| 133 | + |
| 134 | +# Download the latest version of the FORCIS database ---- |
| 135 | +download_forcis_db(path = "data", timeout = 300) |
| 136 | +``` |
| 137 | + |
| 138 | + |
| 139 | +The `read_*_data()` function family helps users import the dataset specific |
| 140 | +to a particular sampling device, enabling focused analyses. |
| 141 | + |
| 142 | + |
| 143 | +```r |
| 144 | +# Import plankton nets data (previously downloaded) ---- |
| 145 | +net_data <- read_plankton_nets_data(path = "data") |
| 146 | +``` |
| 147 | + |
| 148 | + |
| 149 | +Once the data are imported into R, users can reduce the dataset to include only |
| 150 | +the metadata they are interested in by using the function |
| 151 | +`select_forcis_columns()`. |
| 152 | + |
| 153 | + |
| 154 | +## Harmonising taxonomy |
| 155 | + |
| 156 | +To utilise most features of the `forcis` R package, users need to specify the |
| 157 | +taxonomic framework they wish to apply (\autoref{fig:fig2}). The FORCIS database includes |
| 158 | +counts at three different taxonomic levels: Original Taxonomy (OT), Lumped |
| 159 | +Taxonomy (LT), and Validated Taxonomy (VT). For a detailed explanation of the |
| 160 | +differences between these three taxonomic levels, we refer the reader to the |
| 161 | +FORCIS data descriptor [@chaabane2023]. To select the taxonomic |
| 162 | +framework of their choice, users can call the function `select_taxonomy()`, |
| 163 | +as in the example below: |
| 164 | + |
| 165 | + |
| 166 | +```r |
| 167 | +# Select a taxonomic framework ---- |
| 168 | +net_data_vt <- net_data |> |
| 169 | + select_taxonomy(taxonomy = "VT") |
| 170 | +``` |
| 171 | + |
| 172 | +## Filter data |
| 173 | + |
| 174 | +After selecting the taxonomic framework, the `forcis` R package offers multiple |
| 175 | +functions to efficiently subset the FORCIS datasets. Users may be interested in |
| 176 | +analysing community structure at a specific time, or location, or even |
| 177 | +examining the counts of species of interest. Given the wide range of potential |
| 178 | +research questions, we have implemented six filtering functions within the |
| 179 | +`filter_by_*()` function family, allowing users to customise data extraction |
| 180 | +according to their investigation needs (\autoref{fig:fig2}). |
| 181 | + |
| 182 | + |
| 183 | +```r |
| 184 | +# Filter data by year(s) ---- |
| 185 | +net_data_sub <- net_data_vt |> |
| 186 | + filter_by_year(years = 1992) |
| 187 | + |
| 188 | +# Filter data by spatial bounding box ---- |
| 189 | +net_data_sub <- net_data_vt |> |
| 190 | + filter_by_bbox(bbox = c(45, -61, 82, -24)) |
| 191 | + |
| 192 | +# Filter data by ocean name ---- |
| 193 | +net_data_sub <- net_data_vt |> |
| 194 | + filter_by_ocean(ocean = "Indian Ocean") |
| 195 | + |
| 196 | +# Filter data by species ---- |
| 197 | +net_data_sub <- net_data_vt |> |
| 198 | + filter_by_species(species = "n_pachyderma_VT") |
| 199 | +``` |
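
To make the bounding-box filter more concrete, the sketch below reproduces the idea in base R on a toy data frame. It is not the implementation used by `filter_by_bbox()`: the column names `lon` and `lat`, the sample values, and the assumed order of the four values (minimum longitude, minimum latitude, maximum longitude, maximum latitude) are illustrative assumptions only.

```r
# Toy sample positions (hypothetical values and column names) ----
samples <- data.frame(
  sample_id = c("s1", "s2", "s3"),
  lon = c(60, -30, 75),
  lat = c(-40, 10, -10),
  stringsAsFactors = FALSE
)

# Bounding box, assumed order: xmin, ymin, xmax, ymax ----
bbox <- c(45, -61, 82, -24)

# Flag samples whose coordinates fall inside the box ----
inside <- samples$lon >= bbox[1] & samples$lon <= bbox[3] &
  samples$lat >= bbox[2] & samples$lat <= bbox[4]

# Keep only the samples located inside the box ----
samples_sub <- samples[inside, ]
```

Under these assumptions, only the first toy sample lies inside the box, which illustrates how a spatial filter narrows a dataset down to a region of interest.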
| 200 | + |
| 201 | + |
| 202 | +## Transform data |
| 203 | + |
| 204 | +The `compute_*()` function family allows users to convert FORCIS data between |
| 205 | +raw abundance, number concentration, and relative abundance, enabling them to |
| 206 | +use the units that best suit their analyses and facilitating comparison between |
| 207 | +the FORCIS data and their own. |
| 208 | +These functions utilise sample metadata to perform unit conversions. |
| 209 | +Specifically, conversions from raw abundance to number concentration and |
| 210 | +relative abundance are calculated in the `forcis` R package for each taxon |
| 211 | +using the following equations: |
| 212 | + |
| 213 | +$$C_{number} = \frac{N_{raw}}{V_{filtered}}$$ |
| 214 | + |
| 215 | +where $C_{number}$ is the number concentration, $N_{raw}$ is the raw abundance |
| 216 | +(count of individuals), and $V_{filtered}$ is the volume of water filtered |
| 217 | +(in $m^3$ or L, depending on the dataset). |
| 218 | + |
| 219 | +$$Frequency = 100 \cdot \frac{N_{raw}}{N_{total}}$$ |
| 220 | + |
| 221 | +where $Frequency$ is the relative abundance (in percentage), $N_{raw}$ is the |
| 222 | +raw abundance (count of individuals) of a given taxon, and $N_{total}$ is the |
| 223 | +total raw abundance (sum of all individuals in the sample or subsample). |
| 224 | + |
| 225 | +Users can decide whether to convert counts at the sample or subsample level |
| 226 | +(see @chaabane2023) using the `aggregate` argument of the `compute_*()` |
| 227 | +functions. If `aggregate = TRUE`, the function will return the |
| 228 | +transformed counts of each species using the sample as the unit. If |
| 229 | +`aggregate = FALSE`, it will re-calculate the species' abundance by subsample. |
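
The two equations above can be checked by hand with a few lines of base R. The sketch below is a worked example on made-up numbers, not a call to the `compute_*()` functions: the taxon names, the counts, and the filtered volume are hypothetical, and the object names `counts` and `volume_filtered` are chosen for illustration.

```r
# Toy sample: hypothetical raw counts for three taxa (not FORCIS data) ----
counts <- data.frame(
  taxon = c("taxon_a", "taxon_b", "taxon_c"),
  n_raw = c(30, 50, 20),
  stringsAsFactors = FALSE
)

# Hypothetical volume of filtered water (in m^3) ----
volume_filtered <- 250

# Number concentration: C_number = N_raw / V_filtered ----
counts$concentration <- counts$n_raw / volume_filtered

# Relative abundance: Frequency = 100 * N_raw / N_total ----
counts$frequency <- 100 * counts$n_raw / sum(counts$n_raw)

# Display the converted values ----
print(counts)
```

In this toy sample the frequencies sum to 100, and each concentration is the raw count divided by the same filtered volume, as the equations require.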
| 230 | + |
| 231 | + |
| 232 | + |
| 233 | +## Visualisation |
| 234 | + |
| 235 | +The `forcis` package also includes multiple functions to visualise the spatial |
| 236 | +distribution of samples selected by users. The `ggmap_data()` function |
| 237 | +generates publication-ready maps, displaying sample locations at a global scale |
| 238 | +(\autoref{fig:fig3}a). Additionally, users can visualise sample records by various time |
| 239 | +units (season, month, year) and by depth, using the functions from the |
| 240 | +`plot_record_by_*()` function family (\autoref{fig:fig3}b-d). |
| 241 | +These functions can be seamlessly combined with the `filter_by_*()` family of |
| 242 | +functions, allowing users to customise their sample selections according to |
| 243 | +their specific research needs. |
| 244 | + |
| 245 | +{ width=100% } |
| 246 | + |
| 247 | +```r |
| 248 | +# Map raw net data ---- |
| 249 | +ggmap_data(net_data) |
| 250 | + |
| 251 | +# Plot number of records by year of sampling ---- |
| 252 | +plot_record_by_year(net_data) |
| 253 | + |
| 254 | +# Plot number of records by month of sampling ---- |
| 255 | +plot_record_by_month(net_data) |
| 256 | + |
| 257 | +# Plot number of records by depth of sampling ---- |
| 258 | +plot_record_by_depth(net_data) |
| 259 | +``` |
| 260 | + |
| 261 | + |
| 262 | +`forcis` provides five vignettes to learn more about the package: |
| 263 | + |
| 264 | +- the [Get started](https://docs.ropensci.org/forcis/articles/forcis.html) |
| 265 | +vignette describes the core features of the package |
| 266 | +- the [Database versions](https://docs.ropensci.org/forcis/articles/database-versions.html) |
| 267 | +vignette provides information on how to deal with the versioning of the database |
| 268 | +- the [Select and filter data](https://docs.ropensci.org/forcis/articles/select-and-filter-data.html) vignette shows examples to handle the FORCIS data |
| 269 | +- the [Data conversion](https://docs.ropensci.org/forcis/articles/data-conversion.html) |
| 270 | +vignette describes the conversion functions available in `forcis` to compute abundances, concentrations, and frequencies |
| 271 | +- the [Data visualization](https://docs.ropensci.org/forcis/articles/data-visualization.html) |
| 272 | +vignette describes the plotting functions available in `forcis` |
74 | 273 |
|
75 | 274 |
|
76 | 275 | # Acknowledgements |
77 | 276 |
|
78 | | -... |
| 277 | +The FORCIS project is supported by the French Foundation for Biodiversity |
| 278 | +Research ([FRB](https://www.fondationbiodiversite.fr)) through its Centre for |
| 279 | +the Synthesis and Analysis of Biodiversity |
| 280 | +([CESAB](https://www.fondationbiodiversite.fr/en/about-the-foundation/le-cesab/)) |
| 281 | +and co-funded by the INSU LEFE program and the Max Planck Institute for Chemistry |
| 282 | +(MPIC) in Mainz. M.G. was supported by a Juan de la Cierva-formacion 2021 |
| 283 | +fellowship (FJC2021–047494-I/MCIN/AEI/10.13039/501100011033) from the European |
| 284 | +Union “NextGenerationEU”/PRTR and by the Beatriu de Pinós programme |
| 285 | +(2022 BP 00209) funded by the Direcció General de Recerca (DGR) del Departament |
| 286 | +de Recerca i Universitats (REU) of the Government of Catalonia. In addition, |
| 287 | +his work received support from the French government under the France 2030 |
| 288 | +investment plan, as part of the Initiative d’Excellence d’Aix-Marseille |
| 289 | +Université (A*MIDEX AMX-20-TRA-029). The authors would like to thank Beatriz |
| 290 | +Milz, Scott Chamberlain and Air Forbes for their valuable comments during the |
| 291 | +peer review process in |
| 292 | +[rOpenSci](https://github.com/ropensci/software-review/issues/660). |
79 | 293 |
|
80 | 294 |
|
81 | 295 |
|
82 | | -# References |
83 | 296 |
|
84 | | -... |
| 297 | +# References |