Commit 9e6766c

Working with rpackages (#155)
Added walkthrough for working with R packages. Please feel free to edit/change things as you see fit!

Co-authored-by: Mahesh Binzer-Panchal <mahesh.binzer-panchal@nbis.se>
1 parent 22dbff4 commit 9e6766c

2 files changed

+254
-0
lines changed

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -15,3 +15,6 @@ results/
 
 # Misc
 data/
+.Rproj.user
+*.Rproj
+.Rhistory
Lines changed: 251 additions & 0 deletions
@@ -0,0 +1,251 @@
1+
---
title: "Working with R packages"
author:
  - "Dania Machlab"
  - "Mahesh Binzer-Panchal"
date: "2026-02-12"
date-modified: last-modified
categories: ["R", "R packages", "Version controlling with Bioconductor releases"]
---
10+
11+
When working with R packages, it is good practice to version control both the R version and the package versions in use. It is also useful to keep enough flexibility for interactive work, since much time is spent testing and refining scripts: one needs to be able to quickly install new packages to try out, or to pick up the latest versions with bug fixes. In bioinformatics, we often rely on the packages and data structures from [Bioconductor](https://bioconductor.org) to facilitate our analyses. This walkthrough covers recommended practices for working with R packages from that point of view, and shows how to couple installations to specific Bioconductor releases.
12+
13+
## R package sources
14+
15+
R packages can be installed from several sources including CRAN, GitHub, GitLab, R-Universe, and Bioconductor. Packages like `utils`, `devtools`, `remotes` and `BiocManager` offer functions to install packages from these sources. More details on how to install and manage R packages are covered in the next section.
16+
17+
### CRAN
18+
19+
The Comprehensive R Archive Network (CRAN) contains over 23 thousand packages from all kinds of fields and applications, not just bioinformatics. Packages are submitted to CRAN as source tarballs, and old source packages are kept in a public archive.
20+
21+
### R-Universe
22+
23+
This software infrastructure project from [rOpenSci](https://docs.r-universe.dev) has been recognized by the R Consortium as critical infrastructure since [2024](https://r-consortium.org/posts/r-universe-named-r-consortiums-newest-top-level-project/). R packages hosted in git repositories are built continuously, so the binaries are always in sync with the source package. This includes CRAN and Bioconductor packages as well.
24+
25+
### Bioconductor
26+
27+
Bioconductor provides a coordinated distribution of packages that are tested, versioned and released together. There are two release cycles per year, approximately 6 months apart, and each is tied to a specific R version. Bug fixes are possible on the current release, whereas more active package development happens on the `devel` branch, which becomes the next release. Build [reports](https://bioconductor.org/checkResults/) are also made available every few days.
28+
29+
## Installing and managing R packages
30+
31+
When working with and [managing](https://cran.r-project.org/web/packages/BiocManager/vignettes/BiocManager.html) Bioconductor packages, it is good practice to make sure they all come from the same Bioconductor release, to avoid incompatibilities between packages. It is equally important to use the R version that matches the release being used. The `BiocManager` package offers useful functions to do these checks.
32+
33+
``` r
## check for the version of Bioconductor currently in use
BiocManager::version()

## check for packages that are out-of-date or from unexpected versions
BiocManager::valid()

## check for available packages on Bioconductor
BiocManager::available()
```
43+
44+
To allow several Bioconductor versions to coexist on the same computer, it is good practice to create a library path for each combination of Bioconductor release and R version. One then calls `BiocManager::install()` with the `version` argument set to the desired (current) Bioconductor release to install all R packages, including CRAN packages, into this path. This keeps a degree of freedom when working interactively, while every package (Bioconductor or CRAN) is still tied to the chosen Bioconductor release. For example, on a MacBook with arm64 architecture, this library path would be `~/Library/R/arm64/4.5-Bioc-3.21/library`.
45+
46+
To later use the installed packages, set the environment variable `R_LIBS_USER` to this path before invoking R. Alternatively, once in R, one can use `.libPaths()` to add the library path. `R_LIBS_USER` can also be set in the `.Renviron` file, which contains environment variables to be set in R sessions. The `usethis` package contains a helper function called `edit_r_environ()` to edit this file. Keep this file in mind when switching between different R versions and library paths, and edit it accordingly.
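The library-path convention described above can be sketched in a few lines of base R. This is only an illustration: `bioc_lib_path()` and `use_bioc_library()` are hypothetical helper names, not part of `BiocManager`, and the `~/Library/R/arm64` root is simply the macOS example used in this walkthrough.

``` r
## Hypothetical helper: build "<root>/<R major.minor>-Bioc-<release>/library"
bioc_lib_path <- function(bioc, root = "~/Library/R/arm64") {
    r_minor <- strsplit(R.version$minor, ".", fixed = TRUE)[[1]][1]
    r_series <- paste(R.version$major, r_minor, sep = ".")
    file.path(root, paste0(r_series, "-Bioc-", bioc), "library")
}

## Hypothetical helper: create the library if needed and put it first
## on the library search path of the current session
use_bioc_library <- function(bioc, root = "~/Library/R/arm64") {
    lib <- path.expand(bioc_lib_path(bioc, root))
    dir.create(lib, recursive = TRUE, showWarnings = FALSE)
    .libPaths(c(lib, .libPaths()))
    invisible(lib)
}

## Only builds the path; nothing is created on disk here
bioc_lib_path("3.21")
```

Calling `use_bioc_library("3.21")` at the top of an interactive session would then have a similar effect to setting `R_LIBS_USER` for that session. Deriving the R series from the running session, rather than hard-coding it, keeps the path consistent with whichever R version was launched.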
47+
48+
Below is an illustration, within an R session, of how `BiocManager::install()` can be used to install R packages. For packages that need to be installed from GitHub, `BiocManager::install()` uses `remotes::install_github()`.
49+
50+
``` r
## set params
bioc <- "3.21" # with R 4.5
libPath <- "~/Library/R/arm64/4.5-Bioc-3.21/library"

## first time install BiocManager
#install.packages("BiocManager", lib = libPath)

## packages (CRAN, Bioconductor and GitHub)
pkgs <- c("Matrix", "SingleCellExperiment", "scuttle", "tidyverse",
          "BiocParallel", "scran", "tidyr", "ggplot2", "patchwork",
          "limma", "cowplot", "scater", "JASPAR2024",
          "DescTools", "monaLisa", "JASPAR2020", "TFBSTools",
          "BSgenome.Mmusculus.UCSC.mm10", "DEXSeq", "GenomicAlignments")

## install pkgs (without updating other packages -
## this may be changed later to apply updates like bug fixes)
BiocManager::install(pkgs = pkgs,
                     update = FALSE,
                     ask = TRUE,
                     checkBuilt = FALSE,
                     force = FALSE,
                     version = bioc,
                     lib = libPath)
```
75+
76+
Also of note is the `BiocArchive` package, which installs CRAN package versions consistent with older releases of Bioconductor. The example below shows how to install packages compatible with release 3.14.
77+
78+
```{.r}
## install packages matching previous bioconductor release
libPath314 <- "~/Library/R/arm64/4.1-Bioc-3.14/library"
BiocArchive::install(pkgs = pkgs,
                     update = FALSE,
                     ask = TRUE,
                     checkBuilt = FALSE,
                     force = FALSE,
                     version = "3.14",
                     lib = libPath314)
```
89+
90+
### To update or not to update?
91+
92+
When using packages, it is useful to keep up to date with the latest releases, which address known issues and bugs. The stage your project is in may also influence whether or not you update your R packages: at later stages, for example, one may want to avoid updates that could break existing code. If you are worried about big changes affecting your code, consider reading the release notes and testing the update in a separate environment (e.g. a Docker container). Below are some arguments *for* and *against* updating, as presented in the [vignette](https://cran.r-project.org/web/packages/BiocManager/vignettes/BiocManager.html) for the `BiocManager` package:
93+
94+
Pros:

- Bug fixes
- Performance
- New features
- Compatibility
- Security
- Documentation

Cons:

- Code breakage
- Version conflicts
- Workflow disruption
- Learning curve
- Temporary instability
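
One low-effort way to keep the "code breakage" risk visible is to record package versions before an update and compare them afterwards. The sketch below uses only base R; `snapshot_versions()` and `changed_versions()` are hypothetical helper names introduced for illustration, and the toy package names and versions are made up.

``` r
## Hypothetical helper: record package names and versions for one library
snapshot_versions <- function(lib = .libPaths()[1]) {
    ip <- installed.packages(lib.loc = lib)
    data.frame(Package = ip[, "Package"], Version = ip[, "Version"],
               row.names = NULL, stringsAsFactors = FALSE)
}

## Hypothetical helper: packages that were added, removed or changed version
changed_versions <- function(before, after) {
    m <- merge(before, after, by = "Package", all = TRUE,
               suffixes = c(".before", ".after"))
    differs <- is.na(m$Version.before) | is.na(m$Version.after) |
        m$Version.before != m$Version.after
    m[differs, , drop = FALSE]
}

## toy example with made-up package names and versions
before <- data.frame(Package = c("pkgA", "pkgB"),
                     Version = c("1.0.0", "2.3.1"),
                     stringsAsFactors = FALSE)
after <- data.frame(Package = c("pkgA", "pkgB", "pkgC"),
                    Version = c("1.0.0", "2.4.0", "0.1.0"),
                    stringsAsFactors = FALSE)
changed_versions(before, after)
```

In the toy example, `pkgB` is reported because its version changed and `pkgC` because it is new, while the unchanged `pkgA` is dropped. In practice, one would call `snapshot_versions()` on the project library before and after running an update.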
110+
111+
### Working as an R Bioconductor package developer
112+
113+
`BiocManager::install()` also allows packages that live on the `devel` branch to be installed by setting `version = "devel"`. This is useful when you want to use a new package from the `devel` branch which is not yet part of an official release. On the other hand, as a package developer and maintainer, it also lets you test modifications to your package against the rest of the packages on the `devel` branch.
114+
115+
## rig
116+
117+
[`rig`](https://github.com/r-lib/rig) may be used to install specific R versions, as well as to launch RStudio with the desired R version. The commands below, which are run from the terminal, illustrate its use on a MacBook with arm64.
118+
119+
``` bash
## list current R installations
rig list

## list available R versions under arm64
rig available --arch arm64

## add R 4.5.2 under arm64
rig add --arch arm64 4.5.2

## launch RStudio with R-4.5.2
rig rstudio 4.5-arm64
```
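
Once R has been launched, it can be worth confirming from within the session that the intended R series is actually the one running, for instance at the top of a script. A minimal base R sketch, where `running_r_series()` and `check_r_series()` are hypothetical helper names:

``` r
## Hypothetical helper: "major.minor" series of the running R, e.g. "4.5"
running_r_series <- function() {
    minor <- strsplit(R.version$minor, ".", fixed = TRUE)[[1]][1]
    paste(R.version$major, minor, sep = ".")
}

## Hypothetical helper: stop early if the session uses another R series
check_r_series <- function(expected) {
    running <- running_r_series()
    if (!identical(running, expected)) {
        stop(sprintf("expected R %s.x, but this session runs R %s (%s)",
                     expected, running, R.home()), call. = FALSE)
    }
    invisible(TRUE)
}

## report what is running; a script would call e.g. check_r_series("4.5")
message("R series: ", running_r_series(), " | library paths: ",
        paste(.libPaths(), collapse = ", "))
```

Failing early with a clear message is usually easier to debug than a package that cannot be loaded because it was built for a different R version.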
132+
133+
## Writing R scripts and using `Snakemake`
134+
135+
Once in RStudio, and using the library paths we have created, we can work interactively on our `.R`, `.Rmd` or `.qmd` scripts. Values that should vary between runs can be defined as parameters: we set them in the Snakefile and pass them on when running or rendering the script. In a `.qmd` file, the `params` YAML option is used to do [this](https://quarto.org/docs/computations/parameters.html). It is also useful to produce both pdf and png versions of your plots and figures, and to print the date and session information at the end of the script with `date()` and `sessionInfo()`. In a `.qmd` file, png and pdf versions of the figures can be produced by setting global chunk options in the YAML header as follows:
136+
137+
``` yaml
knitr:
  opts_chunk:
    dev:
      - png
      - pdf
```
144+
145+
It is generally good practice to use a workflow management system in your analyses, to keep track of changes and dependencies. Here we rely on [Snakemake](https://snakemake.readthedocs.io/en/stable/) and illustrate how our `.qmd` script can be rendered there. An example rule in our Snakefile called "qcFromCellranger" is depicted below.
146+
147+
``` python
Rscript="/Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/bin/Rscript" # <1>
RBiocLib="~/Library/R/arm64/4.5-Bioc-3.21/library" # <1>

rule qcFromCellranger: # <2>
    input: # <2>
        qmd = "scripts/01_cellrangerQC.qmd" # <2>
    output: # <2>
        html = "scripts/01_cellrangerQC.html", # <2>
        outMetaFile = "generatedFiles/01_metadata.txt" # <2>
    shell: # <2>
        ''' # <2>
        # wdir # <2>
        cd {basedir} && \ # <2>
        # <2>
        # setup needed env variables to run quarto with specific R and Rlibs # <2>
        export QUARTO_R={Rscript} && \ # <2>
        export R_LIBS_USER={RBiocLib} && \ # <2>
        # <2>
        # render with quarto # <2>
        quarto render {input.qmd} -P basedir:{basedir} \ # <2>
        -P metaOriginalFile:{originalMetaFile} -P metaExtraFile:{generatedMetaFile} \ # <2>
        -P outMetaFile:{output.outMetaFile} # <2>
        '''
```
172+
173+
1. This would normally be added to the `config.yaml` file for `Snakemake`.
2. This is the "qcFromCellranger" rule defined in the Snakefile.
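
Inside the `.qmd`, values passed with `-P` are available through the `params` list when the knitr engine is used, with the corresponding keys typically declared under `params` in the YAML header. The chunk below is a sketch of how the end of such a script might look. The placeholder table, the fallback defaults and the way the output path is assembled are assumptions for illustration, not the contents of `01_cellrangerQC.qmd`.

``` r
## Fall back to illustrative defaults when run outside `quarto render`
## (assumption: during rendering, `params` is supplied by the knitr engine)
if (!exists("params")) {
    params <- list(basedir = tempdir(),
                   outMetaFile = "generatedFiles/01_metadata.txt")
}

## Assumed layout: the output path is relative to the base directory
outFile <- file.path(params$basedir, params$outMetaFile)
dir.create(dirname(outFile), recursive = TRUE, showWarnings = FALSE)

## Placeholder metadata table standing in for the real analysis result
meta <- data.frame(sample = c("s1", "s2"), nCells = c(100L, 120L))
write.table(meta, outFile, sep = "\t", quote = FALSE, row.names = FALSE)

## Record when and with which package versions the script was run
date()
sessionInfo()
```

Writing the file named by `outMetaFile` matters here because Snakemake checks that every declared output exists once the rule has finished.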
175+
176+
## Managing packages with the renv package
177+
178+
`renv` is a package manager for R that helps to manage package dependencies. It is a useful tool for ensuring that your R code is reproducible.
179+
180+
::: {.callout-warning}
If you install R with Conda / Pixi, do not use `renv` to version packages. Instead, use the prebuilt packages that Conda provides. They are often labeled as `r-packagename`, e.g. `r-base`, `r-tidyverse`, etc. Trying to use `renv` in a complex R environment will likely lead to compilation headaches as architecture strings may mismatch, leading to failures of package installations as packages are built from source.
:::
186+
187+
### Initialization
188+
189+
Install `renv` and initialize it.
190+
191+
```{.r}
install.packages("renv")
renv::init(bioconductor = "3.22") # <1>
```
195+
196+
1. Pins Bioconductor to a specific release. The Bioconductor release must be compatible with the current R version, so using `rig` to manage R versions is advantageous.
197+
198+
This creates the files that `renv` needs to work, such as `renv.lock`, `.Rprofile` and `renv/activate.R`, which should be included in your version control system.
199+
200+
:::{.callout-warning}
Your R session must be restarted for the changes in `.Rprofile` to take effect after `renv::init()`. This is handled automatically in RStudio.
:::
203+
204+
### Package installation
205+
206+
As you work, install packages with an `renv` compatible method, and then snapshot the environment. This installs packages to the renv cache, from which they are linked (or copied, where links are unavailable) into the renv project library.
208+
209+
```{.r}
renv::install("package_name1")
renv::install("package_name2")
# If you prefer pacman
pacman::p_load("package_name1","package_name2", "package_name3")
# Check your code still works
# then update the lockfile
renv::snapshot()
```
218+
219+
:::{.callout-tip}
Use `renv::status()` to check if your environment is up to date with the lockfile.
:::
222+
223+
### Restoring packages
224+
225+
If you are working on a project for the first time, or if you are working on a project that someone else has shared with you, you can use `renv::restore()` to install the packages that are needed for the project. This will install the packages that are listed in the `renv.lock` file.
226+
```{.r}
renv::restore()
```
229+
230+
### Other dependencies
231+
232+
`renv` doesn't manage non-R dependencies like `pandoc`, for example, or system-level dependencies like `libxml2` or `zlib`. Docker is often used to manage these dependencies instead. The `renv.lock` is copied inside the container, and `renv::restore()` is run there to produce the same environment as on the host machine.
235+
236+
### Additional resources
237+
238+
- [Introduction to renv](https://rstudio.r-universe.dev/articles/renv/renv.html)
- [Using renv with Bioconductor](https://rstudio.github.io/renv/articles/bioconductor.html)
- [How renv restores packages from R-Universe](https://ropensci.org/blog/2022/01/06/runiverse-renv/)
- [Docker best practices for R developers](https://collabnix.com/10-essential-docker-best-practices-for-r-developers-in-2025/)
242+
243+
## Other mentions
244+
245+
Additional resources to check for managing and using R packages are:
246+
247+
- [pixi](https://pixi.prefix.dev/latest/)
- [Bioconductor Docker containers](https://bioconductor.org/help/docker/)
- [Seqera's Wave](https://seqera.io/wave/)
250+
251+
It is worth keeping in mind that some of these do not allow the Bioconductor release to be specified when installing packages, and that [Bioconda](https://bioconda.github.io) can lag behind in terms of package versions, with some of the most recent versions missing.
