---
title: "Working with R packages"
author:
  - "Dania Machlab"
  - "Mahesh Binzer-Panchal"
date: "2026-02-12"
date-modified: last-modified
categories: ["R", "R packages", "Version controlling with Bioconductor releases"]
---

When working with R packages, it is good practice to version-control the R version and the package versions being used. This should still leave enough flexibility for interactive work. Much of the time is spent testing and refining scripts, and during that phase you often need to quickly install new packages to try out, or pick up the latest releases that contain bug fixes. When working with R in bioinformatics, we often rely on the packages and data structures from [Bioconductor](https://bioconductor.org). This walkthrough describes recommended practices for working with R packages from that perspective, and for tying installations to specific Bioconductor releases.

## R package sources

R packages can be installed from several sources, including CRAN, GitHub, GitLab, R-Universe, and Bioconductor. Packages such as `utils` (which provides `install.packages()`), `devtools`, `remotes` and `BiocManager` offer functions to install packages from these sources. Installing and managing R packages is covered in more detail in the next section.
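GitHub and GitLab do not get their own subsection below. The following is a minimal sketch that assumes the `remotes` package. A package is installed directly from its repository, which is given as `"user/repo"`. The repository `r-lib/R6` is only an example.

```r
## remotes itself comes from CRAN
install.packages("remotes", repos = "https://cloud.r-project.org")

## install the current state of a GitHub repository, given as "user/repo"
## (r-lib/R6 is only an example)
remotes::install_github("r-lib/R6")

## GitLab repositories use the same "user/repo" form, e.g.
## remotes::install_gitlab("user/repo")   # placeholder, not a real repository
```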

### CRAN

The Comprehensive R Archive Network (CRAN) contains over 23 thousand packages from all kinds of fields and applications, not just bioinformatics. Packages are submitted to CRAN as source tarballs, and older versions of each package are kept in a public archive.
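The following is a minimal sketch of both cases:

- `install.packages()` installs the current CRAN version of a package.
- `remotes::install_version()` installs an older version from the CRAN archive.

The package names and the version `1.1.1` are only examples. A CRAN mirror must be set explicitly when the code runs non-interactively, for example with `Rscript`.

```r
## choose a CRAN mirror (required in non-interactive sessions)
options(repos = c(CRAN = "https://cloud.r-project.org"))

## latest version available on CRAN
install.packages("patchwork")

## a specific older version, fetched from the CRAN archive
install.packages("remotes")
remotes::install_version("cowplot", version = "1.1.1")
packageVersion("cowplot")
```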

### R-Universe

R-Universe, a software infrastructure project from [rOpenSci](https://docs.r-universe.dev), has been recognized by the R Consortium as critical infrastructure since [2024](https://r-consortium.org/posts/r-universe-named-r-consortiums-newest-top-level-project/). It continuously builds R packages from their git repositories, so the binaries are always in sync with the source package. CRAN and Bioconductor packages are included as well.
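The following is a minimal sketch of installing from an R-Universe. Each universe is a regular CRAN-like repository at `https://<name>.r-universe.dev`. The universe URL is listed before CRAN, so the package comes from the universe while CRAN serves as a fallback for dependencies. The `jeroen` universe and the `jsonlite` package are only examples.

```r
repos <- c(universe = "https://jeroen.r-universe.dev",
           CRAN = "https://cloud.r-project.org")

## list a few of the packages (and their versions) this universe provides
head(available.packages(repos = repos["universe"])[, "Version"])

## install from the universe, with CRAN as a fallback for dependencies
install.packages("jsonlite", repos = repos)
```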

### Bioconductor

Bioconductor provides a coordinated distribution of packages that are tested, versioned and released together. There are two releases per year, approximately six months apart, and each release is tied to a specific R version. The current release receives bug fixes, whereas more active package development happens on the `devel` branch, which will become the next release. Build [reports](https://bioconductor.org/checkResults/) are also made available every few days.

## Installing and managing R packages

When working with and [managing](https://cran.r-project.org/web/packages/BiocManager/vignettes/BiocManager.html) Bioconductor packages, it is good practice to make sure they all come from the same Bioconductor release, to avoid unnecessary incompatibilities. It is also important to use the R version that matches the release being used. The `BiocManager` package offers useful functions for these checks.

``` r
## check the version of Bioconductor currently in use
BiocManager::version()

## check for packages that are out of date or from unexpected versions
BiocManager::valid()

## list the packages available for this Bioconductor version
## (Bioconductor and CRAN repositories)
BiocManager::available()
```

To keep several Bioconductor versions on the same computer, it is good practice to create a dedicated library path for each combination of Bioconductor release and R version. Install all R packages, including CRAN packages, into this path with `BiocManager::install()`, setting its `version` argument to the desired (current) Bioconductor release. You keep the freedom to install new packages while working interactively, and every installed package, including CRAN packages, is still tied to a Bioconductor release. For example, on a Mac with arm64 (Apple silicon) architecture, this library path could be `~/Library/R/arm64/4.5-Bioc-3.21/library`.
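The naming scheme for these library paths can be captured in a small helper. This is a minimal sketch with two assumptions:

- The function name `bioc_lib_path()` is hypothetical.
- The `~/Library/R/<arch>` root mirrors the macOS example above and would differ on other systems.

```r
## Hypothetical helper: "<root>/<arch>/<R major.minor>-Bioc-<release>/library"
bioc_lib_path <- function(bioc,
                          r_version = getRversion(),
                          arch = Sys.info()[["machine"]],   # "arm64" on Apple silicon
                          root = "~/Library/R") {
  ## keep only "major.minor" of the R version, e.g. "4.5.2" -> "4.5"
  r_major_minor <- sub("^([0-9]+\\.[0-9]+).*$", "\\1", as.character(r_version))
  file.path(root, arch, paste0(r_major_minor, "-Bioc-", bioc), "library")
}

bioc_lib_path("3.21", r_version = "4.5.2", arch = "arm64")
## "~/Library/R/arm64/4.5-Bioc-3.21/library"
bioc_lib_path("3.14", r_version = "4.1.3", arch = "arm64")
## "~/Library/R/arm64/4.1-Bioc-3.14/library"
```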

To use the installed packages later, set the environment variable `R_LIBS_USER` to this path before starting R. Alternatively, once in R, `.libPaths()` can be used to add the library path. `R_LIBS_USER` can also be set in the `.Renviron` file, which defines environment variables for R sessions. The `usethis` package provides the helper `edit_r_environ()` to edit this file. Keep this file in mind when switching between R versions and library paths, and update it as needed.
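The following is a minimal sketch of both options, using the example library path from above:

- `.libPaths()` applies to the current session.
- An `R_LIBS_USER` line in `.Renviron` applies to every new session.

```r
libPath <- "~/Library/R/arm64/4.5-Bioc-3.21/library"   # example path

## .libPaths() silently drops directories that do not exist, so create it first
dir.create(libPath, recursive = TRUE, showWarnings = FALSE)

## prepend: packages are now installed to, and loaded from, libPath first
.libPaths(c(libPath, .libPaths()))
.libPaths()[1]          # the expanded, normalized form of libPath

## value that R_LIBS_USER had when this session started
Sys.getenv("R_LIBS_USER")

## persistent alternative: add this line to ~/.Renviron
## (e.g. via usethis::edit_r_environ()) and restart R
## R_LIBS_USER=~/Library/R/arm64/4.5-Bioc-3.21/library
```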

Below is an illustration, within an R session, of how `BiocManager::install()` can be used to install R packages. For packages that need to be installed from GitHub (given as `"user/repo"`), `BiocManager::install()` uses `remotes::install_github()`.

``` r
## set params
bioc <- "3.21" # with R 4.5
libPath <- "~/Library/R/arm64/4.5-Bioc-3.21/library"

## create the library if needed and put it first on the search path,
## so that packages installed there (incl. BiocManager) can be loaded
dir.create(libPath, recursive = TRUE, showWarnings = FALSE)
.libPaths(c(libPath, .libPaths()))

## first time install BiocManager
#install.packages("BiocManager", lib = libPath)

## packages (CRAN and Bioconductor; GitHub packages would be given as "user/repo")
pkgs <- c("Matrix", "SingleCellExperiment", "scuttle", "tidyverse",
          "BiocParallel", "scran", "tidyr", "ggplot2", "patchwork",
          "limma", "cowplot", "scater", "JASPAR2024",
          "DescTools", "monaLisa", "JASPAR2020", "TFBSTools",
          "BSgenome.Mmusculus.UCSC.mm10", "DEXSeq", "GenomicAlignments")

## install pkgs (without updating other packages -
## this may be changed later to apply updates like bug fixes)
BiocManager::install(pkgs = pkgs,
                     update = FALSE,
                     ask = TRUE,
                     checkBuilt = FALSE,
                     force = FALSE,
                     version = bioc,
                     lib = libPath)
```

Also of note is the `BiocArchive` package, which installs CRAN package versions consistent with older Bioconductor releases. The example below installs packages compatible with release 3.14. That release expects the matching R version (R 4.1, as reflected in the library path).

```{.r}
## install packages matching a previous Bioconductor release
## (run in an R 4.1 session; pkgs is the vector defined above)
libPath314 <- "~/Library/R/arm64/4.1-Bioc-3.14/library"
dir.create(libPath314, recursive = TRUE, showWarnings = FALSE)
.libPaths(c(libPath314, .libPaths()))
BiocArchive::install(pkgs = pkgs,
                     update = FALSE,
                     ask = TRUE,
                     checkBuilt = FALSE,
                     force = FALSE,
                     version = "3.14",
                     lib = libPath314)
```

### To update or not to update?

It is useful to keep up to date with the latest package releases, which address issues and fix bugs. How far along your project is may also influence whether or not you update your R packages. At later stages, for example, you may want to avoid updates that could break your code. If you are worried about major changes affecting your code, read the release notes and consider testing the updates in a separate environment (e.g. a Docker container). Below are some arguments *for* and *against* updating, as presented in the [vignette](https://cran.r-project.org/web/packages/BiocManager/vignettes/BiocManager.html) of the `BiocManager` package:

Pros:

- Bug fixes
- Performance
- New features
- Compatibility
- Security
- Documentation

Cons:

- Code breakage
- Version conflicts
- Workflow disruption
- Learning curve
- Temporary instability
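Before deciding, it can help to see what an update would actually change. This is a minimal sketch that assumes the example library path from above. Base R's `old.packages()` compares the installed versions with those in `BiocManager::repositories()`, i.e. the Bioconductor and CRAN repositories for the Bioconductor version in use. The update itself is left commented out.

```r
## example release-specific library from above
libPath <- "~/Library/R/arm64/4.5-Bioc-3.21/library"

## packages in libPath that have newer versions in the repositories of the
## Bioconductor version in use (Bioconductor + CRAN)
outdated <- old.packages(lib.loc = libPath,
                         repos = BiocManager::repositories())

if (is.null(outdated)) {
  message("No updates available for ", libPath)
} else {
  ## one row per package: installed version vs. version in the repositories
  print(outdated[, c("Package", "Installed", "ReposVer"), drop = FALSE])
}

## after deciding to update, apply the updates to this library, e.g.
## BiocManager::install(update = TRUE, ask = TRUE, version = "3.21", lib = libPath)
```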

### Working as an R Bioconductor package developer

`BiocManager::install()` also allows installing packages from the `devel` branch by setting `version = "devel"`. This is useful when you want to use a new package from the `devel` branch that is not yet part of an official release. As a package developer and maintainer, it also lets you test modifications to your package against the rest of the packages on the `devel` branch. As with any release, `devel` expects a matching R version, so it is best kept in its own library path.

## rig

[`rig`](https://github.com/r-lib/rig) can be used to install specific R versions, and to launch RStudio with the desired R version. The commands below are run from the terminal and illustrate its use on a Mac with arm64 (Apple silicon) architecture.

``` bash
## list current R installations
rig list

## list available R versions for arm64
rig available --arch arm64

## add R 4.5.2 for arm64
rig add --arch arm64 4.5.2

## launch RStudio with R 4.5.2
rig rstudio 4.5-arm64
```

## Writing R scripts and using `Snakemake`

Once in RStudio, and using the library paths we have created, we can work interactively on our `.R`, `.Rmd` or `.qmd` scripts. We can define a set of parameters that are passed into our scripts as variables. In this framework, those parameters are set in the Snakefile and passed on when running or rendering the script. In a `.qmd` file, the `params` YAML option is used to do [this](https://quarto.org/docs/computations/parameters.html). Two further practices are useful:

- producing both pdf and png versions of your plots and figures;
- printing the date and session information at the end of the script with `date()` and `sessionInfo()`.

The code below shows how to produce both png and pdf versions of the figures in a `.qmd` file, by setting global chunk options in the YAML header:

``` yaml
knitr:
  opts_chunk:
    dev:
      - png
      - pdf
```
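The following is a minimal sketch of the parameter handling and session information described above, as they could appear in the first and last R chunks of a `.qmd` file. It makes three assumptions:

- The parameter name `outMetaFile` mirrors the `-P outMetaFile:...` argument in the Snakemake rule below, and its default value is only a placeholder.
- The parameter would typically also be declared under `params:` in the YAML header.
- When the document is rendered, the values come from `params`. The fallback keeps the chunks runnable when they are executed interactively and `params` does not exist.

```r
## first chunk: read parameters, with a fallback for interactive runs
## (outMetaFile mirrors the Snakemake rule; the default is a placeholder)
if (!exists("params")) {
  params <- list(outMetaFile = "generatedFiles/01_metadata.txt")
}
outMetaFile <- params$outMetaFile

## ... analysis and figures ...

## last chunk: record when, and with which R and package versions, this ran
date()
sessionInfo()
```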

It is generally good practice to use a workflow management system in your analyses, to keep track of changes and of dependencies between steps. Here we rely on [Snakemake](https://snakemake.readthedocs.io/en/stable/) and illustrate how our `.qmd` script can be rendered from it. An example rule in our Snakefile, called "qcFromCellranger", is shown below.

``` python
Rscript="/Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/bin/Rscript" # <1>
RBiocLib="~/Library/R/arm64/4.5-Bioc-3.21/library" # <1>

rule qcFromCellranger: # <2>
    input: # <2>
        qmd = "scripts/01_cellrangerQC.qmd" # <2>
    output: # <2>
        html = "scripts/01_cellrangerQC.html", # <2>
        outMetaFile = "generatedFiles/01_metadata.txt" # <2>
    shell: # <2>
        ''' # <2>
        # wdir # <2>
        cd {basedir} && \ # <2>
        # <2>
        # setup needed env variables to run quarto with specific R and Rlibs # <2>
        export QUARTO_R={Rscript} && \ # <2>
        export R_LIBS_USER={RBiocLib} && \ # <2>
        # <2>
        # render with quarto # <2>
        quarto render {input.qmd} -P basedir:{basedir} \ # <2>
            -P metaOriginalFile:{originalMetaFile} -P metaExtraFile:{generatedMetaFile} \ # <2>
            -P outMetaFile:{output.outMetaFile} # <2>
        ''' # <2>
```

1. This would normally be added to the `config.yaml` file for `Snakemake`.
2. This is the "qcFromCellranger" rule defined in the Snakefile. `basedir`, `originalMetaFile` and `generatedMetaFile` are assumed to be defined elsewhere in the Snakefile (e.g. read from the config).
| 175 | + |
| 176 | +## Managing packages with the renv package |
| 177 | + |
| 178 | +`renv` is a package manager for R that helps to manage package dependencies. It is a useful tool for ensuring that your R code is reproducible. |
| 179 | + |
| 180 | +::: {.callout-warning} |
| 181 | +If you install R with Conda / Pixi, do not use `renv` to version packages. Instead, use the prebuilt packages that Conda provides. |
| 182 | +They are often labeled as `r-packagename`, e.g. `r-base`, `r-tidyverse`, etc. Trying to use `renv` in a complex R environment will |
| 183 | +likely lead to compilation headaches as architecture strings may mismatch, leading to failures of package installations as packages are |
| 184 | +built from source. |
| 185 | +::: |
| 186 | + |
| 187 | +### Initialization |
| 188 | + |
| 189 | +Install `renv` and initialize it. |
| 190 | + |
| 191 | +```{.r} |
| 192 | +install.packages("renv") |
| 193 | +renv::init(bioconductor = "3.22") # <1> |
| 194 | +``` |
| 195 | + |
| 196 | +1. Pins Bioconductor to a specific release. The bioconductor release must be compatible with the current R version, and therefore using `rig` to manage R versions is advantageous. |
| 197 | + |
| 198 | +This creates the necessary files for `renv` to work, and should be included in your version control system. |
| 199 | + |
| 200 | +:::{.callout-warning} |
| 201 | +Your R session must be restarted for the changes in `.Rprofile` to take effect after `renv::init()`. This is handled automatically in RStudio. |
| 202 | +::: |
| 203 | + |
| 204 | +### Package installation |
| 205 | + |
| 206 | +As you work, install packages with an `renv` compatible method, and then snapshot the environment. This installs |
| 207 | +packages to the renv cache which is then copied to the renv library. |
| 208 | + |
| 209 | +```{.r} |
| 210 | +renv::install("package_name1") |
| 211 | +renv::install("package_name2") |
| 212 | +# If you prefer pacman |
| 213 | +pacman::p_load("package_name1","package_name2", "package_name3") |
| 214 | +# Check your code still works |
| 215 | +# then update the lockfile |
| 216 | +renv::snapshot() |
| 217 | +``` |
| 218 | + |
| 219 | +:::{.callout-tip} |
| 220 | +Use `renv::status()` to check if your environment is up to date with the lockfile. |
| 221 | +::: |
| 222 | + |
| 223 | +### Restoring packages |
| 224 | + |
| 225 | +If you are working on a project for the first time, or if you are working on a project that someone else has shared with you, you can use `renv::restore()` to install the packages that are needed for the project. This will install the packages that are listed in the `renv.lock` file. |
| 226 | +```{.r} |
| 227 | +renv::restore() |
| 228 | +``` |
| 229 | + |
| 230 | +### Other dependencies |
| 231 | + |
| 232 | +`renv` doesn't manage non-R dependencies like `pandoc`, for example, or system-level dependencies like `libxml2` or `zlib`. Docker |
| 233 | +is often used to manage these dependencies instead. The `renv.lock` is copied inside the container, and `renv::restore()` is run |
| 234 | +there to produce the same environment as on the host machine. |
| 235 | + |
| 236 | +### Additional resources |
| 237 | + |
| 238 | +- [Introduction to renv](https://rstudio.r-universe.dev/articles/renv/renv.html) |
| 239 | +- [Using renv with Bioconductor](https://rstudio.github.io/renv/articles/bioconductor.html) |
| 240 | +- [How renv restores packages from R-Universe](https://ropensci.org/blog/2022/01/06/runiverse-renv/) |
| 241 | +- [Docker best practices for R developers](https://collabnix.com/10-essential-docker-best-practices-for-r-developers-in-2025/) |
| 242 | + |
| 243 | +## Other mentions |
| 244 | + |
| 245 | +Additional resources to check for managing and using R packages are: |
| 246 | + |
| 247 | +- [pixi](https://pixi.prefix.dev/latest/) |
| 248 | +- [Bioconductor Docker containers](https://bioconductor.org/help/docker/) |
| 249 | +- [Seqera's Wave](https://seqera.io/wave/) |
| 250 | + |
| 251 | +It is worth keeping in mind that some of these do not allow for specifying the Bioconductor release version when installing packages and that [Bioconda](https://bioconda.github.io) can be a bit behind in terms of package versions, with some of the most recent versions missing. |