|
| 1 | +--- |
| 2 | +title: "Bioinformatics tools & workflows on BioCommons partner infrastructures" |
| 3 | +--- |
| 4 | + |
| 5 | +This page describes how to request tool installations and lists the tools and workflows installed across several of the [BioCommons infrastructure partner systems](./1_compute_systems.html). Tool status on other partner infrastructures will be added over time. |
| 6 | + |
| 7 | +----- |
| 8 | + |
| 9 | +----- |
| 10 | + |
| 11 | +```{r setup, include=FALSE, warning = FALSE} |
| 12 | +knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE) |
| 13 | +library(tidyverse) |
| 14 | +library(yaml) |
| 15 | +library(kableExtra) |
| 16 | +#lubridate included to allow for automated provision of date for tool list update |
| 17 | +#see https://r4ds.had.co.nz/dates-and-times.html |
| 18 | +library(lubridate) |
| 19 | + |
| 20 | +#see https://stackoverflow.com/questions/13548266/define-all-functions-in-one-r-file-call-them-from-another-r-file-how-if-pos |
| 21 | +source("functions.R") |
| 22 | +``` |
| 23 | + |
| 24 | +# Requesting tool installations |
| 25 | + |
| 26 | +Tool installs can be requested for the infrastructures listed in the table: |
| 27 | + |
| 28 | +<font size="-1.5"> |
| 29 | +```{r} |
| 30 | + |
| 31 | +tool_request <- tibble( |
| 32 | + Infrastructure = c("[Galaxy Australia](https://usegalaxy.org.au/)", |
| 33 | + "[NCI](https://nci.org.au/)", |
| 34 | + "[Pawsey](https://pawsey.org.au/)", |
| 35 | + "[QRIScloud](https://www.qriscloud.org.au/) / [UQ-RCC](https://rcc.uq.edu.au/)"), |
| 36 | + Process = c("Complete a [tool install request](https://request.usegalaxy.org.au/)", |
| 37 | + "[Contact NCI](https://opus.nci.org.au/display/Help/5.+Software+Applications)", |
| 38 | + "Visit the [Helpdesk Portal](https://support.pawsey.org.au/portal/servicedesk/customer/portal) or email [help@pawsey.org.au](mailto:help@pawsey.org.au) with your request", |
| 39 | + "Complete a [tool install request](https://docs.google.com/forms/d/e/1FAIpQLSefKCBkZbTQ-dUyU-pd4ZypkA-TA5GiMgpsJ2slWmD-B6elEg/viewform)" |
| 40 | + ) |
| 41 | + ) |
| 42 | + |
| 43 | +#see https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html |
| 44 | +tool_request %>% |
| 45 | + kbl(escape = FALSE, align = "l") %>% |
| 46 | + kable_styling(bootstrap_options = c("striped")) |
| 47 | + |
| 48 | +``` |
| 49 | +</font> |
| 50 | + |
| 51 | +```{r} |
| 52 | +###################### |
| 53 | +###load matrix data### |
| 54 | +###################### |
| 55 | +data <- read_matrix("../../external_GitHub_inputs/Matrix of Availability of Bioinformatics Tools across BioCommons - deployment version - Bioinformatics Software Availability.tsv") |
| 56 | + |
| 57 | +###################### |
| 58 | +###load galaxy data### |
| 59 | +###################### |
| 60 | +galaxy <- parse_GA_yaml(list.files("../../external_GitHub_inputs/usegalaxy.org.au/", full.names = TRUE)) |
| 61 | + |
| 62 | +######################### |
| 63 | +###load QRIScloud data### |
| 64 | +######################### |
| 65 | +qris <- read_qris("../../external_GitHub_inputs/qriscloud.txt") %>% |
| 66 | + rename(`QRIScloud / UQ-RCC (Flashlite, Awoonga, Tinaroo)` = qris_cloud) %>% |
| 67 | + #see https://community.rstudio.com/t/which-tidyverse-is-the-equivalent-to-search-replace-from-spreadsheets/3548/7 |
| 68 | + mutate_if(is.character, str_replace_all, pattern = 'genomeanalysistk', replacement = 'gatk') %>% |
| 69 | + mutate_if(is.character, str_replace_all, pattern = '^pacbio$', replacement = 'smrtlink') %>% |
| 70 | + mutate_if(is.character, str_replace_all, pattern = '^soapdenovo$', replacement = 'soapdenovo2') |
| 71 | + |
| 72 | +#################### |
| 73 | +###load Gadi data### |
| 74 | +#################### |
| 75 | +gadi <- read_gadi("../../external_GitHub_inputs/gadi.csv") %>% |
| 76 | + rename(`NCI (Gadi)` = version) |
| 77 | + |
| 78 | +###################### |
| 79 | +###load Pawsey data### |
| 80 | +###################### |
| 81 | +zeus <- read_hpc("../../external_GitHub_inputs/zeus.txt") %>% |
| 82 | + rename(`Pawsey (Zeus)` = version) %>% |
| 83 | + #see https://community.rstudio.com/t/which-tidyverse-is-the-equivalent-to-search-replace-from-spreadsheets/3548/7 |
| 84 | + mutate_if(is.character, str_replace_all, pattern = 'wgs', replacement = 'celera') %>% |
| 85 | + mutate_if(is.character, str_replace_all, pattern = 'trinityrnaseq', replacement = 'trinity') |
| 86 | + |
| 87 | +magnus <- read_hpc("../../external_GitHub_inputs/magnus.txt") %>% |
| 88 | + rename(`Pawsey (Magnus)` = version) |
| 89 | + |
| 90 | +################################# |
| 91 | +###join and process tool lists### |
| 92 | +################################# |
| 93 | +COMPLETE <- join_and_process_tools(matrix_data = data, |
| 94 | + gadi_data = gadi, |
| 95 | + zeus_data = zeus, |
| 96 | + magnus_data = magnus, |
| 97 | + qris_data = qris, |
| 98 | + galaxy_data = galaxy) |
| 99 | + |
| 100 | +###other links |
| 101 | +#see https://stackoverflow.com/a/56683740 |
| 102 | +#see https://stackoverflow.com/questions/43696227/mutate-with-case-when-and-contains |
| 103 | + |
| 104 | +``` |
| 105 | + |
| 106 | +----- |
| 107 | + |
| 108 | +----- |
| 109 | + |
| 110 | +# Tool information {.tabset .tabset-pills} |
| 111 | + |
| 112 | +## Tool install status and version availability |
| 113 | + |
| 114 | +**Note:** |
| 115 | + |
| 116 | +- Table last updated ```r today()```. |
| 117 | +- Software documentation is linked from the *Tool / workflow name* in the first column. |
| 118 | +- For tools which are not currently installed on [Galaxy Australia](https://usegalaxy.org.au/), but which are available [in the Galaxy app store (aka toolshed)](https://toolshed.g2.bx.psu.edu/), the *Available in Galaxy toolshed* column launches a toolshed search using the link label as the search term. |
| 119 | +- The list currently includes some tools that are not used for bioinformatics. |
| 120 | +- The source material for the table is currently curated manually. We endeavour to keep the information as current as possible, but there is a natural limit to the volume of information that can be maintained here. Production of this information will be automated during 2021, and tools that are not relevant to bioinformatics analyses will be removed. |
| 121 | + |
| 122 | +<font size="-1.5"> |
| 123 | +```{r} |
| 124 | + |
| 125 | +installs <- COMPLETE %>% |
| 126 | + #see https://tidyr.tidyverse.org/reference/replace_na.html |
| 127 | + replace_na(list(`Galaxy Australia` = "", |
| 128 | + `Available in Galaxy toolshed` = "", |
| 129 | + `Pawsey (Zeus)` = "", |
| 130 | + `Pawsey (Magnus)` = "", |
| 131 | + `QRIScloud / UQ-RCC (Flashlite, Awoonga, Tinaroo)` = "", |
| 132 | + `NCI (Gadi)` = "" |
| 133 | + )) %>% |
| 134 | + select(`Tool / workflow name`, |
| 135 | + `Galaxy Australia`, |
| 136 | + `Available in Galaxy toolshed`, |
| 137 | + `NCI (Gadi)`, |
| 138 | + `Pawsey (Zeus)`, |
| 139 | + `Pawsey (Magnus)`, |
| 140 | + `QRIScloud / UQ-RCC (Flashlite, Awoonga, Tinaroo)` |
| 141 | + ) |
| 142 | + |
| 143 | +#see https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html |
| 144 | +kable(installs, format = "pipe", align = "c") |
| 145 | + |
| 146 | +``` |
| 147 | +</font> |
| 148 | + |
| 149 | +----- |
| 150 | + |
| 151 | +----- |
| 152 | + |
| 153 | +## Tool information |
| 154 | + |
| 155 | +**Note:** |
| 156 | + |
| 157 | +- Table last updated ```r today()```. |
| 158 | +- Software documentation is linked from the *Tool / workflow name* in the first column. |
| 159 | +- The *Primary purpose* column categorises the tools by purpose, using an [EDAM](https://github.com/edamontology/edamontology) *topic* where possible. |
| 160 | +- More information about a tool can be found by following the link in the [bio.tools](https://bio.tools/) column. |
| 161 | +- Where a tool has been containerised for easier installation on any compute infrastructure, the [BioContainers](https://biocontainers.pro/#/) column links to the containerised software that can be downloaded. |
| 162 | +- The list currently includes some tools that are not used for bioinformatics. |
| 163 | +- The source material for the table is currently curated manually. We endeavour to keep the information as current as possible, but there is a natural limit to the volume of information that can be maintained here. Production of this information will be automated during 2021, and tools that are not relevant to bioinformatics analyses will be removed. |
| 164 | + |
| 165 | +<font size="-1.5"> |
| 166 | +```{r} |
| 167 | +info <- COMPLETE %>% |
| 168 | + #see https://tidyr.tidyverse.org/reference/replace_na.html |
| 169 | + replace_na(list(`bio.tools` = "", |
| 170 | + `BioContainers` = "", |
| 171 | + `Primary purpose (EDAM, if available)` = "")) %>% |
| 172 | + select(`Tool / workflow name`, |
| 173 | + `Primary purpose (EDAM, if available)`, |
| 174 | + `bio.tools`, |
| 175 | + `BioContainers`) |
| 176 | + |
| 177 | +kable(info, format = "pipe", align = "c") |
| 178 | +``` |
| 179 | +</font> |
| 180 | + |
| 181 | +----- |
| 182 | + |
| 183 | +----- |
| 184 | + |
| 185 | +# How were the tools and workflows selected? |
| 186 | + |
| 187 | +The tool list includes tools identified through [BioCommons Community consultations](https://www.biocommons.org.au/get-involved) and other engagements, as well as tools found by matching against the tool sets installed across the various BioCommons partner compute infrastructures. The list is not intended to be exhaustive: it does not contain *ALL* bioinformatics tools. |
| 188 | + |
| 189 | +----- |
| 190 | + |
| 191 | +----- |
| 192 | + |
| 193 | +# How was the list generated? |
| 194 | + |
| 195 | +RStudio (see session info at the bottom of this page) was used for processing. |
| 196 | + |
| 197 | +The broad steps are described here: |
| 198 | + |
| 199 | +<font size="-1.5"> |
| 200 | + |
| 201 | +1. The manually curated tool list and the separate tool lists provided by the BioCommons partner infrastructures (Galaxy Australia, NCI, Pawsey and QRIScloud / UQ-RCC) were parsed. |
| 202 | +2. The complete set of tool lists was joined using the ```tool ID``` as a key. |
| 203 | +3. Links were embedded, where available, to the original documentation URLs (homepage or GitHub repository, for example) as well as the registry entries on bio.tools and BioContainers. |
| 204 | +4. Galaxy wrapping often separates tool suites into their individual component tools. For this reason, if a Galaxy Australia tool was matched to the manually curated tool list, a toolshed search link was embedded in the Galaxy Australia column. This lets a user search with a relevant term (e.g. vcftools) to identify all the component tools available in the toolshed. |
| 205 | + |
| 206 | +</font> |
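
The joining and link-embedding steps above can be sketched roughly as follows. This is a minimal illustration only, not the code used to build the page: the actual logic lives in `join_and_process_tools()` in `functions.R`, which is not shown here. The column names (`tool_id`, `name`, `homepage`, `version`), the example rows, the choice of a left join and the `example.org` URLs are all assumptions made for the sketch.

```r
library(dplyr)
library(stringr)

# Hypothetical curated list: one row per tool, keyed by a lower-case tool ID.
curated <- tibble(
  tool_id  = c("gatk", "trinity", "vcftools"),
  name     = c("GATK", "Trinity", "VCFtools"),
  homepage = c("https://example.org/gatk", "https://example.org/trinity", NA)
)

# Hypothetical module list from one HPC system; some names differ from the IDs.
hpc <- tibble(
  tool_id = c("genomeanalysistk", "trinityrnaseq", "samtools"),
  version = c("4.1.0", "2.8.6", "1.10")
)

# Harmonise names before joining so the key matches. Anchoring with ^ and $
# limits each replacement to whole IDs rather than substrings.
hpc_clean <- hpc %>%
  mutate(tool_id = tool_id %>%
           str_replace("^genomeanalysistk$", "gatk") %>%
           str_replace("^trinityrnaseq$", "trinity"))

# A left join (an assumption) keeps every curated tool; tools that are absent
# from the HPC list get NA in the version column.
joined <- curated %>%
  left_join(hpc_clean, by = "tool_id") %>%
  mutate(
    # Embed the documentation URL as a markdown link where one exists.
    `Tool / workflow name` = if_else(is.na(homepage), name,
                                     str_c("[", name, "](", homepage, ")")),
    # Placeholder search link using the tool ID as both label and search term.
    # The URL pattern is made up for illustration.
    search_link = str_c("[", tool_id, "](https://example.org/search?q=",
                        tool_id, ")")
  )
```

Harmonising the IDs before the join matters because a join only matches keys that are exactly equal: without that step, `genomeanalysistk` would not line up with `gatk`, and the tool would appear as not installed.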
| 207 | + |
| 208 | +----- |
| 209 | + |
| 210 | +----- |
| 211 | + |
| 212 | +## Session information |
| 213 | + |
| 214 | +```{r} |
| 215 | +sessionInfo() |
| 216 | +``` |
| 217 | + |
| 218 | +----- |
| 219 | + |
| 220 | +----- |
| 221 | + |
| 222 | +```{r echo = FALSE, message = FALSE} |
| 223 | +#see https://bookdown.org/yihui/rmarkdown-cookbook/write-bib.html |
| 224 | +library(knitr) |
| 225 | +write_bib(c(.packages()), "./outputs/tools_packages.bib") |
| 226 | +``` |