
Commit fb057c9

Merge pull request #1425 from nf-core/arm_3.16.1
Add profile for ARM compatibility
2 parents 3671d59 + 424137a commit fb057c9


19 files changed with 473 additions and 148 deletions.


CHANGELOG.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -13,6 +13,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - [PR #1422](https://github.com/nf-core/rnaseq/pull/1422) - Bump lots of modules so that conda versions have ARM builds
 - [PR #1423](https://github.com/nf-core/rnaseq/pull/1423) - Bump STAR version for version with ARM Conda build
 - [PR #1424](https://github.com/nf-core/rnaseq/pull/1424) - Patch sortmerna to 4.3.7 for ARM compatibility
+- [PR #1425](https://github.com/nf-core/rnaseq/pull/1425) - Add profile for ARM compatibility
 
 ## [[3.16.1](https://github.com/nf-core/rnaseq/releases/tag/3.16.1)] - 2024-10-16
 
```

conf/arm.config

Lines changed: 274 additions & 0 deletions
Large diffs are not rendered by default.

docs/usage.md

Lines changed: 30 additions & 0 deletions
```diff
@@ -116,6 +116,10 @@ If you would like to reduce the number of reads used in the analysis, for exampl
 
 ## Alignment options
 
+:::note
+The `--aligner hisat2` option is not currently supported using ARM architecture ('-profile arm')
+:::
+
 By default, the pipeline uses [STAR](https://github.com/alexdobin/STAR) (i.e. `--aligner star_salmon`) to map the raw FastQ reads to the reference genome, project the alignments onto the transcriptome and to perform the downstream BAM-level quantification with [Salmon](https://salmon.readthedocs.io/en/latest/salmon.html). STAR is fast but requires a lot of memory to run, typically around 38GB for the Human GRCh37 reference genome. Since the [RSEM](https://github.com/deweylab/RSEM) (i.e. `--aligner star_rsem`) workflow in the pipeline also uses STAR you should use the [HISAT2](https://ccb.jhu.edu/software/hisat2/index.shtml) aligner (i.e. `--aligner hisat2`) if you have memory limitations.
 
 You also have the option to pseudoalign and quantify your data directly with [Salmon](https://salmon.readthedocs.io/en/latest/salmon.html) or [Kallisto](https://pachterlab.github.io/kallisto/) by specifying `salmon` or `kallisto` to the `--pseudo_aligner` parameter. The selected pseudoaligner will then be run in addition to the standard alignment workflow defined by `--aligner`, mainly because it allows you to obtain QC metrics with respect to the genomic alignments. However, you can provide the `--skip_alignment` parameter if you would like to run Salmon or Kallisto in isolation. By default, the pipeline will use the genome fasta and gtf file to generate the transcripts fasta file, and then to build the Salmon index. You can override these parameters using the `--transcript_fasta` and `--salmon_index` parameters, respectively.
@@ -298,6 +302,10 @@ By default, the input GTF file will be filtered to ensure that sequence names co
 
 ## Contamination screening options
 
+:::note
+The `--contaminant_screening` option is not currently available using ARM architecture ('-profile arm')
+:::
+
 The pipeline provides the option to scan unaligned reads for contamination from other species using [Kraken2](https://ccb.jhu.edu/software/kraken2/), with the possibility of applying corrections from [Bracken](https://ccb.jhu.edu/software/bracken/). Since running Bracken is not computationally expensive, we recommend always using it to refine the abundance estimates generated by Kraken2.
 
 It is important to note that the accuracy of Kraken2 is [highly dependent on the database](https://doi.org/10.1099/mgen.0.000949) used. Specifically, it is [crucial](https://doi.org/10.1128/mbio.01607-23) to ensure that the host genome is included in the database. If you are particularly concerned about certain contaminants, it may be beneficial to use a smaller, more focused database containing primarily those contaminants instead of the full standard database. Various pre-built databases [are available for download](https://benlangmead.github.io/aws-indexes/k2), and instructions for building a custom database can be found in the [Kraken2 documentation](https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown). Additionally, genomes of contaminants detected in previous sequencing experiments are available on the [OpenContami website](https://openlooper.hgc.jp/opencontami/help/help_oct.php).
@@ -356,6 +364,26 @@ genome: 'GRCh37'
 
 You can also generate such `YAML`/`JSON` files via [nf-core/launch](https://nf-co.re/launch).
 
+### Running on Linux ARM architectures
+
+:::warning
+Please note that the ARM profile is experimental. It is expected to function correctly in all cases unless explicitly indicated otherwise—currently, exceptions include the use of the hisat2 aligner and contaminant screening via kraken2. However, because testing is presently conducted manually, we cannot guarantee its reliability.
+:::
+
+The pipeline can be executed in an ARM compatible mode by specifying the ARM profile, for example:
+
+```bash
+nextflow run \
+    nf-core/rnaseq \
+    --input <SAMPLESHEET> \
+    --outdir <OUTDIR> \
+    --gtf <GTF> \
+    --fasta <GENOME FASTA> \
+    -profile docker,arm
+```
+
+This will use ARM-compatible containers, and apply a small number of overrides to Conda definitions to support ARM operations.
+
 ### Updating the pipeline
 
 When you run the above command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. When running the pipeline after this, it will always use the cached version if available - even if the pipeline has been updated since. To make sure that you're running the latest version of the pipeline, make sure that you regularly update the cached version of the pipeline:
@@ -420,6 +448,8 @@ If `-profile` is not specified, the pipeline will run locally and expect all sof
   - A generic configuration profile to enable [Wave](https://seqera.io/wave/) containers. Use together with one of the above (requires Nextflow ` 24.03.0-edge` or later).
 - `conda`
   - A generic configuration profile to be used with [Conda](https://conda.io/docs/). Please only use Conda as a last resort i.e. when it's not possible to run the pipeline with Docker, Singularity, Podman, Shifter, Charliecloud, or Apptainer.
+- `arm`
+  - A configuration profile that will set `docker.runOptions` appropriately for ARM architectures, and apply overrides supplying ARM-compatible containers and Conda environments. See [Running on Linux ARM architectures](#running-on-linux-arm-architectures).
 
 ### `-resume`
 
```
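The usage notes in this diff say that `-profile arm` targets Linux ARM hosts and that `--aligner hisat2` and `--contaminant_screening` are not supported with it. A launcher script could enforce those notes before calling Nextflow; the minimal sketch below assumes bash, and the helpers `choose_profiles` and `check_arm_options` are hypothetical names that do not ship with nf-core/rnaseq. It also assumes that `uname -m` reports `aarch64` on Linux ARM and `arm64` on Apple silicon, and that mapping both to `docker,arm` is appropriate for your setup. The script prints the command instead of running it.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helpers for the ARM notes; not part of nf-core/rnaseq.

# Map a machine name to a -profile list. Assumption: Linux on 64-bit ARM
# reports "aarch64" and macOS on Apple silicon reports "arm64".
choose_profiles() {
  local arch="${1:-$(uname -m)}"
  case "$arch" in
    aarch64|arm64) echo "docker,arm" ;;
    *)             echo "docker" ;;
  esac
}

# Reject the options the usage docs list as unsupported with -profile arm.
# Args: profile list, aligner, contaminant screening tool (empty if disabled).
check_arm_options() {
  local profiles="$1" aligner="$2" contaminant_screening="$3"
  # Only applies when "arm" is one of the comma-separated profiles.
  [[ ",$profiles," == *,arm,* ]] || return 0
  if [[ $aligner == hisat2 ]]; then
    echo "error: --aligner hisat2 is not supported with -profile arm" >&2
    return 1
  fi
  if [[ -n $contaminant_screening ]]; then
    echo "error: --contaminant_screening is not available with -profile arm" >&2
    return 1
  fi
}

profiles="$(choose_profiles)"
aligner="star_salmon"
if check_arm_options "$profiles" "$aligner" ""; then
  # Print rather than run, keeping the placeholders from the docs visible.
  echo "nextflow run nf-core/rnaseq --input <SAMPLESHEET> --outdir <OUTDIR> --gtf <GTF> --fasta <GENOME FASTA> --aligner $aligner -profile $profiles"
fi
```

Checking the comma-wrapped profile list keeps a profile such as `armx` from matching by accident.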

modules.json

Lines changed: 2 additions & 2 deletions
```diff
@@ -245,7 +245,7 @@
                 },
                 "trimgalore": {
                     "branch": "master",
-                    "git_sha": "49f4e50534fe4b64101e62ea41d5dc43b1324358",
+                    "git_sha": "8c5eeedd45e295fc9a4f164631da6a8b37e6b9c6",
                     "installed_by": ["fastq_fastqc_umitools_trimgalore"]
                 },
                 "tximeta/tximport": {
@@ -333,7 +333,7 @@
                 },
                 "fastq_fastqc_umitools_trimgalore": {
                     "branch": "master",
-                    "git_sha": "49f4e50534fe4b64101e62ea41d5dc43b1324358",
+                    "git_sha": "8c5eeedd45e295fc9a4f164631da6a8b37e6b9c6",
                     "installed_by": ["fastq_qc_trim_filter_setstrandedness", "subworkflows"]
                 },
                 "fastq_qc_trim_filter_setstrandedness": {
```

modules/nf-core/trimgalore/environment.yml

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.

modules/nf-core/trimgalore/main.nf

Lines changed: 2 additions & 2 deletions
Some generated files are not rendered by default.

modules/nf-core/trimgalore/tests/main.nf.test.snap

Lines changed: 10 additions & 10 deletions
Some generated files are not rendered by default.

nextflow.config

Lines changed: 1 addition & 0 deletions
```diff
@@ -186,6 +186,7 @@ profiles {
     }
     arm {
         docker.runOptions = '-u $(id -u):$(id -g) --platform=linux/amd64'
+        includeConfig 'conf/arm.config'
     }
     singularity {
         singularity.enabled = true
```

subworkflows/local/align_star/nextflow.config

Lines changed: 47 additions & 28 deletions
```diff
@@ -1,36 +1,55 @@
+def generateStarAlignArgs(save_unaligned, contaminant_screening, extra_star_align_args) {
+    def argsToMap = { String args ->
+        args.split(/\s(?=--)/).collectEntries {
+            def parts = it.trim().split(/\s+/, 2)
+            [(parts[0]): parts.size() > 1 ? parts[1] : '']
+        }
+    }
+
+    def base_args = """
+        --quantMode TranscriptomeSAM
+        --twopassMode Basic
+        --outSAMtype BAM Unsorted
+        --readFilesCommand zcat
+        --runRNGseed 0
+        --outFilterMultimapNmax 20
+        --alignSJDBoverhangMin 1
+        --outSAMattributes NH HI AS NM MD
+        --outSAMstrandField intronMotif
+    """.trim()
+
+    if (save_unaligned || contaminant_screening) {
+        base_args += "\n--outReadsUnmapped Fastx"
+    }
+
+    def final_args_map = argsToMap(base_args) + (extra_star_align_args ? argsToMap(extra_star_align_args) : [:])
+    final_args_map.collect { key, value -> "${key} ${value}".trim() }.join(' ')
+}
+
 if (!params.skip_alignment && params.aligner == 'star_salmon') {
     process {
-        withName: '.*:ALIGN_STAR:STAR_ALIGN|.*:ALIGN_STAR:STAR_ALIGN_IGENOMES' {
-            ext.args = {
-                // Function to convert argument strings into a map
-                def argsToMap = { String args ->
-                    args.split("\\s(?=--)").collectEntries {
-                        def parts = it.trim().split(/\s+/, 2)
-                        [(parts.first()): parts.last()]
-                    }
-                }
-
-                // Initialize the map with preconfigured values
-                def preset_args_map = argsToMap("""
-                    --quantMode TranscriptomeSAM
-                    --twopassMode Basic
-                    --outSAMtype BAM Unsorted
-                    --readFilesCommand zcat
-                    --runRNGseed 0
-                    --outFilterMultimapNmax 20
-                    --alignSJDBoverhangMin 1
-                    --outSAMattributes NH HI AS NM MD
-                    --quantTranscriptomeSAMoutput BanSingleEnd
-                    --outSAMstrandField intronMotif
-                    ${params.save_unaligned || params.contaminant_screening ? '--outReadsUnmapped Fastx' : ''}
-                """.trim())
 
-                // Consolidate the extra arguments
-                def final_args_map = preset_args_map + (params.extra_star_align_args ? argsToMap(params.extra_star_align_args) : [:])
+        // We have to condition this, because the args are slightly different between the latest STAR and the one compatible with iGenomes
 
-                // Convert the map back to a list and then to a single string
-                final_args_map.collect { key, value -> "${key} ${value}" }.join(' ').trim()
+        withName: '.*:ALIGN_STAR:STAR_ALIGN' {
+            ext.args = {
+                generateStarAlignArgs(
+                    params.save_unaligned,
+                    params.contaminant_screening,
+                    (params.extra_star_align_args ?: '') + ' --quantTranscriptomeSAMoutput BanSingleEnd'
+                )
+            }
+        }
+        withName: '.*:ALIGN_STAR:STAR_ALIGN_IGENOMES' {
+            ext.args = {
+                generateStarAlignArgs(
+                    params.save_unaligned,
+                    params.contaminant_screening,
+                    (params.extra_star_align_args ?: '') + ' --quantTranscriptomeBan Singleend'
+                )
             }
+        }
+        withName: '.*:ALIGN_STAR:STAR_ALIGN|.*:ALIGN_STAR:STAR_ALIGN_IGENOMES' {
 
             publishDir = [
                 [
```
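The new `generateStarAlignArgs` helper turns each option string into an ordered map keyed by `--option` and merges the maps with `+`, so a later option replaces an earlier one with the same name while keeping its original position. Because each `withName` block appends its version-specific option (`--quantTranscriptomeSAMoutput BanSingleEnd` or `--quantTranscriptomeBan Singleend`) after `--extra_star_align_args`, that option takes precedence. The refactor also changes how a valueless option is handled: the removed closure used `parts.last()`, which returns the flag itself when there is no value, whereas the new code maps it to an empty string. The shell sketch below approximates the merge for experimenting outside Nextflow. It assumes bash 4+ (associative arrays), passes booleans as the strings `true`/`false`, and uses hypothetical names (`merge_star_args`, `build_star_args`, `quant_ban_option`, and the example process prefix `NFCORE_RNASEQ:RNASEQ`) that are not part of the pipeline.

```bash
#!/usr/bin/env bash
# Hypothetical re-creation of the generateStarAlignArgs merge; requires bash 4+.

# Merge two STAR option strings. Options in the second string replace
# same-named options in the first; first-seen order is kept.
merge_star_args() {
  local -a words=() order=() out=()
  local -A value=()
  local word flag="" f
  # Split both strings on any whitespace (spaces or newlines) without globbing.
  read -r -d '' -a words <<< "$1 $2" || true
  for word in "${words[@]}"; do
    if [[ $word == --* ]]; then
      flag=$word
      # Record where an option first appears; a repeat only resets its value.
      [[ -n ${value[$flag]+set} ]] || order+=("$flag")
      value[$flag]=""
    elif [[ -n $flag ]]; then
      # Re-join multi-word values such as "BAM Unsorted".
      value[$flag]+="${value[$flag]:+ }$word"
    fi
  done
  for f in "${order[@]}"; do
    # Valueless options are emitted alone, with no trailing space.
    out+=("$f${value[$f]:+ ${value[$f]}}")
  done
  printf '%s\n' "${out[*]}"
}

# Pick the version-specific option the way the two withName selectors do.
quant_ban_option() {
  case "$1" in
    *:ALIGN_STAR:STAR_ALIGN_IGENOMES) echo "--quantTranscriptomeBan Singleend" ;;
    *:ALIGN_STAR:STAR_ALIGN)          echo "--quantTranscriptomeSAMoutput BanSingleEnd" ;;
  esac
}

# Args: process name, save_unaligned (true/false), screening tool (or empty), extra args.
build_star_args() {
  local process="$1" save_unaligned="$2" contaminant_screening="$3" extra="$4"
  local base='--quantMode TranscriptomeSAM --twopassMode Basic --outSAMtype BAM Unsorted
--readFilesCommand zcat --runRNGseed 0 --outFilterMultimapNmax 20 --alignSJDBoverhangMin 1
--outSAMattributes NH HI AS NM MD --outSAMstrandField intronMotif'
  if [[ $save_unaligned == true || -n $contaminant_screening ]]; then
    base+=' --outReadsUnmapped Fastx'
  fi
  # The process-specific option goes last, so it overrides the extra args.
  merge_star_args "$base" "$extra $(quant_ban_option "$process")"
}

# --outFilterMultimapNmax 50 replaces the preset 20 in place.
build_star_args 'NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN' false '' '--outFilterMultimapNmax 50'
build_star_args 'NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN_IGENOMES' true '' ''
```

Keeping an explicit order array alongside the associative array is what preserves the base ordering, since bash associative arrays do not guarantee iteration order.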
