Commit a830125

docs: Add rnaseq-nf tutorial (#6440)
* Add rnaseq-nf tutorial
  Signed-off-by: Christopher Hakkaart <[email protected]>
* Add page target
  Signed-off-by: Christopher Hakkaart <[email protected]>
* Remove MULTIQC mermaid
  Signed-off-by: Christopher Hakkaart <[email protected]>
* Rename tutorial page
  Signed-off-by: Christopher Hakkaart <[email protected]>
* Make h3 headings
  Signed-off-by: Christopher Hakkaart <[email protected]>
* Add references and fix headings
  Signed-off-by: Christopher Hakkaart <[email protected]>
* Add links to pipeline
  Signed-off-by: Christopher Hakkaart <[email protected]>
* Add links to repo
  Signed-off-by: Christopher Hakkaart <[email protected]>
* Imprive consistency
  Signed-off-by: Christopher Hakkaart <[email protected]>
* cleanup
  Signed-off-by: Ben Sherman <[email protected]>
* Update workflow outputs tutorial [wip]
  Signed-off-by: Ben Sherman <[email protected]>
* Align with rnaseq-nf
  Signed-off-by: Ben Sherman <[email protected]>
* Clean up language for consistency and fix typos
  Signed-off-by: Christopher Hakkaart <[email protected]>

---------

Signed-off-by: Christopher Hakkaart <[email protected]>
Signed-off-by: Ben Sherman <[email protected]>
Co-authored-by: Ben Sherman <[email protected]>
1 parent 3a53ba4 commit a830125

3 files changed (+267, -104 lines)


docs/index.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -171,6 +171,7 @@ developer/packages
 :caption: Tutorials
 :maxdepth: 1
 
+tutorials/rnaseq-nf
 tutorials/data-lineage
 tutorials/workflow-outputs
 tutorials/metrics
```

docs/tutorials/rnaseq-nf.md

Lines changed: 226 additions & 0 deletions
@@ -0,0 +1,226 @@
(rnaseq-nf-page)=

# Getting started with rnaseq-nf

[`rnaseq-nf`](https://github.com/nextflow-io/rnaseq-nf) is a basic Nextflow pipeline for RNA-Seq analysis that performs quality control, transcript quantification, and result aggregation. The pipeline processes paired-end FASTQ files, generates quality control reports with [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/), quantifies transcripts with [Salmon](https://salmon.readthedocs.io/en/latest/salmon.html), and produces a unified report with [MultiQC](https://seqera.io/multiqc/).

This tutorial describes the architecture of the [`rnaseq-nf`](https://github.com/nextflow-io/rnaseq-nf) pipeline and explains how to run it.

## Pipeline architecture

The pipeline is organized into modular workflows and processes that coordinate data flow from input files through analysis steps to final outputs.

### Entry workflow

The [entry workflow](https://github.com/nextflow-io/rnaseq-nf/blob/master/main.nf) orchestrates the entire pipeline by coordinating input parameters and data flow:

```{mermaid}
flowchart TB
    subgraph " "
        subgraph params
            v0["transcriptome"]
            v1["reads"]
            v5["multiqc"]
            v2["outdir"]
        end
        v4([RNASEQ])
        v6([MULTIQC])
        v0 --> v4
        v1 --> v4
        v4 --> v6
        v5 --> v6
    end
```

Data flow:

- The `transcriptome` and `reads` parameters are passed to the `RNASEQ` subworkflow, which performs indexing, quality control, and quantification.

- The outputs from `RNASEQ`, along with the MultiQC configuration (`multiqc`), are passed to the `MULTIQC` process, which aggregates the results into a unified HTML report.

- The `outdir` parameter defines where all results are published.

### `RNASEQ`

The [`RNASEQ`](https://github.com/nextflow-io/rnaseq-nf/blob/master/modules/rnaseq.nf) subworkflow coordinates three processes, some of which run in parallel and some in sequence:

```{mermaid}
flowchart TB
    subgraph RNASEQ
        subgraph take
            v0["read_pairs_ch"]
            v1["transcriptome"]
        end
        v2([INDEX])
        v4([FASTQC])
        v6([QUANT])
        subgraph emit
            v8["fastqc"]
            v9["quant"]
        end
        v1 --> v2
        v0 --> v4
        v0 --> v6
        v2 --> v6
        v4 --> v8
        v6 --> v9
    end
```

Inputs (`take:`):

- `read_pairs_ch`: A channel of paired-end read files
- `transcriptome`: A reference transcriptome file

Data flow (`main:`):

- [`INDEX`](https://github.com/nextflow-io/rnaseq-nf/blob/master/modules/index/main.nf) creates a Salmon index from the `transcriptome` input (runs once).

- [`FASTQC`](https://github.com/nextflow-io/rnaseq-nf/blob/master/modules/fastqc/main.nf) analyzes each sample in the `read_pairs_ch` channel independently, so samples are processed in parallel.

- [`QUANT`](https://github.com/nextflow-io/rnaseq-nf/blob/master/modules/quant/main.nf) quantifies transcripts for each sample in the `read_pairs_ch` channel using the index from `INDEX` (runs once per sample, after `INDEX` completes).

Outputs (`emit:`):

- `fastqc`: The results from `FASTQC`

- `quant`: The results from `QUANT`
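
The dependency structure above can be mimicked with a toy Bash script. This is only an analogy, not how Nextflow schedules tasks: `index`, `fastqc`, `quant`, and `run_rnaseq` are hypothetical stubs that print what they would do, and shell background jobs stand in for Nextflow's channel-driven parallelism.

```bash
#!/usr/bin/env bash
# Toy analogy of the RNASEQ scheduling (NOT Nextflow's implementation).
# index, fastqc and quant are hypothetical stubs that only print a message.
index()  { echo "INDEX $1"; }
fastqc() { echo "FASTQC $1"; }
quant()  { echo "QUANT $1 (using index of $2)"; }

run_rnaseq() {
  local transcriptome=$1
  shift
  local samples=("$@")

  # FASTQC needs only the reads, so one background job per sample starts at once
  for s in "${samples[@]}"; do
    fastqc "$s" &
  done

  # INDEX runs once in the foreground, so it finishes before any QUANT starts
  index "$transcriptome"

  # QUANT needs both the reads and the index: one background job per sample
  for s in "${samples[@]}"; do
    quant "$s" "$transcriptome" &
  done

  # Wait for all FASTQC and QUANT jobs to finish
  wait
}

run_rnaseq transcriptome.fa gut liver lung spleen
```

The order of the `FASTQC` lines varies between runs, but `INDEX` is always printed before any `QUANT` line, mirroring the `INDEX --> QUANT` edge in the diagram.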

### `MULTIQC`

The [`MULTIQC`](https://github.com/nextflow-io/rnaseq-nf/blob/master/modules/multiqc/main.nf) process aggregates all quality control and quantification outputs into a comprehensive HTML report.

Inputs:

- Input files: All collected outputs from the `RNASEQ` subworkflow (FastQC reports and Salmon quantification files).
- `config`: MultiQC configuration files and branding (logo, styling).

Process execution:

- `MULTIQC` scans all input files, extracts metrics and statistics, and generates a unified report.

Outputs:

- `multiqc_report.html`: A single consolidated HTML report providing an overview of:
  - General stats
  - Salmon fragment length distribution
  - FastQC quality control
  - Software versions
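
Putting the process descriptions together, the number of tasks in a run follows from the number of samples: one `INDEX` task, one `FASTQC` and one `QUANT` task per sample, and one `MULTIQC` task. A small Bash sketch of that arithmetic, assuming exactly one task per process invocation as described above (`expected_tasks` is a hypothetical helper, not part of the pipeline):

```bash
#!/usr/bin/env bash
# Task count implied by the data flow described in this tutorial (an assumption,
# not output from Nextflow):
#   INDEX   -> 1 task
#   FASTQC  -> 1 task per sample
#   QUANT   -> 1 task per sample
#   MULTIQC -> 1 task
expected_tasks() {
  local n_samples=$1
  echo $(( 1 + n_samples + n_samples + 1 ))
}

echo "default (gut only): $(expected_tasks 1) tasks"    # 4 tasks
echo "all-reads profile:  $(expected_tasks 4) tasks"    # 10 tasks
```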

## Pipeline parameters

You can customize the pipeline's input data, output location, and MultiQC configuration with the following command-line parameters:

- `--reads`: Path to paired-end FASTQ files (default: `data/ggal/ggal_gut_{1,2}.fq`).

- `--transcriptome`: Path to the reference transcriptome FASTA file (default: `data/ggal/ggal_1_48850000_49020000.Ggal71.500bpflank.fa`).

- `--outdir`: Output directory for results (default: `results`).

- `--multiqc`: Path to the MultiQC configuration directory (default: `multiqc`).

## Configuration profiles

Configuration profiles customize how and where the pipeline runs. Select them with the `-profile` option; multiple profiles can be specified as a comma-separated list. Profiles are defined in the [`nextflow.config`](https://github.com/nextflow-io/rnaseq-nf/blob/master/nextflow.config) file at the root of the repository.

<h3>Software profiles</h3>

Software profiles specify how software dependencies for processes are provisioned:

- `conda`: Provision a Conda environment for each process based on its required Conda packages
- `docker`: Use a Docker container that contains all required dependencies
- `singularity`: Use a Singularity container that contains all required dependencies
- `wave`: Provision a Wave container for each process based on its required Conda packages

:::{note}
The corresponding container runtime or package manager must be installed to use these profiles.
:::

<h3>Execution profiles</h3>

Execution profiles specify the compute and storage environment used by the pipeline:

- `slurm`: Run on a SLURM HPC cluster
- `batch`: Run on AWS Batch
- `google-batch`: Run on Google Cloud Batch
- `azure-batch`: Run on Azure Batch

:::{note}
Depending on your environment, you may need to configure the underlying infrastructure, such as resource pools, storage, and credentials.
:::
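
Because profiles are combined as a comma-separated list, it can be convenient to assemble the `-profile` value in a script. The following Bash sketch uses a hypothetical helper, `profile_arg`, which is not part of `rnaseq-nf`: it checks each name against the profiles listed in this tutorial (the list may be incomplete) and joins them with commas. Whether a given combination suits your environment depends on your setup.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of rnaseq-nf: build a -profile value.
# Known names are copied from this tutorial and may be incomplete.
KNOWN_PROFILES=(conda docker singularity wave slurm batch google-batch azure-batch all-reads)

profile_arg() {
  local p k found
  for p in "$@"; do
    found=0
    for k in "${KNOWN_PROFILES[@]}"; do
      [[ $p == "$k" ]] && found=1 && break
    done
    if (( ! found )); then
      echo "unknown profile: $p" >&2
      return 1
    fi
  done
  # Join the names with commas; IFS is changed only inside this subshell
  (IFS=,; echo "$*")
}

# Prints: nextflow run nextflow-io/rnaseq-nf -profile all-reads,singularity,slurm
echo nextflow run nextflow-io/rnaseq-nf -profile "$(profile_arg all-reads singularity slurm)"
```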

## Test data

The pipeline includes test data in the [`data/ggal/`](https://github.com/nextflow-io/rnaseq-nf/tree/master/data/ggal) directory for demonstration and validation purposes:

- Paired-end FASTQ files from four tissue samples (gut, liver, lung, spleen):
  - `ggal_gut_{1,2}.fq`
  - `ggal_liver_{1,2}.fq`
  - `ggal_lung_{1,2}.fq`
  - `ggal_spleen_{1,2}.fq`

- Reference transcriptome:
  - `ggal_1_48850000_49020000.Ggal71.500bpflank.fa`

By default, only the `gut` sample is processed. You can use the `all-reads` profile to process all four tissue samples.
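
The `{1,2}` in the file names marks the two mates of each read pair. Nextflow performs this matching itself; the following Bash sketch only illustrates the idea of grouping mates by their shared prefix, assuming that the sample ID is the part of the name before `_1`/`_2` (for example, `ggal_gut`). `pair_reads` is a hypothetical helper, and the test files are empty placeholders created in a temporary directory.

```bash
#!/usr/bin/env bash
# Illustration only: group <id>_1.fq / <id>_2.fq files into pairs by shared prefix.
# pair_reads is a hypothetical helper; Nextflow does this matching itself.
pair_reads() {
  local dir=$1 r1 r2 id
  for r1 in "$dir"/*_1.fq; do
    [[ -e $r1 ]] || continue          # no matches: the glob stays literal, skip it
    id=$(basename "$r1" _1.fq)        # e.g. ggal_gut_1.fq -> ggal_gut
    r2="$dir/${id}_2.fq"
    if [[ -e $r2 ]]; then
      echo "$id $(basename "$r1") $(basename "$r2")"
    else
      echo "missing mate for $id" >&2
    fi
  done
}

# Recreate the test data file names as empty files in a temporary directory
tmp=$(mktemp -d)
touch "$tmp"/ggal_{gut,liver,lung,spleen}_{1,2}.fq
pair_reads "$tmp"
rm -rf "$tmp"
```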

## Quick start

The [`rnaseq-nf`](https://github.com/nextflow-io/rnaseq-nf) pipeline can be run out of the box. This section provides examples of running the pipeline with different configurations.

### Basic execution

Run the pipeline with default parameters using Docker:

```bash
nextflow run nextflow-io/rnaseq-nf -profile docker
```

### Configuring individual parameters

Override the default parameters to use custom input files and output locations:

```bash
nextflow run nextflow-io/rnaseq-nf \
    --reads '/path/to/reads/*_{1,2}.fastq.gz' \
    --transcriptome '/path/to/transcriptome.fa' \
    --outdir 'my_results' \
    -profile docker
```
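
The `--reads` value above is wrapped in single quotes so that the shell passes the pattern to Nextflow unchanged instead of expanding it first. The following Bash demonstration shows the difference; `count_args` is a hypothetical stand-in for `nextflow run`, and the files are empty placeholders in a temporary directory.

```bash
#!/usr/bin/env bash
# Show what a command receives when a glob is quoted vs. unquoted.
# count_args stands in for `nextflow run`; the files are empty placeholders.
count_args() { echo "$#"; }   # prints how many arguments it was given

tmp=$(mktemp -d)
touch "$tmp"/sampleA_{1,2}.fastq.gz "$tmp"/sampleB_{1,2}.fastq.gz
cd "$tmp" || exit 1

# Unquoted: the shell performs brace expansion and then globbing,
# so the command receives four separate file names
echo "unquoted: $(count_args *_{1,2}.fastq.gz) arguments"   # 4 arguments
printf '  %s\n' *_{1,2}.fastq.gz

# Quoted: the pattern is passed through unchanged as a single argument
echo "quoted:   $(count_args '*_{1,2}.fastq.gz') argument"  # 1 argument
printf '  %s\n' '*_{1,2}.fastq.gz'

cd - >/dev/null || exit 1
rm -rf "$tmp"
```

Even when no files match, Bash brace expansion still splits an unquoted `{1,2}` pattern into two separate words, so quoting the pattern matters either way.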

### Using profiles

Specify configuration profiles to customize the runtime environment and input data:

```bash
# Use Conda to provision software dependencies
nextflow run nextflow-io/rnaseq-nf -profile conda

# Run on a SLURM cluster
nextflow run nextflow-io/rnaseq-nf -profile slurm

# Combine multiple profiles: process all reads using Docker
nextflow run nextflow-io/rnaseq-nf -profile all-reads,docker
```

:::{tip}
See [Configuration profiles](#configuration-profiles) for more information about profiles.
:::

## Expected outputs

The [`rnaseq-nf`](https://github.com/nextflow-io/rnaseq-nf) pipeline produces the following outputs in the output directory (`results` by default):

```
results/
├── fastqc_<SAMPLE_ID>_logs/      # FastQC quality reports per sample
│   ├── <SAMPLE_ID>_1_fastqc.html
│   ├── <SAMPLE_ID>_1_fastqc.zip
│   ├── <SAMPLE_ID>_2_fastqc.html
│   └── <SAMPLE_ID>_2_fastqc.zip
└── multiqc_report.html           # Aggregated QC and Salmon report
```

The MultiQC report (`multiqc_report.html`) can be viewed in a web browser.
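
To check a finished run from the command line, a Bash sketch like the following can confirm that the files shown in the tree above exist. `check_results` is a hypothetical helper, and it assumes the output layout shown above: one `fastqc_<SAMPLE_ID>_logs` directory per sample plus `multiqc_report.html`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify the expected rnaseq-nf output layout.
# Assumes the layout shown in this tutorial; adjust if the pipeline's outputs change.
check_results() {
  local outdir=${1:-results} status=0 d id f
  if [[ ! -f $outdir/multiqc_report.html ]]; then
    echo "missing: $outdir/multiqc_report.html"
    status=1
  fi
  for d in "$outdir"/fastqc_*_logs; do
    # If nothing matched, the glob stays literal and is not a directory
    [[ -d $d ]] || { echo "no FastQC directories in $outdir"; return 1; }
    id=${d##*/fastqc_}   # strip the path and the "fastqc_" prefix
    id=${id%_logs}       # strip the "_logs" suffix, leaving <SAMPLE_ID>
    for f in "${id}"_{1,2}_fastqc.{html,zip}; do
      [[ -f $d/$f ]] || { echo "missing: $d/$f"; status=1; }
    done
  done
  return "$status"
}
```

For example, `check_results results` prints any missing files and returns a non-zero status if the layout is incomplete.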
