Commit e16b8dc

Improve docs
1 parent 3f34c9a

2 files changed (+44 / -32 lines)

README.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -76,7 +76,7 @@ CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz,a
 
 Each row represents a fastq file (single-end) or a pair of fastq files (paired end). Rows with the same sample identifier are considered technical replicates and merged automatically. The strandedness refers to the library preparation and will be automatically inferred if set to `auto`.
 
-The pipeline also supports providing pre-aligned BAM files from previous runs as input by using the optional `genome_bam` and `transcriptome_bam` columns in the samplesheet. This is particularly useful for reprocessing data or running downstream analysis steps without repeating the computationally expensive alignment step. When using `--save_align_intermeds`, the pipeline generates a complete samplesheet with BAM paths for convenient future reanalysis.
+The pipeline supports a two-step reprocessing workflow using BAM files from previous runs. Run initially with `--save_align_intermeds` to generate a samplesheet with BAM paths, then reprocess using `--skip_alignment` for efficient downstream analysis without repeating expensive alignment steps. This feature is designed specifically for pipeline-generated BAMs.
 
 > [!WARNING]
 > Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; see [docs](https://nf-co.re/docs/usage/getting_started/configuration#custom-configuration-files).
```
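
The two-step workflow described in the README change can be chained in a single script. The following is a minimal sketch, not something shipped with nf-core/rnaseq. The function name `run_two_step`, the `NEXTFLOW` override variable, and the assumption that the generated samplesheet is written to `samplesheets/samplesheet_with_bams.csv` under the first run's `--outdir` are all hypothetical. Only the flags (`--save_align_intermeds`, `--skip_alignment`, `--input`, `--outdir`, `-profile docker`) come from the documentation in this commit.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper for the two-step BAM reprocessing workflow.
# Assumption: the BAM samplesheet is written under the first run's --outdir.

# Set NEXTFLOW=echo for a dry run; defaults to the real nextflow binary.
NEXTFLOW="${NEXTFLOW:-nextflow}"

run_two_step() {
  local input="$1" initial_outdir="$2" reprocessed_outdir="$3"

  # Step 1: normal run that publishes BAMs and writes a reusable samplesheet.
  "$NEXTFLOW" run nf-core/rnaseq \
    --input "$input" \
    --save_align_intermeds \
    --outdir "$initial_outdir" \
    -profile docker || return 1

  # Assumed location of the generated samplesheet (relative to --outdir).
  local bam_sheet="$initial_outdir/samplesheets/samplesheet_with_bams.csv"
  if [[ ! -s "$bam_sheet" ]]; then
    echo "error: expected samplesheet not found or empty: $bam_sheet" >&2
    return 1
  fi

  # Step 2: reprocess from the recorded BAM paths, skipping alignment.
  "$NEXTFLOW" run nf-core/rnaseq \
    --input "$bam_sheet" \
    --skip_alignment \
    --outdir "$reprocessed_outdir" \
    -profile docker
}
```

For example, `NEXTFLOW=echo run_two_step samplesheet.csv results_initial results_reprocessed` only prints the two commands, which is a cheap way to check the paths before launching real jobs. The explicit existence check stops the script before step 2 if the first run did not produce the samplesheet.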

docs/usage.md

Lines changed: 43 additions & 31 deletions

````diff
@@ -106,48 +106,60 @@ An [example samplesheet](../assets/samplesheet.csv) has been provided with the p
 
 > **NB:** The `group` and `replicate` columns were replaced with a single `sample` column as of v3.1 of the pipeline. The `sample` column is essentially a concatenation of the `group` and `replicate` columns, however it now also offers more flexibility in instances where replicate information is not required e.g. when sequencing clinical samples. If all values of `sample` have the same number of underscores, fields defined by these underscore-separated names may be used in the PCA plots produced by the pipeline, to regain the ability to represent different groupings.
 
-### Using BAM files as input
+### BAM input for reprocessing workflow
 
-The pipeline supports providing pre-aligned BAM files as input instead of, or in addition to, FASTQ files. This functionality is primarily designed for reusing BAM files generated by previous runs of this pipeline, allowing you to:
+The pipeline supports a **two-step workflow** for efficient reprocessing without expensive alignment steps. This feature is designed specifically for re-running with BAM files generated by previous runs of this same pipeline.
 
-- Skip computationally expensive alignment steps when reprocessing data
-- Run downstream analysis and QC steps on existing alignments
-- Process a mix of newly sequenced samples (FASTQ) and previously processed samples (BAM)
+#### Step 1: Initial run with BAM generation
 
-To use BAM files as input, add the optional `genome_bam` and/or `transcriptome_bam` columns to your samplesheet:
+Run the pipeline normally, adding `--save_align_intermeds` to publish BAM files and generate a reusable samplesheet:
 
-```csv title="samplesheet_with_bams.csv"
-sample,fastq_1,fastq_2,strandedness,genome_bam,transcriptome_bam
-SAMPLE1,,,forward,/path/to/sample1.markdup.sorted.bam,/path/to/sample1.toTranscriptome.bam
-SAMPLE2,sample2_R1.fastq.gz,sample2_R2.fastq.gz,forward,,
+```bash
+nextflow run nf-core/rnaseq \
+--input samplesheet.csv \
+--save_align_intermeds \
+--outdir results_initial \
+-profile docker
 ```
 
-**Important notes:**
+This creates `samplesheets/samplesheet_with_bams.csv` containing paths to the generated BAM files.
 
-- BAM files should preferably come from previous runs of this pipeline to ensure compatibility
-- The pipeline will automatically index provided BAM files
-- You can provide just `genome_bam`, just `transcriptome_bam`, or both
-- When using BAM input, you can leave the FASTQ columns empty or omit them
-- Mixed samplesheets (some samples with FASTQ, others with BAM) are supported
-- For BAM file locations from pipeline outputs, see the [output documentation](https://nf-co.re/rnaseq/output)
-- **Automated samplesheet generation**: When using `--save_align_intermeds`, the pipeline automatically generates a `samplesheet_with_bams.csv` file in the `samplesheets/` directory containing all samples with their BAM file paths. For FASTQ-derived samples, this includes paths to newly generated BAMs; for BAM input samples, it preserves the original input paths. This complete samplesheet can be used directly for future pipeline runs
+#### Step 2: Reprocessing run with BAM input
+
+Use the auto-generated samplesheet to reprocess data, skipping alignment:
+
+```bash
+nextflow run nf-core/rnaseq \
+--input samplesheets/samplesheet_with_bams.csv \
+--skip_alignment \
+--outdir results_reprocessed \
+-profile docker
+```
+
+The pipeline will skip alignment and indexing steps, putting the BAM files through post-processing and quantification only.
 
-### Reprocessing workflow with BAM input
+#### Example of generated samplesheet
 
-When reprocessing data using the auto-generated `samplesheet_with_bams.csv` from a previous run:
+The `samplesheet_with_bams.csv` will look like:
 
-1. **Use the generated samplesheet**: The `samplesheet_with_bams.csv` contains all necessary BAM file paths
-2. **Skip alignment steps**: Add `--skip_alignment` to prevent unnecessary index generation and alignment processing
-3. **Example command**:
-```bash
-nextflow run nf-core/rnaseq \
---input samplesheets/samplesheet_with_bams.csv \
---skip_alignment \
---outdir results_reprocessed \
--profile docker
-```
+```csv
+sample,fastq_1,fastq_2,strandedness,genome_bam,percent_mapped,transcriptome_bam
+SAMPLE1,/path/sample1_R1.fastq.gz,/path/sample1_R2.fastq.gz,forward,results/star_salmon/SAMPLE1.markdup.sorted.bam,85.2,results/star_salmon/SAMPLE1.Aligned.toTranscriptome.out.bam
+SAMPLE2,/path/sample2_R1.fastq.gz,,reverse,results/star_salmon/SAMPLE2.sorted.bam,92.1,results/star_salmon/SAMPLE2.Aligned.toTranscriptome.out.bam
+```
+
+#### Important limitations
+
+> **⚠️ Warning**: This feature is designed specifically for BAM files generated by this pipeline. Using arbitrary BAM files from other sources is **not officially supported** and will likely only work via the two-step workflow described above. Users attempting to use other BAMs do so at their own risk.
+
+**Key technical details:**
+
+- The pipeline automatically indexes provided BAM files
+- You can provide just `genome_bam`, just `transcriptome_bam`, or both
+- Mixed samplesheets (some samples with FASTQ, others with BAM) are supported
+- For BAM file locations from pipeline outputs, see the [output documentation](https://nf-co.re/rnaseq/output)
 
-This approach allows you to efficiently reprocess data for downstream analysis (quantification, differential expression, QC) without repeating the time-consuming alignment steps.
+This workflow is ideal for tweaking downstream processing steps (quantification methods, QC parameters, differential expression analysis) without repeating time-consuming alignment.
 
 ## FASTQ sampling
 
````
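
Before a step 2 run, it can help to confirm that every row of the generated samplesheet actually points at a BAM file, since the updated docs allow `genome_bam`, `transcriptome_bam`, or both. The helper below is a hypothetical sketch, not part of the pipeline. It locates the two columns by header name (the example samplesheet places `percent_mapped` between them, so fixed positions would be fragile) and assumes a plain CSV with no quoted fields containing commas.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of nf-core/rnaseq): report rows of a BAM
# samplesheet that have neither a genome_bam nor a transcriptome_bam path.
# Assumes plain comma-separated values without quoted commas.
check_bam_samplesheet() {
  local sheet="$1"
  awk -F',' '
    NR == 1 {
      # Locate BAM columns by header name, since column order may vary.
      for (i = 1; i <= NF; i++) {
        if ($i == "genome_bam") g = i
        if ($i == "transcriptome_bam") t = i
      }
      if (!g && !t) {
        print "no genome_bam or transcriptome_bam column" > "/dev/stderr"
        bad = 1
        exit
      }
      next
    }
    {
      # A row is usable if at least one BAM column is non-empty.
      if ((g && $g != "") || (t && $t != "")) { ok++ } else { print "missing BAM path: " $1; bad = 1 }
    }
    END {
      if (!bad) print (ok + 0) " sample(s) with BAM input"
      exit bad
    }
  ' "$sheet"
}
```

The function exits non-zero when any row lacks a BAM path, so it can guard a reprocessing command, e.g. `check_bam_samplesheet samplesheets/samplesheet_with_bams.csv && nextflow run nf-core/rnaseq ...`. It does not check whether the referenced files exist.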