Improve docs

pinin4fjords · pinin4fjords · commit e16b8dc07ee0 · 2025-09-09T17:31:48.000+01:00
diff --git a/README.md b/README.md
@@ -76,7 +76,7 @@ CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz,a
 
 Each row represents a fastq file (single-end) or a pair of fastq files (paired end). Rows with the same sample identifier are considered technical replicates and merged automatically. The strandedness refers to the library preparation and will be automatically inferred if set to `auto`.
 
-The pipeline also supports providing pre-aligned BAM files from previous runs as input by using the optional `genome_bam` and `transcriptome_bam` columns in the samplesheet. This is particularly useful for reprocessing data or running downstream analysis steps without repeating the computationally expensive alignment step. When using `--save_align_intermeds`, the pipeline generates a complete samplesheet with BAM paths for convenient future reanalysis.
+The pipeline supports a two-step reprocessing workflow using BAM files from previous runs. First run with `--save_align_intermeds` to generate a samplesheet with BAM paths, then pass that samplesheet to a second run with `--skip_alignment` to redo downstream analysis without repeating the expensive alignment steps. This feature is designed specifically for BAMs generated by this pipeline.
 
 > [!WARNING]
 > Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; see [docs](https://nf-co.re/docs/usage/getting_started/configuration#custom-configuration-files).
diff --git a/docs/usage.md b/docs/usage.md
@@ -106,48 +106,60 @@ An [example samplesheet](../assets/samplesheet.csv) has been provided with the p
 
 > **NB:** The `group` and `replicate` columns were replaced with a single `sample` column as of v3.1 of the pipeline. The `sample` column is essentially a concatenation of the `group` and `replicate` columns, however it now also offers more flexibility in instances where replicate information is not required e.g. when sequencing clinical samples. If all values of `sample` have the same number of underscores, fields defined by these underscore-separated names may be used in the PCA plots produced by the pipeline, to regain the ability to represent different groupings.
 
-### Using BAM files as input
+### BAM input for reprocessing workflow
 
-The pipeline supports providing pre-aligned BAM files as input instead of, or in addition to, FASTQ files. This functionality is primarily designed for reusing BAM files generated by previous runs of this pipeline, allowing you to:
+The pipeline supports a **two-step workflow** for efficient reprocessing without expensive alignment steps. This feature is designed specifically for re-running with BAM files generated by previous runs of this same pipeline.
 
-- Skip computationally expensive alignment steps when reprocessing data
-- Run downstream analysis and QC steps on existing alignments
-- Process a mix of newly sequenced samples (FASTQ) and previously processed samples (BAM)
+#### Step 1: Initial run with BAM generation
 
-To use BAM files as input, add the optional `genome_bam` and/or `transcriptome_bam` columns to your samplesheet:
+Run the pipeline normally, adding `--save_align_intermeds` to publish BAM files and generate a reusable samplesheet:
 
-```csv title="samplesheet_with_bams.csv"
-sample,fastq_1,fastq_2,strandedness,genome_bam,transcriptome_bam
-SAMPLE1,,,forward,/path/to/sample1.markdup.sorted.bam,/path/to/sample1.toTranscriptome.bam
-SAMPLE2,sample2_R1.fastq.gz,sample2_R2.fastq.gz,forward,,
+```bash
+nextflow run nf-core/rnaseq \
+  --input samplesheet.csv \
+  --save_align_intermeds \
+  --outdir results_initial \
+  -profile docker
 ```
 
-**Important notes:**
+This creates `samplesheets/samplesheet_with_bams.csv` containing paths to the generated BAM files.
 
-- BAM files should preferably come from previous runs of this pipeline to ensure compatibility
-- The pipeline will automatically index provided BAM files
-- You can provide just `genome_bam`, just `transcriptome_bam`, or both
-- When using BAM input, you can leave the FASTQ columns empty or omit them
-- Mixed samplesheets (some samples with FASTQ, others with BAM) are supported
-- For BAM file locations from pipeline outputs, see the [output documentation](https://nf-co.re/rnaseq/output)
-- **Automated samplesheet generation**: When using `--save_align_intermeds`, the pipeline automatically generates a `samplesheet_with_bams.csv` file in the `samplesheets/` directory containing all samples with their BAM file paths. For FASTQ-derived samples, this includes paths to newly generated BAMs; for BAM input samples, it preserves the original input paths. This complete samplesheet can be used directly for future pipeline runs
+#### Step 2: Reprocessing run with BAM input
+
+Use the auto-generated samplesheet to reprocess data, skipping alignment:
+
+```bash
+nextflow run nf-core/rnaseq \
+  --input samplesheets/samplesheet_with_bams.csv \
+  --skip_alignment \
+  --outdir results_reprocessed \
+  -profile docker
+```
+
+The pipeline will skip genome index generation and alignment, putting the BAM files through post-processing and quantification only.
 
-### Reprocessing workflow with BAM input
+#### Example of generated samplesheet
 
-When reprocessing data using the auto-generated `samplesheet_with_bams.csv` from a previous run:
+The `samplesheet_with_bams.csv` will look like:
 
-1. **Use the generated samplesheet**: The `samplesheet_with_bams.csv` contains all necessary BAM file paths
-2. **Skip alignment steps**: Add `--skip_alignment` to prevent unnecessary index generation and alignment processing
-3. **Example command**:
-   ```bash
-   nextflow run nf-core/rnaseq \
-     --input samplesheets/samplesheet_with_bams.csv \
-     --skip_alignment \
-     --outdir results_reprocessed \
-     -profile docker
-   ```
+```csv
+sample,fastq_1,fastq_2,strandedness,genome_bam,percent_mapped,transcriptome_bam
+SAMPLE1,/path/sample1_R1.fastq.gz,/path/sample1_R2.fastq.gz,forward,results/star_salmon/SAMPLE1.markdup.sorted.bam,85.2,results/star_salmon/SAMPLE1.Aligned.toTranscriptome.out.bam
+SAMPLE2,/path/sample2_R1.fastq.gz,,reverse,results/star_salmon/SAMPLE2.sorted.bam,92.1,results/star_salmon/SAMPLE2.Aligned.toTranscriptome.out.bam
+```
+
+#### Important limitations
+
+> **⚠️ Warning**: This feature is designed specifically for BAM files generated by this pipeline. BAM files from other sources are **not officially supported**, and BAM input will likely only work via the two-step workflow described above. Users attempting to use other BAMs do so at their own risk.
+
+**Key technical details:**
+
+- The pipeline automatically indexes provided BAM files
+- You can provide just `genome_bam`, just `transcriptome_bam`, or both
+- Mixed samplesheets (some samples with FASTQ, others with BAM) are supported
+- For BAM file locations from pipeline outputs, see the [output documentation](https://nf-co.re/rnaseq/output)
 
-This approach allows you to efficiently reprocess data for downstream analysis (quantification, differential expression, QC) without repeating the time-consuming alignment steps.
+This workflow is ideal for tweaking downstream processing steps (quantification methods, QC parameters, differential expression analysis) without repeating time-consuming alignment.
 
 ## FASTQ sampling