RNAcentral · Vicbeg · Feb 16, 2026 · Feb 19, 2026 · Feb 19, 2026 · Feb 23, 2026
diff --git a/.editorconfig b/.editorconfig
@@ -0,0 +1,12 @@
+root = true
+
+[*]
+charset = utf-8
+end_of_line = lf
+insert_final_newline = true
+trim_trailing_whitespace = true
+indent_size = 4
+indent_style = space
+
+[*.{md,yml,yaml,html,css,scss,js,cff}]
+indent_size = 2
diff --git a/.gitignore b/.gitignore
@@ -1,9 +1,14 @@
 .nextflow*
+.nf-test*
 work/
-data/
 results/
 .DS_Store
 testing/
 testing*
 *.pyc
 null/
+
+assets/testdata/yeast/*
+local.config
+nf-metro/old
+nf-metro/final
diff --git a/.nf-core.yml b/.nf-core.yml
@@ -2,7 +2,12 @@ repository_type: pipeline
 
 nf_core_version: 3.5.2
 
-lint: {}
+lint:
+  files_unchanged:
+    - assets/email_template.html
+    - assets/nf-core-rnastructurome_logo_light.png
+    - docs/images/nf-core-rnastructurome_logo_light.png
+    - docs/images/nf-core-rnastructurome_logo_dark.png
 
 template:
   org: nf-core

diff --git a/.vscode/settings.json b/.vscode/settings.json
@@ -1,3 +1,26 @@
 {
-    "markdown.styles": ["public/vscode_markdown.css"]
+    "markdown.styles": ["public/vscode_markdown.css"],
+    "files.exclude": {
+        "**/.venv": true,
+        "**/.nf-test": true,
+        "**/work": true
+    },
+    "search.exclude": {
+        "**/.venv": true,
+        "**/.nf-test": true,
+        "**/work": true
+    },
+    "files.watcherExclude": {
+        "**/.venv/**": true,
+        "**/.nf-test/**": true,
+        "**/work/**": true
+    },
+    "nextflow.files.exclude": [
+        ".git",
+        ".nf-test",
+        ".nf-test/**",
+        "work",
+        "work/**",
+        ".venv"
+    ]
 }
diff --git a/CITATIONS.md b/CITATIONS.md
@@ -10,13 +10,37 @@
 
 ## Pipeline tools
 
+- [Bowtie](https://pubmed.ncbi.nlm.nih.gov/19261174/)
+
+  > Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10(3):R25. doi: 10.1186/gb-2009-10-3-r25. PubMed PMID: 19261174; PubMed Central PMCID: PMC2690996.
+
+- [Bowtie2](https://pubmed.ncbi.nlm.nih.gov/22388286/)
+
+  > Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012 Mar 4;9(4):357-9. doi: 10.1038/nmeth.1923. PubMed PMID: 22388286; PubMed Central PMCID: PMC3322381.
+
+- [Cutadapt](https://doi.org/10.14806/ej.17.1.200)
+
+  > Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal. 2011;17(1):10-12. doi: 10.14806/ej.17.1.200.
+
 - [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
 
-> Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online].
+  > Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online].
 
 - [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)
 
-> Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.
+  > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.
+
+- [RNAFramework](https://pubmed.ncbi.nlm.nih.gov/28531371/)
+
+  > Incarnato D, Neri F, Anselmi F, Oliviero S. RNA structure framework: automated transcriptome-wide reconstruction of RNA secondary structures from high-throughput structure probing data. Bioinformatics. 2016 Aug 15;32(16):2533-5. doi: 10.1093/bioinformatics/btw239. PubMed PMID: 28531371.
+
+- [SAMtools](https://pubmed.ncbi.nlm.nih.gov/33590861/)
+
+  > Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, Li H. Twelve years of SAMtools and BCFtools. Gigascience. 2021 Feb 16;10(2):giab008. doi: 10.1093/gigascience/giab008. PubMed PMID: 33590861; PubMed Central PMCID: PMC7931819.
+
+- [UMI-tools](https://pubmed.ncbi.nlm.nih.gov/28100584/)
+
+  > Smith T, Heger A, Sudbery I. UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy. Genome Res. 2017 Mar;27(3):491-499. doi: 10.1101/gr.209601.116. Epub 2017 Jan 18. PubMed PMID: 28100584; PubMed Central PMCID: PMC5340976.
 
 ## Software packaging/containerisation tools
 

diff --git a/README.md b/README.md
@@ -21,7 +21,7 @@
 
 ## Introduction
 
-**nf-core/rnastructurome** is a bioinformatics pipeline that ...
+**nf-core/rnastructurome** is a bioinformatics pipeline for the analysis of chemical-based high-throughput RNA structure probing data. It accepts FASTQ files from SHAPE or DMS experiments using either the **RT-stop** or **mutational profiling (MaP)** principle, and processes them from raw reads through alignment and deduplication to per-base reactivity scores and RNA secondary structure predictions using the [RNAFramework](https://rnaframework.readthedocs.io) toolkit.
 
 <!-- TODO nf-core:
    Complete this sentence with a 2-3 sentence summary of what types of data the pipeline ingests, a brief overview of the
@@ -65,6 +65,52 @@ nextflow run nf-core/rnastructurome \
    --outdir <OUTDIR>
 ```
 
+For `rf-norm`, the samplesheet must include `cell_line`, `condition`, and `replicate` columns so samples are paired correctly during normalisation.
+
+The minimum required samplesheet columns are `sample`, `fastq_1`, `cell_line`, `condition`, and `replicate`.
+
+- Samples are grouped by identical `cell_line` and `replicate`.
+- A `treated` sample may be analysed on its own.
+- An `untreated` sample requires a matching `treated` sample with the same `cell_line` and `replicate`.
+- A `denatured` sample requires matching `treated` and `untreated` samples with the same `cell_line` and `replicate`.
+
+`rf-norm` defaults are selected automatically from the probing principle and available controls:
+
+- `RT-stop` with matching `untreated`: Ding scoring (`-sm 1`) with Box-plot normalisation (`-nm 3`)
+- `RT-stop` without `untreated`: Rouskin scoring (`-sm 2`) with 90% Winsorizing (`-nm 2`)
+- `MaP` with matching `untreated` and optional `denatured`: Siegfried scoring (`-sm 3`) with Box-plot normalisation (`-nm 3`)
+- `MaP` without `untreated`: Zubradt scoring (`-sm 4`) with Box-plot normalisation (`-nm 3`)
+
+Additional `rf-norm` parameters exposed by the pipeline:
+
+- `--rfnorm_remap_reactivities`: remap normalised reactivities to the 0-1 range.
+- `--rfnorm_reactive_bases <string>`: set the reactive bases used for normalisation windows, e.g. `AC` for DMS.
+- `--rfnorm_norm_window <int>`: set the normalisation window size.
+- `--rfnorm_window_offset <int>`: set the normalisation window offset.
+- `--rfnorm_dynamic_window <int>`: use dynamic normalisation windows with at least this many reactive bases.
+- `--rfnorm_norm_independent`: normalise each reactive base independently.
+- `--rfnorm_norm_factor <float[,float]>`: supply a fixed normalisation factor for all transcripts. For 90% Winsorizing, provide two comma-separated values.
+- `--rfnorm_raw`: score raw reactivities without applying normalisation.
+- `--rfnorm_pseudocount <float>`: set the Ding pseudocount.
+- `--rfnorm_max_score <float>`: set the Ding maximum score.
+- `--rfnorm_ignore_lower_than_untreated`: set reactivities lower than untreated to zero for Ding/Siegfried methods.
+- `--rfnorm_max_untreated_mut <float>`: set the Siegfried untreated mutation cutoff.
+- `--rfnorm_max_mutation_rate <float>`: set the MaP mutation-rate cutoff.
+- `--rfnorm_mean_coverage <float>`: discard transcripts below this mean coverage.
+- `--rfnorm_median_coverage <float>`: discard transcripts below this median coverage.
+- `--rfnorm_nan <int>`: report positions below this coverage as NaN. Default: `10`.
+- `--rfnorm_img`: generate rf-norm plots. This automatically uses `--rnaframework_r_path` to locate `R` inside the RNAFramework container.
+
+If `--rfnorm_reactive_bases` is not provided, the pipeline sets `AC` automatically for samples with `method=DMS`. All other methods fall back to the RNAFramework default (`N`, all bases).
+
+Example:
+
+```csv
+sample,fastq_1,cell_line,condition,replicate
+HEK293T_treated_rep1,HEK293T_treated_rep1.fastq.gz,HEK293T,treated,1
+HEK293T_untreated_rep1,HEK293T_untreated_rep1.fastq.gz,HEK293T,untreated,1
+```
+
 > [!WARNING]
 > Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; see [docs](https://nf-co.re/docs/usage/getting_started/configuration#custom-configuration-files).
 

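The pairing rules and automatic `rf-norm` defaults described in the README hunk above can be sketched in Python as follows. This is an illustration only, not the pipeline's Nextflow implementation: the function names and the `RF_NORM_DEFAULTS` table are hypothetical, and the sketch assumes at most one sample per condition within each `cell_line`/`replicate` group.

```python
from collections import defaultdict

# Hypothetical lookup table: (principle, has matching untreated) -> (-sm, -nm),
# transcribed from the defaults listed in the README text.
RF_NORM_DEFAULTS = {
    ("RT-stop", True): (1, 3),   # Ding scoring, box-plot normalisation
    ("RT-stop", False): (2, 2),  # Rouskin scoring, 90% Winsorizing
    ("MaP", True): (3, 3),       # Siegfried scoring, box-plot normalisation
    ("MaP", False): (4, 3),      # Zubradt scoring, box-plot normalisation
}


def group_samples(rows):
    """Group rows by (cell_line, replicate); replicate is compared as text."""
    groups = defaultdict(dict)
    for row in rows:
        key = (row["cell_line"], str(row["replicate"]))
        # Assumption: one sample per condition per group (later rows overwrite).
        groups[key][row["condition"]] = row["sample"]
    return dict(groups)


def check_group(conditions):
    """Return the pairing errors for the conditions present in one group."""
    present = set(conditions)
    errors = []
    if "untreated" in present and "treated" not in present:
        errors.append("untreated requires a matching treated sample")
    if "denatured" in present and not {"treated", "untreated"} <= present:
        errors.append("denatured requires matching treated and untreated samples")
    return errors


def rf_norm_flags(principle, conditions):
    """Pick the default scoring and normalisation flags for one group."""
    sm, nm = RF_NORM_DEFAULTS[(principle, "untreated" in conditions)]
    return f"-sm {sm} -nm {nm}"


rows = [
    {"sample": "HEK293T_treated_rep1", "cell_line": "HEK293T",
     "condition": "treated", "replicate": 1},
    {"sample": "HEK293T_untreated_rep1", "cell_line": "HEK293T",
     "condition": "untreated", "replicate": 1},
]
for key, conditions in group_samples(rows).items():
    print(key, check_group(conditions), rf_norm_flags("RT-stop", conditions))
```

Keeping the defaults in a lookup table keyed on the probing principle and the presence of an `untreated` control mirrors the four cases in the README list; an optional `denatured` sample does not change the selected flags in this sketch.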
diff --git a/assets/methods_description_template.yml b/assets/methods_description_template.yml
@@ -3,21 +3,26 @@ description: "Suggested text and references to use when describing pipeline usag
 section_name: "nf-core/rnastructurome Methods Description"
 section_href: "https://github.com/nf-core/rnastructurome"
 plot_type: "html"
-## TODO nf-core: Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
-## You inject any metadata in the Nextflow '${workflow}' object
 data: |
   <h4>Methods</h4>
   <p>Data was processed using nf-core/rnastructurome v${workflow.manifest.version} ${doi_text} of the nf-core collection of workflows (<a href="https://doi.org/10.1038/s41587-020-0439-x">Ewels <em>et al.</em>, 2020</a>), utilising reproducible software environments from the Bioconda (<a href="https://doi.org/10.1038/s41592-018-0046-7">Grüning <em>et al.</em>, 2018</a>) and Biocontainers (<a href="https://doi.org/10.1093/bioinformatics/btx192">da Veiga Leprevost <em>et al.</em>, 2017</a>) projects.</p>
   <p>The pipeline was executed with Nextflow v${workflow.nextflow.version} (<a href="https://doi.org/10.1038/nbt.3820">Di Tommaso <em>et al.</em>, 2017</a>) with the following command:</p>
   <pre><code>${workflow.commandLine}</code></pre>
-  <p>${tool_citations}</p>
+  <p>Tools used in the workflow included: FastQC (Andrews 2010), Cutadapt (Martin 2011), UMI-tools (Smith <em>et al.</em>, 2017), Bowtie (Langmead <em>et al.</em>, 2009), Bowtie2 (Langmead &amp; Salzberg 2012), SAMtools (Danecek <em>et al.</em>, 2021), RNAFramework (Incarnato <em>et al.</em>, 2016), and MultiQC (Ewels <em>et al.</em>, 2016).</p>
   <h4>References</h4>
   <ul>
     <li>Di Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., & Notredame, C. (2017). Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316-319. doi: <a href="https://doi.org/10.1038/nbt.3820">10.1038/nbt.3820</a></li>
     <li>Ewels, P. A., Peltzer, A., Fillinger, S., Patel, H., Alneberg, J., Wilm, A., Garcia, M. U., Di Tommaso, P., & Nahnsen, S. (2020). The nf-core framework for community-curated bioinformatics pipelines. Nature Biotechnology, 38(3), 276-278. doi: <a href="https://doi.org/10.1038/s41587-020-0439-x">10.1038/s41587-020-0439-x</a></li>
     <li>Grüning, B., Dale, R., Sjödin, A., Chapman, B. A., Rowe, J., Tomkins-Tinch, C. H., Valieris, R., Köster, J., & Bioconda Team. (2018). Bioconda: sustainable and comprehensive software distribution for the life sciences. Nature Methods, 15(7), 475–476. doi: <a href="https://doi.org/10.1038/s41592-018-0046-7">10.1038/s41592-018-0046-7</a></li>
     <li>da Veiga Leprevost, F., Grüning, B. A., Alves Aflitos, S., Röst, H. L., Uszkoreit, J., Barsnes, H., Vaudel, M., Moreno, P., Gatto, L., Weber, J., Bai, M., Jimenez, R. C., Sachsenberg, T., Pfeuffer, J., Vera Alvarez, R., Griss, J., Nesvizhskii, A. I., & Perez-Riverol, Y. (2017). BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics (Oxford, England), 33(16), 2580–2582. doi: <a href="https://doi.org/10.1093/bioinformatics/btx192">10.1093/bioinformatics/btx192</a></li>
-    ${tool_bibliography}
+    <li>Andrews S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data. URL: <a href="https://www.bioinformatics.babraham.ac.uk/projects/fastqc/">https://www.bioinformatics.babraham.ac.uk/projects/fastqc/</a></li>
+    <li>Martin M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal, 17(1), 10–12. doi: <a href="https://doi.org/10.14806/ej.17.1.200">10.14806/ej.17.1.200</a></li>
+    <li>Smith T, Heger A, Sudbery I. (2017). UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy. Genome Research, 27(3), 491–499. doi: <a href="https://doi.org/10.1101/gr.209601.116">10.1101/gr.209601.116</a></li>
+    <li>Langmead B, Trapnell C, Pop M, Salzberg SL. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology, 10(3), R25. doi: <a href="https://doi.org/10.1186/gb-2009-10-3-r25">10.1186/gb-2009-10-3-r25</a></li>
+    <li>Langmead B, Salzberg SL. (2012). Fast gapped-read alignment with Bowtie 2. Nature Methods, 9(4), 357–359. doi: <a href="https://doi.org/10.1038/nmeth.1923">10.1038/nmeth.1923</a></li>
+    <li>Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, Li H. (2021). Twelve years of SAMtools and BCFtools. GigaScience, 10(2), giab008. doi: <a href="https://doi.org/10.1093/gigascience/giab008">10.1093/gigascience/giab008</a></li>
+    <li>Incarnato D, Neri F, Anselmi F, Oliviero S. (2016). RNA structure framework: automated transcriptome-wide reconstruction of RNA secondary structures from high-throughput structure probing data. Bioinformatics, 32(16), 2533–2535. doi: <a href="https://doi.org/10.1093/bioinformatics/btw239">10.1093/bioinformatics/btw239</a></li>
+    <li>Ewels P, Magnusson M, Lundin S, Käller M. (2016). MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics, 32(19), 3047–3048. doi: <a href="https://doi.org/10.1093/bioinformatics/btw354">10.1093/bioinformatics/btw354</a></li>
   </ul>
   <div class="alert alert-info">
     <h5>Notes:</h5>

diff --git a/assets/multiqc_config.yml b/assets/multiqc_config.yml
@@ -3,12 +3,24 @@ report_comment: >
   analysis pipeline. For information about how to interpret these results, please see the
   <a href="https://nf-co.re/rnastructurome/dev/docs/output" target="_blank">documentation</a>.
 report_section_order:
-  "nf-core-rnastructurome-methods-description":
-    order: -1000
-  software_versions:
-    order: -1001
+  fastqc-status-check-heatmap:
+    order: 1000
+  "nf-core-rnastructurome-count-progression":
+    order: 1001
+  cutadapt_filtered_reads_plot:
+    order: 1100
+  cutadapt_trimmed_sequences_plot_5:
+    order: 1101
+  cutadapt_trimmed_sequences_plot_3:
+    order: 1102
+  "nf-core-rnastructurome-cutadapt-adapters":
+    order: 1103
   "nf-core-rnastructurome-summary":
-    order: -1002
+    order: 3
+  software_versions:
+    order: 2
+  "nf-core-rnastructurome-methods-description":
+    order: 1
 
 export_plots: true
 

diff --git a/assets/nf-core-rnastructurome_logo_light.png b/assets/nf-core-rnastructurome_logo_light.png
diff --git a/assets/samplesheet.csv b/assets/samplesheet.csv
@@ -1,3 +1,3 @@
-sample,fastq_1,fastq_2
-SAMPLE_PAIRED_END,/path/to/fastq/files/AEG588A1_S1_L002_R1_001.fastq.gz,/path/to/fastq/files/AEG588A1_S1_L002_R2_001.fastq.gz
-SAMPLE_SINGLE_END,/path/to/fastq/files/AEG588A4_S4_L003_R1_001.fastq.gz,
+sample,fastq_1,fastq_2,cell_line,condition,replicate,organism
+SAMPLE_PAIRED_END,/path/to/fastq/files/AEG588A1_S1_L002_R1_001.fastq.gz,/path/to/fastq/files/AEG588A1_S1_L002_R2_001.fastq.gz,HEK293T,treated,1,Homo sapiens
+SAMPLE_SINGLE_END,/path/to/fastq/files/AEG588A4_S4_L003_R1_001.fastq.gz,,HEK293T,treated,1,Homo sapiens
diff --git a/assets/schema_input.json b/assets/schema_input.json
@@ -13,6 +13,10 @@
                 "errorMessage": "Sample name must be provided and cannot contain spaces",
                 "meta": ["id"]
             },
+            "sample_id": {
+                "type": "string",
+                "meta": ["sample_id"]
+            },
             "fastq_1": {
                 "type": "string",
                 "format": "file-path",
@@ -26,8 +30,68 @@
                 "exists": true,
                 "pattern": "^([\\S\\s]*\\/)?[^\\s\\/]+\\.f(ast)?q\\.gz$",
                 "errorMessage": "FastQ file for reads 2 cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'"
+            },
+            "method": {
+                "type": "string",
+                "meta": ["method"]
+            },
+            "principle": {
+                "type": "string",
+                "meta": ["principle"]
+            },
+            "organism": {
+                "type": "string",
+                "meta": ["organism"],
+                "description": "Organism name used for transcript reference resolution. Latin binomials such as 'Homo sapiens' are normalised to Ensembl species format."
+            },
+            "adapter_3p": {
+                "type": "string",
+                "meta": ["adapter_3p"]
+            },
+            "adapter_5p": {
+                "type": "string",
+                "meta": ["adapter_5p"]
+            },
+            "umi_pattern": {
+                "type": "string",
+                "meta": ["umi_pattern"]
+            },
+            "condition": {
+                "type": "string",
+                "enum": ["treated", "untreated", "denatured"],
+                "meta": ["condition"],
+                "description": "Sample condition for rf-norm grouping (treated / untreated / denatured). Samples are normalised only against rows with the same cell_line and replicate. Untreated requires a matching treated sample; denatured requires matching treated and untreated samples.",
+                "help_text": "TODO (rnacentral-probing-metadata-main): merge_metadata.py must be updated to populate this column from the YAML 'conditions' field. Until then, samples without a condition are treated as 'treated'."
+            },
+            "cell_line": {
+                "type": "string",
+                "pattern": "^\\S+$",
+                "meta": ["cell_line"],
+                "description": "Cell-line identifier used by rf-norm pairing. Only samples with the same cell_line and replicate are normalised together.",
+                "help_text": "Populate this column for all rf-norm inputs. rf-norm will error if cell_line is missing."
+            },
+            "replicate": {
+                "anyOf": [
+                    {
+                        "type": "string",
+                        "pattern": "^\\S+$"
+                    },
+                    {
+                        "type": "integer"
+                    }
+                ],
+                "meta": ["replicate"],
+                "description": "Replicate identifier used by rf-norm pairing. Only samples with the same cell_line and replicate are normalised together.",
+                "help_text": "Populate this column for all rf-norm inputs. rf-norm will error if replicate is missing. Use a stable value such as rep1, rep2, or 1. Matching is exact after Nextflow converts the value to text."
+            },
+            "group": {
+                "type": "string",
+                "pattern": "^\\S+$",
+                "meta": ["group"],
+                "description": "Deprecated legacy rf-norm grouping key. rf-norm now requires explicit cell_line and replicate columns instead.",
+                "help_text": "Retained only for input compatibility while upstream metadata generators are updated. It is not used by rf-norm."
             }
         },
-        "required": ["sample", "fastq_1"]
+        "required": ["sample", "fastq_1", "cell_line", "condition", "replicate"]
     }
 }
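As a rough illustration of what the updated `schema_input.json` enforces for the new columns, the per-row checks could look like the sketch below. In the pipeline the schema is applied by the samplesheet validation tooling rather than by code like this; `validate_row` and `ensembl_species` are hypothetical names, and the species normalisation (lowercase, underscore-joined) is an assumption about what "Ensembl species format" means here.

```python
import re

# Rules transcribed from the schema in this diff.
REQUIRED = ("sample", "fastq_1", "cell_line", "condition", "replicate")
CONDITIONS = ("treated", "untreated", "denatured")
NO_WHITESPACE = re.compile(r"\S+")  # equivalent to the schema pattern ^\S+$


def validate_row(row):
    """Return a list of error messages for one samplesheet row (column -> value)."""
    errors = [
        f"missing required column: {col}"
        for col in REQUIRED
        if row.get(col) in (None, "")
    ]
    condition = row.get("condition")
    if condition not in (None, "") and condition not in CONDITIONS:
        errors.append(f"condition must be one of {list(CONDITIONS)}, got {condition!r}")
    for col in ("cell_line", "replicate", "group"):
        value = row.get(col)
        # Integer replicates are accepted by the schema; compare them as text.
        if value not in (None, "") and not NO_WHITESPACE.fullmatch(str(value)):
            errors.append(f"{col} must not contain whitespace: {value!r}")
    return errors


def ensembl_species(organism):
    """Assumed normalisation of a Latin binomial, e.g. 'Homo sapiens' -> 'homo_sapiens'."""
    return "_".join(organism.strip().lower().split())


row = {
    "sample": "HEK293T_treated_rep1",
    "fastq_1": "HEK293T_treated_rep1.fastq.gz",
    "cell_line": "HEK293T",
    "condition": "treated",
    "replicate": "1",
    "organism": "Homo sapiens",
}
print(validate_row(row), ensembl_species(row["organism"]))
```

The sketch does not cover the existing `sample`, `fastq_1`, and `fastq_2` patterns, which are unchanged by this diff.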