Open
56 commits
47da8ce
Download modules
Vicbeg Feb 16, 2026
6fd8023
Add samplesheet parsing
Vicbeg Feb 19, 2026
b24db42
Implement cutadapt and bowtie
Vicbeg Feb 19, 2026
8c9b639
Add umitools and update
Vicbeg Feb 23, 2026
503c3da
Add samtools and umitools implementation
Vicbeg Feb 24, 2026
0743ede
Add RNAFramework rf-count/norm/fold local modules and fix nf-core com…
pmb59 Feb 25, 2026
f169bdc
Fix linting and pre-commit CI failures: restore template files, add s…
pmb59 Feb 25, 2026
8409659
fix remaining lint failures and fill in README introduction
pmb59 Feb 25, 2026
8494486
docs: add missing tool citations and editorconfig
pmb59 Feb 25, 2026
3a190de
Merge pull request #1 from RNAcentral/dev-pm
Vicbeg Feb 25, 2026
b341839
Check if fasta is provided before downloading
Vicbeg Feb 26, 2026
beb5084
Merge branch 'dev' of https://github.com/RNAcentral/nf-core-rnastruct…
Vicbeg Feb 26, 2026
866b063
Add nextflow and testfiles to .gitignore
Vicbeg Feb 26, 2026
1589d3d
Add bowtie mapping options and defaults
Vicbeg Feb 26, 2026
c86ff6f
Fixes and add nf-core/samtools/faidx module
Vicbeg Feb 27, 2026
da30138
Add options for RFNORM and tests
Vicbeg Feb 27, 2026
dd18321
Add options for rf-norm and run MultiQC before
Vicbeg Mar 2, 2026
744386f
Added cat fastq to handle resequencing of samples
Vicbeg Mar 3, 2026
1893f7d
Remove duplicates properly
Vicbeg Mar 4, 2026
aec65c9
wired in rf-fold, tests and update docs
Vicbeg Mar 5, 2026
309b9fe
Switched to using transcriptome and fixed tests
Vicbeg Mar 7, 2026
297e6ac
Fixed remaining issues with transcript fasta use
Vicbeg Mar 8, 2026
5bc2f26
Add test for Ensembl transcriptome module and remove genome_build fro…
Vicbeg Mar 9, 2026
3dfe0b3
Added fasta sorting to match rf-index
Vicbeg Mar 9, 2026
2c671cf
Update config for cutadapt, add public rnaframework image for singula…
Vicbeg Mar 10, 2026
d549158
Exposed log files for RNAFramework and changed config for FASTQC reso…
Vicbeg Mar 10, 2026
a9c274d
updated docker, singularity config
Vicbeg Mar 10, 2026
bade7e5
Fix path issues for singularity
Vicbeg Mar 10, 2026
bad1b4a
configurating multiQC
Vicbeg Mar 10, 2026
70985b6
fix for MultiQC
Vicbeg Mar 10, 2026
42468a5
Tweeking MultiQC output
Vicbeg Mar 10, 2026
eb35364
New image with just RNAframework modules and multiqc config updates.
Vicbeg Mar 11, 2026
51b94a4
Image changes for RNAframework modules only.
Vicbeg Mar 11, 2026
6d6954c
Addind metadata and fix multiqc config
Vicbeg Mar 11, 2026
d2edaa0
Fix Cutadapt MultiQC helper
Vicbeg Mar 11, 2026
0f5599a
Fix to RNAFramework paths
Vicbeg Mar 11, 2026
3059260
Increase threading for RNAframework modules
Vicbeg Mar 12, 2026
adf64f7
Add script to create test fastq files
Vicbeg Mar 12, 2026
b9b3590
use samtools on singularity
Vicbeg Mar 12, 2026
0dd2304
fix subseting script for test data
Vicbeg Mar 12, 2026
5e6014d
Added module to convert .dp to .bp for IGV viewing
Vicbeg Mar 12, 2026
b393ace
Improved joining of samples for normalisation
Vicbeg Mar 12, 2026
6ceda40
fixed issues with processing both treated and untreated samples
Vicbeg Mar 12, 2026
9a93ff3
Fix for RFNORM where log was being written to folder before it finish…
Vicbeg Mar 13, 2026
33a2697
added nf-metro svg (static and animated)
Vicbeg Mar 13, 2026
546eca4
Add tests to make 100% test coverage
Vicbeg Mar 13, 2026
f598037
Fixing linting issues
Vicbeg Mar 15, 2026
48eebf5
Added test data for test profile
Vicbeg Mar 16, 2026
57cacc8
default to windowed folding
Vicbeg Mar 16, 2026
3cbdcbd
make nf-schema compatible with nextflow on the cluster
Vicbeg Mar 16, 2026
d99ed80
Add colours to .bp output and fixed docker profile
Vicbeg Mar 19, 2026
a095190
Fixing configs and refactoring some code
Vicbeg Mar 19, 2026
91ab596
Fixing tests and test data
Vicbeg Mar 19, 2026
41f6add
fix test linting issues
Vicbeg Mar 19, 2026
5ba8a31
Add testdata
Vicbeg Mar 19, 2026
8ee2cf8
remove some unnecessary files
Vicbeg Mar 19, 2026
12 changes: 12 additions & 0 deletions .editorconfig
@@ -0,0 +1,12 @@
root = true

[*]
charset = utf-8
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true
indent_size = 4
indent_style = space

[*.{md,yml,yaml,html,css,scss,js,cff}]
indent_size = 2
7 changes: 6 additions & 1 deletion .gitignore
@@ -1,9 +1,14 @@
.nextflow*
.nf-test*
work/
data/
results/
.DS_Store
testing/
testing*
*.pyc
null/

assets/testdata/yeast/*
local.config
nf-metro/old
nf-metro/final
7 changes: 6 additions & 1 deletion .nf-core.yml
@@ -2,7 +2,12 @@ repository_type: pipeline

nf_core_version: 3.5.2

lint: {}
lint:
files_unchanged:
- assets/email_template.html
- assets/nf-core-rnastructurome_logo_light.png
- docs/images/nf-core-rnastructurome_logo_light.png
- docs/images/nf-core-rnastructurome_logo_dark.png

template:
org: nf-core
25 changes: 24 additions & 1 deletion .vscode/settings.json
@@ -1,3 +1,26 @@
{
"markdown.styles": ["public/vscode_markdown.css"]
"markdown.styles": ["public/vscode_markdown.css"],
"files.exclude": {
"**/.venv": true,
"**/.nf-test": true,
"**/work": true
},
"search.exclude": {
"**/.venv": true,
"**/.nf-test": true,
"**/work": true
},
"files.watcherExclude": {
"**/.venv/**": true,
"**/.nf-test/**": true,
"**/work/**": true
},
"nextflow.files.exclude": [
".git",
".nf-test",
".nf-test/**",
"work",
"work/**",
".venv"
]
}
28 changes: 26 additions & 2 deletions CITATIONS.md
@@ -10,13 +10,37 @@

## Pipeline tools

- [Bowtie](https://pubmed.ncbi.nlm.nih.gov/19261174/)

> Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10(3):R25. doi: 10.1186/gb-2009-10-3-r25. PubMed PMID: 19261174; PubMed Central PMCID: PMC2690996.

- [Bowtie2](https://pubmed.ncbi.nlm.nih.gov/22388286/)

> Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012 Mar 4;9(4):357-9. doi: 10.1038/nmeth.1923. PubMed PMID: 22388286; PubMed Central PMCID: PMC3322381.

- [Cutadapt](https://doi.org/10.14806/ej.17.1.200)

> Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal. 2011;17(1):10-12. doi: 10.14806/ej.17.1.200.

- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

> Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online].

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

> Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

- [RNAFramework](https://pubmed.ncbi.nlm.nih.gov/28531371/)

> Incarnato D, Neri F, Anselmi F, Oliviero S. RNA structure framework: automated transcriptome-wide reconstruction of RNA secondary structures from high-throughput structure probing data. Bioinformatics. 2016 Aug 15;32(16):2533-5. doi: 10.1093/bioinformatics/btw239. PubMed PMID: 28531371.

- [SAMtools](https://pubmed.ncbi.nlm.nih.gov/33590861/)

> Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, Li H. Twelve years of SAMtools and BCFtools. Gigascience. 2021 Feb 16;10(2):giab008. doi: 10.1093/gigascience/giab008. PubMed PMID: 33590861; PubMed Central PMCID: PMC7931819.

- [UMI-tools](https://pubmed.ncbi.nlm.nih.gov/28100584/)

> Smith T, Heger A, Sudbery I. UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy. Genome Res. 2017 Mar;27(3):491-499. doi: 10.1101/gr.209601.116. Epub 2017 Jan 18. PubMed PMID: 28100584; PubMed Central PMCID: PMC5340976.

## Software packaging/containerisation tools

48 changes: 47 additions & 1 deletion README.md
@@ -21,7 +21,7 @@

## Introduction

**nf-core/rnastructurome** is a bioinformatics pipeline that ...
**nf-core/rnastructurome** is a bioinformatics pipeline for the analysis of chemical-based high-throughput RNA structure probing data. It accepts FASTQ files from SHAPE or DMS experiments using either the **RT-stop** or **mutational profiling (MaP)** principle, and processes them from raw reads through alignment and deduplication to per-base reactivity scores and RNA secondary structure predictions using the [RNAFramework](https://rnaframework.readthedocs.io) toolkit.

<!-- TODO nf-core:
Complete this sentence with a 2-3 sentence summary of what types of data the pipeline ingests, a brief overview of the
@@ -65,6 +65,52 @@ nextflow run nf-core/rnastructurome \
--outdir <OUTDIR>
```

For `rf-norm`, the samplesheet must include `cell_line`, `condition`, and `replicate` columns so samples are paired correctly during normalisation.

The minimum required samplesheet columns are `sample`, `fastq_1`, `cell_line`, `condition`, and `replicate`.

- Samples are grouped by identical `cell_line` and `replicate`.
- `treated` may be analysed on its own.
- `untreated` requires a matching `treated` sample with the same `cell_line` and `replicate`.
- `denatured` requires matching `treated` and `untreated` samples with the same `cell_line` and `replicate`.
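
The pairing rules above can be checked before a run. The following is a minimal Python sketch, not pipeline code: `check_pairing` is a hypothetical helper, and it assumes each samplesheet row is already available as a dict with `cell_line`, `condition`, and `replicate` keys (for example from `csv.DictReader`). Replicates are compared as text so that `1` and `"1"` land in the same group.

```python
from collections import defaultdict


def check_pairing(rows):
    """Return a list of error strings for groups that break the pairing rules."""
    # Collect the conditions present in every (cell_line, replicate) group.
    groups = defaultdict(set)
    for row in rows:
        key = (row["cell_line"], str(row["replicate"]))
        groups[key].add(row["condition"])

    errors = []
    for (cell_line, replicate), conditions in sorted(groups.items()):
        label = f"{cell_line} replicate {replicate}"
        # 'treated' alone is always acceptable, so only the controls are checked.
        if "untreated" in conditions and "treated" not in conditions:
            errors.append(f"{label}: untreated has no matching treated sample")
        if "denatured" in conditions and not {"treated", "untreated"} <= conditions:
            errors.append(
                f"{label}: denatured needs matching treated and untreated samples"
            )
    return errors
```

An empty list means every group satisfies the rules; each string otherwise names the offending `cell_line` and `replicate`.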

`rf-norm` defaults are selected automatically from the probing principle and available controls:

- `RT-stop` with matching `untreated`: Ding scoring (`-sm 1`) with Box-plot normalisation (`-nm 3`)
- `RT-stop` without `untreated`: Rouskin scoring (`-sm 2`) with 90% Winsorizing (`-nm 2`)
- `MaP` with matching `untreated` and optional `denatured`: Siegfried scoring (`-sm 3`) with Box-plot normalisation (`-nm 3`)
- `MaP` without `untreated`: Zubradt scoring (`-sm 4`) with Box-plot normalisation (`-nm 3`)
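
The default selection above amounts to a small lookup table. This Python sketch illustrates the mapping only: `rfnorm_defaults` is a hypothetical name, the pipeline's own implementation may be organised differently, and the `-sm`/`-nm` values are copied from the list above. A `denatured` control is optional for MaP and does not change which defaults are chosen, so it is not an input here.

```python
def rfnorm_defaults(principle, has_untreated):
    """Map probing principle and control availability to rf-norm -sm/-nm flags."""
    table = {
        ("RT-stop", True): (1, 3),   # Ding scoring, box-plot normalisation
        ("RT-stop", False): (2, 2),  # Rouskin scoring, 90% Winsorizing
        ("MaP", True): (3, 3),       # Siegfried scoring, box-plot normalisation
        ("MaP", False): (4, 3),      # Zubradt scoring, box-plot normalisation
    }
    key = (principle, bool(has_untreated))
    if key not in table:
        raise ValueError(f"unknown probing principle: {principle!r}")
    scoring, normalisation = table[key]
    return ["-sm", str(scoring), "-nm", str(normalisation)]
```

For example, `rfnorm_defaults("MaP", has_untreated=False)` yields the Zubradt and box-plot flags as a list that could be appended to an `rf-norm` command line.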

Additional `rf-norm` parameters exposed by the pipeline:

- `--rfnorm_remap_reactivities`: remap normalised reactivities to the 0-1 range.
- `--rfnorm_reactive_bases <string>`: set the reactive bases used for normalisation windows, e.g. `AC` for DMS.
- `--rfnorm_norm_window <int>`: set the normalisation window size.
- `--rfnorm_window_offset <int>`: set the normalisation window offset.
- `--rfnorm_dynamic_window <int>`: use dynamic normalisation windows with at least this many reactive bases.
- `--rfnorm_norm_independent`: normalise each reactive base independently.
- `--rfnorm_norm_factor <float[,float]>`: supply a fixed normalisation factor for all transcripts. For 90% Winsorizing, provide two comma-separated values.
- `--rfnorm_raw`: score raw reactivities without applying normalisation.
- `--rfnorm_pseudocount <float>`: set the Ding pseudocount.
- `--rfnorm_max_score <float>`: set the Ding maximum score.
- `--rfnorm_ignore_lower_than_untreated`: set reactivities lower than untreated to zero for Ding/Siegfried methods.
- `--rfnorm_max_untreated_mut <float>`: set the Siegfried untreated mutation cutoff.
- `--rfnorm_max_mutation_rate <float>`: set the MaP mutation-rate cutoff.
- `--rfnorm_mean_coverage <float>`: discard transcripts below this mean coverage.
- `--rfnorm_median_coverage <float>`: discard transcripts below this median coverage.
- `--rfnorm_nan <int>`: report positions below this coverage as NaN. Default: `10`.
- `--rfnorm_img`: generate rf-norm plots. This automatically uses `--rnaframework_r_path` to locate `R` inside the RNAFramework container.

If `--rfnorm_reactive_bases` is not provided, the pipeline sets `AC` automatically for samples with `method=DMS`. All other methods fall back to the RNAFramework default (`N`, all bases).
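
The fallback described above can be written as a short function. This is an illustrative Python sketch under stated assumptions: `resolve_reactive_bases` is a hypothetical name, and the comparison with `DMS` is assumed to be exact and case-sensitive, which the text above does not specify.

```python
def resolve_reactive_bases(method, user_value=None):
    """Pick the reactive bases for rf-norm from the user setting or the method."""
    # An explicit --rfnorm_reactive_bases value always takes precedence.
    if user_value:
        return user_value
    # DMS samples default to A and C; everything else keeps the
    # RNAFramework default of N (all bases).
    return "AC" if method == "DMS" else "N"
```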

Example:

```csv
sample,fastq_1,cell_line,condition,replicate
HEK293T_treated_rep1,HEK293T_treated_rep1.fastq.gz,HEK293T,treated,1
HEK293T_untreated_rep1,HEK293T_untreated_rep1.fastq.gz,HEK293T,untreated,1
```

> [!WARNING]
> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; see [docs](https://nf-co.re/docs/usage/getting_started/configuration#custom-configuration-files).

13 changes: 9 additions & 4 deletions assets/methods_description_template.yml
@@ -3,21 +3,26 @@ description: "Suggested text and references to use when describing pipeline usag
section_name: "nf-core/rnastructurome Methods Description"
section_href: "https://github.com/nf-core/rnastructurome"
plot_type: "html"
## TODO nf-core: Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
## You inject any metadata in the Nextflow '${workflow}' object
data: |
<h4>Methods</h4>
<p>Data was processed using nf-core/rnastructurome v${workflow.manifest.version} ${doi_text} of the nf-core collection of workflows (<a href="https://doi.org/10.1038/s41587-020-0439-x">Ewels <em>et al.</em>, 2020</a>), utilising reproducible software environments from the Bioconda (<a href="https://doi.org/10.1038/s41592-018-0046-7">Grüning <em>et al.</em>, 2018</a>) and Biocontainers (<a href="https://doi.org/10.1093/bioinformatics/btx192">da Veiga Leprevost <em>et al.</em>, 2017</a>) projects.</p>
<p>The pipeline was executed with Nextflow v${workflow.nextflow.version} (<a href="https://doi.org/10.1038/nbt.3820">Di Tommaso <em>et al.</em>, 2017</a>) with the following command:</p>
<pre><code>${workflow.commandLine}</code></pre>
<p>${tool_citations}</p>
<p>Tools used in the workflow included: FastQC (Andrews 2010), Cutadapt (Martin 2011), UMI-tools (Smith <em>et al.</em>, 2017), Bowtie (Langmead <em>et al.</em>, 2009), Bowtie2 (Langmead &amp; Salzberg 2012), SAMtools (Danecek <em>et al.</em>, 2021), RNAFramework (Incarnato <em>et al.</em>, 2016), and MultiQC (Ewels <em>et al.</em>, 2016).</p>
<h4>References</h4>
<ul>
<li>Di Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., & Notredame, C. (2017). Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316-319. doi: <a href="https://doi.org/10.1038/nbt.3820">10.1038/nbt.3820</a></li>
<li>Ewels, P. A., Peltzer, A., Fillinger, S., Patel, H., Alneberg, J., Wilm, A., Garcia, M. U., Di Tommaso, P., & Nahnsen, S. (2020). The nf-core framework for community-curated bioinformatics pipelines. Nature Biotechnology, 38(3), 276-278. doi: <a href="https://doi.org/10.1038/s41587-020-0439-x">10.1038/s41587-020-0439-x</a></li>
<li>Grüning, B., Dale, R., Sjödin, A., Chapman, B. A., Rowe, J., Tomkins-Tinch, C. H., Valieris, R., Köster, J., & Bioconda Team. (2018). Bioconda: sustainable and comprehensive software distribution for the life sciences. Nature Methods, 15(7), 475–476. doi: <a href="https://doi.org/10.1038/s41592-018-0046-7">10.1038/s41592-018-0046-7</a></li>
<li>da Veiga Leprevost, F., Grüning, B. A., Alves Aflitos, S., Röst, H. L., Uszkoreit, J., Barsnes, H., Vaudel, M., Moreno, P., Gatto, L., Weber, J., Bai, M., Jimenez, R. C., Sachsenberg, T., Pfeuffer, J., Vera Alvarez, R., Griss, J., Nesvizhskii, A. I., & Perez-Riverol, Y. (2017). BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics (Oxford, England), 33(16), 2580–2582. doi: <a href="https://doi.org/10.1093/bioinformatics/btx192">10.1093/bioinformatics/btx192</a></li>
${tool_bibliography}
<li>Andrews S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data. URL: <a href="https://www.bioinformatics.babraham.ac.uk/projects/fastqc/">https://www.bioinformatics.babraham.ac.uk/projects/fastqc/</a></li>
<li>Martin M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal, 17(1), 10–12. doi: <a href="https://doi.org/10.14806/ej.17.1.200">10.14806/ej.17.1.200</a></li>
<li>Smith T, Heger A, Sudbery I. (2017). UMI-tools: modelling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy. Genome Research, 27(3), 491–499. doi: <a href="https://doi.org/10.1101/gr.209601.116">10.1101/gr.209601.116</a></li>
<li>Langmead B, Trapnell C, Pop M, Salzberg SL. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology, 10(3), R25. doi: <a href="https://doi.org/10.1186/gb-2009-10-3-r25">10.1186/gb-2009-10-3-r25</a></li>
<li>Langmead B, Salzberg SL. (2012). Fast gapped-read alignment with Bowtie 2. Nature Methods, 9(4), 357–359. doi: <a href="https://doi.org/10.1038/nmeth.1923">10.1038/nmeth.1923</a></li>
<li>Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, Li H. (2021). Twelve years of SAMtools and BCFtools. GigaScience, 10(2), giab008. doi: <a href="https://doi.org/10.1093/gigascience/giab008">10.1093/gigascience/giab008</a></li>
<li>Incarnato D, Neri F, Anselmi F, Oliviero S. (2016). RNA structure framework: automated transcriptome-wide reconstruction of RNA secondary structures from high-throughput structure probing data. Bioinformatics, 32(16), 2533–2535. doi: <a href="https://doi.org/10.1093/bioinformatics/btw239">10.1093/bioinformatics/btw239</a></li>
<li>Ewels P, Magnusson M, Lundin S, Käller M. (2016). MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics, 32(19), 3047–3048. doi: <a href="https://doi.org/10.1093/bioinformatics/btw354">10.1093/bioinformatics/btw354</a></li>
</ul>
<div class="alert alert-info">
<h5>Notes:</h5>
22 changes: 17 additions & 5 deletions assets/multiqc_config.yml
@@ -3,12 +3,24 @@ report_comment: >
analysis pipeline. For information about how to interpret these results, please see the
<a href="https://nf-co.re/rnastructurome/dev/docs/output" target="_blank">documentation</a>.
report_section_order:
"nf-core-rnastructurome-methods-description":
order: -1000
software_versions:
order: -1001
fastqc-status-check-heatmap:
order: 1000
"nf-core-rnastructurome-count-progression":
order: 1001
cutadapt_filtered_reads_plot:
order: 1100
cutadapt_trimmed_sequences_plot_5:
order: 1101
cutadapt_trimmed_sequences_plot_3:
order: 1102
"nf-core-rnastructurome-cutadapt-adapters":
order: 1103
"nf-core-rnastructurome-summary":
order: -1002
order: 3
software_versions:
order: 2
"nf-core-rnastructurome-methods-description":
order: 1

export_plots: true

Binary file modified assets/nf-core-rnastructurome_logo_light.png
6 changes: 3 additions & 3 deletions assets/samplesheet.csv
@@ -1,3 +1,3 @@
sample,fastq_1,fastq_2
SAMPLE_PAIRED_END,/path/to/fastq/files/AEG588A1_S1_L002_R1_001.fastq.gz,/path/to/fastq/files/AEG588A1_S1_L002_R2_001.fastq.gz
SAMPLE_SINGLE_END,/path/to/fastq/files/AEG588A4_S4_L003_R1_001.fastq.gz,
sample,fastq_1,fastq_2,cell_line,condition,replicate,organism
SAMPLE_PAIRED_END,/path/to/fastq/files/AEG588A1_S1_L002_R1_001.fastq.gz,/path/to/fastq/files/AEG588A1_S1_L002_R2_001.fastq.gz,HEK293T,treated,1,Homo sapiens
SAMPLE_SINGLE_END,/path/to/fastq/files/AEG588A4_S4_L003_R1_001.fastq.gz,,HEK293T,treated,1,Homo sapiens
66 changes: 65 additions & 1 deletion assets/schema_input.json
@@ -13,6 +13,10 @@
"errorMessage": "Sample name must be provided and cannot contain spaces",
"meta": ["id"]
},
"sample_id": {
"type": "string",
"meta": ["sample_id"]
},
"fastq_1": {
"type": "string",
"format": "file-path",
@@ -26,8 +30,68 @@
"exists": true,
"pattern": "^([\\S\\s]*\\/)?[^\\s\\/]+\\.f(ast)?q\\.gz$",
"errorMessage": "FastQ file for reads 2 cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'"
},
"method": {
"type": "string",
"meta": ["method"]
},
"principle": {
"type": "string",
"meta": ["principle"]
},
"organism": {
"type": "string",
"meta": ["organism"],
"description": "Organism name used for transcript reference resolution. Latin binomials such as 'Homo sapiens' are normalised to Ensembl species format."
},
"adapter_3p": {
"type": "string",
"meta": ["adapter_3p"]
},
"adapter_5p": {
"type": "string",
"meta": ["adapter_5p"]
},
"umi_pattern": {
"type": "string",
"meta": ["umi_pattern"]
},
"condition": {
"type": "string",
"enum": ["treated", "untreated", "denatured"],
"meta": ["condition"],
"description": "Sample condition for rf-norm grouping (treated / untreated / denatured). Samples are normalised only against rows with the same cell_line and replicate. Untreated requires a matching treated sample; denatured requires matching treated and untreated samples.",
"help_text": "TODO (rnacentral-probing-metadata-main): merge_metadata.py must be updated to populate this column from the YAML 'conditions' field. Until then, samples without a condition are treated as 'treated'."
},
"cell_line": {
"type": "string",
"pattern": "^\\S+$",
"meta": ["cell_line"],
"description": "Cell-line identifier used by rf-norm pairing. Only samples with the same cell_line and replicate are normalised together.",
"help_text": "Populate this column for all rf-norm inputs. rf-norm will error if cell_line is missing."
},
"replicate": {
"anyOf": [
{
"type": "string",
"pattern": "^\\S+$"
},
{
"type": "integer"
}
],
"meta": ["replicate"],
"description": "Replicate identifier used by rf-norm pairing. Only samples with the same cell_line and replicate are normalised together.",
"help_text": "Populate this column for all rf-norm inputs. rf-norm will error if replicate is missing. Use a stable value such as rep1, rep2, or 1. Matching is exact after Nextflow converts the value to text."
},
"group": {
"type": "string",
"pattern": "^\\S+$",
"meta": ["group"],
"description": "Deprecated legacy rf-norm grouping key. rf-norm now requires explicit cell_line and replicate columns instead.",
"help_text": "Retained only for input compatibility while upstream metadata generators are updated. It is not used by rf-norm."
}
},
"required": ["sample", "fastq_1"]
"required": ["sample", "fastq_1", "cell_line", "condition", "replicate"]
}
}
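
The row-level constraints added to `assets/schema_input.json` above (required columns, the `condition` enum, and the no-whitespace patterns on `cell_line` and `replicate`) can be mirrored in a few lines of Python. This is a hedged sketch for illustration rather than the validator the pipeline uses; `validate_row` and its error messages are hypothetical, and file-path checks on `fastq_1` are left out.

```python
import re

REQUIRED = ("sample", "fastq_1", "cell_line", "condition", "replicate")
CONDITIONS = {"treated", "untreated", "denatured"}
NO_WHITESPACE = re.compile(r"\S+")  # used with fullmatch, like ^\S+$ in the schema


def validate_row(row):
    """Return a list of problems found in one samplesheet row (a dict)."""
    problems = [
        f"missing required column: {name}"
        for name in REQUIRED
        if row.get(name) in (None, "")
    ]

    condition = row.get("condition")
    if condition not in (None, "") and condition not in CONDITIONS:
        problems.append(f"condition must be one of {sorted(CONDITIONS)}")

    cell_line = row.get("cell_line")
    if cell_line not in (None, "") and not NO_WHITESPACE.fullmatch(str(cell_line)):
        problems.append("cell_line must not contain whitespace")

    # replicate may be an integer or a whitespace-free string (the anyOf branch).
    replicate = row.get("replicate")
    is_integer = isinstance(replicate, int) and not isinstance(replicate, bool)
    if replicate not in (None, "") and not is_integer:
        if not NO_WHITESPACE.fullmatch(str(replicate)):
            problems.append(
                "replicate must be an integer or a string without whitespace"
            )
    return problems
```

A row such as `{"sample": "s1", "fastq_1": "s1.fastq.gz", "cell_line": "HEK293T", "condition": "treated", "replicate": 1}` returns an empty list, while a replicate of `"rep 1"` is reported because of the embedded space.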