CONTRIBUTING.md: 25 lines changed (25 additions, 0 deletions)
@@ -73,6 +73,31 @@ Check out our [project roadmap and TODO list](https://docs.google.com/spreadshee
 2. **Benchmarking**:
 - Use `/usr/bin/time` for resource monitoring. Alternatively, hyperfine is great too. Ideally, use SLURM and keep track of the job IDs for later analysis with seff/pyseff.

+## Documentation workflow
+
+- Docs source pages are in `docs/mkdocs_docs/`.
+- Docs site navigation is configured in `docs/mkdocs.yml` (`nav:` section).
+- Command docs are under `docs/mkdocs_docs/commands/`.
+- Keep command links in `README.md` aligned with pages listed in `docs/mkdocs.yml`.
+
+Use pixi docs tasks:
+- Serve locally (live reload): `pixi run -e dev docs-serve`
+- Build static docs: `pixi run -e dev docs-build`
+- Auto-generate command help pages:
+  - create missing pages: `pixi run -e dev python src/setup/export_command_help_to_docs.py`
+  - refresh existing auto-generated pages: `pixi run -e dev python src/setup/export_command_help_to_docs.py --overwrite`
+
+For command pages that need rich/static sections (mermaid, tables, links), add a
 RolyPoly is an RNA virus analysis toolkit, meant to be a "swiss-army knife" for RNA virus discovery and characterization by including a variety of commands, wrappers, parsers, automations, and some "quality of life" features for many stages of a virus investigation process (from raw read processing to genome annotation). While it includes an "end-2-end" command that employs an entire pipeline, the main goals of rolypoly are:
 - Help non-computational researchers take a deep dive into their data without compromising on using tools that are non-techie friendly.
 - Help (software) developers of virus analysis pipelines "plug" holes in their frameworks by using specific RolyPoly commands to add features to their existing code base.

 ## Note - Rolypoly is still under development (contributions welcome!)
-RolyPoly is an open, still in progress project - I aim to summarise the main functionality into a manuscript ~early 2026. Pull requests and contributions are welcome and will be considered (see [CONTRIBUTING.md](CONTRIBUTING.md)).
+RolyPoly is an open, still in progress project - I aim to summarise the main functionality into a manuscript ~mid 2026. Pull requests and contributions are welcome and will be considered (see [CONTRIBUTING.md](CONTRIBUTING.md)).
 This also means that there are bugs, verbose logging even in non-debug mode, and some placeholders and TODOs here and there.

 ## Installation
@@ -16,10 +18,10 @@ This also means that there are bugs, verbose logging even for non debug mode, an
 **Recommended for most users** who want a "just works" solution and primarily intend to use rolypoly as a CLI tool in an independent environment.

 We hope to have rolypoly available from bioconda in the near future.
-In the meantime, it can be installed with the [`quick_setup.sh`](https://code.jgi.doe.gov/rolypoly/rolypoly/-/raw/main/src/setup/quick_setup.sh) script, which will also fetch the pre-generated data rolypoly requires.
+In the meantime, it can be installed with the [`quick_setup.sh`](https://raw.githubusercontent.com/UriNeri/rolypoly/main/src/setup/quick_setup.sh) script, which will also fetch the pre-generated data rolypoly requires.
-- ✅ [`shrink-reads`](https://urineri.github.io/rolypoly/commands/shrink_reads) — Downsample or subsample reads. Useful for testing or normalizing coverage across samples.
-- ✅ [`mask-dna`](https://urineri.github.io/rolypoly/commands/mask_dna) — Mask DNA regions in RNA-seq reads (bbmap, seqkit). Useful for avoiding mis-filtering of RNA virus reads in because of potential matches to EVEs.
+- ✅ [`shrink-reads`](https://urineri.github.io/rolypoly/commands/read_processing) — Downsample or subsample reads. Useful for testing or normalizing coverage across samples.
+- ✅ [`mask-dna`](https://urineri.github.io/rolypoly/commands/read_processing) — Mask DNA regions in RNA-seq reads (bbmap, seqkit). Useful for avoiding mis-filtering of RNA virus reads because of potential matches to EVEs.

 #### Annotation
-- ✅ [`annotate`](https://urineri.github.io/rolypoly/commands/annotate) — Genome feature annotation (wraps the rna and prot commands)
+- ✅ [`annotate`](https://urineri.github.io/rolypoly/commands/annotate_rna/#annotate-rna) — Genome feature annotation (wraps the rna and prot commands)
 - ✅ [`annotate-rna`](https://urineri.github.io/rolypoly/commands/annotate_rna) — RNA secondary structure labelling and ribozyme detection (Infernal, ViennaRNA/linearfold, cmsearch on Rfam...)
 - 🧪 [`annotate-prot`](https://urineri.github.io/rolypoly/commands/annotate_prot) — Gene calling and Protein domain annotation and functional prediction (HMMER, Pfam, custom).

@@ -99,22 +101,22 @@ Legend:
 #### RNA Virus Identification
 - ✅ [`marker-search`](https://urineri.github.io/rolypoly/commands/marker_search) — Search for viral markers (mainly RdRps, genomad VVs, or user-provided), using profile-based methods (HMMER / MMseqs2).
 - ✅ [`virus-mapping`](https://urineri.github.io/rolypoly/commands/search_viruses) — Map and identify viruses using nucleic acid search (MMseqs2).
-- ✅ `rdrp-motif-search` — Search RdRp motifs (A/B/C/D) in nucleotide or amino acid sequences.
+- ✅ [`rdrp-motif-search`](https://urineri.github.io/rolypoly/commands/rdrp_motif_search) — Search RdRp motifs (A/B/C/D) in nucleotide or amino acid sequences.

 #### Binning / Clustering
-- 🧪 `cluster` — Average Nucleic identity (ANI) based contig gropuing. Supports several common backends and methods.
-- 🧪 `extend` — Extend sequences by pile-up/assembly. Useful for combining assemblies of with low abundance viruses, or those with high microdiversity, at the cost of worse strain/sub-species resolution (i.e. can condense to a consenus).
-- 🧪 `termini` — Shared termini grouping and motif reporting. Writes assignments + groups tables (TSV/CSV/Parquet/JSONL) and motif FASTA by default.
-- 🧪 `correlate` — group contigs based on co-occurence, co-abundance, minimal correlation (Spearman's) of these, or both.
-- 🤔 `binit` — Combines the above commands with sample information and genome attributes (e.g. require a shared termini AND protein complementarity, like CP + RdRp). See [notebooks/Exprimental/partiti_usecase/partiti_segment_workflow_experimental.ipynb](notebooks/Exprimental/partiti_usecase/partiti_segment_workflow_experimental.ipynb) for candidate workflow.
+- 🧪 [`cluster`](https://urineri.github.io/rolypoly/commands/cluster) — Average Nucleotide Identity (ANI) based contig grouping. Supports several common backends and methods.
+- 🧪 [`extend`](https://urineri.github.io/rolypoly/commands/extend) — Extend sequences by pile-up/assembly. Useful for combining assemblies of low-abundance viruses, or of those with high microdiversity, at the cost of worse strain/sub-species resolution (i.e. can condense to a consensus).
+- 🧪 [`termini`](https://urineri.github.io/rolypoly/commands/binning_termini) — Shared termini grouping and motif reporting. Writes assignments + groups tables (TSV/CSV/Parquet/JSONL) and motif FASTA by default.
+- 🧪 [`correlate`](https://urineri.github.io/rolypoly/commands/binning_correlate) — Group contigs based on co-occurrence, co-abundance, minimal correlation (Spearman's) of these, or both.
+- 🤔 [`binit`](https://urineri.github.io/rolypoly/commands/binit) — Combines the above commands with sample information and genome attributes (e.g. require shared termini AND protein complementarity, like CP + RdRp). See [notebooks/Exprimental/partiti_usecase/partiti_segment_workflow_experimental.ipynb](notebooks/Exprimental/partiti_usecase/partiti_segment_workflow_experimental.ipynb) for a candidate workflow.

 #### Miscellaneous
-- ✅ `roll` — Run an end-to-end pipeline (before v0.7.1, named `end2end`).
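The CONTRIBUTING change asks contributors to keep the command links in `README.md` aligned with the pages listed in `docs/mkdocs.yml`, and the link rewrites in this hunk are that kind of fix. A minimal sketch of an automated check follows. It assumes, hypothetically, that README command links look like `https://urineri.github.io/rolypoly/commands/<slug>` (optionally followed by `/` or an anchor) and that `mkdocs.yml` lists each command page as `commands/<slug>.md`; the helper name `find_missing_command_pages` is illustrative and not part of RolyPoly. It scans the raw YAML text with a regex instead of parsing `nav:` properly, to stay within the standard library.

```python
import re

# Assumed README link shape: .../rolypoly/commands/<slug>[/#anchor]
LINK_RE = re.compile(r"https://urineri\.github\.io/rolypoly/commands/([A-Za-z0-9_-]+)")
# Assumed mkdocs.yml nav entry shape: "- Title: commands/<slug>.md"
NAV_RE = re.compile(r"commands/([A-Za-z0-9_-]+)\.md")


def find_missing_command_pages(readme_text: str, mkdocs_yml_text: str) -> list[str]:
    """Return command slugs linked from the README but not listed in mkdocs.yml.

    Hypothetical helper: compares slugs textually, it does not build the site.
    """
    linked = set(LINK_RE.findall(readme_text))
    listed = set(NAV_RE.findall(mkdocs_yml_text))
    return sorted(linked - listed)
```

Called with the contents of `README.md` and `docs/mkdocs.yml`, an empty list means every command link has a matching nav entry; any returned slug points at a link (or nav entry) to fix.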
@@ -125,7 +127,7 @@ If you have suggestions for additional commands or features, or want to implemen

 ## Dependencies

-**📦 Modular Installation Available**: RolyPoly supports both quick setup (one environment with all dependecies for all commands) and modular installation (command-specific environments). The modular approach is particularly useful for software developers who want to integrate specific rolypoly features with minimal dependency conflicts. See the [installation documentation](./docs/docs/mkdocs_docs/installation.md) for details.
+**📦 Modular Installation Available**: RolyPoly supports both quick setup (one environment with all dependencies for all commands) and modular installation (command-specific environments). The modular approach is particularly useful for software developers who want to integrate specific rolypoly features with minimal dependency conflicts. See the [installation documentation](https://urineri.github.io/rolypoly/installation/) for details.

 Not all 3rd party software is used by all the different commands. RolyPoly includes a "citation reminder" that will try to list all the external software used by a command. The "reminded citations" are pretty-printed to console (stdout) and to a logfile. To shut off the terminal citation reminder printing, set `ROLYPOLY_REMIND_CITATIONS` to false in your `rpconfig.json` file.
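The citation reminder is controlled by `ROLYPOLY_REMIND_CITATIONS` in `rpconfig.json`. A minimal sketch of switching it off from a script, assuming the file is a JSON object with this key at its top level and that a JSON boolean `false` is the accepted value; the helper `disable_citation_reminder` is hypothetical, and the location of `rpconfig.json` depends on your installation.

```python
import json
from pathlib import Path


def disable_citation_reminder(config_path: Path) -> dict:
    """Set ROLYPOLY_REMIND_CITATIONS to false in a JSON config file.

    Hypothetical helper: assumes a top-level JSON object and keeps all other keys.
    """
    config = json.loads(config_path.read_text(encoding="utf-8"))
    config["ROLYPOLY_REMIND_CITATIONS"] = False  # written out as JSON `false`
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Editing the file by hand works just as well; the sketch only helps when the setting has to be flipped in batch jobs or CI.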
+-`--override-parameters`: JSON-like string of parameters to override. Example: --override-parameters '{"RNAfold": {"temperature": 37}, "ORFfinder": {"minimum_length": 150}}' (type: `TEXT`; default: `{}`)
+-`--skip-steps`: Comma-separated list of steps to skip. Example: --skip-steps RNA_annotation,protein_annotation or --skip-steps IRESfinder,RNAMotif or --skip-steps ORFfinder,hmmsearch (type: `TEXT`; default: ``)
+-`--secondary-structure-tool`: Tool for secondary structure prediction (type: `CHOICE`; default: `LinearFold`)
+-`--ires-tool`: Tool for IRES identification (type: `CHOICE`; default: `IRESfinder`)
+-`--trna-tool`: Tool for tRNA identification (type: `CHOICE`; default: `tRNAscan-SE`)
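The help text above says `--override-parameters` takes a JSON-like string and `--skip-steps` a comma-separated list. When these values are assembled in a script, producing the JSON with `json.dumps` and quoting it with `shlex` avoids hand-escaping mistakes. A minimal sketch using the tool and step names from the examples above; `annotation_option_args` is a hypothetical helper, and it builds only the option arguments, without assuming the rest of the RolyPoly command line.

```python
import json
import shlex


def annotation_option_args(overrides: dict, skip_steps: list[str]) -> list[str]:
    """Build --override-parameters / --skip-steps arguments (hypothetical helper)."""
    args = []
    if overrides:
        # json.dumps yields valid JSON with double-quoted keys, as in the help example.
        args += ["--override-parameters", json.dumps(overrides)]
    if skip_steps:
        # Join without spaces so the value stays a single comma-separated token.
        args += ["--skip-steps", ",".join(skip_steps)]
    return args


args = annotation_option_args(
    {"RNAfold": {"temperature": 37}, "ORFfinder": {"minimum_length": 150}},
    ["IRESfinder", "RNAMotif"],
)
# prints: --override-parameters '{"RNAfold": {"temperature": 37}, "ORFfinder": {"minimum_length": 150}}' --skip-steps IRESfinder,RNAMotif
print(shlex.join(args))
```

Passing the `args` list straight to `subprocess.run` skips shell quoting entirely; `shlex.join` is only needed when the command is printed or pasted into a shell.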