Commit c6041ad

author
Uri Neri
committed
set maximal python version to 3.13 for pyo3 deps, and add automatic click command to markdown generation
1 parent d1d691e commit c6041ad

45 files changed

+1880
-426
lines changed

CONTRIBUTING.md

Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -73,6 +73,31 @@ Check out our [project roadmap and TODO list](https://docs.google.com/spreadshee
7373
2. **Benchmarking**:
7474
- Use `/usr/bin/time` for resource monitoring. Alternatively, hyperfine is great too. Ideally, use SLURM and keep track of the job IDs for later analysis with seff/pyseff.
7575

76+
## Documentation workflow
77+
78+
- Docs source pages are in `docs/mkdocs_docs/`.
79+
- Docs site navigation is configured in `docs/mkdocs.yml` (`nav:` section).
80+
- Command docs are under `docs/mkdocs_docs/commands/`.
81+
- Keep command links in `README.md` aligned with pages listed in `docs/mkdocs.yml`.
82+
83+
Use pixi docs tasks:
84+
- Serve locally (live reload): `pixi run -e dev docs-serve`
85+
- Build static docs: `pixi run -e dev docs-build`
86+
- Auto-generate command help pages:
87+
- create missing pages: `pixi run -e dev python src/setup/export_command_help_to_docs.py`
88+
- refresh existing auto-generated pages: `pixi run -e dev python src/setup/export_command_help_to_docs.py --overwrite`
89+
90+
For command pages that need rich/static sections (mermaid, tables, links), add a
91+
per-command scaffold at:
92+
- `src/setup/help_export_scaffolds/<command_name>.md`
93+
94+
The exporter injects scaffold content into generated pages under **Pinned Sections**.
95+
96+
When adding a new command page:
97+
1. Add the markdown page in `docs/mkdocs_docs/commands/`.
98+
2. Add it to `nav` in `docs/mkdocs.yml`.
99+
3. Add/update links in `README.md` and `docs/mkdocs_docs/commands/index.md`.
100+
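The documentation workflow added above (create missing pages by default, refresh only with `--overwrite`, inject scaffold content under **Pinned Sections**) can be sketched as follows. This is a minimal illustration, not the code in `src/setup/export_command_help_to_docs.py`: the function names `render_option`, `render_page`, and `write_page` are hypothetical, the option metadata is passed in by hand rather than read from click, and rendering **Pinned Sections** as a level-2 heading is an assumption.

```python
from pathlib import Path

FENCE = "`" * 3  # built at runtime so this sketch can sit inside a fenced block


def render_option(flags, help_text, type_name, default=None, required=False):
    """Format one CLI option as a bullet in the style of the generated pages."""
    names = ", ".join(f"`{flag}`" for flag in flags)
    details = [f"type: `{type_name}`"]
    if required:
        details.append("required")
    if default is not None:
        details.append(f"default: `{default}`")
    return f"- {names}: {help_text} ({'; '.join(details)})"


def render_page(title, command, summary, option_lines, scaffold=None):
    """Assemble a command page; scaffold text (if any) goes under Pinned Sections."""
    lines = [
        f"# {title}",
        "",
        f"> Auto-generated draft from CLI metadata for `{command}`.",
        "",
        "## Summary",
        "",
        summary,
        "",
        "## Usage",
        "",
        f"{FENCE}bash",
        f"{command} [OPTIONS]",
        FENCE,
        "",
        "## Options",
        "",
        *option_lines,
    ]
    if scaffold and scaffold.strip():
        # Hypothetical layout: hand-written content is appended as its own section.
        lines += ["", "## Pinned Sections", "", scaffold.strip()]
    return "\n".join(lines) + "\n"


def write_page(path, text, overwrite=False):
    """Create a missing page; replace an existing one only when overwrite is set."""
    path = Path(path)
    if path.exists() and not overwrite:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return True
```

Keeping the hand-written scaffold in a separate file means a refresh with `overwrite=True` can regenerate the option list without losing mermaid diagrams, tables, or links.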
76101
## PyPI / TestPyPI release automation
77102

78103
Releases are automated via GitHub Actions using trusted publishing (OIDC), with this flow:

README.md

Lines changed: 23 additions & 21 deletions
@@ -1,13 +1,15 @@
1-
![RolyPoly Logo](https://code.jgi.doe.gov/rolypoly/docs/-/raw/main/docs/rolypoly_logo.png?ref_type=heads)
1+
![RolyPoly Logo](https://raw.githubusercontent.com/UriNeri/rolypoly/main/docs/rolypoly_logo.png)
22

33
# RolyPoly
44

5+
[![PyPI version](https://img.shields.io/pypi/v/rolypoly-tk.svg?cacheSeconds=300)](https://pypi.org/project/rolypoly-tk/) [![Python versions](https://img.shields.io/pypi/pyversions/rolypoly-tk.svg?cacheSeconds=300)](https://pypi.org/project/rolypoly-tk/) [![PyPI Downloads](https://static.pepy.tech/personalized-badge/rolypoly-tk?period=monthly&units=INTERNATIONAL_SYSTEM&left_color=BLACK&right_color=GREEN&left_text=Downloads+%28month%29)](https://pepy.tech/projects/rolypoly-tk) [![License](https://img.shields.io/github/license/UriNeri/rolypoly.svg)](LICENSE) [![Docs](https://img.shields.io/badge/docs-urineri.github.io%2Frolypoly-blue)](https://urineri.github.io/rolypoly/)
6+
57
RolyPoly is an RNA virus analysis toolkit, meant to be a "swiss-army knife" for RNA virus discovery and characterization by including a variety of commands, wrappers, parsers, automations, and some "quality of life" features for many stages of a virus investigation process (from raw read processing to genome annotation). While it includes an "end-2-end" command that employs an entire pipeline, the main goals of rolypoly are:
68
- Help non-computational researchers take a deep dive into their data without compromising on using tools that are non-techie friendly.
79
- Help (software) developers of virus analysis pipelines "plug" holes in their framework, by using specific RolyPoly commands to add features to their existing code base.
810

911
## Note - Rolypoly is still under development (contributions welcome!)
10-
RolyPoly is an open, still in progress project - I aim to summarise the main functionality into a manuscript ~early 2026. Pull requests and contributions are welcome and will be considered (see [CONTRIBUTING.md](CONTRIBUTING.md)).
12+
RolyPoly is an open, still in progress project - I aim to summarise the main functionality into a manuscript ~mid 2026. Pull requests and contributions are welcome and will be considered (see [CONTRIBUTING.md](CONTRIBUTING.md)).
1113
This also means that there are bugs, verbose logging even in non-debug mode, and some placeholders and TODOs here and there.
1214

1315
## Installation
@@ -16,10 +18,10 @@ This also means that there are bugs, verbose logging even for non debug mode, an
1618
**Recommended for most users** who want a "just works" solution and primarily intend to use rolypoly as a CLI tool in an independent environment.
1719

1820
We hope to have rolypoly available from bioconda in the near future.
19-
In the meantime, it can be installed with the [`quick_setup.sh`](https://code.jgi.doe.gov/rolypoly/rolypoly/-/raw/main/src/setup/quick_setup.sh) script, which will also fetch the pre-generated data rolypoly requires.
21+
In the meantime, it can be installed with the [`quick_setup.sh`](https://raw.githubusercontent.com/UriNeri/rolypoly/main/src/setup/quick_setup.sh) script, which will also fetch the pre-generated data rolypoly requires.
2022

2123
```bash
22-
curl -O https://code.jgi.doe.gov/rolypoly/rolypoly/-/raw/main/src/setup/quick_setup.sh && \
24+
curl -O https://raw.githubusercontent.com/UriNeri/rolypoly/main/src/setup/quick_setup.sh && \
2325
bash quick_setup.sh
2426
```
2527

@@ -44,7 +46,7 @@ By default if no positional arguments are supplied, rolypoly is installed into t
4446
curl -fsSL https://pixi.sh/install.sh | bash
4547

4648
# Clone the repository
47-
git clone https://code.jgi.doe.gov/rolypoly/rolypoly.git
49+
git clone https://github.com/UriNeri/rolypoly.git
4850
cd rolypoly
4951

5052
# Install for specific functionality (examples):
@@ -84,11 +86,11 @@ Legend:
8486

8587
#### Raw-Reads
8688
-[`filter-reads`](https://urineri.github.io/rolypoly/commands/read_processing) — Host/rRNA/adapters/artifact filtering and QC (bbmap, falco, etc.)
87-
-[`shrink-reads`](https://urineri.github.io/rolypoly/commands/shrink_reads) — Downsample or subsample reads. Useful for testing or normalizing coverage across samples.
88-
-[`mask-dna`](https://urineri.github.io/rolypoly/commands/mask_dna) — Mask DNA regions in RNA-seq reads (bbmap, seqkit). Useful for avoiding mis-filtering of RNA virus reads in because of potential matches to EVEs.
89+
-[`shrink-reads`](https://urineri.github.io/rolypoly/commands/read_processing) — Downsample or subsample reads. Useful for testing or normalizing coverage across samples.
90+
-[`mask-dna`](https://urineri.github.io/rolypoly/commands/read_processing) — Mask DNA regions in RNA-seq reads (bbmap, seqkit). Useful for avoiding mis-filtering of RNA virus reads because of potential matches to EVEs.
8991

9092
#### Annotation
91-
-[`annotate`](https://urineri.github.io/rolypoly/commands/annotate) — Genome feature annotation (wraps the rna and prot commands)
93+
-[`annotate`](https://urineri.github.io/rolypoly/commands/annotate_rna/#annotate-rna) — Genome feature annotation (wraps the rna and prot commands)
9294
-[`annotate-rna`](https://urineri.github.io/rolypoly/commands/annotate_rna) — RNA secondary structure labelling and ribozyme detection (Infernal, ViennaRNA/linearfold, cmsearch on Rfam...)
9395
- 🧪 [`annotate-prot`](https://urineri.github.io/rolypoly/commands/annotate_prot) — Gene calling and Protein domain annotation and functional prediction (HMMER, Pfam, custom).
9496

@@ -99,22 +101,22 @@ Legend:
99101
#### RNA Virus Identification
100102
-[`marker-search`](https://urineri.github.io/rolypoly/commands/marker_search) — Search for viral markers (mainly RdRps, genomad VVs, or user-provided), using profile-based methods (HMMER / MMseqs2).
101103
-[`virus-mapping`](https://urineri.github.io/rolypoly/commands/search_viruses) — Map and identify viruses using nucleic acid search (MMseqs2).
102-
-`rdrp-motif-search` — Search RdRp motifs (A/B/C/D) in nucleotide or amino acid sequences.
104+
-[`rdrp-motif-search`](https://urineri.github.io/rolypoly/commands/rdrp_motif_search) — Search RdRp motifs (A/B/C/D) in nucleotide or amino acid sequences.
103105

104106
#### Binning / Clustering
105-
- 🧪 `cluster` — Average Nucleic identity (ANI) based contig gropuing. Supports several common backends and methods.
106-
- 🧪 `extend` — Extend sequences by pile-up/assembly. Useful for combining assemblies of with low abundance viruses, or those with high microdiversity, at the cost of worse strain/sub-species resolution (i.e. can condense to a consenus).
107-
- 🧪 `termini` — Shared termini grouping and motif reporting. Writes assignments + groups tables (TSV/CSV/Parquet/JSONL) and motif FASTA by default.
108-
- 🧪 `correlate`group contigs based on co-occurence, co-abundance, minimal correlation (Spearman's) of these, or both.
109-
- 🤔 `binit` — Combines the above commands with sample information and genome attributes (e.g. require a shared termini AND protein complementarity, like CP + RdRp). See [notebooks/Exprimental/partiti_usecase/partiti_segment_workflow_experimental.ipynb](notebooks/Exprimental/partiti_usecase/partiti_segment_workflow_experimental.ipynb) for candidate workflow.
107+
- 🧪 [`cluster`](https://urineri.github.io/rolypoly/commands/cluster) — Average Nucleotide Identity (ANI) based contig grouping. Supports several common backends and methods.
108+
- 🧪 [`extend`](https://urineri.github.io/rolypoly/commands/extend) — Extend sequences by pile-up/assembly. Useful for combining assemblies of low-abundance viruses, or those with high microdiversity, at the cost of worse strain/sub-species resolution (i.e. can condense to a consensus).
109+
- 🧪 [`termini`](https://urineri.github.io/rolypoly/commands/binning_termini) — Shared termini grouping and motif reporting. Writes assignments + groups tables (TSV/CSV/Parquet/JSONL) and motif FASTA by default.
110+
- 🧪 [`correlate`](https://urineri.github.io/rolypoly/commands/binning_correlate) — Group contigs based on co-occurrence, co-abundance, minimal correlation (Spearman's) of these, or both.
111+
- 🤔 [`binit`](https://urineri.github.io/rolypoly/commands/binit) — Combines the above commands with sample information and genome attributes (e.g. require a shared termini AND protein complementarity, like CP + RdRp). See [notebooks/Exprimental/partiti_usecase/partiti_segment_workflow_experimental.ipynb](notebooks/Exprimental/partiti_usecase/partiti_segment_workflow_experimental.ipynb) for candidate workflow.
110112

111113
#### Miscellaneous
112-
-`roll` — Run an end-to-end pipeline (before v0.7.1, named `end2end`).
113-
-`fetch-sra` — Download SRA fastq files (from ENA)
114-
-`fastx-calc` — Calculate per-sequence metrics (length, GC content, hash, ...)
115-
-`fastx-stats` — Calculate (-->aggregate) statistics for sequences (min, max, mean, median, ...) (input is file/s)
116-
-`rename-seqs` — Rename sequences (add a prefix, suffix, hash, running number, etc.)
117-
- 🚧 `quick-taxonomy` — Quick taxonomy assignment. Candidate workflows are [github.com/UriNeri/ictv-mmseqs2-protein-database](https://github.com/UriNeri/ictv-mmseqs2-protein-database) and [github.com/apcamargo/ictv-mmseqs2-protein-database](https://github.com/apcamargo/ictv-mmseqs2-protein-database)
114+
-[`roll`](https://urineri.github.io/rolypoly/commands/end_to_end) — Run an end-to-end pipeline (before v0.7.1, named `end2end`).
115+
-[`fetch-sra`](https://urineri.github.io/rolypoly/commands/misc) — Download SRA fastq files (from ENA)
116+
-[`fastx-calc`](https://urineri.github.io/rolypoly/commands/misc) — Calculate per-sequence metrics (length, GC content, hash, ...)
117+
-[`fastx-stats`](https://urineri.github.io/rolypoly/commands/misc) — Calculate aggregate statistics for sequences (min, max, mean, median, ...); input is one or more files
118+
-[`rename-seqs`](https://urineri.github.io/rolypoly/commands/misc) — Rename sequences (add a prefix, suffix, hash, running number, etc.)
119+
- 🚧 [`quick-taxonomy`](https://urineri.github.io/rolypoly/commands/misc) — Quick taxonomy assignment. Candidate workflows are [github.com/UriNeri/ictv-mmseqs2-protein-database](https://github.com/UriNeri/ictv-mmseqs2-protein-database) and [github.com/apcamargo/ictv-mmseqs2-protein-database](https://github.com/apcamargo/ictv-mmseqs2-protein-database)
118120
- 🤔 support for [genotate](https://github.com/deprekate/genotate) for gene prediction.
119121
- 🤔 Genome refinement / strain disentanglement / variant calling?
120122
- 🤔 Virus feature prediction (+/-ssRNA/dsRNA, circular/linear, mono/poly-segmented, capsid type, etc.)
@@ -125,7 +127,7 @@ If you have suggestions for additional commands or features, or want to implemen
125127

126128
## Dependencies
127129

128-
**📦 Modular Installation Available**: RolyPoly supports both quick setup (one environment with all dependecies for all commands) and modular installation (command-specific environments). The modular approach is particularly useful for software developers who want to integrate specific rolypoly features with minimal dependency conflicts. See the [installation documentation](./docs/docs/mkdocs_docs/installation.md) for details.
130+
**📦 Modular Installation Available**: RolyPoly supports both quick setup (one environment with all dependencies for all commands) and modular installation (command-specific environments). The modular approach is particularly useful for software developers who want to integrate specific rolypoly features with minimal dependency conflicts. See the [installation documentation](https://urineri.github.io/rolypoly/installation/) for details.
129131

130132
Not all 3rd party software is used by all the different commands. RolyPoly includes a "citation reminder" that will try to list all the external software used by a command. The "reminded citations" are pretty printed to console (stdout) and to a logfile. To shut off the terminal citation reminder printing, set `ROLYPOLY_REMIND_CITATIONS` to false in your `rpconfig.json` file.
131133

docs/mkdocs.yml

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -99,17 +99,38 @@ nav:
9999
- Commands:
100100
- Overview: commands/index.md
101101
- External Data: commands/prepare_external_data.md
102+
- Get Data (Auto Help): commands/get_data.md
103+
- Version (Auto Help): commands/version.md
102104
- End to End: commands/end_to_end.md
105+
- Roll (Auto Help): commands/roll.md
103106
- Read Processing: commands/read_processing.md
107+
- Filter Reads (Auto Help): commands/filter_reads.md
108+
- Shrink Reads (Auto Help): commands/shrink_reads.md
109+
- Mask DNA (Auto Help): commands/mask_dna.md
104110
- Assembly: commands/assembly.md
111+
- Assemble (Auto Help): commands/assemble.md
112+
- Extend (Auto Help): commands/extend.md
105113
- Marker Gene Search: commands/marker_search.md
114+
- RdRp Motif Search (Auto Help): commands/rdrp_motif_search.md
106115
- Assembly Filtering: commands/filter_assembly.md
116+
- Filter Contigs (Auto Help): commands/filter_contigs.md
107117
- Virus Search: commands/search_viruses.md
118+
- Virus Mapping (Auto Help): commands/virus_mapping.md
119+
- Annotate (Auto Help): commands/annotate.md
108120
- RNA Annotation: commands/annotate_rna.md
109121
- Protein Annotation: commands/annotate_prot.md
110122
- Host Classification: commands/host_classify.md
111123
- Binning:
112124
- Termini Analysis: commands/binning_termini.md
113125
- Correlation Analysis: commands/binning_correlate.md
126+
- Correlate (Auto Help): commands/correlate.md
127+
- Termini (Auto Help): commands/termini.md
128+
- Cluster (Auto Help): commands/cluster.md
129+
- Binit (Auto Help): commands/binit.md
114130
- Miscellaneous: commands/misc.md
131+
- FASTX Calc (Auto Help): commands/fastx_calc.md
132+
- FASTX Stats (Auto Help): commands/fastx_stats.md
133+
- Fetch SRA (Auto Help): commands/fetch_sra.md
134+
- Rename Seqs (Auto Help): commands/rename_seqs.md
135+
- Quick Taxonomy (Auto Help): commands/quick_taxonomy.md
115136
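With this many `commands/*.md` entries added to `nav:`, the CONTRIBUTING step "Add it to `nav` in `docs/mkdocs.yml`" is easy to miss for one page. A minimal sketch of a consistency check follows. It assumes the flat `Title: commands/<page>.md` entry shape shown in this hunk, uses a regular expression instead of a YAML parser to stay within the standard library, and its function names are hypothetical.

```python
import re

# Matches the value part of nav entries such as
# "- Annotate (Auto Help): commands/annotate.md".
NAV_PAGE = re.compile(r":\s*(commands/[\w\-]+\.md)\s*$", re.MULTILINE)


def nav_command_pages(mkdocs_text):
    """Return the set of commands/*.md pages referenced in the nav text."""
    return set(NAV_PAGE.findall(mkdocs_text))


def pages_missing_from_nav(mkdocs_text, pages_on_disk):
    """Pages that exist on disk but are not listed in the nav."""
    return sorted(set(pages_on_disk) - nav_command_pages(mkdocs_text))


def nav_entries_without_page(mkdocs_text, pages_on_disk):
    """Nav entries that point at a page that does not exist on disk."""
    return sorted(nav_command_pages(mkdocs_text) - set(pages_on_disk))


example_nav = """\
nav:
  - Commands:
    - Overview: commands/index.md
    - Annotate (Auto Help): commands/annotate.md
    - RNA Annotation: commands/annotate_rna.md
"""
```

In a real checkout, `pages_on_disk` could be collected with something like `{"commands/" + p.name for p in Path("docs/mkdocs_docs/commands").glob("*.md")}`.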

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
1+
# Annotate
2+
3+
> Auto-generated draft from CLI metadata for `rolypoly annotate`.
4+
> Expand this page with command-specific context, examples, and citations.
5+
6+
## Summary
7+
8+
Run combined RNA + protein annotation on nucleotide viral sequences.
9+
10+
## Description
11+
12+
This command orchestrates `annotate-rna` and `annotate-prot` into a single
13+
workflow and writes results into `rna_annotation/` and
14+
`protein_annotation/` subdirectories under the selected output path.
15+
16+
Use `--skip-steps` to disable specific stages and `--override-parameters`
17+
to forward JSON overrides to sub-tools.
18+
19+
## Usage
20+
21+
```bash
22+
rolypoly annotate [OPTIONS]
23+
```
24+
25+
## Options
26+
27+
- `-i`, `--input`: Input nucleotide sequence file (fasta, fna, fa, or faa) (type: `PATH`; required)
28+
- `-o`, `--output`: Output file location. (type: `TEXT`; default: `rolypoly_annotation`)
29+
- `-t`, `--threads`: Number of threads (type: `INTEGER`; default: `1`)
30+
- `-g`, `--log-file`: Path to log file (type: `TEXT`; default: `/clusterfs/jgi/scratch/science/metagen/neri/code/rolypoly/annotate_logfile.txt`)
31+
- `-M`, `--memory`: Memory in GB. Example: -M 8gb (type: `TEXT`; default: `6gb`)
32+
- `--override-parameters`: JSON-like string of parameters to override. Example: --override-parameters '{"RNAfold": {"temperature": 37}, "ORFfinder": {"minimum_length": 150}}' (type: `TEXT`; default: `{}`)
33+
- `--skip-steps`: Comma-separated list of steps to skip. Example: --skip-steps RNA_annotation,protein_annotation or --skip-steps IRESfinder,RNAMotif or --skip-steps ORFfinder,hmmsearch (type: `TEXT`; default: ``)
34+
- `--secondary-structure-tool`: Tool for secondary structure prediction (type: `CHOICE`; default: `LinearFold`)
35+
- `--ires-tool`: Tool for IRES identification (type: `CHOICE`; default: `IRESfinder`)
36+
- `--trna-tool`: Tool for tRNA identification (type: `CHOICE`; default: `tRNAscan-SE`)
37+
- `--rnamotif-tool`: Tool for RNA sequence motif identification (type: `CHOICE`; default: `lightmotif`)
38+
- `--gene-prediction-tool`: Tool for gene prediction (type: `CHOICE`; default: `pyrodigal`)
39+
- `--domain-db`: Database for domain detection (NOTE: currently packaged with rolypoly data: Pfam, genomad, RVMT) (type: `CHOICE`; default: `Pfam`)
40+
- `--custom-domain-db`: Path to a custom domain database in HMM format (for use with --domain-db custom) (type: `TEXT`; default: ``)
41+
- `--min-orf-length`: Minimum ORF length for gene prediction (type: `INTEGER`; default: `30`)
42+
- `--search-tool`: Tool/command for protein domain detection. hmmer commands are via pyhmmer bindings. (type: `CHOICE`; default: `hmmsearch`)
43+
44+
45+
46+
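The `--override-parameters` and `--skip-steps` options listed above both take structured values packed into one string, which is easy to get wrong when quoting by hand. A minimal sketch of building the argument list from Python follows. The helper name `build_annotate_command` is hypothetical, only flags shown on this page are used, and it assumes that strict JSON is accepted where the help text says "JSON-like".

```python
import json
import shlex


def build_annotate_command(input_path, output="rolypoly_annotation", threads=1,
                           overrides=None, skip_steps=()):
    """Return an argv list for `rolypoly annotate` using flags from the page above."""
    argv = [
        "rolypoly", "annotate",
        "--input", str(input_path),
        "--output", str(output),
        "--threads", str(threads),
    ]
    if overrides:
        # Assumption: the CLI parses strict JSON for --override-parameters.
        argv += ["--override-parameters", json.dumps(overrides)]
    if skip_steps:
        # The help text describes a comma-separated list of step names.
        argv += ["--skip-steps", ",".join(skip_steps)]
    return argv


argv = build_annotate_command(
    "contigs.fasta",
    threads=4,
    overrides={"RNAfold": {"temperature": 37}},
    skip_steps=["IRESfinder", "RNAMotif"],
)
# shlex.join quotes the JSON so the line can be pasted into a shell.
shell_line = shlex.join(argv)
```

Passing `argv` directly to `subprocess.run(argv, check=True)` avoids shell quoting altogether; `shell_line` is only for logging or copy-pasting.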
