 # votuderep

 [![Test](https://github.com/quadram-institute-bioscience/votuderep/actions/workflows/test.yml/badge.svg)](https://github.com/quadram-institute-bioscience/votuderep/actions/workflows/test.yml)
+![PyPI - Version](https://img.shields.io/pypi/v/votuderep)
+![PyPI - Downloads](https://img.shields.io/pypi/dm/votuderep)
+![Conda Version](https://img.shields.io/conda/vn/bioconda/votuderep)
+![Conda Downloads](https://img.shields.io/conda/dn/bioconda/votuderep)


 ![Logo](https://github.com/quadram-institute-bioscience/votuderep/raw/main/votuderep.png)
@@ -10,10 +14,13 @@ using the CheckV method.

 ## Features

-- **Dereplicate vOTUs**: Remove redundant viral sequences using BLAST-based ANI clustering
-- **Filter by CheckV metrics**: Filter viral contigs based on quality, completeness, and other metrics
-- **Tabulate reads**: Generate CSV tables from paired-end sequencing read directories
-- **Download training data**: Fetch viral assembly datasets for training purposes
+A small toolkit developed for the [EBAME](https://maignienlab.gitlab.io/ebame/) workshop with subcommands:
+
+- **derep**: Remove redundant viral sequences using BLAST-based ANI clustering
+- **filter**: Filter viral contigs based on quality, completeness, and other metrics from CheckV tsv output
+- **tabulate**: Generate CSV tables from paired-end sequencing read directories (for nextflow)
+- **trainingdata**: Fetch viral assembly datasets for training purposes
+- **getdbs**: Download Genomad and CheckV databases

 ## Requirements

@@ -54,7 +61,7 @@ brew install blast

 ## Usage

-votuderep provides four main commands: `derep`, `filter`, `tabulate`, and `trainingdata`.
+votuderep provides subcommands: `derep`, `filter`, `tabulate`, and `trainingdata`...

 ### Dereplicate vOTUs

@@ -88,14 +95,7 @@ votuderep derep -i viral_contigs.fasta -o dereplicated.fasta \
 votuderep derep -i viral_contigs.fasta -o dereplicated.fasta \
   --keep --tmp ./temp_dir
 ```
-
-**How it works:**
-
-1. Creates a BLAST database from input sequences
-2. Performs all-vs-all BLASTN comparison
-3. Calculates ANI (Average Nucleotide Identity) and coverage
-4. Clusters sequences using greedy centroid-based algorithm
-5. Outputs the longest sequence from each cluster (representative)
+

 ### Filter by CheckV

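The "How it works" list that this diff removes from the README describes a greedy centroid-based clustering step (steps 4 and 5). For readers who want a concrete picture, here is a minimal Python sketch of that idea. It is not taken from the votuderep source: the function name `greedy_cluster`, the input layout (a length table plus `(query, target, ani, coverage)` tuples already derived from the BLASTN hits), and the 95% ANI / 85% coverage thresholds are all assumptions made for illustration.

```python
from collections import defaultdict


def greedy_cluster(lengths, hits, min_ani=95.0, min_cov=85.0):
    """Hypothetical sketch of greedy centroid-based clustering.

    lengths: dict mapping sequence id -> sequence length
    hits:    iterable of (query, target, ani, target_coverage) tuples,
             assumed to be precomputed from all-vs-all BLASTN results
    Thresholds are illustrative assumptions, not votuderep's documented defaults.
    """
    # Keep only pairs that pass both thresholds, indexed by query.
    edges = defaultdict(set)
    for query, target, ani, cov in hits:
        if query != target and ani >= min_ani and cov >= min_cov:
            edges[query].add(target)

    # Visit sequences longest first (ties broken by id), so that each
    # centroid is the longest member of its cluster.
    order = sorted(lengths, key=lambda s: (-lengths[s], s))

    assigned = set()
    clusters = {}
    for seq in order:
        if seq in assigned:
            continue  # already a member of an earlier, longer centroid
        assigned.add(seq)
        clusters[seq] = [seq]
        for member in sorted(edges[seq]):
            if member not in assigned:
                assigned.add(member)
                clusters[seq].append(member)
    return clusters


# Toy input: "a" and "b" are near-identical, "c" covers "d",
# and the a-c hit fails the coverage threshold.
lengths = {"a": 5000, "b": 4800, "c": 3000, "d": 2000}
hits = [
    ("a", "b", 98.0, 90.0),
    ("b", "a", 98.0, 88.0),
    ("a", "c", 96.0, 50.0),
    ("c", "d", 97.0, 95.0),
]
clusters = greedy_cluster(lengths, hits)
representatives = sorted(clusters)  # one representative per cluster
print(clusters)
```

The keys of `clusters` are the representatives that a dereplication step would write out. Because sequences are visited in descending length order, each representative is the longest sequence in its cluster, matching step 5 of the removed list.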