Web of Life Integrated Processing Ecosystem (WIPE) is a lightweight CLI toolkit for processing large genome collections (e.g., WoL-style genome indices). It provides batch utilities for genome QC, annotation, summary compilation, and reference/functional database helpers.
It is designed for HPC-style workflows in which per-genome outputs are written to organized directories and later “compiled” into summary tables.
Install with pip:
git clone https://github.com/sherlyn99/wipe.git
cd wipe
pip install -e .
Alternatively, create the conda environment defined in wipe.yml and install WIPE into it:
git clone https://github.com/sherlyn99/wipe.git
cd wipe
conda env create -f wipe.yml
conda activate wipe
pip install -e .
Check that the CLI is available:
wipe --help
WIPE is a Click-based CLI with top-level commands (metadata, qc, compile, linearize, annotate, annotate-ko) plus two command subgroups: gsearch and uniref.
Process and merge metadata tables (the behavior depends on which optional file paths are provided).
wipe metadata -m metadata.tsv -o OUTDIR
Run CheckM2 QC over genomes described by a metadata TSV.
wipe qc \
-m metadata.tsv \
-o OUTDIR \
-db /path/to/CheckM2_database \
-t 16
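Filtering genomes after QC can be sketched in shell. This is an illustration, not part of WIPE: filter_quality is a hypothetical helper, and it assumes a tab-separated CheckM2-style report with Name, Completeness and Contamination header columns. Where WIPE places that report under OUTDIR is not assumed here, and thresholds such as 90% completeness and 5% contamination are common choices, not WIPE defaults.

```shell
# Hypothetical helper (not part of WIPE): print the names of genomes that pass
# completeness/contamination thresholds in a CheckM2-style, tab-separated report.
# Assumes header columns named "Name", "Completeness" and "Contamination".
# usage: filter_quality REPORT_TSV MIN_COMPLETENESS MAX_CONTAMINATION
# e.g.   filter_quality quality_report.tsv 90 5 > passing_genomes.txt
filter_quality() {
  awk -F'\t' -v minc="$2" -v maxx="$3" '
    NR == 1 {
      # locate the columns by header name rather than by position
      for (i = 1; i <= NF; i++) {
        if ($i == "Name") n = i
        if ($i == "Completeness") c = i
        if ($i == "Contamination") x = i
      }
      if (!n || !c || !x) { print "missing CheckM2 columns" > "/dev/stderr"; exit 1 }
      next
    }
    ($c + 0) >= (minc + 0) && ($x + 0) <= (maxx + 0) { print $n }
  ' "$1"
}
```

Because columns are looked up by header name, extra report columns in any order do not affect the result.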
Compile per-genome outputs into summary tables.
wipe compile \
-i RESULTS_DIR \
-o OUTDIR \
--checkm2 --linearization --kofamscan --barrnap --proteins \
-c
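To illustrate what “compiling” means here, the sketch below stacks one per-genome TSV from every RESULTS_DIR/<genome_id>/ folder into a single table with a leading genome_id column. It also records genomes whose file is missing or empty. compile_tables and the per-genome file name are hypothetical, and this is not WIPE's implementation.

```shell
# Hypothetical sketch (not WIPE's code): concatenate the same-named TSV from each
# RESULTS_DIR/<genome_id>/ folder, prefixing a genome_id column, and list genomes
# whose file is missing or empty.
# usage: compile_tables RESULTS_DIR FILE_NAME OUT_TSV FAILED_TXT
compile_tables() {
  local results="$1" fname="$2" out="$3" failed="$4" d gid f header_done=0
  : > "$out"
  : > "$failed"
  for d in "$results"/*/; do
    [ -d "$d" ] || continue              # glob did not match any folder
    gid=$(basename "$d")
    f="$d$fname"
    if [ ! -s "$f" ]; then
      printf '%s\n' "$gid" >> "$failed"  # no usable output for this genome
      continue
    fi
    if [ "$header_done" -eq 0 ]; then
      # take the header from the first genome that has output
      printf 'genome_id\t%s\n' "$(head -n 1 "$f")" >> "$out"
      header_done=1
    fi
    # data rows only, with the genome ID prepended
    tail -n +2 "$f" | awk -v g="$gid" '{ print g "\t" $0 }' >> "$out"
  done
}
```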
Batch genome linearization / standardization utilities.
wipe linearize -o OUTDIR
Optional examples:
wipe linearize -o OUTDIR --metadata metadata.tsv
wipe linearize -o OUTDIR --gap "N*20" # stitch contigs together with 20 'N's
wipe linearize -o OUTDIR --filt "plasmid,phage" # filter out plasmids and phages
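The sketch below illustrates the idea behind --gap and --filt. Kept contigs are joined with a run of N's (20 for "N*20"), and a contig is dropped when its FASTA header contains a filter term. Header-based filtering is an assumption about how --filt identifies plasmids or phages. linearize_fasta is a hypothetical helper, not WIPE's code, and it takes the gap as a plain length rather than the "N*20" string.

```shell
# Illustrative sketch (not WIPE's implementation): merge the contigs of one
# multi-FASTA genome into a single sequence, joining kept contigs with N's and
# dropping contigs whose header mentions any filter term (case-insensitive).
# usage: linearize_fasta NEW_NAME GAP_LENGTH [COMMA_SEPARATED_TERMS] < in.fna > out.fna
# e.g.   gzip -dc G000001.fna.gz | linearize_fasta G000001 20 "plasmid,phage"
linearize_fasta() {
  awk -v name="$1" -v n="$2" -v filt="${3:-}" '
    BEGIN {
      gap = ""
      for (i = 0; i < n; i++) gap = gap "N"   # e.g. n = 20 -> 20 Ns
      gsub(/,/, "|", filt)                   # "plasmid,phage" -> "plasmid|phage"
      filt = tolower(filt)
    }
    /^>/ {
      # header line: decide whether to keep this contig
      keep = (filt == "" || tolower($0) !~ filt)
      if (keep && started) seq = seq gap     # spacer only between kept contigs
      if (keep) started = 1
      next
    }
    keep { seq = seq $0 }                    # sequence lines of kept contigs
    END { print ">" name; print seq }
  '
}
```

The output is written as one unwrapped sequence line, which keeps the sketch short.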
Batch annotation of 16S rRNA genes and phylogenetic marker genes, using KO profiles and a KO list (plus other configured resources).
wipe annotate \
--metadata metadata.tsv \
-o OUTDIR \
--rrna-cutoff 0.67 \
--ko-profiles ko_profiles.tsv \
--ko-list ko_list.tsv \
--tmp-dir /scratch/tmp \
--nthreads 16
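For a quick per-genome check of the 16S step, you can count 16S rRNA features in GFF3 output. This sketch assumes barrnap-style GFF3 (feature type rRNA with a Name=16S_rRNA attribute). The --barrnap flag of wipe compile suggests barrnap is involved, but the file names and locations are assumptions. count_16s is a hypothetical helper, and it does not re-apply --rrna-cutoff.

```shell
# Hypothetical helper: count 16S rRNA features in barrnap-style GFF3 files
# (column 3 == "rRNA" and an attribute "Name=16S_rRNA"), one row per file.
# usage: count_16s GFF3_FILE...   -> "file<TAB>count"
count_16s() {
  awk -F'\t' '
    FNR == 1 { if (file != "") print file "\t" n; file = FILENAME; n = 0 }
    $3 == "rRNA" && $9 ~ /Name=16S_rRNA/ { n++ }
    END { if (file != "") print file "\t" n }
  ' "$@"
}
```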
KO-only helper: per-genome KO annotation of the genomes listed in metadata.tsv.
wipe annotate-ko \
--metadata metadata.tsv \
-o OUTDIR \
--ko-profiles ko_profiles.tsv \
--ko-list ko_list.tsv \
--tmp-dir /scratch/tmp \
--nthreads 16
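KO assignments can be tallied per genome with a short script. This sketch assumes KofamScan "mapper"-style text, where each line holds a gene ID followed by its tab-separated KO IDs and genes without a KO appear alone. Whether WIPE keeps that format is an assumption, and ko_counts is a hypothetical helper.

```shell
# Hypothetical helper: tally KO assignments from KofamScan mapper-style output
# (gene ID, then zero or more tab-separated KO IDs per line).
# usage: ko_counts MAPPER_TSV   -> "KO<TAB>count", sorted by KO
ko_counts() {
  awk -F'\t' '
    { for (i = 2; i <= NF; i++) if ($i != "") c[$i]++ }  # genes without a KO add nothing
    END { for (k in c) print k "\t" c[k] }
  ' "$1" | LC_ALL=C sort
}
```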
The gsearch subgroup manages databases for GSearch, an ultra-fast and scalable genome search and classification tool that finds the closest matches of query genomes in very large reference collections (hundreds of thousands of genomes).
Create a genome-search database.
wipe gsearch create -i GENOMES_DIR -o DB_OUTDIR -t 8
Update an existing genome-search database.
wipe gsearch update \
-i NEW_GENOMES_DIR \
-db EXISTING_DB_DIR \
-o DB_OUTDIR \
-t 8 \
-b BACKUP_DIR
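gsearch create and update take a directory of genomes, not a metadata TSV. One way to build that directory is to symlink every file listed in the metadata's genome_path column. The sketch below does this with a hypothetical stage_genomes helper. It assumes absolute paths in genome_path, and it assumes the genome search tool accepts symlinked files.

```shell
# Hypothetical helper: symlink the genome files listed in a metadata TSV
# (column "genome_path", absolute paths assumed) into one directory that can
# be passed as GENOMES_DIR or NEW_GENOMES_DIR.
# usage: stage_genomes METADATA_TSV DEST_DIR
stage_genomes() {
  local meta="$1" dest="$2" path
  mkdir -p "$dest" || return 1
  awk -F'\t' '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "genome_path") p = i; next }
    p && $p != "" { print $p }
  ' "$meta" | while IFS= read -r path; do
    ln -sf "$path" "$dest/$(basename "$path")"   # link keeps the original file name
  done
}
```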
The uniref subgroup lets users generate UniRef annotations of genomes, which can then be used with Woltka to generate genome-function stratified tables.
Download UniRef from UniProt.
wipe uniref download --level 90 -o uniref90_download/
Build a DIAMOND database from UniRef FASTA.
wipe uniref build -i uniref90.fasta.gz -o db/uniref90.dmnd -t 16
Process UniRef XML → TSV.
wipe uniref process -i uniref90.xml.gz -o uniref90.tsv
Run DIAMOND blastp against the UniRef DB.
wipe uniref blastp -i PROTEIN_DIR -db db/uniref90.dmnd -o OUTDIR -t 16
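A common way to turn DIAMOND hits into a one-to-one protein-to-UniRef map is to keep the highest-bitscore hit per query. The sketch below does this for DIAMOND's default tabular output (BLAST outfmt 6 columns, with bitscore in column 12). best_hits is a hypothetical helper. WIPE's own hit selection may differ, for example by applying identity or coverage filters.

```shell
# Hypothetical helper: reduce DIAMOND tabular output (outfmt 6 default columns,
# bitscore in column 12) to the single best-scoring subject per query.
# usage: best_hits DIAMOND_TSV   -> "query<TAB>subject"
best_hits() {
  # group rows by query, highest bitscore first within each query
  LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k12,12gr "$1" |
    awk -F'\t' '!seen[$1]++ { print $1 "\t" $2 }'   # keep the first row per query
}
```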
Merge UniRef90 and UniRef50 mapping files.
wipe uniref merge-maps \
--uniref90-map uniref90.map.tsv \
--uniref50-map uniref50.map.tsv \
-o merged.map.tsv \
--simplify
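The input format that merge-maps expects is not documented here. As a hypothetical illustration only, suppose each map were a two-column table from protein to cluster. Merging the two maps could then be a key join like the one below. merge_maps is not part of WIPE, and the behavior of --simplify is not reproduced.

```shell
# Hypothetical sketch: join a protein -> UniRef90 map with a protein -> UniRef50
# map into one three-column table, writing NA where no UniRef50 entry exists.
# usage: merge_maps MAP90_TSV MAP50_TSV   -> "protein<TAB>UniRef90<TAB>UniRef50"
merge_maps() {
  awk -F'\t' '
    FILENAME == ARGV[1] { u50[$1] = $2; next }   # first file read: the UniRef50 map
    { print $1 "\t" $2 "\t" (($1 in u50) ? u50[$1] : "NA") }
  ' "$2" "$1"
}
```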
Extract UniRef names for IDs in a mapping file.
wipe uniref extract-names \
--map-file merged.map.tsv \
--uniref90-names uniref90_names.tsv \
--uniref50-names uniref50_names.tsv \
-o merged.names.tsv
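Extracting names for the IDs in a map is a two-file lookup: collect the IDs, then keep the matching rows of the names table. This sketch assumes a tab-separated map in which every column after the first holds UniRef IDs, and a names table with the ID followed by the name. extract_names is hypothetical. Run it once per names file (UniRef90 and UniRef50) and concatenate the results.

```shell
# Hypothetical helper: keep the rows of a names table whose ID occurs anywhere
# after the first column of a tab-separated map file.
# usage: extract_names MAP_TSV NAMES_TSV   -> "ID<TAB>name"
extract_names() {
  awk -F'\t' '
    FILENAME == ARGV[1] { for (i = 2; i <= NF; i++) want[$i] = 1; next }  # IDs in the map
    ($1 in want) { print $1 "\t" $2 }                                     # matching names
  ' "$1" "$2"
}
```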
Many commands assume a metadata TSV with at least:
- genome_id
- genome_path (path to the genome FASTA, often .fna.gz)
Additional required columns depend on the workflow stage and enabled modules.
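A minimal metadata TSV with the two columns above can be generated from a directory of genomes. make_metadata is a hypothetical helper. It assumes the genome ID is the file name minus .fna.gz, and it writes absolute paths. Workflows that need additional columns must add them separately.

```shell
# Hypothetical helper: write a two-column metadata TSV (genome_id, genome_path)
# for every *.fna.gz file in a directory, using absolute paths.
# usage: make_metadata GENOMES_DIR > metadata.tsv
make_metadata() {
  local dir f
  dir=$(cd "$1" && pwd) || return 1
  printf 'genome_id\tgenome_path\n'
  for f in "$dir"/*.fna.gz; do
    [ -e "$f" ] || continue   # no matches: the glob stays literal
    printf '%s\t%s\n' "$(basename "$f" .fna.gz)" "$f"
  done
}
```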
Most commands write:
- per-genome outputs under OUTDIR/<genome_id>/...
- compiled summary tables in OUTDIR/
- logs and “failed genome IDs” lists, when applicable
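Because failed genome IDs are written out, a retry can be limited to those genomes by subsetting the metadata. This sketch assumes a plain list of IDs (one per line) and a genome_id column in the metadata TSV. filter_metadata_by_ids and the file names in the usage comment are hypothetical.

```shell
# Hypothetical helper: keep the header plus the metadata rows whose genome_id
# appears in an ID list (one ID per line), e.g. a failed-genomes list.
# usage: filter_metadata_by_ids IDS_TXT METADATA_TSV > retry_metadata.tsv
filter_metadata_by_ids() {
  awk -F'\t' '
    FILENAME == ARGV[1] { if ($1 != "") ids[$1] = 1; next }    # IDs to keep
    FNR == 1 { for (i = 1; i <= NF; i++) if ($i == "genome_id") c = i; print; next }
    c && ($c in ids)                                            # print matching rows
  ' "$1" "$2"
}
```

Matching on the first file's name (rather than NR == FNR) keeps the filter correct when the ID list is empty: only the header is printed.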