WIPE

Web of life Integrated Processing Ecosystem (WIPE) is a lightweight CLI toolkit for processing large genome collections (e.g., WoL-style genome indices). It provides batch utilities for genome QC, annotation, summary compilation, and reference/functional database helpers.

Designed for HPC-style workflows where per-genome outputs are written to organized directories and later “compiled” into summary tables.

Installation

Option A: Install from source (editable)

git clone https://github.com/sherlyn99/wipe.git
cd wipe
pip install -e .

Option B: Conda environment (recommended on clusters)

Create from `wipe.yml` (recommended)

git clone https://github.com/sherlyn99/wipe.git
cd wipe
conda env create -f wipe.yml
conda activate wipe
pip install -e .

Quickstart

Check that the CLI is available:

wipe --help

Command overview

WIPE is a click CLI with top-level commands plus two subgroups: gsearch and uniref.

Top-level commands

`wipe metadata`

Process/merge metadata tables (behavior depends on which optional file paths are provided).

wipe metadata -m metadata.tsv -o OUTDIR

`wipe qc`

Run CheckM2 QC over genomes described by a metadata TSV.

wipe qc \
  -m metadata.tsv \
  -o OUTDIR \
  -db /path/to/CheckM2_database \
  -t 16

`wipe compile`

Compile per-genome outputs into summary tables.

wipe compile \
  -i RESULTS_DIR \
  -o OUTDIR \
  --checkm2 --linearization --kofamscan --barrnap --proteins \
  -c

`wipe linearize`

Batch genome linearization / standardization utilities.

wipe linearize -o OUTDIR

Optional examples:

wipe linearize -o OUTDIR --metadata metadata.tsv
wipe linearize -o OUTDIR --gap "N*20" # stitch contigs together with 20 'N's
wipe linearize -o OUTDIR --filt "plasmid,phage" # filter out plasmids and phages

`wipe annotate`

Batch 16S and phylogenetic marker gene annotation using KO profiles + KO list (and other configured resources).

wipe annotate \
  --metadata metadata.tsv \
  -o OUTDIR \
  --rrna-cutoff 0.67 \
  --ko-profiles ko_profiles.tsv \
  --ko-list ko_list.tsv \
  --tmp-dir /scratch/tmp \
  --nthreads 16

`wipe annotate-ko`

Per-genome KO annotation (only) helper driven by metadata.tsv.

wipe annotate-ko \
  --metadata metadata.tsv \
  -o OUTDIR \
  --ko-profiles ko_profiles.tsv \
  --ko-list ko_list.tsv \
  --tmp-dir /scratch/tmp \
  --nthreads 16

`gsearch` subgroup

Gsearch is an ultra-fast and scalable genome search / classification tool for finding the closest matches of query genomes against very large reference collections (hundreds of thousands of genomes).

`wipe gsearch create`

Create a genome-search database.

wipe gsearch create -i GENOMES_DIR -o DB_OUTDIR -t 8

`wipe gsearch update`

Update an existing genome-search database.

wipe gsearch update \
  -i NEW_GENOMES_DIR \
  -db EXISTING_DB_DIR \
  -o DB_OUTDIR \
  -t 8 \
  -b BACKUP_DIR

`uniref` subgroup

This subgropu of commands allow users to generate uniref annotations of genomes, which can then be used to generate genome-function stratified tables using Woltka.

`wipe uniref download`

Download UniRef from UniProt.

wipe uniref download --level 90 -o uniref90_download/

`wipe uniref build`

Build a DIAMOND database from UniRef FASTA.

wipe uniref build -i uniref90.fasta.gz -o db/uniref90.dmnd -t 16

`wipe uniref process`

Process UniRef XML → TSV.

wipe uniref process -i uniref90.xml.gz -o uniref90.tsv

`wipe uniref blastp`

Run DIAMOND blastp against the UniRef DB.

wipe uniref blastp -i PROTEIN_DIR -db db/uniref90.dmnd -o OUTDIR -t 16

`wipe uniref merge-maps`

Merge UniRef90 and UniRef50 mapping files.

wipe uniref merge-maps \
  --uniref90-map uniref90.map.tsv \
  --uniref50-map uniref50.map.tsv \
  -o merged.map.tsv \
  --simplify

`wipe uniref extract-names`

Extract UniRef names for IDs in a mapping file.

wipe uniref extract-names \
  --map-file merged.map.tsv \
  --uniref90-names uniref90_names.tsv \
  --uniref50-names uniref50_names.tsv \
  -o merged.names.tsv

Input conventions

Many commands assume a metadata TSV with at least:

genome_id
lgenome_path (path to genome FASTA, often .fna.gz)

Additional required columns depend on the workflow stage and enabled modules.

Output layout

Most commands write:

per-genome outputs under OUTDIR/<genome_id>/...
compiled summary tables in OUTDIR/
logs and “failed genome IDs” lists when applicable

Name		Name	Last commit message	Last commit date
Latest commit History 64 Commits
tests		tests
wipe		wipe
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
setup.py		setup.py
wipe.yml		wipe.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

WIPE

Installation

Option A: Install from source (editable)

Option B: Conda environment (recommended on clusters)

Create from `wipe.yml` (recommended)

Quickstart

Command overview

Top-level commands

`wipe metadata`

`wipe qc`

`wipe compile`

`wipe linearize`

`wipe annotate`

`wipe annotate-ko`

`gsearch` subgroup

`wipe gsearch create`

`wipe gsearch update`

`uniref` subgroup

`wipe uniref download`

`wipe uniref build`

`wipe uniref process`

`wipe uniref blastp`

`wipe uniref merge-maps`

`wipe uniref extract-names`

Input conventions

Output layout

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

WIPE

Installation

Option A: Install from source (editable)

Option B: Conda environment (recommended on clusters)

Create from wipe.yml (recommended)

Quickstart

Command overview

Top-level commands

wipe metadata

wipe qc

wipe compile

wipe linearize

wipe annotate

wipe annotate-ko

gsearch subgroup

wipe gsearch create

wipe gsearch update

uniref subgroup

wipe uniref download

wipe uniref build

wipe uniref process

wipe uniref blastp

wipe uniref merge-maps

wipe uniref extract-names

Input conventions

Output layout

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Create from `wipe.yml` (recommended)

`wipe metadata`

`wipe qc`

`wipe compile`

`wipe linearize`

`wipe annotate`

`wipe annotate-ko`

`gsearch` subgroup

`wipe gsearch create`

`wipe gsearch update`

`uniref` subgroup

`wipe uniref download`

`wipe uniref build`

`wipe uniref process`

`wipe uniref blastp`

`wipe uniref merge-maps`

`wipe uniref extract-names`

Packages