A Nextflow pipeline for host read removal from paired-end sequencing data. Given a samplesheet of FASTQ files, hostzap runs one or more host depletion tools and outputs the cleaned reads alongside summary statistics.
| Tool | Method |
|---|---|
| Kraken2 | k-mer classification |
| BBMap | Sequence alignment |
| Hostile | Targeted host removal |
| Deacon | Host sequence depletion |

Each tool can be skipped individually with `--skip_kraken2`, `--skip_bbmap`, `--skip_hostile`, or `--skip_deacon`.

Example usage:

    nextflow run main.nf \
        --input samplesheet.csv \
        --outdir results \
        --kraken2_db /path/to/kraken2_db \
        --hostile_index human-t2t-hla

The samplesheet must be a CSV with three columns:

    sample-id,forward-absolute-filepath,reverse-absolute-filepath
    sample1,/data/sample1_R1.fastq.gz,/data/sample1_R2.fastq.gz
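
For directories with many samples, the samplesheet can be generated rather than written by hand. The following is a minimal sketch, not part of the pipeline: `make_samplesheet` is a hypothetical helper, and it assumes files are named `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz`, as in the example row above. Adjust the suffixes if your naming differs.

```shell
# Hypothetical helper (not shipped with the pipeline): print a samplesheet
# for every <sample>_R1.fastq.gz / <sample>_R2.fastq.gz pair in a directory.
make_samplesheet() {
    # Resolve the directory to an absolute path, since the columns
    # expect absolute file paths.
    dir=$(cd "$1" && pwd) || return 1

    echo "sample-id,forward-absolute-filepath,reverse-absolute-filepath"

    for r1 in "$dir"/*_R1.fastq.gz; do
        # If nothing matches, the glob is left as a literal string; skip it.
        [ -e "$r1" ] || continue

        # Derive the expected reverse-read path from the forward-read path.
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "warning: no R2 file for $r1, skipping" >&2
            continue
        fi

        # The sample ID is the file name without the assumed suffix.
        sample=$(basename "$r1" _R1.fastq.gz)
        echo "$sample,$r1,$r2"
    done
}
```

After defining the function, `make_samplesheet /data > samplesheet.csv` writes a file that can be passed to `--input`. Samples without a matching R2 file are skipped with a warning on stderr.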

Requirements:

- Nextflow (DSL2)
- Docker or Singularity (Conda is not tested and not recommended)
