The goal of this pipeline is to assemble phage genomes, annotate genes encoded by the genomes, and assess genome quality and taxonomic classification of the phages.
- Trim reads using Trimmomatic.
- Assemble genomes: remove host contaminants from the reads, then assemble the leftover (unmapped) reads using SPAdes.
- Perform genome annotation using Bakta.
- Find closely related genomes on NCBI using BLASTn.
- Assess phage genome quality (completeness) using CheckV.
- Assign taxonomy to phages using TaxMyPhage.
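The order of the steps above can be sketched as a dry run that only prints one illustrative command per step. This is a minimal sketch, not the scripts shipped in this repository: the function name `plan_pipeline`, all file names, database paths, thread counts, and trimming thresholds are hypothetical. The host-removal step is left as a comment because the mapper is not specified here, and the `taxmyphage run` options shown are an assumption that should be checked against `taxmyphage run --help`.

```shell
#!/bin/sh
# Dry-run sketch: prints one illustrative command per pipeline step.
# Nothing is executed. All file names, database paths, and thresholds
# below are hypothetical placeholders.

plan_pipeline() {
    sample="$1"        # sample prefix, e.g. "phage01"
    threads="${2:-4}"  # number of CPU threads (assumed default: 4)

    # 1. Quality trimming in paired-end mode (example thresholds only)
    printf '%s\n' "trimmomatic PE ${sample}_R1.fastq.gz ${sample}_R2.fastq.gz ${sample}_R1.paired.fq.gz ${sample}_R1.unpaired.fq.gz ${sample}_R2.paired.fq.gz ${sample}_R2.unpaired.fq.gz SLIDINGWINDOW:4:20 MINLEN:36"

    # 2. Host removal: printed as a comment, since the mapper is not specified
    printf '%s\n' "# map trimmed reads to the host genome; keep unmapped pairs as ${sample}_unmapped_R1.fq.gz and ${sample}_unmapped_R2.fq.gz"

    # 3. Assembly of the unmapped reads
    printf '%s\n' "spades.py -1 ${sample}_unmapped_R1.fq.gz -2 ${sample}_unmapped_R2.fq.gz -t ${threads} -o ${sample}_spades"

    # 4. Annotation of the assembled contigs
    printf '%s\n' "bakta --db bakta_db --output ${sample}_bakta --prefix ${sample} --threads ${threads} ${sample}_spades/contigs.fasta"

    # 5. Search for closely related genomes on NCBI (remote search, tabular output)
    printf '%s\n' "blastn -query ${sample}_spades/contigs.fasta -db nt -remote -outfmt 6 -out ${sample}_blastn.tsv"

    # 6. Genome quality and completeness
    printf '%s\n' "checkv end_to_end ${sample}_spades/contigs.fasta ${sample}_checkv -d checkv-db -t ${threads}"

    # 7. Taxonomy (options are an assumption; verify with --help)
    printf '%s\n' "taxmyphage run -i ${sample}_spades/contigs.fasta -o ${sample}_taxmyphage -t ${threads}"
}

plan_pipeline phage01
```

Printing the commands rather than running them makes it easy to review the plan before adapting it to the real `.sh` files in the scripts folder.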
This repository contains two folders: recipes and scripts.
The recipes folder contains the Apptainer definition (.def) files needed to build the Apptainer .sif files.
The scripts folder contains the HTCondor .sh and .sub files.
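To illustrate how a .sub file and a .sh file pair up, the sketch below writes a minimal submit file and a matching wrapper script into a temporary directory. It is an assumption-laden example, not a copy of the files in the scripts folder: the names `example.sub`, `example.sh`, and `samples.txt`, the container path, and the resource requests are all hypothetical.

```shell
#!/bin/sh
# Sketch: generate a minimal HTCondor submit file and its wrapper script.
# File names, the container path, and resource requests are hypothetical.
workdir="$(mktemp -d)"

# Wrapper script: runs on the execute node inside the container.
cat > "$workdir/example.sh" <<'EOF'
#!/bin/bash
# $1 is the sample name passed through the submit file's "arguments" line.
set -euo pipefail
SAMPLE="$1"
echo "Processing ${SAMPLE}"
EOF
chmod +x "$workdir/example.sh"

# Submit file: one job per line of samples.txt.
cat > "$workdir/example.sub" <<'EOF'
# Illustrative values only; adjust paths and resources to your jobs.
container_image = file:///staging/netid/apptainer/container.sif
executable = example.sh
arguments = $(sample)
log = $(sample).log
output = $(sample).out
error = $(sample).err
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
requirements = (HasCHTCStaging == true)
request_cpus = 4
request_memory = 16GB
request_disk = 20GB
queue sample from samples.txt
EOF

echo "Wrote $workdir/example.sub and $workdir/example.sh"
```

With a `samples.txt` file listing one sample name per line, such a pair would be submitted with `condor_submit example.sub`.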
To build the software containers, start an interactive job, build the container, test it, and move it to a location accessible by the worker nodes (e.g. staging, not home). For detailed instructions, visit https://github.com/UW-Madison-Bacteriology-Bioinformatics/chtc-containers.
Brief instructions:
cd recipes
nano build.sub
# change the file listed in the transfer_input_files line
condor_submit -i build.sub
# replace "container" with the name of your choice
apptainer build container.sif container.def
apptainer shell -e container.sif
# test the container by running a tool with its -h or --help option
exit
mv container.sif /staging/netid/apptainer/.
exit
cd ..
This workflow produces assembled phage genomes, along with genome annotation and taxonomy files. I recommend using Globus (globus.org) to transfer files to your ResearchDrive or to your personal endpoint. For instructions, please visit: https://chtc.cs.wisc.edu/uw-research-computing/globus
This pipeline uses the following tools:
- Trimmomatic v0.39: https://github.com/timflutre/trimmomatic
- SPAdes v4.0.0: https://github.com/ablab/spades
- BLASTn v2.16.0: https://blast.ncbi.nlm.nih.gov/Blast.cgi
- CheckV v1.0.3: https://bitbucket.org/berkeleylab/checkv/src/master/
- TaxMyPhage v0.3.4: https://github.com/amillard/tax_myPHAGE/blob/main/README.md