Although bacteriophages are extremely abundant, surprisingly little is known about them. PhageBook is a tool for analyzing bacteriophages and aiding in their classification.
- Clone the PhageBook repository:
git clone https://github.com/jblack332017/phagebook
- Install pip by following the instructions at https://pip.pypa.io/en/stable/installing/
- Install PhageBook by running the following in the phagebook directory:
sudo pip install --editable .
- Ensure PhageBook is installed by running:
phagebook --help
PhageBook accepts a file containing a single bacteriophage capsid protein amino acid sequence in FASTA format, such as the following phiKZ.fasta:
>phiKZ MCP
MSVHALRELFKHKGEKNYEVFSMEDFIGRLESEIGLNDSVVSQG
RSLISSISHENFGTVQATDIQDAAAIYNKMQMLVNDYGFERVSSSDPQVRAREERVRE
NQITAATMAAIACADETKYIRALRGITKAKASNEDHVKVVQHQFNGPAGGIQVFENGV
GLENYNEKSQRDFRVVTIGYNLAASRQDEFAERIYPTTVINPIEGGVVQVLPYIAVMK
DVYHEVSGVKMDNEEVNMVEAYRDPSILDDESIALIPALDPAGSNADFFVDPALVPPY
TIKNEQNLTITTAPLKANVRLDLMGNSNANLLIQRGMLEVSDTIDPAGRLKNLFVLLG
GKVVKFKVDRLPRAVFQPDLVGDTRNAVIRFDSDDLVVSGDTTFIDGSADGVINDLKT
AKLSLRLSVGFGGTISLSKGDSKFGATDTYVDKVLNEDGQVMDNADPAVKAILDQLTD
LAVIGFELDTRFTNTNRRQRGHLLQTRALQFRHPIPMHAPVTLPMDTMTDEGPGEVVK
ALTVNTNIRNSNNAVKRMLNYLAQLREVVHNGYNRPKFGIIEGALSAVMRPTYRYKEL
DLEKVIDTIKSKDRWDDVCAAILNCVKAELFPAHRDSNIEAAFRVISGNQDETPMYLF
CSDKEIANYLMTKGDDRTLGAYLKYDIVSTNNQLFDGKLVVIPTRAVQQENDILSWGQ
FFYVSTVIADLPITRGGHQVTREIAAIPFNLHVNNIPFALEFKITGFQKVMGETQFNG
KLADLKP
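Before starting a run, it can help to confirm that the input really holds exactly one record. The following is a minimal Python sketch of such a check; `read_single_fasta` and `read_single_fasta_file` are hypothetical helpers written for this example, not part of PhageBook, and PhageBook's own input validation may differ.

```python
from pathlib import Path


def read_single_fasta(text: str) -> tuple[str, str]:
    """Parse FASTA text that must contain exactly one record.

    Returns (header, sequence): the header without its leading '>',
    and the sequence with line breaks removed and letters uppercased.
    """
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                raise ValueError("expected a single sequence, found more than one record")
            header = line[1:].strip()
        else:
            if header is None:
                raise ValueError("sequence data found before a '>' header line")
            chunks.append(line)
    if header is None:
        raise ValueError("no FASTA header found")
    sequence = "".join(chunks).upper()
    if not sequence:
        raise ValueError("the header has no sequence")
    return header, sequence


def read_single_fasta_file(path: str) -> tuple[str, str]:
    """Read a file from disk and validate it with read_single_fasta."""
    return read_single_fasta(Path(path).read_text())
```

For example, `read_single_fasta_file("phiKZ.fasta")` would return `("phiKZ MCP", "MSVHALRELF...")` for the file above.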
PhageBook then uses blastp and the NCBI Entrez database to compile a list of similar proteins and their corresponding bacteriophage genomes.
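NCBI's Entrez system can be queried through the E-utilities web API. As an illustration only (this README does not say how PhageBook issues its requests), the sketch below builds an `efetch` URL that would return protein records in FASTA format. The helper name `efetch_protein_fasta_url` and the default `tool` value are assumptions; `db`, `id`, `rettype`, `retmode`, `tool`, and `email` are standard E-utilities parameters.

```python
from urllib.parse import urlencode

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_protein_fasta_url(accessions: list[str], email: str,
                             tool: str = "phagebook") -> str:
    """Build an E-utilities efetch URL that returns protein records as FASTA.

    `tool` and `email` identify the caller to NCBI. The default tool
    name is an assumption, not necessarily what PhageBook sends.
    """
    if not accessions:
        raise ValueError("at least one accession is required")
    params = {
        "db": "protein",             # Entrez protein database
        "id": ",".join(accessions),  # comma-separated record IDs
        "rettype": "fasta",
        "retmode": "text",
        "tool": tool,
        "email": email,
    }
    return f"{EFETCH_URL}?{urlencode(params)}"
```

Opening the resulting URL (for example with `urllib.request.urlopen`) performs a live request, so NCBI's usage guidelines and rate limits apply.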
PhageBook runs the resulting proteins through the multiple sequence alignment tool ClustalW and outputs the resulting alignment file, proteins.aln.
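proteins.aln uses the Clustal text format: a `CLUSTAL` header line, then blocks of `name  aligned-segment` rows separated by blank lines, with each block usually followed by an indented conservation line. Assuming that layout, a minimal Python parser can join each sequence's segments across blocks. `parse_clustal` is a hypothetical helper, not part of PhageBook.

```python
def parse_clustal(text: str) -> dict[str, str]:
    """Return {sequence name: full aligned sequence} from Clustal-format text."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("CLUSTAL"):
        raise ValueError("missing CLUSTAL header line")
    segments: dict[str, list[str]] = {}
    for line in lines[1:]:
        # Skip blank separators and conservation lines ('*', ':', '.'),
        # which are indented under the sequence rows.
        if not line.strip() or line[0].isspace():
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # malformed row; ignore it
        # fields[2], if present, is an optional residue count and is ignored.
        name, segment = fields[0], fields[1]
        segments.setdefault(name, []).append(segment)
    return {name: "".join(parts) for name, parts in segments.items()}
```

Every value in the returned dictionary should have the same length, since gaps (`-`) are kept; a length mismatch is a quick sign of a truncated file.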
PhageBook also generates a dot plot, plot.png, using Gepard.
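A dot plot marks every pair of positions at which two sequences match. The sketch below is a deliberately simple exact-word version in Python, shown only to illustrate the idea. Gepard uses its own, much faster method suited to genome-scale sequences, and `dot_plot` and its `word` parameter are hypothetical names.

```python
def dot_plot(seq_a: str, seq_b: str, word: int = 3) -> list[tuple[int, int]]:
    """Return every (i, j) where seq_a[i:i+word] == seq_b[j:j+word].

    Simplified exact-word dot plot for illustration only; it is not
    Gepard's algorithm.
    """
    if word < 1:
        raise ValueError("word must be at least 1")
    # Map each word in seq_b to the positions where it starts.
    starts: dict[str, list[int]] = {}
    for j in range(len(seq_b) - word + 1):
        starts.setdefault(seq_b[j:j + word], []).append(j)
    # Look up each word of seq_a; every shared start position is one dot.
    dots: list[tuple[int, int]] = []
    for i in range(len(seq_a) - word + 1):
        for j in starts.get(seq_a[i:i + word], []):
            dots.append((i, j))
    return dots
```

Plotting each `(i, j)` pair as a point gives the familiar picture: runs of points along a diagonal indicate regions the two sequences share.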
The basic PhageBook command is structured as follows:
phagebook run <email> <path to protein fasta file>
Both arguments are required:
- email: the email address that will be reported to NCBI as part of PhageBook's queries
- protein fasta file: the input protein that will be evaluated
Optional flags:
- --maxevalue (default: 0.15): the maximum E-value accepted during blastp
- --alignformat (default: clustal): the output format of the alignment
- --blastp/--no-blastp (default: blastp is run): determines whether blastp is run
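The `--maxevalue` cutoff keeps only hits whose E-value is at most the given threshold. As a sketch of that filtering step, assuming BLAST+ tabular output (`-outfmt 6`, whose 11th default column is the E-value), the Python below keeps each subject ID once, in input order. PhageBook's actual BLAST output format, and whether its comparison is inclusive, are not documented here; `filter_by_evalue` is a hypothetical helper.

```python
# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_by_evalue(tabular: str, max_evalue: float = 0.15) -> list[str]:
    """Return subject IDs with E-value <= max_evalue, first occurrence only."""
    keep: list[str] = []
    seen: set[str] = set()
    for line in tabular.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        record = dict(zip(OUTFMT6_FIELDS, line.split("\t")))
        subject = record["sseqid"]
        if float(record["evalue"]) <= max_evalue and subject not in seen:
            seen.add(subject)
            keep.append(subject)
    return keep
```

Raising the threshold (as `--maxevalue 1.4` does) admits weaker, more distant hits into the list that is later aligned.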
A full command may look like the following:
$ phagebook run --maxevalue 1.4 --no-blastp phagebook@example.com phiKZ.fasta
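When running PhageBook over many inputs, a small Python wrapper can build the same command line. This sketch uses only the arguments and flags documented above and assumes that options are given after `run`; `phagebook_command` and `run_phagebook` are hypothetical helpers.

```python
import subprocess
from pathlib import Path
from typing import Optional


def phagebook_command(email: str, fasta_path: str,
                      max_evalue: Optional[float] = None,
                      align_format: Optional[str] = None,
                      run_blastp: bool = True) -> list[str]:
    """Build a `phagebook run` argument list; omitted options keep PhageBook's defaults."""
    cmd = ["phagebook", "run"]
    if max_evalue is not None:
        cmd += ["--maxevalue", str(max_evalue)]
    if align_format is not None:
        cmd += ["--alignformat", align_format]
    if not run_blastp:
        cmd.append("--no-blastp")
    cmd += [email, fasta_path]  # the two required arguments come last
    return cmd


def run_phagebook(email: str, fasta_path: str, workdir: str, **options) -> None:
    """Run PhageBook inside `workdir`, so its phagebook-results directory lands there."""
    Path(workdir).mkdir(parents=True, exist_ok=True)
    fasta = str(Path(fasta_path).resolve())  # absolute, because cwd changes
    # check=True raises CalledProcessError if PhageBook exits with an error.
    subprocess.run(phagebook_command(email, fasta, **options), check=True, cwd=workdir)
```

Because results are written to the directory PhageBook is called from, `run_phagebook` sets `cwd` to a separate working directory per input so that runs do not share one phagebook-results directory.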
The phagebook-results directory is generated in the directory from which you call PhageBook:
- phagebook-results
-- sequenceIds.txt
-- proteins.fasta
-- genomeIds.txt
-- proteins
--- ... all protein fasta files
-- genomes
--- full.fasta
-- proteins.aln
-- proteins.dnd
-- plot.png
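To confirm that a run produced everything in the tree above, a short Python check can list any missing paths. The `EXPECTED` list simply mirrors the tree; `missing_outputs` is a hypothetical helper, and it only checks that each path exists, not that its contents are valid.

```python
from pathlib import Path

# Paths from the results tree, relative to phagebook-results/.
EXPECTED = [
    "sequenceIds.txt",
    "proteins.fasta",
    "genomeIds.txt",
    "proteins",            # directory of per-protein FASTA files
    "genomes/full.fasta",
    "proteins.aln",
    "proteins.dnd",
    "plot.png",
]


def missing_outputs(results_dir: str = "phagebook-results") -> list[str]:
    """Return the expected paths that do not exist under results_dir."""
    root = Path(results_dir)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

An empty list from `missing_outputs()` means every expected path is present.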
Key output files:
- proteins.aln is an alignment file that can be read by ClustalX
- proteins.dnd is a dendrogram of the protein alignment
- plot.png is the dot plot generated by Gepard
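ClustalW writes its .dnd guide tree in Newick format, so proteins.dnd can be opened by most tree viewers. As a lightweight sketch, the Python below extracts the leaf (sequence) names in left-to-right order with a regular expression; it assumes unquoted labels without spaces, and `dnd_leaf_names` is a hypothetical helper. For real analysis, a full Newick parser such as Biopython's `Bio.Phylo` is more robust.

```python
import re

# A leaf label follows '(' or ',' (plus optional whitespace or newlines)
# and ends at ':', ',', ')', ';' or whitespace. Labels after ')' are
# internal-node labels and are not matched.
_LEAF = re.compile(r"[(,]\s*([^\s(),:;]+)")


def dnd_leaf_names(newick: str) -> list[str]:
    """Return leaf (sequence) names from a Newick tree, left to right."""
    return _LEAF.findall(newick)
```

Comparing these names with the rows of proteins.aln is a quick way to confirm that the tree and the alignment cover the same proteins.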
PhageBook is an open-source project that currently supports Mac OS, Linux, and Windows. You are free to use and modify PhageBook; contributions of new functionality and improvements are welcome.
