🧬 A simple Prokka batch annotation wrapper. It takes paired .fna and .gbk files as input, uses Biopython to extract taxonomy information from each .gbk file, and passes that information to Prokka so that each genome is annotated with the correct taxonomy. After annotation, the script reports summary statistics.
It is useful for batch re-annotation of microbial genomes retrieved from a database, so that all genomes share one consistent annotation.
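As a rough illustration of the taxonomy extraction step, the sketch below reads the first record of a GenBank file with Biopython and splits its organism name into genus, species, and strain. The function names `split_organism` and `taxonomy_from_gbk` are hypothetical and not taken from the script, and the naive whitespace split is an assumption that will not handle names such as "Candidatus ..." correctly.

```python
from pathlib import Path


def split_organism(organism: str) -> dict:
    """Split an organism string like 'Genus species strain' into parts.

    Assumption: the first word is the genus, the second the species,
    and anything left over is treated as the strain designation.
    """
    parts = organism.split()
    return {
        "genus": parts[0] if parts else "",
        "species": parts[1] if len(parts) > 1 else "",
        "strain": " ".join(parts[2:]),
    }


def taxonomy_from_gbk(gbk_path: Path) -> dict:
    """Return taxonomy fields from the first record of a GenBank file."""
    # Imported here so split_organism() stays usable without Biopython.
    from Bio import SeqIO

    record = next(SeqIO.parse(str(gbk_path), "genbank"))
    info = split_organism(record.annotations.get("organism", ""))
    # Biopython stores the lineage from the ORGANISM block as a list.
    info["lineage"] = record.annotations.get("taxonomy", [])
    return info
```

For example, `taxonomy_from_gbk(Path("genomes/genome1.gbk"))` would return a dictionary whose `genus` and `species` values could then be handed to Prokka.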
- Python 3.6+ (with Biopython)
- Prokka (with all its sub-dependencies)
- Cock, P. J., Antao, T., Chang, J. T., Chapman, B. A., Cox, C. J., Dalke, A., et al. (2009). Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics, 25(11), 1422–1423.
- Seemann, T. (2014). Prokka: rapid prokaryotic genome annotation. Bioinformatics, 30(14), 2068–2069. PMID: 24642063. DOI: 10.1093/bioinformatics/btu153
First, prepare your data: put all genomes in one directory, with each genome's .fna and .gbk file sharing a base name (genome1.fna + genome1.gbk, genome2.fna + genome2.gbk, etc.).
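A minimal sketch of how such a directory could be scanned for matching file pairs is shown below. The helper name `find_genome_pairs` is hypothetical, and skipping unpaired .fna files with a message is an assumption about the desired behaviour rather than a description of what the script does.

```python
from pathlib import Path


def find_genome_pairs(input_dir):
    """Return (fna, gbk) path pairs that share a base name, sorted by name."""
    input_dir = Path(input_dir)
    pairs = []
    for fna in sorted(input_dir.glob("*.fna")):
        gbk = fna.with_suffix(".gbk")
        if gbk.is_file():
            pairs.append((fna, gbk))
        else:
            # Assumption: an .fna file without a .gbk partner is skipped.
            print(f"Skipping {fna.name}: no matching {gbk.name}")
    return pairs
```

Calling `find_genome_pairs("./genomes")` before a run is a quick way to check that every genome has both files.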
To run the script:
python prokka_batch_annotator.py <input_dir> <output_dir> <threads> <kingdom>
Example:
python prokka_batch_annotator.py ./genomes ./prokka_results 8 bacteria
You can override the taxonomy read from the .gbk files, for example:
python prokka_batch_annotator.py ./genomes ./output 4 bacteria --genus Arenibacter --species latericius
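To make the role of these arguments more concrete, here is a sketch of how a Prokka command line might be assembled for one genome. The function `build_prokka_command` is hypothetical. The Prokka options used (`--outdir`, `--prefix`, `--cpus`, `--kingdom`, `--genus`, `--species`, `--strain`) are standard Prokka flags, but how the script actually combines them is an assumption, as is capitalising the kingdom name to match the spelling in Prokka's help text.

```python
import shlex


def build_prokka_command(fna, outdir, prefix, threads, kingdom,
                         genus=None, species=None, strain=None):
    """Build the argument list for one Prokka run."""
    cmd = [
        "prokka",
        "--outdir", str(outdir),
        "--prefix", prefix,
        "--cpus", str(threads),
        # Assumption: normalise e.g. 'bacteria' to 'Bacteria'.
        "--kingdom", kingdom.capitalize(),
    ]
    # Taxonomy flags are added only when a value is available.
    for flag, value in (("--genus", genus), ("--species", species), ("--strain", strain)):
        if value:
            cmd += [flag, value]
    cmd.append(str(fna))
    return cmd


cmd = build_prokka_command(
    "genomes/genome1.fna", "output/genome1", "genome1", 4, "bacteria",
    genus="Arenibacter", species="latericius",
)
# Show the command in a shell-safe form without running it.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. One plausible way to apply an override is to prefer the command-line value and fall back to the value extracted from the .gbk file when none is given.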
- Original code: @georgiibondarev
- Code comments formatted with assistance from Claude Sonnet 4.5 (Anthropic).
- Code style: black