shirleylijie/awesome-microbiome

awesome_microbiome

This repository serves as a continuously updated collection of algorithms, tools, databases, and tutorials for microbiome research.

Table of contents

Amplicon data analysis

Tools for amplicon

  • UPARSE - (v12.0-beta1, 2024.6)

  • Mothur - (C++, v1.48.1, 2024.5)

    • Schloss PD, Westcott SL, Ryabin T, et al. Introducing mothur: Open-Source, Platform-Independent, Community-Supported Software for Describing and Comparing Microbial Communities. Appl Environ Microbiol 75 (2009). https://doi.org/10.1128/AEM.01541-09
  • dada2 - (R, 1.26, 2022.11)

    • Callahan B, McMurdie P, Rosen M, et al. DADA2: High-resolution sample inference from Illumina amplicon data. Nat Methods 13, 581–583 (2016). https://doi.org/10.1038/nmeth.3869
  • QIIME2 - (Python, 2024-05, 2024.5)

    • Bolyen E, Rideout JR, Dillon MR, et al. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nat Biotechnol 37, 852–857 (2019). https://doi.org/10.1038/s41587-019-0209-9
  • RDP Classifier - (Java, v2.14, 2023.8)

    • Wang Q, Cole JR. Updated RDP taxonomy and RDP Classifier for more accurate taxonomic classification. Microbiol Resour Announc 13, e01063-23 (2024). https://doi.org/10.1128/mra.01063-23
  • Tax4Fun1 - Tax4Fun2 - (R, v1.1.6, 2019.11)

    • Aßhauer KP, Wemheuer B, Daniel R, Meinicke P. Tax4Fun: predicting functional profiles from metagenomic 16S rRNA data, Bioinformatics 31(17), 2882–2884 (2015). https://doi.org/10.1093/bioinformatics/btv287
    • Wemheuer F, Taylor JA, Daniel R, et al. Tax4Fun2: prediction of habitat-specific functional profiles and functional redundancy based on 16S rRNA gene sequences. Environmental Microbiome 15(11) (2020). https://doi.org/10.1186/s40793-020-00358-7
  • PICRUSt - (Python, v1.1.4, 2019.6)

    • Langille M, Zaneveld J, Caporaso J, et al. Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences. Nat Biotechnol 31, 814–821 (2013). https://doi.org/10.1038/nbt.2676
  • PICRUSt2 - (Python, v2.6.2, 2025.4)

  • graftM - (Python, v0.14.0, 2022.5)

    • Boyd JA, Woodcroft BJ, Tyson GW. GraftM: a tool for scalable, phylogenetically informed classification of genes within metagenomes. Nucleic Acids Res 46(10), e59 (2018). https://doi.org/10.1093/nar/gky174
  • RiboSnake - (snakemake, v0.10.0, 2024.8)

    • Dörr AK, Welling J, Dörr A, et al. RiboSnake – a user-friendly, robust, reproducible, multipurpose and documentation-extensive pipeline for 16S rRNA gene microbiome analysis. Gigabyte (2024). https://doi.org/10.46471/gigabyte.132
  • ssUMI - (Shell, NoReleaseTag)

    • Lin X, Waring K, Ghezzi H, et al. High accuracy meets high throughput for near full-length 16S ribosomal RNA amplicon sequencing on the Nanopore platform. PNAS Nexus 3(10), pgae411 (2024). https://doi.org/10.1093/pnasnexus/pgae411

rRNA Databases

  • EUKARYOME - (18S/ITS/28S, v1.9.2, 2024.8)

    • Tedersoo L, Moghaddam MSH, Mikryukov V, et al. EUKARYOME: the rRNA gene reference database for identification of all eukaryotes, Database 2024, baae043 (2024). https://doi.org/10.1093/database/baae043
  • SILVA - (rRNA, v138.2, 2024.7) - A frequently updated rRNA database, currently among the largest and most comprehensive of its kind.

    • Quast C, Pruesse E, Yilmaz P, et al. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res 41(D1), D590–D596 (2013). https://doi.org/10.1093/nar/gks1219
  • Greengenes2 - (16S, v2022.10, 2022.10) - The original Greengenes was last updated in 2012. Greengenes2 was released in 2022 and is updated approximately every two years.

    • DeSantis TZ, Hugenholtz P, Larsen N, et al. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl Environ Microbiol 72(7), 5069-72 (2006). https://doi.org/10.1128/AEM.03006-05
    • McDonald D, Jiang Y, Balaban M, et al. Greengenes2 unifies microbial data in a single reference tree. Nat Biotechnol 42, 715–718 (2024). https://doi.org/10.1038/s41587-023-01845-1
  • MiDAS - (16S, v5, 2024.7)

  • KSGP - (SSU, v3.1)

    • Grant A, Aleidan A, Davies CS, et al. KSGP 3.1: improved taxonomic annotation of Archaea communities using LotuS2, the Genome Taxonomy Database and RNAseq data. ISME Communications ycaf094 (2025). https://doi.org/10.1093/ismeco/ycaf094

End-to-end workflows for metagenome and isolate

Workflows for metagenome

  • Aviary - (Snakemake, v0.9.2, 2024.09)

    • A pipeline for assembly, binning, annotation, and strain diversity analysis.
  • MAG - (Nextflow, v3.0.3, 2024.8)

    • Krakau S, Straub D, Gourlé H, et al. nf-core/mag: a best-practice pipeline for metagenome hybrid assembly and binning. NAR Genomics and Bioinformatics 4(1), lqac007 (2022). https://doi.org/10.1093/nargab/lqac007
  • MetaWRAP - (Shell/Python, v1.3, 2020.08) - A modular pipeline for metagenomic analysis, covering steps such as quality control, assembly, binning, genome refinement, classification, and annotation. Users can independently run any individual module.

  • VEBA - (Python, v2.3.0, 2024.9)

    • Espinoza JL, Dupont CL. VEBA: a modular end-to-end suite for in silico recovery, clustering, and analysis of prokaryotic, microeukaryotic, and viral genomes from metagenomes. BMC Bioinformatics 23, 419 (2022). https://doi.org/10.1186/s12859-022-04973-8
  • ATLAS - (Python, v2.19.0, 2024.7)

    • Kieser S, Brown J, Zdobnov EM, et al. ATLAS: a Snakemake workflow for assembly, annotation, and genomic binning of metagenome sequence data. BMC Bioinformatics 21, 257 (2020). https://doi.org/10.1186/s12859-020-03585-4
  • anvi'o - (Win/MacOS/Linux, v8, 2023.09)

  • nano-rave - (Nextflow, v1.0.0, 2023.01)

    • Girgis ST, Adika E, Nenyewodey FE, et al. Drug resistance and vaccine target surveillance of Plasmodium falciparum using nanopore sequencing in Ghana. Nat Microbiol 8, 2365–2377 (2023). https://doi.org/10.1038/s41564-023-01516-6
  • metagWGS - (Nextflow, v2.4.2, 2023.6)

  • SqueezeMeta - (C/C++/Python/Perl, v1.6.5, 2024.8)

  • slamM - (Snakemake)

  • CAT_pack - (Python, v6.0.1, 2024.3)

    • von Meijenfeldt FAB, Arkhipova K, Cambuy DD, et al. Robust taxonomic classification of uncharted microbial sequences and bins with CAT and BAT. Genome Biol 20, 217 (2019). https://doi.org/10.1186/s13059-019-1817-x
  • Metagenomics-Toolkit - (Nextflow, v0.4.5, 2024.10)

    • Belmann P, Osterholz B, Kleinbölting N, et al. Metagenomics-Toolkit: The Flexible and Efficient Cloud-Based Metagenomics Workflow featuring Machine Learning-Enabled Resource Allocation. bioRxiv (2024). https://doi.org/10.1101/2024.10.22.619569
  • BugBuster - Nextflow

  • MARTi

Workflows for isolate

  • Bactopia - (Nextflow/Perl, v3.1.0, 2024.9)

  • ASA³P - (Groovy/JS, v1.3.0, 2020.5)

    • Schwengers O, Hoek A, Fritzenwanker M, et al. ASA3P: An automatic and scalable pipeline for the assembly, annotation and higher-level analysis of closely related bacterial isolates. PLoS Comput Biol 16(3), e1007134 (2020). https://doi.org/10.1371/journal.pcbi.1007134
  • microPIPE - (Nextflow, step-by-step HTML tutorial)

    • Murigneux V, Roberts LW, Forde BM, et al. MicroPIPE: validating an end-to-end workflow for high-quality complete bacterial genome construction. BMC Genomics 22, 474 (2021). https://doi.org/10.1186/s12864-021-07767-z
  • AQUAMIS - (Python/Shell, v1.4.2, 2024.6)

    • Deneke C, Brendebach H, Uelze L, et al. Species-Specific Quality Control, Assembly and Contamination Detection in Microbial Isolate Sequences with AQUAMIS. Genes 12(5), 644 (2021). https://doi.org/10.3390/genes12050644
  • Nullarbor - (Perl, v1.41, 2018.6) - Pipeline to generate complete public health microbiology reports from sequenced isolates.

  • ProkEvo - Jupyter tutorial

    • Pavlovikj N, Gomes-Neto JC, Deogun JS, Benson AK. ProkEvo: an automated, reproducible, and scalable framework for high-throughput bacterial population genomics analyses. PeerJ 9, e11376 (2021). https://doi.org/10.7717/peerj.11376
  • Public Health Bioinformatics - (WDL, v3.1.0, 2025.7) - TheiaCoV (viral), TheiaProk (bacterial)

  • Public Health Bioinformatics - (WDL, v3.1.0, 2025.7) - TheiaEuk

    • Ambrosio FJ, Scribner MR, Wright SM, et al. TheiaEuk: A Species-Agnostic Bioinformatics Workflow for Fungal Genomic Characterization. Frontiers in Public Health 11 (2023). https://doi.org/10.3389/fpubh.2023.1198213
  • rMAP - (Shell/Python, tutorial)

    • Sserwadda I, Mboowa G. rMAP: the Rapid Microbial Analysis Pipeline for ESKAPE bacterial group whole-genome sequence data. Microbial Genomics 7 (2021). https://doi.org/10.1099/mgen.0.000583
  • TORMES - (Shell, v1.3.0, 2021.8)

  • Hybracter - (Python, v0.10.0, 2024.10)

    • Bouras G, Houtak G, Wick R, et al. Hybracter: Enabling Scalable, Automated, Complete and Accurate Bacterial Genome Assemblies. Microbial Genomics 10, 5 (2024). https://doi.org/10.1099/mgen.0.001244

Microbial genomic resources

Prokaryotic genome

Phage genome

  • inphared

    • Cook R, Brown N, Redgwell T, et al. INfrastructure for a PHAge REference Database: Identification of Large-Scale Biases in the Current Collection of Cultured Phage Genomes. Phage 2, 4 (2021). https://doi.org/10.1089/phage.2021.0007
  • PhageDive - Web Search

    • Rolland C, Wittmann J, Reimer LC, et al. PhageDive: the comprehensive strain database of prokaryotic viral diversity, Nucleic Acids Res 53(D1), D819–D825 (2025). https://doi.org/10.1093/nar/gkae878

Quality control of sequencing data

Reads simulation

  • wgsim - (C) Reads simulator

  • badread - (Python, v0.4.1, 2024.2) - a long read simulator that can imitate many types of read problems
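
The idea behind a read simulator can be illustrated with a toy example. The sketch below is not how wgsim or badread work internally; it only samples fixed-length reads uniformly from a reference string and introduces random substitutions. The function name `simulate_reads` and its parameters (`sub_rate`, `seed`) are hypothetical and chosen for illustration.

```python
import random


def simulate_reads(reference, n_reads, read_len, sub_rate=0.0, seed=0):
    """Sample fixed-length reads from `reference` with random substitutions.

    Hypothetical helper for illustration only; real simulators also model
    indels, quality scores, chimeras, and platform-specific error profiles.
    Returns a list of (start, sequence) tuples with 0-based start positions.
    """
    if read_len > len(reference):
        raise ValueError("read_len must not exceed the reference length")
    rng = random.Random(seed)  # fixed seed keeps the simulation reproducible
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(reference) - read_len + 1)
        bases = list(reference[start:start + read_len])
        for i, base in enumerate(bases):
            # Replace the base with a different nucleotide at rate `sub_rate`.
            if rng.random() < sub_rate:
                bases[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append((start, "".join(bases)))
    return reads


if __name__ == "__main__":
    for start, seq in simulate_reads("ACGTACGTACGGTTAC", 3, 6, sub_rate=0.1):
        print(start, seq)
```

Keeping the true start position next to each read makes it easy to check how well a downstream mapper or assembler recovers the original sequence.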

Basecall

  • Dorado - (C++, v0.9.0, 2024.12) - Oxford Nanopore's Basecaller

QC

Correct reads

Consensus sequence from long-reads

  • medaka - (Python, v2.0.0, 2024.9) - a tool to create consensus sequences and variant calls from nanopore sequencing data

Alignment and mapping

Short-read to sequence

Long-read to sequence

Sequence to sequence

Multi sequence alignment

Cluster sequence

Metagenome assembly

Short read only assembly

Long read only assembly

Hybrid assembly

PB HiFi read assembly

Polish

Genome size estimation

Other elements assembly

Assembly improvement

Binning

Tools and workflows

Strain-level resolve

MAG improvement

MAG assessment

  • BUSCO - (Python, v5.8.1, 2024.10)

    • Manni M, Berkeley MR, Seppey M, et al. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes. Mol Biol Evol 38(10), 4647–4654 (2021). https://doi.org/10.1093/molbev/msab199
  • CheckM - (Python, v1.2.3, 2024.06)

    • Parks DH, Imelfort M, Skennerton C, et al. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res 25, 1043-1055 (2015). https://doi.org/10.1101/gr.186072.114
  • CheckM2 - (Python, v1.0.2, 2023.05)

    • Chklovski A, Parks DH, Woodcroft BJ et al. CheckM2: a rapid, scalable and accurate tool for assessing microbial genome quality using machine learning. Nat Methods 20, 1203–1212 (2023). https://doi.org/10.1038/s41592-023-01940-w
  • RefineM - (Python, v0.1.2, 2020.11) - Unsupported

    • Parks DH, Rinke C, Chuvochina M, et al. Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life. Nat Microbiol 2, 1533–1542 (2017). https://doi.org/10.1038/s41564-017-0012-7
  • MAGpurify - (Python, v2.1.2, 2020.03)

  • GUNC - (Python, v1.0.6, 2023.11)

  • DFAST_QC - (Python, v1.0.5, 2024.09)

  • dRep - (Python, v3.4.2, 2023.02)

    • Olm MR, Brown CT, Brooks B, Banfield JF. dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication. The ISME Journal 11(12), 2864–2868 (2017). https://doi.org/10.1038/ismej.2017.126
  • galah - (Rust, v0.4.2, 2024.09) - More scalable dereplication for metagenome assembled genomes

  • skDER - (Python, v1.3.3, 2025.7)

    • Salamzade R, Kottapalli A, Kalan LR. skDER and CiDDER: two scalable approaches for microbial genome dereplication. Microbial Genomics 11(7) (2025). https://doi.org/10.1099/mgen.0.001438
  • DAS Tool - (R/Ruby, v1.1.7, 2024.01)

    • Sieber CMK, Probst AJ, Sharrar A, et al. Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy. Nat Microbiol 3, 836–843 (2018). https://doi.org/10.1038/s41564-018-0171-1
  • Binette - (Python, v1.0.3, 2024.9) - A fast and accurate binning refinement tool that constructs high-quality MAGs from the output of multiple binning tools

  • MAGqual - (Python, v0.3.0, 2024.8)
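
Several tools in this list (dRep, galah, skDER) dereplicate genomes. As a rough illustration of the general idea, and not the algorithm of any of those tools, the sketch below greedily picks representatives: genomes are ranked by a quality score, and a genome becomes a new representative only if its ANI to every already chosen representative is below a threshold. The score `completeness - 5 * contamination` and the 95% ANI cutoff are assumptions made for this example, and the function name `dereplicate` is hypothetical.

```python
def dereplicate(genomes, ani, threshold=0.95):
    """Greedy dereplication sketch (illustrative, not any tool's real method).

    genomes: dict mapping name -> (completeness, contamination) in percent.
    ani: dict mapping frozenset({a, b}) -> ANI as a fraction between 0 and 1;
         missing pairs are treated as unrelated (ANI 0).
    Returns the representative genome names, best-scoring first.
    """
    def score(name):
        completeness, contamination = genomes[name]
        # Assumed quality heuristic: reward completeness, penalize contamination.
        return completeness - 5 * contamination

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(genomes, key=lambda name: (-score(name), name))
    representatives = []
    for name in ranked:
        is_novel = all(
            ani.get(frozenset((name, rep)), 0.0) < threshold
            for rep in representatives
        )
        if is_novel:
            representatives.append(name)
    return representatives


if __name__ == "__main__":
    genomes = {"A": (98, 1), "B": (90, 2), "C": (95, 0)}
    ani = {frozenset(("A", "B")): 0.99, frozenset(("A", "C")): 0.80}
    print(dereplicate(genomes, ani))
```

Real tools compute ANI themselves and use faster pre-clustering steps; here the pairwise values are supplied by the caller to keep the example self-contained.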

Other types of genome recovery

  • BEREN - (Python) - No Release Tag

Gene and element prediction

ORF

Non-coding RNA

Signal peptides

Plasmid databases

Taxonomy profile

Profile read

Profile contig

MAG taxonomy

  • GTDB-Tk - (Python, v1.7.0, 2021.10) - A tool for genome classification based on GTDB. Allocating multiple CPUs can cause a dramatic increase in memory usage, so the program is typically run with a single CPU. It usually requires a large amount of memory, and small nodes (e.g., with 120 GB RAM) are generally insufficient.

  • GTDB-Tk 2 - (Python, v2.4.1, 2025.04)

  • tronko - C

    • Pipes L, Nielsen R. A rapid phylogeny-based method for accurate community profiling of large-scale metabarcoding datasets. eLife 13, e85794 (2024). https://doi.org/10.7554/eLife.85794
  • kMetaShot - (Python, 2024.9)

    • Defazio G, Tangaro MA, Pesole G, et al. kMetaShot: a fast and reliable taxonomy classifier for metagenome-assembled genomes. Briefings in Bioinformatics 26(1), bbae680 (2025). https://doi.org/10.1093/bib/bbae680
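
The memory note for GTDB-Tk above can be turned into a small pre-flight check before submitting a job. This is a minimal sketch, not part of GTDB-Tk: the function names are hypothetical, and `required_ram_gb` must be supplied by the user from the documentation of the GTDB release in use, because this list does not state an exact figure.

```python
import os


def detect_total_ram_gb():
    """Best-effort total physical memory in GB, or None if unavailable."""
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return None  # e.g. platforms without os.sysconf
    return pages * page_size / 1024 ** 3


def plan_classification_job(total_ram_gb, required_ram_gb):
    """Return a conservative resource plan (hypothetical helper).

    Always plans a single CPU, following the note that extra CPUs can
    sharply increase memory usage, and reports whether the node has
    enough memory for the user-supplied requirement.
    """
    if required_ram_gb <= 0:
        raise ValueError("required_ram_gb must be positive")
    return {"cpus": 1, "enough_memory": total_ram_gb >= required_ram_gb}


if __name__ == "__main__":
    total = detect_total_ram_gb()
    if total is None:
        print("Could not detect memory; check the node specification manually.")
    else:
        # 200 is a placeholder value, not an official requirement.
        print(plan_classification_job(total, required_ram_gb=200))
```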

Annotation

Tools for annotation

  • eggNOG-mapper v2 - (Python, v2.1.12, 2023.8)

    • Cantalapiedra CP, Hernández-Plaza A, Letunic I, et al. eggNOG-mapper v2: Functional Annotation, Orthology Assignments, and Domain Prediction at the Metagenomic Scale. Mol Biol Evol 38(12), 5825–5829 (2021). https://doi.org/10.1093/molbev/msab293
  • KofamKOALA - (Ruby, v1.3.0, 2020.05) - A web version

  • BlastKOALA, GhostKOALA - Web

    • Kanehisa M, Sato Y, and Morishima K. BlastKOALA and GhostKOALA: KEGG tools for functional characterization of genome and metagenome sequences. J Mol Biol 428, 726-731 (2016). https://doi.org/10.1016/j.jmb.2015.11.006
  • Macrel - (Python, v1.5.0, 2024.9)

  • RGI - (Python, 6.0.3, 2023.9) - This tool is designed for resistome prediction from protein or nucleotide data, with a complementary database required.

    • Alcock BP, Huynh W, Chalil R, et al. CARD 2023: expanded curation, support for machine learning, and resistome prediction at the Comprehensive Antibiotic Resistance Database. Nucleic Acids Res 51(D1), D690–D699 (2023). https://doi.org/10.1093/nar/gkac920
  • MetaCerberus - (Python, v1.4.0, 2024.8)

    • Figueroa JL III, Dhungel E, Bellanger M, et al. MetaCerberus: distributed highly parallelized HMM-based processing for robust functional annotation across the tree of life. Bioinformatics 40(3), btae119 (2024). https://doi.org/10.1093/bioinformatics/btae119
  • GMSC-mapper - (Python, v0.1.0, 2024.04)

  • Bakta - (Python, v1.10.1, 2024.11) - web: https://bakta.computational.bio

    • Schwengers O, Jelonek L, Dieckmann MA, et al. Bakta: rapid and standardized annotation of bacterial genomes via alignment-free sequence identification. Microbial Genomics 7, 11 (2021). https://doi.org/10.1099/mgen.0.000685
  • DRAM - (Python, v1.5.0, 2024.1)

    • Shaffer M, Borton MA, McGivern BB, et al. DRAM for distilling microbial metabolism to automate the curation of microbiome function. Nucleic Acids Res 48(16), 8883–8900 (2020). https://doi.org/10.1093/nar/gkaa621
  • graftM - (Python, v0.14.0, 2022.5)

    • Boyd JA, Woodcroft BJ, Tyson GW. GraftM: a tool for scalable, phylogenetically informed classification of genes within metagenomes. Nucleic Acids Res 46(10), e59 (2018). https://doi.org/10.1093/nar/gky174
  • CD-Search - Web

  • HMMER3 - (C, v3.4, 2023.8)

  • DeepARG - (Python2.7, 2023.11)

    • Arango-Argoty G, Garner E, Pruden A, et al. DeepARG: a deep learning approach for predicting antibiotic resistance genes from metagenomic data. Microbiome 6, 23 (2018). https://doi.org/10.1186/s40168-018-0401-z
  • ARGs-OAP - (Python, v3.2.4, 2023.10)

    • Yang Y, Jiang X, Chai B, et al. ARGs-OAP: online analysis pipeline for antibiotic resistance genes detection from metagenomic data using an integrated structured ARG-database. Bioinformatics 32(15), 2346–2351 (2016). https://doi.org/10.1093/bioinformatics/btw136
  • ARGs-OAP3.0 - (Python, v3.2.4, 2023.10)

  • argNorm - (Python, v0.7.0, 2025.3)

    • Perovic SU, Ramji V, Chong H, et al. argNorm: normalization of antibiotic resistance gene annotations to the Antibiotic Resistance Ontology (ARO). Bioinformatics 41(5), btaf173 (2025). https://doi.org/10.1093/bioinformatics/btaf173
  • OrthoLoger - (bash, v3.5.0, 2024.10)

    • Kuznetsov D, Tegenfeldt F, Manni M, et al. OrthoDB v11: annotation of orthologs in the widest sampling of organismal diversity. Nucleic Acids Res 51(D1), D445–D451 (2023). https://doi.org/10.1093/nar/gkac998
  • nail - (Rust, v0.2.0, 2024.7)

  • FastOMA - (Python, v0.3.4, 2024.9)

  • BenchAMRking - Web workflow for AMR detection

    • Strepis N, Dollee D, Vrins D, et al. BenchAMRking: a Galaxy-based platform for illustrating the major issues associated with current antimicrobial resistance (AMR) gene prediction workflows. BMC Genomics 26, 27 (2025). https://doi.org/10.1186/s12864-024-11158-5
  • OMArk - (Python, v0.3.0, 2023.10)

  • mettannotator - (Nextflow, v1.4.0, 2024.12)

    • Gurbich TA, Beracochea M, De Silva NH, et al. mettannotator: a comprehensive and scalable Nextflow annotation pipeline for prokaryotic assemblies. Bioinformatics 41(2), btaf037 (2025). https://doi.org/10.1093/bioinformatics/btaf037
  • pseudofinder - (Python, v1.1.0, 2022.3)

  • AntiDefenseFinder - Web Service

    • Tesson F, Huiting E, Wei L, et al. Exploring the diversity of anti-defense systems across prokaryotes, phages and mobile genetic elements. Nucleic Acids Res 53(1), gkae1171 (2025). https://doi.org/10.1093/nar/gkae1171
  • Genomic context - Python - a step-by-step tutorial

    • Toibazar D, Kulmanov M, Hoehndorf R. Context-based protein function prediction in bacterial genomes. bioRxiv (2024).
  • OMAnnotator - Python

  • MHCScan - Python

    • Garber AI, Nealson KH and Merino N. Large-scale prediction of outer-membrane multiheme cytochromes uncovers hidden diversity of electroactive bacteria and underlying pathways. Front Microbiol 15, 1448685 (2024). https://doi.org/10.3389/fmicb.2024.1448685
  • DRAMMA - Python

    • Rannon E, Shaashua S & Burstein D. DRAMMA: a multifaceted machine learning approach for novel antimicrobial resistance gene detection in metagenomic data. Microbiome 13, 67 (2025). https://doi.org/10.1186/s40168-025-02055-4
  • ChroQueTas - (Shell, v0.4.2, 2024.10) - Fungi, work with FungAMR

  • MICROPHERRET - Python

    • Bizzotto E, Fraulini S, Zampieri G, et al. MICROPHERRET: MICRObial PHEnotypic tRait ClassifieR using Machine lEarning Techniques. Environmental Microbiome 19, 58 (2024). https://doi.org/10.1186/s40793-024-00600-6

Databases for annotation

  • eggNOG 6.0 - (AA Genes, v6.0, 2022.09)

    • Hernández-Plaza A, Szklarczyk D, Botas J, et al. eggNOG 6.0: enabling comparative genomics across 12 535 organisms. Nucleic Acids Res 51(D1), D389–D394 (2023). https://doi.org/10.1093/nar/gkac1022
  • KEGG - (AA Genes, 111.0, 2024.08)

    • Kanehisa M, Sato Y, Kawashima M, et al. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res 44(D1), D457–D462 (2016). https://doi.org/10.1093/nar/gkv1070
  • CAZy - AA, web

    • Drula E, Garron ML, Dogan S, et al. The carbohydrate-active enzyme database: functions and literature. Nucleic Acids Res 50(D1), D571–D577 (2022). https://doi.org/10.1093/nar/gkab1045
  • CARD - (ARG, v3.3.0, 2024.08)

    • Alcock BP, Huynh W, Chalil R, et al. CARD 2023: expanded curation, support for machine learning, and resistome prediction at the Comprehensive Antibiotic Resistance Database. Nucleic Acids Res 51(D1), D690–D699 (2023). https://doi.org/10.1093/nar/gkac920
  • UniProt - (AA genes, v2024_5)

  • ISOSDB - (IS Sequences, v3, 2023.12)

    • Kirsch JM, Hryckowian A, Duerkop B. A metagenomics pipeline reveals insertion sequence-driven evolution of the microbiota. Cell Host Microbe 32(5), 739-754.e4 (2024). https://doi.org/10.1016/j.chom.2024.03.005
  • AMPSphere - (AMP, v2022-03, 2022.03)

    • Santos-Júnior CD, Torres MDT, Duan Y, et al. Discovery of antimicrobial peptides in the global microbiome with machine learning. Cell 187(14), 3761-3778.e16 (2024). https://doi.org/10.1016/j.cell.2024.05.013
  • DRAMP - (AMP, v4.0, 2024.09)

    • Shi G, Kang X, Dong F, et al. DRAMP 3.0: an enhanced comprehensive data repository of antimicrobial peptides. Nucleic Acids Res 50(D1), D488–D496 (2022). https://doi.org/10.1093/nar/gkab651
  • GMSC - (smORF, v1.0, 2024.08)

  • TCDB - (transporters, 2024-09, 2024.09)

    • Saier MH, Reddy VS, Moreno-Hagelsieb G, et al. The Transporter Classification Database (TCDB): 2021 update. Nucleic Acids Res 49(D1), D461–D467 (2021). https://doi.org/10.1093/nar/gkaa1004
  • VFDB - (virulence factors, 2024.9)

    • Liu B, Zheng D, Zhou S, et al. VFDB 2022: a general classification scheme for bacterial virulence factors. Nucleic Acids Res 50(D1), D912–D917 (2022). https://doi.org/10.1093/nar/gkab1107
  • DoriC - (oriC, v12.1)

    • Dong MJ, Luo H, Gao F. DoriC 12.0: an updated database of replication origins in both complete and draft prokaryotic genomes. Nucleic Acids Res 51(D1), D117–D120 (2023). https://doi.org/10.1093/nar/gkac964
  • TIGRFAMs - (protein sequence, v15.0, 2014.9)

    • Haft DH, Loftus BJ, Richardson DL, et al. TIGRFAMs: a protein family resource for the functional identification of proteins, Nucleic Acids Res 29(1), 41–43 (2001). https://doi.org/10.1093/nar/29.1.41
  • Pfam - (protein sequences, v37.0, 2024.5)

  • Rfam - (RNA families, v15, 2024.9)

    • Kalvari I, Nawrocki EP, Ontiveros-Palacios N, et al. Rfam 14: expanded coverage of metagenomic, viral and microRNA families. Nucleic Acids Res 49(D1), D192–D200 (2021). https://doi.org/10.1093/nar/gkaa1047
  • PHI - (Pathogen Host Interactions, v5.0, 2024.3)

    • Urban M, Cuzick A, Seager J, et al. PHI-base in 2022: a multi-species phenotype database for Pathogen–Host Interactions. Nucleic Acids Res 50(D1), D837–D847 (2022). https://doi.org/10.1093/nar/gkab1037
  • oriTDB

    • Liu G, Li X, Guan J, et al. oriTDB: a database of the origin-of-transfer regions of bacterial mobile genetic elements. Nucleic Acids Res 53(D1), D163–D168 (2025). https://doi.org/10.1093/nar/gkae869
  • COG - FTP: https://ftp.ncbi.nlm.nih.gov/pub/COG/

  • OrthoDB - v11.0

    • Kuznetsov D, Tegenfeldt F, Manni M, et al. OrthoDB v11: annotation of orthologs in the widest sampling of organismal diversity. Nucleic Acids Res 51(D1), D445–D451 (2023). https://doi.org/10.1093/nar/gkac998
  • CAZyme3D - web

  • S9BactDB - S9 proteases, web

  • FungAMR - (Fungi AMR) - work with ChroQueTas

Genome content analysis

Protein structure analysis

Elements

  • DeepInverton - Python - NoReleaseTag
    • Wen J, Zhang H, Chu D, et al. Deep learning revealed the distribution and evolution patterns for invertible promoters across bacterial lineages. Nucleic Acids Res 52(21), 12817–12830 (2024). https://doi.org/10.1093/nar/gkae966

Bacteriophage

Metabolic construction

Tools for metabolic analysis

  • antiSMASH - (Python, v7.1.0.1, 2023.11)

    • Blin K, Shaw S, Augustijn HE, et al. antiSMASH 7.0: new and improved predictions for detection, regulation, chemical structures and visualisation. Nucleic Acids Res 51(W1), W46–W50 (2023). https://doi.org/10.1093/nar/gkad344
  • MelonnPan - R, dev

    • Mallick H, Franzosa EA, Mclver LJ, et al. Predictive metabolomic profiling of microbial communities using amplicon or metagenomic sequences. Nat Commun 10, 3136 (2019). https://doi.org/10.1038/s41467-019-10927-1
  • HUMAnN 2.0 - (Python, latest v2 release v2.8.2, 2020.4)

  • HUMAnN 3.0 - (Python, latest v3 release v3.9, 2024.2)

    • Beghini F, McIver LJ, Blanco-Míguez A, et al. Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3. eLife 10, e65088 (2021). https://doi.org/10.7554/eLife.65088
  • HUMAnN 4.0 - (Python, 4.0.0.alpha.1-final, 2024.7) - active dev

  • gapseq - (R, v1.3.1, 2024.8)

    • Zimmermann J, Kaleta C & Waschina S. gapseq: informed prediction of bacterial metabolic pathways and reconstruction of accurate metabolic models. Genome Biol 22, 81 (2021). https://doi.org/10.1186/s13059-021-02295-1
  • ./gapseq pan - (R, v1.3.1, 2024.8) - The source code has been integrated into gapseq and is accessible via ./gapseq pan

    • De Bernardini N, Zampieri G, Campanaro S, et al. pan-Draft: automated reconstruction of species-representative metabolic models from multiple genomes. Genome Biol 25, 280 (2024). https://doi.org/10.1186/s13059-024-03425-1
  • Bactabolize - (Python, v1.0.4, 2024.12)

    • Vezina B, Watts SC, Hawkey J, et al. Bactabolize is a tool for high-throughput generation of bacterial strain-specific metabolic models. eLife 12, RP87406 (2023). https://doi.org/10.7554/eLife.87406.3

Metabolic databases

  • mVOC 4.0

  • SMC - Web

    • Udwary DW, Doering DT, Foster B, et al. The secondary metabolism collaboratory: a database and web discussion portal for secondary metabolite biosynthetic gene clusters. Nucleic Acids Res 53(D1), D717–D723 (2025). https://doi.org/10.1093/nar/gkae1060

Comparative genomics

AAI and ANI

View comparative map

  • gggenes - Draw gene arrow maps in ggplot2

  • LoVis4u - (Python, v0.0.11, 2024.10)

  • gggenomes - gggenomes: A Grammar of Graphics for Comparative Genomics

  • plasmapR - Creating plasmid maps inside ggplot

  • geneviewer - (R, CRAN release, 2025.1) - An R package designed for drawing gene arrow maps

HGT

SV

  • SVbyEye - R, active dev

  • CompareM - (Python, v0.1.2, 2020.12) - Unsupported. Last version released on Dec 31, 2020

  • gcSV - C/C++, human

  • minipileup - (C, v1.1, 2025.2) - Minipileup is a simple pileup-based variant caller. It takes a reference FASTA and one or multiple alignment BAM as input, and outputs a multi-sample VCF along with allele counts
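
To make the pileup idea concrete, the sketch below counts alleles per reference position from toy alignments and reports positions where a non-reference allele is well supported. It is not minipileup's implementation and does not read BAM or write VCF; reads are given as ungapped `(start, sequence)` pairs, and the function names and thresholds (`min_alt_count`, `min_alt_fraction`) are hypothetical.

```python
from collections import Counter


def pileup_counts(ref, reads):
    """Count observed bases at each reference position.

    reads: iterable of (start, sequence) ungapped alignments, 0-based starts.
    Returns a list with one Counter per reference position.
    """
    columns = [Counter() for _ in ref]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(ref):  # ignore bases hanging off the reference
                columns[pos][base] += 1
    return columns


def call_variants(ref, reads, min_alt_count=2, min_alt_fraction=0.5):
    """Report (pos, ref_base, alt_base, alt_count, depth) for supported alts."""
    calls = []
    for pos, counts in enumerate(pileup_counts(ref, reads)):
        depth = sum(counts.values())
        for alt, n in sorted(counts.items()):
            if alt == ref[pos]:
                continue
            if n >= min_alt_count and n / depth >= min_alt_fraction:
                calls.append((pos, ref[pos], alt, n, depth))
    return calls


if __name__ == "__main__":
    ref = "ACGTACGT"
    reads = [(0, "ACGA"), (1, "CGAA"), (2, "GAAC"), (3, "TACG")]
    print(call_variants(ref, reads))
```

In this toy input, three of the four reads covering position 3 carry `A` instead of the reference `T`, so that position is the only one reported.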

Visualization

View MSA

  • MSAplot - Python, no release tag. Plots multiple sequence alignments (MSA), with Jupyter notes

  • pyMSAviz - (Python, v0.5.0, 2024.9)

    • Shimoyama Y. pyMSAviz: MSA visualization python package for sequence analysis [Computer software] (2022)
  • ggmsa - (R, v1.0.2, 2021.8) - Visualizing publication-quality multiple sequence alignment using ggplot2; removed from CRAN

  • IGV

    • Thorvaldsdóttir H, Robinson JT, Mesirov JP. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Briefings in Bioinformatics 14(2), 178–192 (2013). https://doi.org/10.1093/bib/bbs017
  • pymsaploter - Python package that plots Clustal multiple sequence alignments (MSA); nucleotide sequences only

View genome

  • pyGenomeViz - (Python, v1.4.1, 2024.9) - A genome visualization python package for comparative genomics

  • pyCircos - (Python, v0.3.0, 2022.4)

  • pyCirclize - (Python, v1.7.1, 2024.9) - Shimoyama Y. (2022). pyCirclize: Circular visualization in Python [Computer software]

  • circlize - (R, v0.4.16, 2024.2)

View assemblies

Phylogenetics

Build a tree

View tree

  • iTol - Web

    • Letunic I, Bork P. Interactive Tree of Life (iTOL) v6: recent updates to the phylogenetic tree display and annotation tool. Nucleic Acids Res 52(W1), W78–W82 (2024). https://doi.org/10.1093/nar/gkae268
  • ARB - (requires forwarding, arb-7.0, 2021.09)

  • FigTree - (Java, v1.4.5-pre, 2024.02)

  • GraPhlAn - (Python, 1.1.3, 2020.06)

    • Asnicar F, Weingart G, Tickle TL, et al. Compact graphical representation of phylogenetic data and metadata with GraPhlAn. PeerJ 3, e1029 (2015). https://doi.org/10.7717/peerj.1029
  • phyTreeViz - (Python, v0.2.0, 2024.1) - Shimoyama Y. (2023). phyTreeViz: Simple phylogenetic tree visualization python package [Computer software]

Microbial diversity analysis

Abundance

Diversity

Network

Interactions

Metatranscriptomics

RNA quantification

RNA assembly

RNA SV

Stats

Surveillance

Modifications

  • Robin - (Python, Robin-d2f9b3, 2025.1) - A package to run real time analysis of nanopore methylation data

    • Deacon S, Cahyani I, Holmes N, et al. ROBIN: A unified nanopore-based assay integrating intraoperative methylome classification and next-day comprehensive profiling for ultra-rapid tumor diagnosis. Neuro-Oncology noaf103 (2025). https://doi.org/10.1093/neuonc/noaf103
  • uncalled4 - (Python, v4.1.0, 2024.8)

    • Kovaka S, Hook PW, Jenike KM, et al. Uncalled4 improves nanopore DNA and RNA modification detection via fast and accurate signal alignment. Nat Methods 22, 681–691 (2025). https://doi.org/10.1038/s41592-025-02631-4

Pangenome related

Contact

This repository was created and is maintained by Jie Li.
