We used two machine learning methods in parallel to characterize virulence factors (VFs) in diseased and healthy metagenomes. First, using genes from the core set of the Virulence Factor Database (VFDB, http://www.mgc.ac.cn/VFs/), we built hidden Markov model (HMM) profiles of known virulence factors and applied those profiles to the diseased and healthy metagenomes. Second, we took a set of labelled pathogenic and commensal genomes, subtracted the VFDB virulence factor genes from both sets, and trained a support vector machine (SVM) on the VF-subtracted genomes to classify genomes as pathogenic or non-pathogenic. The two techniques are complementary: the HMM profiles use well-characterized virulence factors to find similar characteristics in the metagenome space, while the SVM, trained on the labelled pathogenic and commensal genomes, explores the potential for virulence genes not covered by the VF dataset within the same metagenomes. Combined, the two techniques can provide spatially resolved scoring within a metagenome to identify potential virulence factors.
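As a rough illustration of the VF-subtraction step that precedes SVM training, the sketch below removes known VF genes from labelled genomes and turns what remains into feature vectors. Everything specific here is an assumption made for the example and is not taken from the project: the genome layout (a dict of gene id to sequence), matching VF genes by id, the helper names `subtract_vf_genes` and `kmer_features`, the choice of k-mer frequencies as features, and the toy sequences.

```python
from collections import Counter
from itertools import product


def subtract_vf_genes(genome, vf_ids):
    """Return a copy of the genome without genes annotated as virulence factors.

    genome: dict mapping gene id -> nucleotide sequence (hypothetical layout).
    vf_ids: set of gene ids considered VFDB hits (assumed to be known already).
    """
    return {gid: seq for gid, seq in genome.items() if gid not in vf_ids}


def kmer_features(genome, k=2):
    """Normalised k-mer frequency vector over all genes (an assumed feature choice)."""
    counts = Counter()
    for seq in genome.values():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    # Fixed ordering over the ACGT alphabet so every genome yields the same layout.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


# Toy, made-up genomes: label 1 = pathogenic, label 0 = commensal.
pathogen = {"geneA": "ATGCGC", "vf1": "ATGAAA"}
commensal = {"geneB": "ATGTTT"}
vf_ids = {"vf1"}

labelled = [(pathogen, 1), (commensal, 0)]
X = [kmer_features(subtract_vf_genes(g, vf_ids)) for g, _ in labelled]
y = [label for _, label in labelled]
```

The resulting `X` and `y` have the shape an SVM implementation expects (one feature vector and one label per genome), so they could be handed to a classifier such as scikit-learn's `sklearn.svm.SVC`. Because the VF genes are removed before featurization, any signal the classifier learns has to come from genes outside the VF dataset.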