Update README.md

senorred · web-flow · commit 9b6f44fd1f84 · 2019-08-15T13:04:24.000-07:00
diff --git a/README.md b/README.md
@@ -23,6 +23,27 @@ scikit-learn==0.21.3
 
 Eventually the project will be available on PyPI.
 
+## Methods
+
+Experimentally verified virulence factor genes from the Virulence Factor Database (VFDB; Chen et al. 2015, accessed 8/13/19, https://academic.oup.com/nar/article/44/D1/D694/2503049) were used to represent virulence-associated genes. Example metagenomes used for testing were drawn from public datasets listed on NCBI SRA and included healthy and disease-state human skin metagenomic samples. Specifically, diseased metagenomes were drawn from the Diabetic Foot Ulcer metagenome study (BioProject: PRJNA506988), and healthy foot skin metagenomes were drawn from BioProject: PRJEB30094. Metagenomes were assembled using metaSPAdes (Nurk et al., https://www.ncbi.nlm.nih.gov/pubmed/28298430).
+
+Hidden Markov Models (HMMs) were built from the VFDB genes to create virulence profiles. Only genes with sequences available from at least five different bacterial species were selected. Multiple sequence alignments were generated using MUSCLE [1], and HMMs were built using HMMER3 [2]. Genomes and/or their corresponding protein-coding sequences were screened with hmmsearch [2] using pre-computed significance score thresholds. Each threshold was calculated as 80% of the envelope alignment score of a representative sequence for the corresponding HMM. Alignments were filtered by custom scripts to extract putative virulence factor (VF) loci. VF sequences were concatenated, aligned, and used as input for phylogenetic analyses. Phylogenetic trees were constructed using RAxML-NG [3] and analyzed using the R package ape [4] and Newick Utilities [5]. Virulence tags were assigned based on the number of virulence loci found and on phylogenetic classification. All analyses are described in a Snakemake pipeline [6].
+
+A support vector machine (SVM) model was also developed to classify virulent and non-virulent gene segments by training on a reference set of labelled pathogen and commensal genomes. The pathogen genomes were acquired from an NCBI Assembly search and included the species identified in the VFDB dataset. Commensal genomes were also acquired from an NCBI Assembly search and included species selected from the NHSN Common Commensals List (https://www.cdc.gov/nhsn/pdfs/pscmanual/4psc_clabscurrent.pdf) and from Busby et al. 2012 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5866053/).
+
+
+
+## Implementation
+
+## Operation
+![workflow](https://github.com/NCBI-Hackathons/Virulence_Factor_Characterization/blob/master/VFCflow.png)
+
+## Results
+
+## Data and software availability
+
+Data are available from the NCBI Sequence Read Archive under projects ERP112507 and SRP170931. The "VirFac" software is made available under the MIT License (see [LICENSE](https://github.com/NCBI-Hackathons/Virulence_Factor_Characterization/blob/master/LICENSE)).
+
 ## Authors
 
 Ousmane H. Cissé  
@@ -34,7 +55,7 @@ Shennan Lu
 
 Liz Norred  
 elizabeth.norred@gmail.com  
-University of Tennessee, Knoxville  
+Bredesen Center for Interdisciplinary Research and Education, University of Tennessee  
 Knoxville TN, 37922 
 
 Justin Payne  
@@ -44,30 +65,10 @@ College Park MD, 20710
 
 Sherry Bhalla  
 sherry.bhalla@mssm.edu  
-Icahn School of Medicine at Mount Sinai, New York, NY, 20019    
-ORCID: 0000-0001-7827-4050  
-Bredesen Center for Interdisciplinary Research and Education, University of Tennessee, Knoxville, TN 37922  
-
-## Methods
-
-Experimentally verified virulence factor genes from the Virulence Factor Database (VFDB: Chen et al 2015, Accessed 8/13/19 https://academic.oup.com/nar/article/44/D1/D694/2503049) were used to represent virulence-associated genes. Example metagenomes used for testing were drawn from public datasets listed on NCBI SRA and included healthy and disease-state human skin metagenomic samples. Specifically, diseased metagenomes were drawn from the Diabetic Foot Ulcer metagenome study (BioProject: PRJNA506988) and healthy foot skin metagenomes were drawn from BioProject: PRJEB30094. Metagenomes were assembled using MetaSPADES. (Nurk et al https://www.ncbi.nlm.nih.gov/pubmed/28298430 )
-
-A Hidden Markov Model (HMM) was applied to the VFDB genes to create virulence profiles. Genes were selected for which at least five different bacterial species were available. Multiple sequence alignments were generated using MUSCLE [1] and HMMs using HMMER3 [2]. Genomes and/or corresponding protein coding sequences were screened with HMMSEARCH[2] using pre-computed significance scores. Scores were calculated as 80% of the envelope alignment score of a representative sequence corresponding to its HMM. Alignments were filtered by custom scripts to extract putative virulence factors’ loci. VF sequences were concatenated, aligned and used as input for phylogenetic analyses. Phylogenetic trees were constructed using RAXML-ng [3] and analyzed using R package Ape[4] and Newick Utilities[5]. Virulence tags were assigned based on the number of virulence loci found and phylogenetic classification. All analyses are described in Snakemake pipeline[6].
-
-A SVM model was also developed to classify virulent and non-virulent gene segments by training on a reference set of labelled pathogen and commensal genomes. The pathogen genomes were acquired from an NCBI Assembly search and included the species identified in the VFDB dataset. Commensal genomes were also acquired from an NCBI Assembly search, and included species selected from the NHSN Common Commensals List (https://www.cdc.gov/nhsn/pdfs/pscmanual/4psc_clabscurrent.pdf) and from Busby et al 2012 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5866053/). 
-
-
+Icahn School of Medicine at Mount Sinai  
+New York, NY, 20019    
+ 
 
-## Implementation
-
-## Operation
-![workflow](https://github.com/NCBI-Hackathons/Virulence_Factor_Characterization/blob/master/VFCflow.png)
-
-## Results
-
-## Data and software availability
-
-Data is available from the NCBI Sequence Read Archive under projects ERP112507 and SRP170931. "VirFac" software is made available under the MIT License (see [LICENSE](https://github.com/NCBI-Hackathons/Virulence_Factor_Characterization/blob/master/LICENSE).)
 
 ## Acknowledgements
 Diabetic Foot Ulcer dataset provided courtesy of UMaryland/CosmosID and described at