Investigating BGCs from Corynebacterium simulans
Here, we will take a closer look at the testing dataset included as part of codoff. What is the isolate Corynebacterium simulans PES1 and why is it the test dataset anyway?
This genome features a BGC we investigated in depth for the lsaBGC manuscript for evidence of horizontal transfer across multiple skin-associated Corynebacterium species. Check out Figure 3 and the related text for more information. Essentially, the non-ribosomal peptide synthetase from this BGC was highly similar between diverse genomes from different species and was flanked by transposons.
If you haven't already, first uncompress the testing dataset:
tar -zxvf Csimulans_Data.tar.gz
Then, unlike in the run_tests.sh script, we can invoke antismash_codoff directly to run codoff on all BGCs predicted for this genome:
antismash_codoff -a Coryne_simulans_PES1/ -o Coryne_simulans_PES1_antiSMASH_Results/
This will take a while to run but, because it uses cached codon-usage counts for individual genes, it should be faster than running codoff individually for each BGC.
Afterwards we can quickly get the "Discordance Percentile" for each BGC via grep "Discordance" Coryne_simulans_PES1_antiSMASH_Results/*. This should display the following:
Coryne_simulans_PES1_antiSMASH_Results/NZ_CP014634.1.region001.txt:Discordance Percentile 4.03
Coryne_simulans_PES1_antiSMASH_Results/NZ_CP014634.1.region002.txt:Discordance Percentile 77.64
Coryne_simulans_PES1_antiSMASH_Results/NZ_CP014634.1.region003.txt:Discordance Percentile 40.0
Coryne_simulans_PES1_antiSMASH_Results/NZ_CP014634.1.region004.txt:Discordance Percentile 17.12
Coryne_simulans_PES1_antiSMASH_Results/NZ_CP014634.1.region005.txt:Discordance Percentile 7.6
Coryne_simulans_PES1_antiSMASH_Results/NZ_CP014634.1.region006.txt:Discordance Percentile 87.26
Coryne_simulans_PES1_antiSMASH_Results/NZ_CP014634.1.region007.txt:Discordance Percentile 95.01
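For genomes with many predicted BGCs, it can be easier to rank the regions than to scan the grep output by eye. The following is a minimal sketch, assuming each matching line ends with the numeric percentile as in the listing above and that the file paths contain no colons; rank_by_discordance is a hypothetical helper name and is not part of codoff.

```shell
# Rank codoff result lines by Discordance Percentile, lowest first.
# Reads grep output ("path:Discordance Percentile <value>") on stdin.
# Assumption: the percentile is the last whitespace-separated field.
rank_by_discordance() {
    awk '{
        split($0, parts, ":")      # parts[1] is the file path printed by grep
        print $NF "\t" parts[1]    # percentile, then file path
    }' | LC_ALL=C sort -n -k1,1    # numeric sort on the percentile column
}

# Usage, once antismash_codoff has finished:
#   grep "Discordance" Coryne_simulans_PES1_antiSMASH_Results/* | rank_by_discordance
```

The regions with the most unusual codon usage relative to the rest of the genome then appear at the top of the output.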
Can you guess which BGC might correspond to the transposon-embedded NRPS from the lsaBGC manuscript? If you guessed the first one, with the lowest Discordance Percentile, you are correct. Only 4% of genomic regions of similar size to this BGC have codon usage profiles more discordant with the background genome than this region. In this case, it supports our presumption that this BGC was horizontally transferred; however, for other gene clusters, such a signal could simply indicate that the gene cluster is regulated differently from the rest of the genome (e.g. it might be infrequently expressed/translated).
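To make the reading of the percentile concrete, here is a toy illustration of an empirical percentile of this kind: the percentage of comparison values that are at least as large as an observed value. It is a sketch under stated assumptions, not codoff's implementation; the function name empirical_percentile, the input layout, and the use of ">=" rather than ">" are all assumptions made for illustration.

```shell
# Toy empirical percentile (illustrative only, not codoff's code).
# $1    : observed discordance value for the region of interest
# stdin : one discordance value per line for similarly sized comparison regions
# Prints the percentage of comparison values >= the observed value.
empirical_percentile() {
    awk -v obs="$1" '
        NF  { n++; if ($1 + 0 >= obs + 0) more++ }   # skip blank lines
        END { if (n > 0) printf "%.2f\n", 100 * more / n }
    '
}

# Usage with made-up numbers: only one of four comparison values reaches 0.35.
#   printf "0.1\n0.2\n0.3\n0.4\n" | empirical_percentile 0.35
```

A low value means few comparison regions are as discordant as the observed one, which is how the 4.03 reported for region001 is interpreted above.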