
Investigating BGCs from Corynebacterium simulans

Rauf Salamzade edited this page Nov 18, 2025 · 4 revisions

Investigating multiple BGCs from Corynebacterium simulans

Here, we will take a closer look at the testing dataset included as part of codoff. What is the isolate Corynebacterium simulans PES1 and why is it the test dataset anyway?

This genome contains a BGC that we investigated in depth in the lsaBGC manuscript for evidence of horizontal transfer across multiple skin-associated Corynebacterium species (see Figure 3 and the accompanying text for details). In short, the non-ribosomal peptide synthetase (NRPS) encoded by this BGC was highly similar across genomes from different species and was flanked by transposons.

If you haven't already, first uncompress the testing dataset:

tar -zxvf Csimulans_Data.tar.gz

Then, rather than going through the run_tests.sh script, we can invoke antismash_codoff directly to run codoff on all BGCs predicted for this genome:

antismash_codoff -a Coryne_simulans_PES1/ -o Coryne_simulans_PES1_antiSMASH_Results/

This will take a little while to run, but because it caches codon-usage counts for individual genes, it should be faster than running codoff separately for each BGC.

Afterwards, we can quickly get the "Discordance Percentile" for each BGC with:

grep "Discordance" Coryne_simulans_PES1_antiSMASH_Results/*

This should display the following:

Coryne_simulans_PES1_antiSMASH_Results/NZ_CP014634.1.region001.txt:Discordance Percentile	4.03
Coryne_simulans_PES1_antiSMASH_Results/NZ_CP014634.1.region002.txt:Discordance Percentile	77.64
Coryne_simulans_PES1_antiSMASH_Results/NZ_CP014634.1.region003.txt:Discordance Percentile	40.0
Coryne_simulans_PES1_antiSMASH_Results/NZ_CP014634.1.region004.txt:Discordance Percentile	17.12
Coryne_simulans_PES1_antiSMASH_Results/NZ_CP014634.1.region005.txt:Discordance Percentile	7.6
Coryne_simulans_PES1_antiSMASH_Results/NZ_CP014634.1.region006.txt:Discordance Percentile	87.26
Coryne_simulans_PES1_antiSMASH_Results/NZ_CP014634.1.region007.txt:Discordance Percentile	95.01
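With seven regions, this output is easy to scan by eye, but for genomes with many BGCs it can help to sort it. Below is a minimal sketch of a hypothetical shell function, rank_by_discordance (our own name, not a codoff command), that reuses the same grep and orders regions from the lowest (most discordant) to the highest Discordance Percentile. It assumes each report is a .txt file containing a line that begins with "Discordance Percentile" and ends with the numeric value, as in the output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of codoff): list codoff reports from the
# most to the least codon-usage-discordant region.
# Assumes each per-region .txt report has a line starting with
# "Discordance Percentile" and ending with the numeric value.
rank_by_discordance() {
  # -H forces the file name into every matching line, even for a single file.
  grep -H "Discordance Percentile" "$1"/*.txt |
    awk '{
      path = $0
      sub(/:Discordance Percentile.*$/, "", path)  # keep only the file path
      n = split(path, parts, "/")                  # drop leading directories
      print parts[n], $NF                          # file name, percentile
    }' |
    LC_ALL=C sort -k2,2n  # numeric sort on the percentile, lowest first
}
```

For example, `rank_by_discordance Coryne_simulans_PES1_antiSMASH_Results | head -n 3` would show the three most discordant regions. Running sort under LC_ALL=C keeps the handling of the decimal point independent of the user's locale.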

Can you guess which BGC corresponds to the transposon-embedded NRPS from the lsaBGC manuscript? If you guessed the first one (region001), which has the lowest Discordance Percentile, you are correct. Only about 4% of genomic regions of similar size to this BGC have codon-usage profiles more discordant from the background genome than this region does. Here, this supports our hypothesis that the BGC was horizontally transferred; for other gene clusters, however, such a signal could simply indicate that the cluster is regulated differently from the rest of the genome (e.g. it might be infrequently expressed or translated).
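The interpretation of the Discordance Percentile given above can be written as a formula. This is a sketch consistent with the description on this page rather than codoff's documented implementation: the symbols are our own, and we assume d_BGC is some distance between the BGC's codon-usage profile and the genome-wide background, with d_1, ..., d_N the same distance computed for N randomly sampled genomic regions of similar size to the BGC.

```latex
P_{\mathrm{disc}} = \frac{100}{N} \sum_{i=1}^{N} \mathbf{1}\!\left[\, d_i > d_{\mathrm{BGC}} \,\right]
```

Under this reading, a value of 4.03 means that if, hypothetically, N = 10,000 regions were sampled, about 403 of them would be more discordant from the background than the BGC, so lower values flag more unusual codon usage.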
