Commit 0ae74ed

Update and rename CAIS_total_GC_unweighted_by_AA.py to CAIS_total_GC_weighted_by_AA.py
This script calculates CAIS, controlled for total GC content and weighted by average amino acid frequencies from across the vertebrate species dataset. To adapt it, calculate the average amino acid frequencies across a given dataset of species and replace the Total_AA_freqtable dictionary. Note that the "*" character denotes start and stop codons.
1 parent 11329cf commit 0ae74ed
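
The adaptation step described in the commit message can be sketched as follows. This is a minimal illustration, not the author's code: the helper name `average_aa_frequencies`, the input layout, and the toy numbers are all hypothetical, and it assumes a simple unweighted mean across species, which the commit message does not specify.

```python
from collections import defaultdict


def average_aa_frequencies(per_species_freqs):
    """Average each amino acid's frequency across species (simple mean).

    per_species_freqs maps a species name to a dict of
    {amino_acid_symbol: frequency}. A symbol missing from one species
    is treated as frequency 0 for that species (an assumption).
    """
    totals = defaultdict(float)
    for freqs in per_species_freqs.values():
        for aa, freq in freqs.items():
            totals[aa] += freq
    n_species = len(per_species_freqs)
    return {aa: total / n_species for aa, total in totals.items()}


# Hypothetical toy input: two species, three symbols ("*" as in the commit message).
per_species = {
    "species_a": {"F": 0.04, "L": 0.10, "*": 0.002},
    "species_b": {"F": 0.06, "L": 0.08, "*": 0.004},
}

# The result has the shape needed to replace the Total_AA_freqtable dictionary.
Total_AA_freqtable = average_aa_frequencies(per_species)
```

The resulting dictionary would then be pasted into the script in place of the existing `Total_AA_freqtable`.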

1 file changed: +15 -9 lines changed
CAIS_ENC_calculation/CAIS_total_GC_unweighted_by_AA.py renamed to CAIS_ENC_calculation/CAIS_total_GC_weighted_by_AA.py

Lines changed: 15 additions & 9 deletions
@@ -8,9 +8,9 @@
 import os, sys, json, csv, mysql.connector,datetime,math

 """
-The purpose of this file is to determine the codon actually adaptation indices (CAAI) for individual species.CAAI is Codon Adaptation Index (1987 version) contolled for Amino Acid composition and potential mutation bias within codon usuage.
+The purpose of this file is to determine the codon adaptation index of species (CAIS) for individual species. CAAI is Codon Adaptation Index (1987 version) contolled for Amino Acid composition and potential mutation bias within codon usuage.
 This is a metric that describes codon usage patterns.
-CAAI is calculated in the following way:
+CAIS is calculated in the following way:

 I)

@@ -37,7 +37,10 @@

 II) Calculate the Amino Acid controlled codon frequency

-7)
+III) Calculate unweighted CAI (CAAI)
+
+IV) Calculate amino acid frequency controlled CAIS (Weighted_fi_CAAI)
+

 """

@@ -528,6 +531,7 @@
 # the RSCU_i value to one for all codons corresponding to that amino acid. We do this because later we'll
 # take the geometric mean of all relative adaptedness values and multiplying by one will not affect this value
 # In essence, setting them to one "silences" these.
+
 if Sum[AA] != 0:
     if Codon in ['TTA','TAT','ATT','AAT','ATA','TAA','AAA','TTT']:
         Prob = (notGC_total_prob*notGC_total_prob*notGC_total_prob)*0.125
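
The visible branch gives every codon made only of A and T a probability of `notGC_total_prob**3 * 0.125`, which is the same as assigning each A or T base a probability of `notGC_total_prob / 2`. The sketch below generalizes that pattern to any codon. It rests on two assumptions not confirmed by this diff: that G and C bases are treated symmetrically (each `gc_total_prob / 2`), and that `notGC_total_prob` equals `1 - gc_total_prob`. The function name `expected_codon_prob` is hypothetical.

```python
from itertools import product


def expected_codon_prob(codon, gc_total_prob):
    """Expected codon probability if bases are drawn independently
    and only total GC content matters (assumed model)."""
    not_gc_total_prob = 1.0 - gc_total_prob  # assumption: notGC = 1 - GC
    prob = 1.0
    for base in codon:
        if base in "GC":
            prob *= gc_total_prob / 2      # G and C share the GC mass equally
        else:
            prob *= not_gc_total_prob / 2  # A and T share the remaining mass
    return prob


# For an all-A/T codon this reduces to notGC**3 * 0.125, as in the hunk above.
tta_prob = expected_codon_prob("TTA", 0.4)

# Sanity check: the 64 codon probabilities form a distribution.
total = sum(expected_codon_prob("".join(c), 0.4) for c in product("ACGT", repeat=3))
```

Under this model the probabilities of all 64 codons sum to one for any GC content between 0 and 1.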
@@ -622,7 +626,7 @@
 print(RelativeAdaptednessTable)
 sys.stdout.flush()

-#CAI
+#CAAI
 LogOfCAAI = 0
 for AA in RawCount:
     for Codon in RawCount[AA]:
@@ -636,7 +640,7 @@
 CAAI = math.exp(LogOfCAAI)


-#Weighted wi CAI
+#Weighted wi CAAI
 Weighted_wi_Log_ofCAAI = 0
 for AA in RawCount:
     for Codon in RawCount[AA]:
@@ -659,7 +663,7 @@
 print(Weighted_wi_Log_ofCAAI)
 sys.stdout.flush()

-# and we invert the log to get the CAI
+# and we invert the log to get the CAAI
 Weighted_wi_CAAI = math.exp(Weighted_wi_Log_ofCAAI)
 if Verbose == True:
     print(Weighted_wi_Log_ofCAAI)
@@ -668,7 +672,7 @@



-#Weighted codon frequency CAI
+#Weighted codon frequency CAIS
 Weighted_fi_Log_ofCAAI = 0
 for AA in RawCount:
     for Codon in RawCount[AA]:
@@ -691,14 +695,16 @@
 print(Weighted_fi_Log_ofCAAI)
 sys.stdout.flush()

-# and we invert the log to get the CAI
+# and we invert the log to get the CAIS
 Weighted_fi_CAAI = math.exp(Weighted_fi_Log_ofCAAI)
 if Verbose == True:
     print(Weighted_fi_Log_ofCAAI)
     sys.stdout.flush()

+#Weighted_fi_CAAI is the amino acid frequency weighted, Total GC content weighted CAIS
+print("%s,%s,%s"%(i,CAAI,Weighted_fi_CAAI))
 #print("%s,%s,%s,%s"%(i,CAAI,Weighted_wi_CAAI,Weighted_fi_CAAI))
-print("%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s"%(i,Weighted_fi_CAAI,Codon_Summed_freqTable['F'],Codon_Summed_freqTable['L'],Codon_Summed_freqTable['S'],Codon_Summed_freqTable['Y'],Codon_Summed_freqTable['*'],Codon_Summed_freqTable['C'],Codon_Summed_freqTable['W'],Codon_Summed_freqTable['P'],Codon_Summed_freqTable['H'],Codon_Summed_freqTable['Q'],Codon_Summed_freqTable['R'],Codon_Summed_freqTable['I'],Codon_Summed_freqTable['M'],Codon_Summed_freqTable['T'],Codon_Summed_freqTable['N'],Codon_Summed_freqTable['K'],Codon_Summed_freqTable['V'],Codon_Summed_freqTable['A'],Codon_Summed_freqTable['D'],Codon_Summed_freqTable['E'],Codon_Summed_freqTable['G']))
+#print("%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s"%(i,Weighted_fi_CAAI,Codon_Summed_freqTable['F'],Codon_Summed_freqTable['L'],Codon_Summed_freqTable['S'],Codon_Summed_freqTable['Y'],Codon_Summed_freqTable['*'],Codon_Summed_freqTable['C'],Codon_Summed_freqTable['W'],Codon_Summed_freqTable['P'],Codon_Summed_freqTable['H'],Codon_Summed_freqTable['Q'],Codon_Summed_freqTable['R'],Codon_Summed_freqTable['I'],Codon_Summed_freqTable['M'],Codon_Summed_freqTable['T'],Codon_Summed_freqTable['N'],Codon_Summed_freqTable['K'],Codon_Summed_freqTable['V'],Codon_Summed_freqTable['A'],Codon_Summed_freqTable['D'],Codon_Summed_freqTable['E'],Codon_Summed_freqTable['G']))

 cnx.close()

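
Several hunks above accumulate a sum of logarithms and then "invert the log" with `math.exp`. The sketch below shows that pattern in the classical count-weighted geometric-mean form of the Codon Adaptation Index. The loop bodies are elided in this diff, so this is not necessarily the script's exact arithmetic, and it does not reproduce the amino acid frequency weighting behind `Weighted_fi_CAAI`. The function name and toy data are hypothetical; the dictionary layout mirrors `RawCount[AA][Codon]` as it appears in the diff.

```python
import math


def geometric_mean_adaptedness(raw_count, relative_adaptedness):
    """Count-weighted geometric mean of relative adaptedness values.

    Both arguments are nested dicts keyed as [amino_acid][codon].
    Every adaptedness value must be > 0, otherwise math.log raises.
    """
    log_sum = 0.0
    total_codons = 0
    for aa in raw_count:
        for codon, count in raw_count[aa].items():
            w = relative_adaptedness[aa][codon]
            log_sum += count * math.log(w)  # a "silenced" codon (w == 1) adds 0
            total_codons += count
    # Invert the log to recover the geometric mean.
    return math.exp(log_sum / total_codons)


# Hypothetical toy data: one amino acid with two codons.
raw_count = {"K": {"AAA": 2, "AAG": 2}}
relative_adaptedness = {"K": {"AAA": 1.0, "AAG": 0.25}}
cai_like = geometric_mean_adaptedness(raw_count, relative_adaptedness)
```

Working in log space avoids underflow when multiplying many values below one, and it makes the "silencing" comment in the diff concrete: a value of one contributes `log(1) = 0` to the sum.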
